Our publications

Marketing & Sales

Anomaly Detection in Time Series

Anomaly detection is becoming ubiquitous across industries as one of the most important data science use cases.
An anomaly, or outlier, can be defined as a point that does not fit the pattern of the rest of the data or expected results. While there are many different origins to an

2024

Financial Services & Compliance

Synthetic data or how to share sensitive data while staying GDPR compliant

For six months, five students from Centrale Supélec and ESSEC worked with Sia Partners to build a Python library for creating fake data, which we will call synthetic data.
But what's the point of creating fake data? How could it help organizations?

2024

Financial Services & Compliance

Boosting search engine capabilities of RegReview: Introducing document multi-languages management

RegReview is an AI solution that helps compliance teams automate regulatory monitoring and processes. It brings together several tools, the most essential of which is a search engine operating on a compiled database of custom-built regulatory sources.

The database contains ~300k documents.

2024

Marketing & Sales

A Twitter vision on the campaign for the mayor of Paris

Applying natural language processing techniques on Tweets

2024

Marketing & Sales

Data driven customer segmentation: Deep dive on segmentation and interpretation

In the previous article, we described the generic approach to defining an actionable, data-driven customer segmentation. In this article, we explain the clustering and interpretation steps in detail, focusing on data processing methodologies and data science algorithms.

2024

Marketing & Sales

Topic Modeling: An end-to-end process for semi-automatic topic modeling from a huge corpus of short texts

We propose an end-to-end process for applying topic modeling to any business case while minimizing the human effort required. This article follows our previous article on Topic Modeling, which presented a detailed benchmark of various topic modeling techniques applied to a specific business case.

2024

Government

Labeling text clusters with keywords

We explore several keyword extraction techniques for labeling the text clusters produced by a Text Clustering or Topic Modeling pipeline. This work follows our previous articles on Topic Modeling and Text Clustering.

2024

Marketing & Sales

Time series clustering

Overview of the various methods

2024

Financial Services & Compliance

How to avoid GDPR issues using GAN (Generative Adversarial Networks)

With the growth of Big Data, recent decades have seen many issues emerge from the massive use of data. In particular, personal data protection issues are pervasive for companies that store their customers’ data.

2024

Our R&D activities

Operations Research

Marketing Analytics

Time Series