The NLP Lab is Heka.ai's center of expertise on Natural Language Processing (NLP): a team passionate about NLP challenges and technologies. Its missions include:
- Technology monitoring
- Centralization and diffusion of NLP expertise within our Heka.ai team
- Benchmark and development of new approaches to major NLP use cases
Our research topics
Within our solutions (e.g., Deep Review and CRM4.0), we need to detect themes in a corpus of documents in an unsupervised way, that is, without labeled data. This NLP use case is known as Topic Modeling.
We benchmark methods from the literature as well as combinations of approaches devised by our teams. This benchmark is based on a dataset of Google reviews in French. It lets us rank methods against several criteria: the time required to detect themes, and the coherence of and separation between the proposed topics.
Once we had identified the best approaches, we integrated them into a pre-existing theme-analysis package.
To complement the benchmark carried out in the NLP Lab, we supervised two groups of students (from CentraleSupélec and Mines Saint-Etienne) working on Topic Modeling subjects as part of their final-year theses.
A similar research-and-capitalization approach was carried out on related use cases:
- Supervised theme detection (multi-label and multi-class) in a document corpus.
- Linking named entities to the themes detected in a document.
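As a rough sketch of the first of these use cases, supervised multi-label theme detection can be framed as a one-vs-rest classifier over text features. The texts, theme labels, and model choice below are hypothetical placeholders, not the approaches from our internal work.

```python
# Multi-label theme detection sketch: one-vs-rest logistic regression
# over TF-IDF features. Data and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.pipeline import make_pipeline

texts = [
    "the staff was helpful and the prices were low",
    "prices are too high for such small portions",
    "friendly staff, quick answers to questions",
    "cheap menu and generous portions",
]
# Each verbatim can carry several themes at once (multi-label).
labels = [{"staff", "price"}, {"price"}, {"staff"}, {"price"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # binary indicator matrix, one column per theme

model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, Y)

pred = model.predict(["the staff was very friendly"])
print(mlb.inverse_transform(pred))
```

The one-vs-rest decomposition is what makes the multi-label setting tractable with ordinary binary classifiers: each theme gets its own independent detector.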
Textual data annotation tool
As part of our work for our Heka.ai solutions and client engagements, we sometimes need to annotate textual data: named entities, themes, sentiments, etc.
This is why we are developing a textual data annotation tool. The challenge is identifying helpful annotation types and building an architecture that can interface with any kind of database and environment.
Thanks to a graphical interface accessible to each user profile, the tool allows functional experts to annotate textual data in an advanced way: multi-label management, labeling of a subset of the text, automated multi-annotator reconciliation, and administration of annotation campaigns with per-user access management.
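The span-level labeling and multi-annotator reconciliation features can be sketched with simple data structures. The field names and the majority-vote rule below are assumptions made for illustration, not the tool's actual schema or reconciliation strategy.

```python
# Hypothetical sketch of span-level annotations and a majority-vote
# reconciliation across annotators. Not the tool's real data model.
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    doc_id: str
    start: int        # character offset where the labeled span begins
    end: int          # character offset where it ends
    label: str        # e.g. a theme, entity type, or sentiment
    annotator: str

def reconcile(annotations, min_votes=2):
    """Keep spans on which at least `min_votes` annotators agree."""
    votes = Counter((a.doc_id, a.start, a.end, a.label) for a in annotations)
    return [span for span, n in votes.items() if n >= min_votes]

anns = [
    Annotation("doc1", 0, 5, "PERSON", "alice"),
    Annotation("doc1", 0, 5, "PERSON", "bob"),
    Annotation("doc1", 10, 14, "ORG", "alice"),  # only one vote: dropped
]
print(reconcile(anns))  # → [('doc1', 0, 5, 'PERSON')]
```

Keying votes on the (document, span, label) tuple is one simple design choice; real reconciliation may also need to merge overlapping but non-identical spans.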
The first step is to detect themes in an unsupervised way with Topic Modeling and to annotate a number of verbatims with our annotation tool. The next logical step is to move from this unsupervised setting to a supervised one with Topic Detection.
For this purpose, we have carried out a benchmark of the existing approaches in the literature and new methods imagined by the team to solve supervised topic detection problems.
This benchmark takes into account detection performance (measured with an F1-score), the associated training and inference times, and an estimate of the minimum number of annotated verbatims required. We can thus choose the most appropriate method for each use case and its constraints.
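The measurement pattern described above can be sketched as follows. The placeholder model and toy data stand in for the methods actually compared; only the shape of the evaluation (F1-score plus timed training and inference) mirrors the text.

```python
# Benchmark-loop sketch: score one candidate method on F1 and on
# wall-clock training/inference time. Model and data are placeholders.
import time
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

train_texts = ["good food", "bad service", "tasty meal", "rude staff"]
train_y = [1, 0, 1, 0]          # 1 = "food" theme, 0 = "service" theme
test_texts = ["delicious food", "unhelpful staff"]
test_y = [1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())

t0 = time.perf_counter()
model.fit(train_texts, train_y)
train_time = time.perf_counter() - t0

t0 = time.perf_counter()
pred = model.predict(test_texts)
infer_time = time.perf_counter() - t0

print(f"F1: {f1_score(test_y, pred):.2f}")
print(f"train: {train_time:.4f}s  inference: {infer_time:.4f}s")
```

Running the same loop over a grid of methods and training-set sizes is also how one would estimate the minimum number of annotated verbatims a method needs.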