Résumé
Background
I received my Ph.D. in Computer Science (2018) from Paris-Saclay University, France, where my thesis proposed several neural architectures for event extraction from unstructured text.
Before that, I earned:
- an M.Sc. in Software Engineering and a B.Sc. in Computer Science from Alexandru I. Cuza University, Iași, Romania
- an M.A. and B.A. in Visual Arts (video & photography) from the George Enescu National University of Arts, Iași, Romania
During my Ph.D., I spent three years at CEA, List (Laboratory for Integration of Systems and Technology) and five years at LIMSI (now LISN) in Orsay, France.
I then worked as a research engineer at the LAL (Linear Accelerator Laboratory), within the Paris-Saclay Center for Data Science, focusing on NLP applications for French law (case outcome prediction) and fake news detection.
From 2017 to 2020, I worked at Teklia as a machine learning scientist, developing technologies for digitised and historical document understanding, including handwriting recognition, article separation, and named entity recognition.
From 2020 to 2022, I worked as a postdoctoral researcher at the University of La Rochelle, within the L3i Laboratory (IT, Image & Interaction).
From 2023 to 2025, I was a research scientist at the Digital Humanities Laboratory (DHLAB) at École Polytechnique Fédérale de Lausanne (EPFL), where I worked on the Impresso project, focusing on large-scale multilingual historical newspaper processing, named entity recognition and linking, OCR post-correction, semantic enrichment, and the development of evaluation benchmarks for historical NLP.
My main research interests include:
- Information extraction — entities, relations, and events
- Digitised and historical documents
- Multilingual and multimodal learning
- Machine learning for cultural heritage data
Education
======
- B.Sc. in Computer Science, Alexandru I. Cuza University, Faculty of Computer Science, 2010
- B.A. in Visual Arts, George Enescu National University of Arts, 2010
- M.Sc. in Software Engineering, Alexandru I. Cuza University, Faculty of Computer Science, 2012
- M.A. in Visual Arts, George Enescu National University of Arts, 2012
- Ph.D. in Computer Science, Paris-Saclay University, 2018
Work experience
======
- Jan 2023–Dec 2025: Research Scientist at EPFL, DHLAB (Digital Humanities Laboratory)
- Jan 2020–Dec 2022: Postdoctoral Researcher at University of La Rochelle, France, L3i (IT, Image and Interaction Laboratory)
- Jan 2017–Dec 2020: Machine Learning Scientist at Teklia
- Oct 2016–Apr 2017: Research Assistant at LAL (Linear Accelerator Laboratory, now IJCLab)
- Feb 2013–Sep 2018: PhD Candidate at CEA, List (Laboratory for Integration of Systems and Technology) and LIMSI (now LISN, Interdisciplinary Laboratory of Digital Sciences)
Publications
======
For a complete list of publications, please visit my Google Scholar profile.
Talks
======
November 7, 2024
🎤 Keynote speaker at the GDR TAL CNRS 2024 annual meeting: “Traitement Automatique des Langues et les Humanités Numériques”, La Rochelle, France.
Talk: The Ongoing Struggle for Alleviating Digitisation Errors in Historical Document Processing: A Necessary Effort?
🔗 Event page
September 16, 2022
🎤 Invited speaker at the NER for OCR’ed Historical Documents Seminar Series, Maison de la Recherche, Paris-Sorbonne, France.
Talk (with Antoine Doucet): Impact of Optical Character Recognition on Named Entity Recognition
🔗 Event site
March 3, 2022
🎤 Invited speaker: Reconnaissance d’entités nommées et extraction d’événements dans les documents historiques (En: Named entity recognition and event extraction in historical documents) at the CERES study day on digital methods for humanities, La Rochelle, France.
🔗 CERES
November 8, 2017
🎤 Presentation: Fake News Detection at the Paris-Saclay Center for Data Science (CDS) Annual Pitching Day.
The talk covered how to define fake news, detection tactics, and the evaluation metrics used in a student competition.
🔗 Pitching Day Info
October 15, 2014
🎤 Presentation: Learning word representations for event extraction from text
(Fr: Apprentissage des représentations de mots pour l’extraction d’événements à partir de texte)
At Paris Machine Learning Group #2, Season 2: Learning Causality, Words, the Higgs & more.
This talk was based on my first-year PhD research, later published at EMNLP (Core A).
🔗 Group page
🔗 Meetup event
Teaching
======
Courses and Responsibilities
2023–2025 · EPFL – École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Academic Coordination and Administration
  - Organized Bachelor’s and Master’s midterm and final presentations, including scheduling, communication with students, and evaluation logistics.
  - Coordinated with supervisors to ensure smooth progress tracking and assessments.
- Supervised and co-supervised student semester projects (Bachelor and Master).
2020–2022 · University of La Rochelle, France – L3i Laboratory
- Databases (Bachelor): Prepared tutorials and supervised sessions; designed and graded final project.
- Process Dematerialization (Master): Lectures and tutorials on information extraction from documents, from text representations to NLP techniques.
- Introduction to Computer Systems (Bachelor): Led tutorials and practicals prepared by the professor; introduced topics through demonstrations.
- EU-CONEXUS – Tourism Facing Digital Transition (All Bachelor levels, in English): Delivered lectures and practicals on machine learning in tourism for non-specialist students.
- Cloud Developer – Introduction to NLP Methods and Techniques (Professional Bachelor, IUT): Designed lectures and practical sessions on basic ML algorithms for NLP.
- Big Data Analysis – Text and Image Classification (DUT): Lectures on machine learning techniques and tutorials on text/image preprocessing and classification methods.
2019 · French Institute of Hanoi (Remote)
- Multimedia Indexing (Master)
- Deep Learning for Natural Language Processing (Master): Lectures and practicals on text representations, ML, and DL for NLP.
2017–2019 · University of Rouen Normandy, France
- Deep Learning for Natural Language Processing (Master): Prepared advanced practical sessions on embeddings, classification, and language models.
2013 · Paris-Sud University – IUT d’Orsay, France
- Foundations of Object-Oriented Design (Bachelor): Delivered tutorials and practical sessions prepared by the course professor.
Scientific Organization and Mentorship
- Organizer and discussion chair at DAS – Document Analysis Systems
- Mentor at ICPRAI – Doctoral Consortium
- Supervised or co-supervised 22 students at the PhD (4), Master’s (14), and Bachelor’s (2) levels, plus interns (2), on interdisciplinary projects spanning NLP, digital humanities, AI, and cultural heritage.
Program Committee Member / Organizational Roles / Reviewer
ARR (ACL Rolling Review) (every year since 2019), CLEF (2020, 2026), SIGIR (2022, 2025, 2026), SIGIR-AP (2025), CIKM (2023–2026), ECIR (2023–2026), CHR (Computational Humanities Research) (2021–2025), ICADL (2022–2024; 2024 Program Chair), ISCRAM (2022), ICPRAI (2022), WWW (The Web Conference) (2026), LREC (2025 Reviewer, 2026 Area Chair), DAS (2022 – Reviewer, Organizing Committee, Chair), ICFHR (2018), ASAR (IEEE) (2018), SwissText (2024), HIP (2023), SoICT (2022), RobustAL (2022).
Reviewer / External Reviewer
ACL (ARR, Main Conference), EMNLP (including Industry Track), NAACL-HLT, COLING (including Demos), EACL, LREC-COLING, LaTeCH-CLfL, SemEval, CoNLL, TALN, CORIA, FinNLP, Hackashop, IEEE Transactions on Knowledge and Data Engineering (TKDE), Journal on Computing and Cultural Heritage (JOCCH), Language Resources and Evaluation (Springer).
Public Outreach
- Participated in Open Days at the University of La Rochelle, presenting research to prospective students and their families.
- Presenter at Fête de la Science, a national outreach event — showcased research on event extraction from digitized French newspapers (1900–1944).
- Organizer of the “JIDAP” event at L3i — a lab-wide open day for postdocs, PhDs, engineers, and newcomers to present research and share resources.
- Organizer of a student competition on fake news detection at MINES ParisTech as part of the Paris-Saclay Center for Data Science initiative.
RAMP Challenge · GitHub
Awards and Competitions
- 🥈 SemEval 2022 – 2nd place: Multilingual News Article Similarity; strong performance in MultiCoNER
- 🥇 CLEF-HIPE 2022 – 1st place: Global Adaptation Challenge; 2nd place: Multilingual Newspaper Challenge
- 🥇 TREC 2021 (Incident Streams) – 1st in recall across 33 submissions
- 🥇 TREC 2021 (NewsTrack) – 1st place across all metrics
- 🥇 TAC 2020 – RUFES – 1st place in ultra-fine entity recognition
- 🥇 CLEF-HIPE 2020 – 1st out of 13 international teams across all historical NER leaderboards
- 🥈 ImageCLEF 2009–2012 – Top 3 finishes in Robot Vision and Wikipedia Retrieval tracks
(2nd in 2009, 3rd in 2010, 2011, and 2012)
Fun facts
- Currently signed in to 30 different Slack teams.
- Don’t believe in fantasy books, but can dream up neural architectures that read medieval charters.
- Once built a pipeline that processed 40 million historical newspaper articles — and still found time to watch movies and go to metal festivals.
- Known to advance the state of the art by gently correcting it. “Sometimes the most important discoveries are about what doesn’t work.” (Mikhail Biriuchinskii, Lisbon, DH2025)
- When Ema joins a shared task, everyone updates their baselines.
