A quick visual map of the curated papers below. The lines show when the main research subjects become more visible across years; it is a readable map, not a full bibliography.

Noisy Documents, Semantic Extraction, Historical AI, Cultural Heritage, Multimodal Archives, Applied Document Intelligence

Noisy Documents

I keep returning to documents that arrive already damaged: OCR errors, HTR errors, broken segmentation, missing accents, strange line breaks, degraded scans, and transcripts that look clean until a model has to reason over them.

OCR, HTR, post-correction, evaluation, robustness

What this contains

Measuring how OCR/HTR noise changes downstream semantic tasks, not only character accuracy.
Designing post-correction and robustness experiments for historical and degraded text.
Testing whether models remain reliable when the input is damaged, fragmented, or historically unstable.
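The gap between character accuracy and downstream usefulness can be shown with a small sketch. Everything here is an illustrative assumption, not a method taken from the papers: the sentence, the entity list, the helper names (`levenshtein`, `cer`, `entity_recall`), and the use of exact string matching as a stand-in for a real extraction model.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def entity_recall(entities: list[str], text: str) -> float:
    """Share of reference entities still found verbatim in the text.

    Exact matching is a deliberately naive proxy for a downstream task.
    """
    if not entities:
        raise ValueError("entities must not be empty")
    return sum(1 for e in entities if e in text) / len(entities)


# Hypothetical example: two OCR-style character substitutions.
reference = "Maria Keller travelled from Geneva to Lyon in 1897."
noisy = "Maria Kel1er travelled from Geneva to Lvon in 1897."
entities = ["Maria Keller", "Geneva", "Lyon"]

print(f"CER: {cer(reference, noisy):.3f}")
print(f"Entity recall: {entity_recall(entities, noisy):.2f}")
```

In this toy case the character error rate stays below 5%, yet two of the three entities no longer match. Both errors fall inside names, which is why character accuracy alone says little about what a downstream task can still recover.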

Selected high-impact / most relevant papers

Semantic Extraction

I work on extracting the small structures that make collections searchable and arguable: people, places, organizations, events, relations, roles, mentions, links, and evidence. I care less about clean labels and more about whether the extracted structure survives noisy documents and real use.

NER, entity linking, relations, events, evidence

What this contains

Named entity recognition, entity linking, relation extraction, and event extraction.
Entity-centered ways of making large document collections searchable, comparable, and arguable.
Evaluation setups where semantic structure matters more than a clean-looking label.

Selected high-impact / most relevant papers

Historical AI

Historical language is not stable. Names move. Borders move. Political words change meaning. A system that treats all periods as one flat present will make confident mistakes. I work on temporal modelling, long-horizon representations, and evaluation setups that make these failures visible.

temporality, diachronic language, historical collections, long horizon

What this contains

Temporal modelling for documents where names, roles, borders, and meanings do not stay fixed.
Experiments with temporal knowledge injection, temporal fusion, and long-horizon representations.
Critiques of fluent AI answers when evidence is weak, noisy, or historically misplaced.
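One way to picture the "flat present" problem is a name lookup that depends on the document's date. The sketch below is a toy under stated assumptions: the `NameRecord` type, the `resolve` function, the identifiers, and every place name and year are invented for illustration and do not come from any real gazetteer or from the papers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    surface: str     # name as written in a document
    entity_id: str   # hypothetical identifier for the referent
    valid_from: int  # first year the name applies (inclusive)
    valid_to: int    # last year the name applies (inclusive)


# Entirely invented records: one name changes referent,
# and one referent changes name.
GAZETTEER = [
    NameRecord("Northport", "place:001", 1700, 1849),
    NameRecord("Northport", "place:002", 1850, 1950),
    NameRecord("Port Alder", "place:001", 1850, 1950),
]


def resolve(surface: str, year: int,
            records: list[NameRecord] = GAZETTEER) -> Optional[str]:
    """Return the entity a name refers to in a given year.

    Returns None when the name is unknown or ambiguous for that year,
    so the gap stays visible instead of being filled with a guess.
    """
    matches = [r.entity_id for r in records
               if r.surface == surface and r.valid_from <= year <= r.valid_to]
    return matches[0] if len(matches) == 1 else None


print(resolve("Northport", 1800))   # place:001
print(resolve("Northport", 1900))   # place:002
print(resolve("Northport", 2000))   # None
```

A lookup without the `year` argument would have to pick one referent for "Northport" and would be confidently wrong for part of the collection. Returning `None` outside the known range is a design choice that favors visible gaps over fluent guesses.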

Selected high-impact / most relevant papers

Cultural Heritage

I work with cultural heritage as data, evidence, memory, and conflict. This includes digital epigraphy, Armenian and Ukrainian inscriptions, structured vocabularies, cultural weaponization, contested narratives, and computational methods that must remain accountable to domain experts.

epigraphy, heritage, SKOS, EpiDoc/TEI, memory

What this contains

Structured vocabularies and computational methods for underrepresented cultural heritage material.
Digital epigraphy, inscription corpora, and standards-oriented encoding.
Computational analysis of cultural heritage narratives, memory, and contested heritage.

Selected high-impact / most relevant papers

Multimodal Archives

Documents are not only text. Layout, typography, images, tables, margins, page structure, photographs, and visual noise often carry the evidence. I am interested in models that connect these signals without pretending that the page is just a bag of words.

layout, vision-language, document images, photographs, visual evidence

What this contains

Document images, page structure, layout, photographs, and visual evidence.
Connections between image processing, document analysis, and semantic modelling.
Work that treats the page as a visual object, not only as text after OCR.

Selected high-impact / most relevant papers

Applied Document Intelligence

The same problems reappear outside archives: forged receipts, insurance claims, fake news, epidemic monitoring, emergency events, administrative records, and production workflows where models must be robust enough to be useful and transparent enough to be questioned.

fraud detection, misinformation, event monitoring, pipelines, deployment

What this contains

Document fraud detection, forged receipts, insurance claims, and applied document reasoning.
Multilingual epidemic monitoring and emergency event detection.
Fake news, misinformation, and workflows where models have to survive real constraints.

Selected high-impact / most relevant papers
