Doc2Graph-X

A multilingual graph-based framework for form understanding

Doc2Graph-X is a multilingual extension of Doc2Graph that enables language-agnostic structured document understanding for forms.

Problem

The original Doc2Graph relied on monolingual text processing, limiting generalization across languages.

Approach

  • Combines XLM-RoBERTa (word-level) and S-BERT (sentence-level) embeddings for language-agnostic entity detection
  • A multimodal GNN fuses textual, visual, and geometric features
  • A node classifier performs Semantic Entity Recognition (SER) and an edge classifier handles Relation Extraction (RE)
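
The pipeline in the bullets above can be sketched in a few lines of plain Python. This is an illustrative toy, not the authors' implementation: the embedding vectors are hand-written placeholders standing in for XLM-RoBERTa and S-BERT outputs, visual features are omitted, the k-nearest-neighbour edge construction and mean aggregation are assumptions chosen for simplicity, and all function names are hypothetical.

```python
import math

def node_features(word_emb, sent_emb, box, page_w, page_h):
    """Concatenate word-level and sentence-level embeddings with box geometry.

    Assumption: geometry is the bounding box normalized by the page size.
    """
    x0, y0, x1, y1 = box
    geom = [x0 / page_w, y0 / page_h, x1 / page_w, y1 / page_h]
    return list(word_emb) + list(sent_emb) + geom

def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def knn_edges(boxes, k):
    """Directed candidate edges from each node to its k nearest neighbours."""
    centers = [center(b) for b in boxes]
    edges = []
    for i, ci in enumerate(centers):
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.dist(ci, centers[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges

def mean_aggregate(features, edges):
    """One message-passing step: average each node with its out-neighbours."""
    out = []
    for i, feat in enumerate(features):
        neighbours = [features[j] for (src, j) in edges if src == i]
        stacked = [feat] + neighbours
        out.append([sum(col) / len(stacked) for col in zip(*stacked)])
    return out

# Toy form with three text boxes on a 400x400 page (placeholder values).
boxes = [(10, 10, 110, 30), (120, 10, 300, 30), (10, 200, 110, 220)]
word_embs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # stand-in for XLM-RoBERTa
sent_embs = [[1.0], [2.0], [3.0]]                 # stand-in for S-BERT

feats = [node_features(w, s, b, 400, 400)
         for w, s, b in zip(word_embs, sent_embs, boxes)]
edges = knn_edges(boxes, k=1)
hidden = mean_aggregate(feats, edges)

# A node classifier (SER) would take hidden[i] as input; an edge classifier
# (RE) would take a pair representation such as the concatenation below.
edge_inputs = [hidden[i] + hidden[j] for i, j in edges]
```

In a trained system the aggregation step and both classifier heads would be learned layers; the sketch only shows where each kind of feature enters the graph and which representation each classifier would consume.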

Results

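Multilingual SER F1 after fine-tuning on 8 languages. With roughly 55× fewer parameters than LayoutXLM (6.2M vs. 345M), Doc2Graph-X scores slightly higher on FUNSD and IT but trails LayoutXLM on the reported average (77.39 vs. 80.56):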

Model (SER)    #Params   FUNSD   IT      Avg.
XLM-RoBERTa    125M      66.70   66.87   70.47
InfoXLM        362M      68.52   67.51   72.07
LayoutXLM      345M      79.40   80.82   80.56
Doc2Graph-X    6.2M      80.07   81.38   77.39

[Figure: bar chart of SER F1 on FUNSD for XLM-RoBERTa, InfoXLM, LayoutXLM, and Doc2Graph-X, with Doc2Graph-X highlighted.]
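
The parameter ratio and the per-column gaps to LayoutXLM can be checked with a short calculation. The numbers are copied from the results table; the dictionary layout and function names are only illustrative.

```python
# Values copied from the results table (parameters in millions, SER F1).
results = {
    "LayoutXLM": {"params_m": 345.0, "FUNSD": 79.40, "IT": 80.82, "Avg.": 80.56},
    "Doc2Graph-X": {"params_m": 6.2, "FUNSD": 80.07, "IT": 81.38, "Avg.": 77.39},
}

def param_ratio(big, small):
    """How many times more parameters `big` has than `small`."""
    return results[big]["params_m"] / results[small]["params_m"]

def f1_deltas(model, baseline):
    """F1 difference (model minus baseline) for every reported column."""
    return {
        col: round(results[model][col] - results[baseline][col], 2)
        for col in ("FUNSD", "IT", "Avg.")
    }

ratio = param_ratio("LayoutXLM", "Doc2Graph-X")
deltas = f1_deltas("Doc2Graph-X", "LayoutXLM")

print(f"LayoutXLM has about {ratio:.1f}x the parameters of Doc2Graph-X")
print(deltas)
```

The ratio comes out at about 55.6, and the differences are +0.67 on FUNSD, +0.56 on IT, and -3.17 on the average.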

Publication

Mazumder, S., Biswas, S., Das, A., Lladós, J. Doc2Graph-X: A Multilingual Graph-Based Framework for Form Understanding. GbR 2025.