Doc2GraphFormer
Bridging graph learning with transformer attention for document understanding
Doc2GraphFormer is a hybrid graph-transformer framework for document understanding that integrates the structured reasoning of Graph Neural Networks with the global context modeling of transformers.
Problem
GraphSAGE-based message passing struggles with long-range dependencies and global context, while token-based transformers lack an explicit structured representation of document elements.
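To make the problem statement concrete: with k GraphSAGE-style layers, a node only receives information from its k-hop neighborhood, whereas a single self-attention layer connects every pair of nodes. The sketch below measures that receptive field on a hypothetical six-box document whose boxes are chained in reading order. It is a generic illustration under these assumptions, not code from Doc2GraphFormer; `khop_reach` is a helper introduced here.

```python
import numpy as np


def khop_reach(adj: np.ndarray, k: int) -> np.ndarray:
    """Boolean matrix: reach[i, j] is True if node j can influence node i
    after k rounds of neighborhood message passing (self-loops included)."""
    n = adj.shape[0]
    step = (adj + np.eye(n, dtype=int)) > 0  # one round: self + direct neighbors
    reach = np.eye(n, dtype=bool)
    for _ in range(k):
        # boolean matrix product: extend every reachable path by one hop
        reach = (reach.astype(int) @ step.astype(int)) > 0
    return reach


# Hypothetical document graph: 6 text boxes chained top to bottom,
# each connected only to its immediate neighbor (a path graph).
n = 6
adj = np.zeros((n, n), dtype=int)
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1

two_layer_gnn = khop_reach(adj, k=2)
attention = np.ones((n, n), dtype=bool)  # one self-attention layer: all pairs

print(two_layer_gnn[0].sum())  # → 3 (box 0 sees boxes 0, 1, 2)
print(attention[0].sum())      # → 6 (box 0 sees every box)
```

Box 0 needs five message-passing layers to hear from box 5, which is the long-range limitation the hybrid design targets.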
Approach
- Converts documents into graph representations and applies multi-head self-attention for structured parsing
- Jointly optimizes three tasks: Entity Recognition, Subgraph Clustering, and Entity Linking
- Dynamically refines entity relationships, capturing both local and global dependencies
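The approach can be pictured as a single NumPy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy graph, the feature sizes, the choice of one GraphSAGE mean layer followed by residual multi-head self-attention, the bilinear pairwise heads, and the unweighted sum of losses are all hypothetical. Entity recognition is treated as node classification, entity linking as pairwise link prediction, and subgraph clustering as pairwise same-entity prediction.

```python
import numpy as np

rng = np.random.default_rng(0)


def init(*shape):
    """Small random weights (hypothetical initialization)."""
    return rng.normal(scale=0.1, size=shape)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sage_mean(h, adj, w):
    """GraphSAGE-style mean aggregator: relu(W [h_v || mean of neighbor h_u])."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)  # guard isolated nodes
    neigh = adj @ h / deg
    return np.maximum(np.concatenate([h, neigh], axis=1) @ w, 0.0)


def multi_head_attention(h, wq, wk, wv, wo, n_heads):
    """Scaled dot-product self-attention over all nodes (global context)."""
    n, d = h.shape
    dh = d // n_heads

    def split(m):  # (n, d) -> (n_heads, n, dh)
        return m.reshape(n, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(h @ wq), split(h @ wk), split(h @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (n_heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)       # concatenate heads
    return out @ wo


def bce(p, y):
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Hypothetical toy document: 5 text boxes linked in reading order.
n, d, n_heads, n_classes = 5, 16, 4, 4
x = rng.normal(size=(n, d))  # stand-in for text + layout node features
adj = np.zeros((n, n))
adj[[0, 1, 2, 3], [1, 2, 3, 4]] = 1
adj = np.maximum(adj, adj.T)  # undirected edges

# Encoder: local message passing, then global attention with a residual.
h_local = sage_mean(x, adj, init(2 * d, d))
h = h_local + multi_head_attention(
    h_local, init(d, d), init(d, d), init(d, d), init(d, d), n_heads)

# Three task heads (assumed designs).
ent_logits = h @ init(d, n_classes)     # entity recognition: class per node
p_link = sigmoid(h @ init(d, d) @ h.T)  # entity linking: P(i links to j)
p_same = sigmoid(h @ init(d, d) @ h.T)  # subgraph clustering: P(i, j same entity)

# Joint objective: unweighted sum of the three losses on toy labels.
y_ent = np.array([0, 1, 1, 2, 3])
y_link = np.zeros((n, n))
y_link[0, 1] = 1
y_same = np.eye(n)
y_same[1, 2] = y_same[2, 1] = 1
ce = -np.mean(np.log(softmax(ent_logits)[np.arange(n), y_ent]))
loss = float(ce) + bce(p_link, y_link) + bce(p_same, y_same)
print(ent_logits.shape, p_link.shape, round(loss, 3))
```

Because all three heads read the same encoder output `h`, gradients of the single summed loss would update the shared encoder for every task at once, which is one plausible reading of "jointly optimizes".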
Results
Semantic Entity Recognition (SER) and Relation Extraction (RE) F1 on FUNSD. Doc2GraphFormer+GL achieves the highest SER F1 with the fewest parameters, while BROS remains ahead on RE F1:
| Method | SER F1 | RE F1 | #Params (M) |
|---|---|---|---|
| BROS | 0.812 | 0.670 | 138 |
| LayoutLM | 0.790 | 0.428 | 343 |
| Doc2Graph | 0.823 | 0.534 | 6.2 |
| Doc2GraphFormer+GL | 0.862 | 0.555 | 3.62 |
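The efficiency comparison is simple arithmetic on the table. This short Python snippet, using only the numbers reported above, computes each baseline's F1 gap relative to Doc2GraphFormer+GL and how many times larger the baseline is; `compare` is a helper introduced here for illustration.

```python
# (SER F1, RE F1, #Params in millions), copied from the FUNSD table
RESULTS = {
    "BROS": (0.812, 0.670, 138),
    "LayoutLM": (0.790, 0.428, 343),
    "Doc2Graph": (0.823, 0.534, 6.2),
    "Doc2GraphFormer+GL": (0.862, 0.555, 3.62),
}


def compare(results, ref="Doc2GraphFormer+GL"):
    """F1 deltas of `ref` over each baseline, and baseline size / ref size."""
    ser, re_f1, params = results[ref]
    return {
        name: (ser - s, re_f1 - r, p / params)
        for name, (s, r, p) in results.items()
        if name != ref
    }


for name, (d_ser, d_re, ratio) in compare(RESULTS).items():
    print(f"vs {name:<9}: SER {d_ser:+.3f}  RE {d_re:+.3f}  params {ratio:.1f}x")
```

The SER deltas are all positive, the RE delta is negative only against BROS, and the baselines come out roughly 38x (BROS), 95x (LayoutLM) and 1.7x (Doc2Graph) larger.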
Publication
Mazumder, S., Biswas, S., Pal, A., Das, A., Pal, U., Lladós, J. Doc2GraphFormer: Bridging Structured Graph Learning with Transformer Attention for Efficient Document Understanding. ICDAR 2025.