NoTeS-Bank

Benchmarking Neural Transcription and Search for scientific notes

NoTeS-Bank is an evaluation benchmark for Neural Transcription and Search in note-based question answering. It targets complex, handwritten academic notes containing mathematical equations, diagrams, and scientific notation, the kind of content that breaks OCR-based document AI.

Problem

Existing Document VQA benchmarks focus on printed or structured handwritten text, limiting generalization to real-world note-taking with unstructured, multimodal content.

Benchmark

  • Evidence-Based VQA — retrieve localized answers with bounding-box evidence
  • Open-Domain VQA — classify the domain, then retrieve relevant documents and answers
  • Demands vision–language fusion, retrieval, and multimodal reasoning rather than OCR alone
  • Evaluated with ANLS*, IoU, NDCG@5, MRR, and Recall@K across state-of-the-art VLMs
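To make the localization and retrieval metrics listed above concrete, here is a minimal sketch of IoU, Recall@K, reciprocal rank (averaged over queries to give MRR), and NDCG@5 with binary relevance. The function names, the `(x_min, y_min, x_max, y_max)` box layout, and the binary-relevance assumption are illustrative choices, not the benchmark's official scoring code.

```python
import math
from typing import Sequence, Set, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap width and height, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result; MRR is the mean over queries."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """NDCG@k with binary relevance (gain 1 for relevant, 0 otherwise)."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc_id in enumerate(ranked[:k])
        if doc_id in relevant
    )
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` compares a predicted evidence box with a ground-truth box that overlaps it in a unit square, and `reciprocal_rank(["a", "b", "c"], {"b"})` scores a ranking whose first relevant page sits at position 2.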

Results

Evidence-Based VQA — even the strongest VLMs trail far behind the human baseline, exposing a large reasoning gap (ANLS* = answer accuracy):

Model                 ANLS*    Local Acc    Global Acc
Qwen-2.5-VL (open)    28.21    11.83        4.86
Gemini 2.5 Pro        24.49    0.20 (only one accuracy value given; column unclear)
GPT-4.5               21.65    13.40        7.19
GPT-4o                17.88    12.00        9.00
Human baseline        61.11    83.0         79.0
[Figure: bar chart of ANLS* (answer accuracy) by model: Qwen2.5-VL, Gemini 2.5 Pro, GPT-4.5, GPT-4o, and Human]
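The ANLS* scores reported here belong to the ANLS family of metrics, which are built on normalized Levenshtein similarity. As a rough illustration of what such a score measures, the sketch below implements plain string ANLS for a single question, using the conventional 0.5 threshold. It is a simplified stand-in, not the ANLS* variant and not the benchmark's scoring code; the function names and the lowercase/strip normalization are assumptions.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                   # deletion
                curr[j - 1] + 1,               # insertion
                prev[j - 1] + (ch_a != ch_b),  # substitution (free on match)
            ))
        prev = curr
    return prev[-1]


def anls_score(prediction: str, references: Sequence[str],
               threshold: float = 0.5) -> float:
    """Best similarity against any reference; 0 if too far from all of them."""
    pred = prediction.strip().lower()  # assumed normalization
    best = 0.0
    for ref in references:
        ref_norm = ref.strip().lower()
        longest = max(len(pred), len(ref_norm))
        # Normalized distance in [0, 1]; two empty strings count as identical.
        norm_dist = levenshtein(pred, ref_norm) / longest if longest else 0.0
        similarity = 1.0 - norm_dist if norm_dist < threshold else 0.0
        best = max(best, similarity)
    return best
```

A dataset-level score is the mean of `anls_score` over all questions, commonly reported on a 0 to 100 scale as in the table above.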

Publication

Pal, A., Biswas, S., Das, A. et al. NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding. 2025. (arXiv:2504.09249)