Swin-TESTR

Multi-lingual pre-training for domain-adaptive text spotting

Swin-TESTR investigates domain-adaptive scene text spotting: it uses multi-lingual datasets for pre-training to improve text spotting across languages, from synthetic to real data, and on document domains. Published at WACV 2024.

Problem

Existing approaches pre-train on natural-scene text but do not exploit the intermediate feature representations shared across domains.

Approach

  • A transformer baseline (Swin-TESTR) for both regular and arbitrary-shaped text spotting
  • Exploits intermediate representations across domains to improve transfer
  • Exhaustive evaluation demonstrating gains in accuracy and efficiency across multiple text-spotting benchmarks

Results

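Detection H-mean (%) on standard benchmarks, with end-to-end ICDAR-15 results under the strong lexicon in the last column. The score on low-resource Vietnamese (VinText) is reported here for Swin-TESTR only and is given after the table: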

Method       TotalText   CTW1500   ICDAR-15   IC15 E2E (S)
ABCNet v2    87.0        84.7      88.1       82.7
TESTR        86.90       86.3      90.0       85.2
Swin-TESTR   87.95       88.19     90.13      86.63

[Bar chart: Swin-TESTR detection H-mean (%). TotalText 87.95, CTW1500 88.19, ICDAR-15 90.13, VinText 73.2.]
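
The scores above are H-mean values, that is, the harmonic mean of precision and recall. As a minimal sketch, the snippet below defines that metric and computes the per-benchmark gap between two rows of the table. The names `h_mean`, `SCORES`, and `gains` are illustrative and do not come from the paper's code; the numbers are copied from the table.

```python
def h_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (also known as F-measure)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is detected
    return 2 * precision * recall / (precision + recall)


# Scores copied from the results table, in percent.
COLUMNS = ["TotalText", "CTW1500", "ICDAR-15", "IC15 E2E (S)"]
SCORES = {
    "ABCNet v2": [87.0, 84.7, 88.1, 82.7],
    "TESTR": [86.90, 86.3, 90.0, 85.2],
    "Swin-TESTR": [87.95, 88.19, 90.13, 86.63],
}


def gains(method: str, baseline: str) -> dict[str, float]:
    """Per-benchmark difference (method minus baseline), in percentage points."""
    return {
        col: round(m - b, 2)
        for col, m, b in zip(COLUMNS, SCORES[method], SCORES[baseline])
    }


if __name__ == "__main__":
    for col, delta in gains("Swin-TESTR", "TESTR").items():
        print(f"{col}: {delta:+.2f}")
```

With the table values, Swin-TESTR exceeds TESTR by 1.05, 1.89, 0.13, and 1.43 points on TotalText, CTW1500, ICDAR-15, and IC15 E2E (S) respectively.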

Publication

Das, A., Biswas, S., Banerjee, A., Lladós, J., Pal, U., Bhattacharya, S. Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance. WACV 2024. (arXiv:2310.00917)