DA-TextSpotter

Domain-agnostic scene text spotting in multi-domain noisy scenes

DA-TextSpotter tackles domain-agnostic scene text spotting: it trains a single model on multi-domain source data so that the model generalizes directly to unseen target domains instead of specializing in one scenario. Published at ICRA 2024.
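The following minimal Python sketch makes the multi-domain training setup concrete. It shows one way to assemble training batches from several source domains while keeping the target domain out entirely. The domain names, the file lists and the uniform-over-domains sampling rule are hypothetical placeholders chosen for illustration. The paper's actual data mixture and sampler may differ.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical source-domain pools: names and sizes are placeholders,
# not the datasets used by DA-TextSpotter.
SOURCE_DOMAINS: Dict[str, List[str]] = {
    "natural_scene": [f"scene_{i}.jpg" for i in range(100)],
    "document": [f"doc_{i}.jpg" for i in range(10)],
}


def domain_balanced_batch(
    pools: Dict[str, List[str]], batch_size: int, rng: random.Random
) -> List[Tuple[str, str]]:
    """Draw (domain, image) pairs: pick a domain uniformly, then an image
    uniformly within that domain.

    Uniform-over-domains sampling is an illustrative assumption, not the
    paper's confirmed scheme. It keeps a small domain from being drowned
    out by a large one.
    """
    domains = sorted(pools)
    batch: List[Tuple[str, str]] = []
    for _ in range(batch_size):
        domain = rng.choice(domains)
        batch.append((domain, rng.choice(pools[domain])))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible batch
    for domain, image in domain_balanced_batch(SOURCE_DOMAINS, 8, rng):
        print(domain, image)
```

Picking a domain first and then an image gives each source domain equal weight per batch, whatever its size. Drawing images uniformly from the merged pool would instead let the largest domain dominate. A target domain held out for evaluation never appears in the pools.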

Problem

State-of-the-art spotters are pretrained and fine-tuned on natural-scene datasets. As a result, they do not exploit feature interaction across other complex domains (e.g., underwater and document scenes).

Approach

  • An efficient, super-resolution-based end-to-end transformer with a Swin backbone and dual decoders for text localization and recognition
  • Introduces the Under-Water Text (UWT) validation benchmark for noisy underwater scenes
  • Matches or exceeds existing spotters on regular and arbitrary-shaped benchmarks in both accuracy and efficiency

Results

On the new Under-Water Text (UWT) benchmark, DA-TextSpotter more than doubles the end-to-end (no-lexicon) score of prior spotters, reaching 64.15 against 29.63 for the best prior method:

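Method            P (%)   R (%)   F (%)   E2E (None) (%)
TESTR             92.24   33.86   49.54   29.63
SwinTextSpotter   83.21   34.49   48.77   29.08
DA-TextSpotter    95.65   48.73   64.57   64.15

P, R, F: detection precision, recall and F-measure. E2E (None): end-to-end spotting score with no lexicon.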
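The P, R and F columns are internally consistent: F is the harmonic mean of precision and recall, F = 2PR / (P + R). The short Python check below recomputes F from the reported P and R values. It also computes the E2E (None) ratio behind the "more than doubles" claim. It uses only the numbers in the table.

```python
# Values copied from the UWT table: (P, R, F, E2E None), all in percent.
RESULTS = {
    "TESTR": (92.24, 33.86, 49.54, 29.63),
    "SwinTextSpotter": (83.21, 34.49, 48.77, 29.08),
    "DA-TextSpotter": (95.65, 48.73, 64.57, 64.15),
}


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1 / H-mean)."""
    return 2 * precision * recall / (precision + recall)


# Each reported F should match the harmonic mean of its P and R up to rounding.
for name, (p, r, f, _) in RESULTS.items():
    assert abs(f_measure(p, r) - f) < 0.01, name

# Compare DA-TextSpotter's E2E (None) score with the best prior spotter.
best_prior = max(v[3] for k, v in RESULTS.items() if k != "DA-TextSpotter")
ratio = RESULTS["DA-TextSpotter"][3] / best_prior
print(f"E2E (None) ratio vs. best prior spotter: {ratio:.2f}x")
```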
[Bar chart: UWT end-to-end (None) score (%) for TESTR, SwinTextSpotter and DA-TextSpotter, with the same values as the table.]
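Readers unfamiliar with the E2E (None) metric can use the following Python sketch to see the general shape of an end-to-end, no-lexicon evaluation. A prediction counts only if it overlaps an unmatched ground-truth word and also carries the correct transcription. No dictionary is used to correct it. The sketch simplifies the real protocol under several assumptions:

  • axis-aligned boxes instead of polygons
  • an IoU threshold of 0.5
  • case-insensitive exact string match
  • greedy one-to-one matching
  • no "don't care" regions

It is not the official UWT evaluation script. The names `Word`, `iou` and `e2e_none_f` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); axis-aligned (assumption)


@dataclass
class Word:
    box: Box
    text: str


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def e2e_none_f(preds: List[Word], gts: List[Word], iou_thr: float = 0.5) -> float:
    """End-to-end F-measure with no lexicon (simplified).

    A prediction is a true positive only if it overlaps a not-yet-matched
    ground-truth word with IoU >= iou_thr AND its raw transcription matches
    (case-insensitive). Matching is greedy and one-to-one.
    """
    matched = [False] * len(gts)
    tp = 0
    for p in preds:
        for j, g in enumerate(gts):
            if matched[j]:
                continue
            if iou(p.box, g.box) >= iou_thr and p.text.lower() == g.text.lower():
                matched[j] = True
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gts = [Word((0, 0, 10, 10), "CORAL"), Word((20, 0, 30, 10), "REEF")]
    preds = [Word((0, 0, 10, 10), "coral"), Word((20, 0, 30, 10), "REEP")]
    print(e2e_none_f(preds, gts))
```

In the usage example, both predictions sit exactly on their ground-truth boxes. Only "coral" matches its transcription, though, so P = R = F = 0.5. Detection P, R and F would all be 1.0 for the same predictions, which is why the E2E column can lag far behind the detection F-measure.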

Publication

Das, A., Biswas, S., Pal, U., Lladós, J. Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes. ICRA 2024. (arXiv:2310.00558)