TexTAR – Textual Attribute Recognition in Multi-domain and Multi-lingual Document Images

Accepted at ICDAR 2025 (ORAL)

Rohan Kumar · Jyothi Swaroopa Jinka · Ravi Kiran Sarvadevabhatla
International Institute of Information Technology Hyderabad

Paper · Hugging Face · GitHub

Abstract

Recognising textual attributes such as bold, italic, underline and strikeout is essential for understanding text semantics, structure and visual presentation. Existing methods struggle with computational efficiency or adaptability in noisy, multilingual settings. To address this, we introduce TexTAR, a multi-task, context-aware Transformer for Textual Attribute Recognition (TAR). Our data-selection pipeline enhances context awareness and our architecture employs a 2-D RoPE mechanism to incorporate spatial context for more accurate predictions. We also present MMTAD, a diverse multilingual dataset annotated with text attributes across real-world documents. TexTAR achieves state-of-the-art performance in extensive evaluations.
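To make the 2-D RoPE idea concrete, here is a minimal sketch of one common "axial" formulation, in which half of the channels are rotated by the x coordinate and the other half by the y coordinate. The function names, the channel split, the `base` value and the use of integer (x, y) positions are all assumptions made for illustration; this is not necessarily the formulation used in TexTAR.

```python
import math

def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by angles proportional to `pos`."""
    assert len(vec) % 2 == 0, "need an even number of channels"
    pairs = len(vec) // 2
    out = []
    for i in range(pairs):
        theta = pos * base ** (-i / pairs)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[2 * i], vec[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]  # plain 2-D rotation of (a, b)
    return out

def rope_2d(vec, x, y, base=10000.0):
    """Axial 2-D RoPE (assumed layout): first half encodes x, second half encodes y."""
    assert len(vec) % 4 == 0, "need a multiple of four channels"
    half = len(vec) // 2
    return rope_1d(vec[:half], x, base) + rope_1d(vec[half:], y, base)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical query/key vectors for two words at positions (3, 5) and (1, 2).
q = [0.3, -1.2, 0.7, 0.5, 1.1, -0.4, 0.2, 0.9]
k = [0.8, 0.1, -0.6, 1.3, -0.2, 0.5, 0.4, -1.0]
score = dot(rope_2d(q, 3, 5), rope_2d(k, 1, 2))
shifted = dot(rope_2d(q, 13, 25), rope_2d(k, 11, 22))  # same relative offset
```

Because each pair is only rotated, the attention score between a query and a key depends on their relative offset in x and y, so `score` and `shifted` agree up to floating-point error.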

Textual Attributes in the Dataset

Image	bold (T₁)	italic (T₁)	underline (T₂)	strikeout (T₂)
	✗	✗	✗	✗
	✗	✗	✓	✗
	✓	✗	✓	✗
	✗	✗	✓	✓
	✓	✓	✗	✗
	✗	✓	✓	✗
	✓	✗	✗	✓
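The grouping into T₁ (bold, italic) and T₂ (underline, strikeout) suggests treating each group as its own classification task. The sketch below encodes the four binary flags of a word as one class index per group. The class ordering, the class names and the `encode` helper are assumptions for illustration, not the dataset's actual label format.

```python
# Assumed class orderings; the real MMTAD label format may differ.
T1_CLASSES = ("normal", "bold", "italic", "bold & italic")
T2_CLASSES = ("normal", "underline", "strikeout", "underline & strikeout")

def encode(bold, italic, underline, strikeout):
    """Map four binary attribute flags to a (T1 index, T2 index) pair."""
    t1 = int(bold) + 2 * int(italic)
    t2 = int(underline) + 2 * int(strikeout)
    return t1, t2

# The seven example rows of the table above, as (bold, italic, underline, strikeout).
rows = [
    (False, False, False, False),
    (False, False, True, False),
    (True, False, True, False),
    (False, False, True, True),
    (True, True, False, False),
    (False, True, True, False),
    (True, False, False, True),
]
labels = [encode(*row) for row in rows]
names = [(T1_CLASSES[t1], T2_CLASSES[t2]) for t1, t2 in labels]
```

With this encoding, a word that is both bold and underlined gets one label in each group, which is what lets a multi-task model predict the two groups with separate heads.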

Chart – distribution of annotated attributes in our dataset.

Data-selection Pipeline

Model Architecture

Comparison with State-of-the-Art Approaches

Methods	normal	bold (T₁)	italic (T₁)	b & i (T₁)	underline (T₂)	strikeout (T₂)	u & s (T₂)	Average
Baselines
ResNet-18 [5]	0.97	0.75	0.88	0.77	0.68	0.97	0.99	0.86
ResNet-50 [5]	0.97	0.74	0.89	0.69	0.73	0.98	0.99	0.86
ResNeXt-101 [18]	0.97	0.77	0.91	0.74	0.78	0.99	0.99	0.88
EfficientNet-b4 [15]	0.97	0.75	0.90	0.62	0.75	0.98	0.99	0.85
Variants
DeepFont [17]	0.97	0.72	0.80	0.44	0.64	0.93	0.98	0.78
DropRegion† [21]	0.98	0.77	0.90	0.61	0.75	0.97	0.99	0.85
MTL [9]	0.97	0.75	0.89	0.64	0.70	0.97	0.99	0.84
TaCo† [10]	0.97	0.79	0.90	0.60	0.78	0.86	0.89	0.83
CONSENT† [12]	0.98	0.86	0.93	0.84	0.81	0.96	0.98	0.91
TexTAR (Ours)	0.99	0.92	0.95	0.90	0.87	0.99	0.99	0.94

All scores are F₁. u & s = underline & strikeout, b & i = bold & italic. † = our re-implementation.
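The page does not say how the Average column is computed. The numbers are consistent with an unweighted mean of the seven per-class F₁ scores, rounded to two decimals. The short check below tests that reading against every row of the table; the `macro_average` helper is a hypothetical name, and the interpretation is an inference from the numbers rather than a statement by the authors.

```python
# Per-class F1 scores in column order:
# normal, bold, italic, b & i, underline, strikeout, u & s -> reported Average
table = {
    "ResNet-18":       ([0.97, 0.75, 0.88, 0.77, 0.68, 0.97, 0.99], 0.86),
    "ResNet-50":       ([0.97, 0.74, 0.89, 0.69, 0.73, 0.98, 0.99], 0.86),
    "ResNeXt-101":     ([0.97, 0.77, 0.91, 0.74, 0.78, 0.99, 0.99], 0.88),
    "EfficientNet-b4": ([0.97, 0.75, 0.90, 0.62, 0.75, 0.98, 0.99], 0.85),
    "DeepFont":        ([0.97, 0.72, 0.80, 0.44, 0.64, 0.93, 0.98], 0.78),
    "DropRegion":      ([0.98, 0.77, 0.90, 0.61, 0.75, 0.97, 0.99], 0.85),
    "MTL":             ([0.97, 0.75, 0.89, 0.64, 0.70, 0.97, 0.99], 0.84),
    "TaCo":            ([0.97, 0.79, 0.90, 0.60, 0.78, 0.86, 0.89], 0.83),
    "CONSENT":         ([0.98, 0.86, 0.93, 0.84, 0.81, 0.96, 0.98], 0.91),
    "TexTAR":          ([0.99, 0.92, 0.95, 0.90, 0.87, 0.99, 0.99], 0.94),
}

def macro_average(per_class):
    """Unweighted mean of per-class scores."""
    return sum(per_class) / len(per_class)

# True for a method when the recomputed mean matches the reported Average.
consistent = {
    name: round(macro_average(per_class), 2) == reported
    for name, (per_class, reported) in table.items()
}
```

Every entry of `consistent` is true for the values listed, so rare classes such as b & i count as much towards the Average as the frequent normal class.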

Visualization of results for a subset of baselines and variants in comparison with TexTAR

Download the Dataset and Weights

Model weights and the MMTAD test set can be downloaded from the link. For access to the full dataset, please contact ravi.kiran@iiit.ac.in.

Citation

@inproceedings{Kumar2025TexTAR,
  title     = {TexTAR: Textual Attribute Recognition in Multi-domain and Multi-lingual Document Images},
  author    = {Rohan Kumar and Jyothi Swaroopa Jinka and Ravi Kiran Sarvadevabhatla},
  booktitle = {International Conference on Document Analysis and Recognition, ICDAR},
  year      = {2025}
}

Acknowledgements

International Institute of Information Technology Hyderabad, India.

Contact

rohan.kumar@students.iiit.ac.in
jinka.swaroopa@research.iiit.ac.in
ravi.kiran@iiit.ac.in