

Odia Lipi - ଓଡ଼ିଆ ଲିପି
Empowering Odia with OCR Technology

Problem Statement
Odia is a low-resource Indic language with limited support for Optical Character Recognition (OCR). Existing OCR systems struggle with both printed and handwritten Odia text, and none perform reliably on handwritten documents.
Most Odia literature, newspapers, and historical documents survive only as palm-leaf manuscripts, scanned images, or physical copies, which makes them difficult to digitize, search, and process.
Key challenges:
- Complex Odia ligatures and diacritics are hard for current OCR systems to recognize.
- Limited datasets for training modern OCR models.
- Lack of open-source, high-accuracy OCR tools.
- Handwritten Odia text is largely unsupported by existing solutions.
Without a robust OCR system, Odia text remains inaccessible for digital archiving, computational analysis, and language technology applications.
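To illustrate why ligatures are difficult, the minimal Python sketch below lists the Unicode code points behind a single Odia conjunct. The helper name `describe` is hypothetical and is not part of any OdiaLipi tool. It uses only the standard library. Note that the Unicode standard names the script "ORIYA".

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# The conjunct "kssa" renders as one glyph but is stored as three
# code points: consonant KA, the virama (halant), and consonant SSA.
conjunct = "\u0B15\u0B4D\u0B37"

for code, name in describe(conjunct):
    print(code, name)
```

An OCR model sees one connected shape on the page, yet it has to emit all three code points in the right order, which is part of what makes conjuncts harder than isolated letters.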
Solution
OdiaLipi (ଓଡ଼ିଆ ଲିପି) is a modern OCR system designed to accurately recognize both printed and handwritten Odia text. Our approach includes:
- Preparing high-quality Odia datasets from scanned books, manuscripts, newspapers, and palm-leaf documents.
- Advanced text recognition models to handle complex Odia ligatures and diacritics.
- Preprocessing pipelines to enhance scanned documents and improve OCR performance.
- Conversion to Unicode digital text, making Odia content editable, searchable, and machine-readable.
- Open-source datasets and tools to support research, development, and community use.
- Handwritten text recognition, addressing the gap in existing OCR solutions.
- Multimodal LLM development, combining text and image inputs to improve OCR accuracy and enable advanced Odia language understanding.
This solution aims to digitize Odia literature and historical documents, preserve cultural heritage, and empower language technology applications such as NLP, text-to-speech, and AI-based content analysis.
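One reason Unicode conversion matters for search: the same Odia word can be encoded in more than one way. The sketch below is an illustration only, not the project's actual pipeline, and the function name `normalize_ocr_text` is an assumption. It shows how Unicode normalization makes two encodings of ଓଡ଼ିଆ compare equal.

```python
import unicodedata


def normalize_ocr_text(text: str) -> str:
    """Normalize to NFC and collapse runs of whitespace (illustrative helper)."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


# Two encodings of the word "Odia": one uses the single code point
# U+0B5C (letter RRA), the other uses U+0B21 (letter DDA) + U+0B3C (nukta).
single_code_point = "\u0B13\u0B5C\u0B3F\u0B06"
letter_plus_nukta = "\u0B13\u0B21\u0B3C\u0B3F\u0B06"

print(single_code_point == letter_plus_nukta)  # False: raw strings differ
print(normalize_ocr_text(single_code_point) == normalize_ocr_text(letter_plus_nukta))  # True
```

Applying the same normalization to both OCR output and search queries keeps visually identical words from being treated as different strings.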
Scope
The project's scope includes:
- Digitization of Odia literature, newspapers, and manuscripts for education, research, and cultural preservation.
- Support for both printed and handwritten text, addressing gaps in existing OCR systems.
- High-quality dataset creation for training and benchmarking OCR and multimodal models.
- Development of multimodal LLMs, combining image and text inputs to improve OCR accuracy and enable advanced language understanding.
- Integration with language technology applications, including text-to-speech, translation, summarization, and NLP pipelines.
- Open-source resources for the research community, developers, and Odia language enthusiasts.
- Empowering digital access to Odia script, making it searchable, editable, and usable for AI-driven applications.
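Benchmarking OCR output is commonly done with character error rate (CER). The project does not state which metric it uses, so the following is a sketch of one common choice rather than OdiaLipi's evaluation method. The function names are hypothetical, and the distance is computed over Unicode code points, not over visual glyphs.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Reference text has 5 code points; the OCR output drops the nukta (U+0B3C).
reference = "\u0B13\u0B21\u0B3C\u0B3F\u0B06"
hypothesis = "\u0B13\u0B21\u0B3F\u0B06"
print(cer(reference, hypothesis))  # 1 edit / 5 code points = 0.2
```

Both strings should be normalized to the same Unicode form before scoring, otherwise encoding differences are counted as recognition errors.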
Contact
Feel free to reach out to us with any questions about the project or collaboration opportunities.






