• Odia Lipi - ଓଡ଼ିଆ ଲିପି

    Empowering Odia with OCR Technology

  • Problem Statement

    Odia is a low-resource Indic language with limited support for Optical Character Recognition (OCR). Existing OCR systems struggle with both printed and handwritten Odia text, and none perform reliably on handwritten documents.

    Most Odia literature, newspapers, and historical documents exist only as palm-leaf manuscripts, scanned images, or other physical formats, which makes them difficult to digitize, search, and process.

    Key challenges:

    • Complex Odia ligatures and diacritics are hard for current OCR systems to recognize.
    • Limited datasets for training modern OCR models.
    • Lack of open-source, high-accuracy OCR tools.
    • Handwritten Odia text is largely unsupported by existing solutions.

    Without a robust OCR system, Odia text remains inaccessible for digital archiving, computational analysis, and language technology applications.

    Solution

    OdiaLipi (ଓଡ଼ିଆ ଲିପି) is a modern OCR system designed to accurately recognize both printed and handwritten Odia text. Our approach includes:

    • Preparing high-quality Odia datasets from scanned books, manuscripts, newspapers, and palm-leaf documents.
    • Advanced text recognition models to handle complex Odia ligatures and diacritics.
    • Preprocessing pipelines to enhance scanned documents and improve OCR performance.
    • Conversion to Unicode digital text, making Odia content editable, searchable, and machine-readable.
    • Open-source datasets and tools to support research, development, and community use.
    • Handwritten text recognition, addressing the gap in existing OCR solutions.
    • Multimodal LLM development, combining text and image inputs to improve OCR accuracy and enable advanced Odia language understanding.
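    One reason conversion to Unicode text is harder than it looks: a single visual Odia ligature is stored as several code points, and some characters have more than one valid encoding. The minimal Python sketch below illustrates both points using only the standard library. It assumes OCR output arrives as a Python string, and it assumes NFC normalization as one reasonable way to make outputs comparable before storing or scoring them. The helper names `describe` and `normalize_odia` are hypothetical and are not part of OdiaLipi.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point in `text` as 'U+XXXX NAME' (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]


def normalize_odia(text: str) -> str:
    """Hypothetical helper: NFC-normalize OCR output so equivalent encodings compare equal."""
    return unicodedata.normalize("NFC", text)


# The conjunct କ୍ଷ renders as one glyph but is three code points:
# KA (U+0B15) + VIRAMA (U+0B4D) + SSA (U+0B37).
kssa = "\u0B15\u0B4D\u0B37"
for line in describe(kssa):
    print(line)

# ଡ଼ can be written as the precomposed U+0B5C or as DDA (U+0B21) + NUKTA (U+0B3C).
precomposed = "\u0B5C"
decomposed = "\u0B21\u0B3C"
print(precomposed == decomposed)                                  # False
print(normalize_odia(precomposed) == normalize_odia(decomposed))  # True
```

    Without a normalization step like this, an OCR result and its ground-truth transcription could look identical on screen yet fail a plain string comparison.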

    This solution aims to digitize Odia literature and historical documents, preserve cultural heritage, and empower language technology applications such as NLP, text-to-speech, and AI-based content analysis.

    Scope

    The project's scope includes:

    • Digitization of Odia literature, newspapers, and manuscripts for education, research, and cultural preservation.
    • Support for both printed and handwritten text, addressing gaps in existing OCR systems.
    • High-quality dataset creation for training and benchmarking OCR and multimodal models.
    • Development of multimodal LLMs, combining image and text inputs to improve OCR accuracy and enable advanced language understanding.
    • Integration with language technology applications, including text-to-speech, translation, summarization, and NLP pipelines.
    • Open-source resources for the research community, developers, and Odia language enthusiasts.
    • Broader digital access to Odia-script text, making it searchable, editable, and usable in AI-driven applications.
  • Odia OCR Annotation Platform

    Developed by OdiaGenAI

  • Dataset and Model

    OCR dataset and model developed collaboratively by OdiaGenAI

  • Team

    • Ganeshan K, Researcher
    • Sk Shahid, Researcher

  • Contact

    Feel free to reach out to us with any questions about the project or collaboration opportunities.
