• ବାର୍ତ୍ତାଳାପ (Bartālāpa)

    Building Odia Speech Technology for the People

  • Problem Statement

    Odia remains an underrepresented language in conversational AI, with existing speech systems struggling to handle strong dialectal variations, regional accents, and noisy real-world speech data. Most available Odia speech datasets are limited in scale, biased toward standard forms, and recorded in controlled environments, leading to poor performance in natural conversations. As a result, Odia speakers experience inaccurate recognition, unnatural responses, and limited access to voice-based AI technologies.

    Solution

    ବାର୍ତ୍ତାଳାପ (Bartālāpa) is a community-driven initiative to collect, curate, and process Odia speech data for developing advanced speech technologies.

    The project focuses on building robust models for speech recognition, text-to-speech, and conversational AI, making Odia language technologies accessible to the common people while preserving linguistic and cultural heritage.

    Scope

    The scope of ବାର୍ତ୍ତାଳାପ (Bartālāpa) centers on Odia speech dataset preparation and ASR research, with particular emphasis on dialectal variation and noisy real-world speech.

    The project includes:

    • Collection and curation of diverse Odia speech data covering multiple regional dialects, accents, speaking styles, and background noise conditions
    • Annotation and quality control of speech transcripts with rich metadata (dialect, region, noise type, recording setup)
    • Design and benchmarking of Odia ASR systems, including dialect-aware and noise-robust models, to study performance gaps and mitigation strategies
    • Dataset standardization and documentation to support reproducibility and broader research use
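
    As an illustration of how the metadata and benchmarking items above fit together, the following is a minimal sketch, not the project's actual schema or evaluation code. The `Utterance` fields mirror the metadata listed above, but their names, the `wer_by_dialect` helper, and the placeholder dialect labels are all assumptions made for this example. It computes word error rate (WER) per dialect so that performance gaps between dialects become visible.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    """One transcribed clip plus its metadata (hypothetical schema)."""
    audio_path: str
    transcript: str        # reference transcript
    dialect: str
    region: str
    noise_type: str
    recording_setup: str


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_dialect(utterances, hypotheses):
    """Return {dialect: WER}, pooling errors and reference words per dialect."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for utt, hyp in zip(utterances, hypotheses):
        ref_tokens = utt.transcript.split()
        errors[utt.dialect] += edit_distance(ref_tokens, hyp.split())
        words[utt.dialect] += len(ref_tokens)
    return {d: errors[d] / words[d] for d in words if words[d] > 0}


# Placeholder data: labels and transcripts are illustrative only.
utterances = [
    Utterance("a.wav", "one two three four", "dialect_a", "region_1", "clean", "phone"),
    Utterance("b.wav", "one two three four", "dialect_b", "region_2", "street", "phone"),
]
hypotheses = ["one two three four", "one three four five"]
result = wer_by_dialect(utterances, hypotheses)
```

    Pooling errors and reference words per dialect, rather than averaging per-utterance WER, keeps short utterances from dominating the score. The same grouping could be applied to `noise_type` or `region`.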

    As one component of the project, selected datasets and evaluation protocols will be adapted for participation in international shared tasks (e.g., IWSLT-style low-resource or dialectal ASR tracks), enabling external benchmarking and community engagement.

  • Ongoing Speech Data Collection & Annotation

    We are collecting and annotating Odia speech on an ongoing basis, aiming for high-quality transcribed speech for use in a shared-task setting.

    To volunteer, contact Anshuman (anshumanmishra274@gmail.com) for annotation guidelines and process details.

  • Team

    Researchers and Volunteers Contributing to the Project

    Dr. Atul Kr. Ojha

    Researcher, Insight Research Ireland Centre for Data Analytics, DSI, University of Galway

    Anshuman Mishra

    Volunteer

    Sushanta Mishra

    Volunteer

    Dr. John Philip McCrae

    Associate Professor, Insight Research Ireland Centre for Data Analytics, DSI, University of Galway

    Srinibash Samal

    Volunteer

    Bhakti Kanungo

    Volunteer

  • OdiaGenAI Developed Speech Processing System

    ASR

    Olive Odia ASR

    Whisper is an open-source multilingual speech recognition and translation model from OpenAI. It supports 98+ languages, approaches human-level accuracy on English ASR, and can be used with LoRA-based optimization and accelerated inference via CTranslate2 and GGML.

    OdiaGenAI Speech Recognition

    OdiaGenAI Speech Recognition converts spoken audio into written text.

  • Contact

    Feel free to reach out to us with any questions about the project or collaboration opportunities.