Generative Chatbot for Odia Language: ChatGPT Vs. OdiaGenAI 

Authors: Aisha Asif and Parul Agarwal

(Note: This blog post is based on the paper titled "Generative Chatbot Adaptation for Odia Language: A Critical Evaluation", presented at the IEEE International Conference on Circuits, Power, and Intelligent Systems (CCPIS) 2023, organized by Silicon Institute of Technology, Bhubaneswar, India.)

Overview

Large Language Models (LLMs) have attracted considerable interest in Natural Language Processing and AI because they can generate text and hold conversations that closely resemble human behavior. However, they are most prevalent in English, which limits their usefulness for non-English speakers. In India, where only 10% of the population is proficient in English, LLMs adapted to regional languages are crucial. This project focuses on LLMs for Odia, a language spoken by 50 million Indians. We aim to improve conversational output in Odia by evaluating instruction-following models such as ChatGPT and Olive, and our study tries to identify their strengths, shortcomings, and areas for improvement. While evaluating multilingual LLMs for Indic languages, we found the following models that support them:

ChatGPT is a chatbot that produces text that sounds as if a person wrote it. It accepts natural-language input and generates accurate replies in sequence over the course of a conversation.
BLOOM is an open multilingual language model trained on a corpus drawn from a large number of sources. It performs competitively on a range of benchmarks, with better results after multitask-prompted fine-tuning.
BharatGPT is a chatbot designed for Indian users. Its design, processing power, and low latency are intended to let it understand and respond to users promptly and accurately.

Evaluation

The following generation parameters are used in the evaluation:

Temperature: The temperature parameter controls how random the generated text is. Lower temperatures (for example, below 1) produce more focused and predictable text, whereas higher temperatures (above 1) produce more varied and creative text.
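
As a rough illustration of the effect of temperature, the sketch below divides a set of token scores by the temperature before converting them to probabilities. The scores and the function name are hypothetical and are not taken from either model in the study.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.55)  # lower temperature
flat = softmax_with_temperature(logits, 1.5)    # higher temperature
```

With the lower temperature, more probability is concentrated on the top-scoring token; with the higher one, the probabilities are closer together, so less likely tokens are sampled more often.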

Top K: Top-K sampling controls text variation by restricting each generation step to the K most probable tokens in the model's probability distribution. The next token is then sampled from that reduced set.
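
A minimal sketch of top-K filtering, assuming the model's output is available as a plain list of probabilities. The function name and the example distribution are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, and renormalize."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0
            for i in range(len(probs))]

# Hypothetical distribution over four tokens.
probs = [0.5, 0.3, 0.15, 0.05]
filtered = top_k_filter(probs, 2)  # only the two most likely tokens survive
```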

Top P: Top-p sampling, also known as nucleus sampling, restricts each generation step to the smallest set of most probable tokens whose cumulative probability reaches the threshold p.
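
Top-p filtering can be sketched in the same way. The code below is an illustration under the assumption that probabilities arrive as a plain list; the names and numbers are hypothetical.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in ranked:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break  # the nucleus is complete
    kept_mass = sum(probs[i] for i in keep)
    keep_set = set(keep)
    return [probs[i] / kept_mass if i in keep_set else 0.0
            for i in range(len(probs))]

# Hypothetical distribution over four tokens.
probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(probs, 0.75)
```

Unlike top-K, the number of surviving tokens is not fixed: a confident distribution keeps only a few tokens, while a flat one keeps many.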

Beams: Beam search tracks several candidate sequences (beams) in parallel and keeps only the highest-scoring ones at each step. By considering alternatives instead of committing to a single token at every step, it tends to improve the quality of the generated text.
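
The following is a toy sketch of beam search, not the decoder used by either model. The lookup table standing in for a language model, and all names in the code, are hypothetical.

```python
import math

def beam_search(next_token_probs, start, num_beams, max_steps):
    """Keep the num_beams highest-scoring sequences at every step."""
    beams = [([start], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_steps):
        candidates = []
        for seq, score in beams:
            options = next_token_probs(seq)
            if not options:  # nothing can follow: carry the sequence forward
                candidates.append((seq, score))
                continue
            for token, prob in options.items():
                candidates.append((seq + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams

# Hypothetical toy "model": next-token probabilities keyed by the last token.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
}

def toy_model(seq):
    return TABLE.get(seq[-1], {})

greedy = beam_search(toy_model, "<s>", num_beams=1, max_steps=2)
wide = beam_search(toy_model, "<s>", num_beams=2, max_steps=2)
```

With a single beam the search commits to "a" and ends with a sequence of probability 0.30, whereas two beams keep "b" alive long enough to find a sequence of probability 0.36.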

Max Tokens: The max tokens parameter caps the length of the generated text. Setting it to a suitable value restricts output length and helps prevent the model from producing excessive or nonsensical text.
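
The role of the token cap can be sketched with a simple generation loop. Both stand-in models and the end-of-sequence marker below are hypothetical.

```python
def generate(next_token, prompt, max_new_tokens, eos="<eos>"):
    """Generate until the model emits eos or the token cap is reached."""
    generated = []
    for _ in range(max_new_tokens):
        token = next_token(prompt + generated)
        if token == eos:
            break
        generated.append(token)
    return generated

def looping_model(tokens):
    # Hypothetical model that never stops and repeats itself.
    return "again"

def short_model(tokens):
    # Hypothetical model that ends its answer once three tokens exist.
    return "<eos>" if len(tokens) >= 3 else "word"

capped = generate(looping_model, ["start"], max_new_tokens=5)
natural = generate(short_model, ["start"], max_new_tokens=5)
```

The cap only matters for the first model: it cuts off output that would otherwise repeat indefinitely, while the second model stops on its own before reaching the limit.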

Figure 1 shows how the answers generated by OdiaGenAI varied with the parameter settings:

Figure 1: Examples of answers produced by the OdiaGenAI model under different parameter settings

The categories we covered, the questions we asked, and the answers provided by the OdiaGenAI model and ChatGPT are shown in Fig 2 and Fig 3, respectively.

Figure 2: Examples of answers from ChatGPT to questions asked in the Odia language

Comparison and Analysis

To conduct the comparison, we selected 20 categories and posed 5 questions in each category to both models, for a total of 100 questions per model. The evaluation was based on the answering capabilities of the models.

The comparison is based on the following answer labels:

Correct Answers: A response that is both accurate and meaningful.

Incorrect Answers: A response that is entirely incorrect and contains no accurate information.

Partially Correct Answers: A response that offers some accurate points but is not comprehensive.

Repeated Answers: A response that contains duplicated words or phrases.

Incomplete Answers: A response that fails to provide sufficient information and does not address all the points raised in the question.

AI Generated: A response in which the model cited its limitations as an artificial intelligence and did not answer the question.

The complete counts are given in Table 1.

Table 1: Number of Correct, Incorrect, Partially Correct, Repeated, Incomplete, and AI Generated answers given by Olive and ChatGPT-3.5.
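
Counts of this kind can be produced with a simple tally over the labelled answers. The sketch below assumes each answer receives exactly one label, which the post does not state; the example annotations are placeholders and not the study's results.

```python
from collections import Counter

LABELS = ["Correct", "Incorrect", "Partially Correct",
          "Repeated", "Incomplete", "AI Generated"]

def tally(labels):
    """Return {label: (count, percentage)} for a list of per-answer labels."""
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not labels:
        raise ValueError("no labelled answers to tally")
    counts = Counter(labels)
    total = len(labels)
    return {label: (counts[label], 100.0 * counts[label] / total)
            for label in LABELS}

# Placeholder annotations for illustration only, NOT data from the study.
example = ["Correct", "Correct", "Incorrect", "Repeated"]
summary = tally(example)
```

Under that assumption, the 20 categories with 5 questions each would yield 100 labels per model, so each count would also read directly as a percentage.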

The analysis is shown in Fig 4 and Fig 5:

Figure 4: Analysis of answer generation by the OdiaGenAI model with temperature = 0.55, top p = 0.75, top k = 34, beams = 4, and max tokens = 413.

Figure 5: Analysis of answer generation by ChatGPT
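
For readers who want to try the settings from the Figure 4 caption, they can be collected into a single configuration. The key names below follow the keyword arguments of the Hugging Face `generate` method. Using that library is an assumption, as are the mapping of "max tokens" to `max_new_tokens` and the `do_sample` flag; the post does not say how the model was served.

```python
# Values copied from the Figure 4 caption; key names are an assumption
# (they follow Hugging Face `generate` keyword arguments).
decoding_config = {
    "do_sample": True,      # assumption: sampling enabled so temperature/top-p/top-k apply
    "temperature": 0.55,
    "top_p": 0.75,
    "top_k": 34,
    "num_beams": 4,
    "max_new_tokens": 413,  # assumption: "max tokens" counts generated tokens only
}

def validate(config):
    """Basic sanity checks on decoding settings; returns the config unchanged."""
    if config["temperature"] <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < config["top_p"] <= 1:
        raise ValueError("top_p must be in (0, 1]")
    for key in ("top_k", "num_beams", "max_new_tokens"):
        if not isinstance(config[key], int) or config[key] < 1:
            raise ValueError(f"{key} must be a positive integer")
    return config

checked = validate(decoding_config)
```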

Result and Discussion

These results contribute to the understanding of the strengths and weaknesses of the models, aiding further advancements and improvements in natural language processing for low-resource languages like Odia.

Olive exhibited lower accuracy and a higher rate of incorrect answers, along with tendencies toward repetition and incomplete responses, but it offers a text-to-speech feature, which is unique. ChatGPT, on the other hand, demonstrated a relatively higher accuracy rate, although it relied on continuous prompts to generate correct answers.

These findings highlight the need for further research and development to enhance the performance of both models. Improvements in Olive are necessary to increase its accuracy, reduce incorrect and incomplete responses, and minimize repetition. In the case of ChatGPT, efforts should focus on enabling the autonomous generation of accurate answers without the need for continuous prompts.

Conclusion

The study contributes to advancing generative chatbots in Odia, fostering inclusivity in AI-powered conversational systems, and promoting enhanced language support for non-English speakers. Ongoing initiatives like Olive continue to explore these advancements for a more diverse AI ecosystem.

Team members