Hindi News Sentiment Classifier: Streamlit App

Saurabhk
Feb 11, 2022


Hindi News Sentiment Classifier App using a fine-tuned RoBERTa classifier

Pic Credit: Mr. Saraf (Kotigao, Goa)

Demonstrating an ML model's capability in a Jupyter notebook or via an API often fails to convey what the model can do and how it might be used.

To address this, Python offers several frameworks, such as Gradio, Streamlit, and Plotly, that let you quickly build a web app around a model. Streamlit is often the first choice these days: its simplicity, open-source plugins and components, and ease of hosting have made it a preferred ecosystem for quick prototyping.

So I decided to get my hands dirty with the Streamlit framework and build what I call the News SentiMeter app.

News SentiMeter:

The News SentiMeter is an intelligent system that classifies हिंदी (Hindi) news articles in the current affairs domain broadly into Positive, Negative, and Neutral labels. Classification is performed primarily at the sentence level, and the per-sentence prediction labels are combined to give the final score.
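As a rough illustration of how sentence-level labels might be combined, here is a minimal sketch. The post does not spell out the exact scoring rule, so the percentage-share approach and the function name `combine_sentence_labels` are assumptions made for this example only.

```python
from collections import Counter

# The three labels used by the classifier, as described in the post.
LABELS = ("positive", "negative", "neutral")


def combine_sentence_labels(labels):
    """Return the percentage share of each label across all sentences.

    Assumption: the final score is the share of sentences per label;
    the app's real combination rule may differ.
    """
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No sentences were classified, so every share is zero.
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}


# Hypothetical per-sentence predictions for one article.
predicted = ["positive", "negative", "negative", "neutral"]
scores = combine_sentence_labels(predicted)
print(scores)
```

Shares like these can be passed directly to a pie chart, one slice per label.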

This post focuses on the Streamlit app, not on the fine-tuning details.

Fig 1: News URL as Input

The News SentiMeter app expects a valid URL of the news article you wish to analyze, as shown in Fig. 1.

Fig 2: Output of Article

In Fig. 2, the pie chart shows the positive, negative, and neutral sentiment scores for the overall news article, along with a word cloud of the important terms from the article. Sentences are highlighted according to a color scheme that denotes the model's predictions, which is handy for probing those predictions.

Model Building:

The Simple Transformers library is used to fine-tune a pretrained RoBERTa model for the Hindi language. The annotated data is distributed across labels as follows: positive: 3040, negative: 3104, neutral: 2591 sentences. The model gave its best result after training for 3 epochs (training ended through early stopping).

The model gave a modest 69% on the held-out dataset for this domain-specific task.

Streamlit App Building Blocks:

Important components used to build the app:

> Streamlit Components :-

st-annotated-text: a Streamlit component used to highlight sentences in the color of their predicted label.

wordcloud package: displays the most frequent words from sentences predicted as positive or negative.

matplotlib package: displays the pie chart.

@st.cache(persist=True): Streamlit's default @st.cache decorator, used to persist the loaded model so it is not reloaded every time a prediction is made.
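To make the word cloud step more concrete, the sketch below counts words only from sentences predicted as positive or negative. The function name `term_frequencies`, the whitespace tokenization, and the sample sentences are assumptions for illustration; the app's actual preprocessing is not described in the post.

```python
from collections import Counter


def term_frequencies(sentences, labels, wanted=("positive", "negative")):
    """Count words across sentences whose predicted label is in `wanted`.

    Assumption: words are split on whitespace; a real pipeline would
    likely also remove stop words and punctuation.
    """
    counts = Counter()
    for sentence, label in zip(sentences, labels):
        if label in wanted:
            counts.update(sentence.split())
    return counts


# Hypothetical sentences with their predicted labels.
sentences = ["अच्छी खबर", "बुरी खबर", "आज मौसम"]
labels = ["positive", "negative", "neutral"]
freqs = term_frequencies(sentences, labels)
print(freqs.most_common(3))
```

A mapping like `freqs` is the kind of input that the wordcloud package's `WordCloud.generate_from_frequencies` accepts. For Hindi text, a font that supports Devanagari would also need to be supplied through its `font_path` argument.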

> Webscraping :-

To extract data (news headline and article) directly from the provided URL, web scrapers are implemented for several news agencies, such as IndiaTV, NDTV, TV9, and AajTak.
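With one scraper per news site, the app needs some way to decide which scraper handles a given URL. The sketch below shows one possible approach based on the URL's hostname. The domain names in the registry and the function name `pick_scraper` are assumptions for illustration and are not taken from the app's code.

```python
from urllib.parse import urlparse

# Hypothetical registry: site domain -> name of the scraper that handles it.
SCRAPERS = {
    "indiatv.in": "indiatv",
    "ndtv.in": "ndtv",
    "tv9hindi.com": "tv9",
    "aajtak.in": "aajtak",
}


def pick_scraper(url):
    """Return the scraper name for a URL, or None if the site is unsupported."""
    host = (urlparse(url).hostname or "").lower()
    for domain, name in SCRAPERS.items():
        # Match the bare domain as well as subdomains such as "www.".
        if host == domain or host.endswith("." + domain):
            return name
    return None


print(pick_scraper("https://www.aajtak.in/india/story/example"))
print(pick_scraper("https://example.com/article"))
```

Returning `None` for unknown sites lets the app show a clear "unsupported source" message instead of failing during scraping.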

> Inferencing :-

The best-performing model, along with its preprocessing steps, is used as part of the Streamlit app to make sentence-level predictions.
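Sentence-level prediction requires splitting the article into sentences first. The post does not describe how this is done, so the following is only a sketch under the assumption that sentences end at the danda (।), a question mark, or an exclamation mark. The function name `split_hindi_sentences` is hypothetical.

```python
import re

# Assumption: a sentence runs up to and including a danda, "?" or "!".
_SENTENCE = re.compile(r"[^।?!]+[।?!]?")


def split_hindi_sentences(text):
    """Split Hindi text into sentences, keeping the end punctuation."""
    pieces = (match.strip() for match in _SENTENCE.findall(text))
    # Drop fragments that were only whitespace.
    return [piece for piece in pieces if piece]


article = "सरकार ने नई योजना शुरू की। क्या यह सफल होगी? लोग खुश हैं।"
for sentence in split_hindi_sentences(article):
    print(sentence)
```

Each sentence produced this way could then be passed to the classifier, with one label predicted per sentence.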

In this manner, we can turn messy Jupyter code into clean scripts and use Streamlit components to quickly build a pretty UI.

Also check out how to perform Topic Modelling on Hindi Text: NLP

Streamlit lets you turn data scripts into shareable web apps, with interactive charts and visualizations that enhance the app's presentation.
