Harnessing the Power of Zero-Shot Learning to Tag News Headlines

Image by Hrishikesh Kanekar

Zero-Shot Learning in text classification is an effective way to predict class labels without any task-specific training data. It can be used for tasks such as sentiment analysis, document classification and emotion analysis. The Zero-Shot approach relies on transfer learning to achieve this.

By default, the Zero-Shot classification pipeline loads the pre-trained bart-large-mnli model, which serves as the knowledge base. BART was pre-trained on a huge amount of text data and then fine-tuned on a Natural Language Inference (NLI) task, in which sentence pairs are classified as contradiction, neutral or entailment.

The Multi-Genre Natural Language Inference (MultiNLI) corpus (Williams et al., 2018) contains 433k sentence pairs annotated with textual entailment information.
NLI is a supervised task that determines whether a premise/hypothesis pair is an entailment, a contradiction or neutral.

Supervised Natural Language Inference (NLI) Task

In the Zero-Shot classification task, the input to the model is constructed as a premise/hypothesis pair: the premise is the text itself, and a hypothesis is generated for each candidate label from a template such as “This example is {}.”

Headline: Two militants killed as Army foils infiltration in J&K
Candidate Labels: [‘defence’, ‘politics’, ‘sports’]

The input to the model is constructed as follows:

Premise : “Two militants killed as Army foils infiltration in J&K”
Hypothesis 1: This example is about defence.

Premise : “Two militants killed as Army foils infiltration in J&K”
Hypothesis 2: This example is about politics.

Premise : “Two militants killed as Army foils infiltration in J&K”
Hypothesis 3: This example is about sports.

Clearly, the premise entails Hypothesis 1.
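The pair construction above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's internals: the helper name build_nli_pairs is hypothetical, and the template “This example is about {}.” is an assumption chosen to match the hypotheses shown above (the HuggingFace pipeline's default template is “This example is {}.”).

```python
def build_nli_pairs(premise, candidate_labels, template="This example is about {}."):
    """Pair the premise with one hypothesis per candidate label (hypothetical helper)."""
    # Each candidate label is slotted into the hypothesis template;
    # the premise is the unchanged input text.
    return [(premise, template.format(label)) for label in candidate_labels]


headline = "Two militants killed as Army foils infiltration in J&K"
for premise, hypothesis in build_nli_pairs(headline, ["defence", "politics", "sports"]):
    print(f"Premise: {premise!r} | Hypothesis: {hypothesis!r}")
```

Three candidate labels produce three pairs, so the NLI model is run once per label for every headline.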

For each premise/hypothesis pair, the model outputs logits over the three NLI categories (entailment, neutral and contradiction). Only the entailment logits are kept, and a softmax is applied to them across all candidate labels so that the resulting scores sum to 1. The label with the highest probability is the model's prediction.
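The scoring step can be sketched as follows. This is a minimal sketch, assuming each pair's logits come in the order [contradiction, neutral, entailment] (check the model's id2label mapping before relying on this); the function name zero_shot_scores and the logit values are hypothetical, invented purely for illustration.

```python
import math


def zero_shot_scores(nli_logits, entailment_index=2):
    """Single-label zero-shot scoring from per-label NLI logits (hypothetical helper).

    nli_logits maps each candidate label to the three NLI logits of its
    premise/hypothesis pair, assumed to be ordered [contradiction, neutral, entailment].
    """
    # Keep only the entailment logit of each pair; contradiction and neutral are dropped.
    entail = {label: logits[entailment_index] for label, logits in nli_logits.items()}
    # Softmax across the candidate labels (subtracting the max for numerical stability).
    top = max(entail.values())
    exps = {label: math.exp(v - top) for label, v in entail.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    # Highest probability first; the first entry is the predicted label.
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical logits, made up for illustration only.
logits = {
    "defence": [-2.0, 0.5, 3.0],
    "politics": [0.5, 1.0, 0.2],
    "sports": [3.0, 0.1, -2.5],
}
ranking = zero_shot_scores(logits)
print(ranking[0][0])
```

With these made-up logits the defence pair has the largest entailment logit, so defence is ranked first, and the three probabilities sum to 1.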

For this example, the model assigns the highest entailment probability to the defence label.

Experiment on News Headlines

Next, we use the HuggingFace Zero-Shot pipeline to categorize news headlines into one of three candidate labels (‘politics’, ‘defence’, ‘sports’). For this, I collected 1037 news headlines by web-scraping a news website’s /Defence section.
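A minimal sketch of the tagging loop, assuming the transformers library is installed and the model can be downloaded. tag_headlines and load_hf_classifier are hypothetical helper names; the only library calls are pipeline("zero-shot-classification") and calling the returned pipeline with candidate_labels, which for a single text returns a dict whose "labels" list is sorted by descending score. The scraping step is not reproduced.

```python
def tag_headlines(headlines, classify, candidate_labels):
    """Return the top-scoring label for each headline (hypothetical helper).

    `classify` is any callable with the calling convention of a HuggingFace
    zero-shot pipeline: classify(text, candidate_labels=[...]) returning a dict
    whose "labels" list is sorted by descending score.
    """
    predictions = []
    for text in headlines:
        result = classify(text, candidate_labels=candidate_labels)
        predictions.append(result["labels"][0])  # best-scoring label comes first
    return predictions


def load_hf_classifier():
    """Load the default zero-shot pipeline (needs `transformers` and a model download)."""
    # Imported lazily so tag_headlines itself has no third-party dependency.
    from transformers import pipeline

    return pipeline("zero-shot-classification")
```

For example, tag_headlines(headlines, load_hf_classifier(), ["politics", "defence", "sports"]), where headlines is the list of scraped headlines. Passing the classifier in as an argument keeps the loop testable with a stub; looping one headline at a time keeps the sketch simple, although the pipeline also accepts a list of texts.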

Because every collected headline comes from the Defence section, each one carries the defence tag and serves as labelled data, which lets us evaluate the model's predictions.

Evaluation & Interpretation

After inference, the predictions comprise 763 defence, 268 politics and 6 sports headlines. Below, we look at some of the outputs and try to interpret the model's predictions.
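Because every headline's true label is defence, accuracy is simply the share of predictions equal to defence. The sketch below, with hypothetical helper names, computes that share and collects the misclassified headlines for the kind of manual review that follows. It rebuilds only the reported counts, not the headlines themselves.

```python
from collections import Counter


def evaluate_against_gold(predictions, gold_label="defence"):
    """Count predicted labels and compute accuracy when every item shares one gold label."""
    counts = Counter(predictions)
    accuracy = counts[gold_label] / len(predictions)
    return counts, accuracy


def misclassified(headlines, predictions, gold_label="defence"):
    """Return (headline, prediction) pairs whose prediction differs from the gold label."""
    return [(h, p) for h, p in zip(headlines, predictions) if p != gold_label]


# Rebuild the reported prediction counts (the headlines themselves are not reproduced).
reported = ["defence"] * 763 + ["politics"] * 268 + ["sports"] * 6
counts, accuracy = evaluate_against_gold(reported)
print(counts, f"accuracy = {accuracy:.2%}")
```

With the counts above, accuracy is 763 / 1037, about 73.6%.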

Sample Output of the Experiment

· Defence:- Headlines correctly classified under the Defence category account for 763 of the 1037 headlines, about 73.6%.

· Politics:- Headlines predicted as Politics often overlap with the vocabulary, context and reporting style found in Defence headlines. Words such as neta, political association, policies, reforms and decision, along with mentions of killings, attacks, ceasefires etc., are common to both Defence and Politics. The line between these two classes is indeed very thin; even so, the model's predictions are sensible considering that it received no task-specific training at all.

· Sports:- The defence headlines misclassified as Sports have a reporting style similar to sports reporting, with country names (e.g. India, Japan) and words such as participation, exercise, represent, long range drones, test, meet and clash. Given the limited context of a headline, this vocabulary confuses the model and pushes it towards the SPORTS tag.

As we saw, Zero-Shot classification leverages the latent knowledge learned from unsupervised pre-training objectives such as language modelling (LM) and from supervised tasks such as NLI. As a result, it gives remarkably stable and useful output without any need for expensive and time-consuming annotation.

Use cases of Zero-Shot Vs Few-Shot Vs Fine-Tuning:

A Zero-Shot classification pipeline should be used to quickly bootstrap an idea, especially when no annotated data is available.

Few-Shot classification is an extension of Zero-Shot that can be used when a small amount of annotated data is available, to further improve the model's classification ability.

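Fine-Tuning a classifier should be considered if you have a good set of annotated data: it gives better performance at a lower computational cost, because a fine-tuned classifier needs one forward pass per text, whereas the NLI-based Zero-Shot pipeline needs one forward pass per candidate label.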

Additionally, the Zero-Shot pipeline supports a multi-label setup, in which several labels can apply to the same text, and multilingual Zero-Shot models are also available.
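In the multi-label setup, each label is scored independently instead of competing in one softmax. As we understand it, the HuggingFace pipeline (with multi_label=True) takes, for every label, a softmax over that pair's contradiction and entailment logits only and keeps the entailment probability. The sketch below follows that idea; the function name, the [contradiction, neutral, entailment] index order and the logit values are assumptions for illustration.

```python
import math


def multi_label_scores(nli_logits, contradiction_index=0, entailment_index=2):
    """Score each candidate label independently (hypothetical helper).

    For every label, a two-way softmax over that pair's [contradiction, entailment]
    logits is taken (neutral is dropped), so scores need not sum to 1 across labels.
    """
    scores = {}
    for label, logits in nli_logits.items():
        c, e = logits[contradiction_index], logits[entailment_index]
        # exp(e) / (exp(c) + exp(e)), written equivalently as a logistic of (e - c).
        scores[label] = 1.0 / (1.0 + math.exp(c - e))
    return scores


# Hypothetical logits for a headline that is about both defence and politics.
logits = {
    "defence": [-2.0, 0.5, 3.0],
    "politics": [-1.0, 0.5, 1.5],
    "sports": [3.0, 0.1, -2.5],
}
print(multi_label_scores(logits))
```

Here both defence and politics receive high scores, and the scores sum to more than 1, which the single-label softmax would not allow.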

Library offering the Zero-Shot pipeline: HuggingFace Transformers




