Zero Shot Text Classification Via Knowledge Graph Embedding For Social Media Data
Abstract
This project applies Natural
Language Processing (NLP) techniques to
classify tweets related to COVID-19. Using
Zero-Shot Classification, Sentence-BERT (SBERT),
and RoBERTa, the system predicts
candidate labels for the tweets, which are then used to
train machine learning models for further
classification tasks. The Zero-Shot approach,
implemented with the Hugging Face pipeline,
assigns labels to a dataset without the need for
labeled training examples, making it particularly useful for
datasets with unknown or diverse categories. The
system first extracts the tweet text and embeds it
using SBERT and RoBERTa, two popular
pre-trained transformer models that produce
sentence-level representations suited to
semantic similarity. These embeddings are then used to
train deep learning models, specifically a Graph
Convolutional Neural Network (Graph-CNN),
designed to improve classification performance by
leveraging spatial correlations in tweet
embeddings. The evaluation of the models is
conducted using common metrics like accuracy,
precision, recall, and F1 score, and results are
visualized through confusion matrices and
performance graphs. This approach allows for an
in-depth analysis of the models' capabilities in
classifying tweet sentiments and topics, particularly
in the context of public health crises like COVID-19.
The system also provides an interactive interface
where users can input new sentences, which are
then classified by the trained models into one of the
previously predicted categories.
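
The evaluation metrics named above can be illustrated with a minimal sketch. It assumes macro averaging across classes, which the abstract does not specify, and the label names and predictions are hypothetical placeholders rather than results from the system.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true labels and columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def classification_metrics(y_true, y_pred, labels):
    """Return accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))

    precisions, recalls, f1_scores = [], [], []
    for i in range(n):
        true_positives = matrix[i][i]
        predicted_count = sum(matrix[row][i] for row in range(n))  # column sum
        actual_count = sum(matrix[i])                              # row sum
        precision = true_positives / predicted_count if predicted_count else 0.0
        recall = true_positives / actual_count if actual_count else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical tweet categories and predictions, for illustration only.
LABELS = ["vaccine", "lockdown", "symptoms"]
Y_TRUE = ["vaccine", "vaccine", "lockdown", "lockdown", "symptoms", "symptoms"]
Y_PRED = ["vaccine", "lockdown", "lockdown", "lockdown", "symptoms", "vaccine"]

if __name__ == "__main__":
    for row in confusion_matrix(Y_TRUE, Y_PRED, LABELS):
        print(row)
    print(classification_metrics(Y_TRUE, Y_PRED, LABELS))
```

The confusion matrix is the basis for every other metric here, so the per-class counts are read directly from its rows and columns. A library implementation would normally be used in practice. The pure-Python version is only meant to make the definitions explicit.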
