In this project, we perform multimodal sentiment analysis on Twitter data consisting of tweets that contain both text and images (Mohammed & Aleqabie, 2022) to predict the sentiment behind each tweet. The sentiment is classified into three categories: Positive, Neutral and Negative.
- Overview
- Table of Contents
- Datasets
- Model Architecture
- Preprocessing
- Training
- Evaluation
- Usage
- Dependencies
This project uses the following dataset: Mohammed, D. J., & Aleqabie, H. J. (2022, September). The Enrichment Of MVSA Twitter Data Via Caption-Generated Label Using Sentiment Analysis. In 2022 Iraqi International Conference on Communication and Information Technologies (IICCIT) (pp. 322-327). IEEE. The dataset can be found here.
The captions and their corresponding labels are available in LabeledText.xlsx. Feature engineering has been done to add the following feature columns:
- Caption Length : Indicating length of captions
- Hashtags : Extracting and collecting all the hashtags used in each tweet
- Total Hashtags : Showing the total number of hashtags in each tweet
The code for this step is in Scripts/Text/FeatureEng.py. The engineered data is then saved as a CSV file to Data/Text/Engineered.csv.
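As an illustration of the three engineered columns, here is a minimal sketch using only the Python standard library. It is not the code in Scripts/Text/FeatureEng.py: the function name engineer_features, the hashtag pattern, and the choice to measure caption length in characters are all assumptions made for the example.

```python
import re

# Assumed hashtag rule: '#' followed by one or more word characters.
# The project's own script may use a different pattern.
HASHTAG_RE = re.compile(r"#\w+")


def engineer_features(caption: str) -> dict:
    """Return the three engineered features for a single caption (hypothetical helper)."""
    hashtags = HASHTAG_RE.findall(caption)
    return {
        "Caption Length": len(caption),   # length in characters (assumed unit)
        "Hashtags": hashtags,             # every hashtag found in the caption
        "Total Hashtags": len(hashtags),  # number of hashtags in the caption
    }


if __name__ == "__main__":
    sample = "Sunny day at the beach #summer #fun"
    print(engineer_features(sample))
```

The same function could be applied row by row to the caption column of the labeled data before the result is written out as a CSV file.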
Afterwards, embeddings are generated for the captions and hashtags using two approaches: TF-IDF and BERT.
The code for this step is in Scripts/Text/Preprocess.py. The function tfidf_preprocessing() and the class BERT_Embeddings are defined in CustomFunctions.py. The embeddings are then saved in Data/Text/TF-IDF and Data/Text/BERT, along with the target labels and the number of captions.
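To make the TF-IDF idea concrete, the sketch below computes TF-IDF vectors for a handful of captions in plain Python. It is only an illustration of the technique and does not reproduce tfidf_preprocessing() from CustomFunctions.py: the names tokenize and tfidf_vectors, the whitespace tokenizer, and the smoothed IDF formula are assumptions chosen for the example. In practice a library vectorizer would typically be used instead.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def tfidf_vectors(captions):
    """Return (vocabulary, vectors), with one TF-IDF vector per caption."""
    docs = [tokenize(c) for c in captions]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)

    # Document frequency: in how many captions each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))

    # Smoothed inverse document frequency (assumed variant):
    # idf(t) = ln((1 + N) / (1 + df(t))) + 1
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0 for tok in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty captions
        # Term frequency (relative count) multiplied by IDF.
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = tfidf_vectors(["good day #happy", "bad day #sad"])
    print(vocab)
    for vec in vectors:
        print([round(v, 3) for v in vec])
```

Tokens that appear in every caption (such as "day" above) receive the lowest IDF weight, while tokens unique to one caption are weighted more heavily, which is what makes the vectors useful as features for a sentiment classifier.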
All of the project's dependencies are listed in the requirements.txt file. To install them, run the following command in your terminal:
pip install -r requirements.txt