|
1 | | -## Exploration of Sentiment Analysis using Lexicon and Machine-Learning Based Methods |
| 1 | +## Exploration of Sentiment Analysis |
2 | 2 |
|
3 | 3 | This repo contains the submission entry for an in-class NLP sentiment analysis competition held at the Microsoft AI Singapore group. It uses techniques learned in class to classify text as expressing positive or negative sentiment.
|
4 | 4 |
|
5 | | -Data for this in-class competition comes from the `Sentiment140` dataset where the training and test data consists of randomly sampled 10% and 5% of the Sentiment140 dataset. |
| 5 | + |
| 6 | + |
| 7 | +We recommend installing [Anaconda](https://www.anaconda.com/products/distribution), a pre-packaged Python distribution that contains all of the libraries and software needed for this project. Alternatively, you can use [Google Colaboratory](https://colab.research.google.com/), which lets you write and execute Python code in your browser. |
| 8 | + |
| 9 | +**Data** |
| 10 | + |
| 11 | +Data for this in-class competition comes from the [Sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140) dataset. The training and test sets are random samples of 10% and 5% of the dataset, respectively. |
6 | 12 |
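The sampling described above can be sketched roughly as follows. This is a minimal illustration, not the competition's actual sampling code: the function name `sample_train_test`, the fixed seed, the toy rows, and the choice to keep the two samples disjoint are all assumptions.

```python
import random


def sample_train_test(rows, train_frac=0.10, test_frac=0.05, seed=42):
    """Draw disjoint random train and test samples from `rows`.

    Assumption: the two samples do not overlap. The README only says
    both are random samples of the full dataset.
    """
    rng = random.Random(seed)          # local RNG so the split is repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)               # random order over all row indices
    n_train = int(len(rows) * train_frac)
    n_test = int(len(rows) * test_frac)
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:n_train + n_test]]
    return train, test


# Toy stand-in for the real dataset: (label, text) pairs.
rows = [(i % 2, f"tweet {i}") for i in range(1000)]
train, test = sample_train_test(rows)
print(len(train), len(test))
```

With the real Sentiment140 CSV, `rows` would be the parsed file contents instead of the toy list.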
|
7 | | -- Text Pre-processing |
8 | | -- VADER (VALENCE based sentiment analyzer) |
| 13 | +## Getting started using Lexicon and Machine Learning (ML) based methods |
| 14 | +Open `SentimentAnalysis.ipynb` in a Jupyter notebook environment. Alternatively, you can view the code in [Google Colab](https://githubtocolab.com/KwokHing/SentimentAnalysis-Python-Demo/blob/master/SentimentAnalysis.ipynb). |
| 15 | + |
| 16 | +- VADER (Valence Aware Dictionary and sEntiment Reasoner) (67% accuracy) |
9 | 17 | - Naive Bayes
|
10 | | -- Linear SVM (Support Vector Machine) |
| 18 | +- Linear SVM (Support Vector Machine) (80%) |
11 | 19 | - Decision Tree
|
12 | 20 | - Random Forest
|
13 | 21 | - Extra Trees
|
14 | | -- SVC |
| 22 | +- SVC (80%) |
15 | 23 |
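To make the lexicon-based idea and the accuracy figures above more concrete, here is a minimal sketch in plain Python. It is not VADER and not the notebook's code: `TOY_LEXICON`, `lexicon_predict`, `accuracy`, and the sample sentences are hypothetical and exist only for illustration.

```python
# Hypothetical toy lexicon: word -> valence score (NOT VADER's real lexicon).
TOY_LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "awful": -2.0, "hate": -2.0,
}


def lexicon_predict(text, lexicon=TOY_LEXICON):
    """Return 1 (positive) if the summed word valence is > 0, else 0 (negative)."""
    score = sum(lexicon.get(token, 0.0) for token in text.lower().split())
    return 1 if score > 0 else 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labelled examples: (text, label) with 1 = positive, 0 = negative.
samples = [
    ("I love this great phone", 1),
    ("awful service , I hate it", 0),
    ("not bad at all", 1),   # negation: a plain word-sum gets this wrong
    ("this is bad", 0),
]

y_true = [label for _, label in samples]
y_pred = [lexicon_predict(text) for text, _ in samples]
print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
```

The third sample shows why a simple word-sum struggles with negation, which is one reason a trained classifier can score higher than a lexicon method on the same data.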
|
16 | | - |
| 24 | +## Exploring using Deep Learning Techniques (LSTM) |
| 25 | +Open `SentimentAnalysis_RNN.ipynb` in a Jupyter notebook environment. Alternatively, you can view the code in [Google Colab](https://githubtocolab.com/KwokHing/SentimentAnalysis-Python-Demo/blob/master/SentimentAnalysis_RNN.ipynb). |
| 26 | + |
| 27 | +The LSTM deep learning model (79%) did not outperform the SVC and Linear SVM methods (80%). |
17 | 28 |
|
18 | | -## Getting started |
19 | | -Open `SentimentAnalysis.ipynb` on a jupyter notebook environment. Alternatively, you can view the codes in Google Colab [here](https://drive.google.com/open?id=1d_po5AQDFRovk4livi2kvv1hhjPLxqAC). The notebook consists of further technical details. |
| 29 | +## How about the BERT Transformer model? |
| 30 | +Open `SentimentAnalysis_BERT.ipynb` in a Jupyter notebook environment. Alternatively, you can view the code in [Google Colab](https://githubtocolab.com/KwokHing/SentimentAnalysis-Python-Demo/blob/master/SentimentAnalysis_BERT.ipynb). |
20 | 31 |
|
21 | | -## Improvements |
22 | | -Could potentially explore the use of Deep Learning Techniques such as RNN and/or LSTM for sentiment analysis |
| 32 | +The state-of-the-art transformer model performs slightly better, at 82% accuracy. |
23 | 33 |
|
24 | 34 | <!---
|
25 | 35 | # Walk-through of the submission entry:
|
|