|
1 | 1 | # Twitter Sentiment Analysis Using InSet (Indonesia Sentiment Lexicon) and Random Forest Classifier |
2 | 2 |
|
3 | | -I compared word weighting with Count Vectorizer and TF-IDF for classifying the sentiment of tweets about motorcycle racing events at Mandalika Circuit. The dataset consists of tweets collected from Twitter between February 4, 2022, and March 27, 2022. The dataset is first cleaned with text preprocessing techniques to produce a clean dataset for the next stage. Sentiment labels are then assigned using the `InSet Lexicon`. The last stage is data modeling, which uses the `Random Forest Classifier` method to produce a classification model and a comparison between `Count Vectorizer` and `TF-IDF` word weighting. |
| 3 | +I compared word weighting with Count Vectorizer and TF-IDF for classifying the sentiment of tweets about motorcycle racing events at Mandalika Circuit. The dataset consists of tweets collected from Twitter between February 4, 2022, and March 27, 2022. The dataset is first cleaned with text preprocessing techniques to produce a clean dataset for the next stage. Sentiment labels are then assigned using the `InSet Lexicon`. The last stage is data modeling, which uses the `Random Forest Classifier` method to produce a classification model and a comparison between `Count Vectorizer` and `TF-IDF` word weighting. The results show that Count Vectorizer performs slightly better on the Mandalika Circuit tweets, with an accuracy of 79.89% compared to 79.23% for TF-IDF. |
4 | 4 |
|
5 | 5 | ## Requirements |
6 | 6 | There are some general library requirements for the project and some which are specific to individual methods. The general requirements are as follows. |
|
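The lexicon-based labeling step mentioned in the README text can be pictured with a minimal sketch. It is an illustration only, not the repository's code: the tiny `TOY_LEXICON`, its weights, the helper names `lexicon_score` and `label_sentiment`, the whitespace tokenizer, and the three-way positive/negative/neutral rule are all assumptions made for this example. The real InSet files would be loaded from disk, and the project may label tweets differently.

```python
# Hypothetical miniature lexicon with made-up weights; it stands in for the
# real InSet word lists, which are not reproduced here.
TOY_LEXICON = {"bagus": 3, "keren": 4, "macet": -3, "buruk": -4}


def lexicon_score(tokens, lexicon):
    """Sum the weights of all tokens; words missing from the lexicon count as 0."""
    return sum(lexicon.get(token, 0) for token in tokens)


def label_sentiment(text, lexicon=TOY_LEXICON):
    """Assign a label from the sign of the summed score (assumed rule)."""
    tokens = text.lower().split()  # naive whitespace tokenizer, for illustration
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["Sirkuit Mandalika keren", "jalan macet buruk", "balapan hari ini"]:
        print(tweet, "->", label_sentiment(tweet))
```

Labels produced this way would then serve as the target column when the Count Vectorizer and TF-IDF features are fed to the classifier.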