Commit 0456129

Added Reddit-scraping-and-flair-detection folder

1 parent ce7e971 commit 0456129

+4559

-0

lines changed

12 KB

Binary file not shown.

6 KB

Binary file not shown.

Lines changed: 1015 additions & 0 deletions

Large diffs are not rendered by default.

Lines changed: 1364 additions & 0 deletions

Large diffs are not rendered by default.

Lines changed: 27 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,27 @@`
	`1`	`+# Reddit Flair Detector`
	`2`	`+## Steps followed:`
	`3`	`+`
	`4`	`+Each step is described, along with its code, in the notebooks.`
	`5`	`+`
	`6`	`+### Step 1: Extraction of r/india data`
	`7`	`+Used Python's PRAW library for extraction.`
	`8`	`+`
	`9`	`+### Step 2: Exploratory Data Analysis`
	`10`	`+Analysed the data using graphs, scatter plots, and correlations, plotted with the matplotlib library.`
	`11`	`+`
	`12`	`+### Step 3: Built the Reddit Flair Detector with the following steps:`
	`13`	`+- Preprocessing the data: Removed stopwords and performed stemming on the data.`
	`14`	`+- Dividing into training and test sets: Split the dataset into a training set and a test set using the standard 0.7:0.3 ratio.`
	`15`	`+- Testing across classifiers: Tested 3 classifiers: Naive Bayes, SVM and Logistic Regression. Checked the accuracy of each classifier.`
	`16`	`+- Saving the model: Saved the model with the highest accuracy in a .sav file to use it for prediction.`
	`17`	`+- Model testing: Takes an input URL from the user and returns the predicted and actual flairs. The saved model is called to produce the predicted flair.`
	`18`	`+`
	`19`	`+### How it works:`
	`20`	`+The model reads all the URLs in the file line by line and predicts the flair for each.`
	`21`	`+- The predictions are stored in a JSON file.`
	`22`	`+`
	`23`	`+### Output:`
	`24`	`+`
	`25`	`+The output is a JSON object in which each key maps to its predicted flair as the value.`
	`26`	`+`
	`27`	`+`
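
The README above describes the flow from preprocessing to a JSON file of predictions, but the notebooks themselves are not rendered in this diff. The sketch below is a rough, standard-library-only illustration of that flow, not the code from the commit. Everything in it is an assumption made for illustration: the stopword list, the crude suffix stripping (a stand-in for real stemming), the from-scratch Naive Bayes (only one of the three classifiers the README lists, so the comparison against SVM and Logistic Regression is omitted), the made-up sample titles, the `example.com` URLs, and the choice of the URL as the JSON key.

```python
import json
import math
import os
import pickle
import random
import tempfile
from collections import Counter

# Hypothetical stopword list; the notebooks presumably use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "for", "on", "by"}


def preprocess(text):
    """Lowercase, drop stopwords, strip a few suffixes (crude stand-in for stemming)."""
    tokens = []
    for word in text.lower().split():
        word = word.strip(".,!?:;\"'()")
        if not word or word in STOPWORDS:
            continue
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                break
        tokens.append(word)
    return tokens


def split_70_30(rows, seed=0):
    """Shuffle the rows and split them 0.7:0.3 into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * 7 // 10
    return rows[:cut], rows[cut:]


def train(rows):
    """Fit a multinomial Naive Bayes model, stored as plain picklable data."""
    model = {"class_counts": Counter(), "word_counts": {}, "vocab": set()}
    for text, flair in rows:
        tokens = preprocess(text)
        model["class_counts"][flair] += 1
        model["word_counts"].setdefault(flair, Counter()).update(tokens)
        model["vocab"].update(tokens)
    return model


def predict(model, text):
    """Return the flair with the highest log-probability (add-one smoothing)."""
    tokens = preprocess(text)
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_flair, best_score = None, -math.inf
    for flair, n_docs in model["class_counts"].items():
        counts = model["word_counts"][flair]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for token in tokens:
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_flair, best_score = flair, score
    return best_flair


# Made-up (title, flair) rows standing in for the scraped r/india data.
data = [
    ("Parliament passes new election bill", "Politics"),
    ("Minister announces election schedule", "Politics"),
    ("Opposition questions the election bill", "Politics"),
    ("Parliament debates minister statement", "Politics"),
    ("Election commission meets minister", "Politics"),
    ("Cricket team wins the final match", "Sports"),
    ("Captain praises cricket team", "Sports"),
    ("Team wins cricket series", "Sports"),
    ("Cricket match delayed by rain", "Sports"),
    ("Captain scores in cricket final", "Sports"),
]

train_rows, test_rows = split_70_30(data)
model = train(train_rows)
accuracy = sum(predict(model, t) == f for t, f in test_rows) / len(test_rows)
print(f"test accuracy: {accuracy:.2f}")

# Save the model to a .sav file and load it back, as the README describes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.sav")
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    with open(path, "rb") as fh:
        loaded = pickle.load(fh)

# Hypothetical URL -> title pairs; fetching the post behind each URL is omitted.
posts = {
    "https://example.com/post1": "Cricket final tonight",
    "https://example.com/post2": "Minister speaks on election bill",
}
predictions = {url: predict(loaded, title) for url, title in posts.items()}
output_json = json.dumps(predictions, indent=2)
print(output_json)
```

The model is kept as a plain dictionary of counters so that `pickle` can round-trip it without depending on a class definition. With a scraped dataset, the sample rows and the hard-coded `posts` mapping would be replaced by the real titles and by whatever the URL lookup returns.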