Classify Gunosy news articles with a Naive Bayes classifier and predict article categories on a Django server

shunk031/GWork

G Work

Install required modules

$ pip install -r python_requirements.txt

Collect news articles from Gunosy

$ cd scripts
$ python crawl_page.py CATEGORY # specify article category

The collected articles are stored in the data directory under the current directory.

Preprocess

Make a single CSV file

Combine the collected article data into one CSV file per category. The CSV files are stored in GClassifier/dataset/row.

$ cd scripts
$ python make_single_file.py all # specify article category or all

Wakachigaki (word segmentation)

Segment the article text into words (wakachigaki) and format it. The output CSV file is stored in GClassifier/dataset/preprocess.

$ cd GClassifier
$ python g_preprocess.py all --wakati_type mecab-noun
# to use word-level n-grams instead
$ python g_preprocess.py all --wakati_type word-ngram --ngram_n 2
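
As a rough illustration of what the word-ngram option with --ngram_n 2 produces, the sketch below builds word-level n-grams from an already segmented token list. It is not the code in g_preprocess.py: the function name word_ngrams, the underscore joiner, and the hand-written token list are assumptions made for this example.

```python
def word_ngrams(tokens, n=2):
    """Return word-level n-grams, each joined into one string with underscores."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Slide a window of width n across the token list.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hand-written tokens standing in for the output of a segmenter such as MeCab.
tokens = ["東京", "で", "新しい", "サービス", "開始"]
bigrams = word_ngrams(tokens, n=2)
print(bigrams)
```

A list shorter than n yields no n-grams, so very short articles would contribute nothing under this scheme.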

Train the Naive Bayes model and dump it

Train a Naive Bayes model on the segmented (wakachigaki) data and dump it to GClassifier/naive_bayes_model.pkl.

$ cd GClassifier
$ python dump_classifier.py mecab-noun_all # or n-gram_all
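
The general idea of training a Naive Bayes model on segmented text and dumping it with pickle can be sketched as follows. This is a minimal standard-library sketch under stated assumptions, not the implementation in dump_classifier.py: the train and predict functions, the model layout (a plain dict), the add-one smoothing, the toy documents, and the category labels are all hypothetical, and the file is written to a temporary directory rather than to GClassifier.

```python
import math
import os
import pickle
import tempfile
from collections import Counter


def train(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = set().union(*docs)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, tokens):
    """Return the label with the highest add-one-smoothed log posterior."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy tokenized documents and hypothetical category labels.
docs = [["試合", "勝利", "選手"], ["選手", "移籍"], ["株価", "上昇"], ["株価", "決算", "発表"]]
labels = ["sports", "sports", "economy", "economy"]
model = train(docs, labels)

# Dump the model to a pickle file and load it back, as a server process might.
path = os.path.join(tempfile.mkdtemp(), "naive_bayes_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(predict(loaded, ["選手", "勝利"]))
```

Storing the model as a plain dict of counters keeps the pickle loadable without importing any custom class, which is convenient when the training script and the server live in different modules.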

Run the news category prediction server

Run the server, open http://localhost:8000/predict_category/ in a browser, then enter a Gunosy article URL.

$ python manage.py runserver

Model validation

We evaluated the classifier using 5-fold cross-validation. The following commands run the evaluation:

$ cd GClassifier
$ python train_cross_validation.py mecab-noun_all --kfold 5
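
For readers unfamiliar with k-fold cross-validation, the splitting step can be sketched as below. This is an illustrative standard-library sketch, not the code in train_cross_validation.py: the function name kfold_indices, the fixed seed, and the sample count of 12 are assumptions made for this example.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


splits = list(kfold_indices(n_samples=12, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy uses every article for evaluation once while never testing on data the model was trained on in that fold.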
