shunk031/GWork

G Work

Install required modules

$ pip install -r python_requirements.txt

Collect news articles from Gunosy

$ cd scripts
$ python crawl_page.py CATEGORY # specify article category

The collected articles are stored in the data directory under the current directory.

Preprocess

Make a single CSV file

Merge the collected article data into one CSV file per category. The CSV files are stored in GClassifier/dataset/row.

$ cd scripts
$ python make_single_file.py all # specify article category or all
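
As a rough illustration of this step, the sketch below merges per-article text files into one CSV file per category. The input layout (data/<category>/*.txt with the title on the first line), the column names, and the function name merge_category are assumptions made for the example; they are not taken from make_single_file.py.

```python
import csv
from pathlib import Path


def merge_category(data_dir, category, out_dir):
    """Merge every article file of one category into a single CSV file.

    Assumed (hypothetical) layout: data_dir/<category>/<article>.txt,
    where the first line is the title and the rest is the body.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{category}.csv"

    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["category", "title", "body"])  # assumed column names
        # Sort the files so the output order is reproducible.
        for article in sorted((Path(data_dir) / category).glob("*.txt")):
            text = article.read_text(encoding="utf-8")
            title, _, body = text.partition("\n")
            writer.writerow([category, title.strip(), body.strip()])
    return out_path
```

Calling merge_category once per category would correspond to the all option, under the same assumptions.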

Wakati-gaki (word segmentation)

Split the article text into words (wakati-gaki) and format it. The output CSV file is stored in GClassifier/dataset/preprocess.

$ cd GClassifier
$ python g_preprocess.py all --wakati_type mecab-noun
# to use word-level n-grams instead
$ python g_preprocess.py all --wakati_type word-ngram --ngram_n 2
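
To make the word-ngram option more concrete, here is a minimal sketch of word-level n-gram generation over an already segmented sentence. The function name word_ngrams and the underscore joiner are assumptions for illustration; the actual output format of g_preprocess.py is not confirmed here.

```python
def word_ngrams(tokens, n=2):
    """Return word-level n-grams, each joined by an underscore."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A sentence shorter than n words yields no n-grams.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Tokens such as a morphological analyzer like MeCab might produce.
tokens = ["機械", "学習", "で", "記事", "を", "分類"]
print(word_ngrams(tokens, n=2))
```

With --ngram_n 2, each feature would then be a pair of adjacent words rather than a single word.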

Train a Naive Bayes model and dump it

Train a Naive Bayes model on the segmented (wakati-gaki) data and dump it to GClassifier/naive_bayes_model.pkl.

$ cd GClassifier
$ python dump_classifier.py mecab-noun_all # or n-gram_all
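
The idea behind this step can be sketched in plain Python as a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, saved with pickle. This is a simplified stand-in, not the code in dump_classifier.py: the function names, the dictionary-based model layout, and the choice of smoothing are all assumptions.

```python
import math
import pickle
from collections import Counter


def train(docs, labels):
    """Count words per label. `docs` is a list of token lists."""
    word_counts = {}
    for words, label in zip(docs, labels):
        word_counts.setdefault(label, Counter()).update(words)
    vocab = {w for counts in word_counts.values() for w in counts}
    return {
        "word_counts": word_counts,       # label -> word -> count
        "label_counts": Counter(labels),  # label -> number of documents
        "vocab_size": len(vocab),
    }


def predict(model, words):
    """Return the label with the highest log posterior for `words`."""
    total_docs = sum(model["label_counts"].values())
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + model["vocab_size"]
        score = math.log(n_docs / total_docs)  # log prior
        for w in words:
            score += math.log((counts[w] + 1) / denom)  # add-one smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def save_model(model, path):
    """Serialize the trained model to `path` with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.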

Run the news category prediction server

Run the server, open http://localhost:8000/predict_category/, and enter a Gunosy article URL.

$ python manage.py runserver

Model validation

We evaluated the classifier using 5-fold cross-validation. The result is here.

$ cd GClassifier
$ python train_cross_validation.py mecab-noun_all --kfold 5
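
For readers unfamiliar with k-fold cross-validation, the sketch below shows the general procedure: shuffle the samples, split them into k folds, and use each fold once as the test set while training on the rest. The function names and the caller-supplied fit and predict callables are hypothetical; train_cross_validation.py may be implemented differently.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(samples, labels, fit, predict, k=5):
    """Return the mean accuracy over k folds.

    `fit(train_samples, train_labels)` returns a model and
    `predict(model, sample)` returns a label; both come from the caller.
    """
    accuracies = []
    for train, test in kfold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Every sample is used for testing exactly once, so the mean accuracy is a less optimistic estimate than accuracy measured on the training data.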

About

Classify Gunosy news articles with a Naive Bayes classifier and predict article categories on a Django server.
