snailcoder/TextCNN


This is a TensorFlow implementation of TextCNN, the model proposed by Yoon Kim in the paper Convolutional Neural Networks for Sentence Classification.
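For readers new to the model, the core operation in the paper is a convolution filter slid over a window of consecutive word embeddings, followed by max-over-time pooling that keeps the single strongest response. The plain-Python sketch below illustrates that idea for one filter. It is not code from this repository: the function name, the toy numbers, and the choice of ReLU as the nonlinearity are all assumptions made for illustration.

```python
# Illustrative sketch only: one convolution filter plus max-over-time pooling.
# All names and numbers are hypothetical and not taken from this repository.

def conv_max_over_time(embeddings, filt, bias=0.0):
    """Slide `filt` (h rows of d weights) over a sentence (n rows of d values)
    and return the largest ReLU activation over all window positions."""
    n, h = len(embeddings), len(filt)
    if n < h:
        raise ValueError("sentence is shorter than the filter window")
    activations = []
    for start in range(n - h + 1):
        total = bias
        for row in range(h):
            # Dot product of one filter row with one word embedding.
            for w, x in zip(filt[row], embeddings[start + row]):
                total += w * x
        activations.append(max(0.0, total))  # ReLU (assumed nonlinearity)
    return max(activations)  # max-over-time pooling


# A 4-word "sentence" with 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One filter spanning a window of 2 words.
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]

feature = conv_max_over_time(sentence, bigram_filter)
print(feature)  # 2.0
```

In the full model, many such filters with several window sizes each produce one pooled feature, and the concatenated features feed a final classification layer.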

There are two implementations here. The earlier one, in the old directory, structures the model as a class with interfaces such as inference, training, and loss. I later found that training from TFRecord datasets is more efficient, so I reimplemented the project with a new structure built on the tf.data API. The new version is in the new directory.

There is an excellent tutorial here. That blog post and the implementation it introduces were a great help to me.

How to train the model (new)?

  1. Download the Google (Mikolov) word2vec file.
  2. Preprocess the movie review data, build the vocabulary, create the training and validation datasets, and store them in TFRecord files:
cd new
python preprocess_dataset.py --pos_input_file /path/to/positive/examples/file --neg_input_file /path/to/negative/examples/file --output_dir /path/to/save/tfrecords

Note: the cleaned movie review files, rt-polarity.pos and rt-polarity.neg, were originally taken from Yoon Kim's repository. You can use them directly to generate the TFRecords.
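To give a rough idea of what the preprocessing step involves, here is a minimal sketch of building a vocabulary, encoding sentences as word ids, and splitting labeled examples into training and validation sets. It is not the code in preprocess_dataset.py: the function names, the `<unk>` token, the validation fraction, and the toy sentences are all hypothetical, and the sketch stops before the TFRecord serialization that the real script performs.

```python
# Hypothetical sketch of the preprocessing idea; not preprocess_dataset.py.
import random
from collections import Counter


def build_vocab(sentences, min_count=1):
    """Map each word to an integer id; id 0 is reserved for unknown words."""
    counts = Counter(word for s in sentences for word in s.split())
    vocab = {"<unk>": 0}
    for word, count in counts.most_common():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Turn a whitespace-tokenized sentence into a list of word ids."""
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.split()]


def make_splits(pos, neg, valid_fraction=0.1, seed=0):
    """Label positives 1 and negatives 0, shuffle, and hold out a validation set."""
    examples = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    random.Random(seed).shuffle(examples)
    n_valid = max(1, int(len(examples) * valid_fraction))
    return examples[n_valid:], examples[:n_valid]


# Toy stand-ins for lines read from the positive and negative input files.
pos = ["a gripping and moving film", "a charming film"]
neg = ["a dull and lifeless film", "tedious and dull"]

vocab = build_vocab(pos + neg)
train, valid = make_splits(pos, neg, valid_fraction=0.25)
encoded_train = [(encode(s, vocab), label) for s, label in train]
print(len(vocab), len(train), len(valid))
```

Reserving an id for unknown words keeps encoding well defined for sentences that contain words outside the vocabulary.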

  3. Train the TextCNN:
python train.py --input_train_file_pattern "/path/to/save/tfrecords/train-?????-of-?????" --input_valid_file_pattern "/path/to/save/tfrecords/valid-?????-of-?????" --w2v_file /path/to/google/word2vec/file --vocab_file /path/to/vocab/file --train_dir /path/to/save/checkpoints
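The file patterns in the command above use `?`, which in glob syntax matches exactly one character, so `train-?????-of-?????` is meant to match sharded file names with two five-character fields. The sketch below checks this with Python's standard `fnmatch` module. The shard names are hypothetical examples, and it is an assumption that train.py expands the pattern with glob semantics. The patterns are quoted in the command so that the shell does not expand them before train.py receives them.

```python
# Sketch: which file names would a pattern like "train-?????-of-?????" match?
# The candidate names below are hypothetical examples, not actual output files.
from fnmatch import fnmatchcase

pattern = "train-?????-of-?????"
candidates = [
    "train-00000-of-00008",      # five characters in each field: matches
    "train-00007-of-00008",      # matches
    "valid-00000-of-00001",      # different prefix: no match
    "train-0-of-8",              # fields too short: no match
    "train-00000-of-00008.tmp",  # extra suffix: no match
]

matches = [name for name in candidates if fnmatchcase(name, pattern)]
print(matches)  # ['train-00000-of-00008', 'train-00007-of-00008']
```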

Experiment result

With the default settings in configuration.py, the model obtained a dev accuracy of 78% without any fine-tuning.
