VaasuDevanS/Natural-Language-Processing-Assignments

Natural-Language-Processing-Assignments

University of New Brunswick Fall-2018 CS6765: Natural Language Processing

This repository contains the Python code for the fall-term assignments.
The code was developed with Python 2.7 using only built-in modules; neither numpy nor nltk is used anywhere.
sklearn is used only in Assignment 3, for Logistic Regression.

Getting started

No  Python-file    Usage
1   tokenize.py    python tokenize.py FILE > FILE.tokens
    count.py       python count.py FILE.tokens > FILE.freqs
2   lm.py          python lm.py MODEL TRAIN_FILE TEST_FILE > OUTPUT
    perplexity.py  python perplexity.py OUTPUT
3   classify.py    python classify.py METHOD TRAIN_DOCS TRAIN_CLASSES TEST_DOCS > PREDICTED_CLASSES
    score.py       python score.py PREDICTED_CLASSES TRUE_CLASSES
4   tag.py         python tag.py TRAIN_FILE TEST_FILE METHOD > SYSTEM_OUTPUT
    accuracy.py    python accuracy.py TRUE_TAGS SYSTEM_OUTPUT
5   chatbot.py     python chatbot.py METHOD

Arguments

No  Arguments      File location or allowed values (in the individual assignment folder)
1   FILE           Data/tweets-en.txt.gz
2   MODEL          1 or 2 or interp
    TRAIN_FILE     Data/reuters-train.txt
    TEST_FILE      Data/reuters-dev.txt
3   METHOD         baseline or lr or lexicon or nb or nbbin
    TRAIN_DOCS     Data/train.docs.txt
    TRAIN_CLASSES  Data/train.classes.txt
    TEST_DOCS      Data/dev.docs.txt
    TRUE_CLASSES   Data/dev.classes.txt
4   TRAIN_FILE     Data/train.en.txt
    TEST_FILE      Data/dev.en.words.txt
    METHOD         baseline or hmm
    TRUE_TAGS      Data/dev.en.tags.txt
5   METHOD         overlap or w2v or both

Assignment 2: - MODEL

  • 1 represents Unigram (with Add-1 smoothing)
  • 2 represents Bigram (with Add-k smoothing)
  • interp represents Interpolated (both Unigram and Bigram)
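
The three MODEL options can be illustrated with a small sketch. This is not the repository's lm.py: the function names, the boundary markers `<s>` and `</s>`, and the default values of `k` and `lam` are all assumptions chosen for illustration, and the code sticks to built-in modules in keeping with the rest of the repository.

```python
from collections import Counter


def train_counts(sentences):
    """Collect unigram and bigram counts from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        # Hypothetical boundary markers; the real lm.py may use others.
        tokens = ["<s>"] + list(sent) + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def unigram_prob(w, unigrams, vocab_size):
    """Unigram probability with add-1 smoothing."""
    total = sum(unigrams.values())
    return (unigrams[w] + 1.0) / (total + vocab_size)


def bigram_prob(prev, w, unigrams, bigrams, vocab_size, k=0.1):
    """Bigram probability P(w | prev) with add-k smoothing (k is assumed)."""
    return (bigrams[(prev, w)] + float(k)) / (unigrams[prev] + k * vocab_size)


def interp_prob(prev, w, unigrams, bigrams, vocab_size, lam=0.5, k=0.1):
    """Linear interpolation of the bigram and unigram estimates.

    lam is an assumed mixing weight, not a value taken from the repository.
    """
    return (lam * bigram_prob(prev, w, unigrams, bigrams, vocab_size, k)
            + (1.0 - lam) * unigram_prob(w, unigrams, vocab_size))
```

With `unigrams, bigrams = train_counts([["a", "b"], ["a", "a"]])` and `vocab_size = len(unigrams)`, each of the three estimators sums to 1 over the vocabulary for a fixed context, which is a quick sanity check on the smoothing.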

Assignment 3: - METHOD

  • baseline represents Most-Frequent-Class-Baseline
  • lr represents Logistic Regression (used from sklearn)
  • lexicon represents Sentiment Lexicon containing + and - words
  • nb represents Naive Bayes Model (with add-k smoothing)
  • nbbin represents Binarized Naive Bayes
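
To make the difference between nb and nbbin concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-k smoothing and an optional binarized mode, in which each word is counted at most once per document. It is an illustration under assumptions, not the repository's classify.py: the function names, the model tuple layout, the default `k`, and the choice to skip unseen words at prediction time are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, k=1.0, binarize=False):
    """Count classes and per-class words from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        if binarize:
            # Binarized NB: each word counts once per document.
            tokens = set(tokens)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, float(k), binarize


def predict_nb(model, tokens):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab, k, binarize = model
    if binarize:
        tokens = set(tokens)
    n_docs = float(sum(class_counts.values()))
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)
        for w in tokens:
            if w not in vocab:
                continue  # assumption: words unseen in training are ignored
            # Add-k smoothed likelihood P(w | label).
            score += math.log((word_counts[label][w] + k)
                              / (total + k * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Scores are accumulated in log space to avoid underflow on long documents.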

Assignment 4: - METHOD

  • baseline represents Most-Frequent-Tag-Baseline
  • hmm represents Hidden Markov Model (Bigram with add-k smoothing) and Viterbi Algorithm
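
The hmm method combines add-k smoothed transition and emission estimates with Viterbi decoding. The sketch below shows one way this could look using only built-in modules. It is not the repository's tag.py: the function names, the `<s>` start marker, the default `k`, and the extra vocabulary slot reserved for unknown words are assumptions made for illustration.

```python
import math
from collections import Counter


def train_hmm(tagged_sentences):
    """Count transitions and emissions from lists of (word, tag) pairs."""
    trans, emit, context, tag_totals = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"  # hypothetical start-of-sentence state
        for word, tag in sent:
            trans[(prev, tag)] += 1
            context[prev] += 1
            emit[(tag, word)] += 1
            tag_totals[tag] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, context, tag_totals, vocab


def viterbi(words, model, k=0.1):
    """Return the most likely tag sequence under a bigram HMM."""
    trans, emit, context, tag_totals, vocab = model
    tags = sorted(tag_totals)
    if not words:
        return []

    def log_trans(prev, tag):
        return math.log((trans[(prev, tag)] + k) / (context[prev] + k * len(tags)))

    def log_emit(tag, word):
        # One extra vocabulary slot is reserved for unknown words (assumption).
        return math.log((emit[(tag, word)] + k)
                        / (tag_totals[tag] + k * (len(vocab) + 1)))

    # Each column maps tag -> (best log score, back-pointer to previous tag).
    columns = [dict((t, (log_trans("<s>", t) + log_emit(t, words[0]), None))
                    for t in tags)]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((columns[-1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, word), prev)
        columns.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]
```

Decoding costs on the order of the sentence length times the square of the number of tags, because every tag in a column considers every tag in the previous column.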

Assignment 5: - METHOD

  • overlap represents Chatbot responses based on word overlap
  • w2v represents the response with the highest Cosine value (computed using pre-trained vectors from fastText)
  • both represents the responses from both overlap and w2v, with their Cosine values
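
The two scoring ideas behind these methods can be sketched briefly. The README does not say how chatbot.py turns word vectors into a sentence-level score, so averaging the word vectors is an assumption here, as are all function names. `vectors` stands for any dict mapping a word to a list of floats, for example one loaded from a fastText vector file.

```python
import math


def overlap_score(query_tokens, candidate_tokens):
    """Number of distinct words shared by the query and a candidate."""
    return len(set(query_tokens) & set(candidate_tokens))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_vector(tokens, vectors, dim):
    """Average the vectors of known tokens (an assumed sentence representation)."""
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return [0.0] * dim
    return [sum(column) / float(len(found)) for column in zip(*found)]


def best_response(query_tokens, candidates, score):
    """Pick the candidate token list that maximizes score(query, candidate)."""
    return max(candidates, key=lambda c: score(query_tokens, c))
```

For the w2v variant, `score` could be a small wrapper that calls `average_vector` on both token lists and returns their `cosine`; for the overlap variant, `overlap_score` can be passed directly.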
