I am trying to find which words appear most often. But each time I run FreqDist it does not return the most common words, only letters.
FreqDist({' ': 496, 'e': 306, 't': 205, 'a': 182, 's': 181, 'n': 160, 'o': 146, 'r': 142, 'i': 118, 'l': 110, ...})
Here is my code:
newdf['tokens1'] = newdf['review'].apply(word_tokenize)
newdf['tokens1'] = newdf['tokens1'].apply(str)
review_comments = ''
for i in range(newdf.shape[0]):  # iterate over rows, not columns
    # Add each comment.
    review_comments = review_comments + newdf['tokens1'][i]
from nltk.probability import FreqDist
fdist = FreqDist(review_comments)
fdist
returns
FreqDist({' ': 496, 'e': 306, 't': 205, 'a': 182, 's': 181, 'n': 160, 'o': 146, 'r': 142, 'i': 118, 'l': 110, ...})
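The output above is the clue: `FreqDist` counts whatever it iterates over, and iterating a Python string yields individual characters. The `.apply(str)` call turned each row's token list into one long string, so the distribution counted letters and spaces. A minimal sketch of the same behavior, using the stdlib `collections.Counter` as a stand-in (it iterates its input the same way `FreqDist` does; the `tokens` list is made up for illustration):

```python
from collections import Counter

tokens = ['great', 'product', 'great', 'price']

# Counting the list counts whole words.
print(Counter(tokens).most_common(1))        # [('great', 2)]

# Counting str(tokens) iterates character by character,
# so punctuation and letters dominate.
print(Counter(str(tokens)).most_common(1))   # [("'", 8)]
```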
1 Answer
You first need to use nltk.word_tokenize on the raw text (not on the stringified token lists):
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

tokens = word_tokenize(review_comments)
fdist = FreqDist(tokens)
fdist
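Put differently, the fix in the original pipeline is to stop converting token lists to strings and to count words, not characters. A minimal stand-in sketch without pandas or NLTK (using `Counter` in place of `FreqDist` and `str.split` as a crude tokenizer; the `reviews` data is hypothetical):

```python
from collections import Counter

# Hypothetical stand-in for newdf['review'].
reviews = ['great product great price', 'great service']

# Tokenize each review and keep the token lists (no str() conversion).
token_lists = [text.split() for text in reviews]

# Flatten the per-review lists into one sequence of words.
all_tokens = [tok for tokens in token_lists for tok in tokens]

fdist = Counter(all_tokens)
print(fdist.most_common(1))  # [('great', 3)]
```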
answered Jul 22, 2023 at 17:16
gtomer