lszxlong/spark-LDA-example

spark-LDA-example

A simple Spark LDA example. This project contains a basic document clustering example that also covers data cleaning.

Document clustering is performed in the following steps:

  1. Spark RegexTokenizer: for tokenization

  2. Stanford NLP Morphology: for stemming and lemmatization

  3. Spark StopWordsRemover: for removing stop words and punctuation

  4. Spark TF-IDF: for computing term frequencies or TF-IDF weights

  5. Spark LDA: for clustering the documents
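The steps above can be sketched end to end as follows. This is a minimal illustration written against the `spark.ml` API, not the code from this repository. The object name `LdaSketch`, the column names, the sample sentences, and the parameter values are all assumptions. The `lemmatize` function is a hypothetical stand-in that only lowercases its input, so a real run would replace it with a call into Stanford NLP Morphology.

```scala
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, IDF, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object LdaSketch {
  // Step 2 placeholder (assumption): only lowercases the token.
  // Swap in a real lemmatizer such as Stanford NLP Morphology here.
  def lemmatize(token: String): String = token.toLowerCase

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lda-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny made-up corpus, for illustration only.
    val docs = Seq(
      (0L, "Spark makes distributed data processing simple."),
      (1L, "Topic models group documents by shared vocabulary."),
      (2L, "Distributed clusters process large data sets in parallel."),
      (3L, "Documents about similar topics share many words.")
    ).toDF("id", "text")

    // Step 1: split on runs of non-word characters, which also drops punctuation.
    val tokenized = new RegexTokenizer()
      .setInputCol("text").setOutputCol("tokens").setPattern("\\W+")
      .transform(docs)

    // Step 2: apply the lemmatizer to every token through a UDF.
    val lemmatizeAll = udf((tokens: Seq[String]) => tokens.map(lemmatize))
    val lemmatized = tokenized.withColumn("lemmas", lemmatizeAll(col("tokens")))

    // Step 3: remove English stop words (the default list).
    val filtered = new StopWordsRemover()
      .setInputCol("lemmas").setOutputCol("filtered")
      .transform(lemmatized)

    // Step 4: term counts, then TF-IDF weights.
    val tf = new CountVectorizer()
      .setInputCol("filtered").setOutputCol("tf")
      .fit(filtered).transform(filtered)
    val tfidf = new IDF()
      .setInputCol("tf").setOutputCol("features")
      .fit(tf).transform(tf)

    // Step 5: fit LDA. Use setFeaturesCol("tf") to train on raw counts instead.
    val model = new LDA()
      .setK(2).setMaxIter(20).setFeaturesCol("features")
      .fit(tfidf)

    model.describeTopics(3).show(false)
    model.transform(tfidf).select("id", "topicDistribution").show(false)

    spark.stop()
  }
}
```

Each document's cluster can be read from `topicDistribution` by taking the topic with the highest weight. The number of topics `k` is a tuning choice, and the value 2 here only suits the four-sentence sample.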

About

A simple Spark LDA example that demonstrates a full-fledged clustering algorithm, with data cleaning using processes such as lemmatization and stemming.

Languages

  • Scala 100.0%
