lszxlong/spark-LDA-example

Folders and files

project
src/main
.gitignore
LICENSE
README.md
activator
activator.bat
activator.properties
build.sbt

spark-LDA-example

A simple Spark LDA example. This project contains a basic document clustering example that also includes data cleaning.

The document clustering pipeline consists of the following steps:

Spark RegexTokenizer : For Tokenization
Stanford NLP Morphology : For Stemming and lemmatization
Spark StopWordsRemover : For removing stop words and punctuation
Spark TF-IDF : For computing term frequencies or tf-idf
Spark LDA : For Clustering of documents.
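
The steps above can be sketched as a single Spark ML pipeline. This is a minimal illustration and not the code from this repository: the object name `LdaPipelineSketch`, the helper `buildPipeline`, the column names, the toy corpus, and all parameter values are assumptions chosen for the example. The Stanford NLP lemmatization step is left out to keep the sketch free of extra dependencies, and `CountVectorizer` is assumed here for the term-frequency stage.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, IDF, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.SparkSession

object LdaPipelineSketch {

  // Assemble the stages: tokenize -> remove stop words -> TF -> IDF -> LDA.
  // Column names ("text", "tokens", ...) are hypothetical.
  def buildPipeline(numTopics: Int): Pipeline = {
    val tokenizer = new RegexTokenizer()
      .setInputCol("text")
      .setOutputCol("tokens")
      .setPattern("\\W+")      // split on non-word characters, which also drops punctuation
      .setMinTokenLength(3)    // discard very short tokens

    val remover = new StopWordsRemover()
      .setInputCol("tokens")
      .setOutputCol("filtered") // uses the default English stop word list

    val termFrequency = new CountVectorizer()
      .setInputCol("filtered")
      .setOutputCol("tf")

    val idf = new IDF()
      .setInputCol("tf")
      .setOutputCol("features")

    val lda = new LDA()
      .setK(numTopics)
      .setMaxIter(20)
      .setSeed(42L)
      .setFeaturesCol("features")

    val stages: Array[PipelineStage] = Array(tokenizer, remover, termFrequency, idf, lda)
    new Pipeline().setStages(stages)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lda-pipeline-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy corpus; a real run would load documents from files.
    val docs = Seq(
      (0L, "Spark makes distributed data processing simple and fast"),
      (1L, "Topic models such as LDA group documents by shared words"),
      (2L, "Distributed clusters process large data sets in parallel"),
      (3L, "Documents about similar topics share many words")
    ).toDF("id", "text")

    val model = buildPipeline(numTopics = 2).fit(docs)

    // Each document receives a probability distribution over the topics.
    model.transform(docs).select("id", "topicDistribution").show(truncate = false)

    spark.stop()
  }
}
```

Running `main` requires `spark-sql` and `spark-mllib` on the classpath (for example as `libraryDependencies` in `build.sbt`). To add lemmatization, one option would be a UDF over the token column placed between the tokenizer and the stop word remover.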

About

A simple Spark LDA example that demonstrates a full-fledged clustering workflow, with data cleaning using processes such as lemmatization and stemming.

blog.knoldus.com/2016/10/08/spark-lda-clustering/

Languages

Scala 100.0%
