The Reuters-21578 Text Categorization Test Collection. This data set
precisely defines the "Modified Apté" split used in the
chapter (David Lewis's adaptation of the Apté et al. (1994) split to the
revised Reuters-21578 collection, which he prepared).
Naive Bayes software for learning to classify text,
along with a different set of training/testing data for text classifiers. Part
of the online companion for Tom Mitchell's Machine Learning text. Based
on the Rainbow/Libbow software package (by Andrew McCallum).