Stars
A scalable web crawler framework for Java.
CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
WebCollector is an open-source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
A Java package to automatically detect anomalies in large-scale time-series data.
How to spot first stories on Twitter using Storm.
Event Detection with Clustering of Wavelet-based Signals (EDCoW). Based on the paper 'Event Detection in Twitter' by Jianshu Weng and Bu-Sung Lee (ICWSM 2011).
A novel method for first story detection on Twitter data. Includes a sample dataset.