On the advice of someone on SO, I'm posting this here.
I am doing sentiment analysis on tweets. I have code that I developed by following an online tutorial (found here: http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/) and adding in some parts myself, which looks like this:
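(The original code block did not survive; below is a minimal sketch of the tutorial's approach as I understand it. The tiny sample dataset, the `extract_features` helper, and the feature names are illustrative, not my exact code.)

```python
import nltk

# Tiny labelled sample for illustration; the real training set is far larger.
train_tweets = [
    ("i love this car", "positive"),
    ("this view is amazing", "positive"),
    ("i feel great this morning", "positive"),
    ("i do not like this car", "negative"),
    ("this view is horrible", "negative"),
    ("i feel tired this morning", "negative"),
]

# Vocabulary of all words seen in training.
vocabulary = {w for text, _ in train_tweets for w in text.split()}

def extract_features(tweet, vocabulary):
    # Bag-of-words presence features, in the style of the tutorial:
    # one boolean feature per vocabulary word.
    words = set(tweet.split())
    return {"contains(%s)" % w: (w in words) for w in vocabulary}

# Build the (features, label) training set and train the NLTK classifier.
training_set = [(extract_features(t, vocabulary), label) for t, label in train_tweets]
classifier = nltk.NaiveBayesClassifier.train(training_set)

print(classifier.classify(extract_features("i love this view", vocabulary)))
```

Note that this feature scheme produces one feature per vocabulary word for every tweet, so both training time and memory grow with (number of tweets) × (vocabulary size), which is where the slowness below comes from.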
When I ran this on my sample dataset, it all worked perfectly, although a little inaccurately (the training set only had 50 tweets). My real training set, however, has 1.5 million tweets, and I'm finding that the default trainer provided by NLTK is far too slow.
Is this too large a dataset to be used with the default NLTK classifier? Does anybody have any suggestions or alternative approaches for this? In any responses, please bear in mind that I could only get this far with a tutorial and am totally new to Python (I'm usually a Java coder).
Original SO post: http://stackoverflow.com/questions/18154278/is-there-a-maximum-size-for-the-nltk-naive-bayes-classifer#18154932