[Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.14,1.15

montanaro@users.sourceforge.net
27 Aug 2002 20:45:08 -0700


Update of /cvsroot/python/python/nondist/sandbox/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv4506
Modified Files:
	GBayes.py 
Log Message:
ehh - it actually didn't work all that well. the spurious report that it
did well was pilot error. besides, tim's report suggests that a simple
str.split() may be the best tokenizer anyway.
Index: GBayes.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/spambayes/GBayes.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** GBayes.py	28 Aug 2002 00:43:44 -0000	1.14
--- GBayes.py	28 Aug 2002 03:45:06 -0000	1.15
***************
*** 108,116 ****
     return tokenize_ngram(string, 15)
 
- def tokenize_trigram(string):
-     r"""tokenize w/ re '[\w$-]+', result squished to 3-char runs"""
-     lst = "".join(_token_re.findall(string))
-     return tokenize_ngram(string, 3)
- 
 # add user-visible string as key and function as value - function's docstring
 # serves as help string when -H is used, so keep it brief!
--- 108,111 ----
***************
*** 124,128 ****
 "split": tokenize_split,
 "split_fold": tokenize_split_foldcase,
- "trigram": tokenize_trigram,
 "words": tokenize_words,
 "words_fold": tokenize_words_foldcase,
--- 119,122 ----
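
For readers comparing the two approaches named in the log message, here is a rough sketch of a str.split() tokenizer next to a trigram tokenizer like the one removed above. It is an illustration under assumptions, not the code in GBayes.py: the body of tokenize_ngram is not shown in this diff, so the version here (overlapping n-character runs) is a hypothetical stand-in, and _token_re is rebuilt from the pattern quoted in the removed docstring. Note also that the removed function computed lst but passed string to tokenize_ngram; the sketch passes the joined text instead, which is what the docstring describes.

```python
import re

# Assumption: pattern copied from the removed docstring; the real
# _token_re definition in GBayes.py is not part of this diff.
_token_re = re.compile(r"[\w$-]+")


def tokenize_split(string):
    """Whitespace tokenizer: a plain str.split()."""
    return string.split()


def tokenize_ngram(string, n):
    """Hypothetical stand-in: every overlapping run of n characters."""
    return [string[i:i + n] for i in range(len(string) - n + 1)]


def tokenize_trigram(string):
    """Join the regex matches, then cut the result into 3-char runs."""
    squished = "".join(_token_re.findall(string))
    return tokenize_ngram(squished, 3)


print(tokenize_split("Buy now, $ave big"))  # ['Buy', 'now,', '$ave', 'big']
print(tokenize_trigram("Buy now"))          # ['Buy', 'uyn', 'yno', 'now']
```

The split tokenizer keeps whole words, including attached punctuation such as "now,", while the trigram version discards word boundaries entirely.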
