[Python-checkins] python/nondist/sandbox/spambayes timtest.py,1.16,1.17

tim_one@users.sourceforge.net
04 Sep 2002 21:32:24 -0700


Update of /cvsroot/python/python/nondist/sandbox/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv1336
Modified Files:
	timtest.py 
Log Message:
Added note about word length.
Index: timtest.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/spambayes/timtest.py,v
retrieving revision 1.16
retrieving revision 1.17
diff -C2 -d -r1.16 -r1.17
*** timtest.py	5 Sep 2002 03:48:28 -0000	1.16
--- timtest.py	5 Sep 2002 04:32:22 -0000	1.17
***************
*** 436,443 ****
 n = _len(word)
 
 if 3 <= n <= 12:
     yield word
 
! elif n > 2:
     # A long word.
 
--- 436,450 ----
 n = _len(word)
 
+ # XXX How big should "a word" be?
+ # XXX I expect 12 is fine -- a test run boosting to 13 had no effect
+ # XXX on f-p rate, and did a little better or worse than 12 across
+ # XXX runs -- overall, no significant difference. It's only "common
+ # XXX sense" so far driving the exclusion of lengths 1 and 2.
+ 
+ # Make sure this range matches in tokenize().
 if 3 <= n <= 12:
     yield word
 
! elif n >= 3:
     # A long word.
 
***************
*** 555,558 ****
--- 562,566 ----
 for w in text.split():
     n = len(w)
+     # Make sure this range matches in tokenize_word().
     if 3 <= n <= 12:
         yield w
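
The two "Make sure this range matches" comments ask that tokenize() and tokenize_word() keep the same 3..12 bounds by hand. A minimal sketch of one way to enforce that, assuming shared constants; MIN_WORD_LEN, MAX_WORD_LEN, the simplified function bodies, and the "long:<n>" token are hypothetical illustrations and are not part of timtest.py:

```python
# Hypothetical names: neither constant appears in timtest.py.
MIN_WORD_LEN = 3
MAX_WORD_LEN = 12


def tokenize_word(word):
    """Yield the word itself if its length is in range.

    Simplified stand-in for the function touched by the first hunk.
    """
    n = len(word)
    if MIN_WORD_LEN <= n <= MAX_WORD_LEN:
        yield word
    elif n >= MIN_WORD_LEN:
        # A long word. Because the first branch already covers 3..12,
        # this branch is reached only when n > MAX_WORD_LEN, so the
        # change from "n > 2" to "n >= 3" does not alter behavior.
        # The real long-word handling is not shown in the diff; this
        # placeholder token is an assumption.
        yield "long:%d" % n


def tokenize(text):
    """Yield whitespace-separated words whose length is in range.

    Simplified stand-in for the loop touched by the second hunk.
    """
    for w in text.split():
        n = len(w)
        if MIN_WORD_LEN <= n <= MAX_WORD_LEN:
            yield w


if __name__ == "__main__":
    print(list(tokenize("a an the spam " + "x" * 13)))
    print(list(tokenize_word("x" * 13)))
```

With both functions reading the same two constants, an experiment such as the run at 13 described in the XXX comment would need a change in one place only.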
