[Python-checkins] r54786 - in python/trunk: Lib/encodings/utf_8_sig.py Lib/test/test_codecs.py Misc/NEWS

walter.doerwald python-checkins at python.org
Thu Apr 12 12:35:03 CEST 2007


Author: walter.doerwald
Date: Thu Apr 12 12:35:00 2007
New Revision: 54786
Modified:
 python/trunk/Lib/encodings/utf_8_sig.py
 python/trunk/Lib/test/test_codecs.py
 python/trunk/Misc/NEWS
Log:
Fix utf-8-sig incremental decoder, which didn't recognise a BOM when the
first chunk fed to the decoder started with a BOM, but was longer than 3 bytes.
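
The behaviour described in the log can be checked with a short script. This is
only a sketch, not part of the commit: it assumes a Python 3 interpreter (the
commit itself targets Python 2 trunk), and the helper name decode_in_chunks is
hypothetical. It feeds the same BOM-prefixed input to a fresh incremental
decoder in chunks of different sizes; with the fix in place, the BOM should be
stripped regardless of how the input is split.

```python
import codecs


def decode_in_chunks(data, chunk_size):
    """Feed data to a fresh utf-8-sig incremental decoder, chunk_size bytes at a time.

    Hypothetical helper for illustration only; it is not part of the commit.
    """
    decoder = codecs.getincrementaldecoder("utf-8-sig")()
    pieces = []
    for start in range(0, len(data), chunk_size):
        pieces.append(decoder.decode(data[start:start + chunk_size]))
    # Flush whatever the decoder may still be buffering.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


# "spam" encoded with utf-8-sig is the 3-byte BOM followed by b"spam".
encoded = "spam".encode("utf-8-sig")

# Chunk sizes below, at, and above the BOM length, plus the whole input at once
# (the case the log message describes: first chunk starts with a BOM and is
# longer than 3 bytes).
results = {size: decode_in_chunks(encoded, size) for size in (1, 2, 3, 4, len(encoded))}

for size, text in sorted(results.items()):
    print(size, repr(text))
```

The last chunk size corresponds to the new test_bom test, which passes the
whole encoded string to the decoder in a single call.
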
Modified: python/trunk/Lib/encodings/utf_8_sig.py
==============================================================================
--- python/trunk/Lib/encodings/utf_8_sig.py	(original)
+++ python/trunk/Lib/encodings/utf_8_sig.py	Thu Apr 12 12:35:00 2007
@@ -44,14 +44,19 @@
         self.first = True
 
     def _buffer_decode(self, input, errors, final):
-        if self.first and codecs.BOM_UTF8.startswith(input): # might be a BOM
+        if self.first:
             if len(input) < 3:
-                # not enough data to decide if this really is a BOM
-                # => try again on the next call
-                return (u"", 0)
-            (output, consumed) = codecs.utf_8_decode(input[3:], errors, final)
-            self.first = False
-            return (output, consumed+3)
+                if codecs.BOM_UTF8.startswith(input):
+                    # not enough data to decide if this really is a BOM
+                    # => try again on the next call
+                    return (u"", 0)
+                else:
+                    self.first = None
+            else:
+                self.first = None
+                if input[:3] == codecs.BOM_UTF8:
+                    (output, consumed) = codecs.utf_8_decode(input[3:], errors, final)
+                    return (output, consumed+3)
         return codecs.utf_8_decode(input, errors, final)
 
     def reset(self):
Modified: python/trunk/Lib/test/test_codecs.py
==============================================================================
--- python/trunk/Lib/test/test_codecs.py	(original)
+++ python/trunk/Lib/test/test_codecs.py	Thu Apr 12 12:35:00 2007
@@ -429,6 +429,11 @@
         # SF bug #1601501: check that the codec works with a buffer
         unicode("\xef\xbb\xbf", "utf-8-sig")
 
+    def test_bom(self):
+        d = codecs.getincrementaldecoder("utf-8-sig")()
+        s = u"spam"
+        self.assertEqual(d.decode(s.encode("utf-8-sig")), s)
+
 class EscapeDecodeTest(unittest.TestCase):
     def test_empty(self):
         self.assertEquals(codecs.escape_decode(""), ("", 0))
Modified: python/trunk/Misc/NEWS
==============================================================================
--- python/trunk/Misc/NEWS	(original)
+++ python/trunk/Misc/NEWS	Thu Apr 12 12:35:00 2007
@@ -591,6 +591,8 @@
 
 - idle: Honor the "Cancel" action in the save dialog (Debian bug #299092).
 
+- Fix utf-8-sig incremental decoder, which didn't recognise a BOM when the
+  first chunk fed to the decoder started with a BOM, but was longer than 3 bytes.
 
 Extension Modules
 -----------------

