[Python-checkins] CVS: python/dist/src/Misc unicode.txt,3.5,3.6

Thu, 13 Apr 2000 10:12:41 -0400

Update of /projects/cvsroot/python/dist/src/Misc
In directory seahag.cnri.reston.va.us:/home/fdrake/projects/python/Misc
Modified Files:
	unicode.txt 
Log Message:
M.-A. Lemburg <mal@lemburg.com>:
Updated to version 1.4.
Index: unicode.txt
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Misc/unicode.txt,v
retrieving revision 3.5
retrieving revision 3.6
diff -C2 -r3.5 -r3.6
*** unicode.txt	2000/04/10 19:45:09	3.5
--- unicode.txt	2000/04/13 14:12:38	3.6
***************
*** 1,4 ****
 =============================================================================
! Python Unicode Integration Proposal Version: 1.3
 -----------------------------------------------------------------------------

--- 1,4 ----
 =============================================================================
! Python Unicode Integration Proposal Version: 1.4
 -----------------------------------------------------------------------------

***************
*** 163,166 ****
--- 163,177 ----
 as their UTF-8 equivalent strings.

+ When compared using cmp() (or PyObject_Compare()) the implementation
+ should mask TypeErrors raised during the conversion to remain in sync
+ with the string behavior. All other errors, such as ValueErrors raised
+ during coercion of strings to Unicode, should not be masked but passed
+ through to the user.
+ 
+ In containment tests ('a' in u'abc' and u'a' in 'abc') both sides
+ should be coerced to Unicode before applying the test. Errors occurring
+ during coercion (e.g. None in u'abc') should not be masked.
+ 
+ 
 Coercion:
 ---------
***************
*** 381,384 ****
--- 392,402 ----
          self.stream.write(data)

+     def writelines(self, list):
+ 
+         """ Writes the concatenated list of strings to the stream
+             using .write().
+         """
+         self.write(''.join(list))
+ 
      def reset(self):

***************
*** 464,467 ****
--- 482,526 ----
          return object

+     def readline(self, size=None):
+ 
+         """ Read one line from the input stream and return the
+             decoded data.
+ 
+             Note: Unlike the .readlines() method, this method inherits
+             the line breaking knowledge from the underlying stream's
+             .readline() method -- there is currently no support for
+             line breaking using the codec decoder due to lack of line
+             buffering. Subclasses should however, if possible, try to
+             implement this method using their own knowledge of line
+             breaking.
+ 
+             size, if given, is passed as size argument to the stream's
+             .readline() method.
+ 
+         """
+         if size is None:
+             line = self.stream.readline()
+         else:
+             line = self.stream.readline(size)
+         return self.decode(line)[0]
+ 
+     def readlines(self, sizehint=None):
+ 
+         """ Read all lines available on the input stream
+             and return them as a list of lines.
+ 
+             Line breaks are implemented using the codec's decoder
+             method and are included in the list entries.
+ 
+             sizehint, if given, is passed as size argument to the
+             stream's .read() method.
+ 
+         """
+         if sizehint is None:
+             data = self.stream.read()
+         else:
+             data = self.stream.read(sizehint)
+         return self.decode(data)[0].splitlines(1)
+ 
      def reset(self):

***************
*** 483,489 ****
 return getattr(self.stream,name)

- XXX What about .readline(), .readlines() ? These could be implemented
- using .read() as generic functions instead of requiring their
- implementation by all codecs. Also see Line Breaks.

 Stream codec implementors are free to combine the StreamWriter and
--- 542,545 ----
***************
*** 693,699 ****
 effect:

! '%s': '%s' does str(u) for Unicode objects embedded
! in Python strings, so the output will be
! u.encode(<default encoding>)

 In case the format string is an Unicode object, all parameters are coerced
--- 749,756 ----
 effect:

! '%s': For Unicode objects this will cause coercion of the
! 			whole format string to Unicode. Note that
! 			you should use a Unicode format string to start
! 			with for performance reasons.

 In case the format string is an Unicode object, all parameters are coerced
***************
*** 923,926 ****
--- 980,986 ----
 	http://www-4.ibm.com/software/developer/library/internationalization-support.html

+ IANA Character Set Names:
+ 	ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
+ 
 Encodings:

***************
*** 945,948 ****
--- 1005,1014 ----
 History of this Proposal:
 -------------------------
+ 1.4: Added note about mixed type comparisons and containment tests.
+      Changed treatment of Unicode objects in format strings (if used
+      with '%s' % u they will now cause the format string to be
+      coerced to Unicode, thus producing a Unicode object on return).
+      Added link to IANA charset names (thanks to Lars Marius Garshol).
+      Added new codec methods .readline(), .readlines() and .writelines().
 1.3: Added new "es" and "es#" parser markers
 1.2: Removed POD about codecs.open()
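
Editorial illustration (not part of the check-in): the three stream
methods added in version 1.4 can be tried out with a minimal sketch
along the following lines. It is written for current Python 3 rather
than the interpreter this proposal targeted. The class names
SketchStreamWriter and SketchStreamReader, the encoding parameter, and
the use of io.BytesIO as the underlying stream are all assumptions made
for the example; they are not taken from the proposal. The sketch also
assumes that sizehint defaults to None, so that calling readlines()
without arguments reads the whole stream.

```python
import io


class SketchStreamWriter:
    """Hypothetical stand-in for the proposal's StreamWriter."""

    def __init__(self, stream, encoding="utf-8", errors="strict"):
        self.stream = stream
        self.encoding = encoding  # assumption: proposal codecs fix this per class
        self.errors = errors

    def encode(self, text):
        # Return (encoded data, number of characters consumed).
        return text.encode(self.encoding, self.errors), len(text)

    def write(self, text):
        data, _consumed = self.encode(text)
        self.stream.write(data)

    def writelines(self, lines):
        # Concatenate first, then delegate to .write().
        self.write("".join(lines))


class SketchStreamReader:
    """Hypothetical stand-in for the proposal's StreamReader."""

    def __init__(self, stream, encoding="utf-8", errors="strict"):
        self.stream = stream
        self.encoding = encoding
        self.errors = errors

    def decode(self, data):
        # Return (decoded text, number of bytes consumed).
        return data.decode(self.encoding, self.errors), len(data)

    def readline(self, size=None):
        # Line breaking is left to the underlying byte stream.
        if size is None:
            line = self.stream.readline()
        else:
            line = self.stream.readline(size)
        return self.decode(line)[0]

    def readlines(self, sizehint=None):
        # Decode everything read, then split while keeping line ends.
        if sizehint is None:
            data = self.stream.read()
        else:
            data = self.stream.read(sizehint)
        return self.decode(data)[0].splitlines(True)


# Round trip through an in-memory byte stream.
buffer = io.BytesIO()
SketchStreamWriter(buffer).writelines(["caf\u00e9\n", "na\u00efve\n"])
reader = SketchStreamReader(io.BytesIO(buffer.getvalue()))
first_line = reader.readline()
remaining = reader.readlines()
```

Because readline() relies on the byte stream to find the line end, the
sketch only behaves sensibly for encodings in which a newline is a
single "\n" byte, which matches the limitation noted in the docstring
in the diff.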