ISO/IEC JTC1/SC22/WG14 N895

	Document: WG14 N895
WG14,
 Japan has the following three items for the agenda of the WG14 Kona meeting.
 Please discuss them during the meeting.
 1) Rationale of 64-bit integer type
 Japan needs to confirm whether the result of the discussion at the London
 meeting is completely reflected in the revised Rationale (especially the
 topic of elimination of long long overhead).
 (Japan would like to know whether or not the draft written by Mr. Gwyn
 after the London meeting (see attached email SC22WG14.7328) is going to
 be included in the revised Rationale.)
 2) Guarantee the behavior of the extended conversion utilities even after
 a change of LC_CTYPE under a specific condition
 Japan wants to propose a correction to the behavior of the extended
 multibyte and wide-character conversion utilities in 7.24.6.
 Please discuss this issue at the WG14 meeting.
 Current : The behavior of the extended multibyte and wide-character
 conversion utilities in 7.24.6 after a change of LC_CTYPE
 is undefined.
 Proposal: The behavior of the extended multibyte and wide-character
 conversion utilities in 7.24.6 after a change of LC_CTYPE
 is guaranteed.
 Details:
 The FDIS of C9X has the following specification:
> 7.24.6 Extended multibyte and wide-character conversion
> utilities
> [...]
> [#2] Most of the following functions -- those that are
> listed as ``restartable'', 7.24.6.3 and 7.24.6.4 -- take as
> a last argument a pointer to an object of type mbstate_t
> that is used to describe the current conversion state from a
> particular multibyte character sequence to a wide-character
> sequence (or the reverse) under the rules of a particular
> setting for the LC_CTYPE category of the current locale.
>
> [#3] The initial conversion state corresponds, for a
> conversion in either direction, to the beginning of a new
> multibyte character in the initial shift state. A zero-
> valued mbstate_t object is (at least) one way to describe an
> initial conversion state. A zero-valued mbstate_t object
> can be used to initiate conversion involving any multibyte
> character sequence, in any LC_CTYPE category setting. If an
> mbstate_t object has been altered by any of the functions
> described in this subclause, and is then used with a
> different multibyte character sequence, or in the other
> conversion direction, or with a different LC_CTYPE category
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> setting than on earlier function calls, the behavior is
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> undefined.290)
> ^^^^^^^^^
> 290) Thus, a particular mbstate_t object can be used, for
> example, with both the mbrtowc and mbsrtowcs functions as
> long as they are used to step sequentially through the
> same multibyte character string.
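 As an illustration of the usage that footnote 290 permits, the following
 is a minimal sketch in which a single zero-valued mbstate_t object is used
 with mbrtowc to step sequentially through one multibyte string under one
 LC_CTYPE setting. The helper name convert_all is hypothetical and is not
 part of the FDIS.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the FDIS): convert the multibyte string s
   into out[], one character at a time, using a single mbstate_t object for
   the whole string.  Returns the number of wide characters stored, or
   (size_t)-1 on an invalid or incomplete multibyte sequence. */
static size_t convert_all(const char *s, wchar_t *out, size_t max)
{
    mbstate_t st;
    size_t remaining = strlen(s);
    size_t count = 0;

    /* A zero-valued mbstate_t describes an initial conversion state. */
    memset(&st, 0, sizeof st);

    while (remaining > 0 && count < max) {
        size_t n = mbrtowc(&out[count], s, remaining, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;   /* encoding error or incomplete character */
        if (n == 0)
            break;               /* null wide character stored */
        s += n;                  /* same string, same direction, same locale */
        remaining -= n;
        count++;
    }
    return count;
}
```

 The caller is assumed to select the LC_CTYPE category with setlocale before
 calling convert_all and not to change it while st is in use; changing it in
 the middle of the loop is the case that the FDIS text leaves undefined.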
 The FDIS specification quoted above does not match Japan's original
 intention as the author of the Multibyte Support Extension (MSE),
 a.k.a. Amendment 1.
 Japan's original intention for the behavior of the extended
 multibyte and wide-character conversion utilities is described
 in the Rationale of Amendment 1:
 "Annex H9.2.3 Multiple encoding environment" in Amendment 1:
> [...]
> The encoding rule information is effectively a part of the
> conversion state. Thus, the encoding information should be stored
> with the hidden mbstate_t object with the FILE object. (Some
> implementation may even choose to store the encoding rule as
> part of the value of an fpos_t object.) The conversion state
> just created when a file is opened is said to have *unbound*
> state because it has no relations to any of the encoding
> rules. Just after the first wide-character input/output operation,
> the conversion state is *bound* to the encoding rule which
> correspond to LC_CTYPE category of the current locale. The
> following is a summary of the relations between various objects,
> the shift state, and the encoding rules.
>                        fpos_t       FILE
>  shift state         | included  | included  |
>  encoding rule       | maybe     | included  |
>  changing LC_CTYPE
>    (unbound)         | no effect | affected  |
>    (bound)           | no effect | no effect |
 "Annex H.13.1 Conversion state" in the Amendment 1:
> To handle multiple strings with a state-dependent encoding, the
> committee introduced the concept of conversion state. The
> conversion state determines the behavior of a conversion between
> multibyte and wide-character encodings. For conversion from
> multibyte to wide character, the conversion state stores
> information such as the position within the current multibyte
> character (as a sequence of characters or a wide-character
> accumulator). And for conversions in either direction, the
> conversion state stores the current shift state (if any) and
> possibly the encoding rule.
> [...]
 Please consider the following program:
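 setlocale(LC_CTYPE, A);        /* A: locale using encoding rule A */
 f1 = fopen("...", "r");
 wc = fgetwc(f1);               /* first wide-character input on f1 */
 setlocale(LC_CTYPE, B);        /* B: locale using encoding rule B */
 f2 = fopen("...", "w");
 while (wc != WEOF) {
     fputwc(wc, f2);
     wc = fgetwc(f1);           /* f1 is read after LC_CTYPE has changed */
 }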
 According to the current C9X FDIS, the behavior of this program
 is undefined. However, from the viewpoint of the design goal
 of the MSE as described in the Rationale of Amendment 1, it should be
 well defined. Namely, a conversion state in the bound state
 should not be affected by changing LC_CTYPE, so that multiple
 strings with state-dependent encodings can be handled in a multiple
 encoding environment.
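 The intended bound/unbound behavior can be modeled by the following
 minimal sketch. It is only a model under stated assumptions: struct
 conv_state, model_setlocale, model_open and model_wide_io are hypothetical
 names that stand in for the hidden conversion state of a FILE object, for
 setlocale, for fopen and for any wide-character input/output operation.
 None of them is a real library interface.

```c
#include <stddef.h>

/* Hypothetical model, not a real library API.
   current_ctype stands in for the LC_CTYPE setting of the current locale. */
static const char *current_ctype = "C";

/* Stands in for the hidden conversion state stored with a FILE object. */
struct conv_state {
    const char *encoding;   /* NULL while the state is unbound */
    int shift_state;        /* 0 means the initial shift state */
};

/* Models setlocale(LC_CTYPE, name). */
static void model_setlocale(const char *name)
{
    current_ctype = name;
}

/* Models opening a file: the new conversion state is unbound. */
static void model_open(struct conv_state *cs)
{
    cs->encoding = NULL;
    cs->shift_state = 0;
}

/* Models any wide-character I/O operation on the stream.  The first call
   binds the state to the encoding rule of the current LC_CTYPE setting;
   later calls keep that rule even if LC_CTYPE has changed since.
   Returns the encoding rule that the operation uses. */
static const char *model_wide_io(struct conv_state *cs)
{
    if (cs->encoding == NULL)
        cs->encoding = current_ctype;   /* unbound -> bound */
    return cs->encoding;
}
```

 In this model, a stream opened and first read under encoding rule A
 continues to use rule A after the locale is switched to B, while a stream
 opened after the switch is bound to rule B on its first operation. This
 corresponds to the "no effect" entries for the bound state in the table
 quoted from Annex H9.2.3.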
 Please discuss and consider this issue at the WG14 meeting.
 If more detail is needed, please send email to c.wg@nec.co.jp.
 Further technical discussion is welcome.
 3) WG14 Tokyo meeting, April 2000
 ITSCJ (Information Technology Standards Commission of Japan)
 http://www.itscj.ipsj.or.jp/eg/index.html
 will host the WG14 meeting, which will be held on April 10 to 14, 2000.
 The meeting place is in the KIKAI-SHINKO-KAIKAN Building:
 3-5-8, Shiba-Koen, Minato-Ku, Tokyo 105-0011 Japan
 One room (Room #68, capacity 18 people, on the 6th floor) for the WG14
 meeting is already booked. (In Japan, the 1st floor is the ground floor.)
 Traffic information, hotel and other accommodation information, and so on
 will be sent to WG14 by ITSCJ or Makoto Noda via email soon.
 Thank you,
 Makoto Noda, Chair of ITSCJ/SC22/C WG
 --------------------------
> Date: Fri, 16 Jul 99 10:39:59 EDT
> From: "Douglas A. Gwyn (IST)" <gwyn@arl.mil>
> X-Sequence: SC22WG14@dkuug.dk 7328
> X-Errors-To: SC22WG14-request@dkuug.dk
> To: Randy Meyers <rmeyers@ix.netcom.com>
> Cc: sc22wg14 <sc22wg14@dkuug.dk>
> Subject: (c.wg 8975) (SC22WG14.7328) (SC22WG14.7294) Rationale for elimination of long long overhead
>
> As a specific illustration of an implementation technique: the compiler
> can emit an external reference to a symbol ".i64used" (for example) if
> and only if the source code makes any use of a 64-bit integer type, and
> the C run-time object library can contain two versions of the _doprnt
> module (which performs the actual work for the *printf family). The first
> version of printf would define the symbol ".i64used" and contain support
> for formatting 64-bit integers, while the second version would do neither.
> If the two versions of _doprnt and the *printf modules are ordered
> properly in the library, then the linker would automatically include the
> 64-bit supporting version to satisfy the reference to ".i64used", before
> seeing any reference to _doprnt from the *printf modules, and then the
> *printf modules would use the already-defined 64-bit version of _doprnt.
> If there were no reference to ".i64used" (because the program made no
> use of any 64-bit integer type), then the linker would skip the first
> _doprnt module, would include *printf modules, then would include the
> non-64-bit _doprnt module in order to satisfy the reference to _doprnt
> from the *printf modules.
>