Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

UTF-8, UTF-16 and UTF-32 support #344

Open
Labels
topic: utilitiescontainers, strings, files, OS/environment integration, unit testing, assertions, logging, ...
@awvwgk

Description

String support in stdlib is currently limited to ASCII, @wclodius2 brought up the issue of supporting UTF-8, UTF-16 and UTF-32 as well:

FWIW for a "string type" to supplant the intrinsic character I would make the internal representation an integer array so that it is straight forward to extend it to represent UCS/Unicode. The integer type could be either INT8 if a UTF-8 representation is desired, INT16 for a UTF-16 representation, or INT32 for UTF-32. I would expect the UTF-32 representation would be the most straight-forward to implement and best for East Asian ideographs, UTF-8 would be the most efficient for most European and Semetic languages, UTF-16 the most efficient for most of the rest of the world.

Originally posted by @wclodius2 in #334 (comment)

Implementing to_title will require more than ASCII. Allowing more than just ASCII will require access to the Unicode character database, https://unicode.org/ucd/. This database will also be required for to_upper, to_lower, and reverse if more than ASCII is involved. This database consists of several tens of megabytes of files, http://www.unicode.org/Public/UCD/latest/, and including it in the Standard Library will be controversial, but requiring users to download and install it on their own will also be controversial. FWIW I have a couple of modules to process the more important files in the database.

Originally posted by @wclodius2 in #335 (comment)

Metadata

Metadata

Assignees

No one assigned

    Labels

    topic: utilitiescontainers, strings, files, OS/environment integration, unit testing, assertions, logging, ...

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

      Relationships

      None yet

      Development

      No branches or pull requests

      Issue actions

        AltStyle によって変換されたページ (->オリジナル) /