Unicode support

int console_read_unicode(s32 *code)

read Unicode code point from console

Parameters

s32 *code

pointer to store Unicode code point

Return

0 = success

s32 utf8_get(const char **src)

get next UTF-8 code point from buffer

Parameters

const char **src

pointer to current byte, updated to point to next byte

Return

code point, or 0 for end of string, or -1 if no legal code point is found. In case of an error src points to the incorrect byte.
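The pointer-advancing contract described above can be illustrated with a small standalone decoder. This is only a sketch under stated assumptions, not U-Boot's implementation: the name sketch_utf8_get and the local s32 typedef are hypothetical, the choice not to advance src at the end of the string is an assumption, and the sketch does not reject overlong encodings or surrogate code points.

```c
#include <stdint.h>

typedef int32_t s32; /* stand-in for U-Boot's s32 */

/* Hypothetical helper mirroring the documented contract of utf8_get() */
static s32 sketch_utf8_get(const char **src)
{
	const unsigned char *p;
	s32 code;
	int extra, i;

	if (!src || !*src)
		return -1;
	p = (const unsigned char *)*src;
	if (!*p)
		return 0;		/* end of string, *src is not advanced */
	if (*p < 0x80) {
		code = *p;
		extra = 0;
	} else if ((*p & 0xe0) == 0xc0) {
		code = *p & 0x1f;
		extra = 1;
	} else if ((*p & 0xf0) == 0xe0) {
		code = *p & 0x0f;
		extra = 2;
	} else if ((*p & 0xf8) == 0xf0) {
		code = *p & 0x07;
		extra = 3;
	} else {
		return -1;		/* *src stays on the offending byte */
	}
	for (i = 1; i <= extra; ++i) {
		if ((p[i] & 0xc0) != 0x80) {
			/* point the caller at the byte that broke the sequence */
			*src = (const char *)&p[i];
			return -1;
		}
		code = (code << 6) | (p[i] & 0x3f);
	}
	*src = (const char *)&p[extra + 1];
	return code;
}
```

A caller would typically loop until the helper returns 0 and treat -1 as a signal to skip or replace the byte that src now points to.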

int utf8_put(s32 code, char **dst)

write UTF-8 code point to buffer

Parameters

s32 code

code point

char **dst

pointer to destination buffer, updated to next position

Return

-1 if the input parameters are invalid
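As an illustration of writing a code point and advancing the destination pointer, here is a minimal standalone encoder. It is a sketch, not U-Boot's code: sketch_utf8_put and the s32 typedef are hypothetical, and returning 0 on success as well as rejecting negative values and surrogates are assumptions of this sketch.

```c
#include <stdint.h>

typedef int32_t s32; /* stand-in for U-Boot's s32 */

/* Hypothetical helper mirroring the documented contract of utf8_put() */
static int sketch_utf8_put(s32 code, char **dst)
{
	char *p;

	if (!dst || !*dst)
		return -1;
	if (code < 0 || code > 0x10ffff || (code >= 0xd800 && code <= 0xdfff))
		return -1;		/* not an encodable code point */
	p = *dst;
	if (code < 0x80) {
		*p++ = (char)code;
	} else if (code < 0x800) {
		*p++ = (char)(0xc0 | (code >> 6));
		*p++ = (char)(0x80 | (code & 0x3f));
	} else if (code < 0x10000) {
		*p++ = (char)(0xe0 | (code >> 12));
		*p++ = (char)(0x80 | ((code >> 6) & 0x3f));
		*p++ = (char)(0x80 | (code & 0x3f));
	} else {
		*p++ = (char)(0xf0 | (code >> 18));
		*p++ = (char)(0x80 | ((code >> 12) & 0x3f));
		*p++ = (char)(0x80 | ((code >> 6) & 0x3f));
		*p++ = (char)(0x80 | (code & 0x3f));
	}
	*dst = p;			/* advance only after a successful write */
	return 0;
}
```

The sketch writes at most four bytes per call and leaves dst untouched when it returns -1.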

size_t utf8_utf16_strnlen(const char *src, size_t count)

length of a truncated utf-8 string after conversion to utf-16

Parameters

const char *src

utf-8 string

size_t count

maximum number of code points to convert

Return

length in u16 after conversion to utf-16 without the trailing 0. If an invalid UTF-8 sequence is hit one u16 will be reserved for a replacement character.
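The length calculation can be sketched as follows, assuming well-formed UTF-8 input: a code point encoded in four UTF-8 bytes needs a surrogate pair (two u16), every other code point needs one u16. The name sketch_utf8_utf16_strnlen is hypothetical, and the replacement-character handling for invalid sequences described above is deliberately not modelled.

```c
#include <stddef.h>

/* Hypothetical helper; assumes src holds well-formed UTF-8 */
static size_t sketch_utf8_utf16_strnlen(const char *src, size_t count)
{
	size_t len = 0;

	while (count && *src) {
		unsigned char c = (unsigned char)*src++;

		if ((c & 0xc0) == 0x80)
			continue;	/* continuation byte, already counted */
		/* a 4-byte lead byte encodes a code point above U+FFFF */
		len += ((c & 0xf8) == 0xf0) ? 2 : 1;
		--count;		/* one more code point consumed */
	}
	return len;
}
```

A destination buffer sized from this result still needs one extra u16 for the trailing 0.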

utf8_utf16_strlen

utf8_utf16_strlen (a)

length of a utf-8 string after conversion to utf-16

Parameters

a

utf-8 string

Return

length in u16 after conversion to utf-16 without the trailing 0. If an invalid UTF-8 sequence is hit one u16 will be reserved for a replacement character.

int utf8_utf16_strncpy(u16 **dst, const char *src, size_t count)

copy utf-8 string to utf-16 string

Parameters

u16 **dst

destination buffer

const char *src

source buffer

size_t count

maximum number of code points to copy

Return

-1 if the input parameters are invalid

utf8_utf16_strcpy

utf8_utf16_strcpy (d, s)

copy utf-8 string to utf-16 string

Parameters

d

destination buffer

s

source buffer

Return

-1 if the input parameters are invalid

s32 utf16_get(const u16 **src)

get next UTF-16 code point from buffer

Parameters

const u16 **src

pointer to current word, updated to point to next word

Return

code point, or 0 for end of string, or -1 if no legal code point is found. In case of an error src points to the incorrect word.
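Surrogate pair handling is the part of this contract that is easiest to get wrong, so a standalone sketch may help. It is not U-Boot's code: sketch_utf16_get and the typedefs are hypothetical, and which word src points to after an error is this sketch's reading of "the incorrect word".

```c
#include <stdint.h>

typedef int32_t s32;  /* stand-in for U-Boot's s32 */
typedef uint16_t u16; /* stand-in for U-Boot's u16 */

/* Hypothetical helper mirroring the documented contract of utf16_get() */
static s32 sketch_utf16_get(const u16 **src)
{
	const u16 *p;
	s32 code, low;

	if (!src || !*src)
		return -1;
	p = *src;
	code = p[0];
	if (!code)
		return 0;		/* end of string, *src is not advanced */
	if (code >= 0xdc00 && code <= 0xdfff)
		return -1;		/* lone low surrogate, *src stays on it */
	if (code >= 0xd800 && code <= 0xdbff) {
		low = p[1];
		if (low < 0xdc00 || low > 0xdfff) {
			*src = &p[1];	/* the word that is not a low surrogate */
			return -1;
		}
		code = 0x10000 + ((code - 0xd800) << 10) + (low - 0xdc00);
		*src = &p[2];
		return code;
	}
	*src = &p[1];
	return code;
}
```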

int utf16_put(s32 code, u16 **dst)

write UTF-16 code point to buffer

Parameters

s32 code

code point

u16 **dst

pointer to destination buffer, updated to next position

Return

-1 if the input parameters are invalid
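The encoding direction can be sketched in the same way. The name sketch_utf16_put and the typedefs are hypothetical, and the success value 0 as well as the rejection of negative values and surrogate code points are assumptions of this sketch rather than documented behaviour.

```c
#include <stdint.h>

typedef int32_t s32;  /* stand-in for U-Boot's s32 */
typedef uint16_t u16; /* stand-in for U-Boot's u16 */

/* Hypothetical helper mirroring the documented contract of utf16_put() */
static int sketch_utf16_put(s32 code, u16 **dst)
{
	if (!dst || !*dst)
		return -1;
	if (code < 0 || code > 0x10ffff || (code >= 0xd800 && code <= 0xdfff))
		return -1;		/* not an encodable code point */
	if (code < 0x10000) {
		*(*dst)++ = (u16)code;
	} else {
		/* split the 20 bits above 0x10000 into a surrogate pair */
		code -= 0x10000;
		*(*dst)++ = (u16)(0xd800 | (code >> 10));
		*(*dst)++ = (u16)(0xdc00 | (code & 0x3ff));
	}
	return 0;
}
```

The destination therefore needs room for up to two u16 per code point.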

size_t utf16_strnlen(const u16 *src, size_t count)

length of a truncated utf-16 string

Parameters

const u16 *src

utf-16 string

size_t count

maximum number of code points to convert

Return

length in code points. If an invalid UTF-16 sequence is hit one position will be reserved for a replacement character.

size_t utf16_utf8_strnlen(const u16 *src, size_t count)

length of a truncated utf-16 string after conversion to utf-8

Parameters

const u16 *src

utf-16 string

size_t count

maximum number of code points to convert

Return

length in bytes after conversion to utf-8 without the trailing 0. If an invalid UTF-16 sequence is hit one byte will be reserved for a replacement character.

utf16_utf8_strlen

utf16_utf8_strlen (a)

length of a utf-16 string after conversion to utf-8

Parameters

a

utf-16 string

Return

length in bytes after conversion to utf-8 without the trailing 0. If an invalid UTF-16 sequence is hit one byte will be reserved for a replacement character.

int utf16_utf8_strncpy(char **dst, const u16 *src, size_t count)

copy utf-16 string to utf-8 string

Parameters

char **dst

destination buffer

const u16 *src

source buffer

size_t count

maximum number of code points to copy

Return

-1 if the input parameters are invalid

utf16_utf8_strcpy

utf16_utf8_strcpy (d, s)

copy utf-16 string to utf-8 string

Parameters

d

destination buffer

s

source buffer

Return

-1 if the input parameters are invalid

s32 utf_to_lower(const s32 code)

convert a Unicode letter to lower case

Parameters

const s32 code

letter to convert

Return

lower case letter or unchanged letter

s32 utf_to_upper(const s32 code)

convert a Unicode letter to upper case

Parameters

const s32 code

letter to convert

Return

upper case letter or unchanged letter

int u16_strcasecmp(const u16 *s1, const u16 *s2)

compare two u16 strings case insensitively

Parameters

const u16 *s1

first string to compare

const u16 *s2

second string to compare

Return

0 if s1 and s2 are the same when compared case insensitively

< 0 if the first different u16 in s1 is less than the corresponding u16 in s2

> 0 if the first different u16 in s1 is greater than the corresponding u16 in s2

int u16_strncmp(const u16 *s1, const u16 *s2, size_t n)

compare two u16 strings

Parameters

const u16 *s1

first string to compare

const u16 *s2

second string to compare

size_t n

maximum number of u16 to compare

Return

0 if the first n u16 are the same in s1 and s2

< 0 if the first different u16 in s1 is less than the corresponding u16 in s2

> 0 if the first different u16 in s1 is greater than the corresponding u16 in s2
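The comparison rule can be sketched as a word-by-word loop that stops at the first difference, at the terminating null word, or after n words. The names sketch_u16_strncmp and sketch_u16_strcmp are hypothetical; defining the unbounded variant as a macro that passes SIZE_MAX is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16; /* stand-in for U-Boot's u16 */

/* Hypothetical helper mirroring the documented return values */
static int sketch_u16_strncmp(const u16 *s1, const u16 *s2, size_t n)
{
	int ret = 0;

	for (; n; --n, ++s1, ++s2) {
		ret = *s1 - *s2;
		if (ret || !*s1)
			break;	/* first difference or end of both strings */
	}
	return ret;
}

/* Assumed shape of an unbounded wrapper; not taken from the source */
#define sketch_u16_strcmp(s1, s2) sketch_u16_strncmp((s1), (s2), SIZE_MAX)
```

Because the comparison works on raw u16 values, the ordering is by code unit and not by code point or locale.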

u16_strcmp

u16_strcmp (s1, s2)

compare two u16 strings

Parameters

s1

first string to compare

s2

second string to compare

Return

0 if all u16 are the same in s1 and s2

< 0 if the first different u16 in s1 is less than the corresponding u16 in s2

> 0 if the first different u16 in s1 is greater than the corresponding u16 in s2

size_t u16_strsize(const void *in)

count size of u16 string in bytes including the null character

Parameters

const void *in

null terminated u16 string

Description

Counts the number of bytes occupied by a u16 string

Return

bytes in a u16 string

size_t u16_strnlen(const u16 *in, size_t count)

count non-zero words

Parameters

const u16 *in

null terminated u16 string

size_t count

maximum number of words to count

Description

This function matches wcsnlen_s() if the -fshort-wchar compiler flag is set. In the EFI context we explicitly need a function handling u16 strings.

Return

number of non-zero words.

This is not the number of utf-16 letters!
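The difference between words and letters shows up as soon as a string contains a surrogate pair. The following sketch uses a hypothetical sketch_u16_strnlen and a hypothetical sample string; it is not U-Boot's implementation.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16; /* stand-in for U-Boot's u16 */

/* Hypothetical helper: count non-zero words, looking at no more than count */
static size_t sketch_u16_strnlen(const u16 *in, size_t count)
{
	size_t i;

	for (i = 0; i < count && in[i]; ++i)
		;
	return i;
}

/* U+1F600 needs a surrogate pair, so one letter occupies two words */
static const u16 sketch_smiley[] = { 0xd83d, 0xde00, 0x0000 };
```

For sketch_smiley the helper reports two words although the string holds a single code point.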

size_t u16_strlen(const void *in)

count non-zero words

Parameters

const void *in

null terminated u16 string

Description

This function matches wcslen() if the -fshort-wchar compiler flag is set. In the EFI context we explicitly need a function handling u16 strings.

Return

number of non-zero words.

This is not the number of utf-16 letters!

u16 *u16_strcpy(u16 *dest, const u16 *src)

copy u16 string

Parameters

u16 *dest

destination buffer

const u16 *src

source buffer (null terminated)

Description

Copy u16 string pointed to by src, including terminating null word, to the buffer pointed to by dest.

Return

‘dest’ address

u16 *u16_strdup(const void *src)

duplicate u16 string

Parameters

const void *src

source buffer (null terminated)

Description

Copy u16 string pointed to by src, including terminating null word, to a newly allocated buffer.

Return

allocated new buffer on success, NULL on failure

size_t u16_strlcat(u16 *dest, const u16 *src, size_t count)

Append a length-limited, NUL-terminated string to another

Parameters

u16 *dest

zero terminated u16 destination string

const u16 *src

zero terminated u16 source string

size_t count

size of buffer in u16 words including trailing 0x0000

Description

Append the source string src to the destination string dest, overwriting the null word at the end of dest and adding a terminating null word.

Return

required size including trailing 0x0000 in u16 words

If return value >= count, truncation occurred.
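A bounded append along these lines could look like the sketch below. The name sketch_u16_strlcat is hypothetical and the code is an illustration of the described behaviour, not U-Boot's implementation; in particular, returning early without writing when dest is not terminated within count words is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t u16; /* stand-in for U-Boot's u16 */

/* Hypothetical helper: append src to dest within a buffer of count words */
static size_t sketch_u16_strlcat(u16 *dest, const u16 *src, size_t count)
{
	size_t destlen = 0, srclen = 0, ret, copy;

	while (destlen < count && dest[destlen])
		++destlen;
	while (src[srclen])
		++srclen;
	/* words needed for the full result including the trailing 0x0000 */
	ret = destlen + srclen + 1;
	if (destlen >= count)
		return ret;	/* dest is not terminated within the buffer */
	copy = srclen;
	if (ret > count)
		copy = count - destlen - 1;	/* keep room for the terminator */
	memcpy(&dest[destlen], src, copy * sizeof(u16));
	dest[destlen + copy] = 0x0000;
	return ret;
}
```

The return value is the size the full concatenation would need, so the caller can compare it with count to detect truncation.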

uint8_t *utf16_to_utf8(uint8_t *dest, const uint16_t *src, size_t size)

Convert a utf16 string to utf8

Parameters

uint8_t *dest

the destination buffer to write the utf8 characters

const uint16_t *src

the source utf16 string

size_t size

the number of utf16 characters to convert

Description

Converts ‘size’ characters of the utf16 string ‘src’ to utf8 written to the ‘dest’ buffer.

NOTE that a single utf16 character can generate up to 3 utf8 characters. See MAX_UTF8_PER_UTF16.

Return

the pointer to the first unwritten byte in ‘dest’
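The expansion factor in the note above translates directly into a worst-case buffer size. The sketch below works through that arithmetic; SKETCH_MAX_UTF8_PER_UTF16 and sketch_utf8_buf_size are hypothetical names, and reserving one extra byte for a terminator that the caller writes is an assumption.

```c
#include <stddef.h>

/* Local stand-in for the factor of 3 mentioned in the note above */
#define SKETCH_MAX_UTF8_PER_UTF16 3

/*
 * Hypothetical helper: worst-case bytes needed to convert utf16_len
 * utf16 characters, plus one byte for a terminating 0 written by the
 * caller at the returned "first unwritten byte" position.
 */
static size_t sketch_utf8_buf_size(size_t utf16_len)
{
	return utf16_len * SKETCH_MAX_UTF8_PER_UTF16 + 1;
}
```

For example, four utf16 characters need at most 4 * 3 + 1 = 13 bytes under these assumptions.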

int utf_to_cp(s32 *c, const u16 *codepage)

translate Unicode code point to 8bit codepage

Parameters

s32 *c

pointer to Unicode code point to be translated

const u16 *codepage

Unicode to codepage translation table

Description

Codepoints that do not exist in the codepage are rendered as question mark.

Return

0 on success, -ENOENT if codepoint cannot be translated
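One way such a translation could work is a reverse lookup in a table of Unicode values. The sketch below assumes, without confirmation from this reference, that the table has 128 entries giving the Unicode code point for codepage positions 0x80 to 0xff and that code points below 0x80 pass through unchanged; sketch_utf_to_cp is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>

typedef int32_t s32;  /* stand-in for U-Boot's s32 */
typedef uint16_t u16; /* stand-in for U-Boot's u16 */

/*
 * Hypothetical helper. Assumed table layout: codepage[j] holds the
 * Unicode code point of 8-bit character 0x80 + j, for j in 0..127.
 */
static int sketch_utf_to_cp(s32 *c, const u16 *codepage)
{
	int j;

	if (*c < 0x80)
		return 0;	/* ASCII range is taken as-is */
	for (j = 0; j < 0x80; ++j) {
		if (*c == codepage[j]) {
			*c = j + 0x80;
			return 0;
		}
	}
	*c = '?';		/* no match: render as question mark */
	return -ENOENT;
}
```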

int utf8_to_cp437_stream(u8 c, char *buffer)

convert UTF-8 stream to codepage 437

Parameters

u8 c

next UTF-8 character to convert

char *buffer

buffer, at least 5 characters

Return

next codepage 437 character or 0

int utf8_to_utf32_stream(u8 c, char *buffer)

convert UTF-8 byte stream to Unicode code points

Parameters

u8 c

next UTF-8 character to convert

char *buffer

buffer, at least 5 characters

Description

The function is called for each byte c in a UTF-8 stream. The byte is appended to the temporary storage buffer until the UTF-8 stream in buffer describes a Unicode code point.

When a new code point has been decoded it is returned and buffer[0] is set to '\0', otherwise the return value is 0.

The buffer must be at least 5 characters long. Before the first function invocation buffer[0] must be set to '\0'.

Return

Unicode code point or 0
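The byte-at-a-time protocol can be illustrated with a standalone sketch that keeps the partial sequence as a 0-terminated string in the caller's buffer. The name sketch_utf8_stream is hypothetical, and the way broken sequences are dropped here is this sketch's choice, not necessarily what U-Boot does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper; buffer must hold 5 chars and start with buffer[0] == 0 */
static int sketch_utf8_stream(uint8_t c, char *buffer)
{
	unsigned char *b = (unsigned char *)buffer;
	size_t len = strlen(buffer);
	size_t need, i;
	int32_t code;

	b[len++] = c;			/* append the new byte */
	b[len] = 0;

	if (b[0] < 0x80) {
		need = 1;
	} else if ((b[0] & 0xe0) == 0xc0) {
		need = 2;
	} else if ((b[0] & 0xf0) == 0xe0) {
		need = 3;
	} else if ((b[0] & 0xf8) == 0xf0) {
		need = 4;
	} else {
		b[0] = 0;		/* not a valid lead byte: drop it */
		return 0;
	}
	if (len > 1 && (c & 0xc0) != 0x80) {
		/* sequence broken: discard it and restart with c */
		b[0] = 0;
		return sketch_utf8_stream(c, buffer);
	}
	if (len < need)
		return 0;		/* code point not complete yet */
	code = (need == 1) ? b[0] : (b[0] & (0xff >> (need + 1)));
	for (i = 1; i < need; ++i)
		code = (code << 6) | (b[i] & 0x3f);
	b[0] = 0;			/* ready for the next sequence */
	return code;
}
```

Feeding the bytes 0xe2, 0x82, 0xac one at a time yields 0, 0 and then the code point 0x20ac, after which the buffer is empty again.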