Unicode Break Algorithms

Racket 8.7 added basic support for working with Unicode grapheme clusters, in which multiple code points make up an entity that is rendered as a single character. This module expands that functionality and adds the word and sentence break algorithms from Unicode Standard Annex #29, Text Segmentation. It does not attempt to provide language- or locale-specific algorithms.

The rules used are in accordance with Unicode 15.1, to match Racket 8.13.

1 Grapheme Breaks

Returns a sequence that produces a series of strings, one grapheme of the specified range of str per entry. The result is undefined if start is not the initial index of a grapheme sequence.

Returns a list of the graphemes in the specified range of str. The result is undefined if start is not the initial index of a grapheme sequence.

Same as string-split-graphemes, but returns immutable strings.

Returns a list of the starting indexes of each grapheme in the specified range of str. The result is undefined if start is not the initial index of a grapheme sequence.
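
The grapheme functions described above can be approximated with the primitive that Racket 8.7 itself provides. The following is a minimal sketch, not this module's implementation: it uses only string-grapheme-span from racket/base, and the helper names grapheme-starts-sketch and graphemes-sketch are hypothetical, introduced here for illustration. It may differ from the module's results wherever the module's rules go beyond what the base library implements.

```racket
#lang racket/base

;; Hypothetical helper (not part of this module): the starting index of
;; each grapheme cluster in the range [start, end) of str.
(define (grapheme-starts-sketch str [start 0] [end (string-length str)])
  (let loop ([i start] [acc '()])
    (if (>= i end)
        (reverse acc)
        ;; string-grapheme-span returns the length, in characters, of the
        ;; cluster that begins at index i (assuming i starts a cluster).
        (loop (+ i (string-grapheme-span str i end)) (cons i acc)))))

;; Hypothetical helper (not part of this module): the clusters themselves,
;; each as a freshly allocated string.
(define (graphemes-sketch str [start 0] [end (string-length str)])
  (let loop ([i start] [acc '()])
    (if (>= i end)
        (reverse acc)
        (let ([next (+ i (string-grapheme-span str i end))])
          (loop next (cons (substring str i next) acc))))))

;; "e" followed by U+0301 COMBINING ACUTE ACCENT forms a single cluster
;; made of two code points; the trailing "a" is its own cluster.
(define sample "e\u0301a")
(grapheme-starts-sketch sample)               ; '(0 2)
(map string-length (graphemes-sketch sample)) ; '(2 1)
```

As with the functions documented here, the sketch assumes that start falls on a cluster boundary; starting in the middle of a cluster gives a result that is not meaningful.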

2 Word Breaks

ch : char?
Returns the Unicode word break property of the given character, which is one of the following symbols: 'ALetter, 'CR, 'Double_Quote, 'Extend, 'ExtendNumLet, 'Format, 'Hebrew_Letter, 'Katakana, 'LF, 'MidLetter, 'MidNum, 'MidNumLet, 'Newline, 'Numeric, 'Other, 'Regional_Indicator, 'Single_Quote, 'WSegSpace or 'ZWJ.

Returns #t if a word break exists before the character at index i. There is always a break before start and end.

Returns the number of characters/codepoints in the string before the next Unicode word break starting from start and not going past end.

procedure

(in-words str [start end #:skip-blanks? skip-blanks?]) → (sequence/c string?)

  str : string?
  skip-blanks? : any/c = #f
Returns a sequence that produces a series of strings, one word of the specified range of str per entry. If #:skip-blanks? is true, "words" that consist only of white space are omitted.

procedure

[start end #:skip-blanks? skip-blanks?]) → (listof string?)

  str : string?
  skip-blanks? : any/c = #f
Returns a list of the words in the specified range of str. If #:skip-blanks? is true, "words" that consist only of white space are omitted.

procedure

[start end #:skip-blanks? skip-blanks?]) → (listof (and/c string? immutable?))

  str : string?
  skip-blanks? : any/c = #f
Same as string-split-words, but returns immutable strings.

Returns a list of the indexes of each word break in the specified range of str. The implicit breaks at the beginning and end of the string are included.

3 Sentence Breaks

Returns the Unicode sentence break property of the given character, which is one of the following symbols: 'ATerm, 'CR, 'Close, 'Extend, 'Format, 'LF, 'Lower, 'Numeric, 'OLetter, 'Other, 'SContinue, 'STerm, 'Sep, 'Sp or 'Upper.

Returns a sequence that produces a series of strings, one sentence in the specified range of str per entry. The result is undefined if start is not the initial index of a sentence.

Returns a list of the sentences in the specified range of str. The result is undefined if start is not the initial index of a sentence.

procedure

(string-split-sentencess/immutable str [start end]) → (listof (and/c string? immutable?))

  str : string?
Same as string-split-sentencess, but returns immutable strings.

procedure

(string-sentence-indexes str [start end])

  str : string?
Returns a list of the indexes of the start of each sentence in the specified range of str. The result is undefined if start is not the initial index of a sentence.

4 Other functions

procedure

(char-east-asian-width-property ch) → (or/c 'N 'Na 'H 'A 'F 'W)

  ch : char?
Returns the Annex #11 East Asian Width property assigned to the given character.
