info coreutils

File: coreutils.info, Node: Character arrays, Next: Translating, Up: tr invocation
9.1.1 Specifying arrays of characters
-------------------------------------
The STRING1 and STRING2 operands are not regular expressions, even
though they may look similar. Instead, they merely represent arrays of
characters. As a GNU extension to POSIX, an empty string operand
represents an empty array of characters.
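 As a minimal sketch of the point that operands are character arrays rather than patterns, the commands below feed regex-looking operands to ‘tr’. The variable names are illustrative only, and ‘LC_ALL=C’ is an added assumption to keep the behavior byte-oriented.

```shell
# '.' is a literal dot here, not "any character".
dots=$(printf 'a.c abc\n' | LC_ALL=C tr '.' '_')
printf '%s\n' "$dots"     # a_c abc

# 'a*' is the two-element array {a, *}, not "zero or more a".
stars=$(printf 'aa*b\n' | LC_ALL=C tr 'a*' 'xy')
printf '%s\n' "$stars"    # xxyb
```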
 The interpretation of STRING1 and STRING2 depends on locale. GNU
‘tr’ fully supports only safe single-byte locales, where each possible
input byte represents a single character. Unfortunately, this means GNU
‘tr’ will not handle commands like ‘tr ö Ł’ the way you might expect,
since (assuming a UTF-8 encoding) this is equivalent to ‘tr '\303\266'
'\305\201'’ and GNU ‘tr’ will simply transliterate all ‘\303’ bytes to
‘\305’ bytes, etc. POSIX does not clearly specify the behavior of ‘tr’
in locales where characters are represented by byte sequences instead of
by individual bytes, or where data might contain invalid bytes that are
encoding errors. To avoid problems in this area, you can run ‘tr’ in a
safe single-byte locale by using a shell command like ‘LC_ALL=C tr’
instead of plain ‘tr’.
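 The byte-wise behavior described above can be observed with a small sketch. It assumes UTF-8 input and uses ‘od’ to show the resulting bytes in octal; the variable name is illustrative. The input is ‘ü’ (bytes ‘\303\274’), which shares its first byte with ‘ö’ and is therefore altered even though it was never named in STRING1.

```shell
# Byte-wise mapping: \303 -> \305 and \266 -> \201.
# The input ü is \303\274, so only its first byte is rewritten.
bytes=$(printf '\303\274' | LC_ALL=C tr '\303\266' '\305\201' | od -An -to1)
echo $bytes    # 305 274
```

 The output bytes ‘\305\274’ happen to encode a different letter (‘ż’) in UTF-8, which is rarely what the caller intended.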
 Although most characters simply represent themselves in STRING1 and
STRING2, the strings can contain shorthands listed below, for
convenience. Some shorthands can be used only in STRING1 or STRING2, as
noted below.
Backslash escapes
 The following backslash escape sequences are recognized:
 ‘\a’
 Bell (BEL, Control-G).
 ‘\b’
 Backspace (BS, Control-H).
 ‘\f’
 Form feed (FF, Control-L).
 ‘\n’
 Newline (LF, Control-J).
 ‘\r’
 Carriage return (CR, Control-M).
 ‘\t’
 Tab (HT, Control-I).
 ‘\v’
 Vertical tab (VT, Control-K).
 ‘\OOO’
 The eight-bit byte with the value given by OOO, which is the
 longest sequence of one to three octal digits following the
 backslash. For portability, OOO should represent a value that
 fits in eight bits. As a GNU extension to POSIX, if the value
 would not fit, then only the first two digits of OOO are used,
 e.g., ‘\400’ is equivalent to ‘\0400’ and represents a
 two-byte sequence.
 ‘\\’
 A backslash.
 It is an error if no character follows an unescaped backslash. As
 a GNU extension, a backslash followed by a character not listed
 above is interpreted as that character, removing any special
 significance; this can be used to escape the characters ‘[’ and ‘-’
 when they would otherwise be special.
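 A short sketch of the escapes in use follows. The operands are single-quoted so that the shell passes the backslashes through to ‘tr’ unchanged; the variable names are illustrative, and the last command relies on the GNU extension described above.

```shell
# '\t' is expanded by tr itself into a tab character.
tabs_to_spaces=$(printf 'a\tb\n' | LC_ALL=C tr '\t' ' ')
printf '%s\n' "$tabs_to_spaces"    # a b

# '\040' is the octal code for a space.
underscored=$(printf 'a b\n' | LC_ALL=C tr '\040' '_')
printf '%s\n' "$underscored"       # a_b

# GNU extension: '\-' and '\[' are a literal hyphen and bracket.
stripped=$(printf 'a-b[c]\n' | LC_ALL=C tr -d '\-\[')
printf '%s\n' "$stripped"          # abc]
```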
Ranges
 The notation ‘M-N’ expands to the characters from M through N, in
 ascending order. M should not collate after N; if it does, an
 error results. As an example, ‘0-9’ is the same as ‘0123456789’.
 GNU ‘tr’ does not support the System V syntax that uses square
 brackets to enclose ranges. Translations specified in that format
 sometimes work as expected, since the brackets are often
 transliterated to themselves. However, they should be avoided
 because they sometimes behave unexpectedly. For example, ‘tr -d
 '[0-9]'’ deletes brackets as well as digits.
 Many historically common and even accepted uses of ranges are not
 fully portable. For example, on EBCDIC hosts, using the ‘A-Z’ range
 will not do what most would expect because ‘A’ through ‘Z’ are not
 contiguous as they are in ASCII. One way to work around this is to
 use character classes (see below). Otherwise, it is most portable
 (and most ugly) to enumerate the members of the ranges.
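 The range notation and the System V bracket pitfall can be compared side by side. This is a sketch assuming an ASCII-based system and the ‘C’ locale; the variable names are illustrative.

```shell
# A plain range translates lowercase ASCII letters to uppercase.
upper=$(printf 'hello 123\n' | LC_ALL=C tr 'a-z' 'A-Z')
printf '%s\n' "$upper"    # HELLO 123

# Without brackets, only the digits are deleted.
plain=$(printf 'x[42]y\n' | LC_ALL=C tr -d '0-9')
printf '%s\n' "$plain"    # x[]y

# With System V style brackets, '[' and ']' are array members too.
sysv=$(printf 'x[42]y\n' | LC_ALL=C tr -d '[0-9]')
printf '%s\n' "$sysv"     # xy
```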
Repeated characters
 The notation ‘[C*N]’ in STRING2 expands to N copies of character C.
 Thus, ‘[y*6]’ is the same as ‘yyyyyy’. The notation ‘[C*]’ in
 STRING2 expands to as many copies of C as are needed to make ARRAY2
 as long as ARRAY1. If N begins with ‘0’, it is interpreted in
 octal, otherwise in decimal. A zero-valued N is treated as if it
 were absent.
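 The three forms of the repeat notation can be sketched as follows, again assuming the ‘C’ locale and using illustrative variable names.

```shell
# '[x*3]' is the same as 'xxx'.
fixed=$(printf 'abc\n' | LC_ALL=C tr 'abc' '[x*3]')
printf '%s\n' "$fixed"     # xxx

# '[D*]' pads STRING2: 26 copies of L for a-z, then D for each of 0-9.
padded=$(printf 'a1b2\n' | LC_ALL=C tr 'a-z0-9' '[L*26][D*]')
printf '%s\n' "$padded"    # LDLD

# A leading 0 makes the count octal: 010 is eight copies.
octal=$(printf 'abcdefgh\n' | LC_ALL=C tr 'a-h' '[x*010]')
printf '%s\n' "$octal"     # xxxxxxxx
```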
Character classes
 The notation ‘[:CLASS:]’ expands to all characters in the
 (predefined) class CLASS. When the ‘--delete’ (‘-d’) and
 ‘--squeeze-repeats’ (‘-s’) options are both given, any character
 class can be used in STRING2. Otherwise, only the character
 classes ‘lower’ and ‘upper’ are accepted in STRING2, and then only
 if the corresponding character class (‘upper’ and ‘lower’,
 respectively) is specified in the same relative position in
 STRING1. Doing this specifies case conversion. Except for case
 conversion, a class’s characters appear in no particular order.
 The class names are given below; an error results when an invalid
 class name is given.
 ‘alnum’
 Letters and digits.
 ‘alpha’
 Letters.
 ‘blank’
 Horizontal whitespace.
 ‘cntrl’
 Control characters.
 ‘digit’
 Digits.
 ‘graph’
 Printable characters, not including space.
 ‘lower’
 Lowercase letters.
 ‘print’
 Printable characters, including space.
 ‘punct’
 Punctuation characters.
 ‘space’
 Horizontal or vertical whitespace.
 ‘upper’
 Uppercase letters.
 ‘xdigit’
 Hexadecimal digits.
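 As a sketch of typical class usage, the commands below perform case conversion with paired ‘lower’ and ‘upper’ classes and delete every member of ‘digit’. The ‘C’ locale and the variable names are assumptions made for the example.

```shell
# Case conversion: 'lower' in STRING1 paired with 'upper' in STRING2.
shouted=$(printf 'Hello World 42\n' | LC_ALL=C tr '[:lower:]' '[:upper:]')
printf '%s\n' "$shouted"        # HELLO WORLD 42

# Any class may appear in STRING1, here to delete all digits.
no_digits=$(printf 'Hello World 42\n' | LC_ALL=C tr -d '[:digit:]')
printf '[%s]\n' "$no_digits"    # [Hello World ]
```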
Equivalence classes
 The syntax ‘[=C=]’ expands to all characters equivalent to C, in no
 particular order. These equivalence classes are allowed in STRING2
 only when ‘--delete’ (‘-d’) and ‘--squeeze-repeats’ (‘-s’) are both
 given.
 Although equivalence classes are intended to support non-English
 alphabets, there seems to be no standard way to define them or
 determine their contents. Therefore, they are not fully
 implemented in GNU ‘tr’; each character’s equivalence class
 consists only of that character, which is of no particular use.
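 Because each equivalence class contains only the named character, ‘[=a=]’ behaves like a plain ‘a’ in GNU ‘tr’. A minimal sketch, with an illustrative variable name and the ‘C’ locale assumed:

```shell
# '[=a=]' expands to just 'a', so this is the same as: tr -d 'a'
no_a=$(printf 'banana\n' | LC_ALL=C tr -d '[=a=]')
printf '%s\n' "$no_a"    # bnn
```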
