I'd like to be able to run
unicode-names 'abç'
and see the corresponding Unicode character names:
LATIN SMALL LETTER A
LATIN SMALL LETTER B
LATIN SMALL LETTER C WITH CEDILLA
Printing a string as a series of Unicode glyph names would be useful in several cases:
- Distinguish easily confused characters such as "i" and "í".
- Explain what a literal string actually contains (for example non-printable, unassigned, or zero-width characters).
3 Answers
The uniutils package has the program uniname:
$ printf %s '…—' | uniname
character  byte       UTF-32   encoded as     glyph   name
        0          0  002026   E2 80 A6       …      HORIZONTAL ELLIPSIS
        1          3  002014   E2 80 94       —      EM DASH
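For readers without uniutils installed, the columns above can be approximated in a few lines of Python. This is only a sketch of what the table contains, not how uniname itself is implemented; the function name `uniname_rows` and the `<no name>` placeholder are made up for illustration.

```python
import unicodedata


def uniname_rows(text):
    """Return (char index, byte offset, code point, UTF-8 bytes, name) per character."""
    rows = []
    byte_offset = 0
    for index, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        rows.append((
            index,
            byte_offset,
            f"{ord(ch):06X}",                       # code point, zero-padded hex
            " ".join(f"{b:02X}" for b in encoded),  # UTF-8 bytes as hex
            unicodedata.name(ch, "<no name>"),      # placeholder if unnamed
        ))
        byte_offset += len(encoded)
    return rows


if __name__ == "__main__":
    # HORIZONTAL ELLIPSIS followed by EM DASH, as in the uniname example.
    for row in uniname_rows("\u2026\u2014"):
        print(*row, sep="\t")
```

The byte offset advances by the length of each character's UTF-8 encoding, which is why the second character starts at byte 3.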
For minimal output with only the names, use these options:
echo -n …— | uniname -bcegpu
– l0b0, Mar 19, 2012 at 7:53
@l0b0, UNIX-compliant echo implementations would output -n …—<newline> for that. Better as printf %s '…—' (using single quotes also avoids problems if the bytes of the encoding of those non-ASCII characters happen to be special to the shell).
– Stéphane Chazelas, Feb 25, 2022 at 13:25
I don't know a good way to do this from bash, but Python has a built-in Unicode database, which you can use in a script like this:
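#!/usr/bin/env python3
import sys
import unicodedata

# Read raw bytes from stdin and decode them explicitly as UTF-8.
for ch in sys.stdin.buffer.read().decode('utf-8'):
    try:
        print(unicodedata.name(ch))
    except ValueError:
        # No name in the database (e.g. a control character).
        print('codepoint', ord(ch))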
You can use this script like this (assuming you called it unicode-names):
$ echo 'abc©áοπρσ' | unicode-names
LATIN SMALL LETTER A
LATIN SMALL LETTER B
LATIN SMALL LETTER C
COPYRIGHT SIGN
LATIN SMALL LETTER A WITH ACUTE
GREEK SMALL LETTER OMICRON
GREEK SMALL LETTER PI
GREEK SMALL LETTER RHO
GREEK SMALL LETTER SIGMA
codepoint 10
The database raises a ValueError exception for any character it has no name for, so we print that character's code point in decimal instead (these are usually unprintable characters; the 10 above is the newline added by echo).
Caveat: the script assumes your terminal is UTF-8 encoded. If it isn't, you should change the argument of the decode() method. Python supports a very wide selection of encodings, so yours will almost certainly be among them.
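As a sketch of how the encoding caveat could be sidestepped in Python 3: text-mode `sys.stdin` and `sys.argv` are normally already decoded according to the locale, so no explicit decode() call is needed. The helper names `describe` and `read_text` are hypothetical, and accepting the string as an argument (as in the question's `unicode-names 'abç'`) is my assumption about the desired interface.

```python
import sys
import unicodedata


def describe(text):
    """Return one line per character: its Unicode name, or its decimal code point."""
    lines = []
    for ch in text:
        # Passing a default to name() avoids the ValueError for unnamed characters.
        name = unicodedata.name(ch, None)
        lines.append(name if name is not None else f"codepoint {ord(ch)}")
    return lines


def read_text(argv, stdin):
    """Take the text from the command-line arguments if any, otherwise from stdin."""
    if len(argv) > 1:
        # argv entries are str objects that Python decoded at startup.
        return "".join(argv[1:])
    # A text-mode stdin decodes using the stream's own encoding.
    return stdin.read()
```

To use this as a command, one would end the file with something like `print("\n".join(describe(read_text(sys.argv, sys.stdin))))`. If the locale is misconfigured, the decoding can still be wrong, so this moves the assumption rather than removing it.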
Better: use sys.getdefaultencoding().
– Chris Down, Mar 15, 2012 at 21:38
I once wrote this u script for that:
#! /bin/sh -
exec perl -Mcharnames=full -Mopen=locale -lne '
printf "U+%04X %s\n", ord($_), charnames::viacode(ord($_)) for /./g' -- "$@"
Used as:
$ u <<< 'ę£½'
U+0119 LATIN SMALL LETTER E WITH OGONEK
U+00A3 POUND SIGN
U+00BD VULGAR FRACTION ONE HALF
I also have this openbox (my window manager) key binding:
<keybind key="W-J">
  <action name="Execute">
    <command>sh -c "notify-send -- \"$(xclip -o | perl -Mcharnames=:full -C -lne 'printf \"U+%04X %s\n\", $_, charnames::viacode($_) for map ord, /\P{ascii}/g')\""</command>
  </action>
</keybind>
Pressing Windows+J then sends a notification describing the non-ASCII characters in the primary X11 selection, which you might find useful.
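The "non-ASCII characters only" filter is the part that keeps the notification short. A rough Python equivalent of that filtering step might look like the following; it is a sketch, the function name `describe_non_ascii` and the `<no name>` placeholder are invented here, and fetching the selection (xclip) or showing the result (notify-send) is left to the caller.

```python
import unicodedata


def describe_non_ascii(text):
    """Return 'U+XXXX NAME' for each non-ASCII character in text, in order."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"
        for ch in text
        if ord(ch) > 0x7F  # keep only characters outside the ASCII range
    ]


if __name__ == "__main__":
    # Only the two accented letters are reported; plain ASCII is skipped.
    print("\n".join(describe_non_ascii("na\u00efve caf\u00e9")))
```

Skipping ASCII means ordinary text produces no output at all, so anything that does show up is worth a second look.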