How to print Unicode glyph names for input string?

Question 1

I'd like to be able to run

unicode-names 'abç'

and see the corresponding Unicode character names:

LATIN SMALL LETTER A
LATIN SMALL LETTER B
LATIN SMALL LETTER C WITH CEDILLA

Printing a string as a series of Unicode glyph names would be useful in several cases:

Distinguish easily confused characters such as "i" and "í".
Explain what a literal string actually contains (for example, non-printable, unassigned, or zero-width characters).

Question 2

The uniutils package has the program uniname:

$ printf %s '…—' | uniname
character  byte  UTF-32  encoded as  glyph  name
        0     0  002026  E2 80 A6    …      HORIZONTAL ELLIPSIS
        1     3  002014  E2 80 94    —      EM DASH
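
If uniutils is not installed, similar columns can be approximated in Python 3. This is only a rough sketch, not how uniname itself works: uniname_rows is a made-up helper name, and it assumes the byte offsets and "encoded as" column refer to UTF-8, as in the output above.

```python
import unicodedata


def uniname_rows(text):
    """Yield (char index, byte offset, UTF-32 hex, UTF-8 bytes, name) per character."""
    byte_offset = 0
    for index, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        yield (
            index,
            byte_offset,
            "%06X" % ord(ch),                       # code point, six hex digits
            " ".join("%02X" % b for b in encoded),  # UTF-8 bytes in hex
            unicodedata.name(ch, "(no name)"),      # fallback for unnamed characters
        )
        byte_offset += len(encoded)


if __name__ == "__main__":
    # Same input as the uniname example: HORIZONTAL ELLIPSIS followed by EM DASH.
    for row in uniname_rows("\u2026\u2014"):
        print(*row)
```

The byte offset is tracked separately from the character index because each non-ASCII character takes more than one byte in UTF-8.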

Question 3

For minimal output with only the names, use these options: echo -n …— | uniname -bcegpu

Question 4

@l0b0, UNIX-compliant echo implementations would output -n …—<newline> for that. Better as printf %s '…—' (using single quotes also avoids problems if the bytes of the encoding of those non-ASCII characters happen to be special to the shell).

Question 5

I don't know a good way to check this from bash, but Python has a built-in Unicode database, which you can use in a script like this:

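#!/usr/bin/env python3
import sys
import unicodedata

# Read raw bytes from stdin, decode them as UTF-8, and name each character.
for ch in sys.stdin.buffer.read().decode('utf-8'):
    try:
        print(unicodedata.name(ch))
    except ValueError:
        # The character has no name in the database: print its code point.
        print('codepoint', ord(ch))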

You can use this script like this (assuming you called it unicode-names):

$ echo 'abc©áοπρσ' | unicode-names
LATIN SMALL LETTER A
LATIN SMALL LETTER B
LATIN SMALL LETTER C
COPYRIGHT SIGN
LATIN SMALL LETTER A WITH ACUTE
GREEK SMALL LETTER OMICRON
GREEK SMALL LETTER PI
GREEK SMALL LETTER RHO
GREEK SMALL LETTER SIGMA
codepoint 10

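unicodedata.name() raises a ValueError for any character that has no name in the database, so the script prints that character's code point in decimal instead. These are usually unprintable characters, such as the trailing newline (code point 10) that echo adds in the example above.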
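
As a variation, unicodedata.name() also accepts a second argument that is returned instead of raising ValueError, which removes the need for try/except. The sketch below assumes Python 3 and takes the string as a command-line argument, as in the question's unicode-names 'abç' example; char_names is just an illustrative name.

```python
import sys
import unicodedata


def char_names(text):
    """Return one description per character of text."""
    # The second argument is the fallback used when a character has no name.
    return [unicodedata.name(ch, "codepoint %d" % ord(ch)) for ch in text]


if __name__ == "__main__":
    # Python 3 has already decoded the arguments using the locale's encoding.
    for arg in sys.argv[1:]:
        print("\n".join(char_names(arg)))
```

Passing the string as an argument also avoids the trailing newline that echo would add to the input.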

Caveat: the script assumes your terminal uses UTF-8. If it doesn't, change the argument of the decode() method to match. Python supports a very wide selection of encodings, so yours is almost certainly among them.

Question 6

Better -- use sys.getdefaultencoding().

Question 7

I once wrote this u script for that:

#! /bin/sh -
exec perl -Mcharnames=full -Mopen=locale -lne '
 printf "U+%04X %s\n", ord($_), charnames::viacode(ord($_)) for /./g' -- "$@"

Used as:

$ u <<< 'ę£½'
U+0119 LATIN SMALL LETTER E WITH OGONEK
U+00A3 POUND SIGN
U+00BD VULGAR FRACTION ONE HALF

I also have this openbox (my window manager) key binding:

<keybind key="W-J">
  <action name="Execute">
    <command>sh -c "notify-send -- \"$(xclip -o | perl -Mcharnames=:full -C -lne 'printf \"U+%04X %s\n\", $_, charnames::viacode($_) for map ord, /\P{ascii}/g')\""</command>
  </action>
</keybind>

Pressing Windows+J then sends a notification describing the non-ASCII characters in the primary X11 selection, which you might find useful.
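
For readers who would rather not use Perl, the same filtering idea (describe only the non-ASCII characters, in U+XXXX NAME form) could be sketched in Python 3 as below. describe_non_ascii is a hypothetical helper, and the "(no name)" fallback is my own choice rather than what charnames::viacode returns.

```python
import unicodedata


def describe_non_ascii(text):
    """Return 'U+XXXX NAME' for each non-ASCII character in text."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "(no name)"))
        for ch in text
        if ord(ch) > 0x7F  # keep only characters outside the ASCII range
    ]


if __name__ == "__main__":
    # ASCII letters are skipped; only the three non-ASCII characters are described.
    print("\n".join(describe_non_ascii("a\u0119\u00a3\u00bdb")))
```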

score 14 · Accepted Answer · 2012-03-15 20:15:47Z

answered Mar 15, 2012 at 20:15 by donothingsuccessfully; edited Feb 25, 2022 at 13:26 by Stéphane Chazelas

Comments on this answer: l0b0 (Mar 19, 2012 at 7:53) and Stéphane Chazelas (Feb 25, 2022 at 13:25).
