Unicode identifiers in Python?

Question 1

I want to build a Python function that calculates,

[image: formula]

and would like to name my summation function Σ. In a similar fashion, I would like to use Π for product, and so on. Is there a way to name a Python function like this?

def Σ (..):
 ..
 ..

That is, does Python support Unicode identifiers, and if so, could someone provide an example?

Thanks!

Original motivation for this was a piece of Clojure code I saw today that looks like,

(defn entropy [X]
 (* -1 (Σ [i X] (* (p i) (log (p i))))))

where Σ is a macro defined as,

(defmacro Σ
 ... )

and I thought that was pretty cool.

BTW, to address a couple of comments about readability - with a lot of stats/ML code for instance, being able to compose operations with symbols would be really helpful. (Especially for really complex integrals et al)

φ(z) = ∫(N(x|0,1,1), -∞, z)

vs

Phi(z) = integral(N(x|0,1,1), -inf, z)

or even just the lambda character for lambda()!

Question 2

Although not as cool, Python's summation function is pretty elegant: sum()

Question 3

Sounds like a horrible idea for ease of input (presumably $\sum$ wouldn't work, right?)

Question 4

Maybe you want to have a look at Fortress which allows Unicode and TeX style notation.

Question 5

"Sounds like a horrible idea for ease of input" — depends what keyboard shortcuts you’ve got, doesn’t it? Curly quotes, like the kind I used at the start of this comment, are a bit of a drag to type by default in Windows (I believe), but have decent shortcuts on the Mac. If you do a lot of mathy programming, you could configure shortcuts to make the typing easy.

Question 6

ϕ (U+03D5) and φ (U+03C6) are variants of the same symbol, so it makes sense for them to be the same identifier (especially when you're reading code out loud)

Question 7

(I think it’s pretty cool too, that might mean we’re geeks.)

You’re fine to do this with the code you have above in Python 3. (It works in my Python 3.1 interpreter at least.) See:

But in Python 2, identifiers can only be ASCII letters, numbers and underscores.

http://docs.python.org/reference/lexical_analysis.html#identifiers
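Since the question asks for an example, here is a minimal sketch, assuming Python 3.8 or newer (for math.prod). The entropy helper and its parameter name are illustrative only, loosely mirroring the Clojure snippet in the question:

```python
import math
from typing import Iterable


def Σ(values: Iterable[float]) -> float:
    """Sum of the values; a thin wrapper around the built-in sum()."""
    return sum(values)


def Π(values: Iterable[float]) -> float:
    """Product of the values; math.prod needs Python 3.8 or newer."""
    return math.prod(values)


def entropy(probabilities: Iterable[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    # Zero probabilities are skipped because log(0) is undefined.
    return -Σ(p * math.log(p) for p in probabilities if p > 0)


print(Σ([1, 2, 3, 4]))      # 10
print(Π([1, 2, 3, 4]))      # 24
print(entropy([0.5, 0.5]))  # natural log of 2
```

On Python 2 the same file fails at the first def Σ line, since identifiers there are ASCII-only.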

Question 8

Is the Python 2 incompatibility the reason for the following quote from the Tutorial: "don’t use non-ASCII characters in identifiers if there is only the slightest chance people speaking a different language will read or maintain the code"? Or is UTF-8 still not preferred for international purposes in Python 3?

Question 9

It's worth pointing out that Python 3 does support Unicode identifiers, but only allows letter-like or number-like symbols (see http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers for full details). That's why Σ works (remember that it's a Greek letter, not just a math symbol), but √ doesn't.
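One way to check a candidate name without tripping a SyntaxError is str.isidentifier(), shown here next to each character's Unicode general category. This is a small sketch; the describe helper and the candidate list are made up for illustration:

```python
import unicodedata


def describe(name: str) -> str:
    """Report whether `name` is a valid identifier, plus each character's category."""
    categories = " ".join(unicodedata.category(ch) for ch in name)
    return f"{name!r}: identifier={name.isidentifier()} categories={categories}"


# Letters (category L*) pass; math symbols (category Sm) do not.
for candidate in ["Σ", "Π", "θJA", "√", "∑", "∫"]:
    print(describe(candidate))
```

Note that isidentifier() only checks the character rules; it does not tell you whether the name is a reserved keyword.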

For anyone interested, I made a website that lists every Unicode character that is valid in a Python variable https://www.asmeurer.com/python-unicode-variable-names/ (be warned that there are quite a lot of them, over 100000 in fact)

Question 10

(this answer is meant to be a minor addendum not a complete answer)

The additional gotcha with Unicode identifiers (which @mike-desimone mentions and which I discovered quickly when I thought this was a cool thread and switched to a terminal to play with it) is that the look-alike versions of a glyph are not equivalent, and which one you get depends on the input method on each platform. For example, Σ (aka Greek capital letter sigma, aka U+03A3, [can't find a direct Mac input method]) is fine, but unfortunately ∑ (aka N-ary Summation, aka U+2211, aka opt/alt-w on Mac OS X) is not a valid identifier.

>>> Σ = 20
>>> Σ
20

but

>>> ∑ = 20
File "<input>", line 1
 ∑ = 20
 ^
SyntaxError: invalid character in identifier

Using Σ specifically (and probably Unicode characters in general) as an identifier might generate some very hard-to-diagnose errors if you have multiple developers on multiple platforms contributing to your code. For example, debug this visually:

∑ looks very similar to Σ, depending on the typeface selected

The two glyphs are easier to differentiate on this page, but depending on the font used, this may not be the case.

Even the traceback isn't much clearer unless Σ is printed near the ∑

 File "~/Dev/play_python33/identifiers.py", line 12
 print(∑([2, 2, 2, 2, 2]))
 ^
SyntaxError: invalid character in identifier
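When the font makes the two glyphs indistinguishable, listing the code point and Unicode name of every non-ASCII character in the source can help. A rough sketch; non_ascii_report and the sample string are hypothetical, not part of any tool mentioned above:

```python
import unicodedata


def non_ascii_report(source: str):
    """Yield (line, column, char, code point, name) for each non-ASCII character."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for column, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield (lineno, column, ch, f"U+{ord(ch):04X}",
                       unicodedata.name(ch, "<unnamed>"))


# Line 1 uses the Greek letter, line 2 the N-ary summation sign.
sample = "Σ = sum\nprint(∑([2, 2, 2, 2, 2]))\n"
for row in non_ascii_report(sample):
    print(row)
```

The second row names the character as N-ARY SUMMATION, which is the detail the traceback above does not spell out.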

Question 11

Another gotcha is that there are multiple glyphs that are equivalent. Define ϕ = 5 (U+03D5), then φ is ϕ → True (φ being U+03C6)
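The equivalence comes from Python 3 normalizing identifiers to NFKC while parsing (per PEP 3131). A small sketch using escapes so the two code points cannot be confused; the variable names are illustrative:

```python
import unicodedata

phi_symbol = "\u03d5"  # GREEK PHI SYMBOL
phi_letter = "\u03c6"  # GREEK SMALL LETTER PHI

# As plain strings the two characters differ...
print(phi_symbol == phi_letter)                                 # False
# ...but NFKC normalization folds the symbol into the letter.
print(unicodedata.normalize("NFKC", phi_symbol) == phi_letter)  # True

# Assigning to the symbol form stores the name under the letter form.
namespace = {}
exec(f"{phi_symbol} = 5", namespace)
print(phi_letter in namespace)  # True
print(phi_symbol in namespace)  # False
```

This also means that getattr() or dictionary lookups with the un-normalized string will not find the name.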

Question 12

@endolith this is exactly what I discovered in horror today.

Question 13

According to "is it bad", you can use some Unicode characters, but not all: you are restricted to characters identified as letters.

>>> α = 3
>>> Σ = sum
>>> import math
>>> √ = math.sqrt
  File "<stdin>", line 1
    √ = math.sqrt
    ^
SyntaxError: invalid character in identifier

Besides: I think it is very cool to be able to use Unicode in identifiers, and I wish I could use all of it.

I use the neo keyboard layout, which gives me greek and math symbols on extra layers:

αβχδεφγψιθκλνοπφστ[&ωξυζ
∀⇐CΔ∃ΦΓΨ∫Λ⇔Σ∈QR∂⊂√∩Ξ

Question 14

Also, there are often distinct versions of characters that are also Greek letters. For example, the Greek capital sigma is U+03A3, while the math sigma is U+1D6BA, U+1D6F4, U+1D72E, U+1D768, or U+1D7A2 depending on styling. Similarly, Greek capital omega is U+03A9, math omegas start at U+1D6C0, and the ohm sign is U+2126.
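The standard unicodedata module can show how these variants relate. The sketch below prints each character's name and the code point it folds to under NFKC, which is the normalization Python 3 applies to identifiers; the variants list is just a sample of the code points named above:

```python
import unicodedata

variants = [
    "\u03a3",      # Greek capital sigma
    "\U0001d6ba",  # first of the styled math sigmas
    "\U0001d6f4",  # another styled math sigma
    "\u03a9",      # Greek capital omega
    "\U0001d6c0",  # first of the styled math omegas
    "\u2126",      # ohm sign
]

for ch in variants:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(folded):04X}")
```

The styled sigmas fold to U+03A3, and the styled omega and the ohm sign fold to U+03A9, so in source code they end up naming the same variable as the plain Greek letter.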

Question 15

Another nice way to enter most symbols is the compose key, e.g. on Windows via WinCompose

Question 16

Python 2.x does not support Unicode identifiers, and consequently does not support Σ as an identifier. Python 3.x does support Unicode identifiers, although many people will get cross if they have to edit source files with, for example, identifiers A and Α (Latin A and Greek capital alpha). Sigma is often readable enough, but still not as readable as the word sigma, so why bother?
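To make the A versus Α concern concrete: unlike the compatibility variants of a single letter, these two are distinct identifiers even after normalization. A short sketch, with escapes used so the code points are unambiguous and variable names chosen for illustration:

```python
import unicodedata

latin_a = "A"           # U+0041 LATIN CAPITAL LETTER A
greek_alpha = "\u0391"  # U+0391 GREEK CAPITAL LETTER ALPHA

# Both names are bound in the same namespace without clashing.
namespace = {}
exec(f"{latin_a} = 1\n{greek_alpha} = 2", namespace)
print(namespace[latin_a], namespace[greek_alpha])  # 1 2

# NFKC does not fold one into the other, so Python keeps them apart.
print(unicodedata.normalize("NFKC", greek_alpha) == latin_a)  # False
```

In most fonts the two lines that bind these names would look identical, which is exactly the maintenance hazard described above.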

Question 17

I think readability of words versus symbols depends on context. When I’m reading something mathy, I find symbols (e.g. x + y) more readable than the wordy equivalents you’d get in, say, AppleScript (e.g. add x to y). Symbols are terser, and generally let you get by on shape recognition alone, which I think is easier on the brain than reading. I don’t do enough mathy stuff to have felt the need to add a sigma sign to my code though.

Question 18

That doesn't look any more readable with unicode identifiers to me.

Question 19

"That doesn't look any more readable with unicode identifiers to me." — It does look more similar to the equation posted at the top of the question though. If someone was used to reading equations like that, mightn’t they find the symbol-y Python code more readable too?

Question 20

@Paul: sure, readability is always subjective. The audience is important. Which is why you need to consider the audience more than your own preferences. It's easy if you're always going to be your own entire audience, of course, but frequently things that start out that way end up in a wider distribution, and with a wider set of contributors.

Question 21

One place where Unicode identifiers will be nice is in IPython Notebook, because you can have variable names that match the symbols they represent. For example, the variable representing a chip's thermal impedance from junction to ambient is θJA, and constantly writing it as THETA_JA makes it harder for non-programmers to read the code.

Paul D. Waite · Accepted Answer · 2010-04-15 22:58:56Z

