Python modifies unicode identifiers?

Python 3.8 supports using a limited set of non-ASCII Unicode characters in identifiers. So, it seems that it is valid to use 𝚺 as a character in an identifier.

However, something is wrong...

Problem

def f(𝚺):
    print(f'{𝚺=}')
f(1)
f(𝚺=2)
f(**{'𝚺': 3})

The first two calls are fine, but the third fails:

𝚺=1
𝚺=2
Traceback (most recent call last):
  File "sigma.py", line 24, in <module>
    f(**{'𝚺': 3})
TypeError: f() got an unexpected keyword argument '𝚺'

Analysis

Let's see what is actually going on:

def f2(**kw):
    for name, value in kw.items():
        print(f'{name}={value} {ord(name)=}')
f2(𝚺=2)
f2(**{'𝚺': 3})

It prints:

Σ=2 ord(name)=931
𝚺=3 ord(name)=120506

I called it with 𝚺 both times, but it was changed to the very similar, simpler Σ in the first call.

It seems that an argument named 𝚺 (U+1D6BA) is implicitly renamed to Σ (U+03A3), and in every call to the function, argument 𝚺 is also implicitly renamed to Σ, except if it is passed as **kwargs.
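The renaming described above can be checked directly with the standard unicodedata module. What follows is a minimal sketch, not a recommended fix: it assumes the renaming is NFKC normalization, the helper name normalize_keys is hypothetical, and escape sequences are used only so the two characters cannot be confused. It normalizes dictionary keys before unpacking so that they match the parameter name the parser stored.

```python
import unicodedata

MATH_SIGMA = "\U0001D6BA"   # MATHEMATICAL BOLD CAPITAL SIGMA
GREEK_SIGMA = "\u03A3"      # GREEK CAPITAL LETTER SIGMA


def normalize_keys(kwargs):
    """Return a copy of kwargs with every key converted to NFKC (hypothetical helper)."""
    return {unicodedata.normalize("NFKC", key): value for key, value in kwargs.items()}


def f(Σ):
    return Σ


# The two characters are different code points...
print(MATH_SIGMA == GREEK_SIGMA)
# ...but NFKC maps the first onto the second.
print(unicodedata.normalize("NFKC", MATH_SIGMA) == GREEK_SIGMA)

# Unpacking the raw key fails; normalizing it first succeeds.
try:
    f(**{MATH_SIGMA: 3})
except TypeError as exc:
    print("TypeError:", exc)
print(f(**normalize_keys({MATH_SIGMA: 3})))
```

Keys of a dictionary are ordinary strings and never pass through the parser, which is why only the **kwargs path needs this extra step.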

The Questions

Is this a bug? It does not look like it is accidental. Is it documented? Is there a set of true characters and a list of alias characters available somewhere?

edited Mar 21, 2020 at 17:09

asked Mar 21, 2020 at 17:04

zvone

Nice find. Filed a bug?
– Patrick Artner, Mar 21, 2020 at 17:13

The documentation about Identifiers and keywords states: "All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC." That could be the reason.
– Matthias, Mar 21, 2020 at 17:17

@Matthias Yeah, that seems to be it. You can reproduce the behaviour without kwargs: 𝚺 = 0; Σ -> 0. And just to confirm the normal form, unicodedata.normalize('NFKC', '𝚺') -> 'Σ'.
– wjandrea, Mar 21, 2020 at 17:28
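Following the NFKC hint in the comments, one way to approximate a "list of alias characters" is to scan all code points and keep those that are valid identifiers on their own and whose NFKC form is a given character. This is only a sketch: the function name aliases_of is made up here, and the result depends on the Unicode database shipped with the running Python version.

```python
import sys
import unicodedata


def aliases_of(target):
    """Collect single characters that are valid identifiers and whose NFKC form is target."""
    found = []
    for code_point in range(sys.maxunicode + 1):
        char = chr(code_point)
        if char == target:
            continue  # skip the target itself, only aliases are wanted
        if char.isidentifier() and unicodedata.normalize("NFKC", char) == target:
            found.append(char)
    return found


# List every single-character alias of GREEK CAPITAL LETTER SIGMA (U+03A3).
for char in aliases_of("\u03A3"):
    print(f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}")
```

The scan only covers single-character names; longer identifiers are normalized as a whole, so this gives a per-character view rather than a complete description of the rule.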

1 Answer

I think this happens because of the way Python handles identifier characters.
If you set a variable using one of your provided sigma letters, 𝚺 (U+1D6BA) or Σ (U+03A3), you can also access it with the other one. Both of these snippets work:

>>> 𝚺 = 5
>>> Σ
5

>>> Σ = 5
>>> 𝚺
5

In both cases globals() shows the variable stored under Σ (ord: 931).
My guess is that Python normalizes the character before storing or looking up the name.
There is a similar discussion, posted by me, in github/wtfpython.
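The observation about globals() can be reproduced in isolation. The sketch below is an illustration rather than a description of CPython internals: it runs an assignment written with U+1D6BA in a fresh namespace and checks which key ends up stored, then looks at the name recorded in the parsed syntax tree. The variable names are chosen here for illustration.

```python
import ast

MATH_SIGMA = "\U0001D6BA"   # character typed in the source
GREEK_SIGMA = "\u03A3"      # its NFKC form

source = f"{MATH_SIGMA} = 5"

# Run the assignment in an empty namespace and see which name was stored.
namespace = {}
exec(source, namespace)
print(GREEK_SIGMA in namespace)
print(MATH_SIGMA in namespace)

# The parsed tree already carries the normalized name.
target = ast.parse(source).body[0].targets[0]
print(ascii(target.id))
```

If the name in the syntax tree is already U+03A3, the conversion happens at parse time, which would be consistent with the documentation sentence quoted in the comments.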

answered Feb 21, 2022 at 12:26

musava_ribica
