matplotlib-devel

From: <edi...@gm...> - 2006-06-22 13:51:43
I finally solved the problem of automatically generating the dicts for
unicode <-> TeX conversion. This is the first step in enabling unicode
support in mathtext.

The STIX project is useful after all ;) They keep a nice table of
Unicode symbols at:
http://www.ams.org/STIX/bnb/stix-tbl.ascii-2005-09-24

Any comments about the script are appreciated :). Now I'll dig a bit
deeper into the font classes to fix them to support unicode.
'''A script for seamlessly copying the data from the stix-tbl.ascii*
file to a set of python dicts. Dicts are then pickled to corresponding
files, for later retrieval.

Currently used table file:
http://www.ams.org/STIX/bnb/stix-tbl.ascii-2005-09-24
'''
import pickle

table_filename = 'stix-tbl.ascii-2005-09-24'
dict_names = ['uni2type1', 'type12uni', 'uni2tex', 'tex2uni']
dicts = {}
# initialize the dicts
for name in dict_names:
    dicts[name] = {}

for line in file(table_filename):
    if line[:2] == ' 0':
        uni_num = eval("u'\\u" + line[2:6].strip().lower() + "'")
        type1_name = line[12:37].strip()
        tex_name = line[83:110].strip()
        if type1_name:
            dicts['uni2type1'][uni_num] = type1_name
            dicts['type12uni'][type1_name] = uni_num
        if tex_name:
            dicts['uni2tex'][uni_num] = tex_name
            dicts['tex2uni'][tex_name] = uni_num

for name in dict_names:
    pickle.dump(dicts[name], open(name + '.pcl', 'w'))

# An example
uni_char = u'\u00d7'
print dicts['uni2tex'][uni_char]
print dicts['uni2type1'][uni_char]
# Testing of results; feel free to uncomment
# _mathtext_data.py can be found in the matplotlib dir
#~ from _mathtext_data import latex_to_bakoma
#~ supported = 0
#~ unsupported = 0
#~ for tex_symbol in latex_to_bakoma:
#~     try:
#~         print tex_symbol, dicts['tex2uni'][tex_symbol]
#~         supported += 1
#~     except KeyError:
#~         unsupported += 1
#~ print supported, unsupported
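For illustration, the fixed-column slicing the script relies on can be sketched in modern Python. The sample line below is made up, but it follows the column layout the script assumes: a leading " 0" marker, the hex code point in columns 2-6, the Type 1 name in columns 12-37, and the TeX name in columns 83-110.

```python
# A minimal sketch of the fixed-column parse, against one hypothetical
# sample line laid out with the same offsets the script uses.
sample = " 000D7" + " " * 6 + "multiply".ljust(71) + r"\times".ljust(27)

uni2type1 = {}
tex2uni = {}

for line in [sample]:
    if line[:2] != ' 0':
        continue
    uninum = int(line[2:6].strip(), 16)   # 0x00D7, the multiplication sign
    type1name = line[12:37].strip()
    texname = line[83:110].strip()
    if type1name:
        uni2type1[uninum] = type1name
    if texname:
        tex2uni[texname] = uninum

print(uni2type1[0x00D7])   # multiply
print(tex2uni[r'\times'])  # 215
```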
From: John H. <jdh...@ac...> - 2006-06-22 14:43:32
>>>>> "Edin" == Edin Salković <edi...@gm...> writes:

 Edin> I finally solved the problem of automatically generating the
 Edin> dicts for unicode <-> TeX conversion. This is the first step
 Edin> in enabling unicode support in mathtext.

Excellent.

 Edin> The STIX project is useful after all ;) They keep a nice
 Edin> table of Unicode symbols at:
 Edin> http://www.ams.org/STIX/bnb/stix-tbl.ascii-2005-09-24

 Edin> Any comments about the script are appreciated :). Now I'll

Since you asked :-)

I may not have mentioned this but the style conventions for mpl code
are

 functions : lower or lower_score_separated
 variables and attributes : lower or lowerUpper
 classes : Upper or MixedUpper

Also, I am not too fond of the dict of dicts -- why not use variable
names? Here is my version
import pickle

fname = 'stix-tbl.ascii-2005-09-24'
uni2type1 = dict()
type12uni = dict()
uni2tex = dict()
tex2uni = dict()

for line in file(fname):
    if line[:2] != ' 0': continue  # using continue avoids unnecessary indent
    uninum = line[2:6].strip().lower()
    type1name = line[12:37].strip()
    texname = line[83:110].strip()

    uninum = int(uninum, 16)
    if type1name:
        uni2type1[uninum] = type1name
        type12uni[type1name] = uninum
    if texname:
        uni2tex[uninum] = texname
        tex2uni[texname] = uninum

pickle.dump((uni2type1, type12uni, uni2tex, tex2uni), file('unitex.pcl', 'w'))

# An example
unichar = int('00d7', 16)
print uni2tex.get(unichar)
print uni2type1.get(unichar)
Also, I am a little hesitant to use pickle files for the final
mapping. I suggest you write a script that generates the Python code
containing the dictionaries you need (that is how much of
_mathtext_data was generated).
Thanks,
JDH
From: <edi...@gm...> - 2006-06-23 09:50:30
On 6/22/06, John Hunter <jdh...@ac...> wrote:
> Since you asked :-)
>
> I may not have mentioned this but the style conventions for mpl code
> are
>
> functions : lower or lower_score_separated
> variables and attributes : lower or lowerUpper
> classes : Upper or MixedUpper
OK
> Also, I am not too fond of the dict of dicts -- why not use variable
> names?
I used a dict of dicts because this allowed me to generate separate
pickle files (for each one of the dicts in the top-level dict) and
anything else (see the final script) by their corresponding top-level
dict name. I thought it was better, for practical/speed reasons, to
have a separate pickle file for every dict.
> for line in file(fname):
> if line[:2]!=' 0': continue # using continue avoids unnecessary indent
Thanks for the tip!
> uninum = line[2:6].strip().lower()
> type1name = line[12:37].strip()
> texname = line[83:110].strip()
>
> uninum = int(uninum, 16)
I thought that the idea was to allow users to write unicode strings
directly in TeX (OK, this isn't much of an excuse :). That's why I
used the eval approach, to get the dict keys (or values) to be unicode
strings. I'm also aware that indexing by ints is faster, and that the
underlying FT2 functions work with ints... OK, I'm now convinced that
your approach is better :)
> pickle.dump((uni2type1, type12uni, uni2tex, tex2uni), file('unitex.pcl','w'))
>
> # An example
> unichar = int('00d7', 16)
> print uni2tex.get(unichar)
> print uni2type1.get(unichar)
>
> Also, I am a little hesitant to use pickle files for the final
> mapping. I suggest you write a script that generates the Python code
> containing the dictionaries you need (that is how much of
> _mathtext_data was generated).
The reason why I used pickle - from the Python docs:
=====
Strings can easily be written to and read from a file. Numbers take a
bit more effort, since the read() method only returns strings, which
will have to be passed to a function like int(), which takes a string
like '123' and returns its numeric value 123. However, when you want
to save more complex data types like lists, dictionaries, or class
instances, things get a lot more complicated.
Rather than have users be constantly writing and debugging code to
save complicated data types, Python provides a standard module called
pickle. This is an amazing module that can take almost any Python
object (even some forms of Python code!), and convert it to a string
representation; this process is called pickling. Reconstructing the
object from the string representation is called unpickling. Between
pickling and unpickling, the string representing the object may have
been stored in a file or data, or sent over a network connection to
some distant machine.
=====
So I thought that pickling was the obvious way to go. And, of course,
unpickling with cPickle is very fast. I also think that no human being
should change the automatically generated dicts. Rather, we should put
a separate Python file (i.e. _mathtext_manual_data.py) where anybody
who wants to manually override the automatically generated values, or
add new (key, value) pairs, can do so.
The idea:

_mathtext_manual_data.py:
=======
uni2tex = {key1: value1, key2: value2}
tex2uni = {}
uni2type1 = {}
type12uni = {}

uni2tex.py:
=======
from cPickle import load

uni2tex = load(open('uni2tex.pcl'))
try:
    import _mathtext_manual_data
    uni2tex.update(_mathtext_manual_data.uni2tex)
except (TypeError, SyntaxError):  # Just these exceptions should be raised
    raise
except:  # All other exceptions should be silent
    pass
=====
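The override pattern sketched above, auto-generated defaults merged with hand-maintained entries, can be illustrated in modern Python. The dict contents below are placeholders, and the manual module is simulated by a plain dict:

```python
# Sketch of the proposed override mechanism: start from the
# auto-generated mapping, then let hand-maintained entries
# replace or extend it via dict.update().
generated_uni2tex = {0x00D7: r'\times', 0x00F7: r'\div'}  # placeholder data

# Entries a human wants to force, as _mathtext_manual_data.py would hold
manual_uni2tex = {0x00D7: r'\cdot'}  # hypothetical override

uni2tex = dict(generated_uni2tex)   # copy so the generated data stays pristine
uni2tex.update(manual_uni2tex)      # manual entries win on key collisions

print(uni2tex[0x00D7])  # \cdot  (overridden)
print(uni2tex[0x00F7])  # \div   (untouched)
```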
Finally, I added lines for automatically generating pretty much
everything that can be automatically generated:
stix-tbl2py.py
=======
'''A script for seamlessly copying the data from the stix-tbl.ascii*
file to a set of python dicts. Dicts are then pickled to corresponding
files, for later retrieval.

Currently used table file:
http://www.ams.org/STIX/bnb/stix-tbl.ascii-2005-09-24
'''
import pickle

tablefilename = 'stix-tbl.ascii-2005-09-24'
dictnames = ['uni2type1', 'type12uni', 'uni2tex', 'tex2uni']
dicts = {}
# initialize the dicts
for name in dictnames:
    dicts[name] = {}

for line in file(tablefilename):
    if line[:2] != ' 0': continue
    uninum = int(line[2:6].strip().lower(), 16)
    type1name = line[12:37].strip()
    texname = line[83:110].strip()
    if type1name:
        dicts['uni2type1'][uninum] = type1name
        dicts['type12uni'][type1name] = uninum
    if texname:
        dicts['uni2tex'][uninum] = texname
        dicts['tex2uni'][texname] = uninum

template = '''# Automatically generated file.
from cPickle import load
%(name)s = load(open('%(name)s.pcl'))
try:
    import _mathtext_manual_data
    %(name)s.update(_mathtext_manual_data.%(name)s)
except (TypeError, SyntaxError):  # Just these exceptions should be raised
    raise
except:  # All other exceptions should be silent
    pass
'''

# pickling the dicts to corresponding .pcl files
# automatically generating .py module files, used by importers
for name in dictnames:
    pickle.dump(dicts[name], open(name + '.pcl', 'w'))
    file(name + '.py', 'w').write(template % {'name': name})

# An example
from uni2tex import uni2tex
from uni2type1 import uni2type1

unichar = u'\u00d7'
uninum = ord(unichar)
print uni2tex[uninum]
print uni2type1[uninum]
Cheers,
Edin
From: John H. <jdh...@ac...> - 2006-06-23 13:34:40
>>>>> "Edin" == Edin Salković <edi...@gm...> writes:

 Edin> I thought that the idea was to allow users to write unicode
 Edin> strings directly in TeX (OK, this isn't much of an excuse
No, this is not the reason. Someone may want to do that one day so it
is good to keep the possibility in the back of your mind. The point
of this work is to decouple mathtext from the bakoma fonts. Right now
the mathtext data has a hard mapping from texnames->bakoma glyph info.
By setting up the encoding from texnames->unicode, then with a little
more work we can use any set of fonts that provide the unicode names.
Once we jettison bakoma, we can get nicer glyphs and kerning with a
decent set of fonts. Once we have that, we can work on the layout
algorithms.
JDH
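The decoupling John describes, a font-independent texname -> unicode stage composed with a per-font unicode -> glyph stage, can be sketched as follows. All font names and glyph identifiers here are hypothetical:

```python
# Sketch of the two-stage lookup: texname -> unicode is shared by all
# fonts; each font then supplies its own unicode -> glyph mapping.
tex2uni = {r'\times': 0x00D7}                  # font-independent stage
font_glyphs = {                                # hypothetical per-font stage
    'SomeUnicodeFont': {0x00D7: 'glyph#712'},
    'AnotherFont':     {0x00D7: 'glyph#9'},
}

def glyph_for(texname, fontname):
    """Resolve a TeX symbol to a glyph in the chosen font."""
    return font_glyphs[fontname][tex2uni[texname]]

print(glyph_for(r'\times', 'SomeUnicodeFont'))  # glyph#712
print(glyph_for(r'\times', 'AnotherFont'))      # glyph#9
```

Swapping fonts then only means swapping the second-stage table; the TeX-side mapping never changes.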
From: John H. <jdh...@ac...> - 2006-06-23 13:29:18
>>>>> "Edin" == Edin Salković <edi...@gm...> writes:

 Edin> The reason why I used pickle - from the Python docs: =====
I have had bad experiences in the past with pickle files created with
one version that don't load with another. I don't know if that is a
common problem or if others have experienced it, but it has made me
wary of them for mpl, where we work across platforms and python
versions. Maybe this concern is unfounded. I still do not understand
what the downside is of simply creating a dictionary in a python
module as we do with latex_to_bakoma.
JDH
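The alternative John suggests, emitting the dictionary as Python source and importing it, can be sketched like this in modern Python (the data and module name are placeholders):

```python
# Sketch of generating a Python module instead of a pickle: the data
# survives as readable source and loads back with a plain import.
import importlib.util
import os
import tempfile

tex2uni = {r'\times': 0x00D7, r'\div': 0x00F7}  # placeholder data

# Render the dict as Python source text.
lines = ['tex2uni = {']
for key, value in sorted(tex2uni.items()):
    lines.append('    %r: 0x%04X,' % (key, value))
lines.append('}')
source = '\n'.join(lines)

# Write the generated module and import it back.
path = os.path.join(tempfile.mkdtemp(), '_generated_data.py')
with open(path, 'w') as f:
    f.write(source)

spec = importlib.util.spec_from_file_location('_generated_data', path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

print(mod.tex2uni == tex2uni)  # True: round-trips with no pickle involved
```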
From: Fernando P. <fpe...@gm...> - 2006-06-23 13:43:46
On 6/23/06, John Hunter <jdh...@ac...> wrote:
> >>>>> "Edin" == Edin Salković <edi...@gm...> writes:
> Edin> The reason why I used pickle - from the Python docs: =====
>
>
> I have had bad experiences in the past with pickle files created with
> one version that don't load with another. I don't know if that is a
> common problem or if others have experienced it, but it has made me
> wary of them for mpl, where we work across platforms and python
> versions. Maybe this concern is unfounded. I still do not understand
> what the downside is of simply creating a dictionary in a python
> module as we do with latex_to_bakoma.
The most common way pickle breaks is when you pickle an instance and
later modify the class it belongs to such that some attribute
disappears or is renamed. Since pickling works by 'fully qualified
name', meaning that it only saves the name of the class and the
instance data, but it doesn't actually save the original class, in
this scenario the pickle can't be unpickled since there are
attributes that the new class doesn't have anymore.
If you are strictly pickling data in one of the builtin python types,
you are /probably/ OK, as I don't see python removing attributes from
dicts, and the builtin data types don't really have any special
instance attributes with much metadata that can change.
But it's still true that there's a window for problems with pickle
that simply isn't there with a pure auto-generated source module. And
the speed argument is, I think, moot: when you import something, Python
marshals the source into binary bytecode using something which I think
is quite similar to cPickle, and probably just as fast (if not faster,
since marshal is simpler than pickle). I'm not 100% sure on the
details of bytecode marshalling, so please correct me if this part is
wrong.
HTH,
f
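Fernando's point that plain builtin data is the safe case for serialization can be checked with a quick round-trip through both stdlib modules:

```python
# Quick check that both marshal and pickle round-trip a plain dict of
# builtin types, which is all the symbol tables contain.
import marshal
import pickle

tex2uni = {r'\times': 0x00D7, r'\div': 0x00F7}

via_marshal = marshal.loads(marshal.dumps(tex2uni))
via_pickle = pickle.loads(pickle.dumps(tex2uni))

print(via_marshal == tex2uni)  # True
print(via_pickle == tex2uni)   # True
```

Note that marshal's format is explicitly tied to the Python version, which is fine for import-time caches but is another reason not to ship it as a data interchange format.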
From: <edi...@gm...> - 2006-06-23 21:28:12
Thanks John and Fernando,

You're right. I'll change the scripts to generate pure Python modules,
but I'll leave the "manual" module.

As for Unicode, I fully understand what you mean John, and I'm planning
to try to get mathtext to work with the fonts I mentioned to you a
while ago:
http://canopus.iacp.dvo.ru/~panov/cm-unicode/

Although they have almost no pure math characters (like integrals
etc.), at least they'll be useful for testing the module. They have
some very exotic characters. The maintainer said that, if I (or
anybody) want to, I can send him patches for the math symbols (not for
this SoC :).

Edin
From: <edi...@gm...> - 2006-06-24 00:12:14
Look what happened to my beautiful code :(

'''A script for seamlessly copying the data from the stix-tbl.ascii*
file to a set of python dicts. Dicts are then saved to corresponding
.py files, for later retrieval.

Currently used table file:
http://www.ams.org/STIX/bnb/stix-tbl.ascii-2005-09-24
'''
tablefilename = 'stix-tbl.ascii-2005-09-24'
dictnames = ['uni2type1', 'type12uni', 'uni2tex', 'tex2uni']
dicts = {}
# initialize the dicts
for name in dictnames:
    dicts[name] = {}

for line in file(tablefilename):
    if line[:2] != ' 0': continue
    uninum = int(line[2:6].strip().lower(), 16)
    type1name = line[12:37].strip()
    texname = line[83:110].strip()
    if type1name:
        dicts['uni2type1'][uninum] = type1name
        dicts['type12uni'][type1name] = uninum
    if texname:
        dicts['uni2tex'][uninum] = texname
        dicts['tex2uni'][texname] = uninum

template = '''# Automatically generated file.
# Don't edit this file. Edit _mathtext_manual_data.py instead.
%(name)s = {%(pairs)s
}
try:
    from _mathtext_manual_data import _%(name)s
    %(name)s.update(_%(name)s)
except (TypeError, SyntaxError):  # Just these exceptions should be raised
    raise
except:  # All other exceptions should be silent. Even ImportError
    pass
'''

# automatically generating .py module files, used by importers
for name in ('uni2type1', 'uni2tex'):
    pairs = ''
    for key, value in dicts[name].items():
        value = value.replace("'", "\\'")
        value = value.replace('"', '\\"')
        pair = "%(key)i : r'%(value)s',\n" % locals()
        pairs += pair
    file(name + '.py', 'w').write(template % {'name': name, 'pairs': pairs})

for name in ('type12uni', 'tex2uni'):
    pairs = ''
    for key, value in dicts[name].items():
        key = key.replace("'", "\\'")
        key = key.replace('"', '\\"')
        pair = "r'%(key)s' : %(value)i,\n" % locals()
        pairs += pair
    file(name + '.py', 'w').write(template % {'name': name, 'pairs': pairs})

# An example
from uni2tex import uni2tex
from uni2type1 import uni2type1

unichar = u'\u00d7'
uninum = ord(unichar)
print uni2tex[uninum]
print uni2type1[uninum]