Message 61740 - Python tracker

In-reply-to
Author	lemburg
Recipients	lemburg, orivej, pitrou
Date	2008-01-27.16:03:53
Message-id	<1201449836.03.0.994558797943.issue1943@psf.upfronthosting.co.za>

Content

I don't really see the connection with #1629305. 
An optimization that would be worth checking is hooking up the
Py_UNICODE pointer to interned Unicode objects if the contents match
(e.g. do a quick check based on the hash value and then a memcmp). That
would save memory and the call to the pymalloc allocator.
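The lookup described above can be modeled in a few lines of Python. This is only a rough sketch of the idea, not the CPython implementation: `InternPool`, `add` and `lookup` are hypothetical names, `bytes` objects stand in for Py_UNICODE buffers, and the `==` comparison on `bytes` plays the role of the memcmp.

```python
class InternPool:
    """Toy model of an interned-string table, bucketed by hash value."""

    def __init__(self):
        # hash value -> list of interned buffers with that hash
        self._buckets = {}

    def add(self, data: bytes) -> None:
        """Register a buffer as interned (hypothetical helper)."""
        self._buckets.setdefault(hash(data), []).append(data)

    def lookup(self, data: bytes):
        """Return an interned buffer with the same contents, or None.

        The hash selects a bucket cheaply; the full comparison
        (the memcmp analogue) only runs on candidates in that bucket.
        """
        for interned in self._buckets.get(hash(data), ()):
            if len(interned) == len(data) and interned == data:
                return interned
        return None


pool = InternPool()
first = "spam".encode("utf-32-le")
pool.add(first)

# A second buffer with equal contents can be dropped in favor of the
# interned one, which is where the memory saving would come from.
second = "spam".encode("utf-32-le")
shared = pool.lookup(second)
```

On a hit, the caller would point at the interned buffer instead of allocating its own copy; on a miss it would allocate as usual.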
Another strategy could involve a priority-queue-style cache with the aim
of identifying frequently used Unicode strings and then reusing them.
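One way such a cache might look, sketched in Python. The message does not specify a design, so everything here is an assumption: the `FrequencyCache` name, the fixed capacity, and the eviction rule (replace the least-used cached string once a newcomer has been seen more often).

```python
import heapq
from collections import Counter


class FrequencyCache:
    """Keeps the most frequently requested strings (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._counts = Counter()   # how often each string was requested
        self._cached = {}          # string -> the shared instance

    def get(self, s: str) -> str:
        self._counts[s] += 1
        if s in self._cached:
            return self._cached[s]          # reuse the cached object
        if len(self._cached) < self.capacity:
            self._cached[s] = s
            return s
        # Cache is full: evict the coldest entry only if s is now hotter.
        coldest = min(self._cached, key=self._counts.__getitem__)
        if self._counts[s] > self._counts[coldest]:
            del self._cached[coldest]
            self._cached[s] = s
        return s

    def hottest(self, n: int):
        """Return the n most requested strings with their counts."""
        return heapq.nlargest(n, self._counts.items(), key=lambda kv: kv[1])


cache = FrequencyCache(capacity=2)
for name in ["a", "a", "a", "b", "c", "c"]:
    cache.get(name)
```

After this run the cache holds "a" and "c": "b" was evicted once "c" had been requested more often.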
This could also be enhanced using an offline approach: you first run an
application with an instrumented Python interpreter to find the most
frequently used strings and then pre-fill the cache or interned dictionary
on the production Python interpreter at startup time.
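The two phases of the offline approach can be approximated at the Python level. This is a sketch under stated assumptions: the real proposal instruments the interpreter itself, whereas here the observed strings are simply passed in as a list, the profile is a JSON file, and `record_profile`, `preload` and `demo` are hypothetical names. Only `sys.intern`, `json` and `collections.Counter` are real standard-library APIs.

```python
import json
import os
import sys
import tempfile
from collections import Counter


def record_profile(observed, path, top_n):
    """Profiling run: write the top_n most frequent strings to path."""
    hottest = [s for s, _ in Counter(observed).most_common(top_n)]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(hottest, fh)
    return hottest


def preload(path):
    """Production startup: intern every string listed in the profile."""
    with open(path, encoding="utf-8") as fh:
        return [sys.intern(s) for s in json.load(fh)]


def demo():
    """Run both phases against a temporary profile file."""
    observed = ["id", "id", "id", "name", "name", "tmp"]
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        record_profile(observed, path, top_n=2)
        return preload(path)
    finally:
        os.remove(path)


preloaded = demo()
```

Strings created later with the same contents resolve to the preloaded objects when passed through `sys.intern`.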
Coming from a completely different angle, you could also use the
Py_UNICODE pointer to share slices of a larger data buffer. A Unicode
sub-type could handle this case, keeping a PyObject* reference to the
larger buffer, so that the buffer doesn't get garbage collected before
the Unicode slice.
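The slice-sharing idea has a close analogue in Python's `memoryview`, which can illustrate the ownership relationship. This is a minimal sketch, not the proposed sub-type: `SharedSlice` is a hypothetical class, and the buffer is assumed to hold UTF-32-LE data so that every code point occupies exactly 4 bytes.

```python
class SharedSlice:
    """A string slice that borrows from a larger buffer instead of copying."""

    WIDTH = 4  # bytes per code point in UTF-32

    def __init__(self, base: bytes, start: int, stop: int):
        # Strong reference to the owner, analogous to the PyObject*
        # that keeps the larger buffer alive.
        self.base = base
        # Slicing a memoryview shares memory; no character data is copied.
        self.view = memoryview(base)[start * self.WIDTH : stop * self.WIDTH]

    def __len__(self) -> int:
        return len(self.view) // self.WIDTH

    def __str__(self) -> str:
        # Decoding makes a copy; it is only done on demand.
        return bytes(self.view).decode("utf-32-le")


big = "hello world".encode("utf-32-le")
word = SharedSlice(big, 6, 11)
text = str(word)
```

As long as `word` exists, `big` cannot be collected, which is the guarantee the PyObject* reference is meant to provide.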
Regarding memory-constrained environments: these should simply switch
off all free lists and pymalloc. OTOH, even mobile phones come with
gigabytes of RAM nowadays, so it's not really worth the trouble, IMHO.

