Message 139510 - Python tracker

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vinay.sajip
Recipients	Arfrever, amaury.forgeotdarc, benjamin.peterson, djc, georg.brandl, grubert, pitrou, vinay.sajip
Date	2011年06月30日.19:12:50
SpamBayes Score	0.0
Marked as misclassified	No
Message-id	<1309461169.39295.YahooMailRC@web25805.mail.ukl.yahoo.com>
In-reply-to	<1309458142.65.0.160576633248.issue12291@psf.upfronthosting.co.za>

Content
> Antoine Pitrou <pitrou@free.fr> added the comment: > > I think the question is: will the slowdown apply to module import, or only to >explicit uses of the marshal module? If the latter, then I think we can live >with it - we discourage using marshal as a general-purpose serialization scheme >anyway. Thanks for reviewing the patch. As to your point - agreed, and as the marshal code is completely broken now, something that works is an improvement, even if it's slower than optimal. If that turns out to be a problem in practice, it can be fixed. > As for the patch: > - why do the tests have to carry a large chunk of binary data? if some data is >needed, at least it would be nice if it were kept small Agreed, I just used the initial file where I had the problem - didn't know where the problem was, initially. Will look at reducing this. > - not sure either why it needs zlib and base64; just use the repr() of the >bytestring The original data was 53K, so just the repr of the bytestring of the raw data is around 150K. If I zip the data, it's about 4K, but the repr of that data is around 12K. The base64/zlibbed version is 5K. This will be less of an issue when the test data size is reduced. > - "assertEqual = self.assertEqual" is more of a nuisance than anything else; >you're making the code more complicated to read just to save a few keystrokes; >same for "assertIsInstance = self.assertIsInstance" I'm not sure how it's more complicated or harder to understand. I didn't do it just to save keystrokes when writing, it's also to avoid noise when reading. IMO this is a question of personal taste, or is this approach proscribed somewhere? I've certainly seen it in other Python code. > - can you add a comment next to the fields you're adding to the marshal C >struct? Yes, will do. > - if the "r_byte" macro isn't used anymore, it should be removed, not >commented out The commenting out is a temporary measure, the comment will be removed before commit. > - in r_string(), what happens if PyBytes_Check(data) is false? there should >probably be a TypeError of some sort There is a PyBytes_Check in marshal_load which raises a TypeError if it fails. The PyBytes_Check in r_string is only called in the path from marshal_load (because p->readable is only non-NULL in that path) - so for the failure to occur one would need an initial read to deliver bytes, and a subsequent read to deliver non-bytes. Still, I have no problem adding a TypeError raising in r_string if PyBytes_Check fails. > - in r_string() again, if data is NULL after the read() call, the exception >shouldn't be shadowed by an EOFError Good point, I aim to fix this by changing the lower condition to if (!PyErr_Occurred() && (read < n)) { ... } which should be sufficient - do you agree? > - why do you call read(1) at the beginning? it seems to make the code more >complicated for no useful purpose I gave the reasoning in the comment just above the read(1). I agree it makes it a little more complicated, but not onerously so - and the approach fails fast as well as allowing operation with any stream, not just random-seekable ones.

Content

> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
> I think the question is: will the slowdown apply to module import, or only to 
>explicit uses of the marshal module? If the latter, then I think we can live 
>with it - we discourage using marshal as a general-purpose serialization scheme 
>anyway.
Thanks for reviewing the patch.
As to your point - agreed, and as the marshal code is completely broken now, 
something that works is an improvement, even if it's slower than optimal. If 
that turns out to be a problem in practice, it can be fixed.
> As for the patch:
> - why do the tests have to carry a large chunk of binary data? if some data is 
>needed, at least it would be nice if it were kept small
Agreed, I just used the initial file where I had the problem - didn't know where 
the problem was, initially. Will look at reducing this.
> - not sure either why it needs zlib and base64; just use the repr() of the 
>bytestring
The original data was 53K, so just the repr of the bytestring of the raw data is 
around 150K. If I zip the data, it's about 4K, but the repr of that data is 
around 12K. The base64/zlibbed version is 5K.
This will be less of an issue when the test data size is reduced.
> - "assertEqual = self.assertEqual" is more of a nuisance than anything else; 
>you're making the code more complicated to read just to save a few keystrokes; 
>same for "assertIsInstance = self.assertIsInstance"
I'm not sure how it's more complicated or harder to understand. I didn't do it 
just to save keystrokes when writing, it's also to avoid noise when reading. IMO 
this is a question of personal taste, or is this approach proscribed somewhere? 
I've certainly seen it in other Python code.
> - can you add a comment next to the fields you're adding to the marshal C 
>struct?
Yes, will do.
> - if the "r_byte" macro isn't used anymore, it should be removed, not 
>commented out
The commenting out is a temporary measure, the comment will be removed before 
commit.
> - in r_string(), what happens if PyBytes_Check(data) is false? there should 
>probably be a TypeError of some sort
There is a PyBytes_Check in marshal_load which raises a TypeError if it fails. 
The PyBytes_Check in r_string is only called in the path from marshal_load 
(because p->readable is only non-NULL in that path) - so for the failure to 
occur one would need an initial read to deliver bytes, and a subsequent read to 
deliver non-bytes. Still, I have no problem adding a TypeError raising in 
r_string if PyBytes_Check fails.
> - in r_string() again, if data is NULL after the read() call, the exception 
>shouldn't be shadowed by an EOFError
Good point, I aim to fix this by changing the lower condition to
if (!PyErr_Occurred() && (read < n)) { ... }
which should be sufficient - do you agree?
> - why do you call read(1) at the beginning? it seems to make the code more 
>complicated for no useful purpose
I gave the reasoning in the comment just above the read(1). I agree it makes it 
a little more complicated, but not onerously so - and the approach fails fast as 
well as allowing operation with any stream, not just random-seekable ones.

History
Date	User	Action	Args
2011年06月30日 19:12:52	vinay.sajip	set	recipients: + vinay.sajip, georg.brandl, amaury.forgeotdarc, grubert, pitrou, benjamin.peterson, djc, Arfrever
2011年06月30日 19:12:50	vinay.sajip	link	issue12291 messages
2011年06月30日 19:12:50	vinay.sajip	create

homepage