SourceForge logo
SourceForge logo
Menu

matplotlib-devel

From: Andrew S. <str...@as...> - 2007年12月16日 23:26:28
As a followup to the work on floats, I fixed rec2csv to deal with funky 
strings (strings with commas and quotes). I've checked this is as r4749, 
but I thought I'd announce here in case someone (John, in particular) 
was depending on some peculiar aspect of the old implementation. If 
there is something you depend on, can you update the unittests in 
unit/mlab_unit.py to check for the behavior you need?
Also, John, I didn't see mlab.FormatString being used anywhere, but I 
didn't want to change it, either. So I made FormatString2 and used it. 
But if there's no known use of FormatString, lets kill the original and 
more FormatString2 into its place.
-Andrew
From: John H. <jd...@gm...> - 2007年12月17日 04:10:28
On Dec 16, 2007 5:26 PM, Andrew Straw <str...@as...> wrote:
> As a followup to the work on floats, I fixed rec2csv to deal with funky
> strings (strings with commas and quotes). I've checked this is as r4749,
> but I thought I'd announce here in case someone (John, in particular)
> was depending on some peculiar aspect of the old implementation. If
> there is something you depend on, can you update the unittests in
> unit/mlab_unit.py to check for the behavior you need?
>
> Also, John, I didn't see mlab.FormatString being used anywhere, but I
> didn't want to change it, either. So I made FormatString2 and used it.
> But if there's no known use of FormatString, lets kill the original and
> more FormatString2 into its place.
I am happy to leave it as is, and fix it if I bump into anything.
There was a use case at work that made me quote all the strings with "
but I am not sure what it is right now so I will test with your
version tomorrow and see if I can find any problems.
There is one more thing I'd like to clean up in this code, and that is
the handling of date and datetime. The current implementation will
return datetime regardless of whether the string looks like a date or
a datetime. Thus it would fail a roundtrip tests if you had a date
field. I may make some changes tomorrow to support date or datetime.
What do you think of making checkrows=0 the default for csv2rec? It
is slower, but mostly guaranteed to do the right thing, whereas
checkrows=5 which is the current default can often fail if the first 5
rows match some type but later rows do not, and may lead to confusion
among new users.
JDH
From: Fernando P. <fpe...@gm...> - 2007年12月17日 06:44:33
On Dec 16, 2007 9:10 PM, John Hunter <jd...@gm...> wrote:
> On Dec 16, 2007 5:26 PM, Andrew Straw <str...@as...> wrote:
>
> > As a followup to the work on floats, I fixed rec2csv to deal with funky
> > strings (strings with commas and quotes). I've checked this is as r4749,
> > but I thought I'd announce here in case someone (John, in particular)
> > was depending on some peculiar aspect of the old implementation. If
> > there is something you depend on, can you update the unittests in
> > unit/mlab_unit.py to check for the behavior you need?
> >
> > Also, John, I didn't see mlab.FormatString being used anywhere, but I
> > didn't want to change it, either. So I made FormatString2 and used it.
> > But if there's no known use of FormatString, lets kill the original and
> > more FormatString2 into its place.
>
> I am happy to leave it as is, and fix it if I bump into anything.
> There was a use case at work that made me quote all the strings with "
> but I am not sure what it is right now so I will test with your
> version tomorrow and see if I can find any problems.
>
> There is one more thing I'd like to clean up in this code, and that is
> the handling of date and datetime. The current implementation will
> return datetime regardless of whether the string looks like a date or
> a datetime. Thus it would fail a roundtrip tests if you had a date
> field. I may make some changes tomorrow to support date or datetime.
I'm CCing Chris Burns, from Berkeley here, but it would be great if
you post a note of these changes on the numpy list. One of the things
to come out of the sprint was a lot of work on numpy i/o, and that
includes looking into integrating much of the MPL facilities. So it
would be great if you keep the numpy team posted, to make sure they
get your latest code.
Cheers.
>
> What do you think of making checkrows=0 the default for csv2rec? It
> is slower, but mostly guaranteed to do the right thing, whereas
> checkrows=5 which is the current default can often fail if the first 5
> rows match some type but later rows do not, and may lead to confusion
> among new users.
From: John H. <jd...@gm...> - 2007年12月17日 04:44:04
On Dec 16, 2007 5:26 PM, Andrew Straw <str...@as...> wrote:
> As a followup to the work on floats, I fixed rec2csv to deal with funky
> strings (strings with commas and quotes). I've checked this is as r4749,
> but I thought I'd announce here in case someone (John, in particular)
> was depending on some peculiar aspect of the old implementation. If
> there is something you depend on, can you update the unittests in
> unit/mlab_unit.py to check for the behavior you need?
OK, I found one problem with repr. When adding support for date vs
datetime to the csv2rec roundtrip, and adding this to the unit test, I
notice that repr is probably not what we want for datetime; str makes
more sense here:
In [1]: import datetime
In [2]: d = datetime.date.today()
In [3]: repr(d)
Out[3]: 'datetime.date(2007, 12, 16)'
In [4]: str(d)
Out[4]: '2007-12-16'
The latter is more natural in the CSV file, and the repr version is
not supported by dateutil, at least not the one we are shipping:
In [5]: import dateutil.parser
In [6]: dateutil.parser.parse(repr(d))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/Users/jdhunter/python/svn/matplotlib/examples/<ipython console> in <module>()
...snip the rest of the traceback...
In [7]: dateutil.parser.parse(str(d))
Out[7]: datetime.datetime(2007, 12, 16, 0, 0)
So I changed FormatObject to use str, pending further discussion. At
least for my common use cases, the only obj types I have in my record
arrays are dates and datetimes, and I find this to be a pretty
compelling use case since it is the type least likely to be supported
by other persistence methods (tostring and pickle both fail or do not
behave as expected with datetimes in the recarray).
But there is an oddity in the parsing of milliseconds which is causing
the updated unit test to fail; the code below illustrates the problem:
In [3]: import dateutil.parser
In [4]: import datetime
In [5]: s = '2007-12-18 22:29:34.924122'
In [6]: dateutil.parser.parse(s)
Out[6]: datetime.datetime(2007, 12, 18, 22, 29, 34, 924121)
Thoughts?
JDH
From: Andrew S. <str...@as...> - 2007年12月18日 08:53:44
Thanks for looking at this. From the unittest that you checked in, it 
appears you've succeeded in getting dates and datetimes to work, modulo 
the apparent datetime precision buglet you mention below.
My understanding with repr() is that, in general, it should attempt to 
support lossless copying of data with the eval() function. It does 
appear that dateutils does this -- repr() produces a string that looks 
like what you'd use to call the constructor. (Numpy scalars don't follow 
this idiom, and it's doubtful they could change without breaking a lot 
of code.) I also agree that for a CSV file, something like the str() on 
date and datetime objects makes most sense. This is all just to say that 
the date and datetime stuff looks good to my eye, but you're using that 
stuff a lot more than me. Finally, from your example, it does look like 
there's a bug somewhere in the datetime code. Since I don't use them, I 
leave that to you...
Finally, as far as the checkrows -- I must admit that I wasn't even 
aware of its existence until you asked the question. (That's the problem 
with writing code that just works - no one notices it!) Now that you've 
posed the question, I suppose I'm in favor a default of 0 (which 
actually means infinity). I suppose no one in their right mind is going 
to use CSV files to store gigabytes of data, so parsing the whole thing 
for consistency seems like a worthwhile expenditure of cycles. (And if 
they're thinking about it, maybe the slowness with dissuade them, or at 
least cause them to read the docstrings! :)
John Hunter wrote:
> On Dec 16, 2007 5:26 PM, Andrew Straw <str...@as...> wrote:
>
> 
>> As a followup to the work on floats, I fixed rec2csv to deal with funky
>> strings (strings with commas and quotes). I've checked this is as r4749,
>> but I thought I'd announce here in case someone (John, in particular)
>> was depending on some peculiar aspect of the old implementation. If
>> there is something you depend on, can you update the unittests in
>> unit/mlab_unit.py to check for the behavior you need?
>> 
>
> OK, I found one problem with repr. When adding support for date vs
> datetime to the csv2rec roundtrip, and adding this to the unit test, I
> notice that repr is probably not what we want for datetime; str makes
> more sense here:
>
> In [1]: import datetime
>
> In [2]: d = datetime.date.today()
>
> In [3]: repr(d)
> Out[3]: 'datetime.date(2007, 12, 16)'
>
> In [4]: str(d)
> Out[4]: '2007-12-16'
>
> The latter is more natural in the CSV file, and the repr version is
> not supported by dateutil, at least not the one we are shipping:
>
> In [5]: import dateutil.parser
>
> In [6]: dateutil.parser.parse(repr(d))
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
>
> /Users/jdhunter/python/svn/matplotlib/examples/<ipython console> in <module>()
>
> ...snip the rest of the traceback...
>
> In [7]: dateutil.parser.parse(str(d))
> Out[7]: datetime.datetime(2007, 12, 16, 0, 0)
>
>
> So I changed FormatObject to use str, pending further discussion. At
> least for my common use cases, the only obj types I have in my record
> arrays are dates and datetimes, and I find this to be a pretty
> compelling use case since it is the type least likely to be supported
> by other persistence methods (tostring and pickle both fail or do not
> behave as expected with datetimes in the recarray).
>
> But there is an oddity in the parsing of milliseconds which is causing
> the updated unit test to fail; the code below illustrates the problem:
>
> In [3]: import dateutil.parser
>
> In [4]: import datetime
>
> In [5]: s = '2007-12-18 22:29:34.924122'
>
> In [6]: dateutil.parser.parse(s)
> Out[6]: datetime.datetime(2007, 12, 18, 22, 29, 34, 924121)
>
> Thoughts?
>
> JDH
>
> -------------------------------------------------------------------------
> SF.Net email is sponsored by:
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services
> for just about anything Open Source.
> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
> _______________________________________________
> Matplotlib-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
> 
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.
Thanks for helping keep SourceForge clean.
X





Briefly describe the problem (required):
Upload screenshot of ad (required):
Select a file, or drag & drop file here.
Screenshot instructions:

Click URL instructions:
Right-click on the ad, choose "Copy Link", then paste here →
(This may not be possible with some types of ads)

More information about our ad policies

Ad destination/click URL:

AltStyle によって変換されたページ (->オリジナル) /