In some examples, I have been moving example functions and data into a module, so that they can be run from anywhere. Many other examples still rely on a relative path in the examples dir. Eg, if I go to the gallery, download the source for the axes grid toolkit example simple_rgb.py, and try to run it from my desktop, I get the error "no module named demo_image". While I know how to get the data, a naive user will not. So in some examples I have been adopting this approach, eg in examples/pylab_examples/scatter_demo2.py:

    import matplotlib
    datafile = matplotlib.get_example_data('goog.npy')

These examples will run anywhere mpl is installed. Another approach would be to write a version of get_example_data that checks locally for a datafile and, if it is not where you expect it to be, attempts a urlretrieve to a temp file.

The gallery is becoming the go-to place for most users of the website, and I would like as many examples as possible to run after a simple download to the desktop. I am sensitive to packagers who may not want to ship large amounts of data w/ the main library, so we may want to minimize the amount we ship in mpl-data, which matplotlib.get_example_data uses, but it may be a good idea to set up a new svn directory at the top level (mpl_data) and write a urllib-enabled matplotlib.get_example_data that fetches it from the repo if it can't find it locally.

JDH
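The "check locally, else urlretrieve to a temp file" idea can be sketched roughly as follows. This is only an illustration written for Python 3, not the code that went into matplotlib: the function name find_example_data and its local_dir and base_url parameters are hypothetical.

```python
import os
import tempfile
from urllib.request import urlretrieve


def find_example_data(fname, local_dir, base_url):
    """Return a path to fname, preferring a local copy over a download.

    Hypothetical helper: local_dir and base_url are assumptions of this
    sketch, not part of any matplotlib API.
    """
    local_path = os.path.join(local_dir, fname)
    if os.path.exists(local_path):
        return local_path

    # Not found locally: download into a temporary file and return its path.
    suffix = os.path.splitext(fname)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    urlretrieve(base_url.rstrip('/') + '/' + fname, tmp_path)
    return tmp_path
```

Because the download goes to a temp file, nothing is cached between runs; every run without a local copy hits the network again.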
On Fri, Jul 31, 2009 at 1:10 PM, John Hunter<jd...@gm...> wrote:
> The gallery is becoming the go-to place for most users of the website,
> and I would like as many examples as possible to run after a simple
> download to the desktop. I am sensitive to packagers who may not
> want to ship large amounts of data w/ the main library, so we may want
> to minimize the amount we ship in mpl-data which
> matplotlib.get_example_data uses, but it may be a good idea to set up a
> new svn directory at the top level (mpl_data) and write a urllib
> enabled matplotlib.get_example_data that fetches it from the repo if
> it can't find it locally.

OK, I committed a first pass at this to HEAD. I created a new svn directory called mpl_data

    svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/mpl_data

and a cbook.get_mpl_data function, as used in this example::

    import matplotlib.cbook as cbook
    import matplotlib.pyplot as plt
    fname = cbook.get_mpl_data('lena.png', asfileobj=False)

    print 'fname', fname
    im = plt.imread(fname)
    plt.imshow(im)
    plt.show()

The function will check ~/.matplotlib/mpl_data and, if the file is not there, fetch it from svn HEAD using urllib, caching it in the process. It would be nice to support an svn revision (w/o relying on svn), as I note in this comment in get_mpl_data:

    # TODO: how to handle stale data in the cache that has been
    # updated from svn -- is there a clean http way to get the current
    # revision number that will not leave us at the mercy of html
    # changes at sf?

If others agree w/ the basic concept, we should port over as many data-requiring examples as we can, removing data from examples/data and lib/matplotlib/mpl-data/example as we go. This will result in smaller tarballs and binaries, and make the examples more portable.

JDH
So, I just downloaded 0.99 rc1 and wanted to play with axesgrid examples and got the results you reported below in your example. I am in fact naive, and it's not clear to me how to get around this problem of the demo_image module not being found. What is the solution?

Thanks,

Josh

John Hunter-4 wrote:
>
> In some examples, I have been moving example functions and data into a
> module, so that they can be run from anywhere. Many other examples
> still rely on a relative path in the examples dir. Eg, I go to the
> gallery and download the source for the axes grid toolkit example
> simple_rgb.py, and try to run it from my desktop, I get the error "no
> module named demo_image". While I know how to get the data, a naive
> user will not. So in some examples I have been adopting the approach,
> eg in examples/pylab_examples/scatter_demo2.py
>
>     import matplotlib
>     datafile = matplotlib.get_example_data('goog.npy')
>
> These examples will run anywhere mpl is installed. Another approach
> would be to write a version of get_example_data that checks locally for a
> datafile, and if it is not where you expect it to be, attempt a
> urlretrieve as a temp file.
>
> The gallery is becoming the go-to place for most users of the website,
> and I would like as many examples as possible to run after a simple
> download to the desktop. I am sensitive to packagers who may not
> want to ship large amounts of data w/ the main library, so we may want
> to minimize the amount we ship in mpl-data which
> matplotlib.get_example_data uses, but it may be a good idea to set up a
> new svn directory at the top level (mpl_data) and write a urllib
> enabled matplotlib.get_example_data that fetches it from the repo if
> it can't find it locally.
>
> JDH
>

-----
Josh Hemann
Statistical Advisor
http://www.vni.com/
Visual Numerics
jh...@vn...
On Tue, Aug 4, 2009 at 11:17 AM, Josh Hemann<jh...@vn...> wrote:
>
> So, I just downloaded 0.99 rc1 and wanted to play with axesgrid examples and
> got the results you reported below in your example. I am in fact naive, and
> its not clear to me how to get around this problem of the demo_image module
> not being found. What is the solution?

The solution is to get the examples directory and run it from there, where it will have the example data. Although I added support for having auto-fetched data in svn, we haven't ported the examples over to use it yet. If you have svn you can grab the examples dir with

    svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/matplotlib/examples mpl_examples

and then run the examples in their directory, eg examples/axes_grid. Then they should be able to see their data.

JDH
John Hunter <jd...@gm...> writes:

> # TODO: how to handle stale data in the cache that has been
> # updated from svn -- is there a clean http way to get the current
> # revision number that will not leave us at the mercy of html
> # changes at sf?

The mod_dav_svn server sends an ETag header that happens to contain the revision number where the file was last modified, and a Last-Modified header that contains the date of that revision. The clean http way to make use of these is to make a conditional request - I hacked up a processor class for urllib2 that does this, and checked it in.

--
Jouni K. Seppänen
http://www.iki.fi/jks
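A conditional request of the kind Jouni describes can be sketched with the standard library alone. This is not his urllib2 processor class; it is a minimal Python 3 illustration that sends the cached ETag in an If-None-Match header and treats a 304 reply as "cache is current". The name fetch_if_changed and the opener parameter (there so the function can be exercised without a network) are assumptions of this sketch.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, cached_etag=None, timeout=10,
                     opener=urllib.request.urlopen):
    """Return (data, etag); data is None if the cached copy is current.

    Hypothetical helper: the caller is assumed to store the ETag next
    to the cached file and pass it back in on the next request.
    """
    request = urllib.request.Request(url)
    if cached_etag is not None:
        # Ask the server to send the body only if the ETag has changed.
        request.add_header('If-None-Match', cached_etag)
    try:
        with opener(request, timeout=timeout) as response:
            return response.read(), response.headers.get('ETag')
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: keep using the cached data.
            return None, cached_etag
        raise
```

The same pattern works with the Last-Modified date by sending it back in an If-Modified-Since header.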
On Tue, Aug 4, 2009 at 2:45 PM, Jouni K. Seppänen<jk...@ik...> wrote:
> The mod_dav_svn server sends an ETag header that happens to contain the
> revision number where the file was last modified, and a Last-Modified
> header that contains the date of that revision. The clean http way to
> make use of these is to make a conditional request - I hacked up a
> processor class for urllib2 that does this, and checked it in.

Wow, that is really clever and cool. Nicely done. I added mpl_data/testdata.csv, which is easier to modify than lena.png, to test the revision control and it worked beautifully (examples/misc/mpl_data_test.py).

I didn't understand this part of the code:

    fn = rightmost
    while os.path.exists(self.in_cache_dir(fn)):
        fn = rightmost + '.' + str(random.randint(0,9999999))

When would there be a name clash that would require the randint appended?

Also, how hard would it be to add support for a directory structure? I see you are getting the filename from the url as the last thing past the '/'. Is there any way to generalize this so a relative path could be supported in the svn repo and local cache dir?

JDH
On Tue, Aug 4, 2009 at 2:45 PM, Jouni K. Seppänen<jk...@ik...> wrote:
> John Hunter <jd...@gm...> writes:
>
>> # TODO: how to handle stale data in the cache that has been
>> # updated from svn -- is there a clean http way to get the current
>> # revision number that will not leave us at the mercy of html
>> # changes at sf?
>
> The mod_dav_svn server sends an ETag header that happens to contain the
> revision number where the file was last modified, and a Last-Modified
> header that contains the date of that revision. The clean http way to
> make use of these is to make a conditional request - I hacked up a
> processor class for urllib2 that does this, and checked it in.

Also, it would be preferable for the returned file object to support the "seek" method. This is what cbook.to_filehandle checks for, and what mlab.csv2rec uses to rewind the file after doing an introspection pass through the data to get the data types. Eg,

    >>> import matplotlib.mlab as mlab
    >>> import matplotlib.cbook as cbook
    >>> r = mlab.csv2rec( cbook.get_mpl_data('testdata.csv') )
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/jdhunter/dev/lib/python2.6/site-packages/matplotlib/mlab.py", line 2108, in csv2rec
        fh = cbook.to_filehandle(fname)
      File "/Users/jdhunter/dev/lib/python2.6/site-packages/matplotlib/cbook.py", line 339, in to_filehandle
        raise ValueError('fname must be a string or file handle')
    ValueError: fname must be a string or file handle

Perhaps we could return a plain file handle pointing to the cached data?

JDH
On Wed, Aug 5, 2009 at 7:11 AM, John Hunter <jd...@gm...> wrote:
> >>> import matplotlib.mlab as mlab
> >>> import matplotlib.cbook as cbook
> >>> r = mlab.csv2rec( cbook.get_mpl_data('testdata.csv') )
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/jdhunter/dev/lib/python2.6/site-packages/matplotlib/mlab.py", line 2108, in csv2rec
>     fh = cbook.to_filehandle(fname)
>   File "/Users/jdhunter/dev/lib/python2.6/site-packages/matplotlib/cbook.py", line 339, in to_filehandle
>     raise ValueError('fname must be a string or file handle')
> ValueError: fname must be a string or file handle
>
> Perhaps we could return a plain file handle pointing to the cached data?

Another option is to use StringIO to create a new file-like object after read()-ing in all the data.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
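Ryan's suggestion amounts to buffering the whole response in memory and handing back an object that supports seek. A minimal Python 3 sketch, using io.BytesIO / io.StringIO in place of the Python 2 StringIO module; the helper name to_seekable is hypothetical:

```python
import io


def to_seekable(fileobj):
    """Read fileobj completely and return a seekable in-memory copy.

    Hypothetical helper: works for any object with a read() method,
    including non-seekable network responses.
    """
    data = fileobj.read()
    if isinstance(data, bytes):
        return io.BytesIO(data)
    return io.StringIO(data)
```

The trade-off is memory: the entire file is held in RAM, which is fine for small sample data but less attractive for large files than a handle on the cached file on disk.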
On Wed, Aug 5, 2009 at 7:11 AM, John Hunter<jd...@gm...> wrote:
> Perhaps we could return a plain file handle pointing to the cached data?

OK, I've made a few changes to the code, so Jouni, you will probably want to review them:

* I renamed the svn repo and function to be "sample_data" rather than "mpl_data" to avoid confusion with lib/matplotlib/mpl-data. The svn repo, the examples and the cbook function have all been renamed. The repo is ::

    svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/sample_data

  and the examples are::

    johnh@udesktop191:mpl> ls examples/misc/sam*.py
    examples/misc/sample_data_demo.py examples/misc/sample_data_test.py

* I added support for nested subdirs, so you can now do, as in examples/misc/sample_data_test.py::

    datafile = 'testdir/subdir/testsub.csv'
    fh = cbook.get_sample_data(datafile)

* I commented out the random number appending, because I do not see the use case, but we can re-add it when you enlighten me :-)

* I always return a file handle to the cached file, so seek works, and is exercised in examples/misc/sample_data_test.py

It is probably worth doing a little more work to make the processor plus the "get_sample_data" function all part of one class, so other people can reuse it with other repos and other dirs. Eg, something like the following in cbook::

    myserver = ViewVCCacheServer(mycachedir, myurlbase)
    get_sample_data = myserver.get_sample_data
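The single-class arrangement proposed above might look roughly like the following. This is a sketch under stated assumptions, not the cbook implementation: the class name SampleDataServer is hypothetical, it is written for Python 3, and it leaves out the revision checking done by the urllib2 processor. It does show the two behaviours listed in the message: nested subdirectories are mirrored in the cache, and the caller gets a real file handle on the cached file, so seek works.

```python
import os
import shutil
from urllib.request import urlopen


class SampleDataServer:
    """Fetch files below url_base and cache them under cache_dir.

    Hypothetical class; the (cache_dir, url_base) arguments follow the
    ViewVCCacheServer(mycachedir, myurlbase) call suggested in the thread.
    """

    def __init__(self, cache_dir, url_base):
        self.cache_dir = cache_dir
        self.url_base = url_base.rstrip('/') + '/'

    def get_sample_data(self, relpath, asfileobj=True):
        # Split the '/'-separated repo path so it can be mirrored locally.
        parts = [p for p in relpath.split('/') if p not in ('', '.')]
        if not parts or '..' in parts:
            raise ValueError('bad sample data path: %r' % (relpath,))
        cached = os.path.join(self.cache_dir, *parts)

        if not os.path.exists(cached):
            os.makedirs(os.path.dirname(cached), exist_ok=True)
            url = self.url_base + '/'.join(parts)
            with urlopen(url) as response, open(cached, 'wb') as out:
                shutil.copyfileobj(response, out)

        # A plain file handle on the cached copy supports seek().
        return open(cached, 'rb') if asfileobj else cached
```

With this shape, binding the module-level function is the same one-liner as in the message: create one instance and assign its get_sample_data method.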
John Hunter <jd...@gm...> writes:

> * I commented out the random number appending, because I do not see
> the use case, but we can re-add it when you enlighten me :-)

I did that in case someone wanted to retrieve files from several different locations -- my version of the cache handler was not tied to any particular base URL. Since all cached files were in one flat directory, there was the danger of filename collisions.

> * I added support for nested subdirs, so you can now do, as in
> examples/misc/sample_data_test.py::
>
> datafile = 'testdir/subdir/testsub.csv'
> fh = cbook.get_sample_data(datafile)

I think mirroring a directory structure is somewhat more complicated than caching a set of arbitrary URLs in a flat cache directory. For example, I think the remove_stale_files method will need to be changed to walk all subdirectories, and handling cases such as having a subdirectory named foo that is replaced by a file named foo could be complicated.

One thing that's still missing is off-line usage: if the user does not have net connectivity at the moment but does have the file in the cache, it should not cause an error.

Perhaps the base URL should be

    http://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/sample_data/

instead of

    http://matplotlib.svn.sourceforge.net/viewvc/matplotlib/trunk/sample_data/

to avoid dependency on the viewvc service of SourceForge.

--
Jouni K. Seppänen
http://www.iki.fi/jks
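For the flat-cache case Jouni describes, where two different URLs can end in the same filename, one alternative to a random suffix is to derive the cache name from the URL itself. This is a swapped-in technique for illustration only, not what the checked-in code did, and the function name cache_name is hypothetical:

```python
import hashlib
import posixpath
from urllib.parse import urlsplit


def cache_name(url):
    """Return a flat cache filename that is stable and unique per URL.

    Hypothetical helper: prefixes the rightmost path component with a
    short hash of the full URL, so equal basenames do not collide.
    """
    rightmost = posixpath.basename(urlsplit(url).path) or 'index'
    digest = hashlib.sha1(url.encode('utf-8')).hexdigest()[:12]
    return digest + '-' + rightmost
```

Unlike a random suffix, the same URL always maps to the same name, so the cache can be consulted without keeping a separate URL-to-filename table.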
On Sun, Aug 9, 2009 at 2:01 PM, Jouni K. Seppänen<jk...@ik...> wrote:
> I think mirroring a directory structure is somewhat more complicated
> than caching a set of arbitrary URLs in a flat cache directory. For
> example, I think the remove_stale_files method will need to be changed
> to walk all subdirectories, and handling cases such as having a
> subdirectory named foo that is replaced by a file named foo could be
> complicated.
>
> One thing that's still missing is off-line usage: if the user does not
> have net connectivity at the moment but does have the file in the cache,
> it should not cause an error.
>
> Perhaps the base URL should be
> http://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/sample_data/
> instead of
> http://matplotlib.svn.sourceforge.net/viewvc/matplotlib/trunk/sample_data/
> to avoid dependency on the viewvc service of SourceForge.

Would you like to take a crack at these fixes? I have scipy coming up and need to start getting my tutorial material together, so I am not going to have a lot of time for bug fixes, though I would be happy to get as many fixes and patches in next week and try to get one bugfix 0.99.1 out before scipy.

JDH
John Hunter <jd...@gm...> writes:

> Would you like to take a crack at these fixes? [...] I would be happy
> to get as many fixes and patches in next week and try to get one
> bugfix 0.99.1 out before scipy.

I started fixing the issues. It's not complete yet, but the current state should be usable. However, this work is not on the 0.99 branch, only on the trunk.

Handling the off-line use case is harder than I thought: at least on my laptop the urlopen call just blocks, so there needs to be a timeout mechanism.

--
Jouni K. Seppänen
http://www.iki.fi/jks
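The off-line behaviour asked for earlier in the thread (no error when the file is already cached) combined with a timeout can be sketched as below. This is an illustration in Python 3, not the fix that went into trunk; the function name fetch_or_use_cache and the 5-second default are assumptions. The timeout argument of urlopen bounds blocking socket operations such as connecting and reading, so it may not cover every way a request can stall.

```python
import os
from urllib.request import urlopen


def fetch_or_use_cache(url, cached_path, timeout=5.0):
    """Refresh cached_path from url, falling back to the cache off-line.

    Hypothetical helper: returns cached_path on success or when the
    network fails but a cached copy exists; re-raises otherwise.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            data = response.read()
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        if os.path.exists(cached_path):
            return cached_path
        raise

    with open(cached_path, 'wb') as out:
        out.write(data)
    return cached_path
```

The response is read fully before the cache file is opened for writing, so a connection that dies halfway does not truncate a good cached copy.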