Re: [matplotlib-devel] Sample data: a proposal

From: Andrew S. <str...@as...> - 2010-09-12 15:30:49
On 09/12/2010 07:10 AM, Jouni K. Seppänen wrote:
> A while ago there was a discussion [1] about how using the
> get_sample_data function in building the documentation is a problem for
> Debian packagers. Let me see if I understand the goals of
> get_sample_data correctly:
>
> * we want to enable users to run examples they find in the gallery
> without downloading extra files;
>
> * we don't want to package all the sample data with matplotlib, either
> because it is too large, or because it changes more often than we
> release new versions.
> 
* Also, we want the sample data to live outside the version control
repository that holds MPL proper, so that downloading the MPL source
code does not also pull in the sample data. (This is one of the sticking
points for a move to git.)
> Here's what I suggest:
>
> 1. Package the sample data in a separate zip file that users can
> download and expand in e.g. ~/.matplotlib/sample_data if they like.
> This file could be released more often than matplotlib, if needed.
> Debian can use this as one source file and package it as a separate
> deb file.
>
> 2. Make get_sample_data look first in the place where the zip file could
> have been expanded, and only if the required file is not found, try
> to obtain it from the web. Add an option to disable the network
> access. This is different from what we do now, because now
> get_sample_data always tries to check if there is a newer version
> available, which apparently doesn't work reliably on unconnected
> computers.
>
> 3. To make this work, agree that sample data files are immutable: if a
> new version is needed, it needs to have a new name (and thus the
> examples using it need to be updated). The files have not been
> changed a lot [2], so I don't think this is very much of a burden.
>
> What do you think?
>
> 
#1 and #2 seem reasonable to me.

I don't like #3. For the same reasons that we want to separate the rest
of the sample data (smaller download, smaller repository, and separation
of code and non-essential data), I think the test comparison images
should live with the sample data. Having to deal with renames in the
tests would be annoying. Two alternative ideas for handling the
versioning issue:

A) Add a .py file to the main source repository which is a list of
sample data filenames and checksums. If a sample data file doesn't
exist, or its checksum is wrong, it can be downloaded.

B) The source file could simply state the data version number it
requires, and the sample data itself could be versioned.
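
Idea A could look roughly like the sketch below. It is only an
illustration, not existing matplotlib code: the names
SAMPLE_DATA_CHECKSUMS, needs_download and ensure_sample_data, the file
"example.csv", and the choice of SHA-256 are all assumptions made here.
The actual download is left to a caller-supplied fetch function, so
network access stays optional.

```python
import hashlib
from pathlib import Path

# Hypothetical manifest. Under idea A this would be a .py file in the main
# source repository. The file name and contents are made up for illustration.
SAMPLE_DATA_CHECKSUMS = {
    "example.csv": hashlib.sha256(b"x,y\n1,2\n").hexdigest(),
}


def file_checksum(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large data files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(name, data_dir, manifest=SAMPLE_DATA_CHECKSUMS):
    """True if `name` is missing from `data_dir` or its checksum is wrong."""
    path = Path(data_dir) / name
    if not path.is_file():
        return True
    return file_checksum(path) != manifest[name]


def ensure_sample_data(name, data_dir, fetch, manifest=SAMPLE_DATA_CHECKSUMS):
    """Return the path to a verified local copy of `name`.

    `fetch(name)` is a caller-supplied function returning the file's bytes;
    it is only called when the local copy is missing or stale.
    """
    path = Path(data_dir) / name
    if needs_download(name, data_dir, manifest):
        data = fetch(name)
        # Refuse to store data that does not match the manifest.
        if hashlib.sha256(data).hexdigest() != manifest[name]:
            raise ValueError(f"checksum mismatch for {name}")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return path
```

With this layout, updating a data file means changing its checksum in
the manifest rather than renaming the file, so the tests would not need
to follow renames.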
