matplotlib

matplotlib-users Mailing List for matplotlib

Brought to you by: cjgohlke, dsdale, efiring, heeres, and 8 others

matplotlib-users — Discussion related to using matplotlib


Re: [Matplotlib-users] large data sets and performance

From: John H. <jdh...@ac...> - 2004-02-11 23:51:49

>>>>> "Perry" == Perry Greenfield <pe...@st...> writes:
 Perry> I like the sounds of this approach even more. But I wonder
 Perry> if it can be made somewhat more generic. This approach (if
 Perry> I read it correctly) seems to need a backend function for
 Perry> each shape (perhaps only for circle?). What I was thinking
 Perry> was if there was a way to pass it the vectors or path for a
 Perry> symbol (for very often, many points will share the same
 Perry> shape, if not all the same x,y scale). 
Of course (slaps self on head). matplotlib 0.1 was designed around
gtk drawing which doesn't support paths. Although I've been mumbling
about adding paths for some time (what with paint, ps, and agg), I'm
still thinking inside the box. A collection of paths is the natural
solution.
 Perry> I suppose circle and other curved items could be handled
 Perry> with a bezier type call.
Agg special cases this one with a dedicated ellipse function. 
 ellipse(x, y, width, height, numsteps)
It's still a path, but you have a dedicated function to build that
path up speedily.
One potential drawback: how do you bring along the other backends that
don't have path support? In the RectangleCollection approach, we can
always fall back on draw_rectangle. In the path collection, it's more
difficult.
 backend_gtk (pygtk) - no support for paths AFAIK
 backend_wx (wxpython) - no support for paths AFAIK; Jeremy?
 backend_ps - full path support
 backend_agg - ditto
 backend_gd - partial, I think; gotta check
 backend_paint (libart) - full, perhaps with bugs
JDH
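
A minimal sketch of the fallback question raised above, assuming a marker is a small polyline that is scaled and shifted to each data point. Every name here (`draw_markers`, `draw_marker_batch`, `draw_lines`, `LineOnlyRenderer`) is hypothetical and not taken from any matplotlib backend; it only illustrates how a backend without path support could still be served by flattening the shape to line segments.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical unit-sized marker: a triangle described as a closed polyline.
TRIANGLE: List[Point] = [(-0.5, -0.5), (0.5, -0.5), (0.0, 0.5), (-0.5, -0.5)]


def place_marker(path: Sequence[Point], x: float, y: float, scale: float) -> List[Point]:
    """Scale a unit marker path and translate it so it is centred on (x, y)."""
    return [(x + scale * px, y + scale * py) for px, py in path]


class LineOnlyRenderer:
    """Stand-in for a backend with no path support, only a polyline primitive."""

    def __init__(self) -> None:
        self.calls: List[List[Point]] = []

    def draw_lines(self, points: List[Point]) -> None:
        self.calls.append(points)


def draw_markers(renderer, path, xs, ys, scale=1.0) -> None:
    """Draw one marker shape at many points, preferring a batch call if present."""
    if hasattr(renderer, "draw_marker_batch"):
        # A path-capable backend receives the shape once plus all the offsets.
        renderer.draw_marker_batch(path, list(zip(xs, ys)), scale)
    else:
        # Fallback: flatten the marker to a polyline for every point.
        for x, y in zip(xs, ys):
            renderer.draw_lines(place_marker(path, x, y, scale))


renderer = LineOnlyRenderer()
draw_markers(renderer, TRIANGLE, [0.0, 10.0], [0.0, 5.0], scale=2.0)
```

Curved shapes such as circles would need to be approximated by enough straight segments for this fallback to apply, which is the cost the message above is worried about.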

[Matplotlib-users] graphics backends

From: matthew a. <ma...@ca...> - 2004-02-11 23:24:13

Excuse me, what does AGG stand for?
And I'm curious, have you looked into cairo as a possible backend? It's a 
vector drawing library that's trying to be OS independent. I think there's 
a python interface. So far it outputs to bitmap images, X11, postscript, 
and an Open GL port is underway.
http://freedesktop.org/Cairo/Home
Interested to hear your thoughts. Maybe it's a matter of a job looking for 
a volunteer.
Cheers,
Matthew.

RE: [Matplotlib-users] large data sets and performance

From: Perry G. <pe...@st...> - 2004-02-11 23:03:19

John Hunter writes:
> >>>>> "Perry" == Perry Greenfield <pe...@st...> writes:
> 
> Perry> What I was alluding to was that if a backend primitive was
> Perry> added that allowed plotting a symbol (patch?) or point for
> Perry> an array of points. The base implementation would just do
> Perry> a python loop over the single point case so there is no
> Perry> requirement for a backend to overload this call. But it
> Perry> could do so if it wanted to loop over all points in C. How
> Perry> flexible to make this is open to discussion (e.g., allowing
> Perry> x, and y scaling factors, as arrays, for the symbol to be
> Perry> plotted, and other attributes that may vary with point such
> Perry> as color)
> 
> To make this work in the current design, you'll need more than a new
> backend method.
> 
[much good explanation of why...]
OK, I understand.
> My first response to this problem was to use a naive container class,
> eg Circles, and an appropriate backend method, eg, draw_circles. In
> this case, scatter would instantiate a Circles instance with a list of
> circles. When Circles was called to render, it would need to create a
> sequence of location data and a sequence of gcs
[...]
I'd agree that this doesn't seem worth the trouble
> 
> Much better is to implement a GraphicsContextCollection, where the
> relevant properties can be either individual elements or
> len(collection) sequences. If a property is an element, it's
> homogeneous across the collection. If it's len(collection), iterate
> over it. The CircleCollection, instead of storing individual Circle
> instances as I wrote about above, stores just the location and size
> data in arrays and a single GraphicsContextCollection.
> 
> def scatter(x, y, s, c):
> 
> collection = CircleCollection(x, y, s)
> gc = GraphicsContextCollection()
> gc.set_linewidth(1.0) # a single line width
> gc.set_foreground(c) # a len(x) array of facecolors
> gc.set_edgecolor('k') # a single edgecolor
> 
> collection.set_gc(gc)
> 
> axes.add_collection(collection)
> return collection
> 
> And this will be blazingly fast compared to the solution above, since,
> for example, you transform the x, y, and s coordinates as numeric
> arrays rather than individually. And there is almost no function call
> overhead. And as you say, if the backend doesn't implement a
> draw_circles method, the CircleCollection can just fall back on
> calling the existing methods in a loop.
> 
> Thoughts?
> 
I like the sounds of this approach even more. But I wonder if
it can be made somewhat more generic. This approach (if I read
it correctly) seems to need a backend function for each shape
(perhaps only for circle?). What I was thinking was if there was a way
to pass it the vectors or path for a symbol (for very often, 
many points will share the same shape, if not all the same x,y
scale). Here the circle is a bit of a special case compared to
crosses, error bars, triangles and other symbols that are usually
made up of a few straight lines. In these cases you could pass
the backend the context collection along with the shape
(and perhaps some scaling info if that isn't part of the context).
That way only one backend routine is needed. 
I suppose circle and other curved items could be handled with 
a bezier type call. 
But perhaps I still misunderstand.
Thanks for your very detailed response.
Perry

Re: [Matplotlib-users] large data sets and performance

From: John H. <jdh...@ac...> - 2004-02-11 22:43:33

>>>>> "Perry" == Perry Greenfield <pe...@st...> writes:
 Perry> What I was alluding to was that if a backend primitive was
 Perry> added that allowed plotting a symbol (patch?) or point for
 Perry> an array of points. The base implementation would just do
 Perry> a python loop over the single point case so there is no
 Perry> requirement for a backend to overload this call. But it
 Perry> could do so if it wanted to loop over all points in C. How
 Perry> flexible to make this is open to discussion (e.g., allowing
 Perry> x, and y scaling factors, as arrays, for the symbol to be
 Perry> plotted, and other attributes that may vary with point such
 Perry> as color)
To make this work in the current design, you'll need more than a new
backend method.
Plot commands like scatter instantiate Artists (Circle) and add them
to the Axes as generic patch instances. On a call to draw, the Axes
instance iterates over all of its patch instances and forwards the
call on to the artists it contains. These, in turn, instantiate gc
instances which contain information like linewidth, facecolor,
edgecolor, alpha, etc. The patch instance also transforms its data
into display units and calls the relevant backend method. Eg, a
Circle instance would call
 renderer.draw_arc(gc, x, y, width, ...)
This makes it relatively easy to write a backend since you only have
to worry about 1 coordinate system (display) and don't need to know
anything about the Artist objects (Circle, Line, Rectangle, Text, ...)
The point is that no existing entity knows that a collection of
patches are all circles, and no one is keeping track of whether they
share a property or not. This buys you total flexibility to set
individual properties, but you pay for it in performance, since you
have to set every property for every object and call render methods
for each one, and so on.
My first response to this problem was to use a naive container class,
eg Circles, and an appropriate backend method, eg, draw_circles. In
this case, scatter would instantiate a Circles instance with a list of
circles. When Circles was called to render, it would need to create a
sequence of location data and a sequence of gcs
 locs = [ (x0, y0, w0, h0), (x1, y1, w1, h1), ...]
 gcs = [ circ0.get_gc(), circ1.get_gc(), ...] 
and then call
 renderer.draw_ellipses( locs, gcs).
This would provide some savings, but probably not dramatic ones. The
backends would need to know how to read the GCs. In backend_agg
extension code, I've implemented the code (in CVS) to read the python
GraphicsContextBase information using the python API. 
 _gc_get_linecap
 _gc_get_joinstyle
 _gc_get_color # returns rgb
This is kind of backward, implementing an object in python and then
accessing it at the extension level code using the Python API, but it
does keep as much of the frontend in python as possible, which is
desirable. The point is that for your approach to work and to not
break encapsulation, the backends have to know about the GC.
The discussion above was focused on preserving all the individual
properties of the actors (eg every circle can have its own linewidth,
color, alpha, dash style). But this is rare. Usually, we just want to
vary one or two properties across a large collection, eg, color in
pcolor and size and color in scatter.
Much better is to implement a GraphicsContextCollection, where the
relevant properties can be either individual elements or
len(collection) sequences. If a property is an element, it's
homogeneous across the collection. If it's len(collection), iterate
over it. The CircleCollection, instead of storing individual Circle
instances as I wrote about above, stores just the location and size
data in arrays and a single GraphicsContextCollection.
def scatter(x, y, s, c):
 collection = CircleCollection(x, y, s)
 gc = GraphicsContextCollection()
 gc.set_linewidth(1.0) # a single line width
 gc.set_foreground(c) # a len(x) array of facecolors
 gc.set_edgecolor('k') # a single edgecolor
 collection.set_gc(gc)
 axes.add_collection(collection)
 return collection
And this will be blazingly fast compared to the solution above, since,
for example, you transform the x, y, and s coordinates as numeric
arrays rather than individually. And there is almost no function call
overhead. And as you say, if the backend doesn't implement a
draw_circles method, the CircleCollection can just fall back on
calling the existing methods in a loop.
Thoughts?
 
JDH
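
A minimal sketch of the "single element or len(collection) sequence" rule proposed above. `PropertyCollection` is a hypothetical stand-in for the proposed GraphicsContextCollection, not an existing matplotlib class, and the property names are only illustrative.

```python
from collections.abc import Sequence
from typing import Any, Dict


def _is_per_item(value: Any) -> bool:
    # Strings are sequences too, so treat them as single values (eg 'k').
    return isinstance(value, Sequence) and not isinstance(value, str)


class PropertyCollection:
    """Holds properties that are either shared or given once per element."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._props: Dict[str, Any] = {}

    def set(self, name: str, value: Any) -> None:
        if _is_per_item(value) and len(value) != self.size:
            raise ValueError(
                f"{name}: expected 1 value or {self.size}, got {len(value)}"
            )
        self._props[name] = value

    def get(self, name: str, index: int) -> Any:
        """Return the property for one element, broadcasting single values."""
        value = self._props[name]
        return value[index] if _is_per_item(value) else value


gc = PropertyCollection(size=3)
gc.set("linewidth", 1.0)              # shared by every circle
gc.set("edgecolor", "k")              # shared by every circle
gc.set("facecolor", ["r", "g", "b"])  # one per circle

# The fallback loop a collection could run when the backend has no batch call.
resolved = [(gc.get("facecolor", i), gc.get("linewidth", i)) for i in range(gc.size)]
```

One design wrinkle this sketch does not solve: a single RGB tuple is itself a sequence, so a real implementation would need a rule to tell one colour apart from a list of per-element values.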

Re: [Matplotlib-users] large data sets and performance

From: John H. <jdh...@ac...> - 2004-02-11 21:16:46

>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:
 Peter> Will mostly be plotting time Vs value(time) but in certain
 Peter> cases will need plots of other data, and therefore have to
 Peter> look at the worst case scenario. Not exactly sure what you
 Peter> mean by "continuous" since all are discrete data
 Peter> points. The data may not be smooth (could have misbehaving
 Peter> sensors giving garbage) and jump all over the place.
Bad terminology: for x I meant sorted (monotonic) and for y the ideal
case is smooth and not varying too rapidly. Try the lod feature and
see if it works for you.
Perhaps it would be better to extend the LOD functionality, so that
you control the extent of subsampling. Eg, suppose you have 100,000 x
data points but only 1000 pixels of display. Then for every 100 data
points you could set the decimation factor, perhaps as a percentage.
More generally, we could implement a LOD base class, and users could
supply their own derived instances to subsample the data how they see
fit, eg, min and max over the 100 points, and so on. By reshaping the
points into a 1000x100 matrix, this could be done in Numeric
efficiently. 
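
A minimal sketch of the min/max idea, written with plain Python lists so it has no dependencies. The function name `minmax_decimate` is hypothetical. With an array library the same reduction is the reshape described above: view the data as 1000 rows of 100 samples and take the minimum and maximum along each row.

```python
from typing import List, Sequence, Tuple


def minmax_decimate(y: Sequence[float], n_bins: int) -> List[Tuple[float, float]]:
    """Split y into about n_bins consecutive blocks; keep each block's (min, max)."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    block = max(1, -(-len(y) // n_bins))  # ceiling division
    return [
        (min(y[i:i + block]), max(y[i:i + block]))
        for i in range(0, len(y), block)
    ]


# 100,000 samples reduced to 1000 (min, max) pairs, one pair per display pixel.
samples = [(i * 37) % 101 for i in range(100_000)]
envelope = minmax_decimate(samples, 1000)
```

Keeping both extremes per block preserves spikes from a misbehaving sensor, which plain every-nth-point subsampling can miss.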
 >> Secondly, the standard gdmodule will iterate over the x, y
 >> values in a python loop in gd.py. This is slow for lines with
 >> lots of points. I have a patched gdmodule that I can send you
 >> (provide platform info) that moves this step to the extension
 >> module. Potentially a very big win.
 Peter> Yes, that would be great! System info:
Here is the link
http://nitace.bsd.uchicago.edu:8080/files/share/gdmodule-0.52b.tar.gz
You must also upgrade gd to 2.0.22 (alas 2.0.21 is obsolete!) since I
needed the latest version to get this sucker ported to win32.
 >> Another possibility: change backends. The GTK backend is
 >> significantly faster than GD. If you want to work off line
 >> (ie, draw to image only and not display to screen ) and are on
 >> a linux box, you can do this with GTK and Xvfb. I'll give you
 >> instructions if interested. In the next release of matplotlib,
 >> there will be a libart paint backend (cross platform) that may
 >> be faster than GD. I'm working on an Agg backend that should
 >> be considerably faster than all the other backends since it
 >> does everything in extension code -- we'll see
 Peter> Yes I am only planning to work offline. Want to be able to
 Peter> pipe the output images to stdout. I am looking for the
 Peter> fastest solution possible.
I don't know how to write a GTK pixbuf to stdout. I inquired on the
pygtk mailing list, so perhaps we'll learn something soon. To use GTK
in Xvfb, make sure you have Xvfb (X virtual frame buffer) installed
(/usr/X11R6/bin/Xvfb). There is probably an RPM, but I don't
remember.
You then need to start it with something like
XVFB_HOME=/usr/X11R6
 
$XVFB_HOME/bin/Xvfb :1 -co $XVFB_HOME/lib/X11/rgb -fp $XVFB_HOME/lib/X11/fonts/misc/,$XVFB_HOME/lib/X11/fonts/Speedo/,$XVFB_HOME/lib/X11/fonts/Type1/,$XVFB_HOME/lib/X11/fonts/75dpi/,$XVFB_HOME/lib/X11/fonts/100dpi/ &
And connect your display to it
> setenv DISPLAY :1
Now you can use gtk as follows 
from matplotlib.matlab import *
from matplotlib.backends.backend_gtk import show_xvfb
def f(t):
 s1 = cos(2*pi*t)
 e1 = exp(-t)
 return multiply(s1,e1)
t1 = arange(0.0, 5.0, 0.1)
t2 = arange(0.0, 5.0, 0.02)
t3 = arange(0.0, 2.0, 0.01)
subplot(211)
plot(t1, f(t1), 'bo', t2, f(t2), 'k')
title('A tale of 2 subplots')
ylabel('Damped oscillation')
subplot(212)
plot(t3, cos(2*pi*t3), 'r--')
xlabel('time (s)')
ylabel('Undamped')
savefig('subplot_demo')
show_xvfb() # not show!

Re: [Matplotlib-users] large data sets and performance

From: Peter G. <pgr...@ge...> - 2004-02-11 20:22:32

Perry:
Currently using connected line plots, but do not want to limit myself in 
any way when it comes to presenting data. I am certain that at one 
point, I will use every plot available in the matplotlib arsenal. On a 
3.2GHz P4 with 2GB RAM I get ~90 seconds for a 100,000 data set, ~50 
seconds for 50,000 and ~9 seconds for a 10,000 (sort of linear). This is 
way too long for my purposes. I was hoping more for ~5 seconds for 
100,000 points.
John:
>I routinely plot data sets this large. 500,000 data points is a
>typical 10 seconds of EEG, which is the application that led me to
> 
>
That sounds good!
>If your xdata are sorted, ie like time, the following
>
> l = plot(blah, blah)
> set(l, 'lod', True)
>
>could be a big win.
>
>Whether this is appropriate or not depends on the data set of course,
>whether it is continuous, and so on. Can you describe your dataset in
>more detail, because I would like to add whatever optimizations are
>appropriate -- if others can pipe in here too that would help.
> 
>
Will mostly be plotting time Vs value(time) but in certain cases will 
need plots of other data, and therefore have to look at the worst case 
scenario.
Not exactly sure what you mean by "continuous" since all are discrete 
data points. The data may not be smooth (could have misbehaving sensors 
giving garbage) and jump all over the place.
>Secondly, the standard gdmodule will iterate over the x, y values in a
>python loop in gd.py. This is slow for lines with lots of points. I
>have a patched gdmodule that I can send you (provide platform info)
>that moves this step to the extension module. Potentially a very big
>win.
> 
>
Yes, that would be great!
System info:
OS: RedHat9 ( kernel 2.4.20)
gcc version from running 'gcc -v':
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--disable-checking --with-system-zlib --enable-__cxa_atexit 
--host=i386-redhat-linux
Thread model: posix
gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
Python: Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
matplotlib: matplotlib-0.50e
gdpython: 0.51 (with modified _gdmodule.c)
gd: gd-2.0.21
>Another possibility: change backends. The GTK backend is
>significantly faster than GD. If you want to work off line (ie, draw
>to image only and not display to screen ) and are on a linux box, you
>can do this with GTK and Xvfb. I'll give you instructions if
>interested. In the next release of matplotlib, there will be a libart
>paint backend (cross platform) that may be faster than GD. I'm
>working on an Agg backend that should be considerably faster than all
>the other backends since it does everything in extension code -- we'll
>see 
>
Yes I am only planning to work offline. Want to be able to pipe the 
output images to stdout. I am looking for the fastest solution possible.
Thanks again.
Peter

RE: [Matplotlib-users] large data sets and performance

From: Perry G. <pe...@st...> - 2004-02-11 19:50:45

John Hunter writes:
> 
> could be a big win. LOD is "Level of Detail" and if true subsamples
> the data according to the pixel width of the output, as you described.
> Whether this is appropriate or not depends on the data set of course,
> whether it is continuous, and so on. Can you describe your dataset in
> more detail, because I would like to add whatever optimizations are
> appropriate -- if others can pipe in here too that would help.
> 
>
What I was alluding to was that if a backend primitive was added that
allowed plotting a symbol (patch?) or point for an array of points.
The base implementation would just do a python loop over the single
point case so there is no requirement for a backend to overload this
call. But it could do so if it wanted to loop over all points in C. 
How flexible to make this is open to discussion (e.g., allowing
x, and y scaling factors, as arrays, for the symbol to be plotted, and
other attributes that may vary with point such as color)
Perry
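
A minimal sketch of the proposal above: a base renderer whose multi-point call defaults to a Python loop over the single-point primitive, which a backend may override. The class and method names are hypothetical and chosen only for illustration.

```python
class BaseRenderer:
    """Backends must implement draw_point; draw_points is optional to override."""

    def draw_point(self, x, y, color):
        raise NotImplementedError

    def draw_points(self, xs, ys, colors):
        # Default: loop in Python over the single-point primitive.
        for x, y, color in zip(xs, ys, colors):
            self.draw_point(x, y, color)


class LoopingRenderer(BaseRenderer):
    """A backend that only knows how to draw one point at a time."""

    def __init__(self):
        self.log = []

    def draw_point(self, x, y, color):
        self.log.append(("point", x, y, color))


class BatchRenderer(LoopingRenderer):
    """A backend that overrides the array call."""

    def draw_points(self, xs, ys, colors):
        # A real backend would hand the whole arrays to extension code here.
        self.log.append(("batch", len(xs)))


xs, ys, colors = [0, 1, 2], [5, 6, 7], ["r", "g", "b"]

looping = LoopingRenderer()
looping.draw_points(xs, ys, colors)

batch = BatchRenderer()
batch.draw_points(xs, ys, colors)
```

The caller uses `draw_points` in both cases, so plot commands do not need to know which backends have the fast path.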

Re: [Matplotlib-users] large data sets and performance

From: John H. <jdh...@ac...> - 2004-02-11 19:39:28

>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:
 Peter> Hello: We will be dealing with large (> 100,000 but in some
 Peter> instances as big as 500,000 points) data sets. They are to
 Peter> be plotted, and I would like to use matplotlib.
Are you working with plot/loglog/etc (line data) or
pcolor/hist/scatter/bar (patch data)?
I routinely plot data sets this large. 500,000 data points is a
typical 10 seconds of EEG, which is the application that led me to
write matplotlib. EEG is fairly special: the x axis time is
monotonically increasing and the y axis is smooth. This lets me take
advantage of level of detail subsampling.
If your xdata are sorted, ie like time, the following
 l = plot(blah, blah)
 set(l, 'lod', True)
could be a big win. LOD is "Level of Detail" and if true subsamples
the data according to the pixel width of the output, as you described.
Whether this is appropriate or not depends on the data set of course,
whether it is continuous, and so on. Can you describe your dataset in
more detail, because I would like to add whatever optimizations are
appropriate -- if others can pipe in here too that would help.
Secondly, the standard gdmodule will iterate over the x, y values in a
python loop in gd.py. This is slow for lines with lots of points. I
have a patched gdmodule that I can send you (provide platform info)
that moves this step to the extension module. Potentially a very big
win.
Another possibility: change backends. The GTK backend is
significantly faster than GD. If you want to work off line (ie, draw
to image only and not display to screen ) and are on a linux box, you
can do this with GTK and Xvfb. I'll give you instructions if
interested. In the next release of matplotlib, there will be a libart
paint backend (cross platform) that may be faster than GD. I'm
working on an Agg backend that should be considerably faster than all
the other backends since it does everything in extension code -- we'll
see :-).
JDH
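
A minimal sketch of subsampling by output pixel width, assuming the simplest possible rule: keep every nth point so that roughly one point per pixel column survives. The function `lod_subsample` is hypothetical and is not claimed to match how the `lod` property is implemented in matplotlib.

```python
def lod_subsample(x, y, pixel_width):
    """Keep roughly one (x, y) sample per horizontal pixel of the output."""
    if pixel_width <= 0:
        raise ValueError("pixel_width must be positive")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    stride = max(1, len(x) // pixel_width)
    return x[::stride], y[::stride]


# 500,000 sorted x values drawn onto a 1000 pixel wide axes.
x = list(range(500_000))
y = [value % 7 for value in x]
xs, ys = lod_subsample(x, y, 1000)
```

This only makes sense when x is sorted and y varies slowly, as the message says; for noisy data the stride can skip exactly the samples that matter.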

RE: [Matplotlib-users] large data sets and performance

From: Perry G. <pe...@st...> - 2004-02-11 19:22:34

How are you plotting the data? As a scatter plot (e.g., symbols
or points) or as a connected line plot. The former can be quite
a bit slower and we have some thoughts about speeding that up
(which we haven't broached with JDH yet). How long is it taking
and how much faster do you need it?
Perry Greenfield
> -----Original Message-----
> From: mat...@li...
> [mailto:mat...@li...]On Behalf Of Peter
> Groszkowski
> Sent: Wednesday, February 11, 2004 2:14 PM
> To: mat...@li...
> Subject: [Matplotlib-users] large data sets and performance
> 
> 
> Hello:
> 
> We will be dealing with large (> 100,000 but in some instances as big as 
> 500,000 points) data sets. They are to be plotted, and I would like to 
> use matplotlib. I did a few preliminary tests, and it seems like 
> plotting that many pairs is a little too much for the system to handle. 
> Currently we are using (as a backend to some other software) gnuplot for 
> doing this plotting. It seems to be "lightning-fast" but I suspect (may 
> be wrong!) that it reduces this data before the plotting takes place, 
> and only selects every nth point. I have to go through the code that 
> calls it to be certain. I would imagine that it is not necessary to get 
> every point in 100,000 to produce a page-size plot, but I'm not sure if 
> simply grabbing every nth point and reducing the data like that is the 
> best way about this. So my question is to anyone else out there who is 
> also dealing with these large (and very large) data sets? What do you 
> do? Any library routines that you use before plotting to massage that 
> data? Are there any ways (ie flags to set) to optimize this in 
> matplotlib? Any other software you use? I should note that I use the GD 
> backend and pipe the output to stdout for a cgi script to pick up.
> 
> Thanks.
> 
> -- 
> Peter Groszkowski Gemini Observatory
> Tel: +1 808 974-2509 670 N. A'ohoku Place
> Fax: +1 808 935-9235 Hilo, Hawai'i 96720, USA
> 
> 
> 
> 
> _______________________________________________
> Matplotlib-users mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/matplotlib-users
>

[Matplotlib-users] large data sets and performance

From: Peter G. <pgr...@ge...> - 2004-02-11 19:17:09

Hello:
We will be dealing with large (> 100,000 but in some instances as big as 
500,000 points) data sets. They are to be plotted, and I would like to 
use matplotlib. I did a few preliminary tests, and it seems like 
plotting that many pairs is a little too much for the system to handle. 
Currently we are using (as a backend to some other software) gnuplot for 
doing this plotting. It seems to be "lightning-fast" but I suspect (may 
be wrong!) that it reduces this data before the plotting takes place, 
and only selects every nth point. I have to go through the code that 
calls it to be certain. I would imagine that it is not necessary to get 
every point in 100,000 to produce a page-size plot, but I'm not sure if 
simply grabbing every nth point and reducing the data like that is the 
best way about this. So my question is to anyone else out there who is 
also dealing with these large (and very large) data sets? What do you 
do? Any library routines that you use before plotting to massage that 
data? Are there any ways (ie flags to set) to optimize this in 
matplotlib? Any other software you use? I should note that I use the GD 
backend and pipe the output to stdout for a cgi script to pick up.
Thanks.
-- 
Peter Groszkowski Gemini Observatory
Tel: +1 808 974-2509 670 N. A'ohoku Place
Fax: +1 808 935-9235 Hilo, Hawai'i 96720, USA

Re: [Matplotlib-users] object picker legend and axis

From: John H. <jdh...@ac...> - 2004-02-11 17:46:17

>>>>> "Jean-Baptiste" == Jean-Baptiste Cazier <Jean-Baptiste.cazier@decode.is> writes:
 Jean-Baptiste> Sæll ! Thanks for the info and update. I upgraded
 Jean-Baptiste> my library and my program. The dramatic change of
 Jean-Baptiste> API was a bit painful, but the new syntax is more
 Jean-Baptiste> clear. Do you have any idea when you will have
 Jean-Baptiste> reached a stable version in general and in terms of
 Jean-Baptiste> version ?
It has been a bit painful, my apologies; I still have one application
to port myself. But it was necessary. matplotlib is undergoing
active development. The basic idea is to write a backend for a
powerful image renderer, http://antigrain.com or libart , and use that
backend to draw to the GUIs. Rather than each GUI implementing its
own drawing, I'm moving to one high quality image renderer that will
draw to the GUI.
Why? Four major benefits
 * Easier maintenance: when I want to add a feature, I add it to the
 image backend and all the GUIs automatically benefit.
 * Enhanced drawing capabilities - the GUIs don't support a lot of
 the more sophisticated drawing capabilities, eg, paths from PS and
 SVG, or alpha blending, or gouraud shading. The agg backend
 supports all of these, and therefore by extension, so will the
 GUIs.
 * Font standardization across backends. With a common image
 renderer that supports freetype, all the GUIs can have freetype
 support with common font names.
 * Ease of plain old image integration with figures - 2D image plots,
 like pcolor, will become very fast and very pretty.
Any solution along these lines will be performance competitive with
native GUI and work on all major platforms (win32, osx, linux, unix)
or we won't do it.
In order to do this right, I needed the Figure class (and all its
children) to be totally backend independent. So a WX figure can
render to a postscript renderer, or an agg renderer, or a gd renderer,
or a libart renderer. In earlier releases of matplotlib, the AxisText
and Figure classes were tied up in the backend.
I have made it a policy not to predict a stable API because I don't
want to go back on my word. That said, I think the existing design is
clean and I am happy with it. I would not have said that a month ago.
I don't anticipate any major changes to the Figure API or the
FigureCanvas API.
Changes I do expect to see in the near future are
 * the addition of GTK/AGG, WX/AGG and Tkinter/AGG backends. These
 will be optional so for those of you who want to stay with the
 classic GUI backends you can. But I would encourage you to move
 over at that time since you'll get better drawing, more
 sophisticated plots, etc... I will probably add a matplotlibrc
 file in which you can select your default backend. This will be an
 almost trivial API port, if you choose to. This will probably be
 around matplotlib-0.6 (1-2 months).
 * Addition of path drawing methods in the renderer API. This will
 allow for the more sophisticated drawing that state dependent paths
 support. If you don't work directly with the renderer, it won't
 affect you. It's mainly for backend implementers.
As for API stability in general, I have 2 additional thoughts.
 * for the matlab interface, if it's matlab compatible it's 99% likely
 to be stable. We've made minor departures from this when there was
 clearly a better way to do it, but it's a good rule of thumb.
 * As with many projects, I think the 1.0 release should guarantee
 some API stability. I predict you'll see a pretty stable API until
 then.
JDH

Re: [Matplotlib-users] ANN matplotlib-0.42

From: John H. <jdh...@ac...> - 2004-02-11 16:16:34

>>>>> "LUK" == LUK ShunTim <shu...@po...> writes:
 >> OK, now we at least know where the problem is. I don't get
 >> such an error message on my system (rhl9, pygtk-2.0.0). What
 >> platform are you on, and what versions of GTK and pygtk are you
 >> running? JDH
 LUK> W2K, Enthought python 2.3, pygtk 2.0, gtk 2.0
Tracked this one down. Apparently the latest version of GTK for
windows does different font aliasing. This is controlled by the file
c:\gtk\etc\pango\pango.aliases
Try adding the line:
times = "times new roman,angsana new,mingliu,simsun,gulimche,ms gothic,latha,mangal,code2000"
If you get other font messages, adding more font aliases to this file
may help.
I'm making some changes in the gtk backend - mapping times -> serif,
so in the next release of matplotlib this should be fixed
automagically.
JDH

Re: [Matplotlib-users] Question about Matplotlib system performance

From: John H. <jdh...@ac...> - 2004-02-11 15:17:40

>>>>> "derek" == derek bandler <d_b...@ho...> writes:
 derek> A small problem - I downloaded the tar/zipped file to my
 derek> pc, but winzip errors on loading it. I downloaded
 derek> something called power zip and have the same problem.
 derek> There's an error msg that the archive may be corrupt.
Oh, I assumed you were on a UNIX platform because of the Solaris
reference. Here's a link to a windows installer
 http://nitace.bsd.uchicago.edu:8080/files/share/matplotlib-0.50j.win32.exe
 derek> As far as the patch, would that require re-building the
 derek> software?
Yep, and this is far from trivial on win32. Hopefully, the pygtk
maintainers will include the patch in the next release, but there has
been grumbling on the pygtk list about patches collecting dust. In
any case, I cache most of the relevant stuff now, so repeated draws
should not expose the memory leak as before.
Give it a try.
JDH

[Matplotlib-users] plot legend title extension

From: matthew a. <ma...@ca...> - 2004-02-11 03:40:06

Hi
You might know that octave is a free software clone of matlab. I used it 
heavily for a previous project. One thing I liked about its plot command 
is an extension to the fmt string which allowed you to specify the legend 
title for each line between semicolons. Like this:
plot(x, sin(x), 'r;sine;')
plot(x, cos(x), 'g;cosine;')
plot(x, tan(x), 'b;tangent;')
It seems like a handy way to be able to do things.
I couldn't see this mentioned in the matplotlib docs, and when I tried it 
I got a dialog "Unrecognized character in format string". May I suggest it 
as a useful addition to matplotlib? It seems like legend() might need 
adjustment to allow for it too.
BTW, using the GTK backend the "Unrecognized character in format string" 
dialog has locked up my python session after I clicked OK. It appeared
straight after the plot command -- not when I ran show(). After I clicked 
OK, the window stayed visible but unresponsive to the close button etc.
m.
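
A minimal sketch of how the suggested format-string extension could be parsed, assuming the label is whatever sits between the first pair of semicolons. The helper `split_octave_fmt` is hypothetical; it is not part of matplotlib and does not claim to cover every form Octave accepts.

```python
from typing import Optional, Tuple


def split_octave_fmt(fmt: str) -> Tuple[str, Optional[str]]:
    """Split 'r;sine;' into ('r', 'sine'); plain formats return a label of None."""
    start = fmt.find(";")
    if start == -1:
        return fmt, None
    end = fmt.find(";", start + 1)
    if end == -1:
        raise ValueError(f"unterminated legend title in format string: {fmt!r}")
    # Whatever lies outside the semicolons is the ordinary style string.
    return fmt[:start] + fmt[end + 1:], fmt[start + 1:end]


style, label = split_octave_fmt("r;sine;")
plain_style, plain_label = split_octave_fmt("g--")
```

The style part could then go through the existing format handling unchanged, with the label collected separately for the legend.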

1 message has been excluded from this view by a project administrator.
