>>>>> "Darren" == Darren Dale <dd...@co...> writes:

    JDH> Here is my near term wish list for the PS backend:

    JDH> - implement draw_markers and draw_lines with the new API
    JDH> (transform is done in backend). There are comments in
    JDH> backend_bases and in backend_ps to get you started

    Darren> I started looking into this tonight, but I am pretty much
    Darren> lost. The comments are a little too abstract for me right
    Darren> now, I can't find a footing. Could you offer some more
    Darren> details?

Sure, maybe more than you had bargained for <wink>.  I'm CC-ing the
dev list in case any of this information is useful to others.  [BTW,
Darren is tentatively offering to take on some of the work to keep the
PS backend up to snuff]

There are several motivations to change the backend renderer API, most
of them based on limitations or inefficiencies of the current API

  * The renderer interface is based on the GTK drawing model, which
    doesn't have a path concept, and is thus a bit behind most drawing
    APIs: ps, pdf, svg, cairo, agg, libart, etc...

  * Once you have a draw path method, many of the other methods
    (draw_rectangle, draw_polygon) become superfluous since they are
    just special cases of draw_path.  [ There is some debate about
    whether it is useful to keep these redundant methods around for
    efficiency or convenience. ]

  * Many backends (svg, ps, agg) have transformation support built-in
    (at least for affine transformations).  I initially did the
    transformations in the front-end for convenience to backend
    writers (backends always work in display coords) but this caused
    several problems, inefficiency being one, and the new API moves
    the transformation to the backend.  Among other things, it allows
    the backend to fail gracefully when transforming on a per-element
    basis (log of non-positive data) w/o a mask or w/o an extra pass
    through the data.  For large numbers of points, the savings can be
    appreciable.  So the new backend methods are passed a
    Transformation instance.

  * We needed a draw_markers method.  draw_markers is a special case
    where the same path is repeatedly drawn at many places.  In the
    old API, we would do something like this for draw_plus in the
    Line2D class

        for (x,y) in zip(xt, yt):
            renderer.draw_line(gc, x-offset, y, x+offset, y)
            renderer.draw_line(gc, x, y-offset, x, y+offset)

    This is enormously inefficient, because of all the extra function
    calls and because of all the gc state setting that must be done on
    each call to draw_line in the inner loop.  In the new API, we do

        path = agg.path_storage()
        path.move_to(-offset, 0)
        path.line_to( offset, 0)
        path.move_to( 0, -offset)
        path.line_to( 0, offset)

        renderer.draw_markers(gc, path, None, xt, yt, self._transform)

    and the backend only has to set the gc state once.  Also, agg can
    cache the rasterized path and display it at many locations, which
    is fast.

So those are the motivations.  There are three new methods that have
been introduced thus far.  The plan is to introduce these three new
methods and then remove many of the redundant methods, so the overall
number of renderer methods will decrease.

    draw_markers - draw the same path at many locations
    draw_path    - draw an agg path (details later)
    draw_lines   - already exists but new method has trans in backend

The signatures of these three methods are

    draw_markers(self, gc, path, rgbFace, x, y, trans)
    draw_path(self, gc, rgbFace, path, trans)
    draw_lines(self, gc, x, y, trans)

These should be documented in backend_bases, but gc is a backend
GraphicsContext, rgbFace is an rgbTuple or None, x and y are numerix
arrays, path is an agg.path_storage and trans is a
matplotlib.transforms.Transformation instance.  Details on these
latter two to follow.

path is an agg.path_storage instance.  In the first implementation of
draw_markers in backend_ps, path was simply a list of (code
vertices...) where code was one of STOP, MOVETO, LINETO, CURVE3,
CURVE4, ENDPOLY and vertices were a bunch of x,y verts.
I subsequently decided to just use the agg path class for this
(wrapped by SWIG) because it is more generally useful (the code in
backend_ps _draw_markers is thus stale).  Here is a script that
illustrates the path_storage class

    from matplotlib.agg import path_storage

    p = path_storage()
    p.move_to(10,10)
    p.line_rel(100,100)
    p.line_rel(0,-100)
    p.line_to(30,30)
    p.curve3(20,30,40,50)

    for i in range(p.total_vertices()):
        cmd, x, y = p.vertex(i)
        print cmd, x, y

This script outputs

    peds-pc311:~/python/projects/matplotlib/unit> python path_storage.py
    1 10.0 10.0
    2 110.0 110.0
    2 110.0 10.0
    2 30.0 30.0
    3 20.0 30.0
    3 40.0 50.0

Note that there are more vertices than commands used to create the
path, because there are two vertices generated by the curve3 call.
The 1,2,3 command codes are from an agg ENUM, and are found in
agg22/include/agg_basics.h

    enum path_commands_e
    {
        path_cmd_stop     = 0,        //----path_cmd_stop
        path_cmd_move_to  = 1,        //----path_cmd_move_to
        path_cmd_line_to  = 2,        //----path_cmd_line_to
        path_cmd_curve3   = 3,        //----path_cmd_curve3
        path_cmd_curve4   = 4,        //----path_cmd_curve4
        path_cmd_end_poly = 6,        //----path_cmd_end_poly
        path_cmd_mask     = 0x0F      //----path_cmd_mask
    };

See agg22/include/agg_basics.h, agg22/include/agg_path_storage.h and
swig/agg_path_storage.i for more information on available methods of
the agg path_storage class.

You will need to translate these path primitives into the basic
postscript moveto, lineto, etc commands.  Note that curve3 is a
quadratic Bezier (one control point) and curve4 is a cubic Bezier (two
control points).  Postscript's curveto is cubic, so curve4 maps onto
it directly, while curve3 has to be converted to the equivalent cubic
first.

The Transformation class is fairly well documented in transforms.py
and in the _draw_markers prototype method I wrote in backend_ps.  Here
is an example usage

    if trans.need_nonlinear():
        x,y = trans.nonlinear_only_numerix(x, y)
    # the a,b,c,d,tx,ty affine which transforms x and y
    vec6 = trans.as_vec6_val()

vec6 is a standard length 6 vector containing the information needed
to make an affine transformation.
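
As a rough illustration of what the six numbers mean, here is a minimal
sketch in plain Python.  It assumes vec6 is ordered (a, b, c, d, tx, ty)
with the same meaning as a postscript matrix [a b c d tx ty], i.e.
x' = a*x + c*y + tx and y' = b*x + d*y + ty.  The helper names
apply_vec6 and vec6_to_ps are made up for this example and are not part
of matplotlib.

```python
def apply_vec6(vec6, xs, ys):
    """Apply an affine (a, b, c, d, tx, ty) to sequences of x and y values.

    Assumption: the ordering follows the postscript matrix convention.
    """
    a, b, c, d, tx, ty = vec6
    xt = [a * x + c * y + tx for x, y in zip(xs, ys)]
    yt = [b * x + d * y + ty for x, y in zip(xs, ys)]
    return xt, yt


def vec6_to_ps(vec6):
    """Format the affine as a postscript matrix for the concat operator,
    so the interpreter does the affine part instead of Python."""
    return "[%s] concat" % " ".join("%g" % v for v in vec6)
```

A backend that lets postscript do the affine part would emit the
vec6_to_ps string once and then write untransformed (or nonlinear-only
transformed) coordinates; apply_vec6 is only there to show the
arithmetic.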
Note the call to transform.nonlinear_only_numerix(x, y) can fail (eg
log of nonpositive data).  I may provide some helper function in
extension code to support this.  What you want is a function that
returns the transformed data with a mask indicating the points to be
skipped.  I suggest you not worry about this right now -- if the
transformation fails because the user has illegal data that is OK for
the time being.  It is easier in the agg extension code because I do
the transformation element-by-element in a c++ loop and drop points on
which the transformation fails.  This would probably be prohibitively
slow in python.

Note that I hid the _draw_markers prototype method in backend_ps with
a prefix underscore because it is incomplete and because I am using
the existence of that method in Line2D as a sentinel for whether a
backend has implemented the new API.  For example, in lines.py

    self._newstyle = hasattr(renderer, 'draw_markers')

So once you implement draw_markers, you need to implement draw_lines
with the new signature.  draw_path isn't utilized yet by the
front-end, but it will be nice to expose a path primitive for people
who want to make splines, etc.

I'll try and take this email and turn it into something more formal,
or use it to rewrite backend_bases and backend_template.  So far, the
only backend besides agg to be ported to the new API is cairo -- I
guess as long as the old API is still working there is little
incentive to do it.  I've been holding off *requiring* the new API
because it would irreparably break some backends that don't support
paths (gtk, wx, gd).  Some of these (gtk, wx) have been essential for
some people because they support unicode.  But now that agg and ps
support unicode, this is no longer so important.  We can also provide
a helper method that converts simple paths (those comprised of moveto,
lineto and endpoly) into draw_line and draw_polygon methods if we want
to keep these backends on board.  Also, Steve thinks GTK may be
getting paths in the near future as they move to a cairo renderer,
which suggests that waiting may be the right move.

OK, that should be enough to get you started.  Sorry for the
incomplete set of documentation or guidelines.  There has been a lot
of discussion on where the backends should be going, and since I've
been mulling all the options I've been slow to offer clear guidance in
the backend documentation.

I think your first objective should be to figure out how to translate
an agg.path_storage into a postscript path -- the rest should be easy
:-)

Let me know if you have any more questions!

JDH
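
For anyone following along, the path-to-postscript translation described
above could be sketched roughly like this.  It is not the backend_ps
code: path_to_ps is a made-up name, and it works on a plain list of
(cmd, x, y) triples such as the ones p.vertex(i) returns, rather than on
a path_storage object.  It assumes a curve3 shows up as two vertices
(control point, then end point), as in the script output above, that a
curve4 shows up as three (two control points, then end point), and that
the close flag is 0x40 as in agg_basics.h.

```python
# Command codes copied from the agg enum quoted in this thread.
PATH_CMD_STOP = 0
PATH_CMD_MOVE_TO = 1
PATH_CMD_LINE_TO = 2
PATH_CMD_CURVE3 = 3
PATH_CMD_CURVE4 = 4
PATH_CMD_END_POLY = 6
PATH_CMD_MASK = 0x0F
PATH_FLAGS_CLOSE = 0x40  # assumption: agg's path_flags_close


def _fmt(*coords):
    return " ".join("%1.3f" % c for c in coords)


def path_to_ps(vertices):
    """Translate (cmd, x, y) triples into postscript path operators."""
    out = []
    cur = None  # current point, needed to convert curve3 to a cubic
    i = 0
    while i < len(vertices):
        code, x, y = vertices[i]
        cmd = code & PATH_CMD_MASK  # strip flag bits such as "close"
        if cmd == PATH_CMD_STOP:
            break
        if cmd == PATH_CMD_MOVE_TO:
            out.append(_fmt(x, y) + " moveto")
            cur = (x, y)
        elif cmd == PATH_CMD_LINE_TO:
            out.append(_fmt(x, y) + " lineto")
            cur = (x, y)
        elif cmd == PATH_CMD_CURVE3:
            # Quadratic Bezier: (x, y) is the control point and the next
            # vertex is the end point.  Elevate to a cubic for curveto.
            _, ex, ey = vertices[i + 1]
            x0, y0 = cur
            k = 2.0 / 3.0
            out.append(_fmt(x0 + k * (x - x0), y0 + k * (y - y0),
                            ex + k * (x - ex), ey + k * (y - ey),
                            ex, ey) + " curveto")
            cur = (ex, ey)
            i += 1
        elif cmd == PATH_CMD_CURVE4:
            # Cubic Bezier: two control points, then the end point.
            _, x2, y2 = vertices[i + 1]
            _, ex, ey = vertices[i + 2]
            out.append(_fmt(x, y, x2, y2, ex, ey) + " curveto")
            cur = (ex, ey)
            i += 2
        elif cmd == PATH_CMD_END_POLY:
            if code & PATH_FLAGS_CLOSE:
                out.append("closepath")
        i += 1
    return "\n".join(out)
```

Feeding it the six vertices printed by the path_storage script yields
one moveto, three lineto and a single curveto.  Unknown command codes
are silently skipped, which is a choice made for brevity here rather
than something a real backend should necessarily do.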
On Wednesday 30 March 2005 10:39 pm, John Hunter wrote:
> JDH> - implement draw_markers and draw_lines with the new API
> JDH> (transform is done in backend).
[..snip..]

I made a first (and second) attempt at implementing draw_markers and
draw_lines in the postscript backend.  The changes are in CVS,
although I left draw_markers masked as _draw_markers; it needs to be
unmasked if you want to try it out.

I found some places for speed/memory/ps-filesize improvements.  With
draw_markers masked, the script below took 2.43 seconds to generate
and write the 1.5MB eps file.  With draw_markers unmasked, it took
0.69 seconds to make a 350KB eps file.

Some comments:

1) Circles are being drawn with draw_markers, but agg.path_storage has
   no curve information in it?  Circles are faithfully reproduced in
   ps output, but it takes 50 line segments to draw each circle in
   plot(arange(10000),'-o').

2) I think each tickmark is listed in agg.path_storage twice, and
   therefore gets rendered twice in PS.

3) I expected marker paths to be terminated with the
   agg.path_cmd_end_poly code.  This is not the case.  What is the
   purpose of path_cmd_end_poly?

4) I am getting an unrecognized agg.path_commands_e code.  They should
   be one of 0,1,2,3,4,6,0x0F, and I am getting a value of 70. ??  I
   just ignore it and PS seems to render fine.

5) I'm not doing anything with vec6 = transform.as_vec6_val().  I'm
   not sure what it is used for.

6) draw_lines is getting a long pathlist from agg.  Rather than draw a
   straight line between two points, it is doing something like

       50.106 249.850 moveto
       53.826 249.850 lineto
       57.546 249.850 lineto
       61.266 249.850 lineto

   and that's just for the line in the legend!  The straight line in
   the actual plot has many, many intermediate points.

Feedback appreciated!

    from pylab import *
    from time import clock
    figure(1)
    plot(arange(10000),'-s')
    l=legend(('1e4 markers',))
    d = clock()
    savefig('temp.eps')
    print clock()-d

--
Darren
>>>>> "Darren" == Darren Dale <dd...@co...> writes:

    Darren> I made a first (and second) attempt at implementing
    Darren> draw_markers and draw_lines in the postscript backend. The
    Darren> changes are in CVS, although I left draw_markers masked as
    Darren> _draw_markers, it needs to be unmasked if you want to try
    Darren> it out.

Hey Darren, thanks for working on this.

    Darren> I found some places for speed/memory/ps-filesize
    Darren> improvements. With draw_markers masked, the script below
    Darren> took 2.43 seconds to generate and write the 1.5MB eps
    Darren> file. With draw_markers unmasked, it took 0.69 seconds to
    Darren> make a 350KB eps file.

A good start.  You might be able to get this number down a bit more,
which I discuss below.

    Darren> 1) Circles are being drawn with draw_markers, but
    Darren> agg.path_storage has no curve information in it? Circles
    Darren> are faithfully reproduced in ps output, but it takes 50
    Darren> line segments to draw each circle in
    Darren> plot(arange(10000),'-o').

This is a wart slated for destruction.  We plan to replace circles and
ellipses with splines rather than vertices.  Just hasn't been done
yet.

    Darren> 2) I think each tickmark is listed in agg.path_storage
    Darren> twice, and therefore gets rendered twice in PS.

Why do you think this?  Which ticks?

    Darren> 3) I expected marker paths to be terminated with the
    Darren> agg.path_cmd_end_poly code. This is not the case. What is
    Darren> the purpose of path_cmd_end_poly?

Only marker paths that are polygons have end poly (eg draw_circle).  A
lot of the paths (eg tick marks) are not polygons and so don't have an
end_poly code.

    Darren> 4) I am getting an unrecognized agg.path_commands_e
    Darren> code. They should be one of 0,1,2,3,4,6,0x0F, and I am
    Darren> getting a value of 70. ?? I just ignore it and PS seems to
    Darren> render fine.

I had to track this one down myself.  lines.py calls

    path.end_poly()

agg_path_storage::end_poly calls

    add_vertex(0.0, 0.0, path_cmd_end_poly | flags);

where flags is agg_basics path_flags_e::path_flags_close = 0x40.  You
can test for end poly using the agg module with

    >>> 0x40 | 6
    70
    >>> from matplotlib.agg import path_storage, is_end_poly
    >>> is_end_poly(71)
    False
    >>> is_end_poly(70)
    True

    Darren> 5) Im not doing anything with vec6 =
    Darren> transform.as_vec6_val(). I'm not sure what it is used for.

This is in case you want to do the affine transformation yourself.
The transform is a nonlinear part plus an affine.  Note that
backend_ps is currently doing

    if transform.need_nonlinear():
        x,y = transform.nonlinear_only_numerix(x, y)
    x, y = transform.numerix_x_y(x, y)

which is wrong -- it will fail for nonlinear transforms like log
because the numerix_x_y call does the nonlinear and the affine part
and so you will be doing the nonlinear part twice.  The motivation for
separating out the nonlinear and affine parts was to let the backend
machinery do the affine part (in the great majority of cases, the
transforms are pure affine anyway).  So you might want to do

    if transform.need_nonlinear():
        x,y = transform.nonlinear_only_numerix(x, y)
    vec6 = transform.as_vec6_val()

and then set the current ps affine to vec6.

    Darren> 6) draw_lines is getting a long pathlist from agg. Rather
    Darren> than draw a straight line between two points, it is doing
    Darren> something like

    Darren> 50.106 249.850 moveto 53.826 249.850 lineto 57.546 249.850
    Darren> lineto 61.266 249.850 lineto

    Darren> and thats just for the line in the legend! The straight
    Darren> line in the actual plot has many, many intermediate
    Darren> points.

That is not surprising.  matplotlib plots what you give it.  If you
specify a straight line of 10000 points as you did in your example

    plot(arange(10000),'-s')

matplotlib will plot all 10000 vertices of the line.  It's incumbent
on the user not to pass in redundant data.
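
The arithmetic behind the mysterious 70 in point 4 can be reproduced
without the agg wrapper.  The following is only a pure-Python stand-in
that mirrors the masking described above; the function names echo agg's
but this is not the agg implementation, and is_closed is a name invented
for this sketch.

```python
PATH_CMD_END_POLY = 6
PATH_CMD_MASK = 0x0F
PATH_FLAGS_CLOSE = 0x40


def is_end_poly(code):
    """True if the low four bits of the code are the end_poly command."""
    return (code & PATH_CMD_MASK) == PATH_CMD_END_POLY


def is_closed(code):
    """True if the code is an end_poly that also carries the close flag."""
    return is_end_poly(code) and bool(code & PATH_FLAGS_CLOSE)
```

So a backend comparing raw codes against 6 will miss the polygon
terminator; masking with 0x0F first recovers the command, and the 0x40
bit says whether the polygon should be closed.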
Now, onto the subject of how you might be able to make this faster.
One of the primary motivations of draw_markers is that you should only
have to set the graphics context state once.  In the current
implementation, we have

    while start < len(x):
        to_draw = izip(x[start:end],y[start:end])
        ps = ['%1.3f %1.3f marker' % point for point in to_draw]
        self._draw_ps("\n".join(ps), gc, None)
        start = end
        end += 1000

and _draw_ps sets the gc state.  Now this isn't really a huge deal,
since you are chunking the data in 1000 length buckets.  But for very
large data sets (500k markers) it will result in 500 superfluous calls
to set the gc state.  It might be worth implementing a push_gc method
that sets the current gc state, and then calling this at the top of
draw_markers and not inside the loop.  We'll probably want to
implement this as a default gc method across backends anyway in the
near term, so it would be a worthwhile change.

Hope this helps, thanks again.

JDH
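
To make the "set the gc state once" idea concrete, here is a minimal
standalone sketch.  It is not backend_ps code: draw_markers_ps is a
hypothetical function, the graphics context is reduced to a single line
width, and "marker" refers to a postscript procedure assumed to have
been defined earlier in the file.

```python
def draw_markers_ps(linewidth, xs, ys, chunk=1000):
    """Return postscript text that sets state once, then places markers.

    The positions are still formatted in chunks so that no single
    intermediate string grows too large, but the state-setting line is
    emitted outside the loop.
    """
    out = ["%1.3f setlinewidth" % linewidth]  # gc state: emitted once
    for start in range(0, len(xs), chunk):
        points = zip(xs[start:start + chunk], ys[start:start + chunk])
        out.append("\n".join("%1.3f %1.3f marker" % p for p in points))
    return "\n".join(out)
```

With 2500 points and the default chunk size this emits one setlinewidth
line followed by 2500 marker lines, instead of repeating the state
setup for each of the three chunks.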
Hi John,

> Darren> 2) I think each tickmark is listed in agg.path_storage
> Darren> twice, and therefore gets rendered twice in PS.
>
> Why do you think this?  Which ticks?

I was checking the output of the files I was generating; here is a
clip responsible for rendering a single xtickmark:

    % draw_markers
    /marker { gsave
    newpath
    translate
    0.000 0.000 m
    0.000 4.000 l
    closepath
    stroke
    grestore } bind def
    0.500 setlinewidth
    0 setlinecap
    80.640 31.680 marker
    80.640 31.680 marker
    stroke

The coordinates (80.640 31.680) are rendered twice; I can comment one
of these lines out of the PS file and the tick still renders.  It's
not a bug in draw_markers, the square data markers are only rendered
once; it seems to be specific to tickmarks.

I think we could get a performance boost if all similar ticks were
passed together to draw_markers; right now they are passed
independently.

> Darren> 5) Im not doing anything with vec6 =
> Darren> transform.as_vec6_val(). I'm not sure what it is used for.
>
> This is in case you want to do the affine transformation yourself.
> The transform is a nonlinear part plus an affine.  Note that
> backend_ps is currently doing
>
>     if transform.need_nonlinear():
>         x,y = transform.nonlinear_only_numerix(x, y)
>     x, y = transform.numerix_x_y(x, y)
>
> which is wrong -- it will fail for nonlinear transforms like log
> because the numerix_x_y call does the nonlinear and the affine part
> and so you will be doing the nonlinear part twice.

I'll get up to speed on this eventually.  I just copied those three
lines from backend_cairo.draw_markers.

> Darren> 6) draw_lines is getting a long pathlist from agg.
>
> That is not surprising.  matplotlib plots what you give it.

Yeah, I realized I had made a boneheaded observation just after I hit
the send button.

> Now, onto the subject of how you might be able to make this faster.
[...]
> It might be worth implementing a push_gc method
> that sets the current gc state, and then calling this at the top of
> draw_markers and not inside the loop.  We'll probably want to
> implement this as a default gc method across backends anyway in the
> near term, so it would be a worthwhile change.

OK.  Would you add the signature to backend_bases?

--
Darren
Darren Dale wrote:

>% draw_markers
>/marker { gsave
>newpath
>translate
>0.000 0.000 m
>0.000 4.000 l
>closepath
>stroke
>grestore } bind def
>0.500 setlinewidth
>0 setlinecap
>80.640 31.680 marker
>80.640 31.680 marker
>stroke
>
>The coordinates (80.640 31.680) are rendered twice; I can comment one of these
>lines out of the PS file and the tick still renders. Its not a bug in
>draw_markers, the square data markers are only rendered once, it seems to be
>specific to tickmarks.
>
>I think we could get a performance boost if all similar ticks were passed
>together to draw_markers, right now they are passed independently.
>

Yes, this would be good, since the same marker could be saved and then
just translated from position to position.

 -- Paul

--
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Branch
FAX:   410-338-4767    Baltimore, MD 21218
>>>>> "Darren" == Darren Dale <dd...@co...> writes:

    Darren> The coordinates (80.640 31.680) are rendered twice; I can
    Darren> comment one of these lines out of the PS file and the tick
    Darren> still renders. Its not a bug in draw_markers, the square
    Darren> data markers are only rendered once, it seems to be
    Darren> specific to tickmarks.

Strange....  I'll look into this later.

    Darren> I think we could get a performance boost if all similar
    Darren> ticks were passed together to draw_markers, right now they
    Darren> are passed independently.

We could, but it would require some redesign.  Tick is a class, and
the axis contains a list of ticks.  Thus it would take some top-level
redesign.

    Darren> Yeah, I realized I had made a boneheaded observation just
    Darren> after I hit the send button.

It's always that way :-)  That is what the send button is for: self
enlightenment.

    Darren> OK. Would you add the signature to backend_bases?

Not yet.  I was just suggesting you use this internally.

    def draw_markers(self, gc, path, rgbFace, x, y, transform):
        self.push_gc(gc)
        while 1:
            .... snip...

and later when it becomes part of the api, you'll already have done
the hard part.  You can also call this function from draw_ps.
Basically, all you need to do is rip the gc setting part out of
draw_ps.

JDH
Hi John,

On Mon, 4 Apr 2005, John Hunter wrote:

> >>>>> "Darren" == Darren Dale <dd...@co...> writes:
>     Darren> I think we could get a performance boost if all
>     Darren> similar ticks were passed together to draw_markers,
>     Darren> right now they are passed independently.
>
> We could, but it would require some redesign.  Tick is a
> class, and the axis contains a list of ticks.  Thus it would
> take some top-level redesign.

I'd also encourage looking at how the Ticks are implemented.  I
believe that for simple plots (say, simple_plot.py), the tick drawing
is what dominates rendering time, at least in the WxAgg backend (which
is dominated by the Agg rendering time).  I wouldn't be surprised if
this was the case for most backends.

As far as I can tell, each tick mark is a separate Line2D with 2
points and has all the available properties of a Line2D.  That seems
like a fine approach (certainly easy), but it's definitely overkill.
My speed tests say that rendering one thousand lines with two points
is a lot slower than rendering two lines with one thousand points
(easy enough to test).  That means tick drawing can easily be the
performance bottleneck.

I like Darren's and Paul's suggestion (set line properties once, then
have the ticks be a simple list of pen up / pen down).  I believe
major and minor ticks would need to have different properties, but
it's still only 2 sets of properties.  I understand that this might
mean a significant redesign, but the performance boost might be worth
it.

Thanks,

--Matt

PS: Someone might want tick marks to have all the flexibility that
they currently enjoy.  My guess is that this would be unusual (I don't
see any examples that use this flexibility), and that such cases could
just add custom lines themselves.
>>>>> "Matt" == Matt Newville <new...@ca...> writes:

    Matt> I like Darren's and Paul's suggestion (set line properties
    Matt> once, then have the ticks be a simple list of pen up / pen
    Matt> down). I believe major and minor ticks would need to have
    Matt> different properties, but it's still only 2 set of
    Matt> properties. I understand that this might mean a significant
    Matt> redesign, but the performance boost might be worth it.

I would bet dollars to doughnuts (careful here, Perry still owes me a
doughnut!) that almost all of the tick cost comes from laying out the
text of the ticks and not in drawing the tick lines themselves (Arnd
posted some hotshot profile of this earlier, but I don't remember the
exact results).

I agree ticks (and text in general) are too expensive.  In my
experience, this usually only becomes a problem in animated plots (do
you have another use case in mind?).  I think we might be able to work
around this particular problem by supporting the drawing of only a
subset of the artists in the scene.  I imagine something like the
following is workable.

    line, = ax.plot(blah)

    dynamic = (line,)  # a list of artists to animate
    # draws everything but artists in dynamic and caches Axes bbox to bitmap
    ax.animate_prepare( dynamic)

    while 1:
        line.set_data(blah)
        # blits the axes background cache and renders only the artists in dynamic
        ax.animate()

I'm not opposed to a redesign of the Tick drawing if there are
appreciable gains to be had, but my guess is we may get more bang for
the buck in special casing the typical text layout (angle=0.0, no
mathtext, no unicode) and handling dynamic updates more intelligently.

JDH
Hi John,

Hmm, could be.  Text is definitely slow, but my recollection is that
the Line2D drawing of the ticks was actually significant.  For
example, the speed difference when turning on/off the right and top
ticks (which don't generally have text) was noticeable.

It's been a while since I looked at this, and I'm not finding my test
scripts right now.  My conclusions at the time were that agg rendering
was dominating WXAgg time (so the icky WXAgg
get-rgb-image-then-render-as-bitmap step was not the slow part) and
that tick line rendering in Agg was much slower than I had expected.

I'll try to reproduce this, but this week is sort of full for me.
Currently, line plotting with WXAgg is fast enough for me (I can
reliably get better than 10 plots/sec on WinXP in my app, for
example).

Also, just to be clear: I owe you much more than doughnuts.

Thanks,

--Matt
On Monday 04 April 2005 12:09 pm, John Hunter wrote:
>
> I agree ticks (and text in general) are too expensive.  In my
> experience, this usually only becomes a problem in animated plots
> (do you have another use case in mind?).  I think we might be able to
> work around this particular problem by supporting the drawing of only
> a subset of the artists in the scene.
[...]
>
> I'm not opposed to a redesign of the Tick drawing if there are
> appreciable gains to be had, but my guess is we may get more bang for
> the buck in special casing the typical text layout (angle=0.0, no
> mathtext, no unicode) and handling dynamic updates more intelligently.

Data acquisition is a good example of where a new tick protocol would
be useful.  Supposing the user wants a plot in their gui that
autoscales after the addition of each new point (which is not
uncommon), the ticks would need to render as quickly as possible.

Every time somebody I work with complains about the LabView program
from National Instruments, I think about how nice it would be to do
data acquisition with Python.  I had hoped that Taco would mature into
a solid library for interfacing with scientific instruments, but the
project doesn't seem very active, judging by their webpage
http://www.esrf.fr/taco/.

--
Darren