Fast line drawing algorithm

Question 1

The task is to find a way to draw a horizontal line in an array of 16-bit integers.

We're assuming a 256x192 pixel array with 16 pixels per word. A line is a contiguous run of set (1) bits. Lines can start in the middle of any word, overlap any other words, and end in any word; they may also start and end in the same word. They may not wrap on to the next line. Hint: the middle words are easy - just write 0xffff, but the edges will be tricky, as will handling the case for the start and end in the same word. A function/procedure/routine must take an x0 and x1 coordinate indicating the horizontal start and stop points, as well as a y coordinate.

I exclude myself from this because I designed a nearly identical algorithm myself for an embedded processor but I'm curious how others would go about it. Bonus points for using relatively fast operations (for example, a 64 bit multiply or floating point operation wouldn't be fast on an embedded machine but a simple bit shift would be.)

Question 2

I started a discussion about performance measurements on meta - you're invited.

Question 3

how was winner chosen?

Question 4

This code assumes that both x0 and x1 are inclusive endpoints, and that words are little endian (i.e. the (0,0) pixel can be set with array[0][0]|=1).

int line(word *array, int x0, int x1, int y) {
 word *line = array + (y << 4);
 word *start = line + (x0 >> 4);
 word *end = line + (x1 >> 4);
 word start_mask = (word)-1 << (x0 & 15);
 word end_mask = (unsigned word)-1 >> (15 - (x1 & 15));
 if (start == end) {
 *start |= start_mask & end_mask;
 } else {
 *start |= start_mask;
 *end |= end_mask;
 for (word *p = start + 1; p < end; p++) *p = (word)-1;
 }
}

Question 5

How fast is it?

Question 6

Python

The main trick here is to use a lookup table to store bitmasks of the pixels. This saves a few operations. A 1kB table is not that large even for an embedded platform these days

If space is really tight, for the price of a couple of &0xf the lookup table can be reduced to just 64B

This code is in Python, but would be simple to port to any language that supports bit operations.

If using C, you could consider unwinding the loop using the switch from Duff's device. Since the line is maximum of 16 words wide, I would extend the switch to 14 lines and dispense with the while altogether.

T=[65535, 32767, 16383, 8191, 4095, 2047, 1023, 511,
 255, 127, 63, 31, 15, 7, 3, 1]*16
U=[32768, 49152, 57344, 61440, 63488, 64512, 65024, 65280,
 65408, 65472, 65504, 65520, 65528, 65532, 65534, 65535]*16
def drawline(x1,x2,y):
 y_=y<<4
 x1_=y_+(x1>>4)
 x2_=y_+(x2>>4)
 if x1_==x2_:
 buf[x1_]|=T[x1]&U[x2]
 return 
 buf[x1_]|=T[x1]
 buf[x2_]|=U[x2] 
 x1_+=+1
 while x1_<x2_:
 buf[x1_] = 0xffff
 x1_+=1
#### testing code ####
def clear():
 global buf
 buf=[0]*192*16
def render():
 for y in range(192):
 print "".join(bin(buf[(y<<4)+x])[2:].zfill(16) for x in range(16))
clear()
for y in range(0,192):
 drawline(y/2,y,y)
for x in range(10,200,6):
 drawline(x,x+2,0)
 drawline(x+3,x+5,1)
for y in range(-49,50):
 drawline(200-int((2500-y*y)**.5), 200+int((2500-y*y)**.5), y+60)
render()

Question 7

Here is a C version of my Python answer using the switch statement instead of the while loop and reduced indexing by incrementing a pointer instead of the array index

The size of the lookup table can be substantially reduced by using T[x1 & 0xf] and U[x2 & 0xf] for a couple of extra instructions

#include <stdio.h>
#include <math.h>
unsigned short T[] = {0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
 0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001};
unsigned short U[] = {0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
 0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
 0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff};
unsigned short buf[192*16];
void clear(){
 int i;
 for (i=0; i<192*16; i++) buf[i]==0;
}
void render(){
 int x,y;
 for (y=0; y<192; y++){
 for (x=0; x<256; x++) printf("%d", (buf[(y<<4)+(x>>4)]>>(15-(x&15)))&1);
 printf("\n");
 }
}
void drawline(int x1, int x2, int y){
 int y_ = y<<4;
 int x1_ = y_+(x1>>4);
 int x2_ = y_+(x2>>4);
 unsigned short *p = buf+x1_;
 if (x1_==x2_){
 *p|=T[x1]&U[x2];
 return;
 }
 *p++|=T[x1];
 switch (x2_-x1_){
 case 14: *p++ = 0xffff;
 case 13: *p++ = 0xffff;
 case 12: *p++ = 0xffff;
 case 11: *p++ = 0xffff;
 case 10: *p++ = 0xffff;
 case 9: *p++ = 0xffff;
 case 8: *p++ = 0xffff;
 case 7: *p++ = 0xffff;
 case 6: *p++ = 0xffff;
 case 5: *p++ = 0xffff;
 case 4: *p++ = 0xffff;
 case 3: *p++ = 0xffff;
 case 2: *p++ = 0xffff;
 case 1: *p++ = U[x2];
 } 
}
int main(){
 int x,y;
 clear();
 for (y=0; y<192; y++){
 drawline(y/2,y,y); 
 }
 for (x=10; x<200; x+=6){
 drawline(x,x+2,0);
 drawline(x+3,x+5,1);
 }
 for (y=-49; y<50; y++){
 x = sqrt(2500-y*y);
 drawline(200-x, 200+x, y+60);
 }
 render();
 return 0;
 }

Question 8

How fast is it?

Question 9

@user unknown, How long is a piece of string? I think it should be faster than the accepted answer because it uses a lookup table to reduce the amount of work slightly. Why don't you try them out and let us know what you find?

Question 10

Scala, (削除) 7s / 1M lines (削除ここまで) 4.1s / 1M lines

// declaration and initialisation of an empty field: 
val field = Array.ofDim[Short] (192, 16)

first implementation:

// util-method: set a single Bit:
def setBit (x: Int, y: Int) = 
 field (y)(x/16) = (field (y)(x/16) | (1 << (15 - (x % 16)))).toShort 
def line (x0: Int, x1: Int, y: Int) = 
 (x0 to x1) foreach (setBit (_ , y))

After eliminating the inner method call, and replacing the for- with a while-loop, on my 2Ghz Single Core with Scala 2.8 it absolves 1 Mio. Lines in 4.1s sec. instead of initial 7s.

 def line (x0: Int, x1: Int, y: Int) = {
 var x = x0
 while (x < x1) { 
 field (y)(x/16) = (field (y)(x/16) | (1 << (15 - (x % 16)))).toShort
 x += 1
 }
 }

Testcode and invocation:

// sample invocation:
line (12, 39, 3) 
// verification 
def shortprint (s: Short) = s.toBinaryString.length match { 
 case 16 => s.toBinaryString 
 case 32 => s.toBinaryString.substring (16) 
 case x => ("0000000000000000".substring (x) + s.toBinaryString)}
field (3).take (5).foreach (s=> println (shortprint (s))) 
// result:
0000000000001111
1111111111111111
1111111100000000
0000000000000000
0000000000000000

Performance testing:

 val r = util.Random 
 def testrow () {
 val a = r.nextInt (256)
 val b = r.nextInt (256)
 if (a < b)
 line (a, b, r.nextInt (192)) else
 line (b, a, r.nextInt (192)) 
 }
 def test (count: Int): Unit = {
 for (n <- (0 to count))
 testrow ()
 }
 // 1 mio tests
 test (1000*1000)

Tested with the unix tool time, comparing the user-time, including startup-time, compiled code, no JVM-start-up phase.

Increasing the number of lines shows, that for every new million, it needs an extra 3.3s.

Keith Randall Keith Randall 21.1k2 gold badges46 silver badges106 bronze badges · Accepted Answer · 2011-02-05 03:13:41Z

This code assumes that both x0 and x1 are inclusive endpoints, and that words are little endian (i.e. the (0,0) pixel can be set with array[0][0]|=1).

int line(word *array, int x0, int x1, int y) {
 word *line = array + (y << 4);
 word *start = line + (x0 >> 4);
 word *end = line + (x1 >> 4);
 word start_mask = (word)-1 << (x0 & 15);
 word end_mask = (unsigned word)-1 >> (15 - (x1 & 15));
 if (start == end) {
 *start |= start_mask & end_mask;
 } else {
 *start |= start_mask;
 *end |= end_mask;
 for (word *p = start + 1; p < end; p++) *p = (word)-1;
 }
}

1

\$\begingroup\$ How fast is it? \$\endgroup\$

user unknown
– user unknown

2011年08月13日 22:21:41 +00:00
Commented Aug 13, 2011 at 22:21

Stack Exchange Network

Fast line drawing algorithm

4 Answers 4

Python

Scala, (削除) 7s / 1M lines (削除ここまで) 4.1s / 1M lines

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Fast line drawing algorithm

4 Answers 4

Python

Scala, (削除) 7s / 1M lines (削除ここまで) 4.1s / 1M lines

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions