The task is to find a way to draw a horizontal line in an array of 16-bit integers.
We're assuming a 256x192 pixel array with 16 pixels per word. A line is a contiguous run of set (1) bits. Lines can start in the middle of any word, overlap any other words, and end in any word; they may also start and end in the same word. They may not wrap on to the next line. Hint: the middle words are easy - just write 0xffff, but the edges will be tricky, as will handling the case for the start and end in the same word. A function/procedure/routine must take an x0 and x1 coordinate indicating the horizontal start and stop points, as well as a y coordinate.
I exclude myself from this because I designed a nearly identical algorithm myself for an embedded processor but I'm curious how others would go about it. Bonus points for using relatively fast operations (for example, a 64 bit multiply or floating point operation wouldn't be fast on an embedded machine but a simple bit shift would be.)
-
\$\begingroup\$ I started a discussion about performance measurements on meta - you're invited. \$\endgroup\$user unknown– user unknown2011年08月13日 22:59:00 +00:00Commented Aug 13, 2011 at 22:59
-
\$\begingroup\$ how was winner chosen? \$\endgroup\$Rommudoh– Rommudoh2011年08月15日 13:14:07 +00:00Commented Aug 15, 2011 at 13:14
4 Answers 4
This code assumes that both x0 and x1 are inclusive endpoints, and that words are little endian (i.e. the (0,0) pixel can be set with array[0][0]|=1
).
int line(word *array, int x0, int x1, int y) {
word *line = array + (y << 4);
word *start = line + (x0 >> 4);
word *end = line + (x1 >> 4);
word start_mask = (word)-1 << (x0 & 15);
word end_mask = (unsigned word)-1 >> (15 - (x1 & 15));
if (start == end) {
*start |= start_mask & end_mask;
} else {
*start |= start_mask;
*end |= end_mask;
for (word *p = start + 1; p < end; p++) *p = (word)-1;
}
}
-
1\$\begingroup\$ How fast is it? \$\endgroup\$user unknown– user unknown2011年08月13日 22:21:41 +00:00Commented Aug 13, 2011 at 22:21
Python
The main trick here is to use a lookup table to store bitmasks of the pixels. This saves a few operations. A 1kB table is not that large even for an embedded platform these days
If space is really tight, for the price of a couple of &0xf the lookup table can be reduced to just 64B
This code is in Python, but would be simple to port to any language that supports bit operations.
If using C, you could consider unwinding the loop using the switch
from Duff's device. Since the line is maximum of 16 words wide, I would extend the switch
to 14 lines and dispense with the while
altogether.
T=[65535, 32767, 16383, 8191, 4095, 2047, 1023, 511,
255, 127, 63, 31, 15, 7, 3, 1]*16
U=[32768, 49152, 57344, 61440, 63488, 64512, 65024, 65280,
65408, 65472, 65504, 65520, 65528, 65532, 65534, 65535]*16
def drawline(x1,x2,y):
y_=y<<4
x1_=y_+(x1>>4)
x2_=y_+(x2>>4)
if x1_==x2_:
buf[x1_]|=T[x1]&U[x2]
return
buf[x1_]|=T[x1]
buf[x2_]|=U[x2]
x1_+=+1
while x1_<x2_:
buf[x1_] = 0xffff
x1_+=1
#### testing code ####
def clear():
global buf
buf=[0]*192*16
def render():
for y in range(192):
print "".join(bin(buf[(y<<4)+x])[2:].zfill(16) for x in range(16))
clear()
for y in range(0,192):
drawline(y/2,y,y)
for x in range(10,200,6):
drawline(x,x+2,0)
drawline(x+3,x+5,1)
for y in range(-49,50):
drawline(200-int((2500-y*y)**.5), 200+int((2500-y*y)**.5), y+60)
render()
Here is a C version of my Python answer using the switch statement instead of the while loop and reduced indexing by incrementing a pointer instead of the array index
The size of the lookup table can be substantially reduced by using T[x1 & 0xf] and U[x2 & 0xf] for a couple of extra instructions
#include <stdio.h>
#include <math.h>
unsigned short T[] = {0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001,
0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x07ff, 0x03ff, 0x01ff,
0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x0007, 0x0003, 0x0001};
unsigned short U[] = {0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff,
0x8000, 0xc000, 0xe000, 0xf000, 0xf800, 0xfc00, 0xfe00, 0xff00,
0xff80, 0xffc0, 0xffe0, 0xfff0, 0xfff8, 0xfffc, 0xfffe, 0xffff};
unsigned short buf[192*16];
void clear(){
int i;
for (i=0; i<192*16; i++) buf[i]==0;
}
void render(){
int x,y;
for (y=0; y<192; y++){
for (x=0; x<256; x++) printf("%d", (buf[(y<<4)+(x>>4)]>>(15-(x&15)))&1);
printf("\n");
}
}
void drawline(int x1, int x2, int y){
int y_ = y<<4;
int x1_ = y_+(x1>>4);
int x2_ = y_+(x2>>4);
unsigned short *p = buf+x1_;
if (x1_==x2_){
*p|=T[x1]&U[x2];
return;
}
*p++|=T[x1];
switch (x2_-x1_){
case 14: *p++ = 0xffff;
case 13: *p++ = 0xffff;
case 12: *p++ = 0xffff;
case 11: *p++ = 0xffff;
case 10: *p++ = 0xffff;
case 9: *p++ = 0xffff;
case 8: *p++ = 0xffff;
case 7: *p++ = 0xffff;
case 6: *p++ = 0xffff;
case 5: *p++ = 0xffff;
case 4: *p++ = 0xffff;
case 3: *p++ = 0xffff;
case 2: *p++ = 0xffff;
case 1: *p++ = U[x2];
}
}
int main(){
int x,y;
clear();
for (y=0; y<192; y++){
drawline(y/2,y,y);
}
for (x=10; x<200; x+=6){
drawline(x,x+2,0);
drawline(x+3,x+5,1);
}
for (y=-49; y<50; y++){
x = sqrt(2500-y*y);
drawline(200-x, 200+x, y+60);
}
render();
return 0;
}
-
\$\begingroup\$ How fast is it? \$\endgroup\$user unknown– user unknown2011年08月13日 21:14:37 +00:00Commented Aug 13, 2011 at 21:14
-
\$\begingroup\$ @user unknown, How long is a piece of string? I think it should be faster than the accepted answer because it uses a lookup table to reduce the amount of work slightly. Why don't you try them out and let us know what you find? \$\endgroup\$gnibbler– gnibbler2011年08月14日 08:53:49 +00:00Commented Aug 14, 2011 at 8:53
Scala, (削除) 7s / 1M lines (削除ここまで) 4.1s / 1M lines
// declaration and initialisation of an empty field:
val field = Array.ofDim[Short] (192, 16)
first implementation:
// util-method: set a single Bit:
def setBit (x: Int, y: Int) =
field (y)(x/16) = (field (y)(x/16) | (1 << (15 - (x % 16)))).toShort
def line (x0: Int, x1: Int, y: Int) =
(x0 to x1) foreach (setBit (_ , y))
After eliminating the inner method call, and replacing the for- with a while-loop, on my 2Ghz Single Core with Scala 2.8 it absolves 1 Mio. Lines in 4.1s sec. instead of initial 7s.
def line (x0: Int, x1: Int, y: Int) = {
var x = x0
while (x < x1) {
field (y)(x/16) = (field (y)(x/16) | (1 << (15 - (x % 16)))).toShort
x += 1
}
}
Testcode and invocation:
// sample invocation:
line (12, 39, 3)
// verification
def shortprint (s: Short) = s.toBinaryString.length match {
case 16 => s.toBinaryString
case 32 => s.toBinaryString.substring (16)
case x => ("0000000000000000".substring (x) + s.toBinaryString)}
field (3).take (5).foreach (s=> println (shortprint (s)))
// result:
0000000000001111
1111111111111111
1111111100000000
0000000000000000
0000000000000000
Performance testing:
val r = util.Random
def testrow () {
val a = r.nextInt (256)
val b = r.nextInt (256)
if (a < b)
line (a, b, r.nextInt (192)) else
line (b, a, r.nextInt (192))
}
def test (count: Int): Unit = {
for (n <- (0 to count))
testrow ()
}
// 1 mio tests
test (1000*1000)
Tested with the unix tool time, comparing the user-time, including startup-time, compiled code, no JVM-start-up phase.
Increasing the number of lines shows, that for every new million, it needs an extra 3.3s.