Long multiply, 8 bits at a time

Question 1

You are given a 16-bit machine and told to implement multiplication of arbitrary-sized integers. Your registers can only hold 16-bit numbers, and the biggest multiply instruction takes two 8-bit inputs and generates a 16-bit result.

Your program must take as input two arbitrary-sized positive numbers and output their product. Each input number is encoded on its own line as a little-endian byte array where each byte is a 2-digit hex number. The output must be similarly formatted. Perhaps best explained with an example:

input

1f 4a 07
63 a3

output

fd 66 03 a7 04

which encodes the multiplication 477727*41827=わ19981887229.

You can assume that the last (most significant) byte of each input number is nonzero, and the last chunk of the number you output must be nonzero. Both input numbers will be at most 100 bytes long.

Smallest code wins.

Remember, the biggest multiply you are allowed to use is 1 byte * 1 byte, and no integer types bigger than 2 bytes!

Question 2

This is crucial for languages, that have no default 8-bit type, like Haskell.

Question 3

What about addition? Can we pretend to have a ready-made arbitrary-size addition function? If not, what can we add?

Question 4

@Timwi: You can do anything you'd like 16 bits at a time. Adds, shifts, whatever. Any larger operation you need to synthesize yourself.

Question 5

+1 for correct byte order

Question 6

Perl, 137 characters

($x,$y)=<>;while($x=~s/.. *//s){$e=hex$&;$i=0;$s=$r[$i]+=$e*hex,$r[$i]&=255,$r[++$i]+=$s>>8 for$y=~/.. */gs;$y="00$y"}printf'%02x 'x@r,@r

Caveats

Sometimes prints an extra 00 byte at the end of the result. Of course the result is still correct even with that extra byte.
Prints an extra space after the last hex byte in the result.

Explanation

The explanation is going to be a bit long, but I think most people here will find it interesting.

First of all, when I was 10 years old, I was taught the following little trick. You can multiply any two positive numbers with this. I will describe this using the example of 13 ×ばつ 47. You start by writing the first number, 13, and dividing it by 2 (round down each time) until you reach 1:

Now, next to the 13 you write the other number, 47, and keep multiplying it by 2 the same number of times:

Now you cross out all the lines where the number on the left is even. In this case, this is only the 6. (I can’t do strike-through in code, so I’ll just remove it.) Finally, you add all the remaining numbers on the right:

And this is the right answer. 13 ×ばつ 47 = 611.

Now, since you are all computer geeks, you will have realised that what we’re actually doing in the left and right columns is x >> 1 and y << 1, respectively. Furthermore, we add the y only if x & 1 == 1. This translates directly into an algorithm, which I’ll write here in pseudocode:

input x, y
result = 0
while x > 0:
 if x & 1 == 1:
 result = result + y
 x = x >> 1
 y = y << 1
print result

We can re-write the if to use a multiplication, and then we can easily change this so that it works on a byte-by-byte basis instead of bit-by-bit:

input x, y
result = 0
while x > 0:
 result = result + (y * (x & 255))
 x = x >> 8
 y = y << 8
print result

This still contains a multiplication with y, which is arbitrary-size, so we need to change that into a loop too. We’ll do that in Perl.

Now translate everything to Perl:

$x and $y are the inputs in hex format, so they have the least significant byte first.
Thus, instead of x >> 8 I do $x =~ s/.. *//s. I need the space+star because the last byte might not have a space on it (could use space+? too). This automatically puts the removed byte (x & 255) into $&.
y << 8 is simply $y = "00$y".
The result is actually a numerical array, @r. At the end, each element of @r contains one byte of the answer, but halfway through the calculation it may contain more than one byte. I’ll prove to you below that each value is never more than two bytes (16 bits) and that the result is always one byte at the end.

So here is the Perl code unravelled and commented:

# Input x and y
($x, $y) = <>;
# Do the equivalent of $& = x & 255, x = x >> 8
while ($x =~ s/.. *//s)
{
 # Let e = x & 255
 $e = hex $&;
 # For every byte in y... (notice this sets $_ to each byte)
 $i = 0;
 for ($y =~ /.. */gs)
 {
 # Do the multiplication of two single-byte values.
 $s = $r[$i] += $e*hex,
 # Truncate the value in $r[$i] to one byte. The rest of it is still in $s
 $r[$i] &= 255,
 # Move to the next array item and add the carry there.
 $r[++$i] += $s >> 8
 }
 # Do the equivalent of y = y << 8
 $y = "00$y"
}
# Output the result in hex format.
printf '%02x ' x @r, @r

Now for the proof that this always outputs bytes, and that the calculation never generates values greater than two bytes. I’ll prove this by induction over the while loop:

The empty @r at the beginning clearly has no values greater than 0xFF in it (because it has no values in it at all). This concludes the base case.
Now, given that @r contains only single bytes at the beginning of each while iteration:
- The for loop explicitly &=s all values in the result array with 255 except the last one, so we only need to look at that last one.
- We know that we always remove only one byte from $x and $y:
  - Therefore, $e*hex is a multiplication of two single-byte values, which means it’s in the range 0 — 0xFE01.
  - By the inductive hypothesis, $r[$i] is one byte; therefore, $s = $r[$i] += $e*hex is in the range 0 — 0xFF00.
  - Therefore, $s >> 8 is always one byte.
- $y grows an extra 00 in each iteration of the while loop:
  - Therefore, in every iteration of the while loop, the inner for loop runs for one more iteration than it did in the previous while iteration.
  - Therefore, the $r[++$i] += $s >> 8 in the last iteration of the for loop always adds $s >> 8 to 0, and we already established that $s >> 8 is always one byte.
- Therefore, the last value stored in @r at the end of the for loop is also a single byte.

This concludes a wonderful and exciting challenge. Thanks a lot for posting it!

Question 7

OCaml + Batteries, 362 characters

A standard O(n*m) schoolboy multiplication algorithm. Note that in order to meet the challenge requirements, the operations are done on the bytes of strings, which in OCaml are (conveniently, in this case) mutable. Note also that the accumulator s never overflows 16 bits, since 2(2^8 - 1) + (2^8 - 1)^2 = (2^8 - 1)(2^8 + 1) = 2^16 - 1.

let(@)=List.map
let m a b=Char.(String.(let e s=of_list(((^)"0x"|-to_int|-chr)@nsplit s" ")in
let a,b=e a,e b in let m,n=length a,length b in let c=make(m+n)'000円'in
iteri(fun i d->let s,x=ref 0,code d in iteri(fun j e->let y=code e in
s:=!s+code c.[i+j]+x*y;c.[i+j]<-chr(!s mod
256);s:=!s/256)b;c.[i+n]<-chr!s)a;join" "((code|-Printf.sprintf"%02x")@to_list c)))

For example,

# m "1f 4a 07" "63 a3" ;;
- : string = "fd 66 03 a7 04"
# m "ff ff ff ff" "ff ff ff ff" ;;
- : string = "01 00 00 00 fe ff ff ff"

Question 8

C Solution

This solution does no input validation. It's also only lightly tested. Speed was not really a consideration. Malloc's memory, and isn't particularly clever about how much it grabs. Guaranteed to be enough, and more than necessary.

m() accepts a string, expects two newlines in the string, one after each number. Expects only numbers, lowercase characters, spaces, and newlines. Expects hex digits to always be a pair.

No multiply operation is ever used (knowingly). Shifting is performed on 8-bit variables. One 16-bit addition is performed. No 32-bit data types.

Shrunk by hand, and only mildly. edit: more obfuscation, fewer chars :D Compiles with warnings with gcc.

Characters: 675

typedef unsigned char u8;
#define x calloc
#define f for
#define l p++
#define E *p>57?*p-87:*p-48
#define g(a) --i;--a;continue
void m(u8*d){short n=0,m=0,a,b,i,k,s;u8*t,*q,*r,*p=d,o;f(;*p!=10;n++,l){}l;f(;*p
!=10;m++,l){}t=x(n,1);q=x(m,1);r=x(n,1);p=d;a=n;i=0;f(;*p!=10;i++,l){if(*p==32){
g(a);}t[i]=E;t[i]<<=4;l;t[i]|=E;}a/=2;b=m;i=0;l;f(;*p!=10;i++,l){if(*p==32){g(b)
;}q[i]=E;q[i]<<=4;l;q[i]|=E;}b/=2;f(k=0;k<8*b;k++){if(q[0]&1){o=0;f(i=0;i<n;i++)
{s=o+t[i]+r[i];o=s>>8;r[i]=s&255;}}f(i=n;i;i--){o=t[i-1]>>7&1;t[i-1]*=2;if(i!=n)
t[i]|=o;}f(i=0;i<m;i++){o=q[i]&1;q[i]/=2;if(i)q[i-1]|=(o<<7);}}k=(r[a+b-1]==0)?a
+b-1:b+a;f(i=0;i<k;i++){printf("%02x ",r[i]);}putchar(10);}

You can test with this:

int main(void){
 m("1f 4a 07\n63 a3\n");
 m("ff ff ff ff\nff ff ff ff\n");
 m("10 20 30 40\n50 60 70\n");
 m("01 02 03 04 05 06\n01 01 01\n");
 m("00 00 00 00 00 00 00 00 00 00 00 00 01\n00 00 00 00 00 00 00 00 02\n");
 return 0;
}

Result:

$ ./long 
fd 66 03 a7 04 
01 00 00 00 fe ff ff ff 
00 05 10 22 34 2d 1c 
01 03 06 09 0c 0f 0b 06 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02

Question 9

JavaScript (Node.js), 160 bytes

x=>y=>x.map((t,i)=>y.map(u=>(f=i=>(c=s[i]>>8)&&f(i++,s[i]=c+~~s[i],s[i-1]%=256))(i,s[i]=~~s[i++]+`0x${t}`*`0x${u}`)),s=[])&&s.map(t=>(t<16?0:'')+t.toString(16))

Try it online!

Lot newer language than that time though

Question 10

8086 DOS .COM file, 134 bytes

It works now!

What better language to do this in than 16-bit assembly?

00000000: b9 c8 00 89 cd 31 c0 f3 50 89 e7 e8 5d 00 92 aa .....1..P...]...
00000010: 74 f9 29 e7 87 fb 89 e6 8d 3a e8 4e 00 9c 56 57 t.)......:.N..VW
00000020: 89 d9 ac f6 e2 03 05 ab 83 15 00 4f e2 f4 5f 5e ...........O.._^
00000030: 47 9d 74 e6 80 39 00 74 01 4b 01 df 01 e5 29 ef G.t..9.t.K....).
00000040: 89 ee ac 88 c3 b1 04 d2 e8 e8 10 00 93 24 0f e8 .............$..
00000050: 0a 00 b0 20 e8 0d 00 4f 75 e8 cd 20 3c 09 76 02 ... ...Ou.. <.v.
00000060: 04 27 04 30 b4 02 88 c2 cd 21 c3 b2 00 b4 01 cd .'.0.....!......
00000070: 21 2c 30 72 0e 3c 0a 72 02 2c 27 b1 04 d2 e2 00 !,0r.<.r.,'.....
00000080: c2 eb ea 3c f0 c3 ...<..

My first, and likely last 16-bit 8086 program. 😫

Assembly (NASM with C comments because of syntax highlighting):

 // sed -i -e 's#//#;#g' dosmult.asm
 // nasm -f bin dosmult.asm -o dosmult.com
 // The default, but never harmed anyone.
 [bits 16]
 // 8086 code only! If I use 386 instructions or any of that,
 // I would ruin the challenge.
 [cpu 8086]
 // COM files start at 100h.
 org 100h
 section .text
start:
 // Create a 400 byte buffer on the stack.
 // 200 will be reserved for the input, and 200 for the output.
 mov cx, 200
 // Save 200 in BP for later, it will be useful.
 mov bp, cx
 // Push 0 200 times making a 400 byte zeroed buffer on the stack
 xor ax, ax
 rep push ax
 mov di, sp
 // Read the first line of hex digits.
.getline_loop:
 call gethex
 // Swap DL to AL and store with STOSB
 xchg dx, ax
 stosb
 // Loop if it was a space
 jz .getline_loop
 // Get length
 sub di, sp
 // Store length in BX
 xchg di, bx
 // Save the pointer to input in SI
 mov si, sp
 // Load the pointer to output in DI (SI + 200)
 lea di, [si + bp]
 // Begin multiplication, grade school byte by byte
 // Instead of creating a third buffer, we process the data
 // as the user inputs it.
.loop:
 call gethex
 // Save flags from gethex so we know when to stop
 pushfw
 // Save SI and DI
 push si
 push di
 // Loop counter
 mov cx, bx
.mult_loop:
 // Load next byte from first input
 lodsb
 // Multiply by the input byte
 mul dl
 // 16-bit add to [DI]
 // We actually add to AX, then do STOSW.
 add ax, [di]
 stosw
 // Add the carry bit.
 adc word [di], 0
 // Decrement DI after STOSW incremented it by 2 to iterate
 // on each byte.
 dec di
 // while (--cx)
 loop .mult_loop
.mult_loop_end:
 // Restore DI and SI
 pop di
 pop si
 // Increment output pointer
 inc di
 // Restore flags from gethex
 popfw
 // Jump if we got a space.
 jz .loop
.loop_end:
 // Was there a trailing zero?
 cmp byte[di + bx], 0
 jz .nodec
.dec:
 // If so, decrement the count.
 dec bx
.nodec:
 // Calculate the length with ugly pointer arithmetic.
 // I feel there is a better way to do this.
 add di, bx // ptr += len
 add sp, bp // sp += 200 (sp = output)
 sub di, sp // ptr -= output
 // Store output array in SI.
 mov si, sp
 // Print each byte in hex
.print_loop:
 // Load byte
 lodsb
 // puthex(AL >> 4)
 mov bl, al
 mov cl, 4
 shr al, cl
 call puthex
 // puthex(AL & 0Fh)
 xchg ax, bx
 and al, 0Fh
 call puthex
 // putc(' ')
 mov al, ' '
 call putc
 // Loop for every byte from DI.
 dec di
 jnz .print_loop
.print_loop_end:
 // exit, not bothering to clean anything up. :P
 int 20h
 // Prints hex digit in AL.
 // Must be <= 0fh
puthex:
 cmp al, 9
 jbe .no_hexa
.hexa:
 add al, 'a' - '9' - 1
.no_hexa:
 add al, '0'
 // Fallthrough
 // Prints AL to stdout.
putc:
 mov ah, 2h
 mov dl, al
 int 21h
 ret
 // Reads a hex byte.
 // Returns in DL.
 // Sets whether it ended in a space in the Z flag.
gethex:
 mov dl, 0
.start:
 // Read char into AL
 mov ah, 1h
 int 21h
 // Subtract ASCII '0'
 sub al, '0'
 // If we hit a space or newline, they will trigger the 'below'
 // condition
 jb .ret
 // Was AL '0'-'9'? If not, correct it to 'a'-'f'.
 cmp al, 10
 jb .no_hexa
.hexa:
 // Correct AL
 sub al, 'a' - '9' - 1
.no_hexa:
 // DL = (DL << 4) + AL
 // 8086 doesn't have shift by immediate.
 mov cl, 4
 shl dl, cl
 // Add to BL
 add dl, al
 // Loop
 jmp .start
.ret:
 // Check if AL before the subtraction was a space.
 cmp al, ' ' - '0'
 ret

Naturally, being a 16-bit instruction set, I am only doing 8-bit and 16-bit arithmetic (and I don't use the 16-bit mul).

Beats the Perl answer by 3 bytes. Granted, this is binary machine code and that is source code, so I guess this isn't that much to get excited about. However, 8086 lacks a lot of the features Perl has (about 1/3 of it is just I/O and hex conversion 😫).

I am sure there are multiple optimizations possible here. A lot of my knowledge is from writing 32-bit and 64-bit code, so I am not fully knowledgeable in the 16-bit tricks.

Input is from stdin, in space separated lowercase hex, each ending in a new line. It is not line buffered, so no typos allowed! :P

Output is printed to stdout.

Timwi 13k3 gold badges46 silver badges66 bronze badges · Accepted Answer · 2011-04-12 14:22:38Z

Perl, 137 characters

($x,$y)=<>;while($x=~s/.. *//s){$e=hex$&;$i=0;$s=$r[$i]+=$e*hex,$r[$i]&=255,$r[++$i]+=$s>>8 for$y=~/.. */gs;$y="00$y"}printf'%02x 'x@r,@r

Caveats

Sometimes prints an extra 00 byte at the end of the result. Of course the result is still correct even with that extra byte.
Prints an extra space after the last hex byte in the result.

Explanation

The explanation is going to be a bit long, but I think most people here will find it interesting.

First of all, when I was 10 years old, I was taught the following little trick. You can multiply any two positive numbers with this. I will describe this using the example of 13 ×ばつ 47. You start by writing the first number, 13, and dividing it by 2 (round down each time) until you reach 1:

Now, next to the 13 you write the other number, 47, and keep multiplying it by 2 the same number of times:

Now you cross out all the lines where the number on the left is even. In this case, this is only the 6. (I can’t do strike-through in code, so I’ll just remove it.) Finally, you add all the remaining numbers on the right:

And this is the right answer. 13 ×ばつ 47 = 611.

Now, since you are all computer geeks, you will have realised that what we’re actually doing in the left and right columns is x >> 1 and y << 1, respectively. Furthermore, we add the y only if x & 1 == 1. This translates directly into an algorithm, which I’ll write here in pseudocode:

input x, y
result = 0
while x > 0:
 if x & 1 == 1:
 result = result + y
 x = x >> 1
 y = y << 1
print result

We can re-write the if to use a multiplication, and then we can easily change this so that it works on a byte-by-byte basis instead of bit-by-bit:

input x, y
result = 0
while x > 0:
 result = result + (y * (x & 255))
 x = x >> 8
 y = y << 8
print result

This still contains a multiplication with y, which is arbitrary-size, so we need to change that into a loop too. We’ll do that in Perl.

Now translate everything to Perl:

$x and $y are the inputs in hex format, so they have the least significant byte first.
Thus, instead of x >> 8 I do $x =~ s/.. *//s. I need the space+star because the last byte might not have a space on it (could use space+? too). This automatically puts the removed byte (x & 255) into $&.
y << 8 is simply $y = "00$y".
The result is actually a numerical array, @r. At the end, each element of @r contains one byte of the answer, but halfway through the calculation it may contain more than one byte. I’ll prove to you below that each value is never more than two bytes (16 bits) and that the result is always one byte at the end.

So here is the Perl code unravelled and commented:

# Input x and y
($x, $y) = <>;
# Do the equivalent of $& = x & 255, x = x >> 8
while ($x =~ s/.. *//s)
{
 # Let e = x & 255
 $e = hex $&;
 # For every byte in y... (notice this sets $_ to each byte)
 $i = 0;
 for ($y =~ /.. */gs)
 {
 # Do the multiplication of two single-byte values.
 $s = $r[$i] += $e*hex,
 # Truncate the value in $r[$i] to one byte. The rest of it is still in $s
 $r[$i] &= 255,
 # Move to the next array item and add the carry there.
 $r[++$i] += $s >> 8
 }
 # Do the equivalent of y = y << 8
 $y = "00$y"
}
# Output the result in hex format.
printf '%02x ' x @r, @r

Now for the proof that this always outputs bytes, and that the calculation never generates values greater than two bytes. I’ll prove this by induction over the while loop:

The empty @r at the beginning clearly has no values greater than 0xFF in it (because it has no values in it at all). This concludes the base case.
Now, given that @r contains only single bytes at the beginning of each while iteration:
- The for loop explicitly &=s all values in the result array with 255 except the last one, so we only need to look at that last one.
- We know that we always remove only one byte from $x and $y:
  - Therefore, $e*hex is a multiplication of two single-byte values, which means it’s in the range 0 — 0xFE01.
  - By the inductive hypothesis, $r[$i] is one byte; therefore, $s = $r[$i] += $e*hex is in the range 0 — 0xFF00.
  - Therefore, $s >> 8 is always one byte.
- $y grows an extra 00 in each iteration of the while loop:
  - Therefore, in every iteration of the while loop, the inner for loop runs for one more iteration than it did in the previous while iteration.
  - Therefore, the $r[++$i] += $s >> 8 in the last iteration of the for loop always adds $s >> 8 to 0, and we already established that $s >> 8 is always one byte.
- Therefore, the last value stored in @r at the end of the for loop is also a single byte.

This concludes a wonderful and exciting challenge. Thanks a lot for posting it!

Stack Exchange Network

Long multiply, 8 bits at a time

5 Answers 5

Perl, 137 characters

Caveats

Explanation

OCaml + Batteries, 362 characters

JavaScript (Node.js), 160 bytes

8086 DOS .COM file, 134 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Long multiply, 8 bits at a time

5 Answers 5

Perl, 137 characters

Caveats

Explanation

OCaml + Batteries, 362 characters

JavaScript (Node.js), 160 bytes

8086 DOS .COM file, 134 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions