Fizz Buzz is a common challenge given during interviews. The challenge goes something like this:
Write a program that prints the numbers from 1 to n. If a number is
divisible by 3, write Fizz instead. If a number is divisible by 5,
write Buzz instead. However, if the number is divisible by both 3 and
5, write FizzBuzz instead.
The goal of this question is to write a FizzBuzz implementation that goes from 1 to infinity (or some arbitrary very very large number), and that implementation should do it as fast as possible.
Checking throughput
Write your fizz buzz program. Run it. Pipe the output through <your_program> | pv > /dev/null. The higher the throughput, the better you did.
Example
A naive implementation written in C gets you about 170MiB/s on an average machine:
#include <stdio.h>
int main() {
for (int i = 1; i < 1000000000; i++) {
if ((i % 3 == 0) && (i % 5 == 0)) {
printf("FizzBuzz\n");
} else if (i % 3 == 0) {
printf("Fizz\n");
} else if (i % 5 == 0) {
printf("Buzz\n");
} else {
printf("%d\n", i);
}
}
}
There is a lot of room for improvement here. In fact, I've seen an implementation that can get more than 3GiB/s on the same machine.
I want to see what clever ideas the community can come up with to push the throughput to its limits.
Rules
- All languages are allowed.
- The program output must be exactly valid fizzbuzz. No playing tricks such as writing null bytes in between the valid output - null bytes that don't show up in the console but do count towards
pv throughput.
Here's an example of valid output:
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
# ... and so on
Valid output must be simple ASCII, single-byte per character, new lines are a single \n and not \r\n. The numbers and strings must be correct as per the FizzBuzz requirements. The output must go on forever (or a very high astronomical number, at-least 2^58) and not halt / change prematurely.
Parallel implementations are allowed (and encouraged).
Architecture specific optimizations / assembly is also allowed. This is not a real contest - I just want to see how people push fizz buzz to its limit - even if it only works in special circumstances/platforms.
Scores
Scores are from running on my desktop with an AMD 5950x CPU (16C / 32T). I have 64GB of 3200Mhz RAM. CPU mitigations are disabled.
**By far the best score so far is in C++ by @David Frank - generating FizzBuzz at around 1.7 Terrabit/s
At the second place - @ais523 - generating FizzBuzz at 61GiB/s with x86 assembly.
Results for java:
| Author |
Throughput |
| ioan2 |
2.6 GiB/s |
| randy |
0.6 GiB/s |
| randy_ioan |
3.3 GiB/s |
| ioan |
4.6 GiB/s |
| olivier |
5.8 GiB/s |
Results for python:
| Author |
Throughput |
| bconstanzo |
0.1 GiB/s |
| michal |
0.1 GiB/s |
| ksousa_chunking |
0.1 GiB/s |
| ksousa_initial |
0.0 GiB/s |
| arrmansa |
0.2 GiB/s |
| antoine |
0.5 GiB/s |
Results for pypy:
| Author |
Throughput |
| bconstanzo_pypy |
0.3 GiB/s |
Results for rust:
| Author |
Throughput |
| aiden |
3.4 GiB/s |
| xavier |
0.9 GiB/s |
Results for ruby:
| Author |
Throughput |
| lonelyelk |
0.1 GiB/s |
| dgutov |
1.7 GiB/s |
Results for asm:
| Author |
Throughput |
| ais523 |
60.8 GiB/s |
| paolo |
39.0 GiB/s |
Results for julia:
| Author |
Throughput |
| marc |
0.7 GiB/s |
| tkluck |
15.5 GiB/s |
Results for c:
| Author |
Throughput |
| random832 |
0.5 GiB/s |
| neil |
8.4 GiB/s |
| kamila |
8.4 GiB/s |
| xiver77 |
20.9 GiB/s |
| isaacg |
5.7 GiB/s |
Results for cpp:
| Author |
Throughput |
| jdt |
4.8 GiB/s |
| tobycpp |
5.4 GiB/s |
| david |
208.3 GiB/s |
Results for numba:
| Author |
Throughput |
| arrmansa |
0.1 GiB/s |
Results for numpy:
| Author |
Throughput |
| arrmansa |
0.3 GiB/s |
| arrmansa-multiprocessing |
0.7 GiB/s |
| arrmansa-multiprocessing-2 |
0.7 GiB/s |
Results for go:
| Author |
Throughput |
| Bysmyyr |
3.7 GiB/s |
| psaikko |
6.8 GiB/s |
Results for php:
| Author |
Throughput |
| no_gravity |
0.5 GiB/s |
Results for elixir:
| Author |
Throughput |
| technusm1 |
0.3 GiB/s |
Results for csharp:
| Author |
Throughput |
| neon-sunset |
1.2 GiB/s |
asm submissions
java submissions
pypy submissions
python submissions
ruby submissions
rust submissions
c submissions
cpp submissions
julia submissions
numba submissions
numpy submissions
go submissions
php submissions
csharp submissions
Plots generated using https://github.com/omertuc/fizzgolf
memcpy(p,"Fizz\n",6); p+=5be faster thansprintf? \$\endgroup\$memcpy(p,"Fizz\n",5);to avoid pointlessly writing the null character each time?memcpy/sprintftuning makes no measurable difference, as that's only the setup, outside the main loop.sprint()makes for more maintainable code. (I'm new to programming for all-out speed; in my day job that comes in third, after robustness and maintainability) \$\endgroup\$atomic_int shuffleinstead of the reduction. (time passes...) Yes, and I've updated to code that actually works faster in parallel, at last! \$\endgroup\$