Given the following Rust program:
    use std::io;

    fn main() {
        let mut reader = io::stdin();
        for line in reader.lock().lines() {
            match line {
                Ok(l) => print!("{}", l),
                Err(_) => continue,
            }
        }
    }
Running `yes | program` achieves 1.04 MiB/s of throughput, according to `pv`. The trivial C program below, which I fully realize does less, gives me a throughput of 141 MiB/s.
    #include <stdio.h>

    int main() {
        int c;
        while ((c = getchar()) != EOF)
            putchar(c);
    }
How do I go about finding out what's keeping the Rust version from being faster? While I would appreciate if you just told me what to change to make it faster, I'm far more interested in how to find out what needs to be changed.
Edit: I tried changing `lines()` to `chars()` in the Rust version, to approximate the C version better, but it didn't seem to make any difference.
2 Answers
I'm going to treat this as a code review question where the goal is to figure out how to write high-performance I/O code in Rust.
There are two critical steps to speed up Rust I/O:
- Make sure the optimizer is turned on. Rust loops have terrible performance in debug mode. Turning the optimizer on should get you to roughly 25 MB/s, at least in my experience.
- Avoid line-based I/O, or anything else which uses `String` values. For maximum performance, we want to work with raw byte buffers and avoid ever letting Rust call `malloc` or `free`.
Here's some code which drills down to `fill_buf`, which is about as low as we can go before losing generality:
    use std::io;
    use std::io::{IoError, IoErrorKind};

    fn main() {
        let mut reader = io::stdin();
        let mut buffer = reader.lock();
        let mut writer = io::stdout();
        loop {
            let consumed = match buffer.fill_buf() {
                Ok(bytes) => { writer.write(bytes).unwrap(); bytes.len() },
                Err(IoError { kind: IoErrorKind::EndOfFile, .. }) => break,
                Err(ref err) => panic!("Failed with: {}", err)
            };
            buffer.consume(consumed);
        }
    }
Of course, this means that you need to find the line breaks yourself, and be prepared for lines to be split across multiple buffers. Also, notice how I pass `bytes.len()` outside of the `match` block before trying to call `buffer.consume()`. This is necessary to placate the lifetime checker.
Trying this with `pv`:

    cat /dev/zero | target/release/throughput | pv > /dev/null

...gives us a throughput of 527 MB/s.
- What is the issue the lifetime checker has with calling `consume()` inside the `match` branch? – mkaito, Dec 15, 2014 at 22:39
- Try it. :-) Basically, the `bytes` value returned by `fill_buf` points to an internal buffer inside `buffer`, and that prevents anybody from calling `&mut self` methods on `buffer` until `bytes` goes out of scope. So we need to do all our work inside the `match` block, then escape so that `bytes` goes away and we can safely access `buffer` again. There's some tricky Rust magic in these low-level APIs. – emk, Dec 15, 2014 at 23:19
- Wow, and I thought C pointers were hard to grok :D This lifetimes business is going to take some work to comprehend. – mkaito, Dec 16, 2014 at 0:23
- Well, to be fair, if you were using this API in C, you'd need to be very careful to avoid clobbering the buffer before you were done using it. If I'm going to have to say, "Well, this pointer is still valid here, because I haven't yet called that," I don't mind the compiler checking these things automatically. But yeah, `fill_buf` is a weird, low-level API with tricky lifetimes. – emk, Dec 16, 2014 at 11:37
- And as far as I've learned so far, it's the only way to get good throughput in this kind of application. The blind pass-through makes for a complicated filter application, though: your code is really fast and I think I understand what's going on, but I have a hard time working out how to apply my filter to it. Namely, I want to drop anything that's not valid Unicode. – mkaito, Dec 16, 2014 at 18:32
> How do I go about finding out what's keeping the Rust version from being faster?

In the general case you would use a profiler like Linux's `perf` tool to give you a rundown of which functions are eating up all your time. But `perf` is broken on my computer at the moment and I don't feel like fixing it, so let's try some more Rust-specific advice. :)
The first thing to do is note that `print!` is a macro (note the exclamation point!), which means that it expands at compile time to do... something. We can figure out what that something is by passing `--pretty expanded` to the compiler, yielding this:
    #![feature(phase)]
    #![no_std]
    #![feature(globs)]
    #[phase(plugin, link)]
    extern crate "std" as std;
    #[prelude_import]
    use std::prelude::*;
    fn main() {
        let mut reader = io::stdin();
        for line in reader.lock().lines() {
            match line {
                Ok(l) =>
                    match (&l,) {
                        (__arg0,) => {
                            #[inline]
                            #[allow(dead_code)]
                            static __STATIC_FMTSTR: &'static [&'static str] = &[""];
                            ::std::io::stdio::print_args(&::std::fmt::Arguments::new(__STATIC_FMTSTR,
                                &[::std::fmt::argument(::std::fmt::Show::fmt, __arg0)]))
                        }
                    },
                Err(_) => continue,
            }
        }
    }
Yikes... that's a whole lot of stuff just for printing a line via a type-safe format string. In truth we really don't care about type-safe format strings in this application, given that we want an apples-to-apples comparison to C, so we can get rid of the macro entirely... but what to replace it with? I could give you the step-by-step, but instead I'll point you to the stdio docs and let you poke around at your leisure: http://doc.rust-lang.org/std/io/stdio/
Here's the result of me perusing the docs there:
    use std::io;

    fn main() {
        let mut reader = io::stdin();
        let mut writer = io::stdout();
        while let Ok(c) = reader.read_u8() {
            writer.write_u8(c).unwrap();
        }
    }
This looks much more directly comparable to your C program.
(If you're wondering what `while let Ok(c)` does: it behaves as a loop that breaks as soon as the pattern fails to match. In this case, `reader.read_u8()` returns an `IoResult`, which gives you the `Ok` variant when it reads a byte and the `Err` variant when it hits EOF. See `read_u8`'s documentation here: http://doc.rust-lang.org/std/io/trait.Reader.html#method.read_u8)
I'm sure this program could be optimized further, perhaps by investigating the `stdin_raw()` and `stdout_raw()` functions mentioned in the stdio docs. But this should give you a good starting point, and give you the tools to dig deeper yourself.
EDIT: Changed `read_char` to `read_u8` for a better analogy with C.
- Brilliant reply! I actually learned a lot. Sadly, the code you arrive at has exactly the same throughput as my code, but I can't seem to figure out why. `perf` says the top symbol is something from the kernel called `_raw_spin_lock`, which doesn't mean anything to me. – mkaito, Dec 15, 2014 at 22:43
- It's a shame `stdin_raw` is private. – hayd, Oct 18, 2018 at 6:33
- Use `cargo build --release` or `rustc -O file.rs`. Additionally, if you want more information about the assembly produced, you can pass `--emit asm` to rustc.