
Given the following Rust program:

fn main() {
    let mut reader = io::stdin();
    for line in reader.lock().lines() {
        match line {
            Ok(l) => print!("{}", l),
            Err(_) => continue,
        }
    }
}

Piping yes into it (yes | program) achieves a throughput of 1.04 MiB/s, according to pv. The trivial C program below, which I fully realize does less, gives me a throughput of 141 MiB/s.

#include <stdio.h>

int main() {
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
}

How do I go about finding out what's keeping the Rust version from being faster? While I would appreciate it if you just told me what to change to make it faster, I'm far more interested in learning how to find out what needs to be changed.

Edit: I tried changing lines() to chars() in the Rust version, to approximate the C version better, but it didn't seem to make any difference.

asked Dec 15, 2014 at 19:11

  • Are you compiling with optimisations? cargo build --release or rustc -O file.rs. Additionally, if you want to find more information about the assembly produced, you can pass --emit asm to rustc. Commented Dec 15, 2014 at 19:53
  • I was not, as a matter of fact. I did try just now, and the difference in throughput is decent, even if nowhere near what I'd expect (1.04 MiB/s to 2.97 MiB/s). I'm not familiar with assembly, so I'm not sure whether inspecting it would be useful. Commented Dec 15, 2014 at 20:15
  • These programs seem fairly different: lines and chars yield decoded Unicode data, and print! goes through type conversion and encoding. The C program blindly copies bytes from the input to the output. Commented Dec 15, 2014 at 20:16
  • This question appears to be off-topic, because it seems to be about how to make something work rather than about code review. Commented Dec 15, 2014 at 21:32
  • The code works just fine, but it's surprisingly slow. I state specifically in my question that I want to learn how to approach problems like these rather than have someone just paste me a faster version. Commented Dec 15, 2014 at 21:44

2 Answers


I'm going to treat this as a code review question where the goal is to figure out how to write high-performance I/O code in Rust.

There are two critical steps to speed up Rust I/O:

  1. Make sure the optimizer is turned on. Rust loops have terrible performance in debug mode. Turning the optimizer on should probably get you near 25 MB/sec or so, at least in my experience.
  2. Avoid line-based I/O, or anything else which uses String values. For maximum performance, we want to work with raw byte buffers and avoid ever letting Rust call malloc or free.

Here's some code which drills down to fill_buf, which is about as low as we can go before losing generality:

use std::io;
use std::io::{IoError, IoErrorKind};

fn main() {
    let mut reader = io::stdin();
    let mut buffer = reader.lock();
    let mut writer = io::stdout();
    loop {
        let consumed = match buffer.fill_buf() {
            Ok(bytes) => { writer.write(bytes).unwrap(); bytes.len() },
            Err(IoError { kind: IoErrorKind::EndOfFile, .. }) => break,
            Err(ref err) => panic!("Failed with: {}", err)
        };
        buffer.consume(consumed);
    }
}

Of course, this means that you need to find the line breaks yourself, and be prepared for lines to be split over multiple buffers. Also, notice how I pass bytes.len() outside of the match block before trying to call buffer.consume(). This is necessary to placate the lifetime checker.

Trying this with pv:

cat /dev/zero | target/release/throughput | pv > /dev/null

...gives us a throughput of 527 MB/s.
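
On a newer Rust, where the I/O API shown above has since changed, the same approach still works. Here is a minimal sketch assuming the post-1.0 std::io, where fill_buf signals end-of-file with an empty slice instead of an error. It also shows the byte-level line-break scanning mentioned earlier (counting newlines without ever building a String), and it keeps the same borrow dance: consume() only runs after the block borrowing the buffer has ended.

use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut reader = stdin.lock();
    let mut writer = stdout.lock();
    let mut newlines = 0u64;

    loop {
        let consumed = {
            let bytes = reader.fill_buf()?;
            if bytes.is_empty() {
                break; // EOF: here fill_buf hands back an empty slice
            }
            // Scan the raw bytes for line breaks; a line split across two
            // buffers is still counted exactly once, when its '\n' arrives.
            newlines += bytes.iter().filter(|&&b| b == b'\n').count() as u64;
            writer.write_all(bytes)?;
            bytes.len()
        };
        reader.consume(consumed);
    }

    eprintln!("{} lines seen", newlines);
    Ok(())
}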

answered Dec 15, 2014 at 21:19
  • What is the issue the lifetime checker has with calling consume() inside the match branch? Commented Dec 15, 2014 at 22:39
  • Try it. :-) Basically, the bytes value returned by fill_buf points to an internal buffer inside buffer, and that prevents anybody from calling &mut self methods on buffer until bytes goes out of scope. So we need to do all our work inside the match block, then escape so that bytes goes away and we can safely access buffer again. There's some tricky Rust magic in these low-level APIs. Commented Dec 15, 2014 at 23:19
  • Wow, and I thought C pointers were hard to grok :D This lifetimes business is going to take some work to comprehend. Commented Dec 16, 2014 at 0:23
  • Well, to be fair, if you were using this API in C, you'd need to be very careful to avoid clobbering the buffer before you were done using it. If I'm going to have to say, "Well, this pointer is still valid here, because I haven't yet called that," I don't mind the compiler checking these things automatically. But yeah, fill_buf is a weird, low-level API with tricky lifetimes. Commented Dec 16, 2014 at 11:37
  • And, as far as I've learned so far, it's the only way to get good throughput in this kind of application. The blind pass-through does make for a complicated filter application, though. For example, while your code is really fast and I think I understand what's going on, I have a hard time working out how to fit my filter into it. Namely, I want to drop anything that's not valid Unicode. Commented Dec 16, 2014 at 18:32

How do I go about finding out what's keeping the Rust version from being faster?

In the general case you would use a profiler like Linux's perf tool to give you a rundown of which functions are eating up all your time. But perf is broken on my computer at the moment and I don't feel like fixing it, so let's try some more Rust-specific advice. :)

The first thing to do is note that print! is a macro (note the exclamation point!), which means that it's expanding at compile-time to do... something. We can figure out what that something is by passing --pretty expanded to the compiler, yielding this:

#![feature(phase)]
#![no_std]
#![feature(globs)]
#[phase(plugin, link)]
extern crate "std" as std;
#[prelude_import]
use std::prelude::*;
fn main() {
    let mut reader = io::stdin();
    for line in reader.lock().lines() {
        match line {
            Ok(l) =>
                match (&l,) {
                    (__arg0,) => {
                        #[inline]
                        #[allow(dead_code)]
                        static __STATIC_FMTSTR: &'static [&'static str] = &[""];
                        ::std::io::stdio::print_args(&::std::fmt::Arguments::new(__STATIC_FMTSTR,
                            &[::std::fmt::argument(::std::fmt::Show::fmt,
                                __arg0)]))
                    }
                },
            Err(_) => continue,
        }
    }
}

Yikes... that's a whole lot of stuff just for printing a line via a type-safe format string. In truth we really don't care about type-safe format strings in this application, given that we want an apples-to-apples comparison to C, so we can get rid of the macro entirely... but what to replace it with? I could give you the step-by-step, but instead I'll point you to the stdio docs and let you poke around at your leisure: http://doc.rust-lang.org/std/io/stdio/

Here's the result of me perusing the docs there:

use std::io;

fn main() {
    let mut reader = io::stdin();
    let mut writer = io::stdout();
    while let Ok(c) = reader.read_u8() {
        writer.write_u8(c);
    }
}

This looks much more directly comparable to your C program.

(If you're wondering what while let foo does: it is simply treated as a loop that breaks as soon as the value no longer matches the pattern foo. In this case, reader.read_u8() returns an IoResult, and will give you the Ok variant when it reads a byte or the Err variant when it hits EOF. See read_u8's documentation here: http://doc.rust-lang.org/std/io/trait.Reader.html#method.read_u8)
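
For a self-contained picture of that, here's a small sketch (using Vec::pop as a stand-in for the reader) showing a while let loop next to the hand-written loop/match it behaves like:

fn main() {
    let mut stack = vec![1, 2, 3];

    // The `while let` form: keep looping as long as the pattern matches.
    while let Some(top) = stack.pop() {
        println!("popped {}", top);
    }

    let mut stack = vec![1, 2, 3];

    // The equivalent spelled out by hand: break as soon as it doesn't.
    loop {
        match stack.pop() {
            Some(top) => println!("popped {}", top),
            _ => break,
        }
    }
}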

I'm sure this program could be optimized further, perhaps by investigating the stdin_raw() and stdout_raw() functions mentioned in the stdio docs. But this should give you a good starting point, and give you the tools to dig deeper yourself.

EDIT: Changed read_char to read_u8 for better analogy with C.
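
On later Rust versions this stdio API was replaced; assuming the post-1.0 std::io (not the API the code above uses), the same byte-for-byte copy can be sketched with io::copy, which moves data through an internal buffer rather than one byte at a time:

use std::io;

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut reader = stdin.lock();
    let mut writer = stdout.lock();

    // io::copy loops over read/write internally with a fixed-size buffer,
    // so there are no per-byte calls and no per-line String allocations.
    io::copy(&mut reader, &mut writer)?;
    Ok(())
}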

answered Dec 15, 2014 at 21:24
  • Brilliant reply! I actually learned a lot. Sadly, the code you arrive at has exactly the same throughput as my code, but I can't seem to figure out why. perf says the top symbol is something from the kernel called _raw_spin_lock, which doesn't mean anything to me. Commented Dec 15, 2014 at 22:43
  • It's a shame stdin_raw is private. Commented Oct 18, 2018 at 6:33
