
Given the following Rust program:

fn main() {
    let mut reader = io::stdin();
    for line in reader.lock().lines() {
        match line {
            Ok(l) => print!("{}", l),
            Err(_) => continue,
        }
    }
}

Piping yes into it (yes | program) achieves a throughput of 1.04 MiB/s, according to pv. The trivial C program below, which I fully realize does less, gives me a throughput of 141 MiB/s.

#include <stdio.h>

int main() {
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
}

How do I go about finding out what's keeping the Rust version from being faster? While I would appreciate it if you just told me what to change to make it faster, I'm far more interested in learning how to find out what needs to be changed.

Edit: I tried changing lines() to chars() in the Rust version, to approximate the C version better, but it didn't seem to make any difference.

asked Dec 15, 2014 at 19:11

  • Are you compiling with optimisations? cargo build --release or rustc -O file.rs. Additionally, if you want to find more information about the assembly produced, you can pass --emit asm to rustc. Commented Dec 15, 2014 at 19:53
  • I was not, as a matter of fact. I did try just now, and the difference in throughput is decent, even if nowhere near what I'd expect (1.04 MiB/s to 2.97 MiB/s). I'm not familiar with assembly, so I'm not sure whether inspecting it would be useful. Commented Dec 15, 2014 at 20:15
  • These programs seem fairly different: lines and chars yield decoded Unicode data, and print! goes through type conversion and encoding. The C program blindly copies bytes from the input to the output. Commented Dec 15, 2014 at 20:16
  • This question appears to be off-topic, because it seems to be about how to make something work rather than about code review. Commented Dec 15, 2014 at 21:32
  • The code works just fine, but it's surprisingly slow. I state specifically in my question that I want to learn how to approach problems like these rather than have someone just paste me a faster version. Commented Dec 15, 2014 at 21:44

2 Answers


I'm going to treat this as a code review question where the goal is to figure out how to write high-performance I/O code in Rust.

There are two critical steps to speed up Rust I/O:

  1. Make sure the optimizer is turned on. Rust loops have terrible performance in debug mode. Turning the optimizer on should probably get you near 25 MB/sec or so, at least in my experience.
  2. Avoid line-based I/O, or anything else which uses String values. For maximum performance, we want to work with raw byte buffers and avoid ever letting Rust call malloc or free.

Here's some code which drills down to fill_buf, which is about as low as we can go before losing generality:

use std::io;
use std::io::{IoError, IoErrorKind};

fn main() {
    let mut reader = io::stdin();
    let mut buffer = reader.lock();
    let mut writer = io::stdout();
    loop {
        let consumed = match buffer.fill_buf() {
            Ok(bytes) => { writer.write(bytes).unwrap(); bytes.len() },
            Err(IoError { kind: IoErrorKind::EndOfFile, .. }) => break,
            Err(ref err) => panic!("Failed with: {}", err)
        };
        buffer.consume(consumed);
    }
}

Of course, this means that you need to find the line breaks yourself, and be prepared for lines to be split over multiple buffers. Also, notice how I pass bytes.len() outside of the match block before trying to call buffer.consume(). This is necessary to placate the lifetime checker.

Trying this with pv:

cat /dev/zero | target/release/throughput | pv > /dev/null

...gives us a throughput of 527 MB/s.
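
On a newer Rust, where the I/O API shown above has since changed, the same approach still works. Here is a minimal sketch assuming the post-1.0 std::io, where fill_buf signals end-of-file with an empty slice instead of an error. It also shows the byte-level line-break scanning mentioned earlier (counting newlines without ever building a String), and it keeps the same borrow dance: consume() only runs after the block borrowing the buffer has ended.

use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut reader = stdin.lock();
    let mut writer = stdout.lock();
    let mut newlines = 0u64;

    loop {
        let consumed = {
            let bytes = reader.fill_buf()?;
            if bytes.is_empty() {
                break; // EOF: here fill_buf hands back an empty slice
            }
            // Scan the raw bytes for line breaks; a line split across two
            // buffers is still counted exactly once, when its '\n' arrives.
            newlines += bytes.iter().filter(|&&b| b == b'\n').count() as u64;
            writer.write_all(bytes)?;
            bytes.len()
        };
        reader.consume(consumed);
    }

    eprintln!("{} lines seen", newlines);
    Ok(())
}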

answered Dec 15, 2014 at 21:19
  • What is the issue the lifetime checker has with calling consume() inside the match branch? Commented Dec 15, 2014 at 22:39
  • Try it. :-) Basically, the bytes value returned by fill_buf points to an internal buffer inside buffer, and that prevents anybody from calling &mut self methods on buffer until bytes goes out of scope. So we need to do all our work inside the match block, then escape so that bytes goes away and we can safely access buffer again. There's some tricky Rust magic in these low-level APIs. Commented Dec 15, 2014 at 23:19
  • Wow, and I thought C pointers were hard to grok :D This lifetimes business is going to take some work to comprehend. Commented Dec 16, 2014 at 0:23
  • Well, to be fair, if you were using this API in C, you'd need to be very careful to avoid clobbering the buffer before you were done using it. If I'm going to have to say, "Well, this pointer is still valid here, because I haven't yet called that," I don't mind the compiler checking these things automatically. But yeah, fill_buf is a weird, low-level API with tricky lifetimes. Commented Dec 16, 2014 at 11:37
  • And, as far as I've learned so far, it's the only way to get good throughput in this kind of application. The blind pass-through does make for a complicated filter application, though. For example, while your code is really fast and I think I understand what's going on, I have a hard time working out how to fit my filter into it. Namely, I want to drop anything that's not valid Unicode. Commented Dec 16, 2014 at 18:32

How do I go about finding out what's keeping the Rust version from being faster?

In the general case you would use a profiler like Linux's perf tool to give you a rundown of which functions are eating up all your time. But perf is broken on my computer at the moment and I don't feel like fixing it, so let's try some more Rust-specific advice. :)

The first thing to do is note that print! is a macro (note the exclamation point!), which means that it's expanding at compile-time to do... something. We can figure out what that something is by passing --pretty expanded to the compiler, yielding this:

#![feature(phase)]
#![no_std]
#![feature(globs)]
#[phase(plugin, link)]
extern crate "std" as std;
#[prelude_import]
use std::prelude::*;
fn main() {
    let mut reader = io::stdin();
    for line in reader.lock().lines() {
        match line {
            Ok(l) =>
                match (&l,) {
                    (__arg0,) => {
                        #[inline]
                        #[allow(dead_code)]
                        static __STATIC_FMTSTR: &'static [&'static str] = &[""];
                        ::std::io::stdio::print_args(&::std::fmt::Arguments::new(__STATIC_FMTSTR,
                            &[::std::fmt::argument(::std::fmt::Show::fmt,
                                __arg0)]))
                    }
                },
            Err(_) => continue,
        }
    }
}

Yikes... that's a whole lot of stuff just for printing a line via a type-safe format string. In truth we really don't care about type-safe format strings in this application, given that we want an apples-to-apples comparison to C, so we can get rid of the macro entirely... but what to replace it with? I could give you the step-by-step, but instead I'll point you to the stdio docs and let you poke around at your leisure: http://doc.rust-lang.org/std/io/stdio/

Here's the result of me perusing the docs there:

use std::io;

fn main() {
    let mut reader = io::stdin();
    let mut writer = io::stdout();
    while let Ok(c) = reader.read_u8() {
        writer.write_u8(c);
    }
}

This looks much more directly comparable to your C program.

(If you're wondering what while let foo does: it is simply treated as a loop that breaks as soon as the value no longer matches the pattern foo. In this case, reader.read_u8() returns an IoResult, and will give you the Ok variant when it reads a byte or the Err variant when it hits EOF. See read_u8's documentation here: http://doc.rust-lang.org/std/io/trait.Reader.html#method.read_u8)
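
For a self-contained picture of that, here's a small sketch (using Vec::pop as a stand-in for the reader) showing a while let loop next to the hand-written loop/match it behaves like:

fn main() {
    let mut stack = vec![1, 2, 3];

    // The `while let` form: keep looping as long as the pattern matches.
    while let Some(top) = stack.pop() {
        println!("popped {}", top);
    }

    let mut stack = vec![1, 2, 3];

    // The equivalent spelled out by hand: break as soon as it doesn't.
    loop {
        match stack.pop() {
            Some(top) => println!("popped {}", top),
            _ => break,
        }
    }
}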

I'm sure this program could be optimized further, perhaps by investigating the stdin_raw() and stdout_raw() functions mentioned in the stdio docs. But this should give you a good starting point, and give you the tools to dig deeper yourself.

EDIT: Changed read_char to read_u8 for better analogy with C.
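
On later Rust versions this stdio API was replaced; assuming the post-1.0 std::io (not the API the code above uses), the same byte-for-byte copy can be sketched with io::copy, which moves data through an internal buffer rather than one byte at a time:

use std::io;

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut reader = stdin.lock();
    let mut writer = stdout.lock();

    // io::copy loops over read/write internally with a fixed-size buffer,
    // so there are no per-byte calls and no per-line String allocations.
    io::copy(&mut reader, &mut writer)?;
    Ok(())
}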

answered Dec 15, 2014 at 21:24
  • Brilliant reply! I actually learned a lot. Sadly, the code you arrive at has exactly the same throughput as my code, but I can't seem to figure out why. perf says the top symbol is something from the kernel called _raw_spin_lock, which doesn't mean anything to me. Commented Dec 15, 2014 at 22:43
  • It's a shame stdin_raw is private. Commented Oct 18, 2018 at 6:33
