
Why is reading lines from stdin much slower in C++ than Python?

I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or if I'm misunderstanding something.


(TL;DR answer: add the statement cin.sync_with_stdio(false) before reading any input, or just use fgets instead.

TL;DR results: scroll down to the bottom of my question and look at the table.)
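
A minimal sketch of the first suggestion, assuming input arrives on stdin and nothing else in the program reads stdin through C stdio. The names `unsync_cin`, `count_lines`, and `self_check` are illustrative helpers, not part of the original code.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: turn off synchronization between C++ streams and
// C stdio. Call it once, before the first read from std::cin.
void unsync_cin() {
    std::ios::sync_with_stdio(false);
}

// Count newline-delimited lines on any input stream. getline() returns the
// stream, which tests false at EOF or on error, so each line counts once.
long count_lines(std::istream& in) {
    std::string line;
    long count = 0;
    while (std::getline(in, line)) {
        ++count;
    }
    return count;
}

// Small in-memory check so the sketch can be exercised without a pipe.
void self_check() {
    std::istringstream sample("a\nb\nc\n");
    assert(count_lines(sample) == 3);
}
```

In a real program, `main` would call `unsync_cin()` once and then `count_lines(std::cin)`. Once synchronization is off, mixing `cin` with `scanf` or `fgets` on the same stdin is no longer safe.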


C++ code:

#include <iostream>
#include <string>
#include <time.h>
using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    // getline() returns the stream, which tests false at EOF or on error,
    // so every successfully read line is counted exactly once.
    while (getline(cin, input_line)) {
        line_count++;
    }

    sec = (int) (time(NULL) - start);  // whole seconds only
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}
// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp

Python Equivalent:

#!/usr/bin/env python
import time
import sys

count = 0
start = time.time()
for line in sys.stdin:
    count += 1
delta_sec = int(time.time() - start)
if delta_sec > 0:  # avoid dividing by zero on sub-second runs
    lines_per_sec = int(round(count / delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
                                                          lines_per_sec))

Here are my results:

$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889
$ cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000

I should note that I tried this both under Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro, and the latter is a very beefy server, not that this is too pertinent.

$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP: Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
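
Because time() only has one-second resolution, a run that finishes in about a second gives a very coarse lines-per-second figure. Below is a sketch of a finer-grained measurement, assuming a C++11 compiler with std::chrono. `Stopwatch` and `lines_per_second` are hypothetical names introduced here and were not used to produce the numbers above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: measures wall-clock time with a monotonic clock.
class Stopwatch {
public:
    Stopwatch() : start_(std::chrono::steady_clock::now()) {}

    // Elapsed time in (fractional) seconds since construction.
    std::chrono::duration<double> elapsed() const {
        return std::chrono::steady_clock::now() - start_;
    }

private:
    std::chrono::steady_clock::time_point start_;
};

// Lines per second for a given line count and elapsed time.
// Returns 0.0 when no measurable time has elapsed, avoiding division by zero.
double lines_per_second(long lines, std::chrono::duration<double> elapsed) {
    assert(lines >= 0);
    const double secs = elapsed.count();
    return secs > 0.0 ? static_cast<double>(lines) / secs : 0.0;
}
```

One way to use it: construct a `Stopwatch` just before the read loop, then pass the final line count and `elapsed()` to `lines_per_second` afterwards.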

Tiny benchmark addendum and recap

For completeness, I thought I'd update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M line file on a fast disk. Here's the comparison, with several solutions/approaches:

Implementation              Lines per second
python (default)                   3,571,428
cin (default/naive)                  819,672
cin (no sync)                     12,500,000
fgets                             14,285,714
wc (not fair comparison)          54,644,808
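
For reference, the fgets row could correspond to something like the sketch below. This is an assumption about the approach, not the exact code that was benchmarked. The function name `count_lines_fgets` and the 4096-byte buffer are illustrative choices.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Count lines on a C stream with fgets. A line longer than the buffer arrives
// in several chunks, so only a chunk ending in '\n' completes a line.
// Caveat: input containing NUL bytes would confuse strlen().
long count_lines_fgets(std::FILE* fp) {
    assert(fp != nullptr);
    char buf[4096];
    long count = 0;
    bool mid_line = false;  // true while inside a line with no '\n' seen yet
    while (std::fgets(buf, sizeof buf, fp) != nullptr) {
        const std::size_t len = std::strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            ++count;
            mid_line = false;
        } else {
            mid_line = true;
        }
    }
    if (mid_line) {
        ++count;  // final line had no trailing newline
    }
    return count;
}
```

Calling `count_lines_fgets(stdin)` from `main` reads the piped input. Unlike the `cin` variants, this bypasses iostreams entirely, so the synchronization setting does not matter.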

  • 43
    Wow, that was quite insightful! While I've been aware that cat is unnecessary for feeding input to stdin of programs and that the < shell redirect is preferred, I've generally stuck to cat due to the left-to-right flow of data that the former method preserves visually when I reason about pipelines. Performance differences in such cases I've found to be negligible. But, I do appreciate your educating us, Bela. Commented May 9, 2017 at 1:16
  • 22
    Redirection is parsed out of the shell command line at an early stage, which allows you to do one of these, if it gives a more pleasing appearance of left-to-right flow:
    $ < big_file time my_program
    $ time < big_file my_program
    This should work in any POSIX shell (i.e. not `csh`, and I'm not sure about exotica like `rc` :) Commented May 10, 2017 at 21:55
