
Why is reading lines from stdin much slower in C++ than Python?

I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or if I'm misunderstanding something.


(TLDR answer: call cin.sync_with_stdio(false) before reading any input, or just use fgets instead.

TLDR results: scroll all the way down to the bottom of my question and look at the table.)


C++ code:

#include <ctime>
#include <iostream>
#include <string>
using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);

    // getline() returns the stream, which tests false once input is exhausted,
    // so every line (including a last line without a trailing newline) is counted.
    while (getline(cin, input_line))
        line_count++;

    long sec = (long) (time(NULL) - start);  // whole-second resolution only
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        long lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else {
        cerr << endl;
    }
    return 0;
}
// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp

Python Equivalent:

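#!/usr/bin/env python
import sys
import time

count = 0
start_time = time.time()
for line in sys.stdin:
    count += 1
delta_sec = int(time.time() - start_time)
# Guard against dividing by zero when the read takes less than one second.
if delta_sec > 0:
    lines_per_sec = int(round(count / delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
                                                          lines_per_sec))
else:
    print("Read {0} lines in {1} seconds.".format(count, delta_sec))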

Here are my results:

$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889
$ cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000

I should note that I tried this both under Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro, and the latter is a very beefy server, not that this is too pertinent.

$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP: Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
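Both programs time the run in whole seconds, so "1 seconds" or "9 seconds" hides a lot of variation (for the C++ version, a difference of 1 between two time(NULL) values means anywhere between just over 0 and just under 2 seconds elapsed). A finer-grained alternative, swapped in here and not used in the question, is std::chrono::steady_clock. This is a minimal sketch; seconds_elapsed is a hypothetical helper name.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds as a
// double, measured with a monotonic clock instead of whole-second time(NULL).
template <typename F>
double seconds_elapsed(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Usage with the question's variables might look like double sec = seconds_elapsed([&] { while (getline(cin, input_line)) line_count++; }); after which line_count / sec gives lines per second without the rounding to whole seconds.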

Tiny benchmark addendum and recap

For completeness, here are updated read speeds for the same file on the same box, including the original (synced) C++ code. This is for a 100M-line file on a fast disk. Here's the comparison of several solutions/approaches:

Implementation               Lines per second
python (default)                    3,571,428
cin (default/naive)                   819,672
cin (no sync)                      12,500,000
fgets                              14,285,714
wc (not fair comparison)           54,644,808
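The fgets row refers to reading with C stdio into a fixed buffer. Here is a minimal sketch of an fgets-based line counter, not the code behind the measurement above; the function name and the 4096-byte buffer size are arbitrary assumptions. Because a line longer than the buffer arrives in several chunks, a line is counted only when a chunk ends in '\n'.

```cpp
#include <cstdio>
#include <cstring>

// Counts lines read from `f` using fgets and a fixed-size buffer.
// Long lines arrive in several chunks, so only chunks ending in '\n' finish
// a line; a final line without a trailing newline is counted once at EOF.
long count_lines_fgets(std::FILE* f) {
    char buf[4096];  // arbitrary size; longer lines are split across chunks
    long count = 0;
    bool partial = false;  // true if the last chunk did not end a line
    while (std::fgets(buf, sizeof buf, f) != nullptr) {
        std::size_t len = std::strlen(buf);  // linear scan for the chunk length
        if (len > 0 && buf[len - 1] == '\n') {
            ++count;
            partial = false;
        } else {
            partial = true;
        }
    }
    if (partial)
        ++count;  // last line had no trailing newline
    return count;
}
```

For stdin, the call would be count_lines_fgets(stdin).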

    fgets has the issue of needing to pre-allocate the buffer. And if a line from the file is longer than the buffer, the only way to tell is to either linearly scan to find the length of the data read (fgets returns a pointer to the buffer, not something useful like the number of characters read), or ensure that every fgets is preceded by explicitly NUL-ing out that final character so you can check if it's been replaced and if it was replaced with a newline or something else. It may be faster, but giving up modern C++ features for uglier, less convenient, less secure C APIs is unpleasant. Commented Jun 14, 2024 at 16:30
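The sentinel approach mentioned in that comment can be sketched as follows (a hypothetical helper, assuming a buffer of at least 2 bytes). Before each call, the last slot fgets could fill with a data character (index size - 2) is set to '\0'. Afterwards, that slot is still '\0' if the line was shorter, '\n' if the line exactly filled the buffer, and some other character if the line was cut off (or if the file ended without a newline at exactly that length).

```cpp
#include <cstdio>

// Hypothetical wrapper around fgets that reports, in constant time, whether
// the line may have been truncated. Assumes size >= 2.
// Returns false at EOF or on a read error, like fgets returning NULL.
bool fgets_checked(char* buf, int size, std::FILE* f, bool* truncated) {
    buf[size - 2] = '\0';  // sentinel: fgets overwrites it only if it reads that far
    if (std::fgets(buf, size, f) == nullptr)
        return false;
    char last = buf[size - 2];
    // '\0': line (plus terminator) was shorter; '\n': line exactly fit.
    *truncated = (last != '\0' && last != '\n');
    return true;
}
```

When *truncated is true, the caller can keep calling fgets_checked to fetch the rest of the line; at the last line of a file that lacks a newline, the next call simply returns false.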

