I need to process 100 gigabytes of logs in a weird format, then do some analysis on the results.
The initial parsing and CLI parts are done; I tried them on 1 GB of test data and it took around a minute. I ran a sanity check that just copied *standard-input* to *standard-output*, and it showed that most of the time is spent in the reading part. Python did the same thing in a couple of seconds, if that.
To generate sample data:
yes "$(printf 'A%.0s' {1..10})" | head -c 1G > sample.txt
Common Lisp code:
#!/usr/bin/env -S sbcl --script

(defun main ()
  (loop for line = (read-line *standard-input* nil)
        while line
        do (write-string line)
           (write-char #\Newline)))

(eval-when (:execute)
  (main))
Python code:
#!/usr/bin/env python3

import sys

def main():
    for line in sys.stdin:
        sys.stdout.write(line)

if __name__ == "__main__":
    main()
To run:
time cat data/sample.txt | ./test.lisp > result_lisp.txt
# real 1m59.719s
# user 0m47.231s
# sys 1m13.008s
time cat data/sample.txt | ./test.py > result_python.txt
# real 0m9.557s
# user 0m7.688s
# sys 0m2.144s
SBCL version: 2.2.9.debian
Python version: 3.13.3
Is there a workaround or fix for this? So far CL has never let me down on the performance side; even for heavy number crunching it has usually been faster than Python.
I decided to profile a snippet similar to the original; this one literally does nothing but call read-line.
#!/usr/bin/env -S sbcl --script

(require :sb-sprof)

(defun main ()
  (loop for line = (read-line *standard-input* nil)
        while line))

(eval-when (:execute)
  (sb-sprof:with-profiling (:max-samples 100000
                            :sample-interval 0.00001
                            :report :graph)
    (main)))
It seems most of the time is spent on UTF-8 conversion.
           Self         Total        Cumul
  Nr  Count     %  Count     %  Count     %  Calls  Function
------------------------------------------------------------------------
   1  12930  49.2  13191  50.2  12930  49.2      -  SB-IMPL::INPUT-CHAR/UTF-8
   2   9724  37.0  22523  85.7  22654  86.2      -  (LAMBDA (&REST REST) :IN SB-IMPL::GET-EXTERNAL-FORMAT)
   3   2517   9.6  25834  98.3  25171  95.8      -  READ-LINE
   4    146   0.6    146   0.6  25317  96.3      -  foreign function pthread_sigmask
   5     26   0.1  26257  99.9  25343  96.4      -  MAIN
   6     26   0.1     26   0.1  25369  96.5      -  RESTORE-YMM
   7      6   0.0    262   1.0  25375  96.5      -  SB-IMPL::REFILL-INPUT-BUFFER
   8      5   0.0    214   0.8  25380  96.6      -  (FLET "WITHOUT-INTERRUPTS-BODY-2" :IN SB-IMPL::REFILL-INPUT-BUFFER)
   9      5   0.0      5   0.0  25385  96.6      -  SAVE-YMM
  10      4   0.0    617   2.3  25389  96.6      -  ALLOC-TRAMP
Getting rid of *standard-input* (which is really bad for shell-like scripts) improves speed a lot, and the profile shows different UTF-related functions being called.
(require :sb-sprof)

(defun main (str)
  (loop for line = (read-line str nil)
        while line))

(eval-when (:execute)
  (sb-sprof:with-profiling (:max-samples 100000
                            :sample-interval 0.00001
                            :report :graph)
    (with-open-file (str "sample.txt")
      (main str))))
           Self         Total        Cumul
  Nr  Count     %  Count     %  Count     %  Calls  Function
------------------------------------------------------------------------
   1   2133  38.7   2395  43.4   2133  38.7      -  SB-IMPL::FD-STREAM-READ-N-CHARACTERS/UTF-8
   2    763  13.8   5212  94.5   2896  52.5      -  SB-IMPL::ANSI-STREAM-READ-LINE-FROM-FRC-BUFFER
   3    725  13.1    725  13.1   3621  65.6      -  SB-KERNEL:UB32-BASH-COPY
   4    435   7.9   1882  34.1   4056  73.5      -  (LABELS SB-IMPL::BUILD-RESULT :IN SB-IMPL::ANSI-STREAM-READ-LINE-FROM-FRC-BUFFER)
   5    261   4.7    261   4.7   4317  78.2      -  READ-LINE
   6    119   2.2    119   2.2   4436  80.4      -  foreign function pthread_sigmask
   7     33   0.6   5516 100.0   4469  81.0      -  MAIN
   8     16   0.3   2413  43.7   4485  81.3      -  SB-INT:FAST-READ-CHAR-REFILL
   9     15   0.3     15   0.3   4500  81.6      -  RESTORE-YMM
  10     11   0.2     11   0.2   4511  81.8      -  SAVE-YMM
3 Answers
This is not really a good solution, but it is a slightly hacky workaround which will work on Linux or macOS, though probably not on Windows.
The problem is that, for some reason, SBCL's handling of the standard input and output streams is pretty slow, seemingly because they are bivalent or unbuffered by default.
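You can check what kind of stream you actually have; the types shown below are what I would expect on SBCL, but they may vary with the version and how the image was started:
(type-of *standard-input*)
;; => SYNONYM-STREAM, typically
;; Assuming it is the usual synonym stream, resolve it:
(type-of (symbol-value (synonym-stream-symbol *standard-input*)))
;; => SB-SYS:FD-STREAM, typically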
A trick then is to get hold of a stream which points at stdin but which is better. On most recent unixoid systems, /dev/stdin is a file which refers to stdin.
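The count-lines function used below is not shown (my real version depends on things loaded from Quicklisp, as noted at the end), but a minimal equivalent would be something like this:
(defun count-lines (stream)
  ;; Count READ-LINE results until end of file.
  (loop for line = (read-line stream nil)
        while line
        count t))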
So, given a function count-lines which counts the lines in a stream, a simple-minded SBCL file which does
(defun naively-count-stdin-lines ()
  (format t "~&~D lines~%"
          (count-lines *standard-input*)))

(naively-count-stdin-lines)
is really slow:
$ time sbcl --noinform --load naive.lisp < f.txt
10000000 lines
sbcl --noinform --load naive.lisp < f.txt 10.87s user 0.18s system 99% cpu 11.138 total
But we can instead write this:
(defun workaround-count-stdin-lines ()
  (with-open-file (in "/dev/stdin")
    (format t "~&~D lines~%"
            (count-lines in))))

(workaround-count-stdin-lines)
and now
$ time sbcl --noinform --load workaround.lisp < f.txt
10000000 lines
sbcl --noinform --load workaround.lisp < f.txt 0.98s user 0.19s system 90% cpu 1.294 total
This is faster by a factor of more than 10.
Note: I'm not using the --script option only because my count-lines function depends on things loaded from Quicklisp, so I need SBCL to read my init files. SBCL takes about 0.18s just to start, load init files, load a bunch of prerequisites, and compile count-lines: the performance difference for reading files is therefore even larger.
Note also that count-lines uses read-line: whatever this problem is, it is not that read-line itself is slow; rather, some kinds of stream in SBCL are radically slower than others when reading characters.
It seems that read-line is very inefficient for reasons that I cannot explain.
I can, however, offer a variation on your code which improves performance significantly.
#!/usr/bin/env -S sbcl --script

(defun main (bufsz)
  (loop with buffer = (make-array bufsz :element-type 'character)
        for nchars = (read-sequence buffer *standard-input*)
        while (> nchars 0)
        do (write-string (subseq buffer 0 nchars))))

(eval-when (:execute)
  (main 8192)) ;; 8k chunk
On my platform this runs in ~15s and the Python version runs in ~8s.
macOS 26.0.1 (M2), Python 3.14.0, SBCL 2.5.9.
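A further small saving is probably available here: write-string accepts :start and :end arguments, so the per-chunk subseq allocation can be dropped entirely (an untimed sketch):
(defun main (bufsz)
  (loop with buffer = (make-array bufsz :element-type 'character)
        for nchars = (read-sequence buffer *standard-input*)
        while (> nchars 0)
        ;; Write directly from the buffer instead of copying a subsequence.
        do (write-string buffer *standard-output* :end nchars)))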
1 Comment
read-line returns a fresh string each time it is called, and this has at least two consequences. First, every call allocates a string, which brings the overhead of allocation and potential garbage collection. Second, strings in SBCL are Unicode strings by default, and the *standard-input* and *standard-output* streams use UTF-8 by default. So using read-line brings all of the overhead of string creation, memory management, and Unicode processing.
From the discussion that has resulted in this Q&A, it seems that the real bottleneck has to do with character I/O and bivalent streams such as *standard-input* and *standard-output*.
For reference, here is the timing for the OP's read-line/write-string program processing the 1 GB test file on my laptop:
$ time cat dummy_1G.txt | ./op-echo > out_1G.txt
real 3m49.827s
user 1m28.260s
sys 2m21.909s
There are surely ways to improve on all of the solutions found below.
A Simple Solution
After trying some byte-buffer solutions, it occurred to me that the main bottleneck is in writing characters to *standard-output*; reading characters from *standard-input* seems less troublesome.
Here is a program that uses read-line to read from *standard-input*, but uses the SBCL extension string-to-octets together with write-sequence so that bytes may be written to *standard-output* instead of characters.
#!/usr/bin/env -S sbcl --script
;;; simple-echo

(declaim (optimize (speed 3)))

(defun main ()
  (loop for line = (read-line *standard-input* nil)
        while line
        ;; Process `line`, then append a newline and write the
        ;; encoded octets rather than characters.
        do (write-sequence
            (sb-ext:string-to-octets
             (concatenate 'string line (string #\Newline)))
            *standard-output*)))

(eval-when (:execute) (main))
This seems to be about the simplest thing you could do to get a significant performance boost; it shows an improvement of nearly 6x over the OP program:
$ time cat dummy_1G.txt | ./simple-echo > out_1G.txt
real 0m40.311s
user 0m38.671s
sys 0m2.167s
Another Simple (But Not Portable) Solution
The idea from @ignis volens of using /dev/stdin can be extended to /dev/stdout and combined with writing bytes instead of characters to output.
#!/usr/bin/env -S sbcl --script

(declaim (optimize (speed 3)))

(defun main ()
  (with-open-file (in "/dev/stdin")
    (with-open-file (out "/dev/stdout"
                         :direction :output
                         :if-exists :append
                         :element-type '(unsigned-byte 8))
      (loop for line = (read-line in nil)
            while line
            ;; Process `line`, then write it back out as octets.
            do (write-sequence
                (sb-ext:string-to-octets
                 (concatenate 'string line (string #\Newline)))
                out)))))

(eval-when (:execute) (main))
This is the fastest solution I have tested, but it relies on the standard Linux I/O file handles, may or may not work on other Unix-like platforms, and will certainly fail on Windows systems. But for the right user this solution provides a 10x speedup over the original OP code.
$ time cat dummy_1G.txt | ./linux-echo > out_1G.txt
real 0m24.187s
user 0m22.407s
sys 0m2.233s
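If you would rather degrade gracefully than fail outright on platforms without these device files, a guard along these lines should work (a sketch; it falls back to the slower bivalent stream when /dev/stdin cannot be opened):
(defun call-with-fast-stdin (fn)
  ;; Try the /dev/stdin trick; fall back to *STANDARD-INPUT* if the
  ;; device file is missing or cannot be opened.
  (let ((in (ignore-errors (open "/dev/stdin"))))
    (if in
        (unwind-protect (funcall fn in)
          (close in))
        (funcall fn *standard-input*))))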
A Portable Solution
This solution is a little bit faster than the first simple solution, but not as fast as the non-portable solution using /dev/stdin and /dev/stdout. But this solution will work on any platform running SBCL.
In an earlier version of this answer I showed a solution which read the input in blocks of bytes and searched those bytes for newline bytes. This was reasonably fast but fragile: it required platform-specific handling of newlines, and it required that lines could sensibly be interpreted byte-wise (meaning that input containing Unicode characters would be problematic).
Here is an updated version of that idea which is not so problematic, does not rely on platform-specific treatment of newlines, and can be used with unicode input.
Here the input is read from *standard-input* into a byte buffer for speed. Then the byte buffer is used to create a character buffer using the SBCL extension octets-to-string.
The character buffer is traversed line-by-line as needed when get-line is called. When the character buffer is exhausted, the byte buffer is refilled and a new character buffer is provided.
#!/usr/bin/env -S sbcl --script
;;;; process-lines

;; Buffer for raw I/O bytes.
(defparameter *buffer-size* 4096)
(defparameter *byte-buffer* (make-array *buffer-size*
                                        :element-type '(unsigned-byte 8)))

;; Buffer for characters converted from the byte buffer.
(defparameter *char-buffer* (make-array 0 :element-type 'character))
(defparameter *end-char* 0)   ; One past the last buffered character.
(defparameter *start-line* 0) ; Start of next line in character buffer.

(declaim (optimize (speed 3))
         (type (simple-array (unsigned-byte 8) (*)) *byte-buffer*)
         (type (simple-array character (*)) *char-buffer*)
         (type fixnum *buffer-size* *end-char* *start-line*))

(defun get-line ()
  (when (zerop *start-line*) ; Attempt to fill byte buffer when empty.
    (let ((end-byte (read-sequence *byte-buffer* *standard-input*)))
      (if (zerop end-byte)
          (return-from get-line nil) ; Return NIL when input is exhausted.
          (setf *char-buffer* (sb-ext:octets-to-string *byte-buffer*
                                                       :end end-byte)
                *end-char* (length *char-buffer*)))))
  (if (<= *start-line* *end-char*) ; Otherwise: end of input reached.
      (let ((end-line (position #\Newline *char-buffer* :start *start-line*)))
        (if end-line
            (let ((start *start-line*))
              (declare (type fixnum start))
              (setf *start-line* (+ end-line 1))
              (subseq *char-buffer* start (+ end-line 1)))
            ;; Line end not in byte buffer.
            (let ((partial-line (subseq *char-buffer* *start-line* *end-char*)))
              (declare (type (simple-array character) partial-line))
              (setf *start-line* 0)
              (let ((finish-line (get-line)))
                (if finish-line
                    (concatenate 'string partial-line finish-line)
                    partial-line)))))
      nil))

(defun main ()
  (do ((line (get-line) (get-line)))
      ((null line))
    ;; Do something with `line`, e.g.:
    ;; (setf (aref line 0) #\B)
    ;; (write-string line *standard-output*) ; slow
    (write-sequence (sb-ext:string-to-octets line) *standard-output*)))

(eval-when (:execute) (main))
This solution allows input to be read quickly as bytes from *standard-input*, but the result returned from get-line is a string, so normal string operations can be used for line processing. The processed line is then converted back to bytes before writing to *standard-output*. The result is much faster than the original OP read-line/write-string program:
$ time cat dummy_1G.txt | ./process-lines > out_1G.txt
real 0m33.768s
user 0m32.459s
sys 0m2.210s
Note that using (write-sequence (sb-ext:string-to-octets line) *standard-output*) here instead of (write-string line *standard-output*) resulted in a 7x speedup of this program. But this program is also 7x faster than the OP program, which leads me to believe that the main performance bottleneck is in the writing of output, not the reading of input.
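One cheap way to test that hypothesis is to keep get-line unchanged but replace the body of main with a counter, so that only the read and decode path runs (an untimed sketch):
(defun main-read-only ()
  ;; Drive GET-LINE as before, but produce no output, so only the
  ;; reading and decoding cost is measured.
  (let ((n 0))
    (declare (type fixnum n))
    (do ((line (get-line) (get-line)))
        ((null line))
      (incf n))
    (format *error-output* "~&~D lines~%" n)))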
Processing Bytes in Bulk
One way to reduce the overhead is to treat I/O as bytes instead of characters. This avoids the poor character I/O performance of current versions of SBCL. Reading the bytes into a suitably-sized buffer also helps with performance.
Here is a simple script that echoes *standard-input* to *standard-output*:
#!/usr/bin/env -S sbcl --script
;;;; my-echo

(defparameter *buffer-size* 4096)

(defun main ()
  (declare (optimize (speed 3)))
  (loop with buffer = (make-array *buffer-size*
                                  :element-type '(unsigned-byte 8))
        for pos = (read-sequence buffer *standard-input*)
        do (write-sequence buffer *standard-output* :start 0 :end pos)
        until (< pos *buffer-size*)))

(eval-when (:execute) (main))
This program uses read-sequence and write-sequence to process bytes, and on my (old and slowish) laptop the result is almost 30 times faster than the same program using an array of character elements. Here is a timing result for the 1 GB file described by OP:
$ time cat dummy_1G.txt | ./my-echo > out_1G.txt
real 0m1.270s
user 0m0.238s
sys 0m1.367s
For sheer byte-copying speed, this program is 180x faster than the OP program.
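For reference, the character-element variant mentioned above would look something like this (a sketch; only the buffer element type and the I/O calls change):
(defun main ()
  (declare (optimize (speed 3)))
  (loop with buffer = (make-array *buffer-size*
                                  :element-type 'character)
        for pos = (read-sequence buffer *standard-input*)
        do (write-string buffer *standard-output* :start 0 :end pos)
        until (< pos *buffer-size*)))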
7 Comments
Both programs use read-line and decode UTF-8, yet the performance differs by a factor of 10. Reading as octets 'fixes' the problem but means your program is now made of bugs, since you can't even find line endings reliably until the octets are decoded into characters. The answer seems to be that some streams in SBCL are much, much slower than others at decoding UTF-8, at least. That's a deficiency in SBCL I'd report if I knew where they actually listen to bug reports.
(declare (type fixnum a b)) is enough to arrange for efficient compiled code. I wonder if the trouble here is UTF-8 parsing? Could we maybe read in N binary bytes, or read in a Latin-1 or binary record terminated by a newline? // Does the corresponding Racket (Scheme) program perform similarly poorly?

read-line allocates a string every time it reads a new line; this probably has a detrimental impact on performance. You might be able to work around this by using read-sequence to read blocks of data into an array, then searching through the array for newline characters to locate the lines.
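For what it's worth, the Latin-1 idea from the first comment above is easy to try on SBCL by combining it with the /dev/stdin workaround (a sketch; only valid when the input really is a single-byte encoding):
(with-open-file (in "/dev/stdin" :external-format :latin-1)
  ;; No UTF-8 decoding happens on this stream; each byte is one character.
  (loop for line = (read-line in nil)
        while line
        count t))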