Re: io:lines() and \0
- Subject: Re: io:lines() and \0
- From: Philipp Janda <siffiejoe@...>
- Date: 2014-02-21 00:29:37 +0100
Hi!
This seems to be a fun discussion ... :-)
On 20.02.2014 21:03, Dirk Laurie wrote:
> 2014-02-20 21:44 GMT+02:00 René Rebe <rene@exactcode.de>:
>> The discussion is about lines(); that it uses fgets is just an
>> implementation detail.
>> If Roberto had not more or less implied, with his Bible test case, that
>> a performance loss is not acceptable, then an fgetc() loop without all
>> these troubles would have been perfectly fine for me, too.
>> I can certainly give up improving vanilla Lua and convincing some that
>> random data loss is usually considered a bug, and live very happily with the
>> fix that works for me just fine.
>> Have fun parsing MIME, CGI data, or financial program exports using \0
>> field delimiters. Or wherever a zero comes along.
> It is useful to look again at the start of the post where it all started.
>>> I just noticed that io:lines() does not cope with \0 in the lines
> Allow me to summarize the facts.
> 1. io.lines operates on text files.
`io.lines` operates on any file you throw at it. It *opens* the
files as text streams, but that is something you will only find out if
you read the source code. The manual does not specify this (except for
`io.lines()` without arguments, which uses `io.input`). Apparently
`file:lines` does not raise an error when used on a binary file object
either (which is a good thing for anybody using non-ASCII characters) ...
I may be wrong, but isn't there some 16-bit encoding where every other
byte is zero for ASCII characters (UCS-2, UTF-16, or something)?
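To make the zero-byte point concrete, here is a minimal sketch in C of how a line reader built on `fgets` plus `strlen` behaves compared with a byte-by-byte `fgetc` loop. The helper names (`tmp_with_bytes`, `line_length_fgets`, `line_length_fgetc`) are made up for this illustration and are not taken from the Lua sources; the sample input is assumed to be UTF-16LE without a byte order mark.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: put `len` raw bytes into a temporary binary file
   and rewind it so it can be read back. Returns NULL on failure. */
FILE *tmp_with_bytes(const char *data, size_t len)
{
    FILE *f = tmpfile();
    if (f != NULL) {
        fwrite(data, 1, len, f);
        rewind(f);
    }
    return f;
}

/* Read one line with fgets and measure it with strlen, as an
   fgets-based reader would. fgets itself copies zero bytes into buf,
   but strlen stops at the first one, so the rest of the line is lost
   to the caller. `cap` is assumed to fit in an int. */
size_t line_length_fgets(FILE *f, char *buf, size_t cap)
{
    assert(f != NULL && buf != NULL && cap > 0);
    if (fgets(buf, (int)cap, f) == NULL)
        return 0;
    return strlen(buf);
}

/* Read one line byte by byte with fgetc and count the bytes explicitly.
   Zero bytes are stored like any other byte; the count includes the
   terminating '\n' if one was read. */
size_t line_length_fgetc(FILE *f, char *buf, size_t cap)
{
    size_t n = 0;
    int c;
    assert(f != NULL && buf != NULL);
    while (n < cap && (c = fgetc(f)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    return n;
}
```

For the six bytes of "Hi\n" in UTF-16LE (`'H', 0, 'i', 0, '\n', 0`), `line_length_fgets` reports 1 while `line_length_fgetc` reports 5. The bytes are all in the buffer in both cases; only the explicit count keeps track of them.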
> 2. Text files may not contain any control character except whitespace.
That is your definition. The Lua manual does not contain that (or any
other) definition. AFAIK even ISO C does not have a definition. I found
one for POSIX[1], but that one is different from yours.
It is true that Lua cannot do better than the underlying C library, but
ISO C does not forbid the C library from doing better than the lowest
common denominator specified by ISO C for text streams.
I also think that defining text files for Lua won't help much, because
you can only verify that a file is a proper text file by opening it in
binary mode and checking every character. And the alternative (silent
data loss) may be difficult to detect from within a program ...
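The verification described above can be sketched in C as follows. This uses the definition quoted in point 2 (no control characters other than whitespace), which is one possible definition and not something the Lua manual or ISO C prescribes. The function name `first_non_text_byte` is hypothetical, and the classification assumes the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Scan a stream that was opened in binary mode and return the offset of
   the first byte that is a control character but not whitespace, or -1
   if every byte passes. In the "C" locale, bytes >= 0x80 are not control
   characters, so non-ASCII text is accepted. */
long first_non_text_byte(FILE *f)
{
    long offset = 0;
    int c;
    assert(f != NULL);
    while ((c = fgetc(f)) != EOF) {
        if (iscntrl(c) && !isspace(c))
            return offset;
        offset++;
    }
    return -1;
}
```

A file containing `ab\0cd\n` would be rejected at offset 2. Note that this check costs a full pass over the file before any line is read, which is the overhead the paragraph above is pointing at.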
> 3. \0 is not whitespace.
That one we agree on.
> In other words, the behaviour complained of is that a standard library
> routine when given data that does not conform to specification gives
> undefined results.
Currently there is no relevant specification other than the source code
or a collection of mailing list posts.
Regarding performance: If I needed maximum read performance I would bind
`mmap`. I think `lua-apr`[2] contains a file-like binding; does anyone
know of any others? But I suspect that all this performance would be wasted
anyway: text files[*] usually don't get that big (unless you
mis-configure `logrotate`).
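For reference, a rough sketch of what the `mmap` approach looks like at the C level on a POSIX system. This is not how `lua-apr` or any existing binding does it; the function name `count_lines_mmap` is invented for the example, and it only counts newlines to keep things short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Count '\n' bytes in an open stream by mapping its file read-only.
   The whole file is one byte range, so zero bytes are just data.
   Returns -1 on error. Buffered writes must be flushed beforehand. */
long count_lines_mmap(FILE *f)
{
    struct stat st;
    long lines = 0;
    int fd;
    assert(f != NULL);
    fd = fileno(f);
    if (fd < 0 || fstat(fd, &st) != 0)
        return -1;
    if (st.st_size == 0)
        return 0;                      /* mmap rejects a zero length */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    const char *cur = p;
    const char *end = p + st.st_size;
    /* memchr takes an explicit length, so it does not stop at '\0'. */
    while ((cur = memchr(cur, '\n', (size_t)(end - cur))) != NULL) {
        lines++;
        cur++;
    }
    munmap(p, (size_t)st.st_size);
    return lines;
}
```

With the six bytes `a\0b\nc\n` in the file, this returns 2. Whether it is actually faster than buffered stdio for files of a few megabytes is something one would have to measure.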
I don't have the Bible installed, but the largest text file I could find
on my computer is the `ngerman` dictionary at 4.3 MB. My largest
logfile in `/var/log/*` is 1 MB (`kern.log.1`) ...
Philipp
[1]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_397
[2]: http://peterodding.com/code/lua/apr/docs/#shared_memory
[*]: Files primarily intended to be read by humans.