Re: io:lines() and \0
- Subject: Re: io:lines() and \0
- From: William Ahern <william@...>
- Date: Mon, 17 Feb 2014 13:05:49 -0800
On Mon, Feb 17, 2014 at 05:16:29PM +0100, René Rebe wrote:
> Hi,
>
> On Feb 17, 2014, at 16:55 , steve donovan wrote:
>
> > On Mon, Feb 17, 2014 at 5:51 PM, René Rebe <rene@exactcode.de> wrote:
> >> I just noticed that io:lines() does not cope with \0 in the lines, and thus
> >> just returns truncated lines (lua-5.2.3, but legacy 5.1 likewise).
> >
> > This is not surprising. The whole idea of 'lines' only really applies
> > to text files, at least in my head ;)
>
> well, in my opinion library foundations should just work, and not silently
> discard some bits and bytes. A line is a line, no matter how many \0 are
> in there until the next \n-newline. And the Lua manual points out Lua
> strings are \0-safe.
>
> I already provided patches a year or two ago for other pattern matching \0
> fixes, which were merged into 5.2.
>
> One quite simple and obvious use of lines with \0 binary data is parsing
> MIME, CGI data.

Well, in MIME a line ends in \r\n. So if you want to be 8-bit clean you
technically shouldn't be treating a line as simply ending in \n, anyhow.
OTOH, in MIME even "8-bit" encoded entities shouldn't have bare \0 or \n
characters. The "binary" transfer encoding allows those. But even in binary
transfer encoding a line is \r\n.

So there's no simple answer, really.

The sockets implementation in my cqueues library has a text-mode translation
feature which translates \r\n sequences to \n, because on Unix (unlike
Windows) this is not done by the underlying stdio implementation. This
allows simple (and in practice mostly correct) implementation of MIME-like
protocols. But of course I had to implement all of the buffering myself
because you simply cannot reliably depend on the underlying implementation
if you want dependable behavior.
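
To illustrate the kind of translation described above, here is a minimal sketch in plain Lua. It is not the cqueues implementation; the name `crlf_translator` is made up for this example. The awkward part is a \r\n pair that straddles two reads, so a trailing \r is held back until the next chunk (or end of input) shows what follows it. Embedded \0 bytes pass through untouched.

```lua
-- Hypothetical helper (not from cqueues): returns a function that rewrites
-- "\r\n" to "\n" across successive chunks. Call it with nil at end of input
-- to flush a held-back "\r".
local function crlf_translator()
  local pending_cr = false -- true if the previous chunk ended in "\r"

  return function(chunk)
    if chunk == nil then
      -- End of input: a lone trailing "\r" was not part of "\r\n".
      local tail = pending_cr and "\r" or ""
      pending_cr = false
      return tail
    end

    if pending_cr then
      chunk = "\r" .. chunk
      pending_cr = false
    end

    -- Hold back a trailing "\r" until we know whether "\n" follows.
    if string.sub(chunk, -1) == "\r" then
      pending_cr = true
      chunk = string.sub(chunk, 1, -2)
    end

    -- The pattern has no magic characters; "\0" in the subject is fine.
    return (string.gsub(chunk, "\r\n", "\n"))
  end
end
```

Feeding it the pieces "a\0b\r" and "\nc\r\n" yields "a\0b" and then "\nc\n", so the split \r\n still collapses to a single \n.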

For example, what's your maximum line length? MIME specifies 998, but in
practice lots of implementations allow much larger limits because of broken
clients (like brain-dead PHP scripts). Lua's internal limit is also probably
too small to be production-quality reliable on the open internet (unless you
want endless support calls), and in any event it's not configurable.

Basically, if you want to be serious about this stuff you have to do your
own buffering.
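
As a rough sketch of what "your own buffering" might look like in plain Lua: the reader below pulls fixed-size chunks with f:read(n), which returns raw bytes including \0, and splits on a configurable terminator with a configurable length limit. The name `line_reader` and the options `eol`, `maxlen` and `chunksize` are invented for this example; the default limit of 998 is just the MIME figure mentioned above.

```lua
-- Hypothetical helper: returns an iterator over lines read from f.
-- f only needs a read(n) method that returns up to n bytes, or nil at EOF.
local function line_reader(f, opts)
  opts = opts or {}
  local eol = opts.eol or "\n"              -- line terminator, e.g. "\r\n"
  local maxlen = opts.maxlen or 998         -- max line length, excluding eol
  local chunksize = opts.chunksize or 4096  -- bytes requested per read
  local buf, eof = "", false

  return function()
    while true do
      -- Plain (non-pattern) search, so "\0" and magic characters are safe.
      local s, e = string.find(buf, eol, 1, true)
      if s then
        if s - 1 > maxlen then error("line too long") end
        local line = string.sub(buf, 1, s - 1)
        buf = string.sub(buf, e + 1)
        return line
      end

      -- No terminator yet. Even if buf ends in a partial terminator,
      -- the line is already over the limit once buf is this large.
      if #buf >= maxlen + #eol then error("line too long") end

      if eof then
        if buf == "" then return nil end
        if #buf > maxlen then error("line too long") end
        local line = buf -- final line without a terminator
        buf = ""
        return line
      end

      local chunk = f:read(chunksize)
      if chunk then buf = buf .. chunk else eof = true end
    end
  end
end
```

Typical use would be `for line in line_reader(io.stdin, { eol = "\r\n", maxlen = 8192 }) do ... end`. Raising an error on an over-long line is only one possible policy; a real server might prefer to truncate or to return an error value instead. The search restarts from the beginning of the buffer on every pass, which is fine for a sketch but wasteful for very long lines.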