musl/src/regex/regcomp.c, branch master
musl - an implementation of the standard library for Linux-based systems

regex: fix newline matching with negated brackets
2017-03-21T16:24:23+00:00 Julien Ramseier j.ramseier@gmail.com 2017-03-21T16:24:23+00:00 9571c5314a8064eda8a56faa2ae2aeced34497a3
With REG_NEWLINE, POSIX says:
"A <newline> in string shall not be matched by a period outside
a bracket expression or by any form of a non-matching list"
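The rule can be probed through the standard `regcomp()`/`regexec()` interface. This is a minimal sketch, not musl code: the helpers `re_match` and `newline_excluded` are hypothetical names introduced for illustration. On a libc that follows the POSIX text quoted above, both a period and the non-matching list `[^b]` should refuse the newline in `"a\nc"` when `REG_NEWLINE` is set, and both should accept it when the flag is absent.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a musl API): returns 1 if `subject` contains a
   match for `pattern` compiled with `cflags`, 0 if regexec() reports no
   match (or fails), and -1 if regcomp() rejects the pattern. REG_NOSUB is
   added because only the match/no-match answer is needed. */
static int re_match(const char *pattern, int cflags, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical probe: returns 1 if both a period and the non-matching
   list [^b] fail to match the newline in "a\nc" under `cflags`. */
static int newline_excluded(int cflags)
{
    return re_match("a.c", cflags, "a\nc") == 0 &&
           re_match("a[^b]c", cflags, "a\nc") == 0;
}
```

Calling `newline_excluded(REG_NEWLINE)` is expected to return 1 and `newline_excluded(0)` to return 0; an ordinary character such as `x` still matches `[^b]` either way.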

handle ^ and $ in BRE subexpression start and end as anchors
2016-12-17T04:26:29+00:00 Szabolcs Nagy nsz@port70.net 2016-11-24T00:44:49+00:00 7a4c25d78030b3a43ed5c8dd1a456f73cb990f44
In BRE, ^ is an anchor at the beginning of an expression; at the
beginning of a subexpression it may optionally be an anchor, and
elsewhere it must be treated as a literal.
Previously musl treated ^ in subexpressions as literal, but at least
glibc and gnu sed treat it as an anchor, and that is the more useful
behaviour: it can always be escaped to get back the literal meaning.
Same for $ at the end of a subexpression.
Portable BRE should not rely on this, but there are sed commands in
build scripts which do.
This changes the meaning of the BREs:
	\(^a\)
	\(a\|^b\)
	\(a$\)
	\(a$\|b\)
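The change can be observed with a few probes against `regcomp()`/`regexec()`. This is a sketch assuming a libc with the anchor behaviour described above (glibc, or musl from this commit on); `re_match` and the probe functions are hypothetical names. Each subject string is chosen so that the anchor reading and the literal reading give opposite answers: `\(^a\)` matches `"ab"` only if `^` is an anchor, and matches `"x^a"` only if `^` is a literal.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a musl API): 1 on a match, 0 on no match,
   -1 if the pattern does not compile. */
static int re_match(const char *pattern, int cflags, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Returns 1 if ^ acts as an anchor right after \( and right after \|
   in a BRE (cflags 0). */
static int caret_anchors_subexp(void)
{
    return re_match("\\(^a\\)", 0, "ab") == 1 &&      /* anchor reading */
           re_match("\\(^a\\)", 0, "x^a") == 0 &&     /* literal would match */
           re_match("\\(a\\|^b\\)", 0, "bc") == 1 &&
           re_match("\\(a\\|^b\\)", 0, "x^b") == 0;
}

/* Returns 1 if $ acts as an anchor right before \) in a BRE. */
static int dollar_anchors_subexp(void)
{
    return re_match("\\(a$\\)", 0, "ba") == 1 &&      /* 'a' at the end */
           re_match("\\(a$\\)", 0, "a$b") == 0;       /* literal would match */
}
```

On an implementation that still reads these characters literally, both probes return 0. Escaping them (`\(\^a\)`, `\(a\$\)`) restores the literal meaning either way.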

fix the use of uninitialized value in regcomp
2016-05-22T21:52:19+00:00 Szabolcs Nagy nsz@port70.net 2016-05-21T13:21:38+00:00 51eeb6ebc94d965768143c45e9f39b0a7998bdbd
the num_submatches field of some ast nodes was not initialized in
tre_add_tag_{left,right}, but was accessed later.
this was a benign bug: the uninitialized values never affected the
result (they are created during tre_add_tags and copied around during
tre_expand_ast, where they are also used in computations, but nothing
in the final tnfa depends on them).

fix ^* at the start of a complete BRE
2016-03-02T05:47:22+00:00 Szabolcs Nagy nsz@port70.net 2016-02-29T16:36:25+00:00 29b13575376509bb21539711f30c1deaf0823033
This is a workaround to treat * as literal * at the start of a BRE.
Ideally ^ would be treated as an anchor at the start of any BRE
subexpression and similarly $ would be an anchor at the end of any
subexpression. This is not required by the standard and hard to do
with the current code, but it is existing practice. If it is changed,
* should be treated as a literal after such an anchor as well.

fix * at the start of a BRE subexpression
2016-03-02T05:47:19+00:00 Szabolcs Nagy nsz@port70.net 2016-02-29T15:04:46+00:00 39ea71fb8afddda879a1999f2a203dfdaf57a639
commit 7eaa76fc2e7993582989d3838b1ac32dd8abac09 made * invalid at
the start of a BRE subexpression, but according to the standard it
should be accepted there as a literal *.
This patch does not fix subexpressions starting with ^*.
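Both literal-asterisk cases covered by these two commits (`*` right after `\(`, and `*` right after the `^` that starts a complete BRE) can be checked the same way. This is a sketch built on the POSIX API; `re_match` and `leading_star_is_literal` are hypothetical names. A compile failure (`-1` from the helper) would correspond to the earlier behaviour where `*` was rejected after `\(`. The `\(^*` form is deliberately not probed, since the message says it is not fixed here.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a musl API): 1 on a match, 0 on no match,
   -1 if the pattern does not compile. */
static int re_match(const char *pattern, int cflags, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* In a BRE, '*' with nothing before it to repeat (right after "\(" or
   right after a leading '^') is an ordinary character, so these patterns
   only match subjects that contain a real asterisk. */
static int leading_star_is_literal(void)
{
    return re_match("\\(*a\\)", 0, "x*a") == 1 &&
           re_match("\\(*a\\)", 0, "xa") == 0 &&
           re_match("^*ab", 0, "*ab") == 1 &&
           re_match("^*ab", 0, "ab") == 0;
}
```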

regex: increase the stack tre uses for tnfa creation
2016-01-31T22:33:54+00:00 Szabolcs Nagy nsz@port70.net 2016-01-31T15:46:46+00:00 2810b30fc3c515e38d6acabe87de7b48bb8bfc7b
the 10k element stack is increased to 1000k elements, otherwise tnfa
creation fails for reasonably sized patterns: a single literal char can
add 7 elements to this stack, so regcomp of a 1500 char long pattern
(with only literal chars) fails with REG_ESPACE. (the new limit allows
patterns of somewhat less than 150k chars; this arbitrary limit covers
most command line regex usage.)
ideally there would be no upper bound: regcomp dynamically reallocates
this buffer, every reallocation checks for allocation failure and at
the end this stack is freed, so there is no reason for a special bound.
however that may have an unwanted effect on regcomp and regexec runtime,
so this is a conservative change.
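The numbers follow from the per-character cost given above: at up to 7 stack elements per literal character, the old limit is exhausted after about 10000 / 7 ≈ 1428 characters (hence the failure at 1500), while the new one allows about 1000000 / 7 ≈ 142857. A sketch of the failing case, using only the POSIX API, is below; `long_literal_status` is a hypothetical name, and it simply builds an n-character literal pattern and reports the `regcomp()` result.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compiles a pattern of `n` literal 'a' characters,
   then matches it against an identical subject. Returns the regcomp()
   result (0 on success, e.g. REG_ESPACE if the compiler ran out of room)
   or -1 if the pattern buffer could not be allocated. On success,
   *matched is set to 1 if regexec() found a match. */
static int long_literal_status(size_t n, int *matched)
{
    *matched = 0;
    char *pat = malloc(n + 1);
    if (pat == NULL)
        return -1;
    memset(pat, 'a', n);
    pat[n] = '\0';

    regex_t re;
    int rc = regcomp(&re, pat, REG_NOSUB);
    if (rc == 0) {
        /* The pattern is literal text, so it must match itself. */
        *matched = regexec(&re, pat, 0, NULL, 0) == 0;
        regfree(&re);
    }
    free(pat);
    return rc;
}
```

With the 10k stack, `long_literal_status(1500, &m)` would be expected to return `REG_ESPACE`; with the larger limit it should return 0 and set `m` to 1.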

regex: simplify the {,} repetition parsing logic
2016-01-31T01:53:52+00:00 Szabolcs Nagy nsz@port70.net 2015-04-18T17:53:38+00:00 831e9d9efa61566a25c1dcdbd28f55daeea4dd32

regex: treat \+, \? as repetitions in BRE
2016-01-31T01:53:42+00:00 Szabolcs Nagy nsz@port70.net 2015-04-18T17:28:49+00:00 25160f1c08235cf5b6a9617c5640380618a0f6ff
These escape sequences are undefined by the standard, but they are
often used in sed scripts.
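In practice `\+` means "one or more" and `\?` means "zero or one", as in GNU sed. The sketch below is illustrative; `re_match` and `bre_plus_question_work` are hypothetical names, and the results assume a libc that accepts these extensions (glibc does; musl does from this commit on). Portable code can instead write `ab+c`/`ab?c` with `REG_EXTENDED`, or `abb*c` and `ab\{0,1\}c` as plain BREs.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a musl API): 1 on a match, 0 on no match,
   -1 if the pattern does not compile. */
static int re_match(const char *pattern, int cflags, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Returns 1 if \+ and \? behave as repetition operators in a BRE. */
static int bre_plus_question_work(void)
{
    return re_match("ab\\+c", 0, "abbc") == 1 &&  /* one or more b */
           re_match("ab\\+c", 0, "ac") == 0 &&    /* at least one b needed */
           re_match("ab\\?c", 0, "ac") == 1 &&    /* b is optional */
           re_match("ab\\?c", 0, "abbc") == 0;    /* but at most one */
}
```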

regex: rewrite the repetition parsing code
2016-01-31T01:53:29+00:00 Szabolcs Nagy nsz@port70.net 2015-04-18T17:25:31+00:00 03498ec22a4804ddbd8203d9ac94b6f7b6574b3c
The goto logic was hard to follow and modify. This is
in preparation for the BRE \+ and \? support.

regex: treat \| in BRE as alternation
2016-01-31T01:53:17+00:00 Szabolcs Nagy nsz@port70.net 2015-04-18T16:47:17+00:00 da4cc13b9705e7d3a02216959b9711b3b30828c1
The standard does not define semantics for \| in BRE, but some code
depends on it meaning alternation. Empty alternative expressions are
allowed, for consistency with ERE.
Based on a patch by Rob Landley.
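A short check of the BRE alternation behaviour, contrasted with ERE. This is a sketch assuming a libc that gives `\|` this meaning (glibc does; musl does from this commit on); `re_match` and `bre_alternation_works` are hypothetical names. In a BRE an unescaped `|` stays an ordinary character, while with `REG_EXTENDED` the unescaped `|` is the standard alternation operator, which is the portable choice.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a musl API): 1 on a match, 0 on no match,
   -1 if the pattern does not compile. */
static int re_match(const char *pattern, int cflags, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Returns 1 if \| alternates in a BRE, a bare | is literal in a BRE,
   and a bare | alternates in an ERE. */
static int bre_alternation_works(void)
{
    return re_match("cat\\|dog", 0, "hotdog") == 1 &&
           re_match("cat\\|dog", 0, "cow") == 0 &&
           re_match("cat|dog", 0, "cat|dog") == 1 &&   /* literal '|' */
           re_match("cat|dog", 0, "hotdog") == 0 &&
           re_match("cat|dog", REG_EXTENDED, "hotdog") == 1;
}
```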

AltStyle によって変換されたページ (->オリジナル) /