musl/src/regex/glob.c, branch master musl - an implementation of the standard library for Linux-based systems glob: fix wrong return code when aborting before any matches 2023年08月24日T16:54:51+00:00 Rich Felker dalias@aerifal.cx 2023年08月24日T16:54:51+00:00 79bdacff83a6bd5b70ff5ae5eb8b6de82c2f7c30 when the result count was zero, glob was ignoring a possible GLOB_ABORTED error code and returning GLOB_NOMATCH. whether this happened could be nondeterministic and dependent on the order of dirent enumeration, in cases where multiple matches were present and only some produced errors. caught by Tor's test_util_glob.
when the result count was zero, glob was ignoring a possible
GLOB_ABORTED error code and returning GLOB_NOMATCH. whether this
happened could be nondeterministic and dependent on the order of
dirent enumeration, in cases where multiple matches were present and
only some produced errors.
caught by Tor's test_util_glob.
remove LFS64 symbol aliases; replace with dynamic linker remapping 2022年10月19日T18:01:31+00:00 Rich Felker dalias@aerifal.cx 2022年09月26日T21:14:18+00:00 246f1c811448f37a44b41cd8df8d0ef9736d95f4 originally the namespace-infringing "large file support" interfaces were included as part of glibc-ABI-compat, with the intent that they not be used for linking, since our off_t is and always has been unconditionally 64-bit and since we usually do not aim to support nonstandard interfaces when there is an equivalent standard interface. unfortunately, having the symbols present and available for linking caused configure scripts to detect them and attempt to use them without declarations, producing all the expected ill effects that entails. as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made to prevent this, using macros to redirect the LFS64 names to the standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE. however, this has turned out to be a source of further problems, especially since g++ defines _GNU_SOURCE by default. in particular, the presence of these names as macros breaks a lot of valid code. this commit removes all the LFS64 symbols and replaces them with a mechanism in the dynamic linker symbol lookup failure path to retry with the spurious "64" removed from the symbol name. in the future, if/when the rest of glibc-ABI-compat is moved out of libc, this can be removed.
originally the namespace-infringing "large file support" interfaces
were included as part of glibc-ABI-compat, with the intent that they
not be used for linking, since our off_t is and always has been
unconditionally 64-bit and since we usually do not aim to support
nonstandard interfaces when there is an equivalent standard interface.
unfortunately, having the symbols present and available for linking
caused configure scripts to detect them and attempt to use them
without declarations, producing all the expected ill effects that
entails.
as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made
to prevent this, using macros to redirect the LFS64 names to the
standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE.
however, this has turned out to be a source of further problems,
especially since g++ defines _GNU_SOURCE by default. in particular,
the presence of these names as macros breaks a lot of valid code.
this commit removes all the LFS64 symbols and replaces them with a
mechanism in the dynamic linker symbol lookup failure path to retry
with the spurious "64" removed from the symbol name. in the future,
if/when the rest of glibc-ABI-compat is moved out of libc, this can be
removed.
fix failure of glob to match broken symlinks under some conditions 2019年08月07日T06:38:45+00:00 Rich Felker dalias@aerifal.cx 2019年07月11日T20:55:17+00:00 e408deefeb1a60b6e9e1bb63393590926f96ee64 when the pattern ended with one or more literal path components, or when the GLOB_MARK flag was passed to request that glob flag directory results and the type obtained by readdir was unknown or inconclusive (symlink), the stat function was called to evaluate existence and/or determine type. however, stat fails with ENOENT for broken symlinks, and this caused the match to be omitted from the results. instead, use stat only for the unknown/inconclusive cases with GLOB_MARK, and otherwise, or if stat fails, use lstat existence still needs to be determined. this minimizes the number of costly syscalls, performing both only in the case where GLOB_MARK is in use and there is a final literal path component which is a broken symlink. based on/simplified from patch by James Y Knight.
when the pattern ended with one or more literal path components, or
when the GLOB_MARK flag was passed to request that glob flag directory
results and the type obtained by readdir was unknown or inconclusive
(symlink), the stat function was called to evaluate existence and/or
determine type. however, stat fails with ENOENT for broken symlinks,
and this caused the match to be omitted from the results.
instead, use stat only for the unknown/inconclusive cases with
GLOB_MARK, and otherwise, or if stat fails, use lstat existence still
needs to be determined. this minimizes the number of costly syscalls,
performing both only in the case where GLOB_MARK is in use and there
is a final literal path component which is a broken symlink.
based on/simplified from patch by James Y Knight.
glob: implement GLOB_TILDE and GLOB_TILDE_CHECK 2019年08月06日T18:03:31+00:00 Ismael Luceno ismael@iodev.co.uk 2019年07月25日T21:50:48+00:00 49eacf29d29fa704b2feda879978ae64a825358c
allow escaped path-separator slashes in glob 2018年10月13日T05:28:07+00:00 Rich Felker dalias@aerifal.cx 2018年10月13日T04:55:48+00:00 481006fd8887b80c4f794085179047e28102b01a previously (before and after rewrite), spurious escaping of path separators as \/ was not treated the same as /, but rather got split as an unpaired \ at the end of the fnmatch pattern and an unescaped /, resulting in a mismatch/error. for the case of \/ as part of the maximal literal prefix, remove the explicit rejection of it and move the handling of / below escape processing. for the case of \/ after a proper glob pattern, it's hard to parse the pattern, so don't. instead cheat and count repetitions of \ prior to the already-found / character. if there are an odd number, the last is escaping the /, so back up the split position by one. now the char clobbered by null termination is variable, so save it and restore as needed.
previously (before and after rewrite), spurious escaping of path
separators as \/ was not treated the same as /, but rather got split
as an unpaired \ at the end of the fnmatch pattern and an unescaped /,
resulting in a mismatch/error.
for the case of \/ as part of the maximal literal prefix, remove the
explicit rejection of it and move the handling of / below escape
processing.
for the case of \/ after a proper glob pattern, it's hard to parse the
pattern, so don't. instead cheat and count repetitions of \ prior to
the already-found / character. if there are an odd number, the last is
escaping the /, so back up the split position by one. now the
char clobbered by null termination is variable, so save it and restore
as needed.
rewrite core of the glob implementation for correctness & optimization 2018年10月13日T03:31:27+00:00 Rich Felker dalias@aerifal.cx 2018年10月13日T02:32:41+00:00 d44b07fc904f6a0d31ba025f3e9f423c1e47547e this code has been long overdue for a rewrite, but the immediate cause that necessitated it was total failure to see past unreadable path components. for example, A/B/* would fail to match anything, even though it should succeed, when both A and A/B are searchable but only A/B is readable. this problem both was caught in conformance testing, and impacted users. the old glob implementation insisted on searching the listing of each path component for a match, even if the next component was a literal. it also used considerable stack space, up to length of the pattern, per recursion level, and relied on an artificial bound of the pattern length by PATH_MAX, which was incorrect because a pattern can be much longer than PATH_MAX while having matches shorter (for example, with necessarily long bracket expressions, or with redundancy). in the new implementation, each level of recursion starts by consuming the maximal literal (possibly escaped-literal) path prefix remaining in the pattern, and only opening a directory to read when there is a proper glob pattern in the next path component. it then recurses into each matching entry. the top-level glob function provided automatic storage (up to PATH_MAX) for construction of candidate/result strings, and allocates a duplicate of the pattern that can be modified in-place with temporary null-termination to pass to fnmatch. this allocation is not a big deal since glob already has to perform allocation, and has to link free to clean up if it experiences an allocation failure or other error after some results have already been allocated. care is taken to use the d_type field from iterated dirents when possible; stat is called only when there are literal path components past the last proper-glob component, or when needed to disambiguate symlinks for the purpose of GLOB_MARK. one peculiarity with the new implementation is the manner in which the error handling callback will be called. if attempting to match */B/C/D where a directory A exists that is inaccessible, the error reported will be a stat error for A/B/C/D rather than (previous and wrong implementation) an opendir error for A, or (likely on other implementations) a stat error for A/B. such behavior does not seem to be non-conforming, but if it turns out to be undesirable for any reason, backtracking could be done on error to report the first component producing it. also, redundant slashes are no longer normalized, but preserved as they appear in the pattern; this is probably more correct, and falls out naturally from the algorithm used. since trailing slashes (which force all matches to be directories) are preserved as well, the behavior of GLOB_MARK has been adjusted not to append an additional slash to results that already end in slash.
this code has been long overdue for a rewrite, but the immediate cause
that necessitated it was total failure to see past unreadable path
components. for example, A/B/* would fail to match anything, even
though it should succeed, when both A and A/B are searchable but only
A/B is readable. this problem both was caught in conformance testing,
and impacted users.
the old glob implementation insisted on searching the listing of each
path component for a match, even if the next component was a literal.
it also used considerable stack space, up to length of the pattern,
per recursion level, and relied on an artificial bound of the pattern
length by PATH_MAX, which was incorrect because a pattern can be much
longer than PATH_MAX while having matches shorter (for example, with
necessarily long bracket expressions, or with redundancy).
in the new implementation, each level of recursion starts by consuming
the maximal literal (possibly escaped-literal) path prefix remaining
in the pattern, and only opening a directory to read when there is a
proper glob pattern in the next path component. it then recurses into
each matching entry. the top-level glob function provided automatic
storage (up to PATH_MAX) for construction of candidate/result strings,
and allocates a duplicate of the pattern that can be modified in-place
with temporary null-termination to pass to fnmatch. this allocation is
not a big deal since glob already has to perform allocation, and has
to link free to clean up if it experiences an allocation failure or
other error after some results have already been allocated.
care is taken to use the d_type field from iterated dirents when
possible; stat is called only when there are literal path components
past the last proper-glob component, or when needed to disambiguate
symlinks for the purpose of GLOB_MARK.
one peculiarity with the new implementation is the manner in which the
error handling callback will be called. if attempting to match */B/C/D
where a directory A exists that is inaccessible, the error reported
will be a stat error for A/B/C/D rather than (previous and wrong
implementation) an opendir error for A, or (likely on other
implementations) a stat error for A/B. such behavior does not seem to
be non-conforming, but if it turns out to be undesirable for any
reason, backtracking could be done on error to report the first
component producing it.
also, redundant slashes are no longer normalized, but preserved as
they appear in the pattern; this is probably more correct, and falls
out naturally from the algorithm used. since trailing slashes (which
force all matches to be directories) are preserved as well, the
behavior of GLOB_MARK has been adjusted not to append an additional
slash to results that already end in slash.
fix redundant computations of strlen in glob append function 2018年10月11日T18:27:15+00:00 Rich Felker dalias@aerifal.cx 2018年10月11日T18:27:15+00:00 09a805a62307307230a31125425d0c2b0b6f332e len was already passed as an argument, so don't use strcat, and use memcpy instead of strcpy.
len was already passed as an argument, so don't use strcat, and use
memcpy instead of strcpy.
fix invalid substitute of [1] for flexible array member in glob 2018年10月11日T18:23:14+00:00 Rich Felker dalias@aerifal.cx 2018年10月11日T18:23:14+00:00 e2552581bc004f7dc5ee9ac220cad8abeae19bba
remove spurious inclusion of libc.h for LFS64 ABI aliases 2018年09月12日T18:34:38+00:00 Rich Felker dalias@aerifal.cx 2018年09月12日T04:28:34+00:00 63a4c9adf227a6f6a5f7f70f6dc3f8863f846927 the LFS64 macro was not self-documenting and barely saved any characters. simply use weak_alias directly so that it's clear what's being done, and doesn't depend on a header to provide a strange macro.
the LFS64 macro was not self-documenting and barely saved any
characters. simply use weak_alias directly so that it's clear what's
being done, and doesn't depend on a header to provide a strange macro.
fix regression in glob with literal . or .. path component 2017年10月21日T16:32:16+00:00 Rich Felker dalias@aerifal.cx 2017年10月21日T16:17:49+00:00 ec04d122f1182aeb91f39b0e80ae40c68e4d9605 commit 8c4be3e2209d2a1d3874b8bc2b474668fcbbbac6 was written to preclude the GLOB_PERIOD extension from matching these directory entries, but also precluded literal matches. adjust the check that excludes . and .. to check whether the GLOB_PERIOD flag is in effect, so that it cannot alter behavior in cases governed by the standard, and also don't exclude . or .. in any case where normal glob behavior (fnmatch's FNM_PERIOD flag) would have included one or both of them (patterns such as ".*"). it's still not clear whether this is the preferred behavior for GLOB_PERIOD, but at least it's clear that it can no longer break applications which are not relying on quirks of a nonstandard feature.
commit 8c4be3e2209d2a1d3874b8bc2b474668fcbbbac6 was written to
preclude the GLOB_PERIOD extension from matching these directory
entries, but also precluded literal matches.
adjust the check that excludes . and .. to check whether the
GLOB_PERIOD flag is in effect, so that it cannot alter behavior in
cases governed by the standard, and also don't exclude . or .. in any
case where normal glob behavior (fnmatch's FNM_PERIOD flag) would have
included one or both of them (patterns such as ".*").
it's still not clear whether this is the preferred behavior for
GLOB_PERIOD, but at least it's clear that it can no longer break
applications which are not relying on quirks of a nonstandard feature.

AltStyle によって変換されたページ (->オリジナル) /