Re: Patterns
- Subject: Re: Patterns
- From: Mouse <mouse@...>
- Date: 14 Dec 2022 16:31:34 -0500 (EST)
> Suppose the topic filter "/sport/#" and these [...]
> /sport
> /sport/racing
> /sport/racing/champion
> /sporting
> Only the first three are allowed to match, so "/sport.*" is not an
> option as that would match the fourth one as well. On the other hand
> "/sport/.*" would not match the first one.
Could this maybe be a use case for importing a full-powered regex
package and then matching "^/sport(/|$)" or some such?
Or perhaps matching against /sport and /sport/.* both?
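To make the two suggestions concrete, here is a minimal sketch in Python (chosen only for illustration; the thread does not use it). The helper names matches_anchored and matches_either are made up for this example, and the patterns cover only the single filter "/sport/#", not topic filters in general.

```python
import re

# Hypothetical helpers for the one filter "/sport/#" discussed above.
# Note: in Python, "$" also matches just before a trailing newline;
# use r"\Z" instead if that distinction matters.
ANCHORED = re.compile(r"^/sport(/|$)")

def matches_anchored(topic):
    """Single anchored regex: '/sport' followed by '/' or end of string."""
    return ANCHORED.search(topic) is not None

def matches_either(topic):
    """Two simpler tests: exactly '/sport', or anything under '/sport/'."""
    return topic == "/sport" or re.fullmatch(r"/sport/.*", topic) is not None

topics = ["/sport", "/sport/racing", "/sport/racing/champion", "/sporting"]

for t in topics:
    print(t, matches_anchored(t), matches_either(t))
```

Both helpers accept the first three names and reject "/sporting".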
As for other comments, well, it's not clear to me whether you're just
coding to a spec you have nothing else to do with or you're involved in
creating the spec you cite. But it appears to be confusing characters
with Unicode codepoints:
 When it performs subscription matching the Server MUST NOT perform any
 normalization of Topic Names or Topic Filters, or any modification or
 substitution of unrecognized characters [MQTT-4.7.3-4]. Each
 non-wildcarded level in the Topic Filter has to match the
 corresponding level in the Topic Name character for character for the
 match to succeed.
It may be just sloppy language, or it may be disambiguated elsewhere (I
didn't read all 8000+ lines), but "character" in a Unicode environment
can be an ambiguous term, at least sometimes including things that look
like single characters to a user but formed from multiple codepoints
using combining codepoints. For example, A-grave can be represented by
U+0041 followed by combining U+0300, or by U+00C0 by itself. Forbidding
normalization leads to counterintuitive things, such as a pattern
containing A-grave (potentially) matching some-to-none of the Topic
Names containing A-grave.
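A small sketch of the effect, again in Python purely for illustration. The topic strings are invented for the example, and NFC is just one of the normalization forms one could pick; the quoted spec text forbids the server from applying any of them.

```python
import unicodedata

# Two spellings of A-grave; both usually render identically.
precomposed = "\u00C0"   # LATIN CAPITAL LETTER A WITH GRAVE
decomposed = "A\u0300"   # "A" followed by COMBINING GRAVE ACCENT

# Hypothetical topic names that look the same on screen.
topic_a = "/menu/" + precomposed
topic_b = "/menu/" + decomposed

def same_codepoints(a, b):
    """Codepoint-for-codepoint comparison, with no normalization."""
    return a == b

def same_after_nfc(a, b):
    """Comparison after normalizing both sides to NFC."""
    nfc = lambda s: unicodedata.normalize("NFC", s)
    return nfc(a) == nfc(b)

print(same_codepoints(topic_a, topic_b))  # the two spellings differ
print(same_after_nfc(topic_a, topic_b))   # NFC composes A + U+0300 to U+00C0
```

Without normalization the two names compare unequal, so a filter typed one way misses names typed the other way.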
/~\ The ASCII				 Mouse
\ / Ribbon Campaign
 X Against HTML		mouse@rodents-montreal.org
/ \ Email!	 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B