As a pet project, I'm writing a Python CSS-selector library that lets you write selector expressions like the one below directly in Python. The goal of the library is to represent selectors in a flat, intuitive and interesting way; all valid syntax defined by the Selectors Level 4 draft must be supported, in one way or another.
# lorem|foo.bar[baz^="qux"]:has(:invalid)::first-line
selector = (
    (Namespace('lorem') | Tag('foo'))
    .bar
    # Can also be written as [Attribute('baz').starts_with('qux')]
    [Attribute('baz', '^=', 'qux')]
    # '>>' is used instead of ' '.
    [:'has', (Selector.SELF >> PseudoClass('invalid'),)]
    [::'first-line']
)
Here's what the hierarchy looks like (/ signifies an alias, () mixin superclasses):
Selector(ABC) # Enum too?
├── PseudoElement
├── ComplexSelector(Sequence[CompoundSelector | Combinator])
├── CompoundSelector(Sequence[SimpleSelector])
├── SimpleSelector
│ ├── TypeSelector / Tag
│ ├── UniversalSelector
│ ├── AttributeSelector / Attribute
│ ├── ClassSelector / Class
│ ├── IDSelector / ID
│ └── PseudoClass
├── SELF / PseudoClass('scope')
└── ALL / UniversalSelector()
Combinator
├── ChildCombinator: '__gt__' / '>'
├── DescendantCombinator: '__rshift__' / '>>'
├── NamespaceSeparator: '__or__' / '|'
├── NextSiblingCombinator: '__add__' / '+'
├── SubsequentSiblingsCombinator: '__sub__' / '-'
└── ColumnCombinator: '__floordiv__' / '//'
This design has some disadvantages:
- The combinators have to be replaced:
  - Descendant combinator ( ) → right shift (>>)
  - Column combinator (||) → floor division (//)
  - Subsequent-siblings combinator (~) → minus/subtract (-)

  >> and // are currently not valid combinators, but may be in the future. The last replacement is much safer, since - is already a valid character in <ident-token>s.
- Functional pseudo-classes need a comma between their name (a string, not a callable) and their arguments (a tuple):
[:'where', (Class('foo'), Class('bar'))]
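To see how the combinator mapping above behaves, here is a minimal sketch with a hypothetical Node class (not part of the proposed library) that only records how Python grouped each operation. It is an illustration under that assumption, but it shows that Python applies its own precedence to these operators rather than CSS's left-to-right reading, and that > is a comparison, so a > b > c is evaluated as the chained comparison (a > b) and (b > c).

```python
from __future__ import annotations


class Node:
    """Hypothetical stand-in for a selector; it only records the grouping."""

    def __init__(self, text: str) -> None:
        self.text = text

    def _combine(self, sep: str, other: Node) -> Node:
        # Parenthesize so the grouping Python chose is visible in the result.
        return Node(f"({self.text}{sep}{other.text})")

    # Each dunder maps one Python operator to one CSS combinator.
    def __gt__(self, other: Node) -> Node:        # '>'  child
        return self._combine(" > ", other)

    def __rshift__(self, other: Node) -> Node:    # '>>' stands in for ' '
        return self._combine(" ", other)

    def __add__(self, other: Node) -> Node:       # '+'  next sibling
        return self._combine(" + ", other)

    def __sub__(self, other: Node) -> Node:       # '-'  stands in for '~'
        return self._combine(" ~ ", other)

    def __floordiv__(self, other: Node) -> Node:  # '//' stands in for '||'
        return self._combine(" || ", other)

    def __bool__(self) -> bool:
        # Chained comparisons call bool() on the intermediate result;
        # raising here makes that visible instead of silently dropping a part.
        raise TypeError("selector nodes have no truth value")


a, b, c = Node("a"), Node("b"), Node("c")

print((a >> b + c).text)   # '+' binds tighter than '>>'
print((a // b + c).text)   # '//' binds tighter than '+'
```

With __bool__ raising, a > b > c fails loudly; with the default truthiness, Python would evaluate both comparisons and silently keep only (b > c).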
These disadvantages should be kept in mind when reworking the design around its current limitations:
- HTML classes with hyphens cannot be added with Python's dotted attribute syntax (.foo-bar); moreover, any class that implements this syntax using __getattr__/__getattribute__ won't be able to have any methods.
- Currently there is no way to add an ID in the middle of a compound selector. Since Python doesn't have a # operator, I'm at a loss. I have thought about overloading __call__, but Tag('foo').bar('baz') or Tag('foo')[Attribute('qux')]('baz') would look too much like a normal method call.
How should I go about working around these limitations?
Because Python has very different syntax and semantics from CSS selectors, I think these problems will only get worse. You'll end up with something that neither looks like CSS nor works the way Python usually does. Therefore I would like to propose a different way of approaching the syntax.
CSS selectors are mostly a linear sequence of simple selectors and combinators. I would suggest exploiting that, and representing something like ns|p a:link as something like Tag('p', namespace='ns') + Descendant() + Tag('a') + PseudoClass('link').
That is, you only use a single magic method to represent concatenation. Everything else is just regular Python objects, using regular Python constructors.
Your example could then be written as:
# lorem|foo.bar[baz^="qux"]:has(:invalid)::first-line
selector = Tag('foo', namespace='lorem') + \
Class('bar') + \
Attribute('baz', '^=', 'qux') + \
PseudoClass('has', Selector.SELF + Descendant() + PseudoClass('invalid')) + \
PseudoElement('first-line')
It may not be exactly what you were looking for, but it has the advantage that it is much easier to learn for Python users because it has far fewer rules and exceptions, and you don't need to worry about new selectors or incompatible syntax.
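A minimal sketch of the single-operator approach, assuming hypothetical Part, Selector, Tag, Class, Descendant and PseudoClass classes (none of these come from an existing library): each part knows its own CSS text, and + only appends to a flat tuple.

```python
from __future__ import annotations

from dataclasses import dataclass


class Part:
    """Base for every hypothetical simple selector or combinator."""

    def css(self) -> str:
        raise NotImplementedError

    def __add__(self, other: Part | Selector) -> Selector:
        return Selector((self,)) + other


@dataclass(frozen=True)
class Selector:
    """A flat sequence of parts; '+' is the only overloaded operator."""

    parts: tuple[Part, ...]

    def __add__(self, other: Part | Selector) -> Selector:
        extra = other.parts if isinstance(other, Selector) else (other,)
        return Selector(self.parts + extra)

    def __str__(self) -> str:
        return "".join(part.css() for part in self.parts)


@dataclass(frozen=True)
class Tag(Part):
    name: str
    namespace: str | None = None

    def css(self) -> str:
        return self.name if self.namespace is None else f"{self.namespace}|{self.name}"


@dataclass(frozen=True)
class Class(Part):
    name: str

    def css(self) -> str:
        return f".{self.name}"


@dataclass(frozen=True)
class Descendant(Part):
    def css(self) -> str:
        return " "


@dataclass(frozen=True)
class PseudoClass(Part):
    name: str

    def css(self) -> str:
        return f":{self.name}"


print(Tag('p', namespace='ns') + Descendant() + Tag('a') + PseudoClass('link'))
```

Since + is the only overloaded operator, precedence never comes into play: the parts are stored in exactly the order they were written.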
You can also use & instead of +, in which case you can represent a selector list with |. For example, p.warning, #bigwarning can become Tag('p') & Class('warning') | ID('bigwarning').
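This works because Python gives & a higher precedence than |, so the expression groups as (Tag('p') & Class('warning')) | ID('bigwarning'). A sketch, again with hypothetical Compound and SelectorList classes and Tag/Class/ID factory functions that only exist for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    parts: tuple[str, ...]  # CSS text of each simple selector, in order

    def __and__(self, other: Compound) -> Compound:
        if not isinstance(other, Compound):
            return NotImplemented
        return Compound(self.parts + other.parts)

    def __or__(self, other: Compound | SelectorList) -> SelectorList:
        return SelectorList((self,)) | other

    def __str__(self) -> str:
        return "".join(self.parts)


@dataclass(frozen=True)
class SelectorList:
    items: tuple[Compound, ...]

    def __or__(self, other: Compound | SelectorList) -> SelectorList:
        extra = other.items if isinstance(other, SelectorList) else (other,)
        return SelectorList(self.items + extra)

    def __str__(self) -> str:
        return ", ".join(str(item) for item in self.items)


# Hypothetical factories mirroring the names used in the answer.
def Tag(name: str) -> Compound:
    return Compound((name,))


def Class(name: str) -> Compound:
    return Compound((f".{name}",))


def ID(name: str) -> Compound:
    return Compound((f"#{name}",))


print(Tag('p') & Class('warning') | ID('bigwarning'))
```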
An alternative idea is to use no magic at all, and represent compound and/or complex selectors using lists or wrapper objects. foo.bar > a might be something like Child([Tag('foo'), Class('bar')], [Tag('a')]) (compound selectors are lists, complex selectors are wrapped by combinators) or [Tag('foo'), Class('bar'), Child(), Tag('a')] (complex selectors are lists containing the selectors and combinators).
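The flat-list variant needs no operator overloading at all; serialization is a plain loop. A sketch assuming hypothetical Tag, Class and Child dataclasses:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str


@dataclass(frozen=True)
class Class:
    name: str


@dataclass(frozen=True)
class Child:
    """Marker for the '>' combinator inside a flat list."""


def to_css(selector: list[Tag | Class | Child]) -> str:
    out: list[str] = []
    for item in selector:
        if isinstance(item, Tag):
            out.append(item.name)
        elif isinstance(item, Class):
            out.append(f".{item.name}")
        elif isinstance(item, Child):
            out.append(" > ")
        else:
            raise TypeError(f"unsupported selector part: {item!r}")
    return "".join(out)


print(to_css([Tag('foo'), Class('bar'), Child(), Tag('a')]))
```

Because a selector is just a list, users can build and edit it with ordinary list operations (append, slicing, concatenation).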
The best option depends on ergonomics, and the ergonomics depend on how users will build and manipulate selectors and for what purpose.
You want to represent CSS concepts using valid Python syntax which "looks like" the CSS source text. The simplest approach would be to stick with straight CSS source text, which we can round-trip through deserializers and serializers.
Representing punctuation-heavy CSS as Python source will be inherently lossy, so you're going to have to store the details somewhere, perhaps in a global dict or in various """docstrings""".
It would be worthwhile to explicitly write down your various goals and tradeoffs. For example, getting IDE navigation / autocompletion "for free" might be one of the things you find attractive about your proposed scheme.
Python notation has been exploited for representing SI units, algebra, vector math, and pathnames. The notation is already a good fit for these domains, in some cases because the language strove to be a good fit. So lossless representation can often be achieved.
There are two mature problem domains that you might wish to take inspiration from.
SQLAlchemy
The SQLAlchemy community uses at least one python DSL, arguably multiple ones, to represent SQL operations.
The impedance match is not perfect. Operator precedence is a bit of a rough edge, with a OR b turning into (a) | (b) when the two terms are complex.
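The rough edge comes from Python's precedence table, where | binds more tightly than ==. A toy Expr class (hypothetical, not SQLAlchemy's real implementation) that overloads both operators shows why each term has to be parenthesized:

```python
from __future__ import annotations


class Expr:
    """Toy SQL expression that only records the text it would render to."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def __eq__(self, other: object) -> Expr:  # type: ignore[override]
        return Expr(f"{self.sql} = {_render(other)}")

    def __or__(self, other: object) -> Expr:
        return Expr(f"({self.sql}) OR ({_render(other)})")

    def __ror__(self, other: object) -> Expr:
        # Reached for e.g. 1 | b, where int cannot handle an Expr operand.
        return Expr(f"{_render(other)} | {self.sql}")

    def __bool__(self) -> bool:
        raise TypeError("the truth value of a SQL expression is undefined")


def _render(value: object) -> str:
    return value.sql if isinstance(value, Expr) else repr(value)


a, b = Expr("a"), Expr("b")

print(((a == 1) | (b == 2)).sql)  # explicit parentheses: the intended OR
print((a == 1 | b).sql)           # '|' grabbed 1 and b first
```

Without the parentheses, a == 1 | b == 2 parses as the chained comparison a == (1 | b) == 2, which needs the truth value of an intermediate Expr; refusing to provide one turns a silent mistake into an error.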
For some operators, such as IN, we resort to .in_() method call notation despite the in keyword seeming to be available.
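There are two obstacles to using the keyword: for column in [1, 2], Python calls the list's __contains__, not the column's, and whatever __contains__ returns is converted to a bool anyway. A hypothetical Column class illustrates both, and why an ordinary method like in_() is the way out:

```python
class Column:
    """Hypothetical column object used only to probe the `in` operator."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __contains__(self, item: object) -> str:
        # Tries to return an expression instead of a bool...
        return f"{item!r} IN {self.name}"

    def in_(self, values: list) -> str:
        # An ordinary method can return whatever it likes.
        return f"{self.name} IN ({', '.join(map(repr, values))})"


col = Column("id")

print(1 in col)          # __contains__'s string was coerced to a bool
print(col in [1, 2])     # list.__contains__ ran; the column was never asked
print(col.in_([1, 2]))   # the method keeps the expression intact
```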
Table or column names can in principle incorporate SPACE and many other characters, especially when "`" or other quoting mechanisms are used. But in practice DBAs will often choose to adhere to a conservative regex such as r"^\w+$".
Your approach might offer enough advantages that web designers would choose to adhere to conservative naming conventions, so that e.g. "a-b" --> a_b --> "a-b" could be safely round-tripped.
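A minimal sketch of that convention, assuming a hypothetical Compound builder whose __getattr__ maps foo_bar to the class foo-bar. It also illustrates the catch mentioned in the question: __getattr__ is only consulted when normal attribute lookup fails, so an existing attribute or method name can never be used as a class name.

```python
from __future__ import annotations


class Compound:
    """Hypothetical builder: attribute access appends a class, '_' -> '-'."""

    def __init__(self, text: str = "") -> None:
        self.text = text

    def __getattr__(self, name: str) -> Compound:
        # Only called when normal lookup fails, so real attributes such as
        # .text shadow any CSS class spelled the same way.
        if name.startswith("__"):
            # Let protocol probes (copy, pickle, ...) fail normally.
            raise AttributeError(name)
        return Compound(f"{self.text}.{name.replace('_', '-')}")


def css_class_to_attr(name: str) -> str:
    """Reverse mapping, valid only under the no-underscores convention."""
    return name.replace("-", "_")


print(Compound("a").foo_bar.text)
print(getattr(Compound("a"), css_class_to_attr("foo-bar")).text)
```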
SQL JOINs are commonly more than a hundred lines long, and a great many production queries have been recast to fit within this DSL.
type annotation
Type hinting continues to be something of a moving target in the python community. An application's source code might be read by an "old" or "new" interpreter, or type checker.
Expressing types in a backward-compatible way for old interpreters or checkers has been a source of tension, often relieved via a string annotation escape valve. Forward references sometimes raise challenges that are resolved in the same way. In recent years we've had less need for this escape valve.
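For example, a method returning an instance of its own class must name that class while the class body is still executing. Quoting the annotation is the escape valve, and typing.get_type_hints resolves the string later (the Selector class here is only an illustration):

```python
from typing import get_type_hints


class Selector:
    def descendant(self, other: "Selector") -> "Selector":
        # "Selector" is quoted: the name is not bound yet while the class
        # body runs, so an eagerly evaluated bare name would raise NameError.
        return other


hints = get_type_hints(Selector.descendant)
print(hints)
```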
We see annotations appearing in the AST, and also in comment text.
The experiences of the type annotation community seem most relevant to your CSS goals.
Your goals are still a bit nebulous at this point. Several developer communities have traveled down this road, showing what works well, or poorly, or would work better after adopting some PEPs. You may be able to draw inspiration, learn from mistakes, and better predict a path to success by looking back at this history and incorporating some elements in your project goals.
I don't see how this answers my question. I need a working hierarchy so that I can use it to represent the parsing results of my (to-be-written) CSS-selector parser. That SQLAlchemy also uses Python operator overloading has nothing to do with my CSS selector library, and type annotation is a matter of implementation, not of API design/library architecture. – InSync, Oct 28, 2023 at 23:20
The OP question needs focus, details, or clarity. I was encouraging you to refine your goals. Otherwise my short answer would be, "a Python parse tree is inadequate to represent the CSS details you care about; you should abandon this hopeless end game." Which I phrased as "stick with straight CSS source text". I was trying to channel your requirements-gathering efforts in a constructive direction, to turn a Quixotic cause into one that stood a chance of delivering value to end users who might choose to adopt it. – J_H, Oct 29, 2023 at 1:38
:'has' instead of ':has' etc.: selector[:'has'] calls __getitem__() with slice(None, 'has', None) (and [::'first-line'] with slice(None, None, 'first-line')).
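This can be checked with a hypothetical Probe class whose __getitem__ returns its key unchanged. It also shows that the functional form with a comma arrives as a single tuple whose first element is the slice:

```python
class Probe:
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key: object) -> object:
        return key


p = Probe()

print(p[:'has'])                        # a slice, not a string
print(p[::'first-line'])                # the string lands in the step slot
print(p[:'where', ('foo', 'bar')])      # a tuple: (slice, arguments)
```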