-
Notifications
You must be signed in to change notification settings - Fork 54
SPEC 12: Formatting mathematical expressions #326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @mdhaber @stefanv @jarrodmillman @j-bowhay and add yourself as co-authors of course 😉
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the definition of "group" above, it is not possible for operators within a group to have different priority.
I believe the definition of "group" was supposed to be something more like a sequence of operations that relies on implicit order of operations rules. Examples include:
- a logical line
- operations within parentheses
- the expression and for/if clauses of a list comprehension
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should show an example where splitting the line makes sense. We would not want to suggest that a / b + c*d should be split across four lines. Consider referring to PEP8 "Should a Line Break Before or After a Binary Operator".
I'm not sure if the use of the term "logical block" is correct. This is a single logical line split across multiple physical lines.
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't appear to follow the rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so?
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule appears to conflict with the previous rule about space around operators * and /.
I think the reason for an exception - if there needs to be one - should be more explicit. I remember discussing in person that linting tools might check the AST to ensure that it is not modified by an auto-correction, but this is not something that the user will necessarily be thinking abou. A user might be ordering sequences of operations in a particular way to get floating point arithmetic to do what they want. If they are tempted to break the rules to do so:
- the linter does not have to be able to make the correction automatically
- the user is welcome to use parentheses
- the user is welcome to declare an exception to the rule with
noqa
I think the rules need to be more complete before we can assess whether there a need for an exception, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general thought, rules could conflict if we ask them to be applied one after the order in a strict order.
The principle of these rules is that users should not care at all and even learn them. The linters are here for that. If a user does something in a certain order for arithmetic reasons and there is a reordering happening, then we can ask tools to either not reorder or provide a skip for a given rule. See the last point in the notes.
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure "group" here is being used in the same sense as elsewhere. I remember discussing in person the idea that a/d*b**c is preferable to a*b**c/d unless there are explicit parentheses, like (a*b**c)/d, but it would not be wrong to do a/d*b**c + e/f*g**h, yet the plus comes after higher priority operators.
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be another exception below.
https://github.com/scientific-python/specs/pull/326/files#r1631763770
According to "Only exception is if the expression consist of a single operator linking two groups", there is no exception for:
(a*b)*(c*d)*(e*f)
because there are three groups? Or do you mean that you need a space when any binary operator is linking two explicit "groups" enclosed by parentheses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your example it means no spaces because we have 3 groups.
And if we have 2 groups then each group must have at least one operator in it (ie not just a variable or single number).
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are good, for now o just copy pasted the whole block in Black. But yes we should remove the correct ones from there.
@tupui
tupui
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Matt. If you are going to be an author, feel free to just go ahead and commit changes 😉
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
tupui
commented
Aug 11, 2024
Hi @charliermarsh @ambv 👋 I had pinged you both on Twitter some years back about a standard for mathematical equations. At the time you both said this could interest you for both Ruff and Black if we, the Scientific Python community, could come to an agreement.
To be clear, I am of course in no position to ask for any commitment from you and I just hope that you find the topic interesting enough.
This is getting into shape and we have a preliminary draft we would like to present you 😃
Any comments would greatly help and in the end this can also only work if both Ruff and Black would be able and willing to implement this specification 🙏
Thanks again both!
charliermarsh
commented
Aug 11, 2024
Awesome, I look forward to reading + engaging here.
Thanks @charliermarsh. I think the main question right now is whether this is close to being precise enough to be implementable. For example, I don't have much background with ASTs, so perhaps you can suggest ideas that would replace the notions like "implicit subexpressions" I attempted to define. One we have a better understanding of how to write this standard so that it's implementable, we will want to get feedback from a wider audience about adjustments to the particular rules.
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this imply that we should add parentheses here (as in: is this an exception to Rule 0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is intended to be an example of "otherwise specified".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should any of these rules differ based on the expression type? E.g., all of the examples below use Name nodes (like x, y, etc.). What if the expression uses calls or subscripts or similar? Like f() ** 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to add examples, but no - to keep things simple, I didn't consider changing the rules based on that.
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...of the whole expression"
Assuming I'm reading this correctly, I would be careful here as for j in range(1, i + 1) is not an "expression" in the sense of the Python AST. for is a kind of "statement".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we correct this as simply as:
?
I think the point I was trying to make would still be made.
spec-0012/index.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "proper subset of the whole" mean? Sorry!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A proper subset of a set
This is saying that even though an expression like (x + y*z) might be the entire expression in question, we might still refer to it as a "subexpression" when the distinction is not important.
Does this need a clarification, or do you think it would be clear to those familiar with the term "proper subset"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this makes sense! I think it will be clear to those that are familiar with the term.
Thanks again @MichaReiser for the points you raised above. I've had a little more time to think about them and I think you did a great job of explaining the challenges of adopting this in Ruff.
- We decided not to implement rules that conflict with the formatter because formatted code should never raise new lint violations. This property is essential to integrate the formatter into check.
- Some of the proposed style guide rules do conflict with the formatter.
The style guide conflicts with existing rules, and we can't just remove them. We have users who depend on them, even if the rules might go beyond what PEP-8 specifies. Arguably, we should never have added those rules,, but here we are.
One idea that I'm not sure has been considered yet would be a completely separate mode of the formatter + linter. The start point would be implementing this specification as lint rules and being able to format code to comply with it, but in isolation (just this spec). If that could be achieved, then any rules and formatting that do not conflict with this spec could then be added into this separate mode.
This separate mode would never be able to offer a superset of ruff's regular functionality. But it could still be enough for us.
- Implementing a formatting style guide is a multi-week, if not multi-month, project. It also increases the complexity to push any new formatting changes because they now need to fit into and be tested with multiple styles guides.
Of course, the main problem is always time. At least with this suggestion, the separate mode wouldn't block new formatting changes from integrating with "normal" ruff. That work would have to happen at some point if we wanted those changes in the separate mode, but in theory it could be done at a way later point.
- I'm very hesitant about introducing a new style guide into the formatter because it would undo some of Black's accomplishments in establishing a widely agreed upon style guide.
Makes sense, although the context for this discussion is that Black’s accomplishments have been largely unable to reach the Scientific Python community due to concerns like this and the formatting of arrays. So we would at least be extending past Black’s accomplishments in some ways :)
Now I can see a lot of arguments for why a completely separate mode like this wouldn’t belong in Ruff’s API but be a separate package which uses Ruff’s machinery under the hood. Maybe that is what we’re looking for.
Still, I think there would be value in translating the non-conflicting parts of this spec to additional lint rules. I’m just trying to be ambitious :)
Note that there is one unclosed LaTeX expression, which would be good to fix first.
Done.
It feels like there could be more clarity between (1) and (5) around when brackets may be / should be inserted to improve an expression.
Can you give an example expression?
a + x*y**3 is rather hard to parse for my eye
Perhaps you're happy with Black's rules for mathematical expressions, and that's fine. I got the sense that others weren't, though, so the rules here attempt to balance the conflicting desires:
- include whitespace, which makes it easier to distinguish multi-character variables from one another and from operators, and
- remove whitespace to visually reinforce the order of operations (as typeset math does and PEP8 calls for)
without requiring the addition of lots of parentheses in expressions involving the common PEMDAS operators.
When I type $a+x \cdot y^3$, it renders as x + y * x**3, and I'd be happy to suggest a set of rules based on that. But assuming that's not in the cards and we only get one whitespace around operators, I preferred to use it to show that the multiplication happens before the addition even though the addition operation appears first on the line.
That's the rationale, but other viewpoints (e.g. "whitespace is only omitted around the highest-priority operator of a compound expression") are perfectly valid, too, so I don't think we'll be able to reason out which is best. I think it may be a matter of either adopting a mostly-baked proposal like this one or putting forth equally complete alternatives and voting. Or maybe we accept something like this as a draft and vote on specific changes like that.
The definition section is a bit daunting...
Ah, right, Pamphile mentioned that before, and we discussed moving it to a reference section at the end. I'll do that.
Intro...
Yeah, I'll take another look at that.
And I think the rules don't forbid developers to include extra parentheses in cases like this if they would like to be extra clear.
The first rule is "do not add extraneous parentheses", but the last is "Any of the preceding rules may be broken if there is a clear reason to do so." So if the rules lead you to an expression that is particularly difficult to parse, break them. But a + (x * y**3) is not recommended because the rules would permit a + x*y**3.
(Another set of rules I considered would be based on "Whitespace is only omitted around the highest-priority operator of a compound expression" and something like "Expressions with more than two operator priority levels must make subexpressions with two operator priority levels explicit.* This would mean you would have to write a + (x * y**3). But I didn't think people would want all those parentheses.)
lucascolley
commented
Sep 29, 2024
The first rule is "do not add extraneous parentheses", but the last is "Any of the preceding rules may be broken if there is a clear reason to do so."
I guess I was alluding to the possibility that a formatter wouldn't add parentheses by default, but also wouldn't take them away if they are already there.
stefanv
commented
Oct 12, 2024
pre-commit.ci autofix
stefanv
commented
Oct 12, 2024
- It feels like there could be more clarity between (1) and (5) around when brackets may be / should be inserted to improve an expression.
Sorry, that should have been (0) and (5).
It's not super intuitive to me when you can / should / must use brackets; but it may well be that the rules are comprehensive.
Wouldn't it be OK to always allow stylistic brackets, for aesthetic grouping, or grouping that is logical to the author?
[This discussion is not a blocker to merging the PR, btw.]
mdhaber
commented
Oct 12, 2024
Yes, see rule 9.
stefanv
commented
Oct 13, 2024
Yes, see rule 9.
"Any of the preceding rules may be broken if there is a clear reason to do so."
Yes? Because you asked:
Wouldn't it be OK to always allow stylistic brackets, for aesthetic grouping, or grouping that is logical to the author?
It even shows examples of adding parentheses that aren't necessary.
stefanv
commented
Oct 13, 2024
I just posted the quote so others don't have to look it up.
@mdhaber
mdhaber
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stefanv
commented
Oct 14, 2024
@stefanv Math still didn't render. Any thoughts?
Requires a theme update. Fix has already been made. /cc @jarrodmillman
Co-authored-by: Lucas Colley <lucas.colley8@gmail.com>
This follows the proposal on the forum https://discuss.scientific-python.org/t/spec-12-formatting-mathematical-expressions
See other linked discussions as well.