how to implement common sub-expr eliminate · expr-lang/expr · Discussion #675

zhuliquan
Jun 21, 2024

few month ago, I send PR about common sub-expr climinate (CSE) .
I post a idea to implement common sub-expr eliminate, i.e. analysis common sub-expr in compile phase and real-time checking common sub-expr already used and reused result of common sub-expr in-flight. in this way, vm need to make new buffer for cache result of sub-expr in each running. In this PR, antonmedv think should done on the compile spec with AST transformation (i.g. foo[0].bar.baz == 2 && foo[0].bar.goz == 3 => let tmp = foo[0].bar; tmp.baz == 2 && tmp.goz == 3). At that time, I also agreed with his idea, util I found CSE issue in datafusion. Datafusion also extract common sub-expr from original expr in optimize phase. but in this way, optimizer breaks the order of expr computing in short-circuit expr. Let me use two examples to illustrate the shortcomings of AST transformation CSE solutions.
example1:
expr: (condition1 ? sub-expr1 : sub-expr2) or sub-expr1, sub-expr1 will be extracted from original expr. It will ignore whether condition1 is true. it may cause some panic, because sub-expr1 should is executed when condition1 is true.
example2:
expr: sub-expr2 and (sub-expr1 or sub-expr3) and (sub-expr1 or sub-expr4) , sub-expr1 will be extracted from original expr. It will ignore result of sub-expr2. sub-expr2 may be false. evaluating sub-expr1 ahead of time will make running time longer.
Now, datafusion package already prevent short-circuit expr to be eliminated. However with this kind of restriction, it is almost impossible to eliminate any common expressions.
So I think my approach is more robust and more safe (i.e. make sure correct answer and computing faster)

Replies: 1 comment 1 reply

antonmedv
Jun 21, 2024
Maintainer

Another solution will be implementing cache inside user functions.

This way functions have full control over cache implementation.

1 reply

@zhuliquan

zhuliquan Jun 23, 2024
Author

Another solution will be implementing cache inside user functions.

This way functions have full control over cache implementation.

Good ideas.
this solution, cache inside user function is value-based, not symbol-based. For example. expr: startsWith(lower(foo), "bar1") or startsWith(lower(foo), "bar2"), lower(foo) is common sub-expr. we need to insert value of variable foo as key into cache inside lower function. If value of variable foo is long, it will cost much memory. We can build symbol-based cache for reused result of common sub-expr. even, we can encode common sub-expr to index of array (this process can be involved in compile phase) to make access result of sub-expr faster. However, symbol-based cache also can't make full use of the results that have already been calculated. For example. startsWith(lower(foo1), "bar1") or startsWith(lower(foo2), "bar2"), although lower(foo1) and lower(foo2) are completely different based on symbol system, they may also have the same value. Besides, how to handle basic arithmetic / logic expr (i.g. foo + 2 == bar1 or foo + 2 == bar2). it need user to implement function for arithmetic / logic operator.

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

how to implement common sub-expr eliminate #675

Uh oh!

{{title}}

Uh oh!

zhuliquan
Jun 21, 2024

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

antonmedv
Jun 21, 2024
Maintainer

Uh oh!

{{title}}

Uh oh!

zhuliquan Jun 23, 2024
Author

Select a reply

Uh oh!

Uh oh!

how to implement common sub-expr eliminate #675

Uh oh!

zhuliquan Jun 21, 2024

Replies: 1 comment · 1 reply

Uh oh!

antonmedv Jun 21, 2024 Maintainer

Uh oh!

zhuliquan Jun 23, 2024 Author

zhuliquan
Jun 21, 2024

Replies: 1 comment 1 reply

antonmedv
Jun 21, 2024
Maintainer

zhuliquan Jun 23, 2024
Author