Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

how to implement common sub-expr eliminate #675

Unanswered
zhuliquan asked this question in General
Discussion options

few month ago, I send PR about common sub-expr climinate (CSE) .
I post a idea to implement common sub-expr eliminate, i.e. analysis common sub-expr in compile phase and real-time checking common sub-expr already used and reused result of common sub-expr in-flight. in this way, vm need to make new buffer for cache result of sub-expr in each running. In this PR, antonmedv think should done on the compile spec with AST transformation (i.g. foo[0].bar.baz == 2 && foo[0].bar.goz == 3 => let tmp = foo[0].bar; tmp.baz == 2 && tmp.goz == 3). At that time, I also agreed with his idea, util I found CSE issue in datafusion. Datafusion also extract common sub-expr from original expr in optimize phase. but in this way, optimizer breaks the order of expr computing in short-circuit expr. Let me use two examples to illustrate the shortcomings of AST transformation CSE solutions.
example1:
expr: (condition1 ? sub-expr1 : sub-expr2) or sub-expr1, sub-expr1 will be extracted from original expr. It will ignore whether condition1 is true. it may cause some panic, because sub-expr1 should is executed when condition1 is true.
example2:
expr: sub-expr2 and (sub-expr1 or sub-expr3) and (sub-expr1 or sub-expr4) , sub-expr1 will be extracted from original expr. It will ignore result of sub-expr2. sub-expr2 may be false. evaluating sub-expr1 ahead of time will make running time longer.
Now, datafusion package already prevent short-circuit expr to be eliminated. However with this kind of restriction, it is almost impossible to eliminate any common expressions.
So I think my approach is more robust and more safe (i.e. make sure correct answer and computing faster)

You must be logged in to vote

Replies: 1 comment 1 reply

Comment options

Another solution will be implementing cache inside user functions.

This way functions have full control over cache implementation.

You must be logged in to vote
1 reply
Comment options

Another solution will be implementing cache inside user functions.

This way functions have full control over cache implementation.

Good ideas.
this solution, cache inside user function is value-based, not symbol-based. For example. expr: startsWith(lower(foo), "bar1") or startsWith(lower(foo), "bar2"), lower(foo) is common sub-expr. we need to insert value of variable foo as key into cache inside lower function. If value of variable foo is long, it will cost much memory. We can build symbol-based cache for reused result of common sub-expr. even, we can encode common sub-expr to index of array (this process can be involved in compile phase) to make access result of sub-expr faster. However, symbol-based cache also can't make full use of the results that have already been calculated. For example. startsWith(lower(foo1), "bar1") or startsWith(lower(foo2), "bar2"), although lower(foo1) and lower(foo2) are completely different based on symbol system, they may also have the same value. Besides, how to handle basic arithmetic / logic expr (i.g. foo + 2 == bar1 or foo + 2 == bar2). it need user to implement function for arithmetic / logic operator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet

AltStyle によって変換されたページ (->オリジナル) /