-
Notifications
You must be signed in to change notification settings - Fork 1.9k
How to connect two alias dataflow node in Go? #17329
-
I am new to dataflow library for Go. I want to automatically find the alias slice reference expression with same value when executing in Go. I have a program as follows:
func main() { var s []int s[0] = nil s = s[1:] }
Is there any dataflow predicates I can use in dataflow library for Go to connect s in s[0] and s in s[1:]?
I tried two functions, localFlowStep and globalValueNumber, to connect exprNode(s in s[0]) and exprNode(s in s[1:]), but failed. I have tried following queries.
import go import DataFlow predicate indexOfSlice(IndexExpr sliceIdx, SliceExpr slice) { localFlow(exprNode(sliceIdx), exprNode(slice.getBase())) } from IndexExpr sliceIdx, SliceExpr slice where indexOfSlice(sliceIdx, slice) select sliceIdx, slice
The result of the above query contains nothing.
import go import DataFlow from ExprNode s, Node d where globalValueNumber(s) = globalValueNumber(d) select s, d
I don't find s[0] and s in s[1:] in the result using the above query.
I don't know which predicate I can use to connect these two node. Is there any predicate? The conditions requested in the question can be relaxed, i.e. connecting s[0] and s in s[1:] or s[0] and s[1:] or other pairs are all allowed.
It is not a good idea to test whether results of toString of two expressions are equal, since it can't detect alias variables with same string. For example:
func foo() { a := 2 x := b.f x.g = &a b.f.g = &a }
Obviously, x.g and b.f.g are alias expressions because x is just b.f. But the string of two expression is not the same. I think x and b.f may use the same SSA variable.
Beta Was this translation helpful? Give feedback.
All reactions
Hi @Lslightly,
Thanks for your question.
In the first scenario you should be able to connect s with s[0] and s[1:] through an SSAVariable and its uses.
For example,
predicate relatedBy(SsaVariable v, Name n, Name m) {
n != m and
v.getAUse().isFirstNodeOf(n) and v.getAUse().isFirstNodeOf(m)
}
The second example is much more complicated because you need to keep track of aliases and match access path to relate access paths.
For example, to relate x.g and b.f.g you can relate x with b through b.f and later match x.g with b.f.g through substitution of x in the access path with the access path b.f.
An overfitted, but possible useful starting point, example is:
predicate isAliasedThroug...
Replies: 1 comment 5 replies
-
Hi @Lslightly,
Thanks for your question.
In the first scenario you should be able to connect s with s[0] and s[1:] through an SSAVariable and its uses.
For example,
predicate relatedBy(SsaVariable v, Name n, Name m) {
n != m and
v.getAUse().isFirstNodeOf(n) and v.getAUse().isFirstNodeOf(m)
}
The second example is much more complicated because you need to keep track of aliases and match access path to relate access paths.
For example, to relate x.g and b.f.g you can relate x with b through b.f and later match x.g with b.f.g through substitution of x in the access path with the access path b.f.
An overfitted, but possible useful starting point, example is:
predicate isAliasedThroughAccessPath(SsaVariable alias, SsaVariable base, string path) { exists(SelectorExpr selector | base.getAUse().isFirstNodeOf(selector.getBase+()) | globalValueNumber(DataFlow::ssaNode(alias)) = globalValueNumber(DataFlow::exprNode(selector)) and getAccessPath(selector) = path ) } string getAccessPath(SelectorExpr e) { if e.getBase() instanceof SelectorExpr then exists(string root | root = getAccessPath(e.getBase())| result = root + "." + e.getSelector().getName() ) else result = e.getBase().(VariableName) + "." + e.getSelector().getName() } predicate isAliasedSelector(SelectorExpr n, SelectorExpr m) { exists(SsaVariable baseOfN, SsaVariable baseOfM, string path | isAliasedThroughAccessPath(baseOfN, baseOfM, path) and baseOfN.getAUse().isFirstNodeOf(n.getBase+()) and baseOfM.getAUse().isFirstNodeOf(m.getBase+()) and getAccessPath(m) = getAccessPath(n).regexpReplaceAll("^"+baseOfN.getSourceVariable().getName()+"\\.", path + ".") ) } from SelectorExpr n, SelectorExpr m where isAliasedSelector(n, m) select n, m
I will also ask the Go team for suggestions, because they are our CodeQL Go experts.
Beta Was this translation helpful? Give feedback.
All reactions
-
Thank you for writing such a detailed answer. I have learned a lot from it.
For the first example, I tried SsaVariable after I posed the question and I got similar solution. Thank you for your kindness. To be more specific with regard to this case, I write the query as follows:
// `sliceIdx` represents `s[0]`, `slice` represents `s[1:]` predicate useSameSlice(IndexExpr sliceIdx, SliceExpr sliceExpr) { exists( SsaVariable sliceDef | sliceDef.getType().getUnderlyingType() instanceof SliceType and sliceDef.getAUse() = exprNode(sliceIdx.getBase()).asInstruction() and sliceDef.getAUse() = exprNode(sliceExpr.getBase()).asInstruction() ) }
For the solution to the second example, my understanding is annotated in the following code.
// x := b.f // x.g = &a // b.f.g = &a /** * This predicate is equal to the constraint that `base.f` is alias to `alias` * | ↑ | * `path` * `alias` is the ssa variable of `x`(the alias of `b.f`) * `base` is the ssa variable of `b` * `selector` is the selector expression `b.f` * `path` is access path string of `b.f`, evaluating to `b.f` */ predicate isAliasedThroughAccessPath(SsaVariable alias, SsaVariable base, string path) { exists(SelectorExpr selector | base.getAUse().isFirstNodeOf(selector.getBase+()) | globalValueNumber(DataFlow::ssaNode(alias)) = globalValueNumber(DataFlow::exprNode(selector)) and getAccessPath(selector) = path ) } /** * input `x.f`, output string `x.f` * input `b.f.g`, output string `b.f.g` */ string getAccessPath(SelectorExpr e) { if e.getBase() instanceof SelectorExpr then exists(string root | root = getAccessPath(e.getBase())| result = root + "." + e.getSelector().getName() ) else result = e.getBase().(VariableName) + "." + e.getSelector().getName() } /** * `n` is `x.g` * `m` is `b.f.g` * `baseOfN` is ssa variable of `x` * `baseOfM` is ssa variable of `b` in `b.f.g` * `path` is the string `b.f` * string `b.f.g` = string `x.g`.regexpReplaceAll("^x\\.", "b.f.") */ predicate isAliasedSelector(SelectorExpr n, SelectorExpr m) { exists(SsaVariable baseOfN, SsaVariable baseOfM, string path | isAliasedThroughAccessPath(baseOfN, baseOfM, path) and baseOfN.getAUse().isFirstNodeOf(n.getBase+()) and baseOfM.getAUse().isFirstNodeOf(m.getBase+()) and getAccessPath(m) = getAccessPath(n).regexpReplaceAll("^"+baseOfN.getSourceVariable().getName()+"\\.", path + ".") ) } /** * `n` is `x.g`, `m` is `b.f.g` */ from SelectorExpr n, SelectorExpr m where isAliasedSelector(n, m) select n, m
If my testing has no problems, x is a SsaVariable and evaluating b.f.g in b.f.g = &a will not use x SsaVariable to simplify the calculation but re-evalute the expression through 1) b SsaVariable and 2) b.f Instruction and 3) assignment to b.f.g Instruction. So the only approach to judge whether x and b.f are aliases is to test the equality of globalValueNumber instead of SsaVariable or IR::Instruction.
Beta Was this translation helpful? Give feedback.
All reactions
-
Continue the discussion on globalValueNumber. I've tried the following query for the first example. But it returns nothing. The reason I try globalValueNumber is that I also want to recognize the equality of a.s in a.s[0] and a.s in a.s[1:] in a.s = a.s[1:]. I think (s[0], s = s[1:]) and (a.s[0], a.s = a.s[1:]) are similar. But the fact is that a.s is not a use of SsaVariable, so the equality can be only judged by globalValueNumber if the conclusion in my last reply is correct. But globalValueNumber also fails. So what is the solution for more general cases?
predicate useSameSlice(IndexExpr sliceIdx, SliceExpr sliceExpr) { // exists( // SsaVariable sliceDef | // sliceDef.getType().getUnderlyingType() instanceof SliceType // and sliceDef.getAUse() = exprNode(sliceIdx.getBase()).asInstruction() // and sliceDef.getAUse() = exprNode(sliceExpr.getBase()).asInstruction() // ) globalValueNumber(exprNode(sliceIdx.getBase())) = globalValueNumber(exprNode(sliceExpr.getBase())) } from IndexExpr sliceIdx, SliceExpr slice where useSameSlice(sliceIdx, slice) select slice
Beta Was this translation helpful? Give feedback.
All reactions
-
Before we get too deep into the AST and SSA trickery that's possible here, could you say a little about what you want to achieve by identifying possible aliases like this? That will help me propose a solution at the right level of abstraction for your goals.
Beta Was this translation helpful? Give feedback.
All reactions
-
Before we get too deep into the AST and SSA trickery that's possible here, could you say a little about what you want to achieve by identifying possible aliases like this? That will help me propose a solution at the right level of abstraction for your goals.
I want to recognize the pattern as follows. expr1 in expr1[Idx] has the same value as expr2 in expr2[Idx+...:]. expr1 and expr2 evaluate to the same slice value at runtime.
expr1[Idx] = nil expr2 = expr2[Idx+...:]
Beta Was this translation helpful? Give feedback.
All reactions
-
Apologies for the late response; I must have overlooked the notification of your latest reply.
Taking the example Go program
package abc type S struct { f []int } type T struct { s S } func f(s S, t T) int { ts := t.s return s.f[0] + s.f[1] + t.s.f[0] + ts.f[1] }
One useful tool is the SsaWithFields class, which as its name suggests tries to identify uses of the same field by building on top of SSA variables.
This query identifies that the two s.f accesses likely match:
from SsaWithFields swf1, SsaWithFields swf2, DataFlow::Node use1, DataFlow::Node use2 where swf1 = swf2.similar() and swf1.getAUse() = use1 and swf2.getAUse() = use2 and use1 = any(DataFlow::ElementReadNode sn).getBase() and use2 = any(DataFlow::ElementReadNode sn).getBase() and use1 != use2 select use1, use2
To go further and catch the matching index bases in t.s.f[0] + ts.f[1] we need to do more than the existing standard library, but if we look at the definition of SsaWithFields starting at https://github.com/github/codeql/blob/main/go/ql/lib/semmle/go/dataflow/SSA.qll#L308, it is possible to augment it so that flow steps can be taken in between field reads:
private IR::Instruction accessPathAux(TSsaWithFields base, Field f) { exists(IR::FieldReadInstruction fr, IR::Instruction basePathPlusFlow, IR::Instruction basePath | fr.getBase() = basePathPlusFlow or fr.getBase() = IR::implicitDerefInstruction(basePathPlusFlow.(IR::EvalInstruction).getExpr()) | DataFlow::localFlow(DataFlow::instructionNode(basePath), DataFlow::instructionNode(basePathPlusFlow)) and base = accessPath(basePath) and f = fr.getField() and result = fr ) }
Note the DataFlow::localFlow(DataFlow::instructionNode(basePath), DataFlow::instructionNode(basePathPlusFlow)) added there compared to the version of that predicate in the standard library. For testing purposes I simply copied from private newtype TSsaWithFields up to the end of class SsaWithFields, and renamed the public class SsaWithFields to class SsaWithFieldsWithFlow so it wouldn't clash with the standard library version.
I can then use the same query as previously replacing SsaWithFields with SsaWithFieldsWithFlow to get both results desired.
Beta Was this translation helpful? Give feedback.