-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[Go] Getting the resolved type of an argument #15591
-
Hi,
I am trying to obtain the types of objects queried by the Get method of Kubernetes controllers for https://github.com/openshift/hypershift/tree/release-4.15 . I expect one well defined (and qualified) type per call to Get() like so:
The codebase has 287 of such calls.
The last argument can be passed several ways:
- by a pointer to a declared variable and its close cousin, the pointer to a struct literal assignment
- by the return of some function, with possible slicing operations
- by a function parameter
- (Some other I might've missed)
The following query works for the first case (but only returns 42 results over the 287 Get calls):
from DataFlow::MethodCallNode call, AddressExpr arg, Type t where call.getTarget().hasQualifiedName("sigs.k8s.io/controller-runtime/pkg/client.Client", "Get") and call.getArgument(2).asExpr() = arg and t = arg.getOperand().getType() select call, "Client calls $@ to obtain resource type $@", call, call.getTarget().getName(), t, t.getQualifiedName()
A dataflow query sounds like what should solve the problem at hand but doing so returns several types per Get call, with some of the returned types making no sense at all:
import go import K8sControllerClient::K8sControllerClient import Utils::Utils module ResourceTypesFlow implements DataFlow::ConfigSig { predicate isSource(DataFlow::Node node) { node.asExpr() instanceof ValueExpr } predicate isSink(DataFlow::Node node) { exists(DataFlow::MethodCallNode call | call.getArgument(2) = node and call.getTarget().hasQualifiedName("sigs.k8s.io/controller-runtime/pkg/client.Client", "Get") ) } predicate isBarrier(DataFlow::Node node) { node.asExpr().getLocation().getFile().getBaseName().matches("%_wrapper.go") or node.asExpr().getLocation().getFile().getBaseName().matches("zz_%.go") } } module Flow = TaintTracking::Global<ResourceTypesFlow>; from DataFlow::Node source, DataFlow::Node sink, DataFlow::MethodCallNode call, Type resourceType where Flow::flow(source, sink) and call.getArgument(2) = sink and resourceType = source.getType() select source, "Kubernetes Controller Client calls $@ to obtain resource $@ ($@)", call, call.getCalleeName() + "()", resourceType, resourceType.getQualifiedName(), source, source.toString()
I feel this should be a simple task, what am I missing here? Thanks.
Beta Was this translation helpful? Give feedback.
All reactions
I suspect this might have to do with the K8sControllerClient module that you're importing-- I can't tell for sure without its source code, but does it perhaps introduce models that would propagate dataflow between the different arguments of that Get call?
Meanwhile, to do what you want I suspect you don't need dataflow at all. In your first argument you use arg.getOperand().getType(), which assumes the argument is an address-of operator. Why not drop that assumption and instead use arg.getType().(PointerType).getBaseType() to account for more kinds of expression?
If your concern is that you might have something like Get(..., GetResource()) where GetResource has a loose return type and you...
Replies: 1 comment 5 replies
-
I suspect this might have to do with the K8sControllerClient module that you're importing-- I can't tell for sure without its source code, but does it perhaps introduce models that would propagate dataflow between the different arguments of that Get call?
Meanwhile, to do what you want I suspect you don't need dataflow at all. In your first argument you use arg.getOperand().getType(), which assumes the argument is an address-of operator. Why not drop that assumption and instead use arg.getType().(PointerType).getBaseType() to account for more kinds of expression?
If your concern is that you might have something like Get(..., GetResource()) where GetResource has a loose return type and you want to find the more specific types of the actual resources created -- in that case you actually do need dataflow, and if you provide the source for your imports used in this example I can probably shed light on why you're seeing dataflow to other arguments of the Get method.
Beta Was this translation helpful? Give feedback.
All reactions
-
Thank you Chris for the fast reply.
Meanwhile, to do what you want I suspect you don't need dataflow at all. In your first argument you use
arg.getOperand().getType(), which assumes the argument is an address-of operator. Why not drop that assumption and instead usearg.getType().(PointerType).getBaseType()to account for more kinds of expression?If your concern is that you might have something like
Get(..., GetResource())whereGetResourcehas a loose return type and you want to find the more specific types of the actual resources created -- in that case you actually do need dataflow, and if you provide the source for your imports used in this example I can probably shed light on why you're seeing dataflow to other arguments of the Get method.
Good shout, by doing this I have almost all of the results I want, but I'm afraid Dataflow is required here, I do have some GetResource-like patterns.
The K8sControllerClient module was a leftover from other experimentation... and it appears that the summary flows defined in it skewed the results 🤦. Removing it gives all the results I want with a couple exceptions:
- (expected) Anything returned by
DeepCopy(). This is expected since I am blacklisting%_wrapper.go, removing that exclusion just returns way too many wrong results. I am all ears if you have a clean way of avoiding this issue (without filling the results with very broad types). - Some dynamic construct don't show up in results like this one:
func ReadObservedConfig(ctx context.Context, c client.Client, observedConfig *ObservedConfig, namespace string) error { log := ctrl.LoggerFrom(ctx) var errs []error configs := map[string]struct { observed *corev1.ConfigMap dest runtime.Object }{ "image": { observed: ObservedImageConfig(namespace), dest: ImageConfig(), }, "project": { observed: ObservedProjectConfig(namespace), dest: ProjectConfig(), }, "build": { observed: ObservedBuildConfig(namespace), dest: BuildConfig(), }, } for _, config := range configs { if err := c.Get(ctx, client.ObjectKeyFromObject(config.observed), config.observed); err != nil {
I tried adding the following predicate to my Dataflow configuration but it ended up returning many wrong results:
predicate isAdditionalFlowStep(DataFlow::Node fromNode, DataFlow::Node toNode) { exists(Write w | w.writesField(toNode.(DataFlow::PostUpdateNode).getPreUpdateNode(), _, fromNode) ) }
- Generics seems to cause trouble and are left out of the results altogether:
func UpdateObject[T crclient.Object](t *testing.T, ctx context.Context, client crclient.Client, original T, mutate func(obj T)) error { return wait.PollImmediateWithContext(ctx, time.Second, time.Minute*1, func(ctx context.Context) (done bool, err error) { if err := client.Get(ctx, crclient.ObjectKeyFromObject(original), original); err != nil {
For reference here is my current query:
/** * @kind problem */ import go module ResourceTypesFlow implements DataFlow::ConfigSig { predicate isSource(DataFlow::Node node) { node.asExpr() instanceof ValueExpr and not node.asExpr().getLocation().getFile().getBaseName().matches(["%_wrapper.go", "zz_%.go"]) } predicate isSink(DataFlow::Node node) { exists(DataFlow::MethodCallNode call | call.getArgument(2) = node and call.getTarget().implements("sigs.k8s.io/controller-runtime/pkg/client", "Client", "Get") ) } } module Flow = TaintTracking::Global<ResourceTypesFlow>; from DataFlow::Node source, DataFlow::Node sink, DataFlow::MethodCallNode call, Type resourceType where Flow::flow(source, sink) and call.getArgument(2) = sink and resourceType = source.getType() select call, "Kubernetes Controller Client calls $@ to obtain resource $@", call, call.getCalleeName() + "()", resourceType, resourceType.getQualifiedName()
Beta Was this translation helpful? Give feedback.
All reactions
-
Q1: DeepCopy. One approach would be to exclude results with undesirable types (certainly Object, perhaps you know of other types that are too general to be useful as well?) and to introduce an additional flow step across a DeepCopy operation in the hope of finding a previous node that has a more specific type. For efficiency this could be done in the isSource predicate -- rather than considering arbitrary value flow, consider flow from expressions whose type is derived from but is not ConfigMap.
Beta Was this translation helpful? Give feedback.
All reactions
-
Q2, flow from a map literal via a range statement: this is a bug, #15613 should fix it.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 1
-
Q3, generics: firstly I note this is stressing a few different corner cases at once, since we also have flow across capture by an anonymous function definition in play.
I tried to reproduce this with
func Sink(s interface{}) { } func Test[T interface{}](t T) { f := func() { Sink(t) } f() } func Tester() { Test[string]("str4") }
That worked as expected with
module ResourceTypesFlow implements DataFlow::ConfigSig { predicate isSource(DataFlow::Node node) { node.asExpr() instanceof StringLit } predicate isSink(DataFlow::Node node) { exists(DataFlow::CallNode call | call.getArgument(_) = node and call.getTarget().getName() = "Sink" ) } } module Flow = TaintTracking::Global<ResourceTypesFlow>; from DataFlow::Node source, DataFlow::Node sink where Flow::flow(source, sink) select source, sink
So there's something more subtle going on. Are you sure there are calls to UpdateObject in your codebase that have a concrete type that can be getQualifiedNamed? (You might try dropping the .getQualifiedName column to see any expressions with other types)
Beta Was this translation helpful? Give feedback.
All reactions
-
Thank you for the detailed responses and being super quick on that bug Chris!
Q1 DeepCopy: So I did a bit of a twist on what you're proposing: a taint step that leverages the fact that the receiver's type and the result's type are always identical. I understand that this will need to be expanded to the other functions in the DeepCopy family. For reference, the last iteration of the query:
/** * @kind problem */ import go module ResourceTypesFlow implements DataFlow::ConfigSig { private predicate excluded_files(Expr e) { e.getLocation().getFile().getBaseName().matches(["%_wrapper.go", "zz_%.go"]) } predicate isSource(DataFlow::Node node) { node.asExpr() instanceof ValueExpr and not excluded_files(node.asExpr()) or exists(DataFlow::MethodCallNode call | call.getCalleeName() = "DeepCopy" | // The receiver is a pointer, we need the base type or the resulting type is going to be a // PointerType (and that won't play nice with the other results) call.getReceiver().(DataFlow::ComponentReadNode).getBase() = node and // We do not care about DeepCopy() usage _inside_ excluded files. not excluded_files(call.asExpr()) ) } predicate isSink(DataFlow::Node node) { exists(DataFlow::MethodCallNode call | call.getArgument(2) = node and call.getTarget().implements("sigs.k8s.io/controller-runtime/pkg/client", "Client", "Get") ) } predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) { // We take a shortcut here: the receiver type for DeepCopy() calls is the same as the output's // type. So we can force that flow. exists(DataFlow::MethodCallNode call | call.getCalleeName() = "DeepCopy" and node1 = call.getReceiver() and node2 = call.getAResult() ) } } module Flow = TaintTracking::Global<ResourceTypesFlow>; from DataFlow::Node source, DataFlow::Node sink, DataFlow::MethodCallNode call, Type resourceType where Flow::flow(source, sink) and call.getArgument(2) = sink and resourceType = source.getType() select call, "Kubernetes Controller Client calls $@ to obtain resource $@", call, call.getCalleeName() + "()", resourceType, resourceType.getQualifiedName()
Q3 generics: I traced back the calls to UpdateObject in my codebase and found that it finally was only called from end-to-end tests (built with golang's testing module).
Beta Was this translation helpful? Give feedback.