[Java] Data flow through an object #17069

Answered by mbg
sudeep-hypredge asked this question in Q&A

As a newbie, I am trying to explore dataflow for the simple code as seen at https://docs.aws.amazon.com/AmazonS3/latest/userguide/example_s3_CopyObject_section.html

  1. There are the command line params args[1..3] that get passed to CopyBucketObject() function
  2. The CopyBucketObject uses a builder pattern to create a CopyObjectRequest instance.
  3. The created instance is passed to S3Client.copyObject()

I want to get all data sources that contribute to the argument that is being passed to the copyObject function.
My query looks as below


import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking

module MyDataFlowConfig implements DataFlow::ConfigSig {
  // Every expression is treated as a potential source.
  predicate isSource(DataFlow::Node source) {
    exists(Expr sourceExpr | source.asExpr() = sourceExpr)
  }

  // Any argument of S3Client.copyObject is a sink.
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall call |
      call.getMethod().hasName("copyObject") and
      call.getMethod().getDeclaringType().getQualifiedName() = "software.amazon.awssdk.services.s3.S3Client" and
      sink.asExpr() = call.getArgument(_)
    )
  }
}

module MyFlow = TaintTracking::Global<MyDataFlowConfig>;

from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select sink, source, sink, source.toString()

I was expecting the dataflow to start at the sources args[1..3] and end at the sink, the argument to copyObject. In reality, the source shows up as the build() function in the builder; the variables that went into the builder are not tracked or traced.

My question is :
How can I have dataflow through an object taken into account? Note that the object will not necessarily be populated with the builder pattern; there could be setters/getters or arbitrary methods involved in dataflow to and from objects.

Any insights highly appreciated


Hi @sudeep-hypredge 👋🏻

Thanks for the question! For reference, I have pasted the part of the example code you seem to refer to below:

CopyObjectRequest copyReq = CopyObjectRequest.builder()
 .sourceBucket(fromBucket)
 .sourceKey(objectKey)
 .destinationBucket(toBucket)
 .destinationKey(objectKey)
 .build();

In order for CodeQL to track how data flows through methods from a library, we need to understand how inputs to a method relate to its outputs. For example, CodeQL needs to know that destinationKey mutates the object it is called on with objectKey and returns the resulting object (as opposed to e.g. a new object, or the initial object). We have a collection of models which summarise this for various libraries. However, I don't believe we currently have such models for the Java SDK for AWS. Therefore, CodeQL doesn't know how data flows through these methods. As a result, you only see data flow from build to copyObject since it doesn't flow through any other, unmodelled methods.
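To make the "mutates the object it is called on and returns the resulting object" point concrete, here is a minimal sketch of a fluent builder. The class `CopyRequest` and its methods are hypothetical stand-ins written for illustration; they are not the AWS SDK's implementation, which may be structured differently.

```java
// Hypothetical stand-in for a library request type. It only illustrates the
// fluent-builder shape; it is not the AWS SDK's CopyObjectRequest.
final class CopyRequest {
    private final String sourceKey;
    private final String destinationKey;

    private CopyRequest(Builder builder) {
        // State held by the builder is copied into the newly built object.
        this.sourceKey = builder.sourceKey;
        this.destinationKey = builder.destinationKey;
    }

    static Builder builder() {
        return new Builder();
    }

    String sourceKey() {
        return sourceKey;
    }

    String destinationKey() {
        return destinationKey;
    }

    static final class Builder {
        private String sourceKey;
        private String destinationKey;

        // Setter: stores the argument in the receiver, then returns the same receiver.
        Builder sourceKey(String key) {
            this.sourceKey = key;
            return this;
        }

        // Same shape as sourceKey: argument -> receiver, receiver -> return value.
        Builder destinationKey(String key) {
            this.destinationKey = key;
            return this;
        }

        // build(): data flows from the receiver (the builder) to the return value.
        CopyRequest build() {
            return new CopyRequest(this);
        }
    }
}
```

When the library's source is not part of the database, CodeQL cannot see these method bodies, so a model has to state the same relationships: each setter's argument flows into the object it is called on, that object is what the setter returns, and build() carries the builder's contents into its return value.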

You can read about how to define your own models in our documentation at https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-java-and-kotlin/


Thanks @owen-mc and @mbg. I tried your suggestions, but they didn't work for me. I have now spent 5 days without making any progress, so I am desperate to get this working.
I have simplified my question significantly, and have created a "library" that implements a builder: https://github.com/sudeep-hypredge/library. Running mvn install should get it set up.
I also have an application that uses the library: https://github.com/sudeep-hypredge/application
The resulting CodeQL database of the application is at https://github.com/sudeep-hypredge/codeqldb
In the model editor, I modeled every single permutation of every single call that showed up under my "library".
I am running the query

/**
 * @name sample copyfile
 * @kind path-problem
 * @problem.severity WARNING
 * @id java/sample/copyfile
 */
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking

module SampleDataFlowConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    exists(Expr sourceExpr | source.asExpr() = sourceExpr)
  }

  predicate isSink(DataFlow::Node sink) {
    exists(Expr sinkExpr | sink.asExpr() = sinkExpr)
  }
}

module Flow = TaintTracking::Global<SampleDataFlowConfig>;
import Flow::PathGraph

from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink, "pge"

The query basically lists out every single source and sink combination that there is a path to. In terms of taint tracking results, I am able to get the command line argument all the way to the argument to the builder's setter, but the path is broken at the .build() method. I have tried all combinations of all options in the model editor, it hasn't helped.

I understand this might be a significant ask, but I hope you or anyone on this forum can help me get this going. I am willing to compensate you, within reason, for the time and effort that you need to put in to help me out.

I am stuck, and I don't know how to proceed further, and any help is very welcome


I think I may have found the issue. I reproduced your problem and confirmed that, as far as I could tell, your setup should have worked. I then asked a colleague, who told me to make sure the VS Code setting codeQL.runningQueries.useExtensionPacks is set to all. That made it work. (The documentation for this is here.)


I noticed that this capability is currently in beta and is available for a limited set of languages. Would love to know the roadmap, at least for Go and JavaScript / TypeScript

I have looked into this for you. We expect to release support for Python soon. There are no immediate plans to work on other languages due to other priorities, but hopefully we will be able to return to this in the future.

In the meantime, it is very possible to write models by hand (I had never used the model editor until this morning), especially once you have got the hang of the format. Doing some modeling in other languages with the model editor and looking at the models it creates should make that easier. Note that there is some documentation of the format, which varies a little between languages, at the top of the file ExternalFlow.qll for a given language, e.g. here for Go.


@owen-mc
I was able to get this to work with the settings you suggested.
I was also able to get the CLI to work with a published version of the model pack. Thanks

As for the language support prioritization, could it be explored as an open source / crowdsourced initiative?

Thanks for all the help


I'm glad you were able to get it working.

Adding support to the model editor for new languages could be done open source, since both relevant repos are public. But I think it's more likely that we'll get around to it before that happens.

Answer selected by sudeep-hypredge