Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat(Provenance): Add DirectoryProvenance as a LocalProvenance #9872

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
pepper-jk wants to merge 3 commits into oss-review-toolkit:main
base: main
Choose a base branch
Loading
from pepper-jk:provenance-local

Conversation

Copy link
Contributor

@pepper-jk pepper-jk commented Jan 30, 2025
edited
Loading

In contrast to the previously added RemoteProvenance stands the LocalProvenance, which has no remote source, but instead references a local input of some kind.
The DirectoryProvenance references a local project directory, which is lacking supported (remote) version control. It is defined by its local directory path only.

Since ORT needs further refactoring until DirectoryProvenance can be fully utilized, it the new class can not be used right now. However the presence of a new KnownProvenance class, results in when conditional cases not being exhaustive anymore.

To circumvent this issue, the following changes were made:

  1. Whereever possible RemoteProvenance is used as parameter type instead of KnownProvenance.
  2. When necessary KnownProvenances are cast to RemoteProvenance.
  3. If Provenance is expected, DirectoryProvenance is handled like UnknownProvenance.

The excaption being hash and storageKey, which both default to an empty string. However, since the rest of the code does not handle DirectoryProvenance, this should not become an issue.

For more context on the new Provenance Hierarchy, see: #8803 (comment)

Signed-off-by: Jens Keim jens.keim@forvia.com

@pepper-jk pepper-jk requested a review from a team as a code owner January 30, 2025 15:55
@pepper-jk pepper-jk marked this pull request as draft January 30, 2025 16:03
Copy link

codecov bot commented Feb 3, 2025
edited
Loading

Codecov Report

❌ Patch coverage is 36.36364% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.47%. Comparing base (d47c9ac) to head (450b5eb).
⚠️ Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
...main/kotlin/storages/ProvenanceBasedFileStorage.kt 33.33% 1 Missing and 1 partial ⚠️
.../kotlin/storages/ProvenanceBasedPostgresStorage.kt 33.33% 1 Missing and 1 partial ⚠️
...src/main/kotlin/utils/FileProvenanceFileStorage.kt 0.00% 1 Missing ⚠️
...main/kotlin/utils/PostgresProvenanceFileStorage.kt 0.00% 1 Missing ⚠️
model/src/main/kotlin/utils/PurlExtensions.kt 0.00% 1 Missing ⚠️
Additional details and impacted files
@@ Coverage Diff @@
## main #9872 +/- ##
============================================
+ Coverage 57.38% 57.47% +0.08% 
- Complexity 1673 1691 +18 
============================================
 Files 346 346 
 Lines 12759 12801 +42 
 Branches 1209 1219 +10 
============================================
+ Hits 7322 7357 +35 
- Misses 4972 4976 +4 
- Partials 465 468 +3 
Flag Coverage Δ
funTest-docker 71.15% <ø> (+0.04%) ⬆️
funTest-non-docker 32.91% <27.27%> (-0.11%) ⬇️
test-ubuntu-24.04 41.89% <18.18%> (-0.02%) ⬇️
test-windows-2025 41.87% <18.18%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pepper-jk pepper-jk changed the title (削除) feat(Provenance): Add DirectoryProvenance as a LocalProvenance (削除ここまで) (追記) feat(Provenance): Add a LocalProvenance sub-interface (追記ここまで) Feb 3, 2025
@pepper-jk pepper-jk marked this pull request as ready for review February 3, 2025 13:21
Copy link
Contributor Author

@sschuberth failing tests seem unrelated.
Could you have a look and maybe restart them?
A review would be awesome as well.

Copy link
Member

@sschuberth failing tests seem unrelated.

The "Unable to create proxy for sealed class" failures in org.ossreviewtoolkit.scanner.ScannerTest are real. I guess the problem is that nothing currently implements the LocalProvenance interface. So you probably have to add a DirectoryProvenance at the same time.

pepper-jk reacted with thumbs up emoji

Copy link
Contributor Author

pepper-jk commented Feb 11, 2025
edited
Loading

Thanks @sschuberth, I missed that. I'm already working on the DirectoryProvenance over at provenance-directory.
So it would be easy enough to append it here.

However, it is quite a large branch, since a lot of whens require a DirectoryProvenance case.
Most of which are for KnownProvenance input, so I can not default to UnknownProvenance behavior.

I started with reasonable implementation for the DirectoryProvenance behavior, but the scope of required does not seem to end. So I was considering switching to RemoteProvenance inputs to avoid the usage of DirectoryProvenance for now.

Any thoughts or other ideas how to keep the PR small?

Copy link
Member

Most of which are for KnownProvenance input, so I can not default to UnknownProvenance behavior.

Can't you "cheat" a bit by first limiting some / all KnownProvenance cases to RemoteProvenance (which semantically is no difference compared to before), and only later bit-by-bit extend respective code paths to also handle LocalProvenance?

pepper-jk reacted with thumbs up emoji

Copy link
Contributor Author

So I was considering switching to RemoteProvenance inputs to avoid the usage of DirectoryProvenance for now.

Can't you "cheat" a bit by first limiting some / all KnownProvenance cases to RemoteProvenance (which semantically is no difference compared to before), and only later bit-by-bit extend respective code paths to also handle LocalProvenance?

Great to see we are on the same page here. I will do just that.

@pepper-jk pepper-jk marked this pull request as draft February 12, 2025 15:46
@pepper-jk pepper-jk force-pushed the provenance-local branch 3 times, most recently from 9322fb4 to 9ffd2c5 Compare February 13, 2025 12:50
@pepper-jk pepper-jk changed the title (削除) feat(Provenance): Add a LocalProvenance sub-interface (削除ここまで) (追記) feat(Provenance): Add DirectoryProvenance as a LocalProvenance (追記ここまで) Feb 13, 2025
@pepper-jk pepper-jk marked this pull request as ready for review February 13, 2025 12:59
Copy link
Contributor Author

@sschuberth I added the DirectoryProvenance now and "cheated" the when cases.
Let me know what you think, if you have any reservations.

Copy link
Member

sschuberth commented Feb 13, 2025
edited
Loading

Commit message:

It is defined by its local directory path only.

We should probably say "absolute" / "canonical" / "real" path, and do the implementation accordingly.

it the new class can not be used right now.

"the new class can not be used right now."

However the presence of a new KnownProvenance class, results

"However, the presence of a new KnownProvenance class results"

Whereever

"Wherever"

excaption

"exception"

Provenance Hierarchy

"provenance hierarchy"

pepper-jk reacted with thumbs up emoji

is UnknownProvenance -> false
is ArtifactProvenance -> sourceArtifactUrl != null && sourceArtifactUrl == provenance.sourceArtifact.url
is RepositoryProvenance -> vcs != null && vcs.matches(provenance)
is DirectoryProvenance -> false
Copy link
Member

@sschuberth sschuberth Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could combine this with line 85 to

is DirectoryProvenance, UnknownProvenance -> false

Copy link
Member

@sschuberth sschuberth Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But actually, at least when a PackageConfiguration is used in the context of a RepositoryConfiguration, the Identifier (and Provenance) actually is allowed to refer to a Project. So we should probably implement a proper check for DirectoryProvenance in that case after all.

Copy link
Contributor Author

@pepper-jk pepper-jk Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I assumed we would need a proper check sooner or later, which is why I added the case as a separate line.

Since this would touch the attributes of the PackageConfiguration, I assumed the best cause of action was to postpone any changes for the sake of a shorter PR and handle DirectoryProvenace like UnknownProvenance for now. I'd recommend to handle this, when we actually start using the DirectoryProvenance.

* scan results for a package based on the VCS path.
*/
private val packageProvenances = mutableMapOf<Identifier, KnownProvenance>()
private val packageProvenances = mutableMapOf<Identifier, RemoteProvenance>()
Copy link
Member

@sschuberth sschuberth Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is changed, the docs need to be adjusted as well.

Copy link
Member

@sschuberth sschuberth Mar 24, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change requires some careful thinking: We actually do want to be able to scan projects with DirectoryProvenance in the end, at least in the case when analyzer and scanner are run on the same machine. So is it really correct to limit us to RemoteProvenance here?

Copy link
Contributor Author

@pepper-jk pepper-jk Mar 25, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are probably multiple places, where the limit to RemoteProvenance is not correct in order to handle the DirectoryProvenance, once it is fully implemented.

But right now, while the class is dormant, it is either this or cast the arguments. Otherwise we would have to handle it everywhere from the get go as well.

side note: as this attribute is named packageProvenances, we probably need to address the shift from packages to projects as (primary) input for the scanner at some point, since you repeatedly stated before that a DirectoryProvenace can not be a Package due to lack of a remote.

val query = table.selectAll()

if (provenance !is RemoteProvenance) {
throw ScanStorageException("Scan result must have a known provenance, but it is $provenance.")
Copy link
Member

@sschuberth sschuberth Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message refers to "known provenance", but the check is against RemoteProvenance, so this does not match.

This also somewhat relates to my comment above, that we actually do want to be able to scan a DirectoryProvenance.

Copy link
Contributor Author

@pepper-jk pepper-jk Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the code and the exception's text are technically not reflecting the same thing.

However, while the DirectoryProvenance is unused, the RemoteProvenance is practically the same as KnownProvenance. Similar to my comment above, this is a temporary change, which aims to avoid issues due unfinished class handling.

requireEmptyVcsPath(provenance)

if (provenance !is KnownProvenance) {
if (provenance !is RemoteProvenance) {
Copy link
Member

@sschuberth sschuberth Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Copy link
Contributor Author

@pepper-jk pepper-jk Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. ^^

In contrast to the previously added `RemoteProvenance` stands the
`LocalProvenance`, which has no remote source, but instead references
a local input of some kind.
The `DirectoryProvenance` references a local project directory,
which is lacking supported (remote) version control.
It is defined by its canonical path only.
Since ORT needs further refactoring until `DirectoryProvenance` can
be fully utilized, the new class can not be used right now.
However the presence of a new `KnownProvenance` class results in
`when` conditional cases not being exhaustive anymore.
To circumvent this issue, the following changes were made:
1. Wherever possible `RemoteProvenance` is used as parameter type
 instead of `KnownProvenance`.
2. When necessary `KnownProvenance`s are cast to `RemoteProvenance`.
3. If `Provenance` is expected, `DirectoryProvenance` is handled
 like `UnknownProvenance`.
For instances of `Package` the new default data structure should be
`RemoteProvenance`, as a `Package` by definition requires a remote.
The exception being `hash` and `storageKey`, which both required a
default value, which was set to the `canonicalPath`.
However, since the rest of the code does not handle
`DirectoryProvenance`, this should remain unused for now.
See [1] for more context on the new provenance hierarchy.
[1]: oss-review-toolkit#8803 (comment)
Signed-off-by: Jens Keim <jens.keim@forvia.com>
Casts were removed from `FileListResolver`, `ProvenanceBasedFileStorage`,
`ProvenanceDownloader`, and `Scanner`.
In order to avoid casting `KnownProvenance` to `RemoteProvenance`,
a lot of parameters need to be changed to `RemoteProvenance`.
With the exception of `ProvenanceBasedFileStorage`, which now uses
`UnknownProvenance` exceptions just like `ProvenanceBasedPostgresStorage`.
Signed-off-by: Jens Keim <jens.keim@forvia.com>
Signed-off-by: Jens Keim <jens.keim@forvia.com>
Copy link
Contributor Author

Just a rebase onto 66.1.0 for potential use in my POC.

@sschuberth sschuberth removed their assignment Sep 2, 2025
Copy link
Contributor Author

pepper-jk commented Sep 4, 2025
edited
Loading

As discussed in the weekly developer meeting today, I have for you the working branch for handling DirectoryProvenance instead of omitting it from every KnownProvenance check in ORT:
diff: main...pepper-jk:ort:provenance-local-handling
commits: https://github.com/pepper-jk/ort/commits/provenance-local-handling/
tests: https://github.com/pepper-jk/ort/actions/runs/17458366518/job/49577013217

The branch is still rough, I just extracted it from the POC. I believe for proper handling of the DirectoryProvenance for packages still requires some more work, but it could be an alternate route for this PR.
So far it is just a byproduct of my POC, where I am focused on using DirectoryProvenance as Analyzer input via Repository, based on #10575.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@sschuberth sschuberth Awaiting requested review from sschuberth

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /