External documents · guacsec/trustify · Discussion #556

ctron
Jul 17, 2024
Maintainer

This is the start of a discussion around external documents. I want to add this to the SBOM design doc at some point, but I'd like to open it up to a broader discussion first.

Both SPDX and CycloneDX support referencing nodes in external documents. So far we ignore those, but in a recent
discussion, this topic came up. We need to tackle that issue (#533)
anyway.

Spec

For SPDX, external documents are listed in the header of the document. They are defined with:

An ID string
A document namespace (URI)
A checksum/digest/hash

This basically provides a mapping table from an internal ID to an external document (identified by its document
namespace), plus a safeguard in the form of the digest.

The document namespace should be unique for each document created. Changes to the document must result in a new
namespace.

The ID string has the format of DocumentRef-<id>, where <id> is some unique identifier.

When used in a relationship, it is combined with a node ID: DocumentRef-<id>:<node-id>.
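To make the mapping table and the combined reference format concrete, here is a minimal sketch. The names `ExternalDocumentRef` and `split_element_ref` are made up for illustration and are not part of trustify or of any SPDX library; the namespace and digest values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ExternalDocumentRef:
    """One entry of the external document list in an SPDX header (hypothetical type)."""
    ref_id: str     # e.g. "DocumentRef-ext1", only valid inside the referencing SBOM
    namespace: str  # document namespace (URI) of the target document
    algorithm: str  # digest algorithm, e.g. "SHA256"
    digest: str     # digest of the target document


def split_element_ref(value: str) -> Tuple[Optional[str], str]:
    """Split a relationship endpoint into (document ref, node id).

    Returns (None, value) when the endpoint is a node of the current document.
    """
    if value.startswith("DocumentRef-") and ":" in value:
        doc_ref, node_id = value.split(":", 1)
        return doc_ref, node_id
    return None, value


# Placeholder data: the mapping table of a single SBOM, keyed by the internal ID.
table = {
    r.ref_id: r
    for r in [
        ExternalDocumentRef("DocumentRef-ext1", "https://example.com/sbom/os-1", "SHA256", "abc123"),
    ]
}

doc_ref, node_id = split_element_ref("DocumentRef-ext1:SPDXRef-os")
target = table[doc_ref]  # namespace and digest of the referenced document
```

The lookup only yields the namespace and digest of the target; finding an ingested document that matches them is a separate step.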

Implementation

We already have the digest of the target document. We also have the document namespace.

We cannot rely on the document namespace to locate a document. It is a URI, but the spec only says:

Although it is not required, the URI can be constructed in a way which provides information on how the SPDX document can be found.

So there's no guarantee that the URI actually points to the document.

We would need to store the "ID" of the external reference, which is only valid in the context of a single SPDX SBOM.

Resolving packages

When querying today, we return a single hierarchy by default. That shouldn't be any different once we add external
references.

When resolving transitive dependencies, that might be different. One way to deal with this could be to stop resolving
when an external reference is encountered, similar to symlinks on a Unix system: traditionally, operations recursing
into a directory stop at a symlink but report the symlink itself, unless the user requests to follow symlinks.
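The symlink-like behaviour could look roughly like the sketch below. It assumes a simplified in-memory graph (a dict from node ID to child IDs) in which an external reference is recognisable by its `DocumentRef-` prefix and the referenced document's edges happen to be available in the same dict. The function name `resolve` and the `follow_external` flag are hypothetical.

```python
from typing import Dict, List

EXTERNAL_PREFIX = "DocumentRef-"


def resolve(graph: Dict[str, List[str]], root: str, follow_external: bool = False) -> List[str]:
    """Depth-first walk that reports external references but only descends on request."""
    seen = set()
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)  # the reference itself is always reported
        if node.startswith(EXTERNAL_PREFIX) and not follow_external:
            continue  # like a symlink: do not recurse into the target
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(graph.get(node, [])))
    return order


graph = {
    "app": ["lib-a", "DocumentRef-ext1:SPDXRef-os"],
    "lib-a": ["lib-b"],
    "DocumentRef-ext1:SPDXRef-os": ["glibc"],
}

default_view = resolve(graph, "app")
# ["app", "lib-a", "lib-b", "DocumentRef-ext1:SPDXRef-os"]
followed_view = resolve(graph, "app", follow_external=True)
# ["app", "lib-a", "lib-b", "DocumentRef-ext1:SPDXRef-os", "glibc"]
```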

Foreign keys

Currently, we ingest the relationships with a foreign key on the node IDs. That won't work for the external references,
as that would require ingesting the referenced document first. Also, it would create issues when deleting such
documents.

One way to deal with this would be to create a second relationship table. One that allows one side of the relationship
to be external, not enforcing any foreign key.

That would duplicate things. But it might also be helpful in lookups, or transitive resolve operations, where we might
want to opt out of processing external references.

I don't see an alternative other than giving up foreign keys, which I'd like to avoid if possible.

What would be possible is to create additional "non-foreign key" fields for the reference in the same table. However,
that would basically do the same (two tables) but squeeze them into one, and feels quite messy.
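As a rough illustration of the two-table idea, here is a sketch using SQLite through Python's standard library. The table and column names are invented for this example and do not reflect the actual trustify schema; the point is only that the regular table enforces both sides, while the second table enforces only the local side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE node (
    sbom_id TEXT NOT NULL,
    node_id TEXT NOT NULL,
    PRIMARY KEY (sbom_id, node_id)
);
-- Regular relationships: both sides must exist in the same SBOM.
CREATE TABLE relationship (
    sbom_id       TEXT NOT NULL,
    left_node_id  TEXT NOT NULL,
    relationship  TEXT NOT NULL,
    right_node_id TEXT NOT NULL,
    FOREIGN KEY (sbom_id, left_node_id)  REFERENCES node (sbom_id, node_id),
    FOREIGN KEY (sbom_id, right_node_id) REFERENCES node (sbom_id, node_id)
);
-- External relationships: only the local side is enforced.
CREATE TABLE external_relationship (
    sbom_id          TEXT NOT NULL,
    left_node_id     TEXT NOT NULL,
    relationship     TEXT NOT NULL,
    external_doc_ref TEXT NOT NULL,  -- e.g. "DocumentRef-ext1"
    external_node_id TEXT NOT NULL,  -- node ID inside the referenced document
    FOREIGN KEY (sbom_id, left_node_id) REFERENCES node (sbom_id, node_id)
);
""")

conn.execute("INSERT INTO node VALUES ('sbom-1', 'SPDXRef-app')")

# Works even though the referenced document has not been ingested.
conn.execute(
    "INSERT INTO external_relationship VALUES (?, ?, ?, ?, ?)",
    ("sbom-1", "SPDXRef-app", "DEPENDS_ON", "DocumentRef-ext1", "SPDXRef-os"),
)

# The regular table still rejects a dangling target.
try:
    conn.execute(
        "INSERT INTO relationship VALUES (?, ?, ?, ?)",
        ("sbom-1", "SPDXRef-app", "DEPENDS_ON", "SPDXRef-missing"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Keeping external edges in their own table also means a query can skip them simply by not joining that table.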

Conflicting documents

I am 100% sure that we will encounter documents that share a document namespace but differ in digest. That comes from
updating a document without updating the document namespace, which is what we do at RH.

I don't think this should become a problem though: we would find multiple (possible) targets for the reference,
but could then eliminate the others due to the mismatched digest.

Also, we need to be careful to store the original digest of the document, not the one after applying any fixes (like
license expressions).
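The elimination step described above could be sketched as follows. `StoredDocument` and `find_target` are hypothetical names, and the sketch assumes each ingested document keeps the digest it had when it was received, before any fixes were applied.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StoredDocument:
    doc_id: str
    namespace: str
    original_sha256: str  # digest of the document as received, before any fixes


def find_target(
    documents: Iterable[StoredDocument], namespace: str, sha256: str
) -> Optional[StoredDocument]:
    """Pick the document matching both namespace and digest, if one is ingested."""
    # Several documents may share a namespace if it was not changed on update.
    candidates = [d for d in documents if d.namespace == namespace]
    # The digest from the external reference eliminates the wrong candidates.
    for doc in candidates:
        if doc.original_sha256.lower() == sha256.lower():
            return doc
    return None


# Placeholder data: two revisions published under the same namespace.
docs = [
    StoredDocument("doc-1", "https://example.com/sbom/os", "aaaa"),
    StoredDocument("doc-2", "https://example.com/sbom/os", "bbbb"),
]

match = find_target(docs, "https://example.com/sbom/os", "BBBB")
missing = find_target(docs, "https://example.com/sbom/os", "cccc")
```

A `None` result means the reference cannot be resolved yet, for example because the target has not been ingested.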

Replies: 4 comments 1 reply

bobmcwhirter
Jul 17, 2024

I've also had thoughts recently about unhooking our existing advisory table from directly 'owning' a document (and its SHAs).

Mostly so we can work towards crafting advisories from whole cloth within the system, without a backing external document, and then mint out the externally-bound document when desired.

This could maybe allow for advisory -> document 1-to-many relationship. But I'm unsure, as most of the other relationships do come from a document.

So maybe still 1-to-1, but allow for 1-to-zero, from advisory to document, for the internally-managed case.

1 reply

ctron Jul 18, 2024
Maintainer Author

While it feels like derailing the original topic :) ... I think for authoring advisories, we might need a different strategy. I have the feeling that authoring a statement of vulnerabilities in the context of a product/package/artifact (aka advisory) might actually require context-specific vulnerability information, which could be amended with more generic/common vulnerability information.

Still, in the context of a product (or similar) the vulnerability will be defined slightly differently. So having that "advisory" <-> "advisory vulnerability" construct feels right to me today. Maybe we can find better names.

What might also work is that an advisory (or whatever name it gets) somehow gets aligned with an SBOM. So "vulnerabilities in the context of an SBOM" being an "advisory".
