Skip to content

feat: improve VSA generation with digest for each subject #685

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 10, 2024

Conversation

nathanwn
Copy link
Member

@nathanwn nathanwn commented Mar 26, 2024

This pull request improves upon the VSA generation feature in Macaron, by populating each subject with a corresponding SHA256 digest if certain conditions are met. This change helps VSA generation in Macaron become more in line with in-toto attestation specifications. More importantly, this also improves the VSA's quality in uniquely identifying artifacts verified by Macaron.

Specifically, built on top of Macaron's recent feature of taking a provenance file as input, this pull request identifies if the PackageURL provided through the command-line argument --package-url/-purl of macaron analyze identifies an artifact being one of the subjects in the provenance. If such a subject is found, the corresponding SHA256 digest is stored in the Macaron database, which can then be retrieved and populated in Macaron-generated VSAs.

For example, given the following analyze command

macaron analyze \
    --provenance-file ./multiple.intoto.jsonl \
    --package-url pkg:maven/io.micronaut/[email protected]?type=jar

Macaron will check if there is a jar file for the micronaut-router package among the subjects in the provenance file ./multiple.intoto.jsonl.

In this PR, only PackageURL of type maven is supported. Other PackageURL types can be added in future pull requests.

Some other related changes include

  • Disabling provenance discovery in the mcn_provenance_available_1 check in case a user-provided provenance is available.
  • Making the mcn_provenance_expectation_1 check also validate the user-provided provenance if it is available.

Below is a comprehensive list detailing the changes made in this PR, spanning across a few of commits:

  • b3eb53a fix: disable provenance discovery in case a provenance is already available

  • 8c49c69 fix: enable provenance expectation validation for user-provided provenances

  • 1b82d5e chore: add a table to the database schema to store sha256 digests of provenance subjects

A new table is created to store the SHA256 hash. One option being considered was to add another column to the _component table. However, since this will introduce breaking changes to the current policy library, I decided against it at this time. Another option considered was the ReleaseArtifact and HashDigest table. However, they are now currently storing a different kind of data to what this pull request wants to achieve. Future revision of the database schema may clarify whether we want to unify these tables and how we want an eventual schema to look like.

  • f2ed7f7 chore: introduce maven artifact types and utilities

  • 12066e3 refactor: improve extraction of subjects being build artifacts from witness provenances

There was an existing utility function for extracting subjects from witness provenance. It was originally created for the Witness provenance check. It has then been modified, taking into account the product attestor to only recognize subjects that are actual artifacts. This utility function is also now used for matching a subject to a given PURL, introduced in the next commit.

  • 7fa98d6 chore: match a witness provenance subject against a maven artifact purl

  • 1fbe7ec chore: modify vsa generation to populate each subject with a sha256 digest if it exists

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Mar 26, 2024
@nathanwn nathanwn changed the title feat: improve vsa generation with digest for each subject feat: improve VSA generation with digest for each subject Mar 26, 2024
@nathanwn nathanwn force-pushed the nathanwn/improve-vsa-generation branch from 39b25e3 to 1877043 Compare March 26, 2024 09:28
@nathanwn nathanwn force-pushed the nathanwn/improve-vsa-generation branch from 1877043 to 1fbe7ec Compare March 26, 2024 09:29
@nathanwn nathanwn self-assigned this Mar 26, 2024
@nathanwn nathanwn added the vsa The issues related to Verification Summary Attestation label Mar 26, 2024
@nathanwn nathanwn marked this pull request as ready for review March 26, 2024 09:47
…larifying support for qualifiers

Signed-off-by: Nathan Nguyen <[email protected]>
@nathanwn nathanwn force-pushed the nathanwn/improve-vsa-generation branch from c0e8284 to ebf8667 Compare March 27, 2024 05:32
@classmethod
def from_artifact_name(
cls,
artifact_name: str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

artifact_name seems a bit ambiguous. Is it the filename?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed the filename. Thanks for pointing out.
I have renamed artifact_name to artifact_filename in 263414b.

Copy link
Member

@tromai tromai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the changes.

Comment on lines 136 to 138
dict[str, str]
list[InTotoV01Subject]
A dictionary in which each key is a subject name and each value is the corresponding SHA256 digest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to remove non-existent extensions parameter from comment above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in 9e6f98f.

Comment on lines 45 to 60
JAR = _MavenArtifactType(
filename_pattern="{artifact_id}-{version}.jar",
purl_qualifiers={"type": "jar"},
)
POM = _MavenArtifactType(
filename_pattern="{artifact_id}-{version}.pom",
purl_qualifiers={"type": "pom"},
)
JAVADOC = _MavenArtifactType(
filename_pattern="{artifact_id}-{version}-javadoc.jar",
purl_qualifiers={"type": "javadoc"},
)
JAVA_SOURCE = _MavenArtifactType(
filename_pattern="{artifact_id}-{version}-sources.jar",
purl_qualifiers={"type": "java-source"},
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patterns in all these types seem to be the same, i.e., {artifact_id}-{version}.extension. Do we really need all these to be created as separate patterns?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-javadoc.jar and -sources.jar are not file extensions, but I get the question.

This pattern is currently being used in the MavenArtifact.from_artifact_filename class method to construct a regex to extract the artifact id from an artifact filename.

An alternative I see is to change filename_pattern to something like filename_suffix, which will involve also changing the implementation of the MavenArtifact.from_artifact_filename class method. I am not sure if doing this is exactly simplifying by any means while I chose to do it this was initially to explicitly specify the naming convention of Maven artifacts, but I can change the current implementation to this if you think it makes sense. If you have any other suggestions please let me know.

@nathanwn nathanwn merged commit 5e7ccdc into staging Apr 10, 2024
@nathanwn nathanwn deleted the nathanwn/improve-vsa-generation branch April 10, 2024 01:17
art1f1c3R pushed a commit that referenced this pull request Nov 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement. vsa The issues related to Verification Summary Attestation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants