Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Vulnerabilities]Adding new Job for updating v3 Vulnerabilities files. #9740

Merged
merged 16 commits into from
Dec 20, 2023

Conversation

ryuyu
Copy link
Contributor

@ryuyu ryuyu commented Nov 28, 2023

This is the first proposal for a new job to automatically update the v3 Vulnerability files.
This should implement the design indicated at https://github.com/NuGet/Engineering/pull/4940

Much of the mechanics of pulling from GitHub are tested in the GitHubVulnerabilities2Db.Facts project.

The primary work is done in the FlushAsync method starting at line 53 of src/GitHubVulnerabilities2v3/Extensions/BlobStorageVulnerabilityWriter.cs

Summary of intended flow:

  1. Read the date in the cursor and determine if the run is an update run or a full regeneration run
  2. Query GitHub and collect vulnerabilities starting at the date in the cursor and running up to now.
  3. For regeneration run, update index.json, base.json, and push an empty update.json. Update cursor to now.
  4. For update run, Check the special case
  5. The special case is when we see the same vulnerability URL in the update that already exists in base. For this case, we skip this update and regenerate on the next run.
  6. Otherwise, update the update.json and index.json. Cursor should NOT be changed.

@ryuyu ryuyu self-assigned this Nov 28, 2023
@ryuyu ryuyu requested a review from a team as a code owner November 28, 2023 00:59
@erdembayar
Copy link
Contributor

Quick clarification question before diving into: Should this new job be called NuGet.Jobs? What is the criteria for determining where a new job should go?
Also, I noticed that there is a checked-in binary asset at src/GitHubVulnerabilities2v3/Scripts/nssm.exe. Is this binary asset signed?

@ryuyu
Copy link
Contributor Author

ryuyu commented Nov 29, 2023

Quick clarification question before diving into: Should this new job be called NuGet.Jobs? What is the criteria for determining where a new job should go? Also, I noticed that there is a checked-in binary asset at src/GitHubVulnerabilities2v3/Scripts/nssm.exe. Is this binary asset signed?

Great questions! The general split for jobs being here vs NuGet.Jobs is a little bit arbitrary. Jobs that have harder gallery integration/depend on NuGetGallery.Core or Entities TEND TO lean towards being in NuGetGallery. This job is here becuase it relies on NuGet.Services.GitHub, which one COULD argue should actually be moved to another repo, but as it was extracted from GitHubVulnerabilities2Db (which is in here because of dependence on gallery DB interaction mechanics), it has not been moved as it would likely involve some additional changes to the build.

The checked in binary asset is a copy of the one at https://github.com/NuGet/NuGetGallery/blob/main/src/GitHubVulnerabilities2Db/Scripts/nssm.exe
I believe this is used to install the job as a service, and I'm not 100% sure about the necessity of as I can't quite remember the exact mechanics of how the job installation as a service works.

@erdembayar
Copy link
Contributor

The checked in binary asset is a copy of the one at https://github.com/NuGet/NuGetGallery/blob/main/src/GitHubVulnerabilities2Db/Scripts/nssm.exe I believe this is used to install the job as a service, and I'm not 100% sure about the necessity of as I can't quite remember the exact mechanics of how the job installation as a service works.

It doesn't look signed binary, and it's 3rd party. I prefer not to add 3rd party binary directly into code. At least could we add as nupkg dependency?

return vulnerabilities.Count();
}

public async Task WriteVulnerabilityAsync(PackageVulnerability packageVulnerability, bool wasWithdrawn)
Copy link
Contributor

@zhhyu zhhyu Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasWithdrawn

This is one of main concerns. Should this piece of information be included in "_vulnerabilityDict" and handled properly when generating/regenerating the document?

Copy link
Contributor Author

@ryuyu ryuyu Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That's a great question. I've added a section that removes vulnerabilities if they are withdrawn, but that won't solve the case of it existing in base and being removed down the line. I also don't think client would know how to handle that case (advisory present in base, removed in an update) @zivkan

Worst case, it will be fixed on regeneration 30 days later, and today, is effectively in the same state since we only update manually. @zhhyu

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think client would know how to handle that case (advisory present in base, removed in an update)

That's correct. Client simply concatenates all advisories across all files (and across all sources), deduplicates (both advisory URL + version range must match), and uses the resulting list. There's no syntax in the file format to "remove advisory previously declared"

Copy link
Contributor

@zhhyu zhhyu Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the base document get regenerated if one advisory is withdrawn? I don't know how 2DB handles this case, but there is a need to keep the consistency between the two endpoints as much as possible, and also as a customer, I'd like not to see a withdrawn warning and spend time looking into how to resolve the dependency graph when unnecessary.

Copy link
Contributor

@erdembayar erdembayar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late review. My comments are not blocking.

@ryuyu ryuyu requested a review from joelverhagen December 19, 2023 21:13
@ryuyu ryuyu dismissed joelverhagen’s stale review December 19, 2023 21:56

All comments addressed. Requested review.

@erdembayar
Copy link
Contributor

Last questions before approving the PR:

  1. Where are gzip compression and cache setting set? Does it happen at blob storage or CDN? I don't see any changes here.
  2. What happens if someone just change just change the version range for existing advisory? How do you detect it?

@ryuyu
Copy link
Contributor Author

ryuyu commented Dec 19, 2023

Last questions before approving the PR:

  1. Where are gzip compression and cache setting set? Does it happen at blob storage or CDN? I don't see any changes here.
  2. What happens if someone just change just change the version range for existing advisory? How do you detect it?
  1. The header settings are set in the Configuration now. Currently, they have default values, but we can modify them at will.
  2. If an advisory range is changed, that will be picked up on the next run with a collision to an existing advisory URL and would trigger a regeneration (as intended for now).

erdembayar
erdembayar previously approved these changes Dec 20, 2023
Copy link
Contributor

@erdembayar erdembayar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢
Thank you for your patience.

@ryuyu
Copy link
Contributor Author

ryuyu commented Dec 20, 2023

Great catch by @erdembayar . Gzipping was apparently not enabled. Pushed some new changes that should enable compression of the stream.

Copy link
Contributor

@erdembayar erdembayar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@ryuyu ryuyu merged commit 7736941 into dev Dec 20, 2023
@ryuyu ryuyu deleted the ryuyu-ghv2v3 branch March 16, 2024 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants