-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creation PackageURL objects for invalid PURLs / Possible improper encoding of colons (":") in PURL fields #152
Comments
Totally agree. Quote from PURL specification:
|
Colons are explicitly not encoded for an unknown reason: This line was added as part of commit d7be020, but there is no explanation as to why. This applies to all fields, not just name, fwiw. @pombredanne as the author of this particular piece of the code, can you share why colons are treated special in this context? :-) |
I just had this long discussion with the folks at Syft to fix a bug in their tool which caused improper PURLs to be generated. Colons in names, namespaces, in versions (probably) need to be escaped. See this bug: >>> p = PackageURL.from_string('pkg:nuget/libiconv%3A%20character%20set%20conversion%[email protected]?package-id=e11a609df352e292')
>>> p.to_string()
'pkg:nuget/libiconv:%20character%20set%20conversion%[email protected]?package-id=e11a609df352e292'
>>> PackageURL.from_string("pkg:nuget/libiconv:%20character%20set%20conversion%[email protected]?package-id=e11a609df352e292")
Traceback (most recent call last):
File "<input>", line 1, in <module>
PackageURL.from_string("pkg:nuget/libiconv:%20character%20set%20conversion%[email protected]?package-id=e11a609df352e292")
File "/Users/tek30584/repos/asset/kodiak-orchestrator/.venv.python312/lib/python3.12/site-packages/packageurl/__init__.py", line 508, in from_string
raise ValueError(msg)
ValueError: Invalid purl 'pkg:nuget/libiconv:%20character%20set%20conversion%[email protected]?package-id=e11a609df352e292' cannot contain a "user:pass@host:port" URL Authority component: ''. Also: >>> p = PackageURL(type="nuget", name="libiconv: character set conversion library", version="1.9", qualifiers={'package-id': 'e11a609df352e292'})
>>> p
PackageURL(type='nuget', namespace=None, name='libiconv: character set conversion library', version='1.9', qualifiers={'package-id': 'e11a609df352e292'}, subpath=None)
>>> p.to_string()
'pkg:nuget/libiconv:%20character%20set%20conversion%[email protected]?package-id=e11a609df352e292' This is a huge blocker for us, so I'm going to work on a PR. |
OK, so let's talk about the standard, from here: https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst Namespace:
Name:
Version:
Qualifiers:
Subpath:
So, it would appear there are incorrect tests. For example:
Hmm...does this mean the test data is wrong?
If the version is supposed to be percent-encoded, then a version of How do we go forward with this? This is also incorrect:
The spec says the value of a qualifier must be percent-encoded. Thus, the Where do I go from here? |
This is a misunderstanding about what "percent-encoded" means. |
@matt-phylum I don't think that's right, the spec says |
That is what the spec says, but it's not very clear about it. "It is unambiguous unencoded everywhere" means it is never percent encoded. However, it is actually wrong in some cases. For example, |
Please see the comment above: #152 (comment) package-url python can't de-serialize a string it generates, instead throwing an error. I'm pretty sure |
That's a bug in packageurl-python's parsing code. No other PURL implementation that I am aware of fails in this way. althonos/packageurl.rs, anchore/packageurl-go, giterlizzi/perl-URI-PackageURL, package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, package-url/packageurl-php, package-url/packageurl-ruby, package-url/packageurl-swift, phylum-dev/purl, sonatype/package-url-java all decode the PURL with the unencoded colon correctly. maennchen/purl successfully parses but doesn't decode the name correctly: maennchen/purl#10 |
That's really interesting. I mean, the error message isn't wrong as a PURL
Just curious: are there tests that show it decoding a PURL with a |
I ran everything. I found that I often wanted to know how all the different PURL implementations would handle things so I wrote https://github.com/phylum-dev/purl-survey which provides a purl->json and json->purl for every implementation that I know of. It's been really useful for things like the |
@matt-phylum Awesome! So...sounds like the PR needs to fix the decoding...not the encoding. |
Is there really a package named "libiconv: character set conversion library"? I thought NuGet IDs could not contain spaces. |
I believe the "name" comes from the binary found by Syft. We recently closed a bug with syft where they now encode |
This attempts to comply with the specification. Closes package-url#152 (I think) See discussion at package-url#152
This attempts to comply with the specification. Closes package-url#152 (I think) See discussion at package-url#152 Signed-off-by: Joshua Kugler <[email protected]>
This attempts to comply with the specification. Closes package-url#152 (I think) See discussion at package-url#152 Signed-off-by: Joshua Kugler <[email protected]>
I've opened a PR. I hope it can be merged quickly and unblock my team. :) |
Syft is making Get the name of a DLL from the resources (this is wrong): Look up the name from the ProductName resource: Generate a PURL using that name: The specific iconv string likely comes from some variant of libiconv.rc: It might work for cases where the file comes from a NuGet package and the project producing the NuGet package is using the .Net SDK to build the package and neither the name of the package nor the product name have been customized. Syft is making a big assumption and will often be wrong. In this case it's producing a PURL that refers to a package that AFAIK cannot exist because its name contains characters forbidden by the nuspec spec, but it could produce the name of a different NuGet package. |
It is possible to create PackageURL objects that contain invalid fields, specifically by using the
PackageURL
kwarg constructor and passing in values that contain colons.Simple example:
On closer inspection, it looks like the problem might be that colons (
:
) are not being percent-encoded correctly? I would expect the colon in the name to be encoded to%3A
, but it looks like it is being left as a literal:
in theto_string()
function:I'm not sure I'm interpreting the PURL spec correctly with regards to the treatment of colon characters, but on the surface it sounds like any colon appearing within an individual field value should simply be percent-encoded if the field itself calls for it.
The text was updated successfully, but these errors were encountered: