Inconsistency in the description of the checksum qualifier. #73

Open
andrewstein opened this issue Feb 11, 2020 · 3 comments

Comments

@andrewstein

According to the spec:

checksum is a qualifier for one or more checksums stored as a comma-separated list. Each item in the value is in form of lowercase_algorithm:hex_encoded_lowercase_value

and an abbreviated example is given as checksum=sha1:ad9503c3e994a4f...
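To make the quoted format concrete, here is a minimal sketch of splitting an already-decoded checksum value into algorithm/digest pairs. The function name `parse_checksums` and the `sha256` digest below are made up for illustration; only the `sha1` fragment comes from the spec's abbreviated example.

```python
def parse_checksums(value: str) -> dict:
    """Split a decoded checksum qualifier value into {algorithm: hex_digest}.

    Hypothetical helper: assumes the value has already been percent-decoded
    and follows the 'algorithm:hex' comma-separated form quoted above.
    """
    result = {}
    for item in value.split(","):
        # partition() splits on the first ':' only
        algorithm, _, digest = item.partition(":")
        result[algorithm] = digest
    return result


# 'ad9503c3e994a4f' is the truncated digest from the spec example;
# '0123abcd' is a placeholder, not a real digest.
parsed = parse_checksums("sha1:ad9503c3e994a4f,sha256:0123abcd")
print(parsed)
```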

However, also according to the spec:

A [qualifier] value must be a percent-encoded string

And to build a purl string which has qualifiers, one must

create a string by joining the lowercased key, the equal '=' sign and the percent-encoded value to create a qualifier

In a percent-encoded string, the colon character, ':', is encoded as '%3A'. And in fact the reference java implementation will encode the above as checksum=sha1%3Aad9503c3e994a4f...
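The two outputs can be reproduced with a generic percent-encoder. This is only a sketch using Python's `urllib.parse.quote`, not the reference Java implementation; the `safe` argument controls whether ':' is left alone.

```python
from urllib.parse import quote

# Truncated digest taken from the spec's abbreviated example.
value = "sha1:ad9503c3e994a4f"

# With no characters marked safe, the colon is escaped.
strict = quote(value, safe="")

# Marking ':' as safe leaves it unencoded, as in the spec example.
lenient = quote(value, safe=":")

print("checksum=" + strict)
print("checksum=" + lenient)
```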

@andrewstein
Author

andrewstein commented Feb 11, 2020

Also, as @jdillon has pointed out to me, the documentation sometimes uses "checksum" and sometimes "checksums"...

@andrewstein
Author

andrewstein commented Feb 11, 2020

And @jdillon has further pointed out to me that near the top of the spec we have a docker example with

pkg:docker/gcr.io/customer/dockerimage@sha256:244fd47e07d1004f0aed9c

and lower down we have

pkg:docker/gcr.io/customer/dockerimage@sha256%3A244fd47e07d10

So it would seem that the inconsistency relating to the percent-encoding of ':' is also in the docker version, not just the checksum(s).

@matt-phylum
Contributor

"Percent encoding" is just a means of encoding and does not specify what should be encoded.

The PURL spec says this colon does not need to be encoded: "the ':' scheme and type separator does not need to and must NOT be encoded. It is unambiguous unencoded everywhere".

However, if you're writing software that deals with PURLs, you should expect to see incoming PURLs like checksum=sha1:ad9503c3e994a4f (canonical), checksum=sha1%3Aad9503c3e994a4f (something generic like encodeURIComponent), or even checksum=%73%68%61%31%3A%61%64%39%35%30%33%63%33%65%39%39%34%61%34%66 (maximally encoded). Technically those are all valid and equal.
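As a rough illustration of "valid and equal", a consumer can percent-decode the qualifier value before comparing. The sketch below uses Python's `urllib.parse.unquote` and assumes the value has already been split off from the `checksum=` key; it is not taken from any purl library.

```python
from urllib.parse import unquote

# The three encodings of the same qualifier value listed above.
forms = [
    "sha1:ad9503c3e994a4f",                                           # canonical
    "sha1%3Aad9503c3e994a4f",                                         # generic encoder
    "%73%68%61%31%3A%61%64%39%35%30%33%63%33%65%39%39%34%61%34%66",   # maximally encoded
]

# Decode each form; a set collapses identical results.
decoded = {unquote(form) for form in forms}

print(decoded)
```

If all three forms are equivalent, the set ends up with a single element, which is the canonical unencoded value.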
