-
Notifications
You must be signed in to change notification settings - Fork 59
Comparing changes
Open a pull request
base repository: multiformats/go-multihash
base: v0.0.14
head repository: multiformats/go-multihash
compare: v0.0.15
- 18 commits
- 18 files changed
- 2 contributors
Commits on Mar 6, 2021
-
Refactor registry system: no direct dependencies; expose standard has…
…h.Hash; be a data carrier. --- Dependencies: Previously, depending on the go-mulithash package would create direct dependencies to several other modules for their various hash function implementations. This meant that instead of go-multihash being a lightweight, easy-to-accept dependency itself, it became something which would noticably increase the size of your go.mod file, your package graph, your download sizes during development, and most concerningly, your compile output size in final products. Now, there is a registry system (see the Register function), and the main go-multihash package *only* populates the registry with hashes that are available from the golang standard library by default. This means you gain no transitive dependencies on other libraries by importing the go-multihash package, and your binaries will not be bloated by hashers you don't use. (Your go.mod file may still show more repos; but they don't end up in your builds unless you actually refer to them). There are now several new packages in `go-multihash/register/*`. These can be imported to register the hashes in those packages. If you want all the hashes that were previously available, just make sure to import "go-mulithash/register/all" somewhere in your program. (You can register hashes, too, without making PRs to this library; these packages are just here for convenience and easy use.) **This is a breaking change** if you used hashes not found in the golang stdlib, such as blake2 and sha3. However, to update, all you need to do is ensure the relevant `go-multihash/register/*` package is imported anywhere in your program -- an easy change. A `go-multihash/register/all` package can be imported to get a hasher registered for all of the same multihash codes as before (but will correspondingly add the dependency weight back too, of course). --- Standard hash.Hash: Previously, go-multihash had its own definition of a `HashFunc` interface, and only exposed hashing through the `multihash.Sum` method. The problem with this was these definitions did not support streaming: one had to have an entire chunk of memory loaded at once, in a single contiguous byte slice, in order to hash it. (A second, admittedly much more minor, problem with this was that one often had to write glue code to turn a `hash.Hash` into a `multihash.HashFunc`, and since most of golang uses the standard lib `hash.Hash` definition already, this was generally avoidable friction.) Now, the Register function operates in terms of standard `hash.Hash`, and there is a `GetHasher` function which can get you a `hash.Hash`. (Okay, to be more precise, these functions take and return a factory function for a `hash.Hash`. You get the idea.) Since the standard `hash.Hash` interface can operate streamingly, now it's easy to use go-multihash in a streaming way. The `multihash.Sum` method works the same as always. --- Be a data carrier: Previously, go-multihash contained checks that any multihash indicator codes being handled were required to have a hash function registered for them. This made it very difficult to use go-multihash in a "forward compatible" way (and it also made a lot of practical bumps for this dependency-extraction refactor). Now, go-multihash is willing to carry data, even if it doesn't know what kind of hash function would be associated with an indicator code. (Methods that you'd expect to parse things do still parse the varints, making sure they're sanely formatted. They just don't inspect and whitelist the actual integers anymore.) I removed the `ValidCode` predicate entirely. It doesn't seem to serve any good purpose anymore. --- Other: I have not touched the `Codes` and `Names` maps in this diff. I think we should probably review (and probably remove) these, and instead direct people to use the go-multicodec package instead, which has the two advantages of decoupling registration of an implementation versus simply having a description, and also being automatically generated from the multiformats table. However, I wanted to check on feelings about this before doing the work (especially because they're somewhat entangled with a bunch of the tests in this package, making their removal somewhat nontrivial). Most of the test files are now `package multihash_test`. This makes for some colorful diffs, but is not otherwise interesting. The reason for this is because the dependency separation process now requires the tests to import those `register/*` packages, and to avoid a cycle, that means, well, `package mulithash_test`. I think there's probably more work to be done in making this library really shine. For example, in reviewing the `Encode` function, I see some allocations that look very likely to be avoidable... if the function was redesigned to be more aware of how it's likely to be used. However, I took no action on this, in part because this diff is big enough already, and in part because I think it might be reasonable to re-examine the relationship of this code to go-cid at the same time. I dropped `TestSmallerLengthHashID`. It appeared to be testing an API that wasn't actually exported... and the nearest API that *is* exported (Sum) has a general contract of truncating a hash upon short length, so it was overall unclear what this test should be checking. Review might be needed on this. The situation for murmur3 is still in need of resolution. It's commented out entirely for now. Questions are noted in the diff. There's a 'register/miniosha256' package which sets the sha2-256 implementation to a non-stdlib one. If you don't import this package, you still get a sha2-256; it's just the stdlib one. I did not include this in the 'register/all' group. (Maybe it's faster; maybe it's not; but it's definitely not required, and I'm getting some reports it also shows weird on profiles, so I tend to think maybe one should really have to explicitly ask for this one.)
Configuration menu - View commit details
-
Copy full SHA for 9e12c3a - Browse repository at this point
Copy the full SHA 9e12c3aView commit details -
Return to using constant pool from root package.
It seems likely that we should replace many of these with values from https://github.com/multiformats/go-multicodec in the future, however, we have a new issue to track that: #137 And as noted in #136 (comment) it's time to reign in the amount of work going on in this PR.
Configuration menu - View commit details
-
Copy full SHA for 382c37a - Browse repository at this point
Copy the full SHA 382c37aView commit details -
Optimize doubleSha256 construction.
Avoid allocations with more reuse the same slice for results.
Configuration menu - View commit details
-
Copy full SHA for 5647e22 - Browse repository at this point
Copy the full SHA 5647e22View commit details -
Agreed upon in discussion at: #136 (comment) The "REVIEW" comments in the outgoing diff identify the concerns that led to this choice.
Configuration menu - View commit details
-
Copy full SHA for 6897150 - Browse repository at this point
Copy the full SHA 6897150View commit details -
Use unexported constants to describe blake ranges.
Per discussion in: #136 (comment)
Configuration menu - View commit details
-
Copy full SHA for 8c1b61b - Browse repository at this point
Copy the full SHA 8c1b61bView commit details -
Configuration menu - View commit details
-
Copy full SHA for 57ca955 - Browse repository at this point
Copy the full SHA 57ca955View commit details -
Configuration menu - View commit details
-
Copy full SHA for 10afd22 - Browse repository at this point
Copy the full SHA 10afd22View commit details
Commits on Mar 8, 2021
-
Configuration menu - View commit details
-
Copy full SHA for 442f64a - Browse repository at this point
Copy the full SHA 442f64aView commit details
Commits on Mar 9, 2021
-
Resolving review topics around Encode function.
Leaving the error return in, per #136 (comment) May also be interesting to note: I did check if ValidCode (or, roughly the same thing but perhaps renamed to KnownCode) could be reintroduced, and the answer is... actually no, not without broaching other issues. A body of `_, ok := registry[code]` is enough to open a can of worms. See #136 (comment)
Configuration menu - View commit details
-
Copy full SHA for 67c208a - Browse repository at this point
Copy the full SHA 67c208aView commit details
Commits on Mar 10, 2021
-
Replace need for DefaultLength in opts processing.
This creates a hasher long enough to ask it its properties. This is arguably creating garbage; on the other hand, I don't think this is a codepath ever likely to be used in a hot loop anywhere. We can extract this to something that caches the properties later if it proves necessary. It quietly defaults to zero for unknown codes. I don't know if this makes sense, but it's what the old code would have done, so it's what the new code will do, and I'm not looking deeper into it. At this point I'm just trying to make surgically minimal alterations and get this changeset as a whole wrapped up so things can move on.
Configuration menu - View commit details
-
Copy full SHA for bc5cc89 - Browse repository at this point
Copy the full SHA bc5cc89View commit details -
Proactively reject registration of nil functions.
Surely no one would try to do this, nor then be surprised if it creates problems. On the other hand: if someone *does* do this, and the error doesn't appear until arbitrarily far away when the map is read, it's a pain to diagnose... and checking it up front is cheap. So, here we go: check it up front.
Configuration menu - View commit details
-
Copy full SHA for 7be6719 - Browse repository at this point
Copy the full SHA 7be6719View commit details -
Improve situation around ErrSumNotSupported.
The number that's missing is now reported in the error. This is done using the golang error wrap feature. We're keeping a constant value for ErrSumNotSupported for compatibility and change avoidance reasons. Code that cares about this exact error will still have to change; it is now necessary to use `errors.Is` to detect it. The text in the ErrSumNotSupported value is also updated, because the text no longer made sense given an open registry system. As a target-of-opportunity fix, it also now follows golang normative conventions for error messages (no capitalization, no punctuation).
Configuration menu - View commit details
-
Copy full SHA for cbd218c - Browse repository at this point
Copy the full SHA cbd218cView commit details -
Revive TestSmallerLengthHashID, and add a special case for identity m…
…ultihash that rejects truncations. See #136 (comment) for discussion. This change means Sum behaves slightly differently for identity multihashes than it does for any other multihash. I'm not keeping score on the number of ways identity multihash is weird anymore, just documenting it and keeping tests passing. The error message is lifted from the old `sumID` function verbatim.
Configuration menu - View commit details
-
Copy full SHA for 393ba0d - Browse repository at this point
Copy the full SHA 393ba0dView commit details -
We'll need something a little more recent than 1.11.x to be able to use the fmt "%w" and errors.Is features.
Configuration menu - View commit details
-
Copy full SHA for aff2570 - Browse repository at this point
Copy the full SHA aff2570View commit details -
update dependencies, and go mod tidy.
I swear I did not mean to be doing this on a branch, but: https://travis-ci.com/github/multiformats/go-multihash/builds/219591152 - it became necessary to update the go version in CI for %w features - that apparently adds pointer alignment checking to the compiler... - which flunks some stuff in x/crypto... So, okay, we're updating libraries now!
Configuration menu - View commit details
-
Copy full SHA for 4b5e8aa - Browse repository at this point
Copy the full SHA 4b5e8aaView commit details -
Reintroduce DefaultLengths; populate during Register.
While it was possible to remove all use of this from this repo, when attempting propagate changes to downstreams consuming it, it became apparently that other repos also rely on this symbol. Whether or not those usages are important and intentional, whether they're actually worth maintaining, and whether they'd be replacable with other approaches... is not considered at this time. (Probably we should be asking this! The first occasions where this cropped up are in other functions that have been marked "deprecated" since... 2018! But... chasing those things down and straightening them out is becoming problematic. Perhaps we'll be more ready to revisit these things at a later date.)
Configuration menu - View commit details
-
Copy full SHA for 1a96911 - Browse repository at this point
Copy the full SHA 1a96911View commit details
Commits on Mar 11, 2021
-
Add core package (min deps) and give root package full transitive dep…
…s again. The transitive dependencies in the root package are still managed by this whole registry system (which now resides in the 'core' package), so we have parity and *mostly* just one suite of code to maintain. You can now use this 'core' package when interested in dependency minimization. In exchange, the "register/*" imports may be required. You can just also just yank updates to go-multihash, and if you were already using it (and happen to be using one of these now-optional(-but-only-if-you-use-core))... nothing should actually change; the "register/*" imports won't be required because the root package does them for you. Some constants are replicated. This was the minimum step necessary to avoid import cycles. I'm not spending time prettifying it because we really probably ought to be refactoring this to use the package of constants in go-multicodec that's automatically generated, and yet, that is a scope limit we're trying not to cross during this changeset.
Configuration menu - View commit details
-
Copy full SHA for cebc9f8 - Browse repository at this point
Copy the full SHA cebc9f8View commit details -
Merge pull request #136 from multiformats/dependency-separation
Refactor registry system: no direct dependencies; expose standard hash.Hash; be a data carrier.
Configuration menu - View commit details
-
Copy full SHA for c3ba253 - Browse repository at this point
Copy the full SHA c3ba253View commit details
There are no files selected for viewing