Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: ADR-017 single binary sync #1521

Closed
wants to merge 7 commits into from

Conversation

mojtaba-esk
Copy link
Contributor

Overview

Checklist

  • New and updated code has appropriate documentation
  • New and updated code has new and/or updated testing
  • Required CI checks are passing
  • Visual proof for any user facing features like CLI or documentation updates
  • Linked issues closed with keywords

@MSevey MSevey requested a review from a team March 22, 2023 15:45
@github-actions
Copy link

PR Preview Action v1.3.0
🚀 Deployed preview to https://celestiaorg.github.io/celestia-app/pr-preview/pr-1521/
on branch gh-pages at 2023-03-22 15:45 UTC

@evan-forbes
Copy link
Member

is this a draft?

@rootulp rootulp changed the title docs:adr-017 Draft of the ADR for single binary upgrade added docs: ADR-017 single binary sync Mar 22, 2023
@evan-forbes evan-forbes marked this pull request as draft March 23, 2023 08:50

It important to sync from genesis with one binary.
It is needed for wallets, block explorers, indexers, analytics (e.g. Nansen).
It makes it possible to sync headers from chain and download blobs from third party service.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to clarify this. In the meeting the point was made that the Celestia network doesn't need to retain all historical blob data and that blobs could eventually be pruned and fetched from a third party service.

I don't see how single binary sync inherently makes this possible. Maybe this can state that an engineering requirement is that a single celestia-app binary can fetch all historical block headers from the chain (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think might be able to modify this a bit to discuss the added functionality, which is that a single binary is capable of verifying the entire chain. Syncing using a single binary ideally also makes it easy to maintain all of the different binaries, so that node operators can do that in a safe way.

ideally, the first part of this document should provide the context of why, and a summarizing some of the arguments or approaches for the rest of the document.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bumping this 🙂


Instead of having a single binary, we can have multiple binaries that are managed by another supervisory binary e.g. [Cosmovisor](https://docs.desmos.network/fullnode/cosmovisor/).
Osmosis does it through having a significant number of binaries gathered in one docker container and switch them in particular heights.
This approach has a lot of maintenance burden and is fragile to sync from genesis.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach has a lot of maintenance burden

I'm not convinced that the Osmosis approach has any more maintenance burden than what we're proposing with this ADR.

and is fragile to sync from genesis

Are there scenarios where Osmosis sync from genesis fails? Should we include data to support this point?

Comment on lines +38 to +40
- UX issue to improve experience for node operators to upgrade

- Maintenance issue for maintaining historical releases of binaries
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can articulate these as the two compelling points driving this ADR. So proposal to life them to the top as part of a problem statement section

Comment on lines +42 to +70
- What granularity should conditional logic be?
- App struct level
- Module level
- if/else on a line by line basis

- Why app version and not block height?
- App version is bumped via upgrade module
- Depending on app version is universal across multiple networks.
- Block height will differ across testnets / chain-ids so depends on a `.toml` config file

- Are we planning on upstream this?
- [CometBFT](https://github.com/cometbft/cometbft) is working on this for consensus.
- We’re working on Cosmos SDK portion

- `celestia-node` light nodes need to version erasure coding block or celestia-node should import this from celestia-app

- Will celestia-node need to pass a block height to celestia-app? Header contains version.app (state machine version) and version.block (consensus version)

- Is it important to support rolling back versions? Involves upgrading (again) through the faulty version

- Since if conditional logic uses one Go version, all code will be compiled with this one Go version. If behavior changes across Go versions, we have to debug why.

- Option: depend on multiple major versions of a dependency. What is the relationship between if/else conditional statements and Cosmos SDK upstream changes?

- How to do this in CometBFT?
- Always use latest p2p version
- Version.block in header should be used to version changes to Tendermint / CometBFT

- Block sync needs to be aware of changes across block heights
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if these raw notes make sense to include in the ADR but defer to your judgement

}
```

This option has an advantage over the option #1 as we do not need to pass an extra parameter everywhere in the code and it has less maintenance burden to modify the version logic.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A counterpoint is that option 2 makes it more difficult to unit test because pure functions are easier to test than impure functions.

Implementations in option 2 rely on a global state. It is likely still possible to unit test by mocking the version module with a test version module but I think it makes sense to call this out as a downside.

Related question: how do we ensure all necessary packages that want to read the global version will invoke it after it has been set?

There are multiple ways of doing it. The app version can be found in the SDK context.
So, in the `PrepareProposal` method, we can get the app version [from the context](https://github.com/celestiaorg/celestia-app/blob/main/app/prepare_proposal.go#L28) and use it as the version indicator.

#### Option 1: Passing the app version throughout all the required functions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great start!!

the initial notes are a really good start, but we should take that information and construct it into a more cohesive ADR document.

@rootulp rootulp added the ADR item is directly relevant to writing or modifying an ADR label Mar 29, 2023
This was referenced Mar 30, 2023
Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still need to flesh this out a bit by providing code examples of using the app version and discussing a few of the edge cases, but I think we're on the right track.

we should also convert this to an ADR and remove any notes that are not relevant to the discussion/decision when we get the chance 🙂

#### Option 1: Passing the app version throughout all the required functions

In this case we need to pass either sdk context or the app version itself to all the functions and objects that we think they might be affected by a version change in future.
So for those functions that do not have the ctx already, we need to add an extra parameter to pass it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which functions do not have access to the app version? I believe just PrepareProposal, as the header is passed to ProcessProposal

Comment on lines +104 to +105
func mergeMaps(mapOne, mapTwo map[coretypes.TxKey]ShareRange) map[coretypes.TxKey]ShareRange {
merged := make(map[coretypes.TxKey]ShareRange, len(mapOne)+len(mapTwo))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is ShareRange here?


## Decision

TBD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's update this to using the app version and arbitrary switch statements


It important to sync from genesis with one binary.
It is needed for wallets, block explorers, indexers, analytics (e.g. Nansen).
It makes it possible to sync headers from chain and download blobs from third party service.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bumping this 🙂

Comment on lines +42 to +54
- What granularity should conditional logic be?
- App struct level
- Module level
- if/else on a line by line basis

- Why app version and not block height?
- App version is bumped via upgrade module
- Depending on app version is universal across multiple networks.
- Block height will differ across testnets / chain-ids so depends on a `.toml` config file

- Are we planning on upstream this?
- [CometBFT](https://github.com/cometbft/cometbft) is working on this for consensus.
- We’re working on Cosmos SDK portion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these just notes from the call? if so, perhaps we should clean this up and focus on the content of this ADR before merging

@rootulp rootulp added this to the Post-mainnet milestone Jul 12, 2023
@MSevey MSevey removed their request for review July 24, 2023 13:03
@evan-forbes
Copy link
Member

closing for now in favor of #1562

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ADR item is directly relevant to writing or modifying an ADR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants