
Commit d110b6f

Authored by evan-forbes, liamsi, adlerjohn, Hlib Kanunnikov, and renaynay on Sep 22, 2021
Refactor DAH creation to better accommodate celestia-node use case (#539)
* Basic DA functionality (#83)
* move Messages field to the end of Block.Data
* Add some constants for share computation and the NMT:
  - also a bunch of todos regarding shares computation
* First (compiling) stab on creating shares
* Test with Evidence and fix bug discovered by test
* remove resolved todos
* introduce split method
* Introduce LenDelimitedMarshaler interface and some reformatting
* Introduce TxLenDelimitedMarshaler
* add some test cases
* fix some comments
* fix some comments & linter
* Add reserved namespaces to params
* Move ll-specific consts into a separate file (consts.go)
* Add MarshalDelimited to HexBytes
* Add tail-padding shares
* Add ComputeShares method on Data to compute all shares
* Fix: compute the next square num and not the next power of two
* lints
* Unexport MakeShares function:
  - it's likely to change and it doesn't have to be part of the public API
* lints 2
* First stab on computing row/column roots
* fix rebase glitches:
  - move DA related constants out of params.go
* refactor MakeBlock to take in interm. state roots and messages
* refactor state.MakeBlock too
* Add todos to LenDelimitedMarshaler and extract appendShares logic
* Simplify shares computation: remove LenDelimitedMarshaler abstraction
* actually use DA header to compute the DataRoot everywhere (will lead to failing tests for sure)
* WIP: Update block related core data structures in protobuf too
* WIP: fix zero shares edge-case and get rid of Block.Data.hash (use dataAvailabilityHeader.Hash() instead)
* Fixed tests, only 3 failing tests to go: TestReapMaxBytesMaxGas, TestTxFilter, TestMempoolFilters
* Fix TestTxFilter:
  - the size of the wrapping Data{} proto message increased a few bytes
* Fix Message proto and `DataFromProto`
* Fix last 2 remaining tests related to the increased block/block.Data size
* Use infectious lib instead of leopard
* proto-lint: snake_case
* some lints and minor changes
* linter
* panic if pushing to tree fails, extend Data.ToProto()
* revert renaming in comment
* add todo about refactoring as soon as rsmt2d allows the user to choose the merkle tree
* clean up some unused test helper functions
* linter
* still debugging the exact right number of bytes for max data...

* Implement spec-compliant share splitting (#246)
  - Export block data compute shares.
  - Refactor to use ShareSize constant directly.
  - Change message splitting to prefix namespace ID.
  - Implement chunking for contiguous.
  - Add termination condition.
  - Rename append contiguous to split contiguous.
  - Update test for small tx.
  - Add test for two contiguous.
  - Make tx and msg adjusted share sizes exported constants.
  - Panic on hopefully-unreachable condition instead of silently skipping.
  - Update hardcoded response for block format.

* fix overwrite bug (#251)
  - fix overwrite bug and stop splitting shares of size MsgShareSize
  - remove ineffectual code
  - review feedback: better docs
  - remove unneeded copy and only fix the source of the bug
  - fix overwrite bug while also being consistent with using NamespacedShares
  - update to the latest rsmt2d for the nmt wrapper

* Spec-compliant merge shares (#261)
  - start spec-compliant share merging
  - refactor and finish unit testing
  - linter fixes
  - fix initial changes and use constants
  - more polish
  - docs fix
  - review feedback: docs and out-of-range panic protection
  - review feedback: add panic protection from empty input
  - use constant instead of recalculating `ShareSize`
  - don't redeclare existing var
  - be more explicit with returned nil
  - review feedback: use consistent capitalization
  - stop accepting reserved namespaces as normal messages
  - use a descriptive var name for message length
  - linter and comparison fix
  - reorg tests, add test for parse delimiter and DataFromBlock, and fix evidence marshal bug
  - catch error for linter
  - update test MakeShares to include length delimiters for the SHARE_RESERVED_BYTE
  - minor iteration change
  - refactor share splitting to fix bug
  - fix all bugs with third and final refactor
  - fix conflict and revert unnecessary changes
  - review feedback: better docs, add comment for safeLen, remove unnecessary comments
  - review feedback: split up share merging and splitting into their own files
  - review feedback: more descriptive var names
  - fix accidental change
  - add some constant docs
  - fix spelling error

* refactor to better accommodate real-world use cases (celestia-node)
* thank you, linter

Co-authored-by: Ismail Khoffi <[email protected]>
Co-authored-by: John Adler <[email protected]>
Co-authored-by: Hlib Kanunnikov <[email protected]>
Co-authored-by: rene <[email protected]>
1 parent 6e592de · commit d110b6f

10 files changed: +1089 −168 lines changed

pkg/consts/consts.go (+6)

@@ -4,6 +4,7 @@ import (
 	"crypto/sha256"
 
 	"github.com/celestiaorg/nmt/namespace"
+	"github.com/celestiaorg/rsmt2d"
 )
 
 // This contains all constants of:
@@ -61,4 +62,9 @@ var (
 
 	// NewBaseHashFunc change accordingly if another hash.Hash should be used as a base hasher in the NMT:
 	NewBaseHashFunc = sha256.New
+
+	// DefaultCodec is the default codec creator used for data erasure
+	// TODO(ismail): for better efficiency and a larger number shares
+	// we should switch to the rsmt2d.LeopardFF16 codec:
+	DefaultCodec = rsmt2d.NewRSGF8Codec
 )

pkg/da/data_availability_header.go (+25 −32)

@@ -14,8 +14,8 @@ import (
 )
 
 const (
-	maxDAHSize = consts.MaxSquareSize * 2
-	minDAHSize = consts.MinSquareSize * 2
+	maxExtendedSquareWidth = consts.MaxSquareSize * 2
+	minExtendedSquareWidth = consts.MinSquareSize * 2
 )
 
 // DataAvailabilityHeader (DAHeader) contains the row and column roots of the erasure
@@ -38,10 +38,23 @@ type DataAvailabilityHeader struct {
 }
 
 // NewDataAvailabilityHeader generates a DataAvailability header using the provided square size and shares
-func NewDataAvailabilityHeader(squareSize uint64, shares [][]byte) (DataAvailabilityHeader, error) {
+func NewDataAvailabilityHeader(eds *rsmt2d.ExtendedDataSquare) DataAvailabilityHeader {
+	// generate the row and col roots using the EDS
+	dah := DataAvailabilityHeader{
+		RowsRoots:   eds.RowRoots(),
+		ColumnRoots: eds.ColRoots(),
+	}
+
+	// generate the hash of the data using the new roots
+	dah.Hash()
+
+	return dah
+}
+
+func ExtendShares(squareSize uint64, shares [][]byte) (*rsmt2d.ExtendedDataSquare, error) {
 	// Check that square size is with range
 	if squareSize < consts.MinSquareSize || squareSize > consts.MaxSquareSize {
-		return DataAvailabilityHeader{}, fmt.Errorf(
+		return nil, fmt.Errorf(
 			"invalid square size: min %d max %d provided %d",
 			consts.MinSquareSize,
 			consts.MaxSquareSize,
@@ -50,32 +63,14 @@ func NewDataAvailabilityHeader(squareSize uint64, shares [][]byte) (DataAvailabi
 	}
 	// check that valid number of shares have been provided
 	if squareSize*squareSize != uint64(len(shares)) {
-		return DataAvailabilityHeader{}, fmt.Errorf(
+		return nil, fmt.Errorf(
 			"must provide valid number of shares for square size: got %d wanted %d",
 			len(shares),
 			squareSize*squareSize,
 		)
 	}
-
 	tree := wrapper.NewErasuredNamespacedMerkleTree(squareSize)
-
-	// TODO(ismail): for better efficiency and a larger number shares
-	// we should switch to the rsmt2d.LeopardFF16 codec:
-	extendedDataSquare, err := rsmt2d.ComputeExtendedDataSquare(shares, rsmt2d.NewRSGF8Codec(), tree.Constructor)
-	if err != nil {
-		return DataAvailabilityHeader{}, err
-	}
-
-	// generate the row and col roots using the EDS
-	dah := DataAvailabilityHeader{
-		RowsRoots:   extendedDataSquare.RowRoots(),
-		ColumnRoots: extendedDataSquare.ColRoots(),
-	}
-
-	// generate the hash of the data using the new roots
-	dah.Hash()
-
-	return dah, nil
+	return rsmt2d.ComputeExtendedDataSquare(shares, consts.DefaultCodec(), tree.Constructor)
 }
 
 // String returns hex representation of merkle hash of the DAHeader.
@@ -143,16 +138,16 @@ func (dah *DataAvailabilityHeader) ValidateBasic() error {
 	if dah == nil {
 		return errors.New("nil data availability header is not valid")
 	}
-	if len(dah.ColumnRoots) < minDAHSize || len(dah.RowsRoots) < minDAHSize {
+	if len(dah.ColumnRoots) < minExtendedSquareWidth || len(dah.RowsRoots) < minExtendedSquareWidth {
 		return fmt.Errorf(
 			"minimum valid DataAvailabilityHeader has at least %d row and column roots",
-			minDAHSize,
+			minExtendedSquareWidth,
 		)
 	}
-	if len(dah.ColumnRoots) > maxDAHSize || len(dah.RowsRoots) > maxDAHSize {
+	if len(dah.ColumnRoots) > maxExtendedSquareWidth || len(dah.RowsRoots) > maxExtendedSquareWidth {
 		return fmt.Errorf(
 			"maximum valid DataAvailabilityHeader has at most %d row and column roots",
-			maxDAHSize,
+			maxExtendedSquareWidth,
 		)
 	}
 	if len(dah.ColumnRoots) != len(dah.RowsRoots) {
@@ -190,13 +185,11 @@ func MinDataAvailabilityHeader() DataAvailabilityHeader {
 	for i := 0; i < consts.MinSharecount; i++ {
 		shares[i] = tailPaddingShare
 	}
-	dah, err := NewDataAvailabilityHeader(
-		consts.MinSquareSize,
-		shares,
-	)
+	eds, err := ExtendShares(consts.MinSquareSize, shares)
 	if err != nil {
 		panic(err)
 	}
+	dah := NewDataAvailabilityHeader(eds)
 	return dah
 }

pkg/da/data_availability_header_test.go (+31 −11)

@@ -37,15 +37,13 @@ func TestNewDataAvailabilityHeader(t *testing.T) {
 	type test struct {
 		name         string
 		expectedHash []byte
-		expectedErr  bool
 		squareSize   uint64
 		shares       [][]byte
 	}
 
 	tests := []test{
 		{
-			name:        "typical",
-			expectedErr: false,
+			name: "typical",
 			expectedHash: []byte{
 				0xfe, 0x9c, 0x6b, 0xd8, 0xe5, 0x7c, 0xd1, 0x5d, 0x1f, 0xd6, 0x55, 0x7e, 0x87, 0x7d, 0xd9, 0x7d,
 				0xdb, 0xf2, 0x66, 0xfa, 0x60, 0x24, 0x2d, 0xb3, 0xa0, 0x9c, 0x4f, 0x4e, 0x5b, 0x2a, 0x2c, 0x2a,
@@ -54,15 +52,36 @@ func TestNewDataAvailabilityHeader(t *testing.T) {
 			shares: generateShares(4, 1),
 		},
 		{
-			name:        "max square size",
-			expectedErr: false,
+			name: "max square size",
 			expectedHash: []byte{
 				0xe2, 0x87, 0x23, 0xd0, 0x2d, 0x54, 0x25, 0x5f, 0x79, 0x43, 0x8e, 0xfb, 0xb7, 0xe8, 0xfa, 0xf5,
 				0xbf, 0x93, 0x50, 0xb3, 0x64, 0xd0, 0x4f, 0xa7, 0x7b, 0xb1, 0x83, 0x3b, 0x8, 0xba, 0xd3, 0xa4,
 			},
 			squareSize: consts.MaxSquareSize,
 			shares:     generateShares(consts.MaxSquareSize*consts.MaxSquareSize, 99),
 		},
+	}
+
+	for _, tt := range tests {
+		tt := tt
+		eds, err := ExtendShares(tt.squareSize, tt.shares)
+		require.NoError(t, err)
+		resdah := NewDataAvailabilityHeader(eds)
+		require.Equal(t, tt.squareSize*2, uint64(len(resdah.ColumnRoots)), tt.name)
+		require.Equal(t, tt.squareSize*2, uint64(len(resdah.RowsRoots)), tt.name)
+		require.Equal(t, tt.expectedHash, resdah.hash, tt.name)
+	}
+}
+
+func TestExtendShares(t *testing.T) {
+	type test struct {
+		name        string
+		expectedErr bool
+		squareSize  uint64
+		shares      [][]byte
+	}
+
+	tests := []test{
 		{
 			name:        "too large square size",
 			expectedErr: true,
@@ -79,15 +98,13 @@
 
 	for _, tt := range tests {
 		tt := tt
-		resdah, err := NewDataAvailabilityHeader(tt.squareSize, tt.shares)
+		eds, err := ExtendShares(tt.squareSize, tt.shares)
 		if tt.expectedErr {
 			require.NotNil(t, err)
 			continue
 		}
 		require.NoError(t, err)
-		require.Equal(t, tt.squareSize*2, uint64(len(resdah.ColumnRoots)), tt.name)
-		require.Equal(t, tt.squareSize*2, uint64(len(resdah.RowsRoots)), tt.name)
-		require.Equal(t, tt.expectedHash, resdah.hash, tt.name)
+		require.Equal(t, tt.squareSize*2, eds.Width(), tt.name)
 	}
 }
 
@@ -98,8 +115,9 @@ func TestDataAvailabilityHeaderProtoConversion(t *testing.T) {
 	}
 
 	shares := generateShares(consts.MaxSquareSize*consts.MaxSquareSize, 1)
-	bigdah, err := NewDataAvailabilityHeader(consts.MaxSquareSize, shares)
+	eds, err := ExtendShares(consts.MaxSquareSize, shares)
 	require.NoError(t, err)
+	bigdah := NewDataAvailabilityHeader(eds)
 
 	tests := []test{
 		{
@@ -133,8 +151,10 @@ func Test_DAHValidateBasic(t *testing.T) {
 	}
 
 	shares := generateShares(consts.MaxSquareSize*consts.MaxSquareSize, 1)
-	bigdah, err := NewDataAvailabilityHeader(consts.MaxSquareSize, shares)
+	eds, err := ExtendShares(consts.MaxSquareSize, shares)
 	require.NoError(t, err)
+	bigdah := NewDataAvailabilityHeader(eds)
+
 	// make a mutant dah that has too many roots
 	var tooBigDah DataAvailabilityHeader
 	tooBigDah.ColumnRoots = make([][]byte, consts.MaxSquareSize*consts.MaxSquareSize)

pkg/wrapper/nmt_wrapper_test.go (+4 −4)

@@ -27,7 +27,7 @@ func TestPushErasuredNamespacedMerkleTree(t *testing.T) {
 		tree := n.Constructor()
 
 		// push test data to the tree
-		for i, d := range generateErasuredData(t, tc.squareSize, rsmt2d.NewRSGF8Codec()) {
+		for i, d := range generateErasuredData(t, tc.squareSize, consts.DefaultCodec()) {
 			// push will panic if there's an error
 			tree.Push(d, rsmt2d.SquareIndex{Axis: uint(0), Cell: uint(i)})
 		}
@@ -64,7 +64,7 @@ func TestErasureNamespacedMerkleTreePanics(t *testing.T) {
 			"push over square size",
 			assert.PanicTestFunc(
 				func() {
-					data := generateErasuredData(t, 16, rsmt2d.NewRSGF8Codec())
+					data := generateErasuredData(t, 16, consts.DefaultCodec())
 					n := NewErasuredNamespacedMerkleTree(uint64(15))
 					tree := n.Constructor()
 					for i, d := range data {
@@ -76,7 +76,7 @@ func TestErasureNamespacedMerkleTreePanics(t *testing.T) {
 			"push in incorrect lexigraphic order",
 			assert.PanicTestFunc(
 				func() {
-					data := generateErasuredData(t, 16, rsmt2d.NewRSGF8Codec())
+					data := generateErasuredData(t, 16, consts.DefaultCodec())
 					n := NewErasuredNamespacedMerkleTree(uint64(16))
 					tree := n.Constructor()
 					for i := len(data) - 1; i > 0; i-- {
@@ -104,7 +104,7 @@ func TestExtendedDataSquare(t *testing.T) {
 
 	tree := NewErasuredNamespacedMerkleTree(uint64(squareSize))
 
-	_, err := rsmt2d.ComputeExtendedDataSquare(raw, rsmt2d.NewRSGF8Codec(), tree.Constructor)
+	_, err := rsmt2d.ComputeExtendedDataSquare(raw, consts.DefaultCodec(), tree.Constructor)
 	assert.NoError(t, err)
 }

types/block.go (+79 −23)

@@ -4,6 +4,7 @@ import (
 	"bytes"
 	"errors"
 	"fmt"
+	"math"
 	"strings"
 	"time"
 
@@ -1112,6 +1113,69 @@ func (data *Data) Hash() tmbytes.HexBytes {
 	return data.hash
 }
 
+// ComputeShares splits block data into shares of an original data square and
+// returns them along with an amount of non-redundant shares. The shares
+// returned are padded to complete a square size that is a power of two
+func (data *Data) ComputeShares() (NamespacedShares, int) {
+	// TODO(ismail): splitting into shares should depend on the block size and layout
+	// see: https://github.com/celestiaorg/celestia-specs/blob/master/specs/block_proposer.md#laying-out-transactions-and-messages
+
+	// reserved shares:
+	txShares := data.Txs.SplitIntoShares()
+	intermRootsShares := data.IntermediateStateRoots.SplitIntoShares()
+	evidenceShares := data.Evidence.SplitIntoShares()
+
+	// application data shares from messages:
+	msgShares := data.Messages.SplitIntoShares()
+	curLen := len(txShares) + len(intermRootsShares) + len(evidenceShares) + len(msgShares)
+
+	// find the number of shares needed to create a square that has a power of
+	// two width
+	wantLen := paddedLen(curLen)
+
+	// ensure that the min square size is used
+	if wantLen < consts.MinSharecount {
+		wantLen = consts.MinSharecount
+	}
+
+	tailShares := TailPaddingShares(wantLen - curLen)
+
+	return append(append(append(append(
+		txShares,
+		intermRootsShares...),
+		evidenceShares...),
+		msgShares...),
+		tailShares...), curLen
+}
+
+// paddedLen calculates the number of shares needed to make a power of 2 square
+// given the current number of shares
+func paddedLen(length int) int {
+	width := uint32(math.Ceil(math.Sqrt(float64(length))))
+	width = nextHighestPowerOf2(width)
+	return int(width * width)
+}
+
+// nextHighestPowerOf2 returns the next highest power of 2 unless the input is a power
+// of two, in which case it returns the input
+func nextHighestPowerOf2(v uint32) uint32 {
+	if v == 0 {
+		return 0
+	}
+
+	// find the next highest power using bit mashing
+	v--
+	v |= v >> 1
+	v |= v >> 2
+	v |= v >> 4
+	v |= v >> 8
+	v |= v >> 16
+	v++
+
+	// return the next highest power
+	return v
+}
+
 type Messages struct {
 	MessagesList []Message `json:"msgs"`
 }
@@ -1120,26 +1184,27 @@ type IntermediateStateRoots struct {
 	RawRootsList []tmbytes.HexBytes `json:"intermediate_roots"`
 }
 
-func (roots IntermediateStateRoots) splitIntoShares(shareSize int) NamespacedShares {
-	shares := make([]NamespacedShare, 0)
+func (roots IntermediateStateRoots) SplitIntoShares() NamespacedShares {
+	rawDatas := make([][]byte, 0, len(roots.RawRootsList))
 	for _, root := range roots.RawRootsList {
 		rawData, err := root.MarshalDelimited()
 		if err != nil {
 			panic(fmt.Sprintf("app returned intermediate state root that can not be encoded %#v", root))
 		}
-		shares = appendToShares(shares, consts.IntermediateStateRootsNamespaceID, rawData, shareSize)
+		rawDatas = append(rawDatas, rawData)
 	}
+	shares := splitContiguous(consts.IntermediateStateRootsNamespaceID, rawDatas)
 	return shares
 }
 
-func (msgs Messages) splitIntoShares(shareSize int) NamespacedShares {
+func (msgs Messages) SplitIntoShares() NamespacedShares {
 	shares := make([]NamespacedShare, 0)
 	for _, m := range msgs.MessagesList {
 		rawData, err := m.MarshalDelimited()
 		if err != nil {
 			panic(fmt.Sprintf("app accepted a Message that can not be encoded %#v", m))
 		}
-		shares = appendToShares(shares, m.NamespaceID, rawData, shareSize)
+		shares = appendToShares(shares, m.NamespaceID, rawData)
 	}
 	return shares
 }
@@ -1346,29 +1411,20 @@ func (data *EvidenceData) FromProto(eviData *tmproto.EvidenceList) error {
 	return nil
 }
 
-func (data *EvidenceData) splitIntoShares(shareSize int) NamespacedShares {
-	shares := make([]NamespacedShare, 0)
+func (data *EvidenceData) SplitIntoShares() NamespacedShares {
+	rawDatas := make([][]byte, 0, len(data.Evidence))
 	for _, ev := range data.Evidence {
-		var rawData []byte
-		var err error
-		switch cev := ev.(type) {
-		case *DuplicateVoteEvidence:
-			rawData, err = protoio.MarshalDelimited(cev.ToProto())
-		case *LightClientAttackEvidence:
-			pcev, iErr := cev.ToProto()
-			if iErr != nil {
-				err = iErr
-				break
-			}
-			rawData, err = protoio.MarshalDelimited(pcev)
-		default:
-			panic(fmt.Sprintf("unknown evidence included in evidence pool (don't know how to encode this) %#v", ev))
+		pev, err := EvidenceToProto(ev)
+		if err != nil {
+			panic("failure to convert evidence to equivalent proto type")
 		}
+		rawData, err := protoio.MarshalDelimited(pev)
 		if err != nil {
-			panic(fmt.Sprintf("evidence included in evidence pool that can not be encoded %#v, err: %v", ev, err))
+			panic(err)
 		}
-		shares = appendToShares(shares, consts.EvidenceNamespaceID, rawData, shareSize)
+		rawDatas = append(rawDatas, rawData)
 	}
+	shares := splitContiguous(consts.EvidenceNamespaceID, rawDatas)
 	return shares
 }

types/share_merging.go (new file, +333 lines)

package types

import (
	"bytes"
	"encoding/binary"
	"errors"

	"github.com/celestiaorg/rsmt2d"
	"github.com/gogo/protobuf/proto"
	tmbytes "github.com/tendermint/tendermint/libs/bytes"
	"github.com/tendermint/tendermint/pkg/consts"
	tmproto "github.com/tendermint/tendermint/proto/tendermint/types"
)

// DataFromSquare extracts block data from an extended data square.
func DataFromSquare(eds *rsmt2d.ExtendedDataSquare) (Data, error) {
	originalWidth := eds.Width() / 2

	// sort block data shares by namespace
	var (
		sortedTxShares  [][]byte
		sortedISRShares [][]byte
		sortedEvdShares [][]byte
		sortedMsgShares [][]byte
	)

	// iterate over each row index
	for x := uint(0); x < originalWidth; x++ {
		// iterate over each share in the original data square
		row := eds.Row(x)

		for _, share := range row[:originalWidth] {
			// sort the data of that share types via namespace
			nid := share[:consts.NamespaceSize]
			switch {
			case bytes.Equal(consts.TxNamespaceID, nid):
				sortedTxShares = append(sortedTxShares, share)

			case bytes.Equal(consts.IntermediateStateRootsNamespaceID, nid):
				sortedISRShares = append(sortedISRShares, share)

			case bytes.Equal(consts.EvidenceNamespaceID, nid):
				sortedEvdShares = append(sortedEvdShares, share)

			case bytes.Equal(consts.TailPaddingNamespaceID, nid):
				continue

			// ignore unused but reserved namespaces
			case bytes.Compare(nid, consts.MaxReservedNamespace) < 1:
				continue

			// every other namespaceID should be a message
			default:
				sortedMsgShares = append(sortedMsgShares, share)
			}
		}
	}

	// pass the raw share data to their respective parsers
	txs, err := parseTxs(sortedTxShares)
	if err != nil {
		return Data{}, err
	}

	isrs, err := parseISRs(sortedISRShares)
	if err != nil {
		return Data{}, err
	}

	evd, err := parseEvd(sortedEvdShares)
	if err != nil {
		return Data{}, err
	}

	msgs, err := parseMsgs(sortedMsgShares)
	if err != nil {
		return Data{}, err
	}

	return Data{
		Txs:                    txs,
		IntermediateStateRoots: isrs,
		Evidence:               evd,
		Messages:               msgs,
	}, nil
}

// parseTxs collects all of the transactions from the shares provided
func parseTxs(shares [][]byte) (Txs, error) {
	// parse the shares
	rawTxs, err := processContiguousShares(shares)
	if err != nil {
		return nil, err
	}

	// convert to the Tx type
	txs := make(Txs, len(rawTxs))
	for i := 0; i < len(txs); i++ {
		txs[i] = Tx(rawTxs[i])
	}

	return txs, nil
}

// parseISRs collects all the intermediate state roots from the shares provided
func parseISRs(shares [][]byte) (IntermediateStateRoots, error) {
	rawISRs, err := processContiguousShares(shares)
	if err != nil {
		return IntermediateStateRoots{}, err
	}

	ISRs := make([]tmbytes.HexBytes, len(rawISRs))
	for i := 0; i < len(ISRs); i++ {
		ISRs[i] = rawISRs[i]
	}

	return IntermediateStateRoots{RawRootsList: ISRs}, nil
}

// parseEvd collects all evidence from the shares provided.
func parseEvd(shares [][]byte) (EvidenceData, error) {
	// the raw data returned does not have length delimiters or namespaces and
	// is ready to be unmarshaled
	rawEvd, err := processContiguousShares(shares)
	if err != nil {
		return EvidenceData{}, err
	}

	evdList := make(EvidenceList, len(rawEvd))

	// parse into protobuf bytes
	for i := 0; i < len(rawEvd); i++ {
		// unmarshal the evidence
		var protoEvd tmproto.Evidence
		err := proto.Unmarshal(rawEvd[i], &protoEvd)
		if err != nil {
			return EvidenceData{}, err
		}
		evd, err := EvidenceFromProto(&protoEvd)
		if err != nil {
			return EvidenceData{}, err
		}

		evdList[i] = evd
	}

	return EvidenceData{Evidence: evdList}, nil
}

// parseMsgs collects all messages from the shares provided
func parseMsgs(shares [][]byte) (Messages, error) {
	msgList, err := parseMsgShares(shares)
	if err != nil {
		return Messages{}, err
	}

	return Messages{
		MessagesList: msgList,
	}, nil
}

// processContiguousShares takes raw shares and extracts out transactions,
// intermediate state roots, or evidence. The returned [][]byte do not have
// namespaces or length delimiters and are ready to be unmarshalled
func processContiguousShares(shares [][]byte) (txs [][]byte, err error) {
	if len(shares) == 0 {
		return nil, nil
	}

	ss := newShareStack(shares)
	return ss.resolve()
}

// shareStack holds variables for peel
type shareStack struct {
	shares [][]byte
	txLen  uint64
	txs    [][]byte
	cursor int
}

func newShareStack(shares [][]byte) *shareStack {
	return &shareStack{shares: shares}
}

func (ss *shareStack) resolve() ([][]byte, error) {
	if len(ss.shares) == 0 {
		return nil, nil
	}
	err := ss.peel(ss.shares[0][consts.NamespaceSize+consts.ShareReservedBytes:], true)
	return ss.txs, err
}

// peel recursively parses each chunk of data (either a transaction,
// intermediate state root, or evidence) and adds it to the underlying slice of data.
func (ss *shareStack) peel(share []byte, delimited bool) (err error) {
	if delimited {
		var txLen uint64
		share, txLen, err = parseDelimiter(share)
		if err != nil {
			return err
		}
		if txLen == 0 {
			return nil
		}
		ss.txLen = txLen
	}
	// safeLen describes the point in the share where it can be safely split. If
	// split beyond this point, it is possible to break apart a length
	// delimiter, which will result in incorrect share merging
	safeLen := len(share) - binary.MaxVarintLen64
	if safeLen < 0 {
		safeLen = 0
	}
	if ss.txLen <= uint64(safeLen) {
		ss.txs = append(ss.txs, share[:ss.txLen])
		share = share[ss.txLen:]
		return ss.peel(share, true)
	}
	// add the next share to the current share to continue merging if possible
	if len(ss.shares) > ss.cursor+1 {
		ss.cursor++
		share := append(share, ss.shares[ss.cursor][consts.NamespaceSize+consts.ShareReservedBytes:]...)
		return ss.peel(share, false)
	}
	// collect any remaining data
	if ss.txLen <= uint64(len(share)) {
		ss.txs = append(ss.txs, share[:ss.txLen])
		share = share[ss.txLen:]
		return ss.peel(share, true)
	}
	return errors.New("failure to parse block data: transaction length exceeded data length")
}

// parseMsgShares iterates through raw shares and separates the contiguous chunks
// of data. It is only used for Messages, i.e. shares with a non-reserved namespace.
func parseMsgShares(shares [][]byte) ([]Message, error) {
	if len(shares) == 0 {
		return nil, nil
	}

	// set the first nid and current share
	nid := shares[0][:consts.NamespaceSize]
	currentShare := shares[0][consts.NamespaceSize:]
	// find and remove the msg len delimiter
	currentShare, msgLen, err := parseDelimiter(currentShare)
	if err != nil {
		return nil, err
	}

	var msgs []Message
	for cursor := uint64(0); cursor < uint64(len(shares)); {
		var msg Message
		currentShare, nid, cursor, msgLen, msg, err = nextMsg(
			shares,
			currentShare,
			nid,
			cursor,
			msgLen,
		)
		if err != nil {
			return nil, err
		}
		if msg.Data != nil {
			msgs = append(msgs, msg)
		}
	}

	return msgs, nil
}

func nextMsg(
	shares [][]byte,
	current,
	nid []byte,
	cursor,
	msgLen uint64,
) ([]byte, []byte, uint64, uint64, Message, error) {
	switch {
	// the message uses all of the current share data and at least some of the
	// next share
	case msgLen > uint64(len(current)):
		// add the next share to the current one and try again
		cursor++
		current = append(current, shares[cursor][consts.NamespaceSize:]...)
		return nextMsg(shares, current, nid, cursor, msgLen)

	// the msg we're looking for is contained in the current share
	case msgLen <= uint64(len(current)):
		msg := Message{nid, current[:msgLen]}
		cursor++

		// call it a day if the work is done
		if cursor >= uint64(len(shares)) {
			return nil, nil, cursor, 0, msg, nil
		}

		nextNid := shares[cursor][:consts.NamespaceSize]
		next, msgLen, err := parseDelimiter(shares[cursor][consts.NamespaceSize:])
		return next, nextNid, cursor, msgLen, msg, err
	}
	// this code is unreachable but the compiler doesn't know that
	return nil, nil, 0, 0, Message{}, nil
}

// parseDelimiter finds and returns the length delimiter of the message provided
// while also removing the delimiter bytes from the input
func parseDelimiter(input []byte) ([]byte, uint64, error) {
	if len(input) == 0 {
		return input, 0, nil
	}

	l := binary.MaxVarintLen64
	if len(input) < binary.MaxVarintLen64 {
		l = len(input)
	}

	delimiter := zeroPadIfNecessary(input[:l], binary.MaxVarintLen64)

	// read the length of the message
	r := bytes.NewBuffer(delimiter)
	msgLen, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, 0, err
	}

	// calculate the number of bytes used by the delimiter
	lenBuf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(lenBuf, msgLen)

	// return the input without the length delimiter
	return input[n:], msgLen, nil
}

‎types/share_splitting.go

+148
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,148 @@
package types

import (
	"bytes"

	"github.com/celestiaorg/nmt/namespace"
	"github.com/tendermint/tendermint/pkg/consts"
)

// appendToShares appends raw data as shares.
// Used for messages.
func appendToShares(shares []NamespacedShare, nid namespace.ID, rawData []byte) []NamespacedShare {
	if len(rawData) <= consts.MsgShareSize {
		rawShare := append(append(
			make([]byte, 0, len(nid)+len(rawData)),
			nid...),
			rawData...,
		)
		paddedShare := zeroPadIfNecessary(rawShare, consts.ShareSize)
		share := NamespacedShare{paddedShare, nid}
		shares = append(shares, share)
	} else { // len(rawData) > MsgShareSize
		shares = append(shares, splitMessage(rawData, nid)...)
	}
	return shares
}

// splitMessage breaks the data in a message into the minimum number of
// namespaced shares.
func splitMessage(rawData []byte, nid namespace.ID) []NamespacedShare {
	shares := make([]NamespacedShare, 0)
	firstRawShare := append(append(
		make([]byte, 0, consts.ShareSize),
		nid...),
		rawData[:consts.MsgShareSize]...,
	)
	shares = append(shares, NamespacedShare{firstRawShare, nid})
	rawData = rawData[consts.MsgShareSize:]
	for len(rawData) > 0 {
		shareSizeOrLen := min(consts.MsgShareSize, len(rawData))
		rawShare := append(append(
			make([]byte, 0, consts.ShareSize),
			nid...),
			rawData[:shareSizeOrLen]...,
		)
		paddedShare := zeroPadIfNecessary(rawShare, consts.ShareSize)
		share := NamespacedShare{paddedShare, nid}
		shares = append(shares, share)
		rawData = rawData[shareSizeOrLen:]
	}
	return shares
}

// splitContiguous splits multiple raw data contiguously as shares.
// Used for transactions, intermediate state roots, and evidence.
func splitContiguous(nid namespace.ID, rawDatas [][]byte) []NamespacedShare {
	shares := make([]NamespacedShare, 0)
	// Index into the outer slice of rawDatas
	outerIndex := 0
	// Index into the inner slice of rawDatas
	innerIndex := 0
	for outerIndex < len(rawDatas) {
		var rawData []byte
		startIndex := 0
		rawData, outerIndex, innerIndex, startIndex = getNextChunk(rawDatas, outerIndex, innerIndex, consts.TxShareSize)
		rawShare := append(append(append(
			make([]byte, 0, len(nid)+1+len(rawData)),
			nid...),
			byte(startIndex)),
			rawData...)
		paddedShare := zeroPadIfNecessary(rawShare, consts.ShareSize)
		share := NamespacedShare{paddedShare, nid}
		shares = append(shares, share)
	}
	return shares
}

// getNextChunk gets the next chunk for contiguous shares.
// Precondition: none of the slices in rawDatas is zero-length.
// This precondition should always hold at this point since zero-length txs are simply invalid.
func getNextChunk(rawDatas [][]byte, outerIndex int, innerIndex int, width int) ([]byte, int, int, int) {
	rawData := make([]byte, 0, width)
	startIndex := 0
	firstBytesToFetch := 0

	curIndex := 0
	for curIndex < width && outerIndex < len(rawDatas) {
		bytesToFetch := min(len(rawDatas[outerIndex])-innerIndex, width-curIndex)
		if bytesToFetch == 0 {
			panic("zero-length contiguous share data is invalid")
		}
		if curIndex == 0 {
			firstBytesToFetch = bytesToFetch
		}
		// If we've already placed some data in this chunk, that means
		// a new data segment begins.
		if curIndex != 0 {
			// Offset by the fixed reserved bytes at the beginning of the share.
			startIndex = firstBytesToFetch + consts.NamespaceSize + consts.ShareReservedBytes
		}
		rawData = append(rawData, rawDatas[outerIndex][innerIndex:innerIndex+bytesToFetch]...)
		innerIndex += bytesToFetch
		if innerIndex >= len(rawDatas[outerIndex]) {
			innerIndex = 0
			outerIndex++
		}
		curIndex += bytesToFetch
	}

	return rawData, outerIndex, innerIndex, startIndex
}

// tailPaddingShare is filler for all tail-padded shares;
// it is allocated once and reused everywhere.
var tailPaddingShare = append(
	append(make([]byte, 0, consts.ShareSize), consts.TailPaddingNamespaceID...),
	bytes.Repeat([]byte{0}, consts.ShareSize-consts.NamespaceSize)...,
)

// TailPaddingShares returns n tail-padding shares.
func TailPaddingShares(n int) NamespacedShares {
	shares := make([]NamespacedShare, n)
	for i := 0; i < n; i++ {
		shares[i] = NamespacedShare{
			Share: tailPaddingShare,
			ID:    consts.TailPaddingNamespaceID,
		}
	}
	return shares
}

func min(a, b int) int {
	if a <= b {
		return a
	}
	return b
}

func zeroPadIfNecessary(share []byte, width int) []byte {
	oldLen := len(share)
	if oldLen < width {
		missingBytes := width - oldLen
		padByte := []byte{0}
		padding := bytes.Repeat(padByte, missingBytes)
		share = append(share, padding...)
		return share
	}
	return share
}
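To make the chunk traversal in `getNextChunk` concrete, here is a self-contained sketch of the same loop with toy sizes: a width of 6 instead of `consts.TxShareSize`, and the namespace/reserved-byte offset omitted from the start index. `nextChunk` is a simplified stand-in for illustration, not the committed function:

```go
package main

import "fmt"

// nextChunk mirrors getNextChunk: it fills up to width bytes from
// rawDatas starting at (outer, inner) and reports where, within the
// chunk, the first new datum begins (0 if none begins in this chunk).
func nextChunk(rawDatas [][]byte, outer, inner, width int) ([]byte, int, int, int) {
	var chunk []byte
	cur, first, start := 0, 0, 0
	for cur < width && outer < len(rawDatas) {
		n := len(rawDatas[outer]) - inner
		if n > width-cur {
			n = width - cur
		}
		if cur == 0 {
			first = n
		} else {
			// a new datum begins after the carried-over tail
			start = first
		}
		chunk = append(chunk, rawDatas[outer][inner:inner+n]...)
		inner += n
		if inner >= len(rawDatas[outer]) {
			inner, outer = 0, outer+1
		}
		cur += n
	}
	return chunk, outer, inner, start
}

func main() {
	txs := [][]byte{[]byte("aaaa"), []byte("bbbbbb")}
	outer, inner := 0, 0
	for outer < len(txs) {
		var chunk []byte
		var start int
		chunk, outer, inner, start = nextChunk(txs, outer, inner, 6)
		fmt.Printf("chunk=%q start=%d\n", chunk, start)
	}
	// prints:
	// chunk="aaaabb" start=4
	// chunk="bbbb" start=0
}
```

The second transaction straddles two chunks, which is exactly why contiguous shares carry a reserved start-index byte: a parser landing on the second share needs to know where the carried-over tail ends.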

‎types/shares.go

-62
@@ -1,19 +1,15 @@
 package types

 import (
-	"bytes"
 	"encoding/binary"

 	"github.com/celestiaorg/nmt/namespace"
-	"github.com/tendermint/tendermint/pkg/consts"
 )

 // Share contains the raw share data without the corresponding namespace.
 type Share []byte

 // NamespacedShare extends a Share with the corresponding namespace.
-// It implements the namespace.Data interface and hence can be used
-// for pushing the shares to the namespaced Merkle tree.
 type NamespacedShare struct {
 	Share
 	ID namespace.ID
@@ -45,7 +41,6 @@ func (tx Tx) MarshalDelimited() ([]byte, error) {
 	lenBuf := make([]byte, binary.MaxVarintLen64)
 	length := uint64(len(tx))
 	n := binary.PutUvarint(lenBuf, length)
-
 	return append(lenBuf[:n], tx...), nil
 }

@@ -55,62 +50,5 @@ func (m Message) MarshalDelimited() ([]byte, error) {
 	lenBuf := make([]byte, binary.MaxVarintLen64)
 	length := uint64(len(m.Data))
 	n := binary.PutUvarint(lenBuf, length)
-
 	return append(lenBuf[:n], m.Data...), nil
 }
-
-func appendToShares(shares []NamespacedShare, nid namespace.ID, rawData []byte, shareSize int) []NamespacedShare {
-	if len(rawData) < shareSize {
-		rawShare := rawData
-		paddedShare := zeroPadIfNecessary(rawShare, shareSize)
-		share := NamespacedShare{paddedShare, nid}
-		shares = append(shares, share)
-	} else { // len(rawData) >= shareSize
-		shares = append(shares, split(rawData, shareSize, nid)...)
-	}
-	return shares
-}
-
-// TODO(ismail): implement corresponding merge method for clients requesting
-// shares for a particular namespace
-func split(rawData []byte, shareSize int, nid namespace.ID) []NamespacedShare {
-	shares := make([]NamespacedShare, 0)
-	firstRawShare := rawData[:shareSize]
-	shares = append(shares, NamespacedShare{firstRawShare, nid})
-	rawData = rawData[shareSize:]
-	for len(rawData) > 0 {
-		shareSizeOrLen := min(shareSize, len(rawData))
-		paddedShare := zeroPadIfNecessary(rawData[:shareSizeOrLen], shareSize)
-		share := NamespacedShare{paddedShare, nid}
-		shares = append(shares, share)
-		rawData = rawData[shareSizeOrLen:]
-	}
-	return shares
-}
-
-func GenerateTailPaddingShares(n int, shareWidth int) NamespacedShares {
-	shares := make([]NamespacedShare, n)
-	for i := 0; i < n; i++ {
-		shares[i] = NamespacedShare{bytes.Repeat([]byte{0}, shareWidth), consts.TailPaddingNamespaceID}
-	}
-	return shares
-}
-
-func min(a, b int) int {
-	if a <= b {
-		return a
-	}
-	return b
-}
-
-func zeroPadIfNecessary(share []byte, width int) []byte {
-	oldLen := len(share)
-	if oldLen < width {
-		missingBytes := width - oldLen
-		padByte := []byte{0}
-		padding := bytes.Repeat(padByte, missingBytes)
-		share = append(share, padding...)
-		return share
-	}
-	return share
-}

‎types/shares_test.go

+458-32
Large diffs are not rendered by default.

‎types/tx.go

+5-4
@@ -80,15 +80,16 @@ func (txs Txs) Proof(i int) TxProof {
 	}
 }

-func (txs Txs) splitIntoShares(shareSize int) NamespacedShares {
-	shares := make([]NamespacedShare, 0)
-	for _, tx := range txs {
+func (txs Txs) SplitIntoShares() NamespacedShares {
+	rawDatas := make([][]byte, len(txs))
+	for i, tx := range txs {
 		rawData, err := tx.MarshalDelimited()
 		if err != nil {
 			panic(fmt.Sprintf("included Tx in mem-pool that can not be encoded %v", tx))
 		}
-		shares = appendToShares(shares, consts.TxNamespaceID, rawData, shareSize)
+		rawDatas[i] = rawData
 	}
+	shares := splitContiguous(consts.TxNamespaceID, rawDatas)
 	return shares
 }
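Since `SplitIntoShares` now packs delimited transactions contiguously rather than one per share, the number of shares a batch occupies is simply the total delimited length divided by the per-share capacity, rounded up. A rough sketch of that arithmetic with a toy capacity (the real value is `consts.TxShareSize`; `shareCount` is an illustrative helper, not part of this commit):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const txShareSize = 10 // toy per-share capacity, stand-in for consts.TxShareSize

// delimitedLen returns the length of a tx after the uvarint
// length prefix added by MarshalDelimited.
func delimitedLen(tx []byte) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, uint64(len(tx))) + len(tx)
}

// shareCount is the number of contiguous shares the batch occupies.
func shareCount(txs [][]byte) int {
	total := 0
	for _, tx := range txs {
		total += delimitedLen(tx)
	}
	return (total + txShareSize - 1) / txShareSize // ceiling division
}

func main() {
	txs := [][]byte{make([]byte, 7), make([]byte, 12)}
	// (7+1) + (12+1) = 21 delimited bytes -> 3 shares of capacity 10
	fmt.Println(shareCount(txs)) // prints: 3
}
```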
