Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protobufectomy: server #1077

Merged
merged 30 commits into from
Jul 12, 2024
Merged

Protobufectomy: server #1077

merged 30 commits into from
Jul 12, 2024

Conversation

coolreader18
Copy link
Collaborator

@coolreader18 coolreader18 commented Apr 11, 2024

Description of Changes

Replace protobuf with BSATN in the WebSocket API, and use the same message schema for both BSATN and JSON WebSocket messages.

API and ABI breaking changes

API breaking for all WebSocket clients, incl. all SDKs and the website.

Expected complexity level and risk

5 - lots of churn, breaks a major API, will require updating all SDKs and the website.

Testing

This is dependent on the C# sdk getting updated. I feel confident that CI + a botstest with the updated sdk would get everything.

Copy link
Contributor

@gefjon gefjon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I've only reviewed changes to client-api-messages, the Rust SDK and its codegen. Those look excellent. I've left a few low-priority questions and doc-related suggestions.

/// It also will not check for duplicate IDs. They are just a way to match responses to messages.
#[derive(Deserialize, Serialize, serde::Deserialize)]
pub struct OneOffQuery {
pub message_id: Vec<u8>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a Vec<u8> rather than a u32 or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's like that on older proto schema, we have a ticket to fix this - #927 waiting for breaking release. May be, this is a correct time to do it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that'd be great, I'd been wondering about that as well.

Comment on lines 80 to 88
/// Query should be a "SELECT * FROM Table WHERE ...". Other types of queries will be rejected.
/// Multiple such semicolon-delimited queries are allowed.
///
/// One-off queries are identified by a client-generated messageID.
/// To avoid data leaks, the server will NOT cache responses to messages based on UUID!
/// It also will not check for duplicate IDs. They are just a way to match responses to messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comments should be moved to fields.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya - I tried to do that for the most part; most/all of the comments are copied from the protobuf, which tended to put field docs on the outer type.

}
}
}
pub use spacetimedb_lib::Identity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This closes an outstanding ticket I had. You're my hero.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol - it was breaking something with a type mismatch and I didn't wanna do a from_bytes(to_bytes()) thing to convert it so this seemed simple enough lol.

@gefjon
Copy link
Contributor

gefjon commented Apr 11, 2024

Outstanding TODOs:

  • C# SDK.
  • Unity package of C# SDK.
  • Typescript SDK.
  • Tests of above.
  • Minor updates to Rust SDK docs now that Identity has been changed to use the actual spacetimedb_lib::Identity type.

(Plus make the tests and lints pass, obviously.)

Copy link
Contributor

@kazimuth kazimuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all seems straightforward enough, I don't have any particular commentary. Thanks for getting it in!

I wonder if I can auto-generate the C# side of this interface...

@@ -45,6 +45,9 @@ mod sym {
/// Matches `sats`.
pub const SATS: Symbol = Symbol("sats");

/// Matches `transparent`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used on stuff in this PR? Just curious

Copy link
Contributor

@kazimuth kazimuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional question: how do we negotiate protocol version now? I believe protobuf transparently supports changing schemas for the most part by making everything optional, but I don't think bsatn has anything like that. We should at minimum send a version number I think, either once per session or once per message...

use crate::timestamp::Timestamp;

#[derive(Deserialize, Serialize, serde::Deserialize)]
pub enum ClientMessage<Args = Row> {
Copy link
Contributor

@kazimuth kazimuth Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add additional messages here for protocol version negotiation?

pub inserts: Vec<Row>,
}

#[derive(Debug, Clone)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[derive(Debug, Clone)]
#[derive(Debug, Clone, From)]

Comment on lines 236 to 240
impl From<Bytes> for Row {
fn from(b: Bytes) -> Self {
Row(b)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
impl From<Bytes> for Row {
fn from(b: Bytes) -> Self {
Row(b)
}
}

@@ -21,17 +21,28 @@ pub(crate) struct SatsType<'a> {
pub(crate) enum SatsTypeData<'a> {
Product(Vec<SatsField<'a>>),
Sum(Vec<SatsVariant<'a>>),
Transparent(SatsField<'a>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some docs here?

@@ -642,6 +618,11 @@ impl ModuleHost {
&self.info.subscriptions
}

/// internal method to access the DatabaseInstanceContext for the module. don't make this public.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be reflected in the method name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to request that the method becomes public API.

The lifetimes of dbic and module should become coupled anyway.

@coolreader18 coolreader18 force-pushed the noa/protobufectomy branch 2 times, most recently from e55d78c to 4d5cdbc Compare April 12, 2024 20:03
@bfops bfops added abi-break A PR that makes an ABI breaking change release-1.0 labels Apr 15, 2024
@coolreader18 coolreader18 force-pushed the noa/protobufectomy branch 3 times, most recently from 5e9d2e2 to f77bfac Compare April 17, 2024 19:46
@coolreader18 coolreader18 force-pushed the noa/protobufectomy branch 2 times, most recently from f111123 to 14b19e0 Compare May 14, 2024 02:58
@gefjon
Copy link
Contributor

gefjon commented Jun 5, 2024

Additional question: how do we negotiate protocol version now? I believe protobuf transparently supports changing schemas for the most part by making everything optional, but I don't think bsatn has anything like that. We should at minimum send a version number I think, either once per session or once per message...

I think we should use the WebSocket protocol for this. In the past we've been quite lax about updating the WebSocket protocol name, continuing to use v1.bin.spacetimedb even after breaking changes, but once this merges and we start providing real stability guarantees, we'll need to update the version prefix whenever we change the schema.

@coolreader18
Copy link
Collaborator Author

mm, that's a good point Phoebe - I was planning on putting a enum { V1(MessageV1) } around the message but that's a better idea.

@gefjon gefjon force-pushed the noa/protobufectomy branch from ca223ba to 7625e0c Compare June 12, 2024 14:30
gefjon added 2 commits June 12, 2024 11:34
Rename `Row` to `BsatnBytes`, as the latter is more descriptive.

Remove `ReducerId`; this change can be made in a follow-up if it's useful.

Sprinkle doc comments around the new websocket message definitions.
gefjon added 3 commits June 17, 2024 12:39
More of these which I changed for testing to get backtraces, but now need to be restored
for graceful handling.
Copy link
Contributor

@Centril Centril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly great! The only major issue here is that we are not using the fast path for BSATN serialization anymore. Otherwise, I mostly have minor comments.

))
});
let len = fields.len();
quote!(
spacetimedb::sats::AlgebraicType::product::<
[(Option<&str>, spacetimedb::sats::AlgebraicType); #len]
#krate::sats::AlgebraicType::product::<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: perhaps, in a future PR, we want to re-export these to make it slightly nicer.

#[serde(transparent)]
#[repr(transparent)]
pub struct Timestamp(pub u64);
#[derive(SpacetimeType, Copy, Clone, PartialEq, Eq, Debug, serde::Serialize)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this make the AlgebraicType be Product([("microseconds", U64)])?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will. Unfortunately, transparent newtypes don't really work for codegen, so we have to use actual single-element products if we want to generate a named type. I discussed how we might implement them in C# with Ingvar, and the conclusion was that there isn't a good way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see; maybe this will be alleviated by #1287 which gives you a wrapper at the type level but not for values...?
A comment for now laying out the situation and the why would be great.

///
/// Rows are encoded as BSATN or JSON according to the table's schema
/// and the client's requested protocol.
pub inserts: Vec<EncodedValue>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't become a roadblock for the future optimization.

When we do optimize this, in a different PR, to reduce the rate of small allocations, we'll need a structure like this instead:

pub struct TableUpdate {
    ...
    pub inserts: EncodedValueVec,
}

enum EncodedValueVec {
    Binary(BinaryVec),
    Text(Vec<ByteString>),
}

// This version does not provide random access
// as individual rows are of variable length.
struct BinaryVec {
    num_row: usize,
    rows_data: Vec<u8>,
}

// This version does, and admits e.g., parallel decode,
// if that is something that clients are interested in.
struct BinaryVec {
    // Basically this is what `packed_str` does.
    indices: Vec<usize>,
    rows_data: Vec<u8>,
}

Re. "parallel decode", I would be interested to know your thoughts @gefjon or @aasoni re. whether this is something we/clients are interested in using. The cost of maintaining a single additional allocation for indices won't be great.

Copy link
Contributor

@cloutiertyler cloutiertyler Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to agree with either of the two proposed solutions above. What is currently in this PR is a bit of a performance and data regression from what we currently do in protobuf because it stores an enum tag for each row of a table. This could be significant in the case of small rows like LocationState. The value of the tag is going to be the same for every value of every TableUpdate per connection. Seems quite like a waste. I would put it even further up than TableUpdate even, although there are diminishing returns.

Side note, I don't see why num_row is required in the first version, @Centril.

I am okay to not block the review on this, but only if there is a corresponding ticket to fix this before we stabilize the API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, I don't see why num_row is required in the first version, @Centril.

The thinking was to know upfront how many rows to decode, so you can e.g., pre-allocate for that on the client.
Alternatively, you can do a decode-until-empty loop.

I am okay to not block the review on this, but only if there is a corresponding ticket to fix this before we stabilize the API.

Made a ticket: #1450
I decided to go for the version that includes indices as I felt that parallel chunked decode can be valuable for initial subscriptions if not for incremental updates.

I would put it even further up than TableUpdate even, although there are diminishing returns.

The highest level we could do this at is at ws::DatabaseUpdate = Vec<ws::TableUpdate> which would require reverting some of the changes in this PR. This would save 2 tags per table as compared to making the protocol choice per list of rows. This scales with the number of tables projected out of the SQL query, which is likely to be small, so in practice this may not win us any notable perf or network savings. But, if we want to special case ws::DatabaseUpdate by making it more generic, I'm not opposed to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_rows is necessary given the way Joshua has implemented subscription row-count metrics.

Ideally, I would like to make codegen properly support generic types, and then make ServerMessage and ClientMessage generic over a row-encoding. Failing that, I think separating ServerMessageJson from ServerMessageBsatn would be reasonable. Hoisting the tag higher up is probably the least good option, but would be fine. I appreciate making this a follow-up; I think that making changes to the WS schema will be much easier once this PR merges.

/// Part of a [`DatabaseUpdate`] received by client from database for alterations to a single table.
#[derive(SpacetimeType, Debug, Clone)]
#[sats(crate = spacetimedb_lib)]
pub struct TableUpdate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be made generic over the protocol type? If not, I think we should make two types of TableUpdate: TextTableUpdate and BinaryTableUpdate. If we're going to store this information at runtime, best to store it at this level (or higher) rather than per row.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't (currently) use generic types meaningfully in code that goes through spacetime generate because computing a ModuleDef/TypeSpace treats them incorrectly: it emits only the first-encountered specialization, and all later references refer incorrectly to that specialization. So if we had TableUpdate<Protocol>, depending on how our code was laid out, we'd get either struct TableUpdate { rows: RowsJson } or struct TableUpdate { rows: RowsBsatn }, but not both, and all references would refer to the one chosen.

I think we're unlikely to add generics to SATS any time soon, but we could in the near future fix this to emit all used specializations under unique names, like struct TableUpdate_RowsJson and struct TableUpdate_RowsBsatn, which would eliminate the need for enum EncodedRow.

Or, as discussed above, we could just hoist the tag higher up in the message hierarchy.

@gefjon gefjon requested a review from Centril June 20, 2024 15:42
Copy link
Contributor

@Centril Centril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now! What is left re. tags and optimizations can be done in follow ups.

@Centril
Copy link
Contributor

Centril commented Jun 25, 2024

Benchmarking full-scan: Collecting 100 samples in estimated 9.4
full-scan               time:   [91.444 ms 92.044 ms 92.657 ms]
                        change: [-6.6002% -5.7439% -4.8937%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking full-join: Collecting 100 samples in estimated 5.6
full-join               time:   [280.53 µs 281.62 µs 283.71 µs]
                        change: [-9.7521% -8.7268% -7.8533%] (p = 0.00 < 0.05)
                        Performance has improved.

Compared against current master (637a654) on i7-7700K, 64GB RAM.
This does not appear to be a fluke as I ran the benchmarks several times.
On full-scan, it varied between -3.5% to -6% and -8% to -11% on full-join.
Incremental was not affected as the benchmarks do not exercise the encoding cost yet.

I think we can be confident in merging this PR and that we'll see perf improvements soon.

@joshua-spacetime
Copy link
Collaborator

@Centril do you know where the improvements were made? I don't believe those benchmarks exercised protobuf at all since it happened off the main thread.

@Centril
Copy link
Contributor

Centril commented Jun 25, 2024

@joshua-spacetime What was exercised in the benchmark was the serialization to ProtocolDatabaseUpdate, which does involve some generated protobuf structs like TableUpdate vs. the new version of that struct. It's hard to say where the improvements come from, but, if they are real, it could be e.g., getting rid of the insert/delete tags, but on the other hand, there's now a tag per row.

Copy link
Collaborator

@jdetter jdetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this as a comment for now but I think we should address this now if possible and if its not possible then address it after this PR merges.

@@ -28,12 +29,14 @@ const INDENT: &str = "\t";
pub fn cli() -> clap::Command {
clap::Command::new("generate")
.about("Generate client files for a spacetime module.")
.override_usage("spacetime generate --lang <LANG> --out-dir <DIR> [--project-path <DIR> | --wasm-file <PATH>]")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're breaking the API here a bit - some scripts in the BitCraft project assume that you don't have to pass a path and that it will use the current directory as the default.

IMO I think we should maintain the old behavior of defaulting to the current directory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would also need to update all of the quickstarts, tutorials, docs, etc. on the website after this merges otherwise people will run into the same issue that I ran into:

Generating C# files
error: the following required arguments were not provided:
  <--wasm-file <wasm_file>|--project-path <project_path>|--json-module [<json_module>]>

Usage: spacetime generate --lang <LANG> --out-dir <DIR> [--project-path <DIR> | --wasm-file <PATH>]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd like to push a fix to this branch, feel free. I have no strong feelings about this; the only required change is that we need the new --json-module as an alternative to --project-path.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd like to push a fix to this branch, feel free. I have no strong feelings about this; the only required change is that we need the new --json-module as an alternative to --project-path.

ok understood, I will push the fix

@coolreader18
Copy link
Collaborator Author

Just confirming - this is progressing fine without me? It seems like there's nothing I need to touch up that others couldn't do, right?

@joshua-spacetime joshua-spacetime changed the title Protobufectomy Protobufectomy: server Jul 8, 2024
@joshua-spacetime joshua-spacetime assigned lcodes and unassigned gefjon Jul 11, 2024
@Centril Centril added this pull request to the merge queue Jul 12, 2024
Merged via the queue into master with commit 10b151b Jul 12, 2024
6 of 7 checks passed
@joshua-spacetime joshua-spacetime mentioned this pull request Jul 15, 2024
@bfops bfops mentioned this pull request Aug 12, 2024
@coolreader18 coolreader18 deleted the noa/protobufectomy branch September 23, 2024 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
abi-break A PR that makes an ABI breaking change release-1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.