-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protobufectomy: server #1077
Protobufectomy: server #1077
Conversation
efcd187
to
5d4080a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far I've only reviewed changes to client-api-messages
, the Rust SDK and its codegen. Those look excellent. I've left a few low-priority questions and doc-related suggestions.
crates/client-api-messages/src/ws.rs
Outdated
/// It also will not check for duplicate IDs. They are just a way to match responses to messages. | ||
#[derive(Deserialize, Serialize, serde::Deserialize)] | ||
pub struct OneOffQuery { | ||
pub message_id: Vec<u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a Vec<u8>
rather than a u32
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's like that on older proto schema, we have a ticket to fix this - #927 waiting for breaking release. May be, this is a correct time to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that'd be great, I'd been wondering about that as well.
crates/client-api-messages/src/ws.rs
Outdated
/// Query should be a "SELECT * FROM Table WHERE ...". Other types of queries will be rejected. | ||
/// Multiple such semicolon-delimited queries are allowed. | ||
/// | ||
/// One-off queries are identified by a client-generated messageID. | ||
/// To avoid data leaks, the server will NOT cache responses to messages based on UUID! | ||
/// It also will not check for duplicate IDs. They are just a way to match responses to messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comments should be moved to fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya - I tried to do that for the most part; most/all of the comments are copied from the protobuf, which tended to put field docs on the outer type.
} | ||
} | ||
} | ||
pub use spacetimedb_lib::Identity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This closes an outstanding ticket I had. You're my hero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol - it was breaking something with a type mismatch and I didn't wanna do a from_bytes(to_bytes())
thing to convert it so this seemed simple enough lol.
Outstanding TODOs:
(Plus make the tests and lints pass, obviously.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all seems straightforward enough, I don't have any particular commentary. Thanks for getting it in!
I wonder if I can auto-generate the C# side of this interface...
crates/bindings-macro/src/lib.rs
Outdated
@@ -45,6 +45,9 @@ mod sym { | |||
/// Matches `sats`. | |||
pub const SATS: Symbol = Symbol("sats"); | |||
|
|||
/// Matches `transparent`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used on stuff in this PR? Just curious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional question: how do we negotiate protocol version now? I believe protobuf transparently supports changing schemas for the most part by making everything optional, but I don't think bsatn has anything like that. We should at minimum send a version number I think, either once per session or once per message...
crates/client-api-messages/src/ws.rs
Outdated
use crate::timestamp::Timestamp; | ||
|
||
#[derive(Deserialize, Serialize, serde::Deserialize)] | ||
pub enum ClientMessage<Args = Row> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add additional messages here for protocol version negotiation?
crates/client-api-messages/src/ws.rs
Outdated
pub inserts: Vec<Row>, | ||
} | ||
|
||
#[derive(Debug, Clone)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Debug, Clone)] | |
#[derive(Debug, Clone, From)] |
crates/client-api-messages/src/ws.rs
Outdated
impl From<Bytes> for Row { | ||
fn from(b: Bytes) -> Self { | ||
Row(b) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl From<Bytes> for Row { | |
fn from(b: Bytes) -> Self { | |
Row(b) | |
} | |
} |
crates/bindings-macro/src/module.rs
Outdated
@@ -21,17 +21,28 @@ pub(crate) struct SatsType<'a> { | |||
pub(crate) enum SatsTypeData<'a> { | |||
Product(Vec<SatsField<'a>>), | |||
Sum(Vec<SatsVariant<'a>>), | |||
Transparent(SatsField<'a>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some docs here?
crates/core/src/host/module_host.rs
Outdated
@@ -642,6 +618,11 @@ impl ModuleHost { | |||
&self.info.subscriptions | |||
} | |||
|
|||
/// internal method to access the DatabaseInstanceContext for the module. don't make this public. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be reflected in the method name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to request that the method becomes public API.
The lifetimes of dbic and module should become coupled anyway.
e55d78c
to
4d5cdbc
Compare
5e9d2e2
to
f77bfac
Compare
f111123
to
14b19e0
Compare
I think we should use the WebSocket protocol for this. In the past we've been quite lax about updating the WebSocket protocol name, continuing to use |
14b19e0
to
ca223ba
Compare
mm, that's a good point Phoebe - I was planning on putting a |
ca223ba
to
7625e0c
Compare
Rename `Row` to `BsatnBytes`, as the latter is more descriptive. Remove `ReducerId`; this change can be made in a follow-up if it's useful. Sprinkle doc comments around the new websocket message definitions.
More of these which I changed for testing to get backtraces, but now need to be restored for graceful handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly great! The only major issue here is that we are not using the fast path for BSATN serialization anymore. Otherwise, I mostly have minor comments.
)) | ||
}); | ||
let len = fields.len(); | ||
quote!( | ||
spacetimedb::sats::AlgebraicType::product::< | ||
[(Option<&str>, spacetimedb::sats::AlgebraicType); #len] | ||
#krate::sats::AlgebraicType::product::< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: perhaps, in a future PR, we want to re-export these to make it slightly nicer.
#[serde(transparent)] | ||
#[repr(transparent)] | ||
pub struct Timestamp(pub u64); | ||
#[derive(SpacetimeType, Copy, Clone, PartialEq, Eq, Debug, serde::Serialize)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this make the AlgebraicType
be Product([("microseconds", U64)])
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will. Unfortunately, transparent
newtypes don't really work for codegen, so we have to use actual single-element products if we want to generate a named type. I discussed how we might implement them in C# with Ingvar, and the conclusion was that there isn't a good way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see; maybe this will be alleviated by #1287 which gives you a wrapper at the type level but not for values...?
A comment for now laying out the situation and the why would be great.
/// | ||
/// Rows are encoded as BSATN or JSON according to the table's schema | ||
/// and the client's requested protocol. | ||
pub inserts: Vec<EncodedValue>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't become a roadblock for the future optimization.
When we do optimize this, in a different PR, to reduce the rate of small allocations, we'll need a structure like this instead:
pub struct TableUpdate {
...
pub inserts: EncodedValueVec,
}
enum EncodedValueVec {
Binary(BinaryVec),
Text(Vec<ByteString>),
}
// This version does not provide random access
// as individual rows are of variable length.
struct BinaryVec {
num_row: usize,
rows_data: Vec<u8>,
}
// This version does, and admits e.g., parallel decode,
// if that is something that clients are interested in.
struct BinaryVec {
// Basically this is what `packed_str` does.
indices: Vec<usize>,
rows_data: Vec<u8>,
}
Re. "parallel decode", I would be interested to know your thoughts @gefjon or @aasoni re. whether this is something we/clients are interested in using. The cost of maintaining a single additional allocation for indices
won't be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to agree with either of the two proposed solutions above. What is currently in this PR is a bit of a performance and data regression from what we currently do in protobuf because it stores an enum tag for each row of a table. This could be significant in the case of small rows like LocationState
. The value of the tag is going to be the same for every value of every TableUpdate
per connection. Seems quite like a waste. I would put it even further up than TableUpdate
even, although there are diminishing returns.
Side note, I don't see why num_row
is required in the first version, @Centril.
I am okay to not block the review on this, but only if there is a corresponding ticket to fix this before we stabilize the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, I don't see why
num_row
is required in the first version, @Centril.
The thinking was to know upfront how many rows to decode, so you can e.g., pre-allocate for that on the client.
Alternatively, you can do a decode-until-empty loop.
I am okay to not block the review on this, but only if there is a corresponding ticket to fix this before we stabilize the API.
Made a ticket: #1450
I decided to go for the version that includes indices
as I felt that parallel chunked decode can be valuable for initial subscriptions if not for incremental updates.
I would put it even further up than
TableUpdate
even, although there are diminishing returns.
The highest level we could do this at is at ws::DatabaseUpdate = Vec<ws::TableUpdate>
which would require reverting some of the changes in this PR. This would save 2 tags per table as compared to making the protocol choice per list of rows. This scales with the number of tables projected out of the SQL query, which is likely to be small, so in practice this may not win us any notable perf or network savings. But, if we want to special case ws::DatabaseUpdate
by making it more generic, I'm not opposed to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_rows
is necessary given the way Joshua has implemented subscription row-count metrics.
Ideally, I would like to make codegen properly support generic types, and then make ServerMessage
and ClientMessage
generic over a row-encoding. Failing that, I think separating ServerMessageJson
from ServerMessageBsatn
would be reasonable. Hoisting the tag higher up is probably the least good option, but would be fine. I appreciate making this a follow-up; I think that making changes to the WS schema will be much easier once this PR merges.
/// Part of a [`DatabaseUpdate`] received by client from database for alterations to a single table. | ||
#[derive(SpacetimeType, Debug, Clone)] | ||
#[sats(crate = spacetimedb_lib)] | ||
pub struct TableUpdate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be made generic over the protocol type? If not, I think we should make two types of TableUpdate
: TextTableUpdate
and BinaryTableUpdate
. If we're going to store this information at runtime, best to store it at this level (or higher) rather than per row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't (currently) use generic types meaningfully in code that goes through spacetime generate
because computing a ModuleDef
/TypeSpace
treats them incorrectly: it emits only the first-encountered specialization, and all later references refer incorrectly to that specialization. So if we had TableUpdate<Protocol>
, depending on how our code was laid out, we'd get either struct TableUpdate { rows: RowsJson }
or struct TableUpdate { rows: RowsBsatn }
, but not both, and all references would refer to the one chosen.
I think we're unlikely to add generics to SATS any time soon, but we could in the near future fix this to emit all used specializations under unique names, like struct TableUpdate_RowsJson
and struct TableUpdate_RowsBsatn
, which would eliminate the need for enum EncodedRow
.
Or, as discussed above, we could just hoist the tag higher up in the message hierarchy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great now! What is left re. tags and optimizations can be done in follow ups.
Compared against current master (637a654) on i7-7700K, 64GB RAM. I think we can be confident in merging this PR and that we'll see perf improvements soon. |
@Centril do you know where the improvements were made? I don't believe those benchmarks exercised protobuf at all since it happened off the main thread. |
@joshua-spacetime What was exercised in the benchmark was the serialization to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving this as a comment for now but I think we should address this now if possible and if its not possible then address it after this PR merges.
@@ -28,12 +29,14 @@ const INDENT: &str = "\t"; | |||
pub fn cli() -> clap::Command { | |||
clap::Command::new("generate") | |||
.about("Generate client files for a spacetime module.") | |||
.override_usage("spacetime generate --lang <LANG> --out-dir <DIR> [--project-path <DIR> | --wasm-file <PATH>]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're breaking the API here a bit - some scripts in the BitCraft project assume that you don't have to pass a path and that it will use the current directory as the default.
IMO I think we should maintain the old behavior of defaulting to the current directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would also need to update all of the quickstarts, tutorials, docs, etc. on the website after this merges otherwise people will run into the same issue that I ran into:
Generating C# files
error: the following required arguments were not provided:
<--wasm-file <wasm_file>|--project-path <project_path>|--json-module [<json_module>]>
Usage: spacetime generate --lang <LANG> --out-dir <DIR> [--project-path <DIR> | --wasm-file <PATH>]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you'd like to push a fix to this branch, feel free. I have no strong feelings about this; the only required change is that we need the new --json-module
as an alternative to --project-path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you'd like to push a fix to this branch, feel free. I have no strong feelings about this; the only required change is that we need the new
--json-module
as an alternative to--project-path
.
ok understood, I will push the fix
Just confirming - this is progressing fine without me? It seems like there's nothing I need to touch up that others couldn't do, right? |
Description of Changes
Replace protobuf with BSATN in the WebSocket API, and use the same message schema for both BSATN and JSON WebSocket messages.
API and ABI breaking changes
API breaking for all WebSocket clients, incl. all SDKs and the website.
Expected complexity level and risk
5 - lots of churn, breaks a major API, will require updating all SDKs and the website.
Testing
This is dependent on the C# sdk getting updated. I feel confident that CI + a botstest with the updated sdk would get everything.