-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a lockfile when opening config files #1196
Conversation
In the past, we've had issues where multiple concurrent CLI processes would race to read and write the CLI config file, leading to data loss. We considered using `flock`/`LockFileEx` and blocking until the file became available, but unfortunately it's not possible to atomically create and lock a nonexistent file, which we need to do in the case where the configuration doesn't yet exist. Instead, we opt for a classic lockfile-based scheme: Before opening a config file `foo.conf`, attempt to exclusively create `foo.lock`, and panic if the exclusive creation fails. Once it becomes clear that we will not write the config any more, i.e. in `Config::drop`, delete the lockfile, allowing another process to operate. This means that attempting to run multiple concurrent Spacetime CLI processes with the same config file is now a hard error.
This commit fixes two CI failures: - `spacetime start`, and a few other CLI subcommands, do not access their `Config` at all, but the CLI constructs it unconditionally in `main`, which made it an error to run any CLI command while `spacetime start` was running. This is fixed by having subcommands which don't need a `Config` drop it before doing anything. - Contrary to my assumption, the test configuration created by `Config::new_with_localhost` does get `drop`ped, because the test harness `clone`s is and passes an owned version to the CLI. This was causing it to attempt to delete the empty path, which failed. This is fixed by having the home configuration be `Option`al, and setting it to `None` in tests.
Ugh, if Clap parsing fails, it doesn't panic; instead it directly calls Edit: Fixed in latest commit; adding more testing steps. |
Perform Clap argument parsing as the very first thing in a CLI process, before locking the config, because Clap calls `exit` directly on error rather than panicing (presumably to have more control over error output), which prevents destructors from running, leaving stale lockfiles.
Also deduplicate logic for finding config file paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, otherwise LGTM 👍
if let Some(config_path) = Self::find_config_path(&cur_dir) { | ||
Self::load_from_file(&config_path) | ||
.inspect_err(|e| eprintln!("config file {config_path:?} is invalid: {e:#}")) | ||
.unwrap_or_default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it was like this before, but it might be worth raising the error here. I would prefer a CLI tool crashes rather than continuing with unknown defaults when there's a config error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, but I'd like @jdetter to weigh in before I change any semantics more than is required for the ticket.
.with_context(|| format!("Unable to remove lockfile {:?}", self.path)) | ||
.unwrap(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: FWIW this is veeeeery close to a Config
-independent struct.
do you have any opinions on factoring it out a la https://github.com/clockworklabs/SpacetimeDB/pull/1202/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong feelings one way or the other. If you prefer that, I'm into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, left some nits and one I-think-bug.
What do you think about splitting out the internal-only refactors into a base PR like https://github.com/clockworklabs/SpacetimeDB/pull/1203/files ?
(I suspect it would make this diff hard-to-argue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested + working. Originally I hit a snag on windows but I am now not able to reproduce it at all and it might be a somewhat rare issue. Even if the issue occurs it didn't currupt my config file which is what I care about.
Thanks Phoebe 👍
Description of Changes
In the past, we've had issues where multiple concurrent CLI processes would race to read and write the CLI config file,
leading to data loss.
We considered using
flock
/LockFileEx
and blocking until the file became available, but unfortunately it's not possible to atomically create and lock a nonexistent file, which we need to do in the case where the configuration doesn't yet exist.Instead, we opt for a classic lockfile-based scheme: Before opening a config file
foo.conf
, attempt to exclusively createfoo.lock
, and panic if the exclusive creation fails. Once it becomes clear that we will not write the config any more, i.e. inConfig::drop
, delete the lockfile, allowing another process to operate.This means that attempting to run multiple concurrent Spacetime CLI processes with the same config file is now a hard error.
API and ABI breaking changes
Arguably this is a CLI interface break, in the sense that running multiple concurrent CLI processes with the same config file is now a hard error, where before it sometimes worked.
Expected complexity level and risk
3 - CLI stuff is finicky and requires manual testing.
Testing
~/.spacetime/config.lock
, then attempt the above, and observe the error.~/.spacetime/config.lock
, try again, and see that it works again.spacetime lsit
), then run again with a valid parse (e.g.spacetime identity list
), and verify that there is no stale lockfile preventing the corrected command.~/.spacetime
directory, then run a CLI command and verify that the lockfile can be created despite the directory not existing.