-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cargo must not package arbitrary files #2063
Comments
Doesn't this solve your problem? http://doc.crates.io/manifest.html#the-exclude-and-include-fields-optional |
No. It lets me work around problematic behaviour, now that I know about it. The problem is the default. |
I too would've been surprised by cargo picking up any junk in the working directory. I'd appreciate it aborting with an error unless a command line override is given. In the absence of |
I've done a check on all 3.248 crates and 17.085 tarballs on crates.io as of 15/10/24, checking
The good news is that as far as I can see there are no serious problems as of
The full table with all crate-lint-warnings can be found below. My two cents of all this:
If someone is interested in the linter's code I've written over the weekend, I All the gory details:
|
Two crates currently have their owner's API-token exposed by a strace-dump having being pulled into the tarball. The owner has recreated his token by now. Any recommendations on how to proceed on this? |
Another comment by @felixc on #2245: I'm not sure whether to file this as a bug, a request for clearer documentation, or a suggestion for concrete changes in behaviour. Regardless, I wanted to bring up how What I'd like to argue is that implicitly grabbing all files in the local directory and including them in the published crate is a very dangerous feature. At least my mental model of "the contents of my crate" consists of just source files, docs, tests, and ancillary files like README and LICENSE; not of everything that happens to be in the same directory. Like the Python folks say, "explicit is better than implicit". The safe default behaviour would be to require an explicit listing of what files to include (yes, I know the option exists, but it's not the default). If you'd like to encourage people to remember to keep it up to date, you could print a warning when you see e.g. a LICENSE or CONTRIBUTING file that isn't explicitly listed, or even any new file found for the first time. Having everything grabbed would work better as an opt-in for those who know about the feature and choose to accept the risk, but defaults should be chosen so as to fail in a safe manner. If, on the other hand, this behaviour is really too desirable to give up, it should be very very explicitly called out, and loudly signalled (i.e. printed to Currently the only mention of it that I can find is in the crates.io doc here, and even that isn't particularly explicit or clear. Firstly, I never though of my TODO list for the crate, my personal notes on my release process, etc. as "part of the crate", and thus publishable. Secondly, though it says "Cargo will automatically ignore files ignored by your version control system when packaging", I assumed that meant "files not tracked by your version control system", rather than specifically "files listed in But, most of all, it's a big scary footgun feature that's just mentioned in passing in one skimmable section (I admit I went straight to So, in sum, please reconsider the default behaviour of this feature, and please aim for safety over convenience, even if convenience is offered as an option for those who choose it. |
A suggestion I would make would be to change the semantics of Another suggestion would be to add some kind of further version control integration so that it can take advantage of something like Of course, this wouldn't then be able to include any files generated as part of a build process that aren't in source control, and would add more coupling to particular version control systems. It could be made to work more generally and without the problem of not being able to use built steps by just having something like Note that both of these suggestions could be useful independently. Using an export command can help when using a VCS to ensure that you only include an actual committed version rather than something dirty; but it doesn't help you exclude files that are in version control but you don't want to release (there's no reason your Always requiring a whitelist, and having the blacklist apply on top of that, would make it much safer to use, but it wouldn't fix the problem of people accidentally releasing some uncommitted state and then not being able to reproduce that state later from version control. Yes, you'd have the opposite problem of people sometimes uploading crates without everything included, but that's generally failing safer than people uploading too much, as if not everything is included you'll generally notice and can then fix it, while you can never un-publish secrets. |
From time to time I pull new and updated crates from the index and scan them for possibly sensitive files and contents. I posted the stats from 2015/10 above, the index now has ~3.800 crates and ~20.600 tarballs. So far three crates had their private API token exposed through an strace-dump that was pulled into the tarball. The most "severe" problem with crate-hygiene are local build-, trash- and vcs-files. I would suggest to leave the
To produce the stats in the lengthy comment above, I hacked a linter in python (https://github.com/lukaslueg/lint_crates/). It checks for dozens of unhygienic files, possible tarball/gzip-bombs and such. One might simply port that to Rust. |
In the meantime, @alexcrichton leaked his API-key in toml-0.1.26 due to a strace-dump getting pulled into the crate; he revoked his API-key. |
I think the current behavior is unnecessarily error-prone. The packaged files should be opt-in rather than opt-out. |
Oh. API key reset. Thanks for the heads-up. I had no idea that footgun even existed and was blindly following instructions on crates.io w/r/t publishing. I had assumed Cargo would only package things tracked by git, and didn't realize it would take all contents of the directory, including .cargo (which was there rather than in ~ because I had set CARGO_HOME to the top-level directory of the repository). Now that I read the documentation (http://doc.crates.io/manifest.html), it seems really weird that Cargo would use the exclude directives given to git et cetera (e.g. .gitignore) but not the include information (which files are tracked by the repository). I would expect documentation, either in the "The exclude and include Fields (optional)" section or in the "The Project Layout" section, to state that Cargo considers all files in the directory to be part of the package and will publish them, unless exceptions are configured. It also seems strange that the default set of files (all of them) may be subtracted from using |
In the meantime the crates token, trust-dns and yabs leak their owners' coveralls.io-token |
In the meantime @ckuwata lost his bitbucket username and password due to |
@lukaslueg would be interested in helping fix this issue? That'd mean you no longer have to keep checking crates.io and posting here! I think the best proposal to fix this is #2714 which would block publishes by default whenever there are untracked files that would otherwise get packaged. |
At least in my case refusing to publish on dirty directory would not have helped. The file containing the credentials was explicitely ignored (in my global .gitignore), therefore my directory was clean. |
@badboy in theory Cargo never packages anything mentioned in .gitignore, so if that's what happened then it's definitely a bug! Could you try running |
Hm, in that case it might be that it was dirty indeed, so the fix would have helped. I'm gonna check that tomorrow and will report if it happens. |
Please do not resolve this by preventing publishing while there are untracked files present. That approach makes some huge assumptions about people's workflows and tooling, not to mention that there are perfectly legitimate use cases for having temporarily untracked files present even if one in general tries to keep the repo clean. I stand by my earlier comments that I believe this magic behaviour should be removed or made explicitly opt-in, and that the default behaviour should be a safe one. |
@alexcrichton Ok, tested it again. Error on my side. Files in local or global |
@badboy thanks for checking! |
There should be some indication on what an acceptable solution (read: pull) would look like. My own suggestion is to check some of the basic hygiene as mentioned above during In the meantime @jhasse and @asaaki lost some github-tokens due to build-logs and unencrypted travis files getting sucked into their crates; both reset those tokens. |
Cargo could ask the user whether a file should be included in the package whenever it did not see the file before. |
This commit alters Cargo's behavior to prevent publishing a crate by default if any files in that crate are determined to be dirty, that is either modified or not part of the working tree. This can prevent common mistakes like many listed in rust-lang#2063, and subsequently... Closes rust-lang#2063
I've posted an implementation of how I'd like to solve this at #2781 |
Naturally my input doesn't carry the same weight as that of a core developer, but I'd like to plead again to not paper over the problem by preventing packaging/publishing based on the state of the VCS. There are a million different workflows out there. Some are used by choice, some by necessity, some just because that's how people got things working when they first stumbled through it. Cargo seems to be making increasingly many assumptions about how projects are structured (e.g. the increasing integrations with Git-and-only-Git). This is being done in the name of convenience, but opaque "magic" behaviour is only convenient when it is perfect and infallible. For anyone who deviates even a little from the "expected" workflow, suddenly all that magic is a dark curse — and one that's totally unexpected, unintuitive, and mysterious to track down. The root of this problem was exactly this kind of magic, namely that all files were vacuumed up and distributed, because it was assumed everyone would a) be using a Git repo, b) have a full and comprehensive The option @vks suggested seems like a reasonable middle ground? The user could be prompted to either include/exclude it this one time; add it to In the longer term, I think my ideal solution would look like:
|
Prevent packaging a crate if any files are dirty This commit alters Cargo's behavior to prevent publishing a crate by default if any files in that crate are determined to be dirty, that is either modified or not part of the working tree. This can prevent common mistakes like many listed in #2063 and enables features like #841. Closes #1597 Closes #2063
Is this meant to be fully "closed" by the referenced commit; i.e. this is no longer being considered? |
@felixc sorry I meant to respond earlier before that PR landed, but there's some previous discussion which may be useful as well. The key idea is that Cargo is already a workflow tool, and the benefit is that massive swaths of the ecosystem "just work" as expected. I do agree that this means there are some surprising corner cases once odd configuration comes into effect, but the rationale is that the benefit far outweighs the downside here. Of course there's always an escape hatch to be applied as well. For example there's The idea behind reading conventional configuration is that you just don't have to reduplicate everything. Almost everyone's And yes this was intended to be closed as this basically isn't going to happen any more. If a file is committed into a repo then I think it's safe to say it's not arbitrary, and if a file is being included and isn't committed, a publish is blocked by default. |
This seems only true if you are using git. Would you accept pull requests that implement support for other VCS? |
Thanks for the info and the extra context. How does it work for users of other/unrecognised VCSs? |
Certainly!
This is basically what happens for |
Running `cargo publish` from a non-colocated repo (such as my usual repo) is currently quite scary because it uploads all non-hidden files, even if they're ignored by `.gitignore` (rust-lang/cargo#2063). I noticed this a while ago and have always run the command from a fresh clone since then. To avoid the need for that, let's use the workaround mentioned on the bug, which is to explicitly list patterns we want to publish.
Running `cargo publish` from a non-colocated repo (such as my usual repo) is currently quite scary because it uploads all non-hidden files, even if they're ignored by `.gitignore` (rust-lang/cargo#2063). I noticed this a while ago and have always run the command from a fresh clone since then. To avoid the need for that, let's use the workaround mentioned on the bug, which is to explicitly list patterns we want to publish.
Running `cargo publish` from a non-colocated repo (such as my usual repo) is currently quite scary because it uploads all non-hidden files, even if they're ignored by `.gitignore` (rust-lang/cargo#2063). I noticed this a while ago and have always run the command from a fresh clone since then. To avoid the need for that, let's use the workaround mentioned on the bug, which is to explicitly list patterns we want to publish.
Today I learned 'cargo package' includes everything in the working directory not explicitly excluded through
.gitignore
or thepackage.excludes
directive inCargo.toml
. This wasn't what I expected at all, and differs from what, for example, GNU autotools implements as best practice. I expected cargo to only package files which were part of of the build description and its dependent source.I only discovered this because I copied a few test videos into my working tree and the package was suddenly too large to upload.
In #1597 @bluss suggests a warning for untracked files. That would be an improvement, at perhaps educate people about the default, but I think that doesn't go far enough. Uploading random files from the working directory is a terrible developer experience, and
makes is very easy to leak confidential or unlicensed data.
The text was updated successfully, but these errors were encountered: