-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust: Add regular expression injection query #18946
base: main
Are you sure you want to change the base?
Conversation
QHelp previews: rust/ql/src/queries/security/CWE-730/RegexInjection.qhelpRegular expression injectionConstructing a regular expression with unsanitized user input can be dangerous. A malicious user may be able to modify the meaning of the expression causing it to match unexpected strings and to construct large regular expressions by using counted repetitions. RecommendationBefore embedding user input into a regular expression, escape the input string using a function such as regex::escape to escape meta-characters that have special meaning. If purposefully supporting user supplied regular expressions, then use RegexBuilder::size_limit to limit the pattern size such that it is no larger than necessary. ExampleThe following example construct a regular expressions from the user input use regex::Regex;
fn get_value<'h>(key: &str, property: &'h str) -> Option<&'h str> {
// BAD: User provided `key` is interpolated into the regular expression.
let pattern = format!(r"^property:{key}=(.*)$");
let re = Regex::new(&pattern).unwrap();
re.captures(property)?.get(1).map(|m| m.as_str())
} The regular expression is intended to match strings starting with If user input is used to construct a regular expression it should be escaped first. This ensures that the user cannot insert characters that have special meanings in regular expressions. use regex::{escape, Regex};
fn get_value<'h>(key: &str, property: &'h str) -> option<&'h str> {
// GOOD: User input is escaped before being used in the regular expression.
let escaped_key = escape(key);
let pattern = format!(r"^property:{escaped_key}=(.*)$");
let re = regex::new(&pattern).unwrap();
re.captures(property)?.get(1).map(|m| m.as_str())
} References
|
49b18d5
to
14722dc
Compare
14722dc
to
494f914
Compare
* A taint configuration for detecting regular expression injection vulnerabilities. | ||
*/ | ||
module RegexInjectionConfig implements DataFlow::ConfigSig { | ||
predicate isSource(DataFlow::Node source) { source instanceof ThreatModelSource } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not fully looked into this, but I would've though this should have been ActiveThreatModelSource
, but that made the test results disappear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should indeed be ActiveThreatModelSource
, but then your tests will have to use a remote rather than a local source. OR we could turn on local sources as active threat models for tests. I've thought about doing the latter in the past and I do have instructions for how to do it somewhere.
pack: codeql/rust-all | ||
extensible: summaryModel | ||
data: | ||
- ["repo:https://github.com/rust-lang/regex:regex", "crate::escape", "Argument[0].Reference", "ReturnValue", "taint", "manual"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This model is accurate, but I mostly added it to make sure that the barrier actually worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! 🎉
We will need to check if there are any results in a DCA run, and I'm very interested to get an idea of how many sinks there are (regardless of results) as an indication of how much regex usage we are seeing.
<code>"property"</code> such as <code>"property:foo=bar"</code>. However, a | ||
malicious user might inject the regular expression <code>".*^|key"</code> and | ||
unexpectedly cause strings such as <code>"key=secret"</code> to match. | ||
</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great example! I've added the ready-for-doc-review
tag so this PR to attract a member of the docs team for further review of the .qhelp
.
*/ | ||
abstract class RegexInjectionBarrier extends DataFlow::Node { } | ||
|
||
/** A sink for `a` in `Regex::new(a)` when `a` is not a literal. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any benefit to defining this sink in QL rather than models-as-data?
/** | ||
* Provides a taint-tracking configuration for detecting regular expression | ||
* injection vulnerabilities. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're generally going for a simpler two-files-per-query setup in Rust:
lib/.../FooExtensions.qll
src/.../Foo.ql
rather than three:
lib/.../FooExtensions.qll
lib/.../FooQuery.qll [this file]
src/.../Foo.ql
Before we had query inline expectations tests, the latter pattern was useful as the taint config in FooQuery.qll
could be used to define a (non-query) inline expectations test. These days there's less benefit to not just defining the taint configuration in the .ql
, which I like because there are fewer files to navigate between.
So I'd prefer we combined this file into the .ql
, though I don't feel particularly strongly if you'd rather we worry about things more consistent with other queries later on.
* A taint configuration for detecting regular expression injection vulnerabilities. | ||
*/ | ||
module RegexInjectionConfig implements DataFlow::ConfigSig { | ||
predicate isSource(DataFlow::Node source) { source instanceof ThreatModelSource } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should indeed be ActiveThreatModelSource
, but then your tests will have to use a remote rather than a local source. OR we could turn on local sources as active threat models for tests. I've thought about doing the latter in the past and I do have instructions for how to do it somewhere.
* @id rust/regex-injection | ||
* @tags security | ||
* external/cwe/cwe-730 | ||
* external/cwe/cwe-400 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put this query under CWE-730 as that's where the Swift one was. But given that the Rust regex implementation is robust regarding DOS attacks, I think we should find another directory for it?
How about:
- CWE-624: Executable Regular Expression Error (which sounds right, but contains hints that it might be talking about something much more specific; sadly lacking examples)
- a generic CWE like CWE-74: Improper Neutralization of Special Elements in Output Used by a Downstream Component or CWE-20: Improper Input Validation.
- leave it how it is? As you noted in the
.qhelp
, a more limited DOS attack is still possible, and leaving it as CWE-730 is nicely consistent with other languages.
Note that if we change the CWE tags, we need to regenerate the @security-severity
value as well. I can talk you through how to do that.
Note:
CWE-730
as that's where the Swift one was. But given that the Rust regex implementation is robust regarding DOS attacks, I think we should find another directory for it?