Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metrics): Add metrics buffering and sampling #821

Merged
merged 14 commits into from
Oct 29, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
**Features**:

- Rename upstream retries histogram metric and add upstream requests duration metric. ([#816](https://github.com/getsentry/relay/pull/816))
- Add options for metrics buffering (`metrics.buffering`) and sampling (`metrics.sample_rate`). ([#821](https://github.com/getsentry/relay/pull/821))

**Internal**:

Expand Down
19 changes: 15 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ clean:
# Builds

build: setup-git
@cargo +stable build --all-features
cargo +stable build --all-features
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is unrelated, can we revert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it useful to see what we're actually building, it's the same behavior we have for test and lint targets.

.PHONY: build

release: setup-git
Expand Down
3 changes: 2 additions & 1 deletion relay-common/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ publish = false

[dependencies]
backoff = "0.1.6"
cadence = "0.17.1"
cadence = "0.22.0"
chrono = "0.4.11"
failure = "0.1.8"
globset = "0.4.5"
Expand All @@ -20,6 +20,7 @@ lazycell = "1.2.1"
log = { version = "0.4.8", features = ["serde"] }
lru = "0.4.0"
parking_lot = "0.10.0"
rand = "0.7.3"
regex = "1.3.9"
sentry-types = "0.20.0"
schemars = { version = "0.8.0-alpha-4", features = ["uuid", "chrono"], optional = true }
Expand Down
78 changes: 73 additions & 5 deletions relay-common/src/metrics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@
//! # use std::collections::BTreeMap;
//! use relay_common::metrics;
//!
//! metrics::configure_statsd("myprefix", "localhost:8125", BTreeMap::new());
//! metrics::configure_statsd("myprefix", "localhost:8125", BTreeMap::new(), true, 1.0);
//! ```
//!
//! ## Macro Usage
Expand Down Expand Up @@ -60,13 +60,21 @@
//! [Metric Types]: https://github.com/statsd/statsd/blob/master/docs/metric_types.md

use std::collections::BTreeMap;
use std::net::ToSocketAddrs;
use std::net::{ToSocketAddrs, UdpSocket};
use std::ops::{Deref, DerefMut};
use std::sync::Arc;

use cadence::{Metric, MetricBuilder, StatsdClient};
use cadence::{
BufferedUdpMetricSink, Metric, MetricBuilder, QueuingMetricSink, StatsdClient, UdpMetricSink,
};
use lazy_static::lazy_static;
use parking_lot::RwLock;
use rand::distributions::{Distribution, Uniform};

use crate::LogError;

/// Maximum number of metric events that can be queued before we start dropping them
const METRICS_MAX_QUEUE_SIZE: usize = 100_000;

/// Client configuration object to store globally.
#[derive(Debug)]
Expand All @@ -75,6 +83,8 @@ pub struct MetricsClient {
pub statsd_client: StatsdClient,
/// Default tags to apply to every metric
pub default_tags: BTreeMap<String, String>,
/// Global sample rate
pub sample_rate: f32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: After merging, this needs to be added to docs as well.

}

impl Deref for MetricsClient {
Expand All @@ -98,11 +108,37 @@ impl MetricsClient {
where
T: Metric + From<String>,
{
if !self._should_send() {
return;
}

for (k, v) in &self.default_tags {
metric = metric.with_tag(k, v);
}

metric.send();
if let Err(error) = metric.try_send() {
log::error!(
"Error sending a metric: {}, maximum capacity: {}",
LogError(&error),
METRICS_MAX_QUEUE_SIZE
);
};
}

fn _should_send(&self) -> bool {
if self.sample_rate <= 0.0 {
false
} else if self.sample_rate >= 1.0 {
true
} else {
// Using thread local RNG and uniform distribution here because Rng::gen_range is
// "optimized for the case that only a single sample is made from the given range".
// See https://docs.rs/rand/0.7.3/rand/distributions/uniform/struct.Uniform.html for more
// details.
let mut rng = rand::thread_rng();
RNG_UNIFORM_DISTRIBUTION
.with(|uniform_dist| uniform_dist.sample(&mut rng) <= self.sample_rate)
}
}
}

Expand All @@ -112,6 +148,7 @@ lazy_static! {

thread_local! {
static CURRENT_CLIENT: Option<Arc<MetricsClient>> = METRICS_CLIENT.read().clone();
static RNG_UNIFORM_DISTRIBUTION: Uniform<f32> = Uniform::new(0.0, 1.0);
}

/// Internal prelude for the macro
Expand Down Expand Up @@ -140,15 +177,46 @@ pub fn configure_statsd<A: ToSocketAddrs>(
prefix: &str,
host: A,
default_tags: BTreeMap<String, String>,
buffering: bool,
sample_rate: f32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is now sufficiently complex to become a builder. I'm happy to make the changes in a follow-up, since I'd like to move all metrics and logging related concerns from relay-common into a separate crate anyway.

) {
let addrs: Vec<_> = host.to_socket_addrs().unwrap().collect();
if !addrs.is_empty() {
log::info!("reporting metrics to statsd at {}", addrs[0]);
}
let statsd_client = StatsdClient::from_udp_host(prefix, &addrs[..]).unwrap();

// Normalize sample_rate
let sample_rate = sample_rate.max(0.).min(1.);
log::debug!(
"metrics sample rate is set to {}{}",
sample_rate,
if sample_rate == 0.0 {
", no metrics will be reported"
} else {
""
}
);

let socket = UdpSocket::bind("0.0.0.0:0").unwrap();
socket.set_nonblocking(true).unwrap();

let statsd_client = if buffering {
let udp_sink = BufferedUdpMetricSink::from(host, socket).unwrap();
let queuing_sink = QueuingMetricSink::with_capacity(udp_sink, METRICS_MAX_QUEUE_SIZE);
StatsdClient::from_sink(prefix, queuing_sink)
} else {
let simple_sink = UdpMetricSink::from(host, socket).unwrap();
StatsdClient::from_sink(prefix, simple_sink)
};
log::debug!(
"metrics buffering is {}",
if buffering { "enabled" } else { "disabled" }
);

set_client(MetricsClient {
statsd_client,
default_tags,
sample_rate,
});
}

Expand Down
18 changes: 18 additions & 0 deletions relay-config/src/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -424,6 +424,12 @@ struct Metrics {
default_tags: BTreeMap<String, String>,
/// A tag name to report the hostname to, for each metric. Defaults to not sending such a tag.
hostname_tag: Option<String>,
/// If set to true, emitted metrics will be buffered to optimize performance.
/// Defaults to true.
buffering: bool,
/// Global sample rate for all emitted metrics. Should be between 0.0 and 1.0.
/// For example, the value of 0.3 means that only 30% of the emitted metrics will be sent.
sample_rate: f32,
}

impl Default for Metrics {
Expand All @@ -433,6 +439,8 @@ impl Default for Metrics {
prefix: "sentry.relay".into(),
default_tags: BTreeMap::new(),
hostname_tag: None,
buffering: true,
sample_rate: 1.0,
}
}
}
Expand Down Expand Up @@ -1247,6 +1255,16 @@ impl Config {
self.values.metrics.hostname_tag.as_deref()
}

/// Returns true if metrics buffering is enabled, false otherwise.
pub fn metrics_buffering(&self) -> bool {
self.values.metrics.buffering
}

/// Returns the global sample rate for all metrics.
pub fn metrics_sample_rate(&self) -> f32 {
self.values.metrics.sample_rate
}

/// Returns the default timeout for all upstream HTTP requests.
pub fn http_timeout(&self) -> Duration {
Duration::from_secs(self.values.http.timeout.into())
Expand Down
8 changes: 7 additions & 1 deletion relay/src/setup.rs
Original file line number Diff line number Diff line change
Expand Up @@ -214,7 +214,13 @@ pub fn init_metrics(config: &Config) -> Result<(), Error> {
default_tags.insert(hostname_tag.to_owned(), hostname);
}
}
metrics::configure_statsd(config.metrics_prefix(), &addrs[..], default_tags);
metrics::configure_statsd(
config.metrics_prefix(),
&addrs[..],
default_tags,
config.metrics_buffering(),
config.metrics_sample_rate(),
);

Ok(())
}