misc: replaced otel auto instrumentation with manual #3176
Walkthrough
The pull request updates the telemetry instrumentation configuration in the backend. It modifies the dependencies in the […]

Warning
There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 ESLint
backend/src/lib/telemetry/instrumentation.ts
Oops! Something went wrong! :(
ESLint: 8.57.1
ESLint couldn't find the plugin "@typescript-eslint/eslint-plugin". (The package "@typescript-eslint/eslint-plugin" was not found when loaded as a Node module from the directory "/backend".) It's likely that the plugin isn't installed correctly.
The plugin "@typescript-eslint/eslint-plugin" was referenced from the config file in "backend/.eslintrc.js". If you still can't figure out the problem, please stop by https://eslint.org/chat/help to chat with the team.
Actionable comments posted: 0
🧹 Nitpick comments (3)
backend/src/lib/telemetry/instrumentation.ts (3)
71-73: Add error handling for host metrics initialization

The host metrics initialization lacks error handling. If the `hostMetrics.start()` call fails, it could potentially crash the application.

Consider adding error handling around the initialization:

```diff
- const hostMetrics = new HostMetrics({ meterProvider });
- hostMetrics.start();
+ try {
+   const hostMetrics = new HostMetrics({ meterProvider });
+   hostMetrics.start();
+   console.log("Host metrics collection started successfully");
+ } catch (error) {
+   console.error("Failed to initialize host metrics:", error);
+   // Continue application execution even if metrics collection fails
+ }
```
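The same guard can be factored into a small reusable helper. The following is a minimal sketch, not the project's implementation: `Startable` and `startSafely` are hypothetical names introduced here, and `Startable` only stands in for any collector that exposes a `start()` method, as `HostMetrics` does in the snippet above.

```typescript
// Hypothetical stand-in for any collector that exposes start(),
// such as the HostMetrics instance in the review comment.
interface Startable {
  start(): void;
}

// Creates and starts an optional telemetry component.
// Returns false (and logs) instead of throwing, so the caller keeps running.
function startSafely(
  name: string,
  create: () => Startable,
  log: (message: string) => void = console.error
): boolean {
  try {
    create().start();
    return true;
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    log(`Failed to start ${name}: ${reason}`);
    return false;
  }
}
```

Under these assumptions the call site would read `startSafely("host metrics", () => new HostMetrics({ meterProvider }))`, which also covers a failure in the constructor and not only in `start()`.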
76-78: Consider configuring HTTP instrumentation options

The HTTP instrumentation is initialized with default options, which may capture more data than necessary. To further optimize Prometheus ingestion, consider configuring the HTTP instrumentation with specific options.

Consider adding configuration options to further control what gets captured:

```diff
- registerInstrumentations({
-   instrumentations: [new HttpInstrumentation()]
- });
+ registerInstrumentations({
+   instrumentations: [
+     new HttpInstrumentation({
+       // Ignore health check endpoints to reduce noise
+       ignoreIncomingRequestHook: (request) => ["/health", "/metrics"].includes(request.url ?? ""),
+       // Record the status code attribute only for error responses
+       responseHook: (span, response) => {
+         const status = response.statusCode;
+         if (status !== undefined && status >= 400) {
+           span.setAttribute("http.status_code", status);
+         }
+       }
+     })
+   ]
+ });
```
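Matching raw request URLs against a fixed list misses variants such as `/health?probe=1` or `/metrics/`. The sketch below shows one way to normalize the path before matching. `IGNORED_PATHS` and `shouldIgnorePath` are hypothetical names, the two paths are the ones named in the comment above, and wiring the predicate into a hook such as `ignoreIncomingRequestHook` assumes the installed `@opentelemetry/instrumentation-http` version exposes that option.

```typescript
// Paths to skip; taken from the review comment, not from the project's route table.
const IGNORED_PATHS = new Set<string>(["/health", "/metrics"]);

// Returns true when the request URL points at an ignored endpoint.
// Query strings, fragments, and a single trailing slash are disregarded.
function shouldIgnorePath(rawUrl: string | undefined): boolean {
  if (!rawUrl) return false;
  // Keep only the part before the first '?' or '#'.
  const path = rawUrl.split(/[?#]/, 1)[0] ?? "";
  // Treat "/health/" the same as "/health", but leave "/" untouched.
  const normalized = path.length > 1 && path.endsWith("/") ? path.slice(0, -1) : path;
  return IGNORED_PATHS.has(normalized);
}
```

A hook could then delegate to it, for example `(request) => shouldIgnorePath(request.url)`. Exact set membership is used deliberately so that `/healthz` or `/metrics-export` are not skipped by accident.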
1-107: Consider adding an explicit shutdown mechanism for telemetry

The current implementation initializes telemetry but doesn't provide a clean way to shut it down when the application terminates, which could lead to missing final metrics or errors during shutdown.

Consider adding a shutdown function that can be called during application termination:

```diff
 const initTelemetryInstrumentation = ({
   // ... existing parameters
 }) => {
   // ... existing code

   opentelemetry.metrics.setGlobalMeterProvider(meterProvider);

   registerInstrumentations({
     instrumentations: [new HttpInstrumentation()]
   });
+
+  // Return objects that need to be shut down
+  return { meterProvider, hostMetrics };
 };

+// Export a shutdown function to be called on application termination
+export const shutdownTelemetry = (telemetryObjects?: { meterProvider?: MeterProvider; hostMetrics?: HostMetrics }) => {
+  if (!telemetryObjects) return;
+
+  console.log("Shutting down telemetry...");
+  if (telemetryObjects.hostMetrics) {
+    try {
+      telemetryObjects.hostMetrics.shutdown();
+    } catch (error) {
+      console.error("Error shutting down host metrics:", error);
+    }
+  }
+
+  if (telemetryObjects.meterProvider) {
+    try {
+      telemetryObjects.meterProvider.shutdown()
+        .then(() => console.log("Meter provider shut down successfully"))
+        .catch((error) => console.error("Error shutting down meter provider:", error));
+    } catch (error) {
+      console.error("Error initiating meter provider shutdown:", error);
+    }
+  }
+};

+let telemetryObjects: ReturnType<typeof initTelemetryInstrumentation> | undefined;
+
 const setupTelemetry = () => {
   const appCfg = initEnvConfig();

   if (appCfg.OTEL_TELEMETRY_COLLECTION_ENABLED) {
     console.log("Initializing telemetry instrumentation");
-    initTelemetryInstrumentation({
+    telemetryObjects = initTelemetryInstrumentation({
       otlpURL: appCfg.OTEL_EXPORT_OTLP_ENDPOINT,
       otlpUser: appCfg.OTEL_COLLECTOR_BASIC_AUTH_USERNAME,
       otlpPassword: appCfg.OTEL_COLLECTOR_BASIC_AUTH_PASSWORD,
       otlpPushInterval: appCfg.OTEL_OTLP_PUSH_INTERVAL,
       exportType: appCfg.OTEL_EXPORT_TYPE
     });
   }

   // ... existing Datadog tracer code
 };

 void setupTelemetry();

+// Register shutdown handler
+process.on('SIGTERM', () => {
+  console.log('SIGTERM signal received. Shutting down telemetry...');
+  shutdownTelemetry(telemetryObjects);
+});
+
+process.on('SIGINT', () => {
+  console.log('SIGINT signal received. Shutting down telemetry...');
+  shutdownTelemetry(telemetryObjects);
+});
```
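If both SIGTERM and SIGINT arrive, handlers like the ones above would start the shutdown twice. One way to guard against that is to make the shutdown idempotent. This is a minimal sketch under stated assumptions: `Shutdownable` and `createShutdownOnce` are hypothetical names, and `Shutdownable` models only the promise-returning `shutdown()` that the snippet above calls on the meter provider.

```typescript
// Hypothetical minimal shape of a component with an asynchronous shutdown.
interface Shutdownable {
  shutdown(): Promise<void>;
}

// Returns a function that shuts every target down at most once.
// Repeated calls (for example from two signal handlers) share the same promise.
function createShutdownOnce(
  targets: Shutdownable[],
  log: (message: string) => void = console.error
): () => Promise<void> {
  let pending: Promise<void> | undefined;
  return () => {
    if (!pending) {
      pending = Promise.all(
        targets.map(async (target) => {
          try {
            await target.shutdown();
          } catch (error) {
            // One failing component must not prevent the others from flushing.
            log(`Telemetry shutdown error: ${String(error)}`);
          }
        })
      ).then(() => undefined);
    }
    return pending;
  };
}
```

The returned function can be registered for both signals, for instance `process.once("SIGTERM", () => void shutdownOnce())`, and the caller can await the promise before exiting so that the final export has a chance to complete.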
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- backend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (2)
- backend/package.json (1 hunks)
- backend/src/lib/telemetry/instrumentation.ts (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Check TS and Lint
- GitHub Check: Run integration test
🔇 Additional comments (4)
backend/package.json (2)
158-159: Good choice of focused instrumentation libraries

Replacing the generic `@opentelemetry/auto-instrumentations-node` with specific instrumentation libraries (`@opentelemetry/host-metrics` and `@opentelemetry/instrumentation-http`) aligns well with the PR objective to reduce Prometheus ingestion and lower hosting costs. This targeted approach allows for more control over which metrics are collected.
157-157: ❓ Verification inconclusive

Ensure compatibility with existing OpenTelemetry packages

There's a version mismatch between `@opentelemetry/host-metrics` (^0.35.5) and other OpenTelemetry packages that are at much higher versions (^0.55.0, ^0.57.2, ^1.x.x). This could lead to compatibility issues.

🏁 Script executed:

```shell
#!/bin/bash
# Check compatibility between OpenTelemetry packages
echo "Checking OpenTelemetry package versions..."
grep -A 10 -B 10 "@opentelemetry/" package.json | grep -E "@opentelemetry/.*\".*"

# Compare with npm compatibility documentation
echo -e "\nChecking latest versions and compatibility information..."
npm view @opentelemetry/host-metrics version
npm view @opentelemetry/sdk-metrics version
npm view @opentelemetry/instrumentation-http version
```

Length of output: 475
Attention: Verify OpenTelemetry Package Compatibility

The current dependency configuration shows that `@opentelemetry/host-metrics` is at version `0.35.5` (which is the latest published version per npm), while other OpenTelemetry packages such as `@opentelemetry/sdk-metrics` (at `1.30.1`) and `@opentelemetry/instrumentation-http` (at `0.57.2`) are on higher version lines. This version discrepancy may lead to compatibility issues if these packages expect aligned or compatible versions.

Action Items:
- Confirm whether the differing version numbers between these packages are supported according to OpenTelemetry’s compatibility guidelines.
- If incompatibilities are identified, consider aligning the dependency versions or document the rationale for using distinct versions.
- Ensure any integration tests verify that the mixed versions do not introduce runtime issues.
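To make the version lines easier to compare during such a check, the dependencies can be grouped by their `major.minor` release line. The following is a rough sketch: `groupByReleaseLine` is a hypothetical helper, the input is assumed to be the `dependencies` object of a `package.json`, and a differing release line is only a prompt for a closer look, not proof of incompatibility.

```typescript
// Groups scoped packages by the major.minor part of their declared version range.
// Ranges without a recognizable "x.y" (for example "latest") land under "unknown".
function groupByReleaseLine(
  deps: Record<string, string>,
  scope: string = "@opentelemetry/"
): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const [name, range] of Object.entries(deps)) {
    if (!name.startsWith(scope)) continue;
    const match = /(\d+)\.(\d+)/.exec(range);
    const line = match ? `${match[1]}.${match[2]}` : "unknown";
    const names = groups.get(line) ?? [];
    names.push(name);
    groups.set(line, names);
  }
  return groups;
}

// Illustrative input using the version numbers quoted in the comment above;
// the real package.json may declare different ranges.
const sampleDeps: Record<string, string> = {
  "@opentelemetry/host-metrics": "^0.35.5",
  "@opentelemetry/instrumentation-http": "^0.57.2",
  "@opentelemetry/sdk-metrics": "^1.30.1",
  "some-unrelated-package": "^4.0.0"
};

const releaseLines = groupByReleaseLine(sampleDeps);
```

Each key of `releaseLines` is one release line, so packages that sit alone on a line stand out and can be checked individually against their declared peer dependencies.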
backend/src/lib/telemetry/instrumentation.ts (2)
4-4: Appropriate imports for manual instrumentation

The imports for `HostMetrics` and `HttpInstrumentation` align with the package.json changes and support the move from auto to manual instrumentation.

Also applies to: 6-6
84-93: ❓ Verification inconclusive

Verify telemetry configuration in production environment

The PR aims to reduce Prometheus ingestion. After deployment, verify that the metrics volume has indeed decreased with these changes.

🏁 Script executed:

```shell
#!/bin/bash
# Note: This script demonstrates the verification approach but requires access to your Prometheus/monitoring system

echo "To verify the impact of these changes after deployment:"
echo "1. Compare the rate of metrics ingestion before and after the change"
echo "2. Check that essential metrics are still being collected"
echo "3. Verify no unusual errors in the logs related to telemetry"

# Example commands to run in your monitoring environment:
# - Check total number of metrics: 'count(count by (__name__)({job="infisical-core"}))'
# - Compare ingestion rate: 'rate(prometheus_tsdb_head_samples_appended_total[1h])'
```

Length of output: 523
Action: Verify Telemetry Metrics Post Deployment

The telemetry instrumentation in `backend/src/lib/telemetry/instrumentation.ts` (lines 84–93) is correctly gated by the `appCfg.OTEL_TELEMETRY_COLLECTION_ENABLED` flag and passes the proper configuration settings to `initTelemetryInstrumentation()`. As a reminder, after deployment please ensure that:
- The total metrics ingestion rate has decreased: Compare the rate using queries (e.g., checking `count(count by (__name__)({job="infisical-core"}))` or using `rate(prometheus_tsdb_head_samples_appended_total[1h])`).
- Essential metrics continue to be collected: Confirm that no critical telemetry data is missing.
- No unexpected telemetry errors appear in logs.

These steps will verify that the changes aimed at reducing Prometheus ingestion are effective without compromising necessary telemetry monitoring.
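The before/after comparison comes down to a relative change between two readings of the same query. A minimal sketch of that arithmetic follows; `percentReduction` is a hypothetical helper and the numbers in the usage note are made up for illustration, not measurements from this deployment.

```typescript
// Relative reduction, in percent, between a baseline reading and a later reading.
// A negative result means the value went up rather than down.
function percentReduction(before: number, after: number): number {
  if (!(before > 0)) {
    throw new RangeError("before must be a positive number");
  }
  return ((before - after) / before) * 100;
}
```

For example, if the series-count query returned 2000 before the change and 500 afterwards, `percentReduction(2000, 500)` evaluates to 75, meaning three quarters of the series are no longer ingested.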