misc: replaced otel auto instrumentation with manual #3176

Merged
2 commits merged into main on Mar 4, 2025

Conversation

sheensantoscapadngan (Member) commented Mar 4, 2025

Description 📣

  • This PR removes OTEL auto-instrumentation and replaces it with manual registration of only the instrumentations we need. This should reduce Prometheus ingestion volume and, in turn, hosting costs.

Type ✨

  • Bug fix
  • New feature
  • Improvement
  • Breaking change
  • Documentation

Tests 🛠️

# Here's some code block to paste some code snippets

Summary by CodeRabbit

  • New Features
    • Enhanced telemetry capabilities by focusing on HTTP instrumentation for improved monitoring of network requests.
    • Upgraded HTTP monitoring to deliver more accurate insights into network performance.

@sheensantoscapadngan sheensantoscapadngan marked this pull request as ready for review March 4, 2025 16:19

coderabbitai bot commented Mar 4, 2025

Walkthrough

The pull request updates the telemetry instrumentation configuration in the backend. It modifies the dependencies in the backend/package.json file by removing the @opentelemetry/auto-instrumentations-node package (version ^0.53.0) and adding the @opentelemetry/instrumentation-http package (version ^0.57.2). Additionally, the telemetry code in backend/src/lib/telemetry/instrumentation.ts has been updated to remove the import for automatic Node.js instrumentations and replace it with an import for HttpInstrumentation. The initTelemetryInstrumentation function has also been modified to register HttpInstrumentation instead of the previous automatic instrumentations. These changes reflect a shift towards more focused HTTP instrumentation rather than broader automatic monitoring for Node.js applications.

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

backend/src/lib/telemetry/instrumentation.ts

Oops! Something went wrong! :(

ESLint: 8.57.1

ESLint couldn't find the plugin "@typescript-eslint/eslint-plugin".

(The package "@typescript-eslint/eslint-plugin" was not found when loaded as a Node module from the directory "/backend".)

It's likely that the plugin isn't installed correctly. Try reinstalling by running the following:

npm install @typescript-eslint/eslint-plugin@latest --save-dev

The plugin "@typescript-eslint/eslint-plugin" was referenced from the config file in "backend/.eslintrc.js".

If you still can't figure out the problem, please stop by https://eslint.org/chat/help to chat with the team.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f7b42a and e42f860.

⛔ Files ignored due to path filters (1)
  • backend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (2)
  • backend/package.json (1 hunks)
  • backend/src/lib/telemetry/instrumentation.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/package.json
  • backend/src/lib/telemetry/instrumentation.ts
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Run integration test
  • GitHub Check: Check TS and Lint


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
backend/src/lib/telemetry/instrumentation.ts (3)

71-73: Add error handling for host metrics initialization

The host metrics initialization lacks error handling. If the hostMetrics.start() call fails, it could potentially crash the application.

Consider adding error handling around the initialization:

-  const hostMetrics = new HostMetrics({ meterProvider });
-  hostMetrics.start();
+  try {
+    const hostMetrics = new HostMetrics({ meterProvider });
+    hostMetrics.start();
+    console.log("Host metrics collection started successfully");
+  } catch (error) {
+    console.error("Failed to initialize host metrics:", error);
+    // Continue application execution even if metrics collection fails
+  }
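
The try/catch suggested above could also be factored into a small reusable helper, so that every optional telemetry component is started the same way. The following is only a sketch: `startSafely` and its `Logger` type are hypothetical names introduced here for illustration, not part of this PR or of any OpenTelemetry package.

```typescript
// Hypothetical logger signature; defaults to console.error below.
type Logger = (message: string, error?: unknown) => void;

// Hypothetical helper: runs an optional start-up step and reports whether it
// succeeded, so a telemetry failure never takes the application down.
function startSafely(
  name: string,
  start: () => void,
  logError: Logger = (message, error) => console.error(message, error)
): boolean {
  try {
    start();
    return true;
  } catch (error) {
    logError(`Failed to initialize ${name}:`, error);
    return false;
  }
}

// Usage sketch (assumes `hostMetrics` was constructed as in the diff above):
// startSafely("host metrics", () => hostMetrics.start());
```

Returning a boolean rather than rethrowing leaves it to the caller to decide whether a failed component is worth reporting elsewhere.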

76-78: Consider configuring HTTP instrumentation options

The HTTP instrumentation is initialized with default options, which may capture more data than necessary. To further optimize Prometheus ingestion, consider configuring the HTTP instrumentation with specific options.

Consider adding configuration options to further control what gets captured:

-  registerInstrumentations({
-    instrumentations: [new HttpInstrumentation()]
-  });
+  registerInstrumentations({
+    instrumentations: [
+      new HttpInstrumentation({
+        // Ignore health check and metrics endpoints to reduce noise
+        ignoreIncomingRequestHook: (request) =>
+          ["/health", "/metrics"].includes((request.url ?? "").split("?")[0]),
+        // Record the status code as an attribute only on error responses
+        responseHook: (span, response) => {
+          const statusCode = response.statusCode ?? 0;
+          if (statusCode >= 400) {
+            span.setAttribute("http.status_code", statusCode);
+          }
+        }
+      })
+    ]
+  });
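
The path filter in this suggestion can be pulled out into a plain predicate and unit-tested without loading OpenTelemetry at all. The sketch below assumes the hook-style option `ignoreIncomingRequestHook`, which recent releases of @opentelemetry/instrumentation-http expose as far as I know; `IGNORED_PATHS` and `shouldIgnoreRequest` are hypothetical names, and the path list is illustrative rather than taken from the codebase.

```typescript
// Paths whose requests should not produce telemetry. Illustrative values only.
const IGNORED_PATHS: ReadonlySet<string> = new Set(["/health", "/metrics"]);

// Hypothetical predicate: true when the request path (query string stripped)
// is in the ignore list. `url` mirrors IncomingMessage.url, which may be
// undefined.
function shouldIgnoreRequest(url: string | undefined): boolean {
  if (!url) return false;
  const queryStart = url.indexOf("?");
  const path = queryStart === -1 ? url : url.slice(0, queryStart);
  return IGNORED_PATHS.has(path);
}

// Wiring sketch, assuming the hook-based option is available:
// new HttpInstrumentation({
//   ignoreIncomingRequestHook: (request) => shouldIgnoreRequest(request.url)
// });
```

Matching on the exact path avoids accidentally dropping unrelated routes that merely share a prefix, such as `/healthz`.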

1-107: Consider adding an explicit shutdown mechanism for telemetry

The current implementation initializes telemetry but doesn't provide a clean way to shut it down when the application terminates, which could lead to missing final metrics or errors during shutdown.

Consider adding a shutdown function that can be called during application termination:

 const initTelemetryInstrumentation = ({
   // ... existing parameters
 }) => {
   // ... existing code

   opentelemetry.metrics.setGlobalMeterProvider(meterProvider);

   registerInstrumentations({
     instrumentations: [new HttpInstrumentation()]
   });
+  
+  // Return objects that need to be shut down
+  return { meterProvider, hostMetrics };
 };

+// Export a shutdown function to be called on application termination
+export const shutdownTelemetry = (telemetryObjects?: { meterProvider?: MeterProvider; hostMetrics?: HostMetrics }) => {
+  if (!telemetryObjects) return;
+  
+  console.log("Shutting down telemetry...");
+  if (telemetryObjects.hostMetrics) {
+    try {
+      telemetryObjects.hostMetrics.shutdown();
+    } catch (error) {
+      console.error("Error shutting down host metrics:", error);
+    }
+  }
+  
+  if (telemetryObjects.meterProvider) {
+    try {
+      telemetryObjects.meterProvider.shutdown()
+        .then(() => console.log("Meter provider shut down successfully"))
+        .catch((error) => console.error("Error shutting down meter provider:", error));
+    } catch (error) {
+      console.error("Error initiating meter provider shutdown:", error);
+    }
+  }
+};

+let telemetryObjects: ReturnType<typeof initTelemetryInstrumentation> | undefined;
+
 const setupTelemetry = () => {
   const appCfg = initEnvConfig();

   if (appCfg.OTEL_TELEMETRY_COLLECTION_ENABLED) {
     console.log("Initializing telemetry instrumentation");
-    initTelemetryInstrumentation({
+    telemetryObjects = initTelemetryInstrumentation({
       otlpURL: appCfg.OTEL_EXPORT_OTLP_ENDPOINT,
       otlpUser: appCfg.OTEL_COLLECTOR_BASIC_AUTH_USERNAME,
       otlpPassword: appCfg.OTEL_COLLECTOR_BASIC_AUTH_PASSWORD,
       otlpPushInterval: appCfg.OTEL_OTLP_PUSH_INTERVAL,
       exportType: appCfg.OTEL_EXPORT_TYPE
     });
   }

   // ... existing Datadog tracer code
 };

 void setupTelemetry();

+// Register shutdown handler
+process.on('SIGTERM', () => {
+  console.log('SIGTERM signal received. Shutting down telemetry...');
+  shutdownTelemetry(telemetryObjects);
+});
+
+process.on('SIGINT', () => {
+  console.log('SIGINT signal received. Shutting down telemetry...');
+  shutdownTelemetry(telemetryObjects);
+});
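
One thing the handlers above do not guard against is SIGTERM and SIGINT arriving back to back, which would run the shutdown twice. A minimal sketch of an idempotent wrapper follows; `createShutdownOnce` and `ShutdownTask` are hypothetical names, not part of this PR, and the helper deliberately knows nothing about OpenTelemetry.

```typescript
type ShutdownTask = () => void | Promise<void>;

// Hypothetical helper: returns a function that runs every task at most once,
// even if several signals arrive, and whose promise never rejects.
function createShutdownOnce(tasks: ShutdownTask[]): () => Promise<void> {
  let inFlight: Promise<void> | undefined;

  return () => {
    if (!inFlight) {
      inFlight = Promise.all(
        tasks.map(async (task) => {
          try {
            await task();
          } catch (error) {
            // A failing step is logged but does not block the other steps.
            console.error("Telemetry shutdown step failed:", error);
          }
        })
      ).then(() => undefined);
    }
    return inFlight;
  };
}

// Usage sketch (assumes `meterProvider` is the provider created during init):
// const shutdown = createShutdownOnce([() => meterProvider.shutdown()]);
// process.once("SIGTERM", () => void shutdown());
// process.once("SIGINT", () => void shutdown());
```

Repeated calls return the same promise, so a caller that wants to exit the process afterwards can await it regardless of which signal fired first.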
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3045477 and 9f7b42a.

⛔ Files ignored due to path filters (1)
  • backend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (2)
  • backend/package.json (1 hunks)
  • backend/src/lib/telemetry/instrumentation.ts (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Check TS and Lint
  • GitHub Check: Run integration test
🔇 Additional comments (4)
backend/package.json (2)

158-159: Good choice of focused instrumentation libraries

Replacing the generic @opentelemetry/auto-instrumentations-node with specific instrumentation libraries (@opentelemetry/host-metrics and @opentelemetry/instrumentation-http) aligns well with the PR objective to reduce Prometheus ingestion and lower hosting costs. This targeted approach allows for more control over which metrics are collected.


157-157:

❓ Verification inconclusive

Ensure compatibility with existing OpenTelemetry packages

There's a version mismatch between @opentelemetry/host-metrics (^0.35.5) and other OpenTelemetry packages that are at much higher versions (^0.55.0, ^0.57.2, ^1.x.x). This could lead to compatibility issues.


🏁 Script executed:

#!/bin/bash
# Check compatibility between OpenTelemetry packages

echo "Checking OpenTelemetry package versions..."
grep -A 10 -B 10 "@opentelemetry/" package.json | grep -E "@opentelemetry/.*\".*"

# Compare with npm compatibility documentation
echo -e "\nChecking latest versions and compatibility information..."
npm view @opentelemetry/host-metrics version
npm view @opentelemetry/sdk-metrics version
npm view @opentelemetry/instrumentation-http version

Length of output: 475


Attention: Verify OpenTelemetry Package Compatibility

The current dependency configuration shows that @opentelemetry/host-metrics is declared as ^0.35.5 (0.35.5 being the latest published version per npm), while other OpenTelemetry packages such as @opentelemetry/sdk-metrics (at 1.30.1) and @opentelemetry/instrumentation-http (at 0.57.2) are on higher version lines. This version discrepancy may lead to compatibility issues if these packages expect aligned or compatible versions.

Action Items:

  • Confirm whether the differing version numbers between these packages are supported according to OpenTelemetry’s compatibility guidelines.
  • If incompatibilities are identified, consider aligning the dependency versions or document the rationale for using distinct versions.
  • Ensure any integration tests verify that the mixed versions do not introduce runtime issues.
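
As a starting point for the first action item, the declared ranges can be grouped by major version to make the mixed release lines visible. This is a sketch only: `groupOtelDepsByMajor` is a hypothetical helper, and the manifest in the usage note contains just the three packages and versions mentioned in this comment. Different version numbers do not by themselves prove an incompatibility, since OpenTelemetry JS publishes stable, experimental, and contrib packages on separate version lines; the packages' declared peer dependencies are the thing to check.

```typescript
interface PackageManifest {
  dependencies?: Record<string, string>;
}

// Hypothetical helper: groups @opentelemetry/* dependency ranges by the first
// number in the range (the major version), e.g. "0.x" or "1.x".
function groupOtelDepsByMajor(manifest: PackageManifest): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const [name, range] of Object.entries(manifest.dependencies ?? {})) {
    if (!name.startsWith("@opentelemetry/")) continue;
    const match = /\d+/.exec(range);
    const key = match ? `${match[0]}.x` : "unknown";
    const bucket = groups[key] ?? [];
    bucket.push(`${name}@${range}`);
    groups[key] = bucket;
  }
  return groups;
}
```

For example, passing `{ dependencies: { "@opentelemetry/host-metrics": "^0.35.5", "@opentelemetry/instrumentation-http": "^0.57.2", "@opentelemetry/sdk-metrics": "^1.30.1" } }` puts the first two packages under `0.x` and the third under `1.x`.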
backend/src/lib/telemetry/instrumentation.ts (2)

4-4: Appropriate imports for manual instrumentation

The imports for HostMetrics and HttpInstrumentation align with the package.json changes and support the move from auto to manual instrumentation.

Also applies to: 6-6


84-93:

❓ Verification inconclusive

Verify telemetry configuration in production environment

The PR aims to reduce Prometheus ingestion. After deployment, verify that the metrics volume has indeed decreased with these changes.


🏁 Script executed:

#!/bin/bash
# Note: This script demonstrates the verification approach but requires access to your Prometheus/monitoring system

echo "To verify the impact of these changes after deployment:"
echo "1. Compare the rate of metrics ingestion before and after the change"
echo "2. Check that essential metrics are still being collected"
echo "3. Verify no unusual errors in the logs related to telemetry"

# Example commands to run in your monitoring environment:
# - Check total number of metrics: 'count(count by (__name__)({job="infisical-core"}))'
# - Compare ingestion rate: 'rate(prometheus_tsdb_head_samples_appended_total[1h])'

Length of output: 523


Action: Verify Telemetry Metrics Post Deployment

The telemetry instrumentation in backend/src/lib/telemetry/instrumentation.ts (lines 84–93) is correctly gated by the appCfg.OTEL_TELEMETRY_COLLECTION_ENABLED flag and passes the proper configuration settings to initTelemetryInstrumentation(). As a reminder, after deployment please ensure that:

  • The total metrics ingestion rate has decreased: Compare the rate using queries (e.g., checking count(count by (__name__)({job="infisical-core"})) or using rate(prometheus_tsdb_head_samples_appended_total[1h])).
  • Essential metrics continue to be collected: Confirm that no critical telemetry data is missing.
  • No unexpected telemetry errors appear in logs.

These steps will verify that the changes aimed at reducing Prometheus ingestion are effective without compromising necessary telemetry monitoring.
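
To turn the before/after comparison into a single number, the two readings from either query can be reduced to a relative drop. A small sketch, assuming a hypothetical `relativeReduction` helper; the values in the usage note are made up for illustration and are not measurements from this deployment.

```typescript
// Hypothetical helper: fraction by which a measurement dropped, e.g. the
// number of active series or the samples-appended rate, before vs. after.
// Returns 0 when there is no usable baseline; negative means an increase.
function relativeReduction(before: number, after: number): number {
  if (!Number.isFinite(before) || !Number.isFinite(after) || before <= 0) {
    return 0;
  }
  return (before - after) / before;
}
```

With illustrative readings of 1000 series before and 250 after, `relativeReduction(1000, 250)` evaluates to `0.75`, that is, a 75% reduction.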

maidul98 previously approved these changes Mar 4, 2025
@maidul98 maidul98 merged commit 026f883 into main Mar 4, 2025
6 checks passed