---
title: Debug and troubleshoot Machine Learning Pipelines in Application Insights
titleSuffix: Azure Machine Learning
description: Add logging to your training and batch scoring pipelines and view the logged results in Application Insights.
services: machine-learning
author: anrode
ms.author: anrode
ms.reviewer: anrode
ms.service: machine-learning
ms.subservice: core
ms.workload: data-services
ms.topic: conceptual
ms.date: 01/15/2020

ms.custom: seodec18
---
# Debug and troubleshoot Machine Learning Pipelines in Application Insights
[!INCLUDE [applies-to-skus](../../includes/aml-applies-to-basic-enterprise-sku.md)]

The [OpenCensus](https://opencensus.io/quickstart/python/) Python library can be used to route logs to Application Insights from your scripts. Collecting the logs for multiple pipeline runs in Application Insights lets you track trends over time across similar pipeline runs, or compare pipeline runs with different parameters and data.

In addition, it provides a history of exceptions and error messages. Because Application Insights integrates with Azure Alerts, you can also create alerts based on Application Insights queries.

## Prerequisites

* Follow the steps to create an [Azure Machine Learning](./how-to-manage-workspace.md) workspace and [create your first pipeline](./how-to-create-your-first-pipeline.md)
* [Configure your development environment](./how-to-configure-environment.md) to install the Azure Machine Learning SDK. We used Visual Studio Code to write the Python scripts in this example
    * Follow this [guide](https://code.visualstudio.com/docs/python/python-tutorial) to set up your Visual Studio Code Python environment
    * We recommend creating a virtual environment and [installing new packages](https://code.visualstudio.com/docs/python/python-tutorial#_install-and-use-packages) there
* Install the [Python OpenCensus package](https://pypi.org/project/opencensus/)
* Create an [Application Insights instance](../azure-monitor/app/opencensus-python.md) (this doc also contains information on getting the connection string for the resource)

## Getting Started

The following is a quickstart for using OpenCensus specific to this use case. For a detailed tutorial, see [OpenCensus Azure Monitor Exporters](https://github.com/census-instrumentation/opencensus-python/tree/master/contrib/opencensus-ext-azure).

After you install the OpenCensus Python library, import the AzureLogHandler class, which routes logs to Application Insights. You will also need the standard Python logging library.

```python
from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging
```

Then, create a Python logger and add an AzureLogHandler to it. You will need to either set the `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable or pass a connection string to `AzureLogHandler` directly.

```python
# Use OpenCensus logging
logger = logging.getLogger(__name__)

# If you do not want to use the environment variable, instantiate the handler this way:
# handler = AzureLogHandler(connection_string='<connection string>')
# logger.addHandler(handler)

# Otherwise, you must set the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable
try:
    logger.addHandler(AzureLogHandler())
except ValueError:
    logger.warning("Could not find an Application Insights connection string. "
                   "Either set the APPLICATIONINSIGHTS_CONNECTION_STRING environment "
                   "variable or pass a connection_string to AzureLogHandler.")
```
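
With the handler registered, standard Python logging calls on this logger are exported to Application Insights. The sketch below is illustrative (the step logic and messages are placeholders); note that `logger.exception()` also records the stack trace, which is what builds up the exception history mentioned earlier.

```python
# With the AzureLogHandler registered, regular logging calls are exported
# to Application Insights.
logger.info("Starting data preparation step")

try:
    # hypothetical step logic, for illustration only
    rows_processed = 0
    if rows_processed == 0:
        raise ValueError("No rows were processed")
except ValueError:
    # logger.exception logs at ERROR level and includes the stack trace
    logger.exception("Data preparation step failed")
```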

## Logging with Custom Dimensions

Plaintext log messages work well when an engineer or data scientist is diagnosing one specific pipeline step and already has the context of the experiment, parent pipeline, and step being evaluated.
In other cases, such as when someone is managing several models, tracking a model's performance over time, or doesn't have time to dive into each individual step and download the logs to view progress, Custom Dimensions can provide helpful context for a log message.

Custom Dimensions are a dictionary of key-value pairs (stored as string, string) that is sent to Application Insights and displayed as a column in the query results. Each individual dimension can be queried on.
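
As a minimal sketch of the mechanism (the keys and values here are placeholders), the dictionary is attached to a log call through the `extra` argument under a `custom_dimensions` key, with values converted to strings:

```python
# Attach a custom dimensions dictionary to a single log record.
# AzureLogHandler exports entries passed under the 'custom_dimensions' key of 'extra'.
custom_dimensions = {
    "step_name": "prep",   # placeholder values for illustration
    "attempt": str(1),     # values are stored as strings
}

logger.info("Step started", extra={"custom_dimensions": custom_dimensions})
```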

### Helpful dimensions to include

| Field | Reasoning/Example |
|-------|-------------------|
| parent_run_id | Can query across logs for those with the same parent_run_id to see logs over time for all steps, instead of having to dive into each individual step |
| step_id | Can query across logs for those with the same step_id to see where an issue occurred with a narrow scope to just the individual step |
| step_name | Can query across logs to find how a specific step has performed over time, or find a step_id for recent runs without diving into the portal UI |
| experiment_name | Can query across logs to find how a specific experiment has performed over time, or find a parent_run_id or step_id for recent runs without diving into the portal UI |
| experiment_url | Can provide a link directly back to the experiment run for further investigation with fewer clicks, or to drill into from a dashboard |
| build_url and/or build_version | Can correlate logs to the code version that provided the step and pipeline logic. This can further help to diagnose issues, or identify models with specific traits (log/metric values) |
| run_type | Can differentiate between different model types, or training vs. scoring runs |

### Creating the custom dimensions dictionary

```python
import os

from azureml.core import Run

run = Run.get_context(allow_offline=False)

# get value from environment variable
build_id = os.environ["BUILD_ID"]

custom_dimensions = {
    "parent_run_id": run.parent.id,
    "step_id": run.id,
    "step_name": run.name,
    "experiment_name": run.experiment.name,
    "run_url": run.parent.get_portal_url(),
    "build_id": build_id,
    # construct the Azure DevOps url from the build id
    "build_url": f"https://dev.azure.com/<your org here>/<your project here>/_build/results?buildId={build_id}&view=results",
    "run_type": "training"
}

# logger has AzureLogHandler registered previously
logger.info("Info for application insights", extra={"custom_dimensions": custom_dimensions})
```

## OpenCensus Python logging considerations

The OpenCensus AzureLogHandler routes standard Python logs to Application Insights, so normal Python logging behavior applies. For example, when a logger is created, it has a default log level and will only emit logs at or above that level. A good reference for understanding and effectively using the Python logging features is the [Logging Cookbook](https://docs.python.org/3/howto/logging-cookbook.html).
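
For instance, a logger's effective level defaults to WARNING (inherited from the root logger), so `logger.info()` calls are dropped before they ever reach the AzureLogHandler unless you lower the level. A minimal sketch, with a placeholder connection string:

```python
import logging

from opencensus.ext.azure.log_exporter import AzureLogHandler

logger = logging.getLogger(__name__)

# The effective level defaults to WARNING, so lower it explicitly
# if you want INFO messages exported to Application Insights.
logger.setLevel(logging.INFO)
logger.addHandler(AzureLogHandler(connection_string='<connection string>'))

logger.info("This message is exported")          # sent
logger.debug("This message is filtered out")     # below INFO, not sent
```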

The `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable is needed by the OpenCensus library. Consider setting this environment variable instead of passing it in as a pipeline parameter, to reduce the number of parameters needed and to avoid passing around plaintext connection strings.
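
One possible approach (a sketch, assuming you use `PythonScriptStep` and the SDK's `RunConfiguration`; the step and compute names are placeholders, and the secret is read here from the build agent's environment rather than from Key Vault) is to set the variable on the run configuration used by the pipeline step:

```python
import os

from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.steps import PythonScriptStep

# Inject the connection string into the step's environment so the step script
# can rely on AzureLogHandler() picking it up, without a pipeline parameter.
run_config = RunConfiguration()
run_config.environment.environment_variables["APPLICATIONINSIGHTS_CONNECTION_STRING"] = \
    os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]

train_step = PythonScriptStep(
    name="train",
    script_name="train.py",
    compute_target="cpu-cluster",   # hypothetical compute target name
    runconfig=run_config,
)
```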

## Querying logs in Application Insights

The logs routed to Application Insights show up under 'traces'. Be sure to adjust your time window to include your pipeline run.



The result in Application Insights shows the log message and level, the file path and code line number the log came from, and any custom dimensions included. In this image, the customDimensions dictionary shows the key/value pairs from the previous [code sample](#creating-the-custom-dimensions-dictionary).

## Additional helpful queries

This section contains helpful queries beyond the basic 'traces' query used above to verify that logs are being piped to your Application Insights instance.

Some of the queries below use 'severityLevel'. For more information on Application Insights severity levels, see this [reference](https://docs.microsoft.com/en-us/dotnet/api/microsoft.applicationinsights.datacontracts.severitylevel?view=azure-dotnet). These severity levels correspond to the level the Python log was originally sent with. For additional query information, see [Azure Monitor Log Queries](https://docs.microsoft.com/en-us/azure/azure-monitor/log-query/query-language).
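
As a rough guide, the mapping below is what the exporter is assumed to use (treat it as an assumption and verify against your own traces):

```python
import logging

# Assumed mapping between Python logging levels and the Application Insights
# severityLevel values used in the queries below (verify against your own data).
PYTHON_LEVEL_TO_SEVERITY = {
    logging.DEBUG: 0,     # Verbose
    logging.INFO: 1,      # Information
    logging.WARNING: 2,   # Warning
    logging.ERROR: 3,     # Error
    logging.CRITICAL: 4,  # Critical
}
```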

| Use case | Query |
|----------|-------|
| Log results for a specific custom dimension, for example 'parent_run_id' | `traces`<br>`\| where customDimensions['parent_run_id'] == '931024c2-3720-11ea-b247-c49deda841c1'` |
| Log results for training runs over the last 7 days | `traces`<br>`\| where timestamp > ago(7d) and customDimensions['run_type'] == 'training'` |
| Log results with severityLevel Error from the last 7 days | `traces`<br>`\| where timestamp > ago(7d) and severityLevel == 3` |
| Count of log results with severityLevel Error over the last 7 days | `traces`<br>`\| where timestamp > ago(7d) and severityLevel == 3`<br>`\| summarize count()` |