title | titleSuffix | description | services | ms.service | ms.subservice | ms.author | author | ms.date | ms.topic | ms.reviewer | ms.custom |
---|---|---|---|---|---|---|---|---|---|---|---|
Troubleshoot prebuilt docker images | Azure Machine Learning | Troubleshooting steps for using prebuilt Docker images for inference. | machine-learning | machine-learning | mlops | ssambare | shivanissambare | 10/21/2021 | how-to | larryfr | deploy, docker, prebuilt, troubleshoot |
Learn how to troubleshoot problems you might see when using prebuilt Docker images for inference with Azure Machine Learning.
Important
Using Python package extensibility for prebuilt Docker images with Azure Machine Learning is currently in preview. Preview functionality is provided "as-is", with no guarantee of support or service level agreement. For more information, see the Supplemental terms of use for Microsoft Azure previews.
If model deployment fails, you won't see logs in Azure Machine Learning Studio, and service.get_logs() returns None. The exception is a problem in the init() function of score.py: in that case, service.get_logs() does return the logs for the failure.
When no logs are available, run the container locally with one of the following commands to see the error, replacing <mcr-path> with an image path. For a list of the images and their paths, see Prebuilt Docker images for inference.
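If you use the Python SDK v1, you can make the missing-logs case explicit in your deployment script. The following is a minimal sketch, assuming service is an azureml.core.webservice.Webservice object. get_logs_or_hint and StubService are hypothetical names used only for illustration; the stub lets you try the helper without a deployment.

```python
def get_logs_or_hint(service):
    """Return deployment logs, or a hint when none are available.

    `service` is assumed to be an azureml.core.webservice.Webservice (SDK v1);
    any object with a get_logs() method works for this sketch.
    """
    logs = service.get_logs()
    if logs is None:
        # No logs from the service: reproduce the failure by running the
        # prebuilt image locally with docker run instead.
        return "No logs returned; run the prebuilt image locally to debug."
    return logs


class StubService:
    """Hypothetical stand-in for a Webservice, for trying the helper offline."""

    def __init__(self, logs):
        self._logs = logs

    def get_logs(self):
        return self._logs


if __name__ == "__main__":
    print(get_logs_or_hint(StubService(None)))
```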
If your extra Python packages are in a site-packages directory under the directory that contains score.py, go to that directory and run:
docker run -it -v $(pwd):/var/azureml-app -e AZUREML_EXTRA_PYTHON_LIB_PATH="myenv/lib/python3.7/site-packages" <mcr-path>
If your extra Python packages are listed in a requirements.txt file, go to the directory containing score.py and run:
docker run -it -v $(pwd):/var/azureml-app -e AZUREML_EXTRA_REQUIREMENTS_TXT="requirements.txt" <mcr-path>
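The two commands differ only in the environment variable they set. If you script local debugging, a minimal Python sketch like the following builds either command as an argument list that you could pass to subprocess.run. The helper docker_run_args is a hypothetical name, not part of any Azure Machine Learning package, and <mcr-path> stays a placeholder for a real image path. shlex.join requires Python 3.8 or later.

```python
import os
import shlex

# Environment variables used by the two docker run commands.
MOUNT_VAR = "AZUREML_EXTRA_PYTHON_LIB_PATH"
REQUIREMENTS_VAR = "AZUREML_EXTRA_REQUIREMENTS_TXT"


def docker_run_args(mcr_path, env_var, value, source_dir=None):
    """Build the docker run argument list for local debugging.

    source_dir defaults to the current directory, like $(pwd) in the shell
    command; it is mounted at /var/azureml-app inside the container.
    """
    source_dir = os.path.abspath(source_dir or os.getcwd())
    return [
        "docker", "run", "-it",
        "-v", f"{source_dir}:/var/azureml-app",
        "-e", f"{env_var}={value}",
        mcr_path,
    ]


if __name__ == "__main__":
    # "<mcr-path>" is a placeholder; substitute a real prebuilt image path.
    args = docker_run_args("<mcr-path>", REQUIREMENTS_VAR, "requirements.txt")
    print(shlex.join(args))
```

Building a list instead of a single string avoids shell quoting problems when the current directory contains spaces.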
The local inference server lets you quickly debug your entry script (score.py). If the script has a bug, the server fails to initialize or serve the model, and instead throws an exception that shows where the problem occurred. Learn more about the Azure Machine Learning inference HTTP server.
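Because init() runs once when the server starts and run() runs for each request, you can often reproduce an entry-script bug by calling both functions directly in Python. The following score.py is a minimal sketch, assuming a hypothetical model (_fake_model) that doubles its inputs and a request body of the form {"data": [...]}; replace both with your own model loading and input parsing.

```python
import json

model = None


def _fake_model(values):
    # Hypothetical stand-in for a real model: doubles each input value.
    return [v * 2 for v in values]


def init():
    """Called once at startup; load the model here."""
    global model
    model = _fake_model


def run(raw_data):
    """Called for each request with the raw JSON request body."""
    data = json.loads(raw_data)["data"]
    return model(data)


if __name__ == "__main__":
    # Calling init() and run() directly surfaces any exception with a
    # full traceback, without starting a server.
    init()
    print(run(json.dumps({"data": [1, 2, 3]})))  # prints [2, 4, 6]
```

Running python score.py prints the result, or a traceback that points at the failing line.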
For problems when deploying a model from Azure Machine Learning to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS), see Troubleshoot model deployment.
The HTTP server in the prebuilt Docker images runs as a non-root user, so it might not have write access to every directory.
Only write to directories you have access to, for example the /tmp directory in the container.
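As a sketch of that advice, the following helpers write a debug file to a directory the server's user can write to, falling back to the system temporary directory (/tmp in a Linux container, unless TMPDIR points elsewhere). The function names and the preferred argument are hypothetical; you would call save_debug_file from init() or run() instead of writing to a hard-coded path.

```python
import os
import tempfile


def writable_output_dir(preferred=None):
    """Return a directory the current (possibly non-root) user can write to.

    Tries `preferred` first, then falls back to the system temp directory.
    """
    if preferred and os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    return tempfile.gettempdir()


def save_debug_file(name, text, preferred=None):
    """Write text to `name` inside a writable directory and return the path."""
    path = os.path.join(writable_output_dir(preferred), name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```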
If the extra Python packages aren't installed:
- Check for a typo in the environment variable or file name.
- Check the container log to see whether pip install -r <your_requirements.txt> ran and installed the packages.
- Check that the source directory is set correctly in the inference config constructor.
- If the log shows no installation and says "file not found", check that the file name shown in the log is correct.
- If the installation started but failed or timed out, try installing the same requirements.txt locally in a clean environment with the same Python and pip versions and no cache directory (pip install --no-cache-dir -r requirements.txt). Check whether the problem reproduces locally.
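The first checks in this list can be scripted. The following minimal sketch assumes, as in the docker run command shown earlier, that the value of AZUREML_EXTRA_REQUIREMENTS_TXT is a path relative to the directory that contains score.py. check_requirements_setting and clean_install_command are hypothetical helpers; the second one only builds the pip command, which you would run inside a fresh virtual environment (for example, one created with python -m venv) that uses the same Python version as the image.

```python
import os
import sys

REQUIREMENTS_VAR = "AZUREML_EXTRA_REQUIREMENTS_TXT"


def check_requirements_setting(source_dir, env):
    """Return a list of problems with the requirements.txt setting.

    `env` is a mapping such as os.environ; an empty list means no problem found.
    """
    value = env.get(REQUIREMENTS_VAR)
    if value is None:
        return [f"{REQUIREMENTS_VAR} is not set (check the variable name for typos)"]
    path = os.path.join(source_dir, value)
    if not os.path.isfile(path):
        return [f"file not found: {path} (check the file name for typos)"]
    return []


def clean_install_command(requirements_path):
    """Build a cache-free pip install command for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir",
            "-r", requirements_path]
```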
If you load packages from a mounted site-packages directory (AZUREML_EXTRA_PYTHON_LIB_PATH):
- Check for a typo in the environment variable or directory name.
- The environment variable must be set to a path relative to the directory that contains score.py.
- Check that the source directory is set correctly in the inference config constructor.
- The directory must be the site-packages directory of the environment.
- If score.py still raises ModuleNotFoundError and the module is supposed to be in the mounted directory, print sys.path in init() or run() to see whether a path is missing.
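A minimal sketch of the sys.path check follows. It assumes the mounted directory shows up on sys.path as an entry that ends with the relative path in AZUREML_EXTRA_PYTHON_LIB_PATH; the exact absolute prefix isn't stated in this article, so the helper matches on the path suffix only. lib_path_on_sys_path is a hypothetical name.

```python
import os
import sys


def lib_path_on_sys_path(relative, search_path=None):
    """Return search-path entries equal to, or ending with, the relative path."""
    search_path = sys.path if search_path is None else search_path
    suffix = os.path.normpath(relative)
    return [
        p for p in search_path
        if os.path.normpath(p) == suffix
        or os.path.normpath(p).endswith(os.sep + suffix)
    ]


def init():
    # Print the full search path so it appears in the container log.
    print("sys.path:", sys.path)
    relative = os.environ.get("AZUREML_EXTRA_PYTHON_LIB_PATH")
    if relative and not lib_path_on_sys_path(relative):
        print(f"No sys.path entry ends with {relative!r}")
```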
- If building an image based on a prebuilt Docker image fails during apt package installation, check that the user is set to root before the apt command runs, and make sure to switch back to the non-root user afterward.
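If you build many images, a rough check can catch this before the build runs. The following Python sketch scans Dockerfile text for RUN lines that call apt or apt-get while the current USER isn't root, and warns if the file ends as root. check_apt_user is a hypothetical helper, not a Dockerfile parser: it doesn't handle multi-stage builds and only matches apt on the same line as RUN. It assumes, as stated earlier, that the prebuilt images start as a non-root user.

```python
import re

APT_PATTERN = re.compile(r"^RUN\b.*\bapt(-get)?\b", re.IGNORECASE)
USER_PATTERN = re.compile(r"^USER\s+(\S+)", re.IGNORECASE)


def check_apt_user(dockerfile_text):
    """Return warnings for apt commands not run as root and for a final root user."""
    warnings = []
    user = None  # None: user inherited from the (non-root) base image
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        stripped = line.strip()
        match = USER_PATTERN.match(stripped)
        if match:
            user = match.group(1)
        elif APT_PATTERN.match(stripped) and user != "root":
            who = user or "the base image user"
            warnings.append(f"line {lineno}: apt runs as {who}; add USER root first")
    if user == "root":
        warnings.append("image ends as root; switch back to the non-root user")
    return warnings
```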
GPU base images can't be used for local deployment, unless the local deployment is on an Azure Machine Learning compute instance. GPU base images are supported only on Azure services such as Azure Machine Learning compute clusters and instances, Azure Container Instances (ACI), Azure VMs, or Azure Kubernetes Service (AKS).
When running the image as a non-root user:
- The non-root user needs to be dockeruser. Otherwise, the owner of the following directories must be set to the user name you want to use when running the image: /var/runit, /var/log, /var/lib/nginx, /run, /opt/miniconda, and /var/azureml-app.
- If the ENTRYPOINT has been changed in the newly built image, the HTTP server and related components need to be loaded by runsvdir /var/runit.
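To confirm the ownership requirement, you could run a quick check inside the container. The following sketch (Unix only, because it uses the pwd module) reports which of the listed directories are missing, unreadable, or owned by a different user. dirs_not_owned_by and dirs_not_owned_by_uid are hypothetical helper names; pass your own user name in place of dockeruser.

```python
import os
import pwd

# Directories that must be owned by the user running the image.
REQUIRED_DIRS = [
    "/var/runit", "/var/log", "/var/lib/nginx",
    "/run", "/opt/miniconda", "/var/azureml-app",
]


def dirs_not_owned_by_uid(uid, dirs=REQUIRED_DIRS):
    """Return the directories that are missing, unreadable, or owned by another uid."""
    problems = []
    for d in dirs:
        try:
            if os.stat(d).st_uid != uid:
                problems.append(d)
        except OSError:
            # Missing directory or no permission to stat it.
            problems.append(d)
    return problems


def dirs_not_owned_by(user_name, dirs=REQUIRED_DIRS):
    """Look up user_name's uid and check ownership of each directory."""
    return dirs_not_owned_by_uid(pwd.getpwnam(user_name).pw_uid, dirs)


if __name__ == "__main__":
    # For example, run inside the container with docker exec.
    print(dirs_not_owned_by("dockeruser"))
```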