title | description | ms.devlang | ms.topic | ms.date | ms.custom |
---|---|---|---|---|---|
Tutorial - Trigger a Batch job using Azure Functions | Tutorial - Apply OCR to scanned documents as they're added to a storage blob | csharp | tutorial | 08/23/2021 | mvc, devx-track-csharp |
In this tutorial, you'll learn how to trigger a Batch job using Azure Functions. We'll walk through an example in which documents added to an Azure Storage blob container have optical character recognition (OCR) applied to them via Azure Batch. To streamline the OCR processing, we'll configure an Azure function that runs a Batch OCR job each time a file is added to the blob container. You'll learn how to:
[!div class="checklist"]
- Use Batch Explorer to create pools and jobs
- Use Storage Explorer to create blob containers and a shared access signature (SAS)
- Create a blob-triggered Azure Function
- Upload input files to Storage
- Monitor task execution
- Retrieve output files
To complete this tutorial, you need:
- An Azure account with an active subscription. Create an account for free.
- An Azure Batch account and a linked Azure Storage account. See Create a Batch account for more information on how to create and link accounts.
- Batch Explorer.
- Azure Storage Explorer.
Sign in to the Azure portal.
In this section, you'll use Batch Explorer to create the Batch pool and Batch job that will run OCR tasks.
- Sign in to Batch Explorer using your Azure credentials.
- Create a pool by selecting Pools on the left side bar, then the Add button above the search form.
  - Choose an ID and display name. We'll use `ocr-pool` for this example.
  - Set the scale type to Fixed size, and set the dedicated node count to 3.
  - Select Ubuntuserver > 18.04-lts as the operating system.
  - Choose `Standard_f2s_v2` as the virtual machine size.
  - Enable the start task and add the command `/bin/bash -c "sudo update-locale LC_ALL=C.UTF-8 LANG=C.UTF-8; sudo apt-get update; sudo apt-get -y install ocrmypdf"`. Be sure to set the user identity as Task user (Admin), which allows start tasks to include commands with `sudo`.
  - Select OK.
- Create a job on the pool by selecting Jobs on the left side bar, then the Add button above the search form.
  - Choose an ID and display name. We'll use `ocr-job` for this example.
  - Set the pool to `ocr-pool`, or whatever name you chose for your pool.
  - Select OK.
Here you'll create blob containers that will store your input and output files for the OCR Batch job. In this example, the input container is named `input` and is where all documents without OCR are initially uploaded for processing. The output container is named `output` and is where the Batch job writes processed documents with OCR.
- Sign in to Storage Explorer using your Azure credentials.
- Using the storage account linked to your Batch account, create two blob containers (one for input files, one for output files) by following the steps at Create a blob container.
- Create a shared access signature for your output container in Storage Explorer by right-clicking the output container and selecting Get Shared Access Signature.... Under Permissions, select Write. No other permissions are necessary.
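The SAS that Storage Explorer produces is a query string that is appended to the container URL. The sketch below shows one way to assemble that URL and to check that the token grants write access before you paste it into the function. `SasUrlHelper`, its method names, and any account names you pass in are hypothetical and not part of the sample; the sketch relies only on the standard `https://<account>.blob.core.windows.net/<container>` endpoint format of the public Azure cloud and on the `sp` (signed permissions) query parameter of a SAS token.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of the tutorial's sample code.
public static class SasUrlHelper
{
    // Joins the container endpoint and the SAS token into a single URL.
    // Accepts the token with or without its leading '?'.
    public static string BuildContainerUrl(string accountName, string containerName, string sasToken)
    {
        if (string.IsNullOrWhiteSpace(accountName)) throw new ArgumentException("Account name is required.", nameof(accountName));
        if (string.IsNullOrWhiteSpace(containerName)) throw new ArgumentException("Container name is required.", nameof(containerName));
        if (string.IsNullOrWhiteSpace(sasToken)) throw new ArgumentException("SAS token is required.", nameof(sasToken));

        string query = sasToken.TrimStart('?');
        return $"https://{accountName}.blob.core.windows.net/{containerName}?{query}";
    }

    // Returns true when the token's 'sp' parameter includes the write ('w') permission.
    public static bool GrantsWrite(string sasToken)
    {
        if (string.IsNullOrWhiteSpace(sasToken)) return false;

        return sasToken.TrimStart('?')
            .Split('&')
            .Select(pair => pair.Split(new[] { '=' }, 2))
            .Any(kv => kv.Length == 2 && kv[0] == "sp" && kv[1].IndexOf('w') >= 0);
    }
}
```

For example, calling `SasUrlHelper.GrantsWrite` on the copied query string before saving it is a quick way to catch a token that was generated with only the default Read permission.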
In this section you'll create the Azure Function that triggers the OCR Batch job whenever a file is uploaded to your input container.
- Follow the steps in Create a function triggered by Azure Blob storage to create a function.
  - For runtime stack, choose .NET. We'll write our function in C# to leverage the Batch .NET SDK.
  - When prompted for a storage account under Hosting, use the same storage account that you linked to your Batch account.
  - While creating the Azure Blob storage trigger, be sure to set the path as `input/{name}` (to match the name of your input container).
- Once the blob-triggered function is created, select Code + Test. Use the `run.csx` and `function.proj` from GitHub in the Function. `function.proj` doesn't exist by default, so select the Upload button to upload it into your development workspace.
  - `run.csx` is run when a new blob is added to your input blob container.
  - `function.proj` lists the external libraries in your Function code, for example, the Batch .NET SDK.
- Change the placeholder values of the variables in the `Run()` function of the `run.csx` file to reflect your Batch and storage credentials. You can find your Batch and storage account credentials in the Azure portal in the Keys section of your Batch account.
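As an alternative to pasting keys directly into `run.csx`, one option is to keep them in the function app's application settings, which Azure Functions exposes to function code as environment variables. The following is a minimal sketch under that assumption. `OcrSettings` and any setting names you pass to it (for example `BATCH_ACCOUNT_KEY`) are hypothetical; the sample on GitHub uses placeholder variables instead.

```csharp
using System;

// Hypothetical helper for illustration; not part of the tutorial's sample code.
public static class OcrSettings
{
    // Reads a required setting from the environment and fails fast, with a
    // clear message, when the setting is missing or blank.
    public static string Require(string settingName)
    {
        string value = Environment.GetEnvironmentVariable(settingName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Application setting '{settingName}' is not set.");
        }
        return value;
    }
}
```

Inside `Run()`, a call such as `string batchKey = OcrSettings.Require("BATCH_ACCOUNT_KEY");` would then replace the corresponding hard-coded placeholder. Failing fast keeps a missing key from surfacing later as a less obvious authentication error.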
Upload any or all of the scanned files from the `input_files` directory on GitHub to your input container. Monitor Batch Explorer to confirm that a task gets added to `ocr-pool` for each file. After a few seconds, the file with OCR applied is added to the output container. The file is then visible and retrievable on Storage Explorer.
Additionally, you can watch the log output at the bottom of the Azure Functions web editor window, where you'll see messages like these for every file you upload to your input container:
2019-05-29T19:45:25.846 [Information] Creating job...
2019-05-29T19:45:25.847 [Information] Accessing input container <inputContainer>...
2019-05-29T19:45:25.847 [Information] Adding <fileName> as a resource file...
2019-05-29T19:45:25.848 [Information] Name of output text file: <outputTxtFile>
2019-05-29T19:45:25.848 [Information] Name of output PDF file: <outputPdfFile>
2019-05-29T19:45:26.200 [Information] Adding OCR task <taskID> for <fileName> <size of fileName>...
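The log lines above indicate that, for each uploaded file, the function derives an output text file name and an output PDF file name and then adds an OCR task. The sketch below illustrates how such names and a task command line might be derived from the blob name. The `ocr-` prefix, the `OcrTaskPlanner` class, and the exact command line are assumptions made for illustration and may differ from what `run.csx` does; the only external fact used is that `ocrmypdf` takes an input file and an output file, with `--sidecar` writing the recognized text to a separate file.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; naming scheme and command line are assumptions.
public static class OcrTaskPlanner
{
    // Derives the output file names and the task command line for one uploaded blob.
    public static (string OutputPdf, string OutputTxt, string CommandLine) Plan(string blobName)
    {
        if (string.IsNullOrWhiteSpace(blobName))
        {
            throw new ArgumentException("Blob name is required.", nameof(blobName));
        }

        // "scan01.pdf" -> "scan01"
        string stem = Path.GetFileNameWithoutExtension(blobName);
        string outputPdf = $"ocr-{stem}.pdf";
        string outputTxt = $"ocr-{stem}.txt";

        // Single quotes keep file names with spaces intact inside bash -c "...".
        // Names that themselves contain a single quote would need extra escaping.
        string commandLine =
            $"/bin/bash -c \"ocrmypdf --sidecar '{outputTxt}' '{blobName}' '{outputPdf}'\"";

        return (outputPdf, outputTxt, commandLine);
    }
}
```

Keeping this logic in a small pure function makes it easy to check the generated command line locally before any task is submitted to Batch.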
To download the output files from Storage Explorer to your local machine, first select the files you want and then select Download on the top ribbon.
Tip
The downloaded files are searchable if opened in a PDF reader.
You are charged for the pool while the nodes are running, even if no jobs are scheduled. When you no longer need the pool, delete it with the following steps:
- In the account view, select Pools and the name of the pool.
- Select Delete.
When you delete the pool, all task output on the nodes is deleted. However, the output files remain in the storage account. When no longer needed, you can also delete the Batch account and the storage account.
For more examples of using the .NET API to schedule and process Batch workloads, see the samples on GitHub.
[!div class="nextstepaction"] Batch C# samples