---
title: Ingest historical data into your target platform | Microsoft Docs
description: Learn how to ingest historical data into your selected target platform.
author: limwainstein
ms.author: lwainstein
ms.topic: how-to
ms.date: 05/03/2022
---
# Ingest historical data into your target platform
After you [select a target platform](migration-ingestion-target-platform.md) for your historical data and [a tool to transfer your data](migration-ingestion-tool.md), and the historical data is stored in a staging location, you can start to ingest the data into the target platform.
This article describes how to ingest your historical data into your selected target platform.
## Export data from the legacy SIEM
In general, SIEMs can export or dump data to a file in your local file system, so you can use this method to extract the historical data. It’s also important to set up a staging location for your exported files. The tool you use to transfer the data can copy the files from the staging location to the target platform.

This diagram shows the high-level export and ingestion process.

:::image type="content" source="media/migration-export-ingest/export-data.png" alt-text="Diagram illustrating steps involved in export and ingestion." lightbox="media/migration-export-ingest/export-data.png":::
To export data from your legacy SIEM, see one of the following articles:

- [Export data from ArcSight](migration-arcsight-historical-data.md)
- [Export data from Splunk](migration-splunk-historical-data.md)
- [Export data from QRadar](migration-qradar-historical-data.md)
## Ingest to Azure Data Explorer
To ingest your historical data into Azure Data Explorer (ADX) (option 1 in the [diagram above](#export-data-from-the-legacy-siem)):

1. [Install and configure LightIngest](/azure/data-explorer/lightingest) on the system where the logs are exported, or install LightIngest on another system that has access to the exported logs. LightIngest supports Windows only.
1. If you don't have an existing ADX cluster, create a new cluster and copy the connection string. Learn how to [set up ADX](/azure/data-explorer/create-cluster-database-portal).
1. In ADX, create tables and define a schema for the CSV or JSON format (for QRadar). Learn how to create a table and define a schema [with sample data](/azure/data-explorer/ingest-sample-data?tabs=one-click-ingest) or [without sample data](/azure/data-explorer/one-click-table).
1. [Run LightIngest](/azure/data-explorer/lightingest#run-lightingest) with the folder path that includes the exported logs as the path, and the ADX connection string as the output. When you run LightIngest, ensure that you provide the target ADX table name, that the argument pattern is set to `*.csv`, and that the format is set to `csv` (or `json` for QRadar).
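
For example, a LightIngest invocation for CSV files exported to a local folder might look like the following sketch. The cluster, region, database, table, and folder values are placeholders you replace with your own values; use `-format:json` instead for QRadar exports.

```powershell
# Illustrative sketch only - replace the placeholder cluster, database, table, and folder values.
LightIngest.exe "https://ingest-<ClusterName>.<Region>.kusto.windows.net;Fed=True" `
  -db:<DatabaseName> `
  -table:<HistoricalLogsTable> `
  -source:"C:\ExportedLogs" `
  -pattern:"*.csv" `
  -format:csv `
  -ignoreFirstRow:true
```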
## Ingest data to Microsoft Sentinel Basic Logs
To ingest your historical data into Microsoft Sentinel Basic Logs (option 2 in the [diagram above](#export-data-from-the-legacy-siem)):

1. If you don't have an existing Log Analytics workspace, create a new workspace and [install Microsoft Sentinel](quickstart-onboard.md#enable-microsoft-sentinel).
1. [Create an app registration to authenticate against the API](../azure-monitor/logs/tutorial-custom-logs.md#configure-application).
1. [Create a data collection endpoint](../azure-monitor/logs/tutorial-custom-logs.md#create-data-collection-endpoint) that acts as the API endpoint that accepts the data.
1. [Create a custom log table](/azure/azure-monitor/logs/tutorial-custom-logs#add-custom-log-table) to store the data, and provide a data sample. In this step, you can also define a transformation before the data is ingested.
1. [Collect information from the data collection rule](../azure-monitor/logs/tutorial-custom-logs.md#collect-information-from-dcr) and assign permissions to the rule.
1. [Change the table from Analytics to Basic Logs](/azure/azure-monitor/logs/basic-logs-configure?tabs=api-1%2Cportal-1).
1. Run the [Custom Log Ingestion script](https://github.com/Azure/Azure-Sentinel/tree/master/Tools/CustomLogsIngestion-DCE-DCR). The script asks for the following details:
    - Path to the log files to ingest
    - Azure AD tenant ID
    - Application ID
    - Application secret
    - DCE endpoint
    - DCR immutable ID
    - Data stream name from the DCR

The script returns the number of events that have been sent to the workspace.
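
Behind the scenes, the script authenticates with the app registration and posts the log records to the data collection endpoint through the Logs Ingestion API. The following PowerShell sketch only illustrates the general shape of such a call; the token scope, endpoint format, API version, and all placeholder values are assumptions for illustration rather than details taken from the script itself.

```powershell
# Illustrative sketch only - the actual script also handles batching, size limits, and retries.
$tenantId  = "<tenant-id>"
$appId     = "<application-id>"
$appSecret = "<application-secret>"

# Acquire an Azure AD token for the Azure Monitor ingestion endpoint (assumed scope).
$tokenBody = @{
    client_id     = $appId
    client_secret = $appSecret
    scope         = "https://monitor.azure.com//.default"
    grant_type    = "client_credentials"
}
$token = (Invoke-RestMethod -Method Post `
    -Uri "https://login.microsoftonline.com/$tenantId/oauth2/v2.0/token" `
    -Body $tokenBody).access_token

# Post a batch of records to the DCE/DCR stream (endpoint format assumed from the custom logs tutorial).
$records = Get-Content "C:\ExportedLogs\sample.json" -Raw
$uri = "https://<dce-endpoint>/dataCollectionRules/<dcr-immutable-id>/streams/<stream-name>?api-version=2021-11-01-preview"
Invoke-RestMethod -Method Post -Uri $uri `
    -Headers @{ Authorization = "Bearer $token" } `
    -ContentType "application/json" `
    -Body $records
```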
## Ingest to Azure Blob Storage
To ingest your historical data into Azure Blob Storage (option 3 in the [diagram above](#export-data-from-the-legacy-siem)):

1. [Install and configure AzCopy](../storage/common/storage-use-azcopy-v10.md) on the same system where the logs are exported, or install AzCopy on another system that has access to the exported logs.
1. [Create an Azure Blob Storage account](../storage/common/storage-account-create.md?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&tabs=azure-portal) and copy the authorized Azure Active Directory credentials or Shared Access Signature token.
1. [Run AzCopy](../storage/common/storage-use-azcopy-v10.md?toc=/azure/storage/blobs/toc.json#run-azcopy) with the folder path that includes the exported logs as the source, and the Azure Blob Storage connection string as the output.
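
For example, an AzCopy call that copies a local export folder to a blob container by using a SAS token could look like the following sketch; the storage account, container, and SAS token values are placeholders.

```powershell
# Illustrative sketch only - replace the storage account, container, and SAS token placeholders.
azcopy copy "C:\ExportedLogs" `
  "https://<StorageAccountName>.blob.core.windows.net/<ContainerName>?<SASToken>" `
  --recursive=true
```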
|**Features/benefits**: |- Leverage most of the existing Azure Monitor Logs experiences at a lower cost.<br>- Basic Logs are retained for 8 days, and are then automatically transferred to the archive (according to the original retention period).<br>- Use [search jobs](/azure/azure-monitor/logs/search-jobs) to search across petabytes of data and find specific events.<br>- For deep investigations on a specific time range, [restore data from the archive](/azure/azure-monitor/logs/restore). The data is then available in the hot cache for further analytics. |- Both ADX and Microsoft Sentinel use the Kusto Query Language (KQL), allowing you to query, aggregate, or correlate data in both platforms. For example, you can run a KQL query from Microsoft Sentinel to [join data stored in ADX with data stored in Log Analytics](/azure/azure-monitor/logs/azure-monitor-data-explorer-proxy).<br>- With ADX, you have substantial control over the cluster size and configuration. For example, you can create a larger cluster to achieve higher ingestion throughput, or create a smaller cluster to control your costs. |- Data is stored in Blob Storage, which is low in cost.<br>- You use ADX to query the data in KQL, allowing you to easily access the data. [Learn how to query Azure Monitor data with ADX](/azure/azure-monitor/logs/azure-data-explorer-query-storage). |
|**Usability**: |**Great**<br><br>The archive and search options are simple to use and accessible from the Microsoft Sentinel portal. However, the data is not immediately available for queries. You need to perform a search to retrieve the data, which might take some time, depending on the amount of data being scanned and returned. |**Good**<br><br>Fairly easy to use in the context of Microsoft Sentinel. For example, you can use an Azure workbook to visualize data spread across both Microsoft Sentinel and ADX. You can also query ADX data from the Microsoft Sentinel portal using the [ADX proxy](https://docs.microsoft.com/en-us/azure/azure-monitor/logs/azure-monitor-data-explorer-proxy). |**Fair**<br><br>While using the `externaldata` operator is very challenging with large numbers of blobs to reference, using external ADX tables eliminates this issue. The external table definition understands the blob storage folder structure, and allows you to transparently query the data contained in many different blobs and folders. |**Poor**<br><br>With historical data migrations, you might have to deal with millions of files, and exploring the data becomes a challenge. |
|**Management overhead**: |**Fully managed**<br><br>The search and archive options are fully managed and do not add management overhead. |**High**<br><br>ADX is external to Microsoft Sentinel, which requires monitoring and maintenance. |**Medium**<br><br>With this option, you maintain and monitor ADX and Azure Blob Storage, both of which are external components to Microsoft Sentinel. While ADX can be shut down at times, we recommend that you consider the additional management overhead that this option involves. |**Low**<br><br>While this platform requires very little maintenance, selecting this platform adds monitoring and configuration tasks, such as setting up lifecycle management. |
|**Performance**: |**Medium**<br><br>You typically interact with Basic Logs within the archive using [search jobs](/azure/azure-monitor/logs/search-jobs), which are suitable when you want to maintain access to the data, but do not need immediate access to the data. |**High to low**<br><br>- The query performance of an ADX cluster depends on multiple factors, including the number of nodes in the cluster, the cluster virtual machine SKU, data partitioning, and more.<br>- As you add nodes to the cluster, the performance improves, together with the cost.<br>- If you use ADX, we recommend that you configure your cluster size to balance performance and cost. This depends on your organization's needs, including how fast your migration needs to complete, how often the data is accessed, and the expected response time. |**Low**<br><br>Because the data remains in Blob Storage, you can expect the same performance as with Azure Blob Storage. |**Low**<br><br>Azure Blob Storage offers two performance tiers: Premium and Standard. Although both tiers are an option for long-term storage, Standard is more cost-efficient. Learn about Blob Storage performance and scalability limits. |
|**Cost**: |**Highest**<br><br>The cost comprises two components:<br>- **Ingestion cost**: Every GB of data ingested into Basic Logs is subject to Microsoft Sentinel and Azure Monitor Logs ingestion costs, which sum up to approximately $1/GB. See the [pricing details](https://azure.microsoft.com/pricing/details/microsoft-sentinel/).<br>- **Archival cost**: This is the cost for data in the archive tier, and sums up to approximately $0.02/GB per month. See the [pricing details](https://azure.microsoft.com/pricing/details/monitor/).<br>In addition to these two cost components, if you need frequent access to the data, take into account the additional costs of accessing the data via search jobs. |**High to low**<br><br>- Because ADX is a cluster of virtual machines, you are charged based on compute, storage, and networking usage, plus an ADX markup (see the [pricing details](https://azure.microsoft.com/pricing/details/data-explorer/)). Therefore, the more nodes you add to your cluster and the more data you store, the higher the cost.<br>- ADX also offers autoscaling capabilities to adapt to the workload on demand, and can benefit from Reserved Instance pricing. You can run your own cost calculations in the [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/). |**Low**<br><br>The cluster size does not affect the cost, because ADX only acts as a proxy. In addition, you need to run the cluster only when you need quick and simple access to the data. |**Low**<br><br>With an optimal setup, this is the option with the lowest costs. In addition, the data moves through an automatic lifecycle, so older blobs move into lower-cost access tiers. |
|**How to access data**: |**Search jobs**|**Direct KQL queries**|**Modified KQL data**|**externaldata**|
|**Scenario**: |**Occasional access**<br><br>Relevant in scenarios where you don’t need to run heavy analytics or trigger analytics rules. |**Frequent access**|**Occasional access**|**Compliance/audit**<br><br>- Optimal for storing massive amounts of unstructured data.<br>- Relevant in scenarios where you do not need quick access to the data or high performance, such as for compliance or audits. |
The amount of data is the main factor that affects the duration of the migration process, if you do not change the rest of your pipeline. You should therefore consider how to set up your environment depending on your data set.
To determine the minimum duration of the migration and where the bottleneck could be, consider the amount of data and the ingestion speed of the target platform. For example, if you select a target platform that can ingest 1 GB per second, and you have to migrate 100 TB, the migration takes a minimum of 100,000 GB divided by the ingestion speed of 1 GB per second, which is 100,000 seconds. Divide the result by 3,600 to convert it to hours, which comes to approximately 28 hours. This holds only if the rest of the components in the pipeline, such as the local disk, the network, and the virtual machines, can also sustain a speed of 1 GB per second.
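
Expressed as a quick calculation (illustrative only, using the same example numbers):

```powershell
# Minimum migration time = data volume (GB) / ingestion rate (GB per second), converted to hours.
$dataVolumeGB   = 100000   # 100 TB expressed in GB
$ingestRateGBps = 1        # ingestion speed of the target platform, in GB per second
$hours = $dataVolumeGB / $ingestRateGBps / 3600
"{0:N1} hours" -f $hours   # ~27.8 hours
```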