For data warehouses or big data analytics, we need to
automate the process of uploading one or more files to an Azure Data Lake container on a
regular basis. The source files can come from anywhere and in any format, and they should be
uploaded to the container as fast as possible. Inside the container,
we may also need to categorize each file type into a separate folder. These files
(CSV, XML, ORC, Avro, JSON, Parquet, and so on) could reside on a local server,
a network shared drive, a cloud service, an FTP server, or anywhere else.
We also want to use an Azure Service Principal together with the AzCopy tool, granting it restricted privileges so that the automated process can affect only the data movement and nothing else.
What is an Azure Service Principal:
An Azure Service Principal is an identity used to access Azure resources while applying the principle of least privilege. Automated tools and applications can use it to avoid interactive logins and to operate with restricted permissions.
There are two types of authentication available for service principals:
- Password-based authentication
- Certificate-based authentication
To create an Azure Service Principal, we can use either the Azure Portal or PowerShell. In this tutorial, we will use the password-based method to create a service principal for our automated process.
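If you prefer scripting over the portal walkthrough that follows, a minimal sketch with the Az PowerShell module looks like this (it assumes the Az.Resources module is installed; the exact property names on the returned object vary slightly between Az versions):
# sign in interactively once to create the service principal
Connect-AzAccount
# create the service principal; Azure generates a client secret for it
$sp = New-AzADServicePrincipal -DisplayName 'myAutomatedTool'
# inspect the generated application (client) ID and secret
$sp | Format-List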
Create an Azure Service Principal:
Step #1: Sign in to the Azure Portal.
Step #2: Search for Azure Active Directory, then select App registrations.
Step #3: Select New registration, enter the following information, and click the Register button at the bottom: add a meaningful name for the application (here, myAutomatedTool), select Public client (mobile & desktop), and enter any URL.
Assign a role to the Service Principal:
To read from and write to Azure Data Lake, we need to assign one of the following roles to the service principal. In this tutorial, we will grant Storage Blob Data Contributor to myAutomatedTool:
- Storage Blob Data Contributor
- Storage Blob Data Owner
Step #1: Navigate to the subscription, then select Access control (IAM). Click Add, and then select Add role assignment.
Step #2: In the next window, select the Storage Blob Data Contributor role, search for the application name we registered earlier, and select it. Click Save once you are done.
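The same role assignment can also be scripted. A hedged sketch with the Az module, scoped down to the storage account itself for tighter least-privilege (the subscription ID and resource group name are placeholders you must fill in):
# assign the role at the storage-account scope rather than the whole
# subscription; <subscription-id> and <resource-group> are placeholders
New-AzRoleAssignment `
    -ApplicationId '88b76bcd-683c-486b-8799-7f1bad602e1b' `
    -RoleDefinitionName 'Storage Blob Data Contributor' `
    -Scope '/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/home80'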
Create a new application secret:
In our example, we will be using password-based authentication. To create a new application secret for “myAutomatedTool”, follow the steps below:
- Select Azure Active Directory.
- From App registrations in Azure AD, select the application.
- Select Certificates & secrets.
- Select Client secrets -> New client secret.
- Provide a description and a duration for the secret. When done, select Add.
After you add the client secret, its value is displayed. Copy and save this value right away, as you won't be able to retrieve it later. We will use this secret (also known as the password) together with the Application ID to sign in from our automation application.
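For a scheduled, unattended process, hard-coding the secret in the script (as the demo below does for brevity) is risky. One hedged alternative on Windows is to store it encrypted with DPAPI, so only the same user on the same machine can decrypt it; the file path here is just an assumption:
# one-time: prompt for the secret and save it encrypted with Windows DPAPI
Read-Host 'Client secret' -AsSecureString |
    ConvertFrom-SecureString |
    Set-Content 'D:\Upload\clientsecret.txt'
# at runtime: load the encrypted value and recover the plain text for AzCopy
$secure = Get-Content 'D:\Upload\clientsecret.txt' | ConvertTo-SecureString
$Password = [System.Net.NetworkCredential]::new('', $secure).Password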
Collect Tenant ID and App ID values for signing in:
Next, we need to collect the Tenant ID and App ID to use with the automation tool for signing in. Follow the steps below:
- Select Azure Active Directory.
- From App registrations in Azure AD, select the application.
- Copy the Directory (Tenant) ID.
- Copy the Application (client) ID.
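These two values can also be pulled with the Az module instead of the portal; a small sketch (property names may differ slightly between Az versions):
# application (client) ID of the registered app
(Get-AzADApplication -DisplayName 'myAutomatedTool').AppId
# directory (tenant) ID of the current session
(Get-AzContext).Tenant.Id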
Using AzCopy to upload files:
Download the AzCopy tool from the Microsoft download page. Extract the zipped file and place azcopy.exe in the C:\Windows\System32 folder so that it is available from any command prompt.
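To confirm the installation, check the version from any PowerShell prompt; signing in with a service principal via azcopy login requires AzCopy v10:
# should print the installed AzCopy version
azcopy --version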
Upload files to Azure Data Lake:
We have the storage account home80 and the blob container dailyupload. We need to upload files from our local D:\Upload folder.
Open PowerShell ISE or VS Code and execute the following PowerShell code to upload the files. In this code snippet, we use the AZCOPY_SPA_CLIENT_SECRET runtime environment variable for secret (password) based service principal authentication.
# storage account and container name
$StorageAccountName = 'home80'
$ContainerName = 'dailyupload'
# storage URL and local folder location
$StorageURL = 'https://' + $StorageAccountName + '.blob.core.windows.net/' + $ContainerName
# optionally we can add a folder to the target container
# $StorageURL= 'https://'+ $StorageAccountName + '.blob.core.windows.net/'+$ContainerName +'/ORC2020/'
$LocalPath = 'D:\Upload\*.csv'
# service principal information
$TenantID = '953c78eb-391d-4c0b-b35f-72204d135267'
$AppID = '88b76bcd-683c-486b-8799-7f1bad602e1b'
$Password = 'LICgfH_mvyq-3j5vv~t9aXr-pm9J-m1MA0'
# runtime environment variable
$env:AZCOPY_SPA_CLIENT_SECRET = $Password
# login to Azure Cloud with the Service Principal
azcopy login --service-principal --application-id $AppID --tenant-id $TenantID
# copy files from local to Azure storage
azcopy copy $LocalPath $StorageURL
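The introduction mentioned placing each file type in its own folder inside the container. A hedged sketch of that variation, building on the variables above (the extension list and the uppercase folder names are assumptions):
# upload each file type into its own folder inside the container
foreach ($ext in 'csv','xml','orc','avro','json','parquet') {
    $LocalFiles = "D:\Upload\*.$ext"
    if (Test-Path $LocalFiles) {
        # e.g. CSV files land in .../dailyupload/CSV/
        azcopy copy $LocalFiles "$StorageURL/$($ext.ToUpper())/"
    }
}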
AzCopy output: (screenshot of the AzCopy transfer summary)
Azure Data Lake container: (screenshot of the uploaded files in the dailyupload container)
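Finally, to run the upload on a regular basis, the script can be registered as a Windows scheduled task. A hedged sketch, assuming the code above is saved as D:\Upload\upload.ps1 (the task name and schedule are placeholders):
# run the upload script every day at 2:00 AM as a scheduled task
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -ExecutionPolicy Bypass -File D:\Upload\upload.ps1'
$trigger = New-ScheduledTaskTrigger -Daily -At 2am
Register-ScheduledTask -TaskName 'DailyDataLakeUpload' -Action $action -Trigger $trigger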