For data warehouses or big data analytics, we need to automate uploading one or more files to Azure Data Lake on a regular basis. The files can come from many sources and in many formats, and they should be uploaded to the Azure Data Lake container as fast as possible. Inside the container, we may also need to put each file type in a separate folder. These files (CSV, XML, ORC, Avro, JSON, Parquet, and so on) could reside on a local server, a network shared drive, a cloud service, an FTP server, or anywhere else.
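Grouping files by type before upload can be sketched as follows. This is a minimal illustration, assuming the file extension is a reliable indicator of the format; the `FOLDER_BY_EXTENSION` mapping, the `other` fallback folder and the `group_by_folder` helper are hypothetical names chosen for this example.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from file extension to a destination folder in the container.
FOLDER_BY_EXTENSION = {
    ".csv": "csv",
    ".xml": "xml",
    ".orc": "orc",
    ".avro": "avro",
    ".json": "json",
    ".parquet": "parquet",
}


def group_by_folder(file_names, default_folder="other"):
    """Return {folder: [file names]} based on each file's lower-cased extension."""
    groups = defaultdict(list)
    for name in file_names:
        ext = PurePath(name).suffix.lower()
        groups[FOLDER_BY_EXTENSION.get(ext, default_folder)].append(name)
    return dict(groups)


if __name__ == "__main__":
    files = ["sales.csv", "orders.JSON", "events.parquet", "readme.txt"]
    for folder, names in sorted(group_by_folder(files).items()):
        print(f"{folder}/: {names}")
```

Unknown extensions land in a catch-all folder instead of being dropped, so nothing is silently skipped.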
We also want to use an Azure service principal with restricted privileges, together with the AzCopy tool, so that the automated process can only move data and nothing else.
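One way to drive AzCopy per file type is to build one `azcopy copy` command per extension, using the `--include-pattern` filter and a per-type destination folder. The sketch below only builds the argument lists (a dry run); the storage account `mystorageacct`, the container `raw` and the helper name are hypothetical, and the exact wildcard and filter behaviour should be checked against `azcopy copy --help` for your AzCopy version.

```python
import shlex


def azcopy_upload_commands(local_dir, account, container, extensions):
    """Build one `azcopy copy` argument list per file extension (nothing is executed)."""
    commands = []
    for ext in extensions:
        folder = ext.lstrip(".").lower()  # e.g. ".csv" -> "csv" folder in the container
        destination = f"https://{account}.dfs.core.windows.net/{container}/{folder}"
        commands.append([
            "azcopy", "copy",
            f"{local_dir}/*",            # the directory's contents, not the directory itself
            destination,
            "--include-pattern", f"*{ext}",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical account and container names; print the commands for review.
    for cmd in azcopy_upload_commands("/data/out", "mystorageacct", "raw", [".csv", ".parquet"]):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the commands as lists makes them easy to pass to `subprocess.run` later without shell quoting issues.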
What is an Azure Service Principal:
An Azure service principal is an identity that applications, services, and automated tools use to access Azure resources. Because it holds only the roles explicitly assigned to it, it lets us apply the principle of least privilege: an automated tool can sign in without an interactive login and with only the permissions it needs.
Service Principal authentication:
There are two types of authentication available for service principals:
- Password-based authentication
- Certificate-based authentication
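With AzCopy, the two authentication types map to two variants of `azcopy login --service-principal`: the client secret is read from the `AZCOPY_SPA_CLIENT_SECRET` environment variable, while certificate-based login takes `--certificate-path` plus an optional `AZCOPY_SPA_CERT_PASSWORD`. The sketch below, assuming those AzCopy options, builds the arguments and environment for either method; the `azcopy_login` helper is a hypothetical name, and nothing is executed.

```python
import os


def azcopy_login(tenant_id, app_id, secret=None, cert_path=None, cert_password=None):
    """Return (argv, env) for `azcopy login --service-principal`.

    Pass exactly one of `secret` (password-based) or `cert_path` (certificate-based).
    Secrets go into environment variables, never onto the command line.
    """
    if (secret is None) == (cert_path is None):
        raise ValueError("pass exactly one of secret or cert_path")
    argv = ["azcopy", "login", "--service-principal",
            "--application-id", app_id, "--tenant-id", tenant_id]
    env = dict(os.environ)  # copy, so the caller's environment is not modified
    if secret is not None:
        env["AZCOPY_SPA_CLIENT_SECRET"] = secret
    else:
        argv += ["--certificate-path", cert_path]
        if cert_password is not None:
            env["AZCOPY_SPA_CERT_PASSWORD"] = cert_password
    return argv, env
```

The result can be run with `subprocess.run(argv, env=env, check=True)` once AzCopy is installed.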
To create an Azure service principal, we can use either the Azure portal or PowerShell. In this tutorial, we use the Azure portal and password-based authentication to create a service principal for our automated process.
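For a scripted setup, the Azure CLI (a third option besides the portal and PowerShell) can create the app registration, service principal and password in one step with `az ad sp create-for-rbac`, whose JSON output includes `appId`, `password` and `tenant`. The sketch below assumes the Azure CLI is installed and signed in; the helper names are hypothetical, and the output contains the secret, so it should never be logged.

```python
import json
import subprocess


def parse_sp_output(text):
    """Pick out the values the automation needs from the CLI's JSON output."""
    data = json.loads(text)
    return {"tenant_id": data["tenant"], "app_id": data["appId"], "secret": data["password"]}


def create_service_principal(name):
    """Run `az ad sp create-for-rbac` (requires the Azure CLI and `az login`)."""
    result = subprocess.run(
        ["az", "ad", "sp", "create-for-rbac", "--name", name, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return parse_sp_output(result.stdout)
```

Usage: `creds = create_service_principal("myAutomatedTool")`, then store `creds["secret"]` somewhere safe, because it cannot be retrieved again later.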
Create an Azure Service Principal:
Step #1: Sign in to the Azure portal.
Step #2: Search for Azure Active Directory, then select App registrations.
Step #3: Select New registration and enter a meaningful name for the application (in this tutorial, myAutomatedTool). Under Redirect URI, select Public client/native (mobile & desktop) and enter any URI, then click the Register button at the bottom.
Assign a role to the Service Principal:
To read and write data in Azure Data Lake, the service principal needs one of the following roles. In this tutorial, we grant Storage Blob Data Contributor to myAutomatedTool:
- Storage Blob Data Contributor
- Storage Blob Data Owner
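Step #1: Navigate to your subscription, then select Access control (IAM). Click Add, and then select Add role assignment.
Step #2: Select the Storage Blob Data Contributor role, choose myAutomatedTool as the member, and complete the assignment.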
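The same assignment can be scripted with the Azure CLI command `az role assignment create`. Assigning the role at subscription level grants access to every storage account in it; a narrower scope (a single storage account, or even one container) follows the least-privilege idea more closely. The sketch below builds the scope string and the CLI arguments without running anything; the helper names and example IDs are hypothetical.

```python
def storage_scope(subscription_id, resource_group=None, account=None, container=None):
    """Build an Azure RBAC scope, from subscription level down to a single container."""
    if (account and not resource_group) or (container and not account):
        raise ValueError("a narrower scope needs every broader level above it")
    scope = f"/subscriptions/{subscription_id}"
    if resource_group:
        scope += f"/resourceGroups/{resource_group}"
    if account:
        scope += f"/providers/Microsoft.Storage/storageAccounts/{account}"
    if container:
        scope += f"/blobServices/default/containers/{container}"
    return scope


def role_assignment_argv(app_id, scope, role="Storage Blob Data Contributor"):
    """Arguments for `az role assignment create`; nothing is executed here."""
    return ["az", "role", "assignment", "create",
            "--assignee", app_id, "--role", role, "--scope", scope]
```

Role assignments can take a few minutes to propagate, so a first upload attempt right after assigning the role may still be denied.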
Create a new application secret:
In our example, we use password-based authentication. To create a new application secret for “myAutomatedTool”, follow the steps below:
- Select Azure Active Directory.
- From App registrations in Azure AD, select the application.
- Select Certificates & secrets.
- Select Client secrets -> New client secret.
- Enter a description and an expiration period for the secret, then select Add. Copy the secret value right away; the portal shows it only once.
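Because client secrets expire, an unattended upload job will start failing once its secret runs out. A small check like the one below can warn ahead of time. It is a sketch, assuming we record the expiry date shown in the portal's Expires column; the 30-day threshold and the function names are hypothetical choices.

```python
from datetime import date


def days_until_expiry(expires_on: date, today: date) -> int:
    """Whole days left before the client secret expires (negative once it has expired)."""
    return (expires_on - today).days


def needs_rotation(expires_on: date, today: date, warn_days: int = 30) -> bool:
    """True when the secret expires within `warn_days` days or has already expired."""
    return days_until_expiry(expires_on, today) <= warn_days


if __name__ == "__main__":
    # Hypothetical expiry date copied from the Certificates & secrets page.
    expires = date(2025, 6, 30)
    if needs_rotation(expires, date.today()):
        print("The client secret for myAutomatedTool expires soon; create a new one.")
```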
Collect Tenant ID and App ID values for signing in:
Next, we need to collect the Tenant ID and App ID that the automation tool uses to sign in. Follow the steps below:
- Select Azure Active Directory.
- From App registrations in Azure AD, select the application.
- Copy the Directory (tenant) ID.
- Copy the Application (client) ID.
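Rather than hard-coding these values in a script, the automated process can read them from environment variables and check them before calling AzCopy. The sketch below assumes the variable names `AZURE_TENANT_ID`, `AZURE_CLIENT_ID` and `AZURE_CLIENT_SECRET` (the names the Azure SDK's environment credential uses); both IDs are GUIDs, so a malformed copy-paste is caught early.

```python
import os
import uuid

REQUIRED = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def load_sp_config(environ=os.environ):
    """Return (tenant_id, app_id, secret); fail early on missing or malformed values."""
    missing = [key for key in REQUIRED if not environ.get(key)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    for key in ("AZURE_TENANT_ID", "AZURE_CLIENT_ID"):
        try:
            uuid.UUID(environ[key])  # raises ValueError if not a well-formed GUID
        except ValueError:
            raise ValueError(f"{key} is not a GUID: {environ[key]!r}") from None
    return environ["AZURE_TENANT_ID"], environ["AZURE_CLIENT_ID"], environ["AZURE_CLIENT_SECRET"]
```

The tenant ID and application ID then go to `azcopy login --service-principal --tenant-id ... --application-id ...`, with the secret supplied through `AZCOPY_SPA_CLIENT_SECRET`.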