Here, we will discuss the Azure interview questions that interviewers most commonly ask for Azure System Administration job positions.
1. What is Azure?
Azure is Microsoft’s cloud platform offering computing, storage, networking, and AI services. Key components include VMs, Azure AD, Blob Storage, AKS, and Cosmos DB.
It supports hybrid cloud, scalability, and security via RBAC, encryption, and compliance tools.
2. Azure Interview Topics
- Core Services:
  - Compute: VMs, App Services, Functions (serverless).
  - Storage: Blob, Table, Cosmos DB (NoSQL).
  - Networking: VNet, Load Balancer, Azure CDN.
- Security & Identity:
  - Azure AD, RBAC, Key Vault, NSGs.
- DevOps & Monitoring:
  - Azure DevOps (CI/CD), ARM Templates (IaC), Monitor & Log Analytics.
- Scalability & Cost:
  - Auto-scaling, Reserved Instances, Cost Management.
- Hybrid & Advanced:
  - Azure Arc, Kubernetes (AKS), AI/ML Services.

Azure Interview Questions
1. What is meant by Microsoft Azure and Azure diagnostics?
Microsoft Azure is Microsoft’s cloud computing platform, which provides on-demand computing services over the internet.
Azure Diagnostics is an API-based system that collects diagnostic data from an application while it is running. It provides verbose monitoring when enabled on the cloud service roles.
2. What is meant by cloud computing?
Cloud computing is a high-level abstraction that lets you focus on business logic. It is a service delivered via the internet that provides computing services without requiring you to manage the underlying infrastructure yourself.
3. What is the scalability of cloud computing?
- Vertical scaling, where the capacity of the existing machine is increased, for example expanding the RAM from 4 GB to 32 GB.
- Horizontal scaling, where the number of machines is increased, for example putting multiple machines to work instead of replacing the existing machine with a bigger one.
4. What are the advantages of cloud computing?
- The versatility of the system
- They are highly available.
- The system is capable of fault tolerance.
- The service allows you to pay as you go.
5. What is meant by PaaS, SaaS, and IaaS?
- Platform as a Service (PaaS) gives you a platform on which to build and deliver applications without managing the underlying operating system and infrastructure.
- Software as a Service (SaaS) provides ready-to-use software over the internet, without purchasing or managing the underlying platform and infrastructure.
- Infrastructure as a Service (IaaS) provides the hardware (compute, storage, and networking) from the provider as a service, which the user then configures and manages.
6. Explain the different deployment models of the cloud.
- Private Cloud Deployment Model
- Public Cloud Deployment Model
- Hybrid Cloud Deployment Model
7. What are the main functions of the Azure Cloud Service?
The main functions of the Azure Cloud Service are:
- It is designed to host web applications while, at the same time, running the background processing for them.
- The web-processing component is termed the “web role,” whereas the background-processing component is termed the “worker role.”
8. State the purpose of the cloud configuration file.
- Every cloud service has a primary .cscfg (cloud service configuration) file. The main purposes of this file are:
- It holds the main copy of the certificates.
- It stores user-defined settings.
- It specifies the number of instances for each role in the service.
9. Which services are used to manage the resources in Azure?
Azure Resource Manager (ARM) is the service used to deploy, manage, and delete all the resources in an Azure subscription. Resources are grouped into resource groups, which can be managed and deleted as a single unit.
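As a quick illustration, here is a minimal sketch (not part of the original answer) of listing the resources in a resource group through Azure Resource Manager with the azure-identity and azure-mgmt-resource Python packages; the subscription ID and resource group name are placeholders:

```python
# Minimal sketch: enumerating resources in a resource group via Azure Resource Manager.
# Assumes azure-identity and azure-mgmt-resource are installed and you are signed in
# (e.g. via `az login`); "<subscription-id>" and "rg-demo" are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Every ARM resource belongs to exactly one resource group.
for resource in client.resources.list_by_resource_group("rg-demo"):
    print(resource.name, resource.type)
```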
10. What do you mean by roles?
A role in Azure Cloud Services is a set of linked, load-balanced virtual machine instances that run a part of your application and are managed and balanced by the platform.
11. What are the different types of roles?
- Web Role
- VM Role
- Worker Role
12. What do you mean by a domain?
A domain is a group of interconnected and interlinked nodes that the organization manages as a single unit. These relationships are administered through a single point of the organization.
13. Explain the fault domain.
A fault domain is a logical group of underlying hardware that shares a common power source and network switch.
This means that when VMs are created, Azure distributes them across fault domains, which limits the potential impact of hardware failures, power interruptions, or network outages.
14. What do you mean by a BLOB, and what are its types?
A BLOB (Binary Large Object) is a file of any type and size. Blobs are mainly of two types – block blobs and page blobs (append blobs also exist for append-only workloads).
15. What is meant by the block blob and page BLOB?
A block blob is composed of blocks, each with a specific block ID. Historically, each block could be up to 4 MB and a block blob was limited to about 200 GB; current service limits are much higher.
A page blob, in contrast, is a collection of 512-byte pages, in which data ranges are addressed by offsets.
A page blob can be up to 8 TiB in size (older documentation quoted a 1 TB limit).
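For illustration, a minimal sketch of uploading a local file as a block blob with the Azure Storage SDK for Python; the connection string, container, and blob names are placeholders:

```python
# Minimal sketch: uploading a file as a block blob with azure-storage-blob.
# The SDK splits large files into blocks and commits them as a single block blob.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")
blob_client = service.get_blob_client(container="samples", blob="report.csv")

with open("report.csv", "rb") as data:
    blob_client.upload_blob(data, overwrite=True)
```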
16. What is meant by the Dead Letter queue?
Messages are transferred to the Dead Letter queue in the following situation:
- When the maximum delivery count for a message in the queue is exceeded.
- When the message has expired (its time-to-live has passed) while held in the queue and dead-lettering of expired messages is enabled.
- When an evaluation exception occurs and the subscription has a dead-letter filter enabled.
17. How is an Azure subscription priced?
- The Free Model
- The BYOL Scheme
- The Trial of the Free Software
- Usage-Based Fee
- Monthly Bills.
18. What are the sizes of the Azure VMs?
| VM size | Instance (CPU) | Memory | Instance storage | I/O performance |
| --- | --- | --- | --- | --- |
| Extra large | 8 × 1.6 GHz | 14 GB | 2,040 GB | High |
| Large | 4 × 1.6 GHz | 7 GB | 1,000 GB | High |
| Medium | 2 × 1.6 GHz | 3.5 GB | 490 GB | High |
| Small | 1 × 1.6 GHz | 1.75 GB | 225 GB | Moderate |
| Extra small | 1 × 1.0 GHz | 768 MB | 20 GB | Low |
19. What is meant by table storage?
Table storage is a service that can store large amounts of structured, non-relational data.
It is a NoSQL data store that accepts authenticated calls from inside or outside the Azure cloud.
20. Differentiate between the repository server and the powerhouse server.
A repository server maintains the integrity, consistency, and uniformity of the repository, whereas the powerhouse server governs the integration of the different aspects of the database repository.
21. What is meant by enterprise warehousing?
Enterprise warehousing is where data developed by the organization is made accessible from a single point throughout the globe.
Warehousing enables the servers to be linked to a single access point with the assistance of periodic data handling.
22. What do you mean by lookup transformation?
A lookup transformation is used to look up data in a flat file, relational table, or view in order to retrieve the relevant data. It can be either an active or a passive transformation.
23. What is the primary ETL service in Azure?
Azure Data Factory (ADF) is the primary ETL and data integration service in Azure. Its capabilities include:
- Ingest
- Control Flow
- Data flow
- Schedule
- Monitor
24. What data masking features are available in Azure?
Dynamic data masking plays various significant roles in data security. It restricts sensitive information to a specific set of users.
- It is available for Azure SQL Database, Azure SQL Managed Instance and Azure Synapse Analytics.
- It can be implemented as a security policy on all the SQL Databases across an Azure subscription.
- Users can control the level of masking as per their requirements.
- It only masks the query results for specific column values on which the data masking has been applied. It does not affect the actual stored data in the database.
25. What is Polybase?
PolyBase optimizes data ingestion into PDW (Parallel Data Warehouse) and supports T-SQL. It enables developers to query external data transparently from supported data stores, irrespective of the storage architecture of the external data store.
Polybase can be used to:
- Query Data
- Import Data
- Export data
26. What is reserved capacity in Azure?
To optimize Azure Storage costs, Microsoft provides a reserved capacity option. Reserved storage provides customers with a fixed amount of capacity for the duration of the reservation period on the Azure cloud.
It is available for block blobs and Azure Data Lake Storage Gen2 data in standard storage accounts.
27. Which service would you use to create a Data Warehouse in Azure?
Azure Synapse is a limitless analytics service that brings together Big Data analytics and enterprise data warehousing.
It gives users the freedom to query data on their own terms, using either serverless on-demand or provisioned resources at scale.
28. Explain the architecture of Azure Synapse Analytics
Azure Synapse Analytics is designed to process massive amounts of data with hundreds of millions of rows in a table. Azure Synapse Analytics processes complex queries and returns the query results within seconds, even with massive data, because Synapse SQL runs on a Massively Parallel Processing (MPP) architecture that distributes data processing across multiple nodes.
Applications connect to a control node that acts as a point of entry to the Synapse Analytics MPP engine. On receiving the Synapse SQL query, the control node breaks it down into an MPP-optimised format. Further, the individual operations are forwarded to the compute nodes that can perform the operations in parallel, resulting in much better query performance.
29. Difference between ADLS and Azure Synapse Analytics?
Both Azure Data Lake Storage Gen2 and Azure Synapse Analytics are highly scalable and can ingest and process vast amounts of data (on a petabyte scale).
| ADLS Gen2 | Azure Synapse Analytics |
| --- | --- |
| Optimized for storing and processing any type of data, structured or unstructured. | Optimized for processing structured data in a well-defined schema. |
| Used for data exploration and analytics by data scientists and engineers. | Used for business analytics or disseminating data to business users. |
| Built to work with Hadoop. | Built on SQL Server. |
| No built-in regulatory compliance. | Compliant with regulatory standards such as HIPAA. |
| U-SQL (a combination of C# and T-SQL) and Hadoop are used for accessing data. | Synapse SQL (an improved version of T-SQL) is used for accessing data. |
| Can handle data streaming using tools such as Azure Stream Analytics. | Built-in data pipelines and data streaming capabilities. |
30. What are Dedicated SQL Pools?
Dedicated SQL Pool is a collection of features that enable the implementation of a more traditional Enterprise Data Warehousing platform using Azure Synapse Analytics. The resources are measured in Data Warehousing Units (DWU) that are provisioned using Synapse SQL.
A dedicated SQL pool uses columnar storage and relational tables to store data, improving query performance and reducing the required amount of storage.
31. How do you capture streaming data in Azure?
Azure provides a dedicated analytics service called Azure Stream Analytics, which offers a simple SQL-based language called the Stream Analytics Query Language. It also allows developers to extend the query language by defining additional ML (machine learning) functions.
Azure Stream Analytics can process a huge amount of data on a scale of over a million events per second and also deliver the results with ultra-low latency.
32. What are the various windowing functions in Azure Stream Analytics?
A window in Azure Stream Analytics refers to a block of time-stamped event data that enables users to perform various statistical operations on the event data.
- Tumbling window
- Hopping window
- Sliding window
- Session window
33. What are the different types of storage in Azure?
There are five types of storage in Azure:
- Azure Blobs: Blob stands for Binary Large Object. It can support all kinds of files, including text files, videos, images, documents, binary data, etc.
- Azure Queues: Azure Queues is a cloud-based messaging store for establishing and brokering communication between various applications and components.
- Azure Files: It is an organized way of storing data in the cloud. Azure Files has one main advantage over Azure Blobs: it allows organizing the data in a folder structure, and it is SMB compliant, i.e. it can be used as a file share.
- Azure Disks: It is used as a storage solution for Azure VMs (Virtual Machines).
- Azure Tables: A NoSQL storage solution for storing structured data that does not meet the standard relational database schema.
34. Explore Azure Storage Explorer and its uses.
Azure Storage Explorer is a versatile standalone application available for Windows, Mac OS and Linux to manage Azure Storage from any platform. It provides access to multiple Azure data stores such as ADLS Gen2, Cosmos DB, Blobs, Queues, Tables, etc., with an easy-to-navigate GUI.
One of the key features of Azure Storage Explorer is that it allows users to work even when they are disconnected from the Azure cloud service by attaching local emulators.
35. What is Azure Databricks, and how is it different from regular Databricks?
Azure Databricks is the Azure-hosted implementation of Databricks, a big data processing platform built on open-source Apache Spark. In the data lifecycle, Azure Databricks sits in the data preparation and processing stage.
First of all, data is ingested in Azure using Data Factory and stored in permanent storage (such as ADLS Gen2 or Blob Storage). Further, data is processed using Machine Learning (ML) in Databricks, and then extracted insights are loaded into the Analysis Services in Azure, like Azure Synapse Analytics or Cosmos DB. Finally, insights are visualized and presented to the end-users with the help of Analytical reporting tools like Power BI.
36. What is Azure table storage?
Azure Table storage is a storage service optimized for storing structured data. Table entities are the basic units of data, equivalent to rows in a relational database table. Each entity is a collection of name-value properties, and every entity has the following system properties (a short code sketch follows the list):
- PartitionKey
- RowKey
- Timestamp
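A minimal sketch, assuming the azure-data-tables package and a placeholder connection string, of inserting one entity with its PartitionKey and RowKey:

```python
# Minimal sketch: creating a table and inserting an entity into Azure Table storage.
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<storage-account-connection-string>")
table = service.create_table_if_not_exists(table_name="Customers")

entity = {
    "PartitionKey": "retail",  # groups related entities into one partition
    "RowKey": "cust-001",      # must be unique within the partition
    "Name": "Contoso",
    "Credit": 1200,
}
table.create_entity(entity)    # Timestamp is maintained automatically by the service
```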
37. What is serverless database computing in Azure?
Serverless computing follows a stateless code model: the code does not require any explicitly provisioned infrastructure. Users pay only for the compute resources consumed during the short periods when the code executes, which makes it very cost-effective.
38. What Data security options are available in Azure SQL DB?
- Azure SQL firewall Rules
- Azure SQL Always Encrypted
- Azure SQL Transparent Data Encryption
- Azure SQL database auditing
39. What is data redundancy in Azure?
Azure constantly retains several copies of data to provide high availability. Several data redundancy options are available to clients, depending on how critical the data is and how quickly access to a replica must be restored:
- Locally redundant storage (LRS)
- Zone-redundant storage (ZRS)
- Geo-redundant storage (GRS)
- Read-access geo-redundant storage (RA-GRS)
40. What are some ways to ingest data from on-premise storage to Azure?
While choosing a data transfer solution, the main factors to consider are:
- Data size
- Data transfer frequency (one-time or periodic)
- Network bandwidth

Based on these factors, the main options are:
- Offline transfer (shipping physical storage devices to Azure)
- Network transfer: over a network connection, data transfer can be performed in the following ways:
  - Graphical interface
  - Programmatic interface
  - On-premises devices
  - Managed Data Factory pipeline
41. What is the best way to migrate data from an on-premise database to Azure?
- SQL Server Stretch Database
- Azure SQL Database
- Azure SQL Managed Instance
- SQL Server on an Azure virtual machine
42. What are multi-model databases?
Azure Cosmos DB is Microsoft’s premier NoSQL service offering on Azure. It is the first globally distributed, multi-model database offered on the cloud by any vendor.
It is used to store data in various data storage models such as Key-value pair, document-based, graph-based, column-family based, etc. Low latency, consistency, global distribution and automatic indexing features are the same no matter what data model the customer chooses.
43. What is the Azure Cosmos DB synthetic partition key?
In Azure Cosmos DB, it is crucial to select a good partition key that distributes the data evenly across multiple partitions. When no existing column has suitably distributed values, we can create a synthetic partition key.
The three ways to create a synthetic partition key are (a sketch of the first approach follows the list):
- Concatenating properties
- Random suffix
- Pre-calculated suffix
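A minimal sketch of the concatenated-properties approach using the azure-cosmos Python SDK; the account URI, key, container, and property names are illustrative placeholders:

```python
# Minimal sketch: building a synthetic partition key by concatenating two properties
# before writing an item, so data spreads evenly across logical partitions.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="<account-uri>", credential="<account-key>")
database = client.create_database_if_not_exists("sales")
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/partitionKey"),
)

order = {"id": "1001", "country": "DE", "orderDate": "2024-05-01", "total": 99.5}
order["partitionKey"] = f"{order['country']}-{order['orderDate'][:7]}"  # e.g. "DE-2024-05"
container.create_item(order)
```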
44. What are the various consistency models available in Cosmos DB?
- Strong
- Bounded staleness
- Session
- Consistent prefix
- Eventual
45. How is data security implemented in ADLS Gen2?
ADLS Gen2 has a multi-layered security model. They are –
- Authentication
- Access control
- Network isolation
- Data protection
- Advanced threat protection
- Auditing
46. What are pipelines and activities in Azure?
A pipeline is a logical grouping of activities that together perform a task. ADF activities are grouped into three categories:
- Data movement activities
- Data transformation activities
- Control activities
47. How do you manually execute the Data Factory pipeline?
A Data Factory pipeline can be run manually (on demand) or automatically via triggers. To execute a pipeline manually, we can use the Trigger Now or Debug options in the ADF UI, or run it programmatically, for example with a PowerShell command or an SDK call.
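For example, a minimal sketch of starting a run programmatically with the azure-mgmt-datafactory Python SDK (the PowerShell equivalent is the Invoke-AzDataFactoryV2Pipeline cmdlet); all resource names here are placeholders:

```python
# Minimal sketch: manually starting a Data Factory pipeline run from code.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = adf_client.pipelines.create_run(
    resource_group_name="rg-data",
    factory_name="adf-demo",
    pipeline_name="CopySalesData",
    parameters={"targetFolder": "curated/sales"},  # optional pipeline parameters
)
print(run.run_id)  # keep this id to query the run status later
```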
48. Explain Control Flow vs Data Flow in Azure Data Factory
- Control Flow is an activity that affects the path of execution of the Data Factory pipeline. For example, an activity that creates a loop if conditions are met.
- Data Flow Transformations are used when we need to transform the input data, for example, Join or Conditional Split.
| Control Flow Activity | Data Flow Transformation |
| --- | --- |
| Affects the execution sequence or path of the pipeline. | Transforms the ingested data. |
| Can be recursive. | Non-recursive. |
| No source/sink. | A source and a sink are required. |
| Implemented at the pipeline level. | Implemented at the activity level. |
49. Name the data flow partitioning schemes in Azure
Partitioning Scheme is a way to optimize the performance of Data Flow. This partitioning scheme setting can be accessed on the Optimize tab of the configuration panel for the Data Flow Activity.
- ‘Use current partitioning’ is the default setting, recommended by Microsoft in most cases; it uses the data’s native partitioning scheme.
- The ‘Single Partition’ option is used when users want to output to a single destination, for example, a single file in ADLS Gen2.
- Round robin.
- Hash
- Dynamic range
- Fixed range
- Key
50. What is the trigger execution in Azure Data Factory?
In Azure Data Factory, pipelines can be automated using triggers. The trigger types are:
- Schedule trigger
- Tumbling window trigger
- Event-based trigger
51. What are mapping Dataflows?
Microsoft provides Mapping Data Flows as a code-free alternative to writing transformation logic yourself: they offer a visual way to design data transformation flows. A data flow becomes an Azure Data Factory (ADF) activity and is executed as part of an ADF pipeline.
52. What is the role of an Azure Data Engineer?
Azure Data Engineers are responsible for integrating, transforming, operating, and consolidating data from structured or unstructured data systems. They also build, implement, and support Business Intelligence solutions by applying knowledge of technologies, methodologies, processes, tools, and applications.
In short, they handle all the data operations stored in the cloud, such as Azure.
53. What is Azure Data Factory?
Azure Data Factory is a cloud-based data integration service that allows creating data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.
Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores.
54. What is the integration runtime?
The integration runtime is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities across various network environments. The three types of integration runtime are:
- Azure integration runtime
- Self-hosted integration runtime
- Azure-SSIS integration runtime
55. What is the limit on the number of integration runtimes?
There is no hard limit on the number of integration runtime instances you can have in a data factory.
56. What is the difference between Azure Data Lake and Azure Data Warehouse?
A data warehouse is a traditional way of storing data that is still widely used.
A data lake is complementary to a data warehouse: data held in a data lake can also be loaded into the data warehouse, provided certain rules (such as a defined schema) are followed.
| DATA LAKE | DATA WAREHOUSE |
| --- | --- |
| Complementary to the data warehouse. | May be sourced from the data lake. |
| Schema on read (not structured; you can define your schema in any number of ways). | Schema on write (data is written in a structured form, in a particular schema). |
| One language to process data of any format (U-SQL). | It uses SQL. |
57. What is blob storage in Azure?
Azure Blob Storage is a service for storing large amounts of unstructured object data, such as text or binary data. You can use Blob Storage to expose data publicly to the world or to store application data privately.
Common uses of Blob Storage include:
- Serving images or documents directly to a browser
- Storing files for distributed access
- Streaming video and audio
- Storing data for backup and disaster recovery, and archiving
- Storing data for analysis by an on-premises or Azure-hosted service
58. What are the steps for creating an ETL process in Azure Data Factory?
Steps for creating an ETL process in Azure Data Factory (a minimal code sketch follows these steps):
- Create a Linked Service for the source data store, which is a SQL Server Database
- Assume that we have a cars dataset
- Create a Linked Service for the destination data store, which is Azure Data Lake Store
- Create a dataset for the destination data
- Create the pipeline and add a copy activity
- Schedule the pipeline by adding a trigger
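As a rough sketch of what the pipeline-and-copy-activity step looks like in code (mirroring the pattern of the official ADF Python quickstart rather than the exact SQL-to-Data-Lake setup above; all names are placeholders, and the referenced linked services and datasets are assumed to exist already):

```python
# Minimal sketch: defining a pipeline with a single Copy activity via the
# azure-mgmt-datafactory SDK. "SourceDataset" and "SinkDataset" are assumed
# to have been created in earlier steps.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

copy_cars = CopyActivity(
    name="CopyCarsData",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SinkDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

pipeline = PipelineResource(activities=[copy_cars])
adf_client.pipelines.create_or_update("rg-data", "adf-demo", "CopyCarsPipeline", pipeline)
```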
59. What are the top-level concepts of Azure Data Factory?
- Pipeline
- Activities
- Datasets
- Linked services
60. How can I schedule a pipeline?
You can use the schedule trigger or the tumbling window trigger to schedule a pipeline in Azure Data Factory.
61. Explain the two levels of security in ADLS Gen2.
Two levels of security in ADLS Gen2 are:
- Role-based access control (RBAC)
- Access control lists (ACLs)
62. Define reserved capacity in Azure.
Microsoft has included a reserved capacity option in Azure storage to optimize costs. The reserved storage gives its customers a fixed amount of capacity during the reservation period on the Azure cloud.
63. What is the linked service in the Azure data factory?
A linked service is one of the components of Azure Data Factory, and it is used to make a connection to a data source. To connect to any data source, you first create a linked service of the appropriate type.
64. What is the dataset in the Azure Data Factory?
A dataset is needed to read or write data from a data source using ADF. A dataset is a named representation of the data held by the data source: its structure, format, and location.
65. What are the parameters in the ADF?
- Linked service parameters
- Dataset parameters
- Pipeline parameters
- Global parameters
66. How to check the history of a pipeline execution run in ADF?
In Azure Data Factory, we can check pipeline execution runs by going to the Monitor tab, where the run history of all pipelines can be searched. You can filter the history by various parameters, such as the pipeline name, time window, execution status (succeeded/failed), and so on.
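The same run history can also be queried programmatically; a minimal sketch, reusing the placeholder names from the earlier Data Factory examples:

```python
# Minimal sketch: checking the status of a specific pipeline run from code.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = adf_client.pipeline_runs.get("rg-data", "adf-demo", "<run-id>")
print(run.status)  # e.g. Queued, InProgress, Succeeded, Failed, Cancelled
```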
67. Why would you use data flow in Azure Data Factory?
Data flow is used for no-code transformations.
For example, when you are doing an ETL operation you may want to apply a couple of transformations and some logic to your input data but not be comfortable writing the query; or, when the input is a set of files, you cannot write a query at all. In such cases, data flow comes to the rescue.
Using data flow, you can simply drag and drop transformations and express almost all of your business logic without writing any code. Behind the scenes, the data flow is converted into Spark code and runs on a cluster.
68. What is the foreach activity in the data factory?
In Azure Data Factory, whenever you have to perform some work repetitively, you will typically use the foreach activity.
You pass an array to the foreach activity, and the loop runs once for each item in the array.
As of now, nesting is not allowed, which means you cannot place one foreach activity inside another foreach activity.
69. What is the get metadata activity in the Azure data factory?
In Azure Data Factory, the Get Metadata activity is used to retrieve metadata about the data in a data source, for example information about files such as their names, sizes, and child items.
70. What is the custom activity in the Azure data factory?
Custom activity is used in the Azure data factory to execute a Python or a PowerShell script.
Assume that you have some code that is written in a Python or PowerShell script, and you want to execute it as part of your pipeline. Then you can use a custom activity that will help you execute the code.
71. How can you connect the Azure data factory with the Azure Databricks?
To connect to Azure Databricks, we create a linked service that points to the Azure Databricks workspace. Then, in the pipeline, we use the Notebook activity and select the linked service created for Databricks.
We also provide the path of the notebook that is available in the Azure Databricks workspace.
That is how Databricks can be used from Data Factory.
72. Is it possible to connect a MongoDB database from the Azure data factory?
Yes, it is possible to connect to MongoDB from Azure Data Factory. You have to provide the proper connection information for the MongoDB server.
If the MongoDB server resides outside the Azure network (for example, on-premises), you will probably have to create a self-hosted integration runtime through which to connect to it.
73. Can Azure Data Factory directly connect to the different Azure services?
Yes, Azure Data Factory can connect to various other Azure services such as Azure Blob Storage, Azure Functions, and Logic Apps. However, for each of them, the Data Factory must be granted the proper roles using RBAC.
74. How to connect Azure Data Factory to GitHub?
Azure Data Factory can connect to GitHub (or to an Azure DevOps Git repository) using its built-in Git integration. We configure the Git repository path in the Azure Data Factory, so that all the changes we make in the Data Factory are automatically synced with the GitHub repository.
75. How can you move your changes from one environment to another environment for the Azure data factory?
We can migrate the code from one environment to another for Azure Data Factory using an ARM template. The ARM template is the JSON representation of the Data Factory resources (pipelines, datasets, linked services, and so on) that we have created.
76. What is Synapse SQL?
Synapse SQL is the ability to do T-SQL based analytics in the Synapse workspace. Synapse SQL has two consumption models: dedicated and serverless.
- For the dedicated model, use dedicated SQL pools. A workspace can have any number of these pools.
- To use the serverless model, use the serverless SQL pools. Every workspace has one of these pools.
- Inside Synapse Studio, you can work with SQL pools by running SQL scripts.
77. How can you use Apache Spark through Azure Synapse Analytics?
In Azure Synapse Analytics, you can run Spark code either interactively using a notebook or by creating a job that runs the Spark code in batch.
To run Spark code you need a Spark pool, which is simply a cluster of nodes with Spark installed on them.
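For instance, a minimal PySpark sketch of the kind of code you would run in a Synapse notebook attached to a Spark pool; the storage paths are placeholders:

```python
# Minimal sketch: a Spark job run on a Synapse Spark pool. In a Synapse notebook the
# SparkSession is pre-created; getOrCreate() simply returns it.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("abfss://raw@<storageaccount>.dfs.core.windows.net/sales/")
daily_totals = df.groupBy("orderDate").sum("total")
daily_totals.write.mode("overwrite").parquet(
    "abfss://curated@<storageaccount>.dfs.core.windows.net/daily_sales/"
)
```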
78. What are the different types of Synapse SQL pools available?
Azure Synapse Analytics is an analytics service that brings together enterprise data warehousing and Big Data analytics.
Dedicated SQL pool (formerly SQL DW) refers to the enterprise data warehousing features that are available in Azure Synapse Analytics.
There are two types of Synapse SQL pool –
- Serverless SQL pool
- Dedicated SQL pool
79. What is Delta Lake?
Delta Lake is an open-source storage layer that brings ACID (atomicity, consistency, isolation, and durability) transactions to Apache Spark and big data workloads.
The current version of Delta Lake included with Azure Synapse has language support for Scala, PySpark, and .NET.
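A minimal PySpark sketch of writing and reading a Delta table from a Spark pool, assuming Delta Lake support is available in the runtime; the storage path is a placeholder:

```python
# Minimal sketch: writing a DataFrame as a Delta table and reading it back.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, "bike"), (2, "car")], ["id", "product"])
path = "abfss://curated@<storageaccount>.dfs.core.windows.net/delta/products"

# Delta adds an ACID transaction log on top of the underlying Parquet files.
df.write.format("delta").mode("overwrite").save(path)
products = spark.read.format("delta").load(path)
products.show()
```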
80. What is Azure Synapse Runtime?
Apache Spark pools in Azure Synapse use runtimes to tie together essential component versions, Azure Synapse optimizations, packages, and connectors with a specific Apache Spark version.
These runtimes will be upgraded periodically to include new improvements, features, and patches.
These runtimes have the following advantages:
- Faster session startup times
- Tested compatibility with specific Apache Spark versions
- Access to popular, compatible connectors and open-source packages
81. Can you run a machine learning algorithm using Azure Synapse Analytics?
Yes, it is possible to run machine learning algorithms using Azure Synapse Analytics. Azure Synapse Analytics includes Apache Spark pools, where we can write machine learning code that is executed on the Spark cluster.
82. What is Azure Databricks?
Databricks is the organization that provides Spark-based clusters in the cloud; Azure Databricks is this Spark-based analytics platform offered as a managed service on Azure.
83. What are the two different types of execution modes provided by Databricks?
You can run the Spark code in two modes: interactive mode or job-scheduled mode.
In the interactive mode, you can run the code line by line and see the output. In the job mode, it will run all the code together, and then you will see the output.
84. What are the two different types of clusters provided by Databricks?
Two different types of clusters provided by Databricks are the interactive cluster and the job cluster.
To run interactive notebooks, you use an interactive cluster, and to run jobs, you use a job cluster.
85. What is Azure Data Factory used for?
Azure Data Factory is the data orchestration service provided by Microsoft Azure. ADF is used for the following use cases mainly :
- Data migration from one data source to another
- On-premises to cloud data migration
- ETL purpose
- Automating data flows.
There is a huge amount of data out there, and when you want to move it from one location to another in an automated way, either within the cloud or from on-premises to the Azure cloud, Azure Data Factory is the best service available.
86. What are the main components of the Azure Data Factory?
- Pipeline
- Integration Runtime
- Activities
- DataSet
- Linked Services
- Triggers
87. What is the pipeline in the ADF?
A pipeline is a set of activities specified to run in a defined sequence. To accomplish any task in Azure Data Factory, we create a pipeline that contains the various types of activities required to fulfil the business purpose.
Every pipeline must have a valid name and an optional list of parameters.
88. What is the data source in the Azure Data Factory?
A data source in Azure Data Factory is the source or destination system that contains the data to be used or operated upon. Data could be of any type: text, binary, JSON, or CSV files, audio, video, or image files, or a proper database.
Examples of data sources are Azure Blob Storage, Azure Data Lake Storage, and databases such as Azure SQL Database, MySQL, PostgreSQL, etc. Azure Data Factory provides 80+ different data source connectors for getting data in and out of these data sources.
89. What is the integration runtime in Azure Data Factory?
The integration runtime, also known as the IR, is the powerhouse of the Azure data pipeline: it provides the compute resources for data transfer activities and for dispatching activities in Azure Data Factory.
The integration runtime is the heart of Azure Data Factory.
In Azure Data Factory, a pipeline is made up of activities, and an activity represents some action that needs to be performed. This action could be a data transfer that requires execution, or it could be a dispatch action. The integration runtime provides the environment where this activity executes.
90. What are the different types of integration runtime?
There are 3 types of integration runtime available in the Azure Data Factory:
- Azure IR
- Self-hosted IR
- Azure-SSIS IR
91. What is the main advantage of the AutoResolveIntegrationRuntime?
The advantage of the AutoResolveIntegrationRuntime is that it automatically tries to run activities in the same region as, or as close as possible to, the region of the sink data source. This can improve performance considerably.
92. What are Self-Hosted Integration Runtimes in Azure Data Factory?
A self-hosted integration runtime, as the name suggests, is an IR managed by you rather than by Azure. You are responsible for its installation, configuration, maintenance, updates, and scaling.
Because you host the IR, it can also access resources on your on-premises network.
93. What are the Azure-SSIS Integration Runtimes?
Azure-SSIS integration runtimes are a set of VMs running the SQL Server Integration Services (SSIS) engine, managed by Microsoft.
Here the responsibility for installation and maintenance lies with Azure. Azure Data Factory uses the Azure-SSIS integration runtime for executing SSIS packages.
94. How to install Self-Hosted Integration Runtimes in Azure Data Factory?
- Create a self-hosted integration runtime by simply giving general information such as a name and description.
- Create an Azure VM (if you already have one, you can skip this step).
- Download the integration runtime software onto the Azure virtual machine and install it.
- Copy the auto-generated key from step 1 and paste it into the newly installed integration runtime on the Azure VM to register it.
95. What is the use of the lookup activity in Azure Data Factory?
The Lookup activity is used to pull data from a source dataset and keep it as the output of the activity. The output of the Lookup activity is generally used further down the pipeline for making decisions or configuring subsequent activities accordingly.
96. What do you mean by variables in the Azure Data Factory?
Variables in an ADF pipeline provide the functionality to temporarily hold values. They are used for the same reasons variables are used in programming languages. They are available inside the pipeline and are set inside the pipeline.
Set Variable and Append Variable are the two activities used for setting or manipulating variable values.
There are two types of variables:
- System variable
- User Variables
97. What are the ways to create the Linked Service?
There are two ways to create the Linked Service:
- Using the Azure Portal
- ARM template way
98. Can we debug the pipeline?
Debugging is one of the key features for any developer. To solve and test issues in the code, developers use the debug feature in general.
Azure Data Factory also provides a debugging feature.
99. What is the breakpoint in the ADF pipeline?
Breakpoints let you debug part of a pipeline: if you want to run and check the pipeline only up to a certain activity, you can do so by using breakpoints.
For example, if you have 3 activities in the pipeline and you want to debug only up to the 2nd activity, you can do this by putting a breakpoint on the 2nd activity.
100. What are the different pricing tiers of Azure Databricks available?
Azure provides two pricing tiers for the Databricks service:
- Standard Tier
- Premium Tier
101. How many different types of cluster modes are available in Azure Databricks?
- Standard cluster
- High concurrency cluster
- Single-node cluster
102. How can you connect your ADB cluster to your favorite IDE (Eclipse, IntelliJ, PyCharm, RStudio, Visual Studio)?
Databricks Connect is the way to connect the Databricks cluster to a local IDE on your local machine.
You need to install the databricks-connect client and then need the configuration details like ADB URL, token, etc.
Using all of these, you can configure the local IDE to run and debug code on the cluster.
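A minimal sketch of what this looks like with the classic databricks-connect client: after `pip install databricks-connect` and `databricks-connect configure` (workspace URL, token, cluster ID), ordinary Spark code in the local IDE executes on the remote Azure Databricks cluster:

```python
# Minimal sketch: once databricks-connect is configured, the standard SparkSession
# entry point transparently targets the remote Azure Databricks cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# This count is computed on the Databricks cluster; only the result returns locally.
print(spark.range(1000).count())
```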
103. What does a typical Azure Databricks CI/CD pipeline consist of?
- Continuous integration
- Continuous delivery
104. Explain Azure cloud services.
Azure Cloud Services is a PaaS (platform-as-a-service) product that intends to provide robust, efficient, and cost-effective applications. Azure Cloud Services are hosted on virtual machines. By launching a cloud service instance, Azure cloud services can be utilized to implement multi-tier web-based apps in Azure.
There are two types of Azure cloud services –
- Web role
- Worker role