#5 Succinct Introduction to Azure Synapse Analytics

Think of Azure Synapse Analytics as an Azure SQL Data Warehouse on steroids. It is a limitless analytics service that brings together data integration, enterprise data warehousing and big data analytics, all into a single service.

To better understand Azure Synapse, let’s take a step back and briefly cover what Azure SQL Data Warehouse (SQL DW) is. SQL DW was a cloud-based, scale-out relational database built on a massively parallel processing (MPP) architecture and designed to process large volumes of data within the Microsoft Azure cloud platform. Over the last couple of years, Microsoft has added features that have evolved SQL DW into a more powerful and unique data analytics solution known as Azure Synapse Analytics.

One of the key capabilities of Azure Synapse Analytics is that it combines data warehousing and big data analytics in a single service. Formerly we used SQL DW to query structured data and a Data Lake for big data analysis; now Synapse brings both under one roof. Azure Synapse Analytics also integrates with numerous other Azure services, such as Azure Data Catalog, Azure Databricks, Azure HDInsight, Azure Machine Learning and Power BI.

The central data storage of the Synapse workspace is an Azure Data Lake Storage Gen2 account. On top of that storage, you can choose between two different kinds of analytics runtimes:

  • SQL-based runtime using a dedicated or serverless approach
  • Apache Spark runtime 
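To make the serverless option a bit more concrete, here is a minimal sketch of how a query against files in the data lake could be put together. It only builds the T-SQL text around OPENROWSET; the function name, the storage URL and the Parquet format are my own assumptions for illustration, and you would still need to run the resulting query from Synapse Studio or a SQL client.

```python
def build_openrowset_query(data_url: str, top: int = 10) -> str:
    """Return T-SQL that reads Parquet files in place via OPENROWSET.

    data_url is a hypothetical location in the data lake, for example
    https://<account>.dfs.core.windows.net/<file system>/<path>/*.parquet
    """
    if top <= 0:
        raise ValueError("top must be a positive integer")
    # Double any single quotes so the URL stays a valid T-SQL string literal.
    safe_url = data_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


if __name__ == "__main__":
    # Placeholder account and path, not taken from a real workspace.
    print(build_openrowset_query(
        "https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet", top=5))
```

With the serverless approach nothing has to be loaded first: the query reads the files where they already sit in the lake.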

Azure Synapse Analytics comes with an integrated orchestration engine, identical to the one in Azure Data Factory, so you can create data pipelines and use rich data transformation capabilities within the Synapse workspace itself.

Another key aspect is the security features in Azure Synapse Analytics:

  • Already compliant with industry-leading standards such as ISO, DISA, HIPAA, etc.
  • Supports Azure Active Directory (AD) authentication, SQL authentication and multi-factor authentication
  • Supports data encryption at rest and in transit as well as data classification for sensitive data
  • Supports row-level, column-level, as well as object-level security along with dynamic data masking
  • Supports network-level security with a virtual network as well as firewalls

Create an Azure Synapse Analytics Workspace

Navigate to the Azure Portal and search for Azure Synapse Analytics. You will land on the homepage of the service. Click on the ‘Add’ button (or, if this is your first workspace, you can also use the ‘Create Synapse workspace’ button).

Once the wizard opens on the Basics tab, enter your preferred Subscription, Resource Group, Workspace name and Region. Next is the “Select Data Lake Storage Gen2” section. Synapse can access data from different repositories and can natively read data from Azure Data Lake Storage Gen2; for that purpose we need a storage account as well as a file system.

If you already have an account, you can manually specify its URL. To create a new Azure Data Lake Storage Gen2 account, click on the Create New button under Account Name. Check the box titled “Assign myself the Storage Blob Data Contributor role”, as Contributor-level access is required by different Synapse features to access the data.
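If you go the manual route, it helps to know what the account URL looks like. The sketch below composes the usual Data Lake Storage Gen2 addresses from an account name and a file system name: the https form on the dfs endpoint and the abfss form that Spark typically uses. The helper name and the example values are hypothetical.

```python
def adls_gen2_urls(account: str, file_system: str, path: str = "") -> dict:
    """Compose common address forms for a Data Lake Storage Gen2 location."""
    if not account or not file_system:
        raise ValueError("account and file_system are required")
    host = f"{account}.dfs.core.windows.net"
    clean = path.strip("/")
    suffix = f"/{clean}" if clean else ""
    return {
        # Account-level endpoint, the kind of URL the wizard asks for.
        "account_url": f"https://{host}",
        # Full https address of a folder or file inside the file system.
        "https": f"https://{host}/{file_system}{suffix}",
        # abfss address, the form Spark code usually expects.
        "abfss": f"abfss://{file_system}@{host}{suffix}",
    }


if __name__ == "__main__":
    # Placeholder names for illustration only.
    for name, url in adls_gen2_urls("mydatalake", "raw", "sales/2021").items():
        print(f"{name}: {url}")
```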

In the Security tab, under SQL administrator credentials, provide the admin username and password that will be used to connect to the SQL pools.
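As a rough illustration of where these credentials end up, here is a sketch that assembles an ODBC connection string for the workspace SQL endpoints. I am assuming the standard endpoint names (`<workspace>.sql.azuresynapse.net` for dedicated pools and `<workspace>-ondemand.sql.azuresynapse.net` for serverless) and the “ODBC Driver 17 for SQL Server” driver; the function name and the example values are hypothetical, and the string would be handed to a client library such as pyodbc, which is not used here.

```python
def synapse_sql_connection_string(workspace: str, database: str,
                                  user: str, password: str,
                                  serverless: bool = True) -> str:
    """Build an ODBC connection string for a Synapse SQL endpoint."""
    suffix = "-ondemand" if serverless else ""
    host = f"{workspace}{suffix}.sql.azuresynapse.net"
    # ODBC rule: wrap values with special characters in braces and
    # double any closing brace inside the value.
    pwd = "{" + password.replace("}", "}}") + "}"
    parts = [
        ("Driver", "{ODBC Driver 17 for SQL Server}"),  # assumed driver
        ("Server", f"tcp:{host},1433"),
        ("Database", database),
        ("Uid", user),
        ("Pwd", pwd),
        ("Encrypt", "yes"),
        ("TrustServerCertificate", "no"),
    ]
    return ";".join(f"{key}={value}" for key, value in parts) + ";"


if __name__ == "__main__":
    # Placeholder values; never hard-code a real password like this.
    print(synapse_sql_connection_string(
        "myworkspace", "master", "sqladminuser", "<password>"))
```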

Next is the Network tab. You can enable a Synapse-managed virtual network by selecting “Enable managed virtual network” – this will ensure that the traffic uses the Azure internal network only (Note: This feature has an additional cost). Here you can also specify which IP addresses can connect to the workspace and you have the option to select “Allow connection to all IP addresses”.
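To show how the IP settings behave, here is a small sketch that models firewall rules as start/end IPv4 ranges and checks whether a client address would be let through. Treating “Allow connection to all IP addresses” as the single range 0.0.0.0 to 255.255.255.255 is how I understand that option; the function name and sample addresses are made up, and the real check of course happens on the Azure side.

```python
from ipaddress import ip_address

# How I model "Allow connection to all IP addresses": one rule spanning
# the whole IPv4 space.
ALLOW_ALL = ("0.0.0.0", "255.255.255.255")


def is_allowed(client_ip: str, rules: list) -> bool:
    """Return True if client_ip falls inside any (start, end) IPv4 range."""
    ip = ip_address(client_ip)
    return any(ip_address(start) <= ip <= ip_address(end)
               for start, end in rules)


if __name__ == "__main__":
    office_only = [("203.0.113.0", "203.0.113.255")]  # documentation range
    print(is_allowed("203.0.113.7", office_only))
    print(is_allowed("198.51.100.1", office_only))
    print(is_allowed("198.51.100.1", [ALLOW_ALL]))
```

A narrow range limited to the addresses you actually connect from is the safer choice; the allow-all rule is convenient for a first test but leaves the endpoints reachable from anywhere.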

Optionally, you can add Tags to attach metadata to your workspace and, voila, you have provided all the information required to create it. Review all the details before clicking ‘Create’. Note that when you create a Synapse workspace, a SQL on-demand (serverless) pool is created by default, and its estimated cost is £3.73 per TB of data scanned (as of February 2021).
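Since the on-demand pool is billed by data scanned, a quick back-of-the-envelope calculation is handy. The sketch below uses the £3.73/TB figure quoted above; treating 1 TB as 1024^4 bytes and ignoring any minimum charge or rounding in the real bill are my own simplifying assumptions, so read the result as an estimate only.

```python
PRICE_PER_TB_GBP = 3.73    # figure quoted above, as of February 2021
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte


def estimate_scan_cost(bytes_scanned: int,
                       price_per_tb: float = PRICE_PER_TB_GBP) -> float:
    """Estimate the cost in GBP of scanning the given number of bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned cannot be negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    scanned = 250 * 1024 ** 3  # a hypothetical query scanning 250 GB
    print(f"Estimated cost: £{estimate_scan_cost(scanned):.2f}")
```

Under these assumptions, a query that scans 250 GB comes to roughly £0.91, which is why limiting the columns and files a query touches pays off directly.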

Creating the Azure Synapse workspace should take a couple of minutes. Click on the ‘Go to resource’ button to open the workspace and you will land on the dashboard page, where you can find different properties and endpoint information, create new pools, reset your SQL admin password, change firewall settings, start Synapse Studio and more.

Azure Synapse Studio

Now that you have a workspace, let’s jump into Azure Synapse Studio. There are two ways to open the tool: you can either follow the link from your workspace dashboard page, or go to https://web.azuresynapse.net and sign in to your workspace.

On the dashboard page, you can see four categories: Ingest, Explore & Analyse, Visualize and Learn.

If you are new to Synapse, I strongly recommend you check the Learn section. It provides samples and tutorials that will help you kick start your Azure Synapse adventure.

Go back to the home dashboard. By clicking ‘New’, you will see a dropdown menu that gives you the option to create various artefacts such as SQL scripts, notebooks, data flows, Spark jobs or pipelines. You can also select ‘Import’ if you want to copy data from an external data repository.

In the menu on the left, you will see six buttons. Below the Home icon, there is the Data section, where you can create databases, tables and other database objects, as well as create linked databases to access data from external data repositories.

Next is the Develop tab, which allows you to create new artefacts such as SQL scripts, notebooks, etc.

The fourth tab is the Integrate section. Here you can create data pipelines, jump directly to the Copy Data tool, which builds a data pipeline step by step using a wizard, or browse a gallery of samples and previously created data pipelines.

Next in the list is the Monitor tab. In addition to everything so far, Azure Synapse Studio also works as an administrative console. Here you can see a history of all the activities taking place in the workspace and identify which ones are active now. You can monitor pipelines, triggers and integration runtimes, as well as Spark and SQL activities.

Finally, we have the Manage tab.

And finally…

Without a doubt, Azure Synapse Analytics is a game-changer in data processing and analytics. I hope this post has sparked your interest and covered the basics you need to go and explore the service and its limitless possibilities.
