#5 Succinct Introduction to Azure Synapse Analytics

Think of Azure Synapse Analytics as an Azure SQL Data Warehouse on steroids. It is a limitless analytics service that brings together data integration, enterprise data warehousing and big data analytics, all into a single service.

To better understand Azure Synapse, let’s take a step back and start by briefly covering what Azure SQL Data Warehouse (SQL DW) is. SQL DW was a massively parallel processing (MPP) cloud-based, scale-out relational database, designed to process large volumes of data within the Microsoft Azure cloud platform. Over the last couple of years, Microsoft has added features that have evolved SQL DW into a more powerful and unique data analytics solution known as Azure Synapse Analytics. 

One of the key capabilities of Azure Synapse Analytics is that it combines data warehousing and big data analytics into a single service. Formerly we used SQL DW to access structured data and Data Lake for big data analysis; now Synapse merges both under a single bracket. Azure Synapse Analytics is integrated with numerous other Azure services, such as Azure Data Catalog, Azure Databricks, Azure HDInsight, Azure Machine Learning and Power BI.

The central data storage of the Synapse workspace is based on Azure Data Lake Storage Gen2. On top of this storage, you can choose between two different kinds of analytics runtimes:

  • SQL-based runtime using a dedicated or serverless approach
  • Apache Spark runtime 

Azure Synapse Analytics comes with an integrated orchestration engine, identical to the one in Azure Data Factory, for creating data pipelines and rich data transformations within the Synapse workspace itself.

Another key aspect is the security features in Azure Synapse Analytics:

  • Already compliant with industry-leading standards such as ISO, DISA and HIPAA
  • Supports Azure AD authentication, SQL authentication and multi-factor authentication
  • Supports data encryption at rest and in transit, as well as data classification for sensitive data
  • Supports row-level, column-level and object-level security, along with dynamic data masking
  • Supports network-level security with virtual networks and firewalls

Create an Azure Synapse Analytics Workspace

Navigate to the Azure Portal and search for Azure Synapse Analytics. You will land on the homepage of the service. Click on the ‘Add’ button (or, if this is your first workspace, the ‘Create Synapse workspace’ button).

Once the wizard opens, enter your preferred Subscription, Resource Group, Workspace name and Region in the Basics section. Next is the “Select Data Lake Storage Gen2” section. Data in Azure Synapse can be accessed from different repositories, and Synapse can natively read data from Azure Data Lake Storage Gen2 – for that purpose we need both an account and a file system.

If you already have an account, you can manually specify the URL. To create a new Azure Data Lake Storage Gen2 account, click on the Create New button under Account Name. Check the box titled “Assign myself the Storage Blob Data Contributor role”, as Contributor-level access is required by several Synapse features to access the data.

In the Security tab, under SQL administrator credentials, you need to provide the administrator credentials that will be used to connect to the SQL pools.

Next is the Networking tab. You can enable a Synapse-managed virtual network by selecting “Enable managed virtual network” – this ensures that traffic stays on the Azure internal network (note: this feature has an additional cost). Here you can also specify which IP addresses can connect to the workspace, and you have the option to select “Allow connections from all IP addresses”.

Optionally, you can add Tags to attach metadata to your workspace and, voila, you have provided all the required information to create your workspace. Review all the details and, before clicking ‘Create’, note that when creating a Synapse workspace a SQL on-demand (serverless) pool is created by default; its estimated cost is £3.73/TB of data scanned (as of February 2021).
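
If you prefer scripting the deployment, the same workspace can also be created with the Az PowerShell module. The following is a minimal sketch, assuming the Az.Synapse module is installed, you are signed in with Connect-AzAccount, and the Data Lake Storage Gen2 account and file system already exist; all names are illustrative:

#Minimal sketch: create a Synapse workspace with Az PowerShell (illustrative names and region)
$sqlAdminCred = Get-Credential   #SQL administrator credentials for the workspace
New-AzSynapseWorkspace -ResourceGroupName "rg-synapse-demo" -Name "synapse-demo-ws" -Location "uksouth" `
    -DefaultDataLakeStorageAccountName "synapsedemoadls" -DefaultDataLakeStorageFilesystem "synapsefs" `
    -SqlAdministratorLoginCredential $sqlAdminCred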

Creating the Azure Synapse workspace should take a couple of minutes. Click on the ‘Go to resource’ button to open the workspace and you will land on the dashboard page, where you can find different properties and endpoint information, create new pools, reset your SQL admin credentials, change firewall settings, launch Synapse Studio and more.

Azure Synapse Studio

Now that you have a workspace, let’s jump onto Azure Synapse Studio. There are two ways you can open the tool. You can either follow the link from your workspace dashboard page; or you can go to https://web.azuresynapse.net and sign in to your workspace.

On the dashboard page, you can see four categories: Ingest, Explore & Analyse, Visualize and Learn.

If you are new to Synapse, I strongly recommend you check the Learn section. It provides samples and tutorials that will help you kick start your Azure Synapse adventure.

Go back to the home dashboard; by clicking ‘New’, you will see a dropdown menu that gives you the option to create various artefacts such as SQL scripts, notebooks, data flows, Spark jobs or pipelines. You can also select ‘Import’ if you want to copy data from an external data repository.

In the menu on the left, you will see six buttons. Below the Home icon, there is the Data section, where you can create databases, tables and other database objects, as well as linked databases to access data from external data repositories.

Next is the Develop tab, which allows you to create new artefacts such as SQL scripts, notebooks and data flows.

The fourth tab is the Integrate section. Here you can create data pipelines, jump directly to the Copy Data tool, which lets you build data pipelines step by step using a wizard, or browse a gallery of samples and previously created data pipelines.

Next in the list is the Monitor tab. In addition to everything so far, Azure Synapse Studio also works as an administrative console. Here you can see a history of all the activities taking place in the workspace and identify which ones are currently active. You can monitor pipelines, triggers and integration runtimes, as well as Spark and SQL activities.

Finally, we have the Manage tab, where you can configure SQL and Spark pools, linked services, triggers and access control for the workspace.

And finally…

Without a doubt, Azure Synapse Analytics is a game-changer in data processing and analytics. I hope that with this post I have sparked your interest and covered the basics you need to go and explore this service with limitless possibilities.

#4 Getting Started with Dbatools

Being efficient is an integral part of being an effective DBA. In this post, I will introduce you to a tool that can significantly improve your work as a SQL DBA: dbatools.

Dbatools is an open-source PowerShell module for managing SQL Server. Started by Chrissy LeMaire (B|T) back in 2014 as a project to migrate SharePoint SQL Server instances, the module nowadays comes with almost 600 commands for automating SQL Server from the command line. One of the things I love most about dbatools is that the module is community-driven, with more than 190 contributors; if you get stuck or have any questions, the amazing folks from the community are always happy to help.

1. Installing Dbatools

Getting started with dbatools is quick and easy. First, you need to install the module. Open PowerShell as Administrator and run the following command:

Install-Module dbatools 

If you don’t have administrative rights on the machine, or you want to install the module only for your current user, you can use:

Install-Module dbatools -Scope CurrentUser 

Or, if you don’t have internet access, you can download the module on another machine and then copy it to the machine with no internet. More information about offline installation is available on dbatools.io.
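
As a rough sketch of the offline approach (the paths below are just placeholders), you can use Save-Module on a machine that does have internet access and then copy the downloaded folder to a module path on the target machine:

#On a machine with internet access: download the module to a local folder
Save-Module -Name dbatools -Path C:\temp\dbatools-offline

#Copy the dbatools folder to the offline machine (e.g. into C:\Program Files\WindowsPowerShell\Modules) and then load it
Import-Module dbatools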

2. Sp_Configure with dbatools

Almost every DBA uses sp_configure to view server settings or to make config changes – thanks to dbatools we can use sp_configure capabilities via the command line across multiple servers simultaneously.

To get information about server-level system configuration you can use Get-DbaSpConfigure. The command returns all system configuration information. You can also add parameters to narrow the view; for example, specifying -Name will return only that specific configuration.

#Get all Configuration properties
Get-DbaSpConfigure -SqlInstance $Instance1 
 
#Get the value for XPCmdShellEnabled
Get-DbaSpConfigure -SqlInstance $Instance1 -Name XPCmdShellEnabled

The Set-DbaSpConfigure command helps you make changes to the server level system configuration.

#Set the value for XPCmdShellEnabled
Set-DbaSpConfigure -SqlInstance $Instance1 -Name XPCmdShellEnabled -Value 1

Once you have made some configuration changes you may want to export the advanced sp_configure global configuration to a SQL file:

#Export the configuration to a .sql file (useful for documentation purposes)
Export-DbaSpConfigure  -SqlInstance $Instance1 -Path C:\temp

The real power of PowerShell is that you can work with multiple instances. Here is an example of how you can compare the configuration information of two instances:

#Compare configurations of two servers
$SPConfigure_Instance1 = Get-DbaSpConfigure -SqlInstance $Instance1
$SPConfigure_Instance2 = Get-DbaSpConfigure -SqlInstance $Instance2
$propcompare = foreach ($prop in $SPConfigure_Instance1) {
    [pscustomobject]@{
        Config              = $prop.DisplayName
        'Instance1 setting' = $prop.RunningValue
        'Instance2 setting' = $SPConfigure_Instance2 | Where-Object DisplayName -eq $prop.DisplayName | Select-Object -ExpandProperty RunningValue
    }
}
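
You can then pipe the result to Format-Table to view the comparison side by side:

#Display the comparison
$propcompare | Format-Table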

Copying sp_configure settings from one instance to another – something easily achievable with dbatools:

#Copy configuration from one Instance to the other
Import-DbaSpConfigure -Source $Instance1 -Destination $Instance2

You can also see and change the sp_configure settings of multiple servers simultaneously.

#Configure multiple Instances simultaneously
$AllInstances = $Instance1 , $Instance2
Get-DbaSpConfigure -SqlInstance $AllInstances -Name XPCmdShellEnabled
Set-DbaSpConfigure -SqlInstance $AllInstances -Name XPCmdShellEnabled -Value 0

3. Backups with Dbatools

You are only as good as your last backup, so making regular backups is a keystone of any business continuity and disaster recovery strategy. Backups are among the tasks most frequently performed by every DBA; luckily, dbatools can help you quickly and easily perform and test backups.

The command for backing up one or multiple databases is Backup-DbaDatabase. If you want to do a full backup of all databases on an instance and place the backup files in a specific location, you can run:

#Full backup of all databases
Backup-DbaDatabase -SqlInstance $Instance1 -Path C:\temp\Backup -Type Full

To get an idea of how long your backups take, use Measure-DbaBackupThroughput. The command uses the information stored in msdb and returns minimum and maximum throughput, average backup duration, first and last backup date, backup count and average backup size.

Measure-DbaBackupThroughput -SqlInstance $Instance1

Get-DbaDbBackupHistory is used to get details about all backups belonging to a particular database or instance with minimal effort. The output includes the backup type, size, backup device type, duration, and start and end time.

#Backup history details for all databases on Instance1:
Get-DbaDbBackupHistory -SqlInstance $Instance1

Before restoring a database, it is a good practice to check that we have enough space on the target instance. Once we know we have the required space we can perform the restore. Here is an example of how you can do this with dbatools:

#Check we have enough space to restore AdventureWorks2019 from Instance1 to Instance2
Measure-DbaDiskSpaceRequirement -Source $Instance1 -Destination $Instance2 -Database AdventureWorks2019

#Restore databases on Instance2 from file location
Restore-DbaDatabase -SqlInstance $Instance2 -Path C:\temp\Backup -WithReplace

Having a backup that cannot be restored is like having lifeboats that don’t float, so testing your backups is essential. With dbatools you can do this easily: Test-DbaLastBackup tests the last set of full backups on a SQL instance by performing the following steps:

  • It gathers the last full backup information for the specified database. If we do not specify a particular database, it gathers information for all databases
  • It restores the last full backup of the database under a new name on the specified server. If we do not specify a destination instance, it restores the database on the source instance. By default, the new database name is in the format dbatools-testrestore-$databaseName; we can specify a custom name as well
  • It also renames the logical and physical files of the new database to ensure there is no conflict with existing database filenames
  • It performs DBCC CHECKTABLE to check the integrity of all the pages and structures for all tables in the new database
  • In the last step, it drops the newly restored database

Test-DbaLastBackup -SqlInstance $Instance2
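
As a variation, and assuming the -Database and -Destination parameters work as documented for the command, you can test the last backup of a single database and run the trial restore on a different instance:

#Test the last backup of AdventureWorks2019 from Instance1, performing the trial restore on Instance2
Test-DbaLastBackup -SqlInstance $Instance1 -Database AdventureWorks2019 -Destination $Instance2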

4. Logins with Dbatools

Ensuring that users and applications can successfully connect to the databases they require is essential – with dbatools you can simplify the work that is required to administer instance logins and database users.

You can create a new SQL (or Windows authentication) login. You can also check all the logins that already exist on an instance.

#Returns info for all the logins on Instance1
Get-DbaLogin -SqlInstance $Instance1
 
#Create a new SQL login named Lina1 on Instance1
New-DbaLogin -SqlInstance $Instance1 -Login Lina1

You may also want to get the users of a database and then add a new user.

#Gets the users for AdventureWorks2019 
Get-DbaDbUser -SqlInstance $Instance1 -Database AdventureWorks2019 | Format-Table

#Create a new SQL user mapped to the login Lina1 in AdventureWorks2019
New-DbaDbUser -SqlInstance $Instance1 -Database AdventureWorks2019 -Login Lina1

In addition, you can check database role members and easily add a user to a role.

#Returns all database role members for AdventureWorks2019 on Instance1
Get-DbaDbRoleMember -SqlInstance $Instance1 -Database AdventureWorks2019

#Add user Lina1 to the roles db_datareader and db_datawriter in AdventureWorks2019
Add-DbaDbRoleMember -SqlInstance $Instance1 -Database AdventureWorks2019 -Role db_datareader, db_datawriter  -User Lina1

Covering all 500+ SQL Server best practice, administration, development and migration commands included in dbatools is mission impossible, but I strongly recommend checking out the module if you are looking for automation or an easier and quicker way to perform tasks. Although I am still new to dbatools, I was lucky to be introduced to the module by my amazing colleagues; I am currently using it daily and it has helped me significantly improve my efficiency. This blog was a short summary of just a few of the many brilliant things you can do with the module, so I strongly suggest checking the dbatools website and exploring it further. I would love to hear what you think of dbatools – please share your experience in the comments below.

T-SQL Tuesday #135: The outstanding tools of the trade that make your job awesome

Welcome to my first T-SQL Tuesday blog, and thanks to Michael Bronowski (b|t) for hosting and picking a great topic. Not familiar with T-SQL Tuesday? Started by Adam Machanic (b|t) back in 2009, the idea is that each month a different host selects the theme, and on the second Tuesday of the month bloggers from around the world share their knowledge on the topic. Let the blog party start now!

I still remember the first time a user contacted me with the question: “The database is running slow. Can you help?”. Wouldn’t it be amazing if you had a tool that could help you capture detailed information on what’s running on your instance in situations like that? There is! Today I want to share with you one of my favourite tools: the stored procedure sp_whoisactive, developed by Adam Machanic. Sp_whoisactive gives you a lot of information about the active sessions on a SQL Server instance, such as the query being executed, the user executing it, the wait event, runtime, CPU usage, tempdb usage and more. You can get even more information by passing additional parameters.

Getting started with sp_WhoIsActive is easy. To download the procedure, go to the downloads page of the project website and select the relevant release. Open the who_is_active.sql file using SQL Server Management Studio and execute the script. You will see that, by default, sp_WhoIsActive only returns running user processes.
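
If you already have dbatools installed (see the previous post), the module also offers a command that can download and deploy the procedure for you; here is a quick sketch, reusing the $Instance1 variable from the dbatools examples above and assuming master as the target database:

#Install sp_WhoIsActive into the master database on Instance1 using dbatools
Install-DbaWhoIsActive -SqlInstance $Instance1 -Database master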

The procedure provides the following information for each process:

  • dd hh:mm:ss.mss – Process elapsed time
  • session_id – The process session id
  • sql_text – The currently running SQL command
  • login_name – The login name associated with the process
  • wait_info – The process wait information (aggregated)
  • CPU – The CPU time
  • tempdb_allocations – Number of Tempdb writes done
  • tempdb_current – Number of Tempdb pages currently allocated
  • blocking_session_id – The blocking session Id
  • reads – Number of reads done
  • writes – Number of writes done
  • physical_reads – Number of physical reads done
  • used_memory – The amount of memory used
  • status – The process status
  • open_tran_count – The number of open transactions
  • percent_complete – The query completion percentage
  • host_name – The host machine name
  • database_name – The database name where the query is executed
  • program_name – The application that executed the query
  • start_time – The process start time
  • login_time – The login time
  • request_id – The request Id
  • collection_time – The time that this last select was run

Looking for some additional information? You can obtain more information from the procedure by passing additional parameters. For example, if you want to find information not only for the running user processes but for the system processes as well you can use the following command:

Exec sp_whoisactive @show_system_spids = 1

One of my favourite things about sp_whoisactive is how well it is documented: so much useful information, plus all of the additional parameters and their descriptions. Use this command to get the help output:

Exec sp_whoisactive @help = 1

Using Azure Data Studio? Then you should check out the sp_whoisactive extension. Search for sp_whoisactive in the marketplace and install it. Connect to your SQL instance in Azure Data Studio, right-click on the instance and select Manage. The sp_whoisactive output is presented in a graphical format, and by clicking on any of the blocks you will automatically get a query that includes all the required parameters.

sp_whoisactive has helped me so many times when dealing with performance issues. I strongly recommend adding it to your collection of tools – it will make your life easier!