What is Azure Synapse Analytics?

Hello friends, hope you checked out the the previous article on A sneak peek into Angular Language Service. Do read it in case you have not. What if you want an analytics service which does not put any limits around the statistics, scale of data, and one that combines the power of Big Data analytics and enterprise data warehousing? Then data analytics using Azure Synapse is the answer. By the end of this post, you will have a fair idea of what is Azure Synapse Analytics? Also there is a hands-on Azure Portal tutorial which you can try out.

azure synapse
Image source: Microsoft docs

What is Azure Synapse Analytics?

What was formerly known as SQL Data warehouse in Azure is now termed as Azure Synapse Analytics. It is basically an ‘Enterprise Data Warehouse comes to Big Data analytics’ feature. You can fetch scaled data using hosted resources or on-demand server-less based resources.

How does Azure Synapse work?

Below is a very high level view on how Azure Synapse works.

  • The data is first pulled into big data format.
  • Then using technologies like Spark and machine learning the data is processed and prepared.
  • When you are ready to do analysis on this data, Synapse SQL pool uses Poly Base for querying the big data source.
  • Finally using Transact-SQL queries data comes into the tables of Synapse SQL pool.

This data is stored in a relational format which cuts costs on data storage and also results in faster queries. One call also execute analytics on this stored data with scaling. The end result is that the queries take very little time (in seconds) compared to hours / minutes using normal databases.

What are the components of Azure Synapse?

Synapse has four major components:

  • Spark – Apache Spark
  • Studio – Simple user experience
  • Pipelines – For hybrid data
  • SQL – T-SQL based analytics

Some basic terminology

Synapse workspace– A kind of namespace for performing Azure based analytics. It is wrapped inside of a resource group.

Synapse SQL– It allows us to use T-SQL for analytics using the above workspace. Dedicated and server less are the two models available.

Spark– It’s a combination of processes of driver and executor. A Spark pool allows an application to run inside of it.

Azure Synapse Architecture

azure synapse architecture
Image source: Microsoft docs

The diagram above depicts an architecture which is based on nodes.

Control node – Entry point for applications to connect to Synapse and is the core part of the architecture. It is also responsible for transforming the T-SQL queries submitted by the application / source into smaller queries which run in parallel.

MPP (Massive Parallel Processing) Engine – The Control node also execute a MPP engine which is responsible for query optimization and parallel computing. It then passes the control to the compute nodes.

Compute node – As the name suggest, they provide the raw compute resources. The data received from Control nodes is stored in Azure Storage and queries are executed in parallel mode.

DMS (Data Movement Services) – In order to run parallel queries, a data movement service works with the compute nodes for movement and coordination of data and returning of the processed result sets.

Azure Storage – Azure Storage is responsible for data security. For performance, this data is partitioned into distributions.

Step-by-step Azure Portal Tutorial

NOTE: Sorry for the blurry images, but the idea is just to show you where you can find the settings.

Creation of SQL pool

Step 1 – Using your credentials login to the Azure portal

Azure portal -> Create a resource

azure synapse create resource
Image source: Microsoft docs

Step 2 – Databases -> Azure Synapse Analytics (formerly called SQL DW)

azure synapse create database
Image source: Microsoft docs

Step 3 – Basics -> Enter your subscription, resource group (myResourceGroup), SQL pool name (mySampleDataWarehouse), and server name

azure synapse enter basics
Image source: Microsoft docs

Step 4 – Performance level -> Select performance level (optional) – To change the configuration using a slider

azure synapse performance
Image source: Microsoft docs

Step 5 – Additional Settings -> Use existing data -> Sample (use AdventureWorksDW as db)

azure synapse additional settings
Image source: Microsoft docs

Step 6 – Review + Create -> Create will create the SQL pool (wait for few minutes)

azure synapse review and create
Image source: Microsoft docs

Step 7 – Toolbar -> Notifications (to check progress)

azure synapse notifications
Image source: Microsoft docs

Adding firewall rule for server

Synapse service by default creates a firewall which prevents external connections to the server / databases. For connecting them, a firewall rule should be added for the connection client IP. Note that port 1433 needs to be open for Azure Synapse communication else it doesn’t work.

Step 1 – On deployment completion -> All services -> Azure Synapse Analytics

Step 2 – From Azure Synapse Analytics page -> Left menu -> Azure Synapse Analytics -> mySampleDataWarehouse. The full server name can be viewed for e.g. sqlpoolservername.database.windows.net.

Step 3 – Copy the full server name from above step 2 and click the name to open the settings.

server settings
Image source: Microsoft docs

Step 4 – Click Show firewall settings.

show firewall settings
Image source: Microsoft docs

Step 5 – Under Firewall settings

firewall settings
Image source: Microsoft docs

Step 6 – Click Add client IP

Step 7 – Click Save. This opens port 1433 for your I.P.

Step 8 – Click OK

Using SQL Management Studio for testing

In this step, we will connect to the SQL pool we created earlier, connect to our sample database and fire some queries against it.

Step 1 – Sign in to the Azure portal.

Step 2 – Left hand menu -> Azure Synapse Analytics -> Azure Synapse -> Analytics page.

Step 3 – Essentials -> Server name (copy). It will something like sqlpoolservername.database.windows.net

copy server name
Image source: Microsoft docs

Step 4 – SQL Server Management Studio -> Connect to Server and enter below

Server type              Database engine
Server name             Full server name (e.g. sqlpoolservername.database.windows.net)
Authentication        SQL Server Authentication
Login / Password    Enter your credentials

sql mgmt studio
Image source: Microsoft docs

Step 5 – Connect -> Databases -> mySampleDatabase

sql db
Image source: Microsoft docs

Step 6 – mySampleDataWarehouse -> New Query -> Type below query

SELECT * FROM sys.databases

Step 7 – Execute. Results will show master and mySampleDataWarehouse as the two databases.

sql results
Image source: Microsoft docs

Step 7 – Add some more queries as below and check out the results.

SELECT LastName, FirstName FROM dbo.dimCustomer 
WHERE LastName = 'Adams'AND NumberChildrenAtHome = 3;
sql results of second query
Image source: Microsoft docs

Deleting resources

Left hand menu -> Azure Synapse Analytics -> Azure Synapse -> Analytics page -> Delete

delete resources
Image source: Microsoft docs

Click myResourceGroup -> Delete resource group, to delete the resource group. This way you can clean up your resources and avoid unnecessary cloud costs.

Summary

In this post we learnt what is Azure Synapse Analytics and how it helps in enhancing the performance of loading data and running heavy query workload using distribution and partitioning. Among other features it also has support for .csv files also. Hope you liked this post and will try out the Azure Synapse tutorial. Do share your feedback in the comments section below.

Hitesh Boricha

I have a little over a decade experience in the IT industry. Having worked in various roles in this industry, I am passionate about technology.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.