Hello friends, hope you checked out the the previous article on A sneak peek into Angular Language Service. Do read it in case you have not. What if you want an analytics service which does not put any limits around the statistics, scale of data, and one that combines the power of Big Data analytics and enterprise data warehousing? Then data analytics using Azure Synapse is the answer. By the end of this post, you will have a fair idea of what is Azure Synapse Analytics? Also there is a hands-on Azure Portal tutorial which you can try out.
What is Azure Synapse Analytics?
What was formerly known as SQL Data warehouse in Azure is now termed as Azure Synapse Analytics. It is basically an ‘Enterprise Data Warehouse comes to Big Data analytics’ feature. You can fetch scaled data using hosted resources or on-demand server-less based resources.
How does Azure Synapse work?
Below is a very high level view on how Azure Synapse works.
- The data is first pulled into big data format.
- Then using technologies like Spark and machine learning the data is processed and prepared.
- When you are ready to do analysis on this data, Synapse SQL pool uses Poly Base for querying the big data source.
- Finally using Transact-SQL queries data comes into the tables of Synapse SQL pool.
This data is stored in a relational format which cuts costs on data storage and also results in faster queries. One call also execute analytics on this stored data with scaling. The end result is that the queries take very little time (in seconds) compared to hours / minutes using normal databases.
What are the components of Azure Synapse?
Synapse has four major components:
- Spark – Apache Spark
- Studio – Simple user experience
- Pipelines – For hybrid data
- SQL – T-SQL based analytics
Some basic terminology
Synapse workspace– A kind of namespace for performing Azure based analytics. It is wrapped inside of a resource group.
Synapse SQL– It allows us to use T-SQL for analytics using the above workspace. Dedicated and server less are the two models available.
Spark– It’s a combination of processes of driver and executor. A Spark pool allows an application to run inside of it.
Azure Synapse Architecture
The diagram above depicts an architecture which is based on nodes.
Control node – Entry point for applications to connect to Synapse and is the core part of the architecture. It is also responsible for transforming the T-SQL queries submitted by the application / source into smaller queries which run in parallel.
MPP (Massive Parallel Processing) Engine – The Control node also execute a MPP engine which is responsible for query optimization and parallel computing. It then passes the control to the compute nodes.
Compute node – As the name suggest, they provide the raw compute resources. The data received from Control nodes is stored in Azure Storage and queries are executed in parallel mode.
DMS (Data Movement Services) – In order to run parallel queries, a data movement service works with the compute nodes for movement and coordination of data and returning of the processed result sets.
Azure Storage – Azure Storage is responsible for data security. For performance, this data is partitioned into distributions.
Step-by-step Azure Portal Tutorial
NOTE: Sorry for the blurry images, but the idea is just to show you where you can find the settings.
Creation of SQL pool
Step 1 – Using your credentials login to the Azure portal
Azure portal -> Create a resource
Step 2 – Databases -> Azure Synapse Analytics (formerly called SQL DW)
Step 3 – Basics -> Enter your subscription, resource group (myResourceGroup), SQL pool name (mySampleDataWarehouse), and server name
Step 4 – Performance level -> Select performance level (optional) – To change the configuration using a slider
Step 5 – Additional Settings -> Use existing data -> Sample (use AdventureWorksDW as db)
Step 6 – Review + Create -> Create will create the SQL pool (wait for few minutes)
Step 7 – Toolbar -> Notifications (to check progress)
Adding firewall rule for server
Synapse service by default creates a firewall which prevents external connections to the server / databases. For connecting them, a firewall rule should be added for the connection client IP. Note that port 1433 needs to be open for Azure Synapse communication else it doesn’t work.
Step 1 – On deployment completion -> All services -> Azure Synapse Analytics
Step 2 – From Azure Synapse Analytics page -> Left menu -> Azure Synapse Analytics -> mySampleDataWarehouse. The full server name can be viewed for e.g. sqlpoolservername.database.windows.net.
Step 3 – Copy the full server name from above step 2 and click the name to open the settings.
Step 4 – Click Show firewall settings.
Step 5 – Under Firewall settings
Step 6 – Click Add client IP
Step 7 – Click Save. This opens port 1433 for your I.P.
Step 8 – Click OK
Using SQL Management Studio for testing
In this step, we will connect to the SQL pool we created earlier, connect to our sample database and fire some queries against it.
Step 1 – Sign in to the Azure portal.
Step 2 – Left hand menu -> Azure Synapse Analytics -> Azure Synapse -> Analytics page.
Step 3 – Essentials -> Server name (copy). It will something like sqlpoolservername.database.windows.net
Step 4 – SQL Server Management Studio -> Connect to Server and enter below
Server type Database engine
Server name Full server name (e.g. sqlpoolservername.database.windows.net)
Authentication SQL Server Authentication
Login / Password Enter your credentials
Step 5 – Connect -> Databases -> mySampleDatabase
Step 6 – mySampleDataWarehouse -> New Query -> Type below query
SELECT * FROM sys.databases
Step 7 – Execute. Results will show master and mySampleDataWarehouse as the two databases.
Step 7 – Add some more queries as below and check out the results.
SELECT LastName, FirstName FROM dbo.dimCustomer WHERE LastName = 'Adams'AND NumberChildrenAtHome = 3;
Left hand menu -> Azure Synapse Analytics -> Azure Synapse -> Analytics page -> Delete
Click myResourceGroup -> Delete resource group, to delete the resource group. This way you can clean up your resources and avoid unnecessary cloud costs.
In this post we learnt what is Azure Synapse Analytics and how it helps in enhancing the performance of loading data and running heavy query workload using distribution and partitioning. Among other features it also has support for .csv files also. Hope you liked this post and will try out the Azure Synapse tutorial. Do share your feedback in the comments section below.