Sql Server knowledge sharing blog

Wednesday, April 24, 2013

Sharding, Scale-out with Windows Azure SQL Database (Step by Step)

Database sharding is a technique of horizontal partitioning data across multiple physical servers to provide application scale-out. SQL Database combined with database sharding techniques provides for virtually unlimited scalability of data for an application.

Sharding:

Sharding is an application pattern for improving the scalability and throughput of large-scale data solutions. To “shard” an application is the process of breaking an application’s logical database into smaller chunks of data, and distributing the chunks of data across multiple physical databases to achieve application scalability. Each physical database in this architecture is what is referred to as a shard.

The benefits of sharding or Federation with SQL Database

Scale out using tens, hundreds or thousands of database nodes using commodity hardware instead of expensive scale-up systems,
To achieve scalable performance as the number of nodes increases,
Build a solution with an excellent price-performance ratio derived from the use of commodity hardware instead of expensive application servers
SQL Database provides a high availability SLA of 99.9% for all databases, no need to implement RAID and other availability techniques yourself.

How to Shard your database

To start working with Sql server sharding I have used SQLAzureMW tool to done sharding activity. You can download the tool from

Sql azure Managment view Tool

Before starting sharding you need to design or update your database keeping following points in mind

Every table must have a primary key.
Sharded key must be a primary key or part of primary key in related tables (Clustered index required).
Identity and Timestamp are not included as a part of data types for sharded tables
Identity can be set in Root DB tables and Sharding reference tables. (both terms defined during step by step creation)
Root DB collation and size are replicated to all sharded databases.
You need to manage schema changes in all databases manually to avoid merging problems.
Replication not support with table scripts.
Need to mention Federation key at table creation level even root database are not created yet (using this tool).

Run the tool and select Database

Connect to your local db where Database exists that needs to shard or migrate on SQL Azure

If you want to migrate to simple Azure DB select SQLAzure for Federation, Select as selected.

Also select objects like Tables, procedures etc.

You can also select Table and Schema or Table or Schema only.

Script generated for Selected objects and if there are any errors it will show in red lines in Summary section.

For federation as we don't connect our Azure server yet. We need to add federation key manually so that when script were executed at the end table are ready to created on the basis of federation key.

On this screen you can view FEDERATED ON (UserId = UserId).

Federated On is keyword to add this table into federation
"UserId" is key attribute on which we need to generated shards.
"= UserId" is the Key column that has data and on the basis of this data definition shards data will be moved from one to other federation.

If you are planning to migrate your database onto Azure Plateform. You do not need to mention FEDERATED ON clause.

Use "Connect to Server" and by providing Azure credentials as we can see in next screen shot you will be connected to Azure Database server.
As there is no database created yet. We need a database or Root database to start our database migration.
Click on Create Database and You can et its size and Edition along with its name.
Following editions are available

Express -- Web Editions
Workgroup -- Business Editions
Standard
Enterprise

If you want to migrate your database without federation you can click next and deploy your changes to database.

for Federation:

You have successfully created Root database for your Federation database. Now connect again to create federations.

The difference is change your server type to SQL Azure Federation and mention your database that was recently created. In this case UserDBRoot

Now you have created Federation Root db and need to create it Federation Key on which you data will be moved to respective shards.

You can set Distribution data type as BigInt, Float and UniqueIdentifier.

Now you have successfully created Federation Member and you can see its range is Min value of BigInt to Max value of BigInt

In this Screen shot I am creating Split point that will be key decision factor to distribute data. For sample I have set to 5. (It depends on your requirements).

On click next scripts will generate member databases and also their respective table structures and data (if mentioned) using BCP command.

You can connect and query your database from Online Management Studio and Local management studio as well.

Merging Federated Members

If I want to merge my federation members I will use Delete button and specified options.

For merging you need to remember
All data of deleting member will be lost
Take backup of both members (merger and deleting)
After merging member you need to insert all deleted member data manually
If you delete Federation all members will be lost
You need to make your federation members offline to avoid problems to your customer

There will be another post that will define following in detail

How to Query a specific Shard.
How to Check database usage size.
How to View your sharding database ranges and Stats by Query.
Querying your database using Online and On-premises database management studio.

Tuesday, March 26, 2013

SQL Azure Data Synchronization Process (Step by Step)

Migrating data from cloud into on-premises is always on priority of every organization. Data requires for multiple purposes like reporting, cleaning, archiving etc. One of the reason of data migration or synchronization on and off the cloud is its storage limitation as we know we cannot create huge databases on SQL Azure during trail version and even we have to pay after purchasing that is obviously not too costly but still we need to move our archived data from the cloud DBs to local Database.

Another scenario of data synchronization is at enterprise applications there are multiple offices on multiple locations and we need on-premises data to calculate reporting in easy and efficient way. Lets look at the basic architecture of azure sync.

Azure Sync Portal

SQL Data Sync comprises four main components:

· SQL Data Sync service
· Windows Azure SQL Database
· Windows Azure Storage
· Data Sync Agent

Let discuss how can create synchronization using SQL Azure cloud and make connected to our on-premises database.

Basic idea behind this is to merge azure and local data on defined rules. We need to create a Database hub to synchronize data and then distribute it to data Members or spoke. To do this we need following steps

Provision a Data Sync Server

Before you can begin the process to create a sync group you must first provision a Data Sync server on your Windows Azure subscription.

Create SQL Agent

List Shows available databases

Click on create Sync Agent and follow Wizard

Generate Key

Unique key has been generated to Link up your Azure DBs with Local service

Download and Install SQL Data sync Agent Preview on your local machine

Key generated above used to connect to Azure

Copy --> Paste key and click OK

You can use your windows auth and SQL user auth to connect local db.Review Agent status on cloud on refresh

Local db has been synchronized and status is Good

Create the Sync Group

The first step in the process to create a sync group is to give the sync group a unique and meaningful name.

Use your closet Region

Add a SQL Azure Hub Database

Hub Wins: The first row change written to the hub is kept. Subsequent attempts to write to the same row in the hub are ignored. The first write to the hub is propagated out to all member databases.
Client Wins: Each row change in a member database is written to the hub, overwriting prior changes to the same row. The last write to the hub is then propagated out to all member databases.

No matter which policy you adopt, one of the changed rows is kept and the others are lost whenever a

Conflict arises.

You can use one of your database as Hub and Can add one or multiple as Reference.

Hub DB call sync service with local setting. If will copy all Reference tables in Hub db on Back end

Add a SQL Azure Member or Reference Database

Select one of the database that use sync service to sync data. It would be Client to Hub, Hub to Client or Bi-directional

We have successfully configured sync databases

Configure the Sync Group

Select the sync group schema source

Creating rule for synchronize.

Set the synchronization frequency.

Select the tables to synchronize.

Select the columns to synchronize.
Optionally filter the rows to synchronize.

1. You can also select all tables and all columns but there are some restrictions applied as follows

I. Maximum number of SQL Data Sync Servers per subscription: 1

II. Maximum number of sync groups any database can belong to: 5

III. Filters per table: Up to 12 (optionally 13 if one is on the primary key column)

IV. Database, table, schema, and column names: 50 characters per name

V. Tables in a sync group: 100

VI. Columns in a table in a sync group: 1000

Added reference tables to Sync Database

Along with the data tables (HumanResources.Employee and Person.BusinessEntity), SQL Data Sync created six

new tables, system synchronization tables, in both databases:

• schema_info: Tracks member schema information.

• scope_config / scope_info: Used by the Sync Framework to determine what tables, filters, and so on are being synchronized. Each database that is participating in a sync includes these tables and includes at least one scope (if they’re being synchronized).

• <table>_dss_tracking: Tracks changes to the related user table.

• provision_marker: A metadata table used internally as part of the change-tracking mechanism.

Now we have successfully configured our sync environment. Lets test how it works

Deploy Sync Group and Verify Changes

We have same data on both databases.

Large Query window returning result from local whereas Small Query tab showing data from Cloud. Data is same.

Adding 2 new rows on the local database we have different numbers on both sides.

After adding 2 rows on local db Count is changed from 28 to 30 whereas on cloud query tab count is still 28

Now execute the sync service on cloud. For testing I do not wait frequency to execute sync

Now look at the data on both sides’ data has been synchronized on both databases.

After executing service data is synced and equal on both sides.

As we have configured our reference database as bi-directional. Lets go to the cloud db and verify what happens when there is data change in cloud db. I have deleted some rows on cloud db and look at the total rows difference on both sides now.

After running sync service data has been updated on both sides.

Debugging the Logs

We have configured and test changes to sync data on Azure. Hope you will get benefit out of this post.

Tuesday, March 12, 2013

Working with SQL Azure Management Portal

In previous posts we have installed and connect our Azure database using Azure Management portal and SQL Server Management Studio.

In this post we will how can we perform SQL related tasks on Azure Management Portal