Friday, 16 November 2012

C# Bulk Data Management

Download Source Here

Introduction

           Microsoft .NET has very good functionality to implement. It provides all sorts of functionality to achieve our feature needs. Only thing is we need to know which namespace and which class we should use for it.
           One of them and very useful class provided by microsoft is SqlBulkCopy it is under namespace System.Data.SqlClient.

            This class is very useful in transferring data from application to SQL Server. This is one of the best practices to be practiced by each .net developer.



Background

           Its been 4 years when I was giving interview to ValueLabs, the interviewer asked me a simple question.

                  Q. How are you inserting data to tables?
                  A. I replied by using a stored procedure.
           
                  Q. If you have some ten records to be inserted at once what you will do?
                  A. I will prepare an XML and send it to storedprocedure. inside sp I will write code to handle xml.
                  Q. What if you are given a Excel file to insert some 1000's of records to insert. preparing xml and sending will take lots of time even your varchar parameter type may not support some time how you will do in that case?
                   A. XML is the only way I am using so far to insert set of records. That way it works and it never gave me any error.
           
                   Q. what if we want to update a set of records, does your xml scenario work?
                   A. I have no answer, as I never faced such situation. So I was quite

      Then he replied and explained me have you ever heard of SqlBulkCopy? Which will insert any number of records in fractions of seconds?

       Then I was amazed and happy too know this. I was not through that interview but I knew a new concept. So I implemented same in my company and got very good appreciation from higher officials. Now I would like to share my experience and the SqlBulkCopy class usage with you.

SqlBulkCopy - Differenet Scenarios

    Scenario-I:  Contacts File Upload

                        In your applications you may get a requirement like upload number of contacts that are in a EXCEL, CSV, or XML file to a database table. Which is most common and repeating work and every developer in his life time will do this kind applications.

    Scenario-II: Producer-Consumer Problem

                        This is also some application problem where the requirement is like all producers (users or services) send message or requests to server queue. The server queue is periodically cleaned by processing or parsing each message and inserting them into database table.

    Scenario-III: Inserting Multiple Records From UI

                         Consider you are given a requirement for adding more than one customer record in a page. Provided multiple rows to add multiple customer in a single page. what will you do int this scenario. Do you insert one by one records which requires lot of db calls and will reduce the application performance.

 there are many scenarios like this to discuss. For all these things to insert more than one record into db table we require a mechanism to handle it very efficiently and with best optimized performance. ms.net framework has a very good class to handle this. Now we will discuss the same. in this article. you can refer SqlBulkCopy class here

Using The Code

        Lets start with an example. Just insert multiple rows into database table in 3 different ways and examine the results

  Software Requirements

            Visual Studio 2010 / 2012
            Sql Sevrer 2008 R2 / 2012

 Create Customers table on database by using the following script

 USE [BlogSamples]
GO

/****** Object:  Table [dbo].[Customers]    Script Date: 11/17/2012 1:07:38 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[Customers](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](50) NOT NULL,
[LastName] [varchar](50) NULL,
[Address1] [varchar](50) NOT NULL,
[Address2] [varchar](50) NULL,
[City] [varchar](50) NOT NULL,
[State] [varchar](50) NOT NULL,
[Country] [varchar](50) NOT NULL,
[ZipCode] [varchar](50) NULL,
 CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED 
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

Create a Consol application on visual studio as shown in the figure

Bulk Data Management Application
 

Now we will insert the rows into the table from 3 different ways and will examine the performance

Multiple Rows Multiple DB Calls:

    Create a stored procedure to insert individual rows into the database

       -- ================================================
-- Template generated from Template Explorer using:
-- Create Procedure (New Menu).SQL
--
-- Use the Specify Values for Template Parameters 
-- command (Ctrl-Shift-M) to fill in the parameter 
-- values below.
--
-- This block of comments will not be included in
-- the definition of the procedure.
-- ================================================
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: <Author,,Name>
-- Create date: <Create Date,,>
-- Description: <Description,,>
-- =============================================
CREATE PROCEDURE Customers_Insert 
-- Add the parameters for the stored procedure here
@firstname varchar(50),
@lastname varchar(50),
@address1 varchar(50),
@address2 varchar(50),
@city varchar(50),
@state varchar(50),
@country varchar(50),
@zipcode varchar(50)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

    INSERT INTO [dbo].[Customers]
           ([FirstName]
           ,[LastName]
           ,[Address1]
           ,[Address2]
           ,[City]
           ,[State]
           ,[Country]
           ,[ZipCode])
     VALUES
           (@firstname,@lastname,@address1,@address2,@city,@state,@country,@zipcode)

END
GO

  Create a new class to handle all db calls and name it as BulkDataManagement.cs. add the following code to insert one by one row.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.SqlClient;

namespace BulkDataManagement
{
    public class BulkDataManagement
    {
        public BulkDataManagement()
        {
            //strConnectionString=@"Data Source=WLW7-SRINIVASUP\PEMMA;Initial Catalog=Pagination;User ID=sa;Password=srijailu123*";
        }
        public static void AddCustomer(string strConnectionString, string firstname, string lastname, string address1, string address2, string city, string state, string country, string zipcode)
        {
            using (SqlConnection con = new SqlConnection(strConnectionString))
            {
                con.Open();
                SqlCommand cmd = new SqlCommand("Customers_Insert", con);
                cmd.CommandType = System.Data.CommandType.StoredProcedure;
                cmd.Parameters.Add(new SqlParameter("@firstname", firstname));
                cmd.Parameters.Add(new SqlParameter("@lastname", lastname));
                cmd.Parameters.Add(new SqlParameter("@address1", address1));
                cmd.Parameters.Add(new SqlParameter("@address2", address2));
                cmd.Parameters.Add(new SqlParameter("@city", city));
                cmd.Parameters.Add(new SqlParameter("@state", state));
                cmd.Parameters.Add(new SqlParameter("@country", country));
                cmd.Parameters.Add(new SqlParameter("@zipcode", zipcode));
                cmd.ExecuteNonQuery();
            }
        }
    }
}

Now prepare you program.cs file to input the values to the application as shown below.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace BulkDataManagement
{
    class Program
    {
        static string connectionString = @"Data Source=WLW7-SRINIVASUP\PEMMA;Initial Catalog=BlogSamples;User ID=sa;Password=srijailu123*";
        static void Main(string[] args)
        {
            Console.WriteLine("Enter Number of Records to be Inserted: ");
            int noofrecords = int.Parse(Console.ReadLine());
            DateTime start = DateTime.Now;
            Console.WriteLine("Start Time : "+start);
            for (int i = 0; i < noofrecords; i++)
                BulkDataManagement.AddCustomer(connectionString, "First Name - " + i,
                    "Last Name - " + i, "Address1 - " + i, "Address2 - " + i, "City - I" + i, "State - " + i, "Country - " + i, "Zip Code - " + i);
            DateTime end = DateTime.Now;
            Console.WriteLine("End Time : "+end);
            TimeSpan duration = end - start;
            Console.WriteLine("Total Time Spent : "+duration.TotalSeconds);
            Console.ReadLine();
        }
    }
}

Now run the application and input your number, lets assume 10000 and measure the application performance how long it takes will be noted in the application. It gave me almost 13 seconds duration as in below figure.

Bulk Data Management with 10000 records
Fig. 10000 Records
Now run it for 100,000 records and measure the performance as in the previous step. And the results are showing 129.23 Seconds to insert those many records.

Bulk Data Management With 100000 records
Fig. 100000 records

Multiple Records With Single XML data

Now we will try with another approach where we can insert data by means of preparing an xml and sending that to database Stored Procedure.

The xml can be in the following format.

Bulk Data Management Customer XML Format
Fig. XML Format
now modiy your code and add new stored procedure as per the below

Stored Procedure

-- ================================================
-- Template generated from Template Explorer using:
-- Create Procedure (New Menu).SQL
--
-- Use the Specify Values for Template Parameters 
-- command (Ctrl-Shift-M) to fill in the parameter 
-- values below.
--
-- This block of comments will not be included in
-- the definition of the procedure.
-- ================================================
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: <Author,,Name>
-- Create date: <Create Date,,>
-- Description: <Description,,>
-- =============================================
CREATE PROCEDURE Customer_Insert_XML 
-- Add the parameters for the stored procedure here
@customerxml varchar(max)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
DECLARE @idoc int
    EXEC sp_xml_preparedocument @idoc OUTPUT, @customerxml

Insert into Customers(FirstName,LastName,Address1,Address2,City,State,Country,ZipCode) 
select * from OPENXML (@idoc,'/Root/Customer') WITH
(
firstname varchar(50),
lastname varchar(50),
address1 varchar(50),
address2 varchar(50),
city varchar(50),
state varchar(50),
country varchar(50),
zipcode varchar(50)
)

END
GO

BulkDataManagement.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.SqlClient;

namespace BulkDataManagement
{
    public class BulkDataManagement
    {
        public BulkDataManagement()
        {
            //strConnectionString=@"Data Source=WLW7-SRINIVASUP\PEMMA;Initial Catalog=Pagination;User ID=sa;Password=srijailu123*";
        }
        public static void AddCustomer(string strConnectionString, string firstname, string lastname, string address1, string address2, string city, string state, string country, string zipcode)
        {
            using (SqlConnection con = new SqlConnection(strConnectionString))
            {
                con.Open();
                SqlCommand cmd = new SqlCommand("Customers_Insert", con);
                cmd.CommandType = System.Data.CommandType.StoredProcedure;
                cmd.Parameters.Add(new SqlParameter("@firstname", firstname));
                cmd.Parameters.Add(new SqlParameter("@lastname", lastname));
                cmd.Parameters.Add(new SqlParameter("@address1", address1));
                cmd.Parameters.Add(new SqlParameter("@address2", address2));
                cmd.Parameters.Add(new SqlParameter("@city", city));
                cmd.Parameters.Add(new SqlParameter("@state", state));
                cmd.Parameters.Add(new SqlParameter("@country", country));
                cmd.Parameters.Add(new SqlParameter("@zipcode", zipcode));
                cmd.ExecuteNonQuery();
            }
        }

        public static void AddCustomerWithXML(string strConnectionString, string customerxml)
        {
            using (SqlConnection con = new SqlConnection(strConnectionString))
            {
                con.Open();
                SqlCommand cmd = new SqlCommand("Customer_Insert_XML", con);
                cmd.CommandType = System.Data.CommandType.StoredProcedure;
                SqlParameter param = new SqlParameter("@customerxml", System.Data.SqlDbType.VarChar);
                param.Value = customerxml;
                cmd.Parameters.Add(param);
                cmd.ExecuteNonQuery();
            }
        }
    }
}

Program.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace BulkDataManagement
{
    class Program
    {
        static string connectionString = @"Data Source=WLW7-SRINIVASUP\PEMMA;Initial Catalog=BlogSamples;User ID=sa;Password=srijailu123*";
        static void Main(string[] args)
        {
            Console.WriteLine("Enter Number of Records to be Inserted: ");
            int noofrecords = int.Parse(Console.ReadLine());
            Console.WriteLine("Enter Which Method to Execute /nNormal Insert->1/nXML Insert->2/nSqlBulkCopy->3");
            int choice = int.Parse(Console.ReadLine());
            DateTime start = DateTime.Now;
            Console.WriteLine("Start Time : " + start);
            if (choice == 1)
                Customer_Insert(noofrecords);
            else if (choice == 2)
                Customer_XML_Insert(noofrecords);
            DateTime end = DateTime.Now;
            Console.WriteLine("End Time : " + end);
            TimeSpan duration = end - start;
            Console.WriteLine("Total Time Spent : " + duration.TotalSeconds);
            Console.ReadLine();
        }

        static void Customer_Insert(int num)
        {
            for (int i = 0; i < num; i++)
                BulkDataManagement.AddCustomer(connectionString, "First Name - " + i,
                    "Last Name - " + i, "Address1 - " + i, "Address2 - " + i, "City - I" + i, "State - " + i, "Country - " + i, "Zip Code - " + i);
        }

        static void Customer_XML_Insert(int num)
        {
            StringBuilder sb = new StringBuilder("<Root>");
            for (int i = 0; i < num; i++)
            {
                sb.Append("<Customer firstname='First Name - XML -" + i + "' ");
                sb.Append("lastname='Last Name -  XML -" + i + "' ");
                sb.Append("address1='Address1 -  XML -" + i + "' ");
                sb.Append("address2='Address2 -  XML -" + i + "' ");
                sb.Append("city='City -  XML -" + i + "' ");
                sb.Append("state='State -  XML -" + i + "' ");
                sb.Append("country='Country -  XML -" + i + "' ");
                sb.Append("zipcode='Zipcode -  XML -" + i + "' />");
            }
            sb.Append("</Root>");
            BulkDataManagement.AddCustomerWithXML(connectionString, sb.ToString());
        }

        static void Customer_Insert_With_SqlBulkCopy(int num)
        {

        }
    }
}

Here I added the choice section so that while running you can choose appropriate and note down the timings

1-> Normal Insert
2-> XML Insert
3-> Using SqlBulkCopy

Now execute the application and note down the results for XML Insert as we did in the previous section.

amazing can you check the results how fast it inserted the rows into the database table.

Bulk Data Management with XML for 10000 Records
Fig. XML 10000 Records
Now execute the same for 100,000 records. Check the application performance how fast it is and how cute the application. 100000 records also in 15 seconds amazing performance.

Bulk Data Management With XML 100000 Records
Fig. Xml 100000 records
What an amazing performance with xml.

Bulk Data with SqlBulkCopy Class

Now update your BulkDataManagement.cs code as below. here we dont need any stored procedure. We can execute the commands directly.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.SqlClient;
using System.Data;

namespace BulkDataManagement
{
    public class BulkDataManagement
    {
        public BulkDataManagement()
        {
            //strConnectionString=@"Data Source=WLW7-SRINIVASUP\PEMMA;Initial Catalog=Pagination;User ID=sa;Password=srijailu123*";
        }
        public static void AddCustomer(string strConnectionString, string firstname, string lastname, string address1, string address2, string city, string state, string country, string zipcode)
        {
            using (SqlConnection con = new SqlConnection(strConnectionString))
            {
                con.Open();
                SqlCommand cmd = new SqlCommand("Customers_Insert", con);
                cmd.CommandType = System.Data.CommandType.StoredProcedure;
                cmd.Parameters.Add(new SqlParameter("@firstname", firstname));
                cmd.Parameters.Add(new SqlParameter("@lastname", lastname));
                cmd.Parameters.Add(new SqlParameter("@address1", address1));
                cmd.Parameters.Add(new SqlParameter("@address2", address2));
                cmd.Parameters.Add(new SqlParameter("@city", city));
                cmd.Parameters.Add(new SqlParameter("@state", state));
                cmd.Parameters.Add(new SqlParameter("@country", country));
                cmd.Parameters.Add(new SqlParameter("@zipcode", zipcode));
                cmd.ExecuteNonQuery();
            }
        }

        public static void AddCustomerWithXML(string strConnectionString, string customerxml)
        {
            using (SqlConnection con = new SqlConnection(strConnectionString))
            {
                con.Open();
                SqlCommand cmd = new SqlCommand("Customer_Insert_XML", con);
                cmd.CommandType = System.Data.CommandType.StoredProcedure;
                SqlParameter param = new SqlParameter("@customerxml", System.Data.SqlDbType.VarChar);
                param.Value = customerxml;
                cmd.Parameters.Add(param);
                cmd.ExecuteNonQuery();
            }
        }

        public static void CustomersBulkCopy(string connectionstring, DataTable dtCustomers)
        {
            using (SqlBulkCopy scopy = new SqlBulkCopy(connectionstring, SqlBulkCopyOptions.TableLock))
            {
                scopy.DestinationTableName = "dbo.Customers";
                scopy.WriteToServer(dtCustomers);
            }
        }
    }
}

and change your program.cs file code as below.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data;

namespace BulkDataManagement
{
    class Program
    {
        static string connectionString = @"Data Source=WLW7-SRINIVASUP\PEMMA;Initial Catalog=BlogSamples;User ID=sa;Password=srijailu123*";
        static void Main(string[] args)
        {
            Console.WriteLine("Enter Number of Records to be Inserted: ");
            int noofrecords = int.Parse(Console.ReadLine());
            Console.WriteLine("Enter Which Method to Execute /nNormal Insert->1/nXML Insert->2/nSqlBulkCopy->3");
            int choice = int.Parse(Console.ReadLine());
            DateTime start = DateTime.Now;
            Console.WriteLine("Start Time : " + start);
            if (choice == 1)
                Customer_Insert(noofrecords);
            else if (choice == 2)
                Customer_XML_Insert(noofrecords);
            else if (choice == 3)
                Customer_Insert_With_SqlBulkCopy(noofrecords);
            DateTime end = DateTime.Now;
            Console.WriteLine("End Time : " + end);
            TimeSpan duration = end - start;
            Console.WriteLine("Total Time Spent : " + duration.TotalSeconds);
            Console.ReadLine();
        }

        static void Customer_Insert(int num)
        {
            for (int i = 0; i < num; i++)
                BulkDataManagement.AddCustomer(connectionString, "First Name - " + i,
                    "Last Name - " + i, "Address1 - " + i, "Address2 - " + i, "City - I" + i, "State - " + i, "Country - " + i, "Zip Code - " + i);
        }

        static void Customer_XML_Insert(int num)
        {
            StringBuilder sb = new StringBuilder("<Root>");
            for (int i = 0; i < num; i++)
            {
                sb.Append("<Customer firstname='First Name - XML -" + i + "' ");
                sb.Append("lastname='Last Name -  XML -" + i + "' ");
                sb.Append("address1='Address1 -  XML -" + i + "' ");
                sb.Append("address2='Address2 -  XML -" + i + "' ");
                sb.Append("city='City -  XML -" + i + "' ");
                sb.Append("state='State -  XML -" + i + "' ");
                sb.Append("country='Country -  XML -" + i + "' ");
                sb.Append("zipcode='Zipcode -  XML -" + i + "' />");
            }
            sb.Append("</Root>");
            BulkDataManagement.AddCustomerWithXML(connectionString, sb.ToString());
        }

        static void Customer_Insert_With_SqlBulkCopy(int num)
        {
            DataTable dtCustomers = new DataTable("Customers");

            DataColumn dc = new DataColumn("FirstName");
            dc.DataType = System.Type.GetType("System.String");
            dtCustomers.Columns.Add(dc);

            dc = new DataColumn("LastName");
            dc.DataType = System.Type.GetType("System.String");
            dtCustomers.Columns.Add(dc);

            dc = new DataColumn("Address1");
            dc.DataType = System.Type.GetType("System.String");
            dtCustomers.Columns.Add(dc);

            dc = new DataColumn("Address2");
            dc.DataType = System.Type.GetType("System.String");
            dtCustomers.Columns.Add(dc);

            dc = new DataColumn("City");
            dc.DataType = System.Type.GetType("System.String");
            dtCustomers.Columns.Add(dc);

            dc = new DataColumn("State");
            dc.DataType = System.Type.GetType("System.String");
            dtCustomers.Columns.Add(dc);

            dc = new DataColumn("Country");
            dc.DataType = System.Type.GetType("System.String");
            dtCustomers.Columns.Add(dc);

            dc = new DataColumn("ZipCode");
            dc.DataType = System.Type.GetType("System.String");
            dtCustomers.Columns.Add(dc);

            for (int i = 0; i < num; i++)
            {
                DataRow dr = dtCustomers.NewRow();
                dr["FirstName"] = "First Name sc - " + i;
                dr["LastName"] = "Last Name sc - " + i;
                dr["Address1"] = "Address1 sc - " + i;
                dr["Address2"] = "Address2 sc - " + i;
                dr["City"] = "City sc - " + i;
                dr["State"] = "State sc - " + i;
                dr["Country"] = "Country sc - " + i;
                dr["ZipCode"] = "ZipCode sc - " + i;
                dtCustomers.Rows.Add(dr);
            }
            BulkDataManagement.CustomersBulkCopy(connectionString, dtCustomers);
        }
    }
}

Now as we did in last 2 sections execute the application for sqlbulkcopy mode and check the application performance.

Wow its really amazing and the data 10000 records inserted time seems to be pretty fine and we achieved the best among these three approaches.

Bulk Data Management with SqlBulkCopy 10000 records
Fig. SqlBulkCopy 10000 Records
Check with 100,000 records now. The application performance is pretty amazing and the most efficient solution we have built now. See the performance which can insert 100000 records just in 2 seconds. This is what we are trying to prove till now

Bulk Data Management with SqlBulkCopy 100000 records
Fig. SqlBulkCopy 100000 Records.

Conclusion:

    From the above experiment and the real time values we can conclude that SqlBulkCopy has more benefits when dealing with bulk data. here we have a comparison matrix from our experiment

Bulk Data Management Comparision for Normal, XML and SqlBulkCopy Scenarios
Fig. Comparision
Download Source Here