I'm looking for any possible improvements that could be made to run the MySQL piece of code faster.
I created a simple test WinForms application that creates two Docker databases:
Once the instances are created, it creates a similar table on both. I then use a stored procedure to insert Categories as fast as possible into both setups.
Here's the definition of a category:
public class Category
{
public int Id { get; set; }
[System.ComponentModel.DataAnnotations.StringLength(75)]
public string CategoryName { get; set; }
[System.ComponentModel.DataAnnotations.StringLength(300)]
public string Description { get; set; }
public DateTime CreationTime { get; set; }
}
The test results are the following:
100k items
MySql Inserted 100000 items in 2955ms
MySql Inserted 100000 items in 2801ms
MySql Inserted 100000 items in 2706ms
MySql Inserted 100000 items in 2512ms
MySql Inserted 100000 items in 2850ms
SqlServer Inserted 100000 items in 1004ms
SqlServer Inserted 100000 items in 902ms
SqlServer Inserted 100000 items in 858ms
SqlServer Inserted 100000 items in 1421ms
SqlServer Inserted 100000 items in 905ms
600k items
MySql Inserted 600000 items in 21849ms
MySql Inserted 600000 items in 17089ms
MySql Inserted 600000 items in 16776ms
SqlServer Inserted 600000 items in 5677ms
SqlServer Inserted 600000 items in 4635ms
SqlServer Inserted 600000 items in 5474ms
Here is the setup for MySql
MySQL stored procedure:
USE `BenchmarkDb`;
DROP procedure IF EXISTS `BenchmarkDb`.`CategoriesInsertWithoutId`;
DELIMITER $$
USE `BenchmarkDb`$$
CREATE DEFINER=`root`@`%` PROCEDURE `CategoriesInsertWithoutId`(IN JsonPayload LONGTEXT)
BEGIN
insert into BenchmarkDb.Categories
(Category,
Description)
SELECT tt.CategoryName,tt.Description
FROM
JSON_TABLE(
JsonPayload
,"$[*]"
COLUMNS(
Id int PATH "$.Id",
CategoryName VARCHAR(75) PATH "$.CategoryName",
Description VARCHAR(300) PATH "$.Description",
CreationTime DateTime PATH "$.CreationTime"
)
) AS tt;
END$$
DELIMITER ;
This uses the latest 8.0 MySQL driver for .NET (from NuGet). The code sends one large JSON string containing all the data; the stored procedure then turns it into a table and inserts from that.
MySQL C# code:
Stopwatch stopwatch = new Stopwatch();
string JsonPayload = JsonConvert.SerializeObject(
TestingDataHelpers.GenerateTestingCategories(100000)
,new IsoDateTimeConverter() { DateTimeFormat= "yyyy-MM-dd HH:mm:ss" });
stopwatch.Start();
var parameters=new List<MySqlParameter>()
{
new MySqlParameter()
{
MySqlDbType=MySqlDbType.LongText,
ParameterName="JsonPayload",
Value=JsonPayload
}
};
DataSet ResultsDataset = new DataSet();
using (var connection = new MySqlConnection("Server=localhost;Uid=root;Pwd=password1234;"))
{
using (var command = connection.CreateCommand())
{
command.CommandText = "BenchmarkDb.CategoriesInsertWithoutId";
command.CommandType = CommandType.StoredProcedure;
if (parameters != null && parameters.Count() > 0)
{
foreach (var parameter in parameters)
{
command.Parameters.Add(parameter);
}
}
using (var dataAdapter = new MySqlDataAdapter(command))
{
dataAdapter.Fill(ResultsDataset);
}
}
}
stopwatch.Stop();
Here is the equivalent SQL Server code, which uses a DataTable sent as a structured table-valued parameter to the stored procedure.
SQL Server stored procedure:
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'CategoriesInsertWithoutId')
AND type IN ( N'P', N'PC' ) )
DROP PROCEDURE [dbo].[CategoriesInsertWithoutId]
IF type_id('[dbo].[CategoryType]') IS NOT NULL
DROP TYPE [dbo].[CategoryType];
CREATE TYPE CategoryType AS TABLE
( Id int,
CategoryName nvarchar(75),
Description nvarchar(300),
CreationTime DateTime);
GO
CREATE OR ALTER PROCEDURE [dbo].[CategoriesInsertWithoutId]
@CategoriesToInsert CategoryType READONLY
AS
BEGIN
SET NOCOUNT ON;
insert into [dbo].[Categories] (Category,Description)
select c.CategoryName,c.Description from @CategoriesToInsert c
END
SQL Server C# code:
Stopwatch stopwatch = new Stopwatch();
var categories = SqlManagerHelpers.ToDataTable(TestingDataHelpers.GenerateTestingCategories(100000));
stopwatch.Start();
var parameters=new List<SqlParameter>()
{
new SqlParameter()
{
SqlDbType=SqlDbType.Structured,
ParameterName="@CategoriesToInsert",
Value=categories
}
};
DataSet ResultsDataset = new DataSet();
using (var connection = new SqlConnection("Data Source=.;User Id=sa;password=password1234;"))
{
using (var command = connection.CreateCommand())
{
command.CommandText = "dbo.CategoriesInsertWithoutId";
command.CommandType = CommandType.StoredProcedure;
if (parameters != null && parameters.Count() > 0)
{
foreach (var parameter in parameters)
{
command.Parameters.Add(parameter);
}
}
using (var dataAdapter = new SqlDataAdapter(command))
{
dataAdapter.Fill(ResultsDataset);
}
}
}
stopwatch.Stop();
Here are some extra helper classes that are referenced above.
static class TestingDataHelpers
{
static Random rnd = new Random();
public static List<Category> GenerateTestingCategories(int NumberOfEntriesToMake)
{
List<Category> categories=new List<Category>();
for(int i = 0; i < NumberOfEntriesToMake; i++)
{
categories.Add(new Category()
{
Id = i,
CategoryName=CategoryNames[rnd.Next(CategoryNames.Count)],
Description=CategoryDescriptions[rnd.Next(CategoryDescriptions.Count)],
CreationTime=DateTime.Now
});
}
return categories;
}
#region CategoryNames
private static List<string> CategoryNames = new List<string>()
{
"Redacted data is redacted.. enjoy some redacted data",
"Redacted data is redacted.. enjoy some redacted data",
"Redacted data is redacted.. enjoy some redacted data",
};
#endregion
#region CategoryDescriptions
private static List<string> CategoryDescriptions = new List<string>()
{
"Redacted data is redacted.. enjoy some redacted data",
"Redacted data is redacted.. enjoy some redacted data",
"Redacted data is redacted.. enjoy some redacted data",
};
#endregion
}
static class SqlManagerHelpers
{
public static DataTable ToDataTable<T>(this IList<T> data)
{
var props = typeof(T).GetProperties().Where(pi => pi.GetCustomAttributes(typeof(SkipPropertyAttribute), true).Length == 0).ToList();
DataTable table = new DataTable();
for(int i =0;i<props.Count;i++)
{
var prop = props[i];
table.Columns.Add(prop.Name, Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType);
StringLengthAttribute stringLengthAttribute= prop.GetCustomAttributes(typeof(StringLengthAttribute), false).Cast<StringLengthAttribute>().SingleOrDefault();
if (stringLengthAttribute != null)
{
table.Columns[i].MaxLength = stringLengthAttribute.MaximumLength;
}
}
foreach (T item in data)
{
DataRow row = table.NewRow();
foreach (var prop in props)
row[prop.Name] = prop.GetValue(item) ?? DBNull.Value;
table.Rows.Add(row);
}
return table;
}
public class SkipPropertyAttribute : Attribute
{
}
}
Here are the required database schemas
MySql database and table definition
CREATE DATABASE `BenchmarkDb`;
CREATE TABLE `BenchmarkDb`.`Categories` (
`Id` INT NOT NULL AUTO_INCREMENT,
`Category` VARCHAR(75) NULL,
`Description` VARCHAR(300) NULL,
`CreationTime` DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
PRIMARY KEY (`Id`)) ENGINE=InnoDB;
SqlServer database and table definition
CREATE TABLE [BenchmarkDb].[dbo].[Categories] (
[Id] [int] IDENTITY(1,1) NOT NULL,
[Category] [nvarchar](75) NULL,
[Description] [nvarchar](300) NULL,
[CreationTime] DATETIME NOT NULL DEFAULT GETDATE(),
CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
) ON [PRIMARY]
Comments:
- paparazzo (May 2, 2018 at 14:22): This works? That last MySqlDataAdapter(command) is on a SqlConnection.
- paparazzo (May 2, 2018 at 14:27): It is supposed to be working code only here. ResultDataset is not defined.
- paparazzo (May 2, 2018 at 14:36): See my answer to this question: stackoverflow.com/questions/12467431/…
- paparazzo (May 2, 2018 at 14:40): This seems more complex than it needs to be (docs.microsoft.com/en-us/sql/relational-databases/tables/…) and it still has the last MySqlDataAdapter(command) on a SqlConnection. You need to make this into working code soon or it will get closed.
- A_V (May 2, 2018 at 14:46): The code is okay now. The goal is to be able to handle a large table-valued parameter in the stored procedure. I want to send data in both directions, from MySQL to SQL Server and the opposite.
1 Answer
I don't know anything about MySQL, so I'm ignoring that part of your question.
For SQL Server, if you're trying to make an insert go faster you're going to want to:
- Do it in bulk
- Do it in parallel
- Make it minimally logged
- Do it in batches
There are some things you can do that will handle all of this for you, which I list below; otherwise you'll have to write something yourself.
Bulk operations
Bulk operations ultimately boil down to trying to do as much work as possible in a single operation, in a way that doesn't tank performance (transaction logging is the most common thing this helps with, but there are some more). The documentation mentions a few benefits; the main ones I'm highlighting here are:
- Minimal logging
- Better locking (BU locks)
- Batching
- Optional triggers/constraints
If you were inserting directly from a file, then BULK INSERT is your friend. It will handle pretty much all of the considerations above for you (besides parallelism, which is outside of BULK INSERT's control).
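For illustration only, here is a minimal sketch of firing a BULK INSERT from C#. The file path, terminators, batch size, and the BenchmarkDb catalog in the connection string are all assumptions, and the CSV layout would have to line up with the table's columns (or a format file used to cover the IDENTITY/default columns):

// Hedged sketch: assumes the rows were first exported to a CSV file that the
// SQL Server instance can read. Path, terminators, and batch size are hypothetical.
using (var connection = new SqlConnection("Data Source=.;Initial Catalog=BenchmarkDb;User Id=sa;Password=password1234;"))
using (var command = connection.CreateCommand())
{
    command.CommandText =
        @"BULK INSERT dbo.Categories
          FROM 'C:\temp\categories.csv'   -- hypothetical path on the SQL Server host
          WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 100000);";
    connection.Open();
    command.ExecuteNonQuery();
}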
Inserting from C#, however, would be better suited to SqlBulkCopy. This lets you perform bulk insert operations into a table, and it can be configured to ignore constraints, triggers, identity columns, etc.
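A rough sketch of what that could look like here, reusing the DataTable you already build for the TVP; the Initial Catalog, the TableLock option, and the batch size are assumptions rather than tested values:

var table = SqlManagerHelpers.ToDataTable(TestingDataHelpers.GenerateTestingCategories(100000));

using (var connection = new SqlConnection("Data Source=.;Initial Catalog=BenchmarkDb;User Id=sa;Password=password1234;"))
{
    connection.Open();
    // TableLock asks for a table-level lock for the duration of the copy
    // (a BU lock when the destination is a heap).
    using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = "dbo.Categories";
        bulkCopy.BatchSize = 10000;   // optional; 0 sends everything in one batch
        // Only map the columns we actually want to send; Id stays an IDENTITY
        // value and CreationTime falls back to its DEFAULT, as in the TVP insert.
        bulkCopy.ColumnMappings.Add("CategoryName", "Category");
        bulkCopy.ColumnMappings.Add("Description", "Description");
        bulkCopy.WriteToServer(table);
    }
}

The explicit mappings matter because the DataTable column is called CategoryName while the destination column is Category.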
Parallel Inserts
Parallel inserts are what allow SQL to insert multiple rows into a table at once instead of doing row-by-row operations. This generally requires a few things:
- A heap
- No IDENTITY columns
- The right kind of lock on the table
If you don't have a heap (e.g. there are indices), then the index maintenance prohibits the parallel insert, and it will do the insert serially. For large-scale ETL workloads, this is a good use-case for a "staging" database/table that has none of these things and as such can get the best performing insert. Brent Ozar has a good post that touches on this a bit as well.
IDENTITY columns also prevent parallel inserts, as maintaining the order of the inserts is required for it to work correctly.
If you don't have the right locks on the table (BU locks work, as does the TABLOCK(X) hint) then SQL Server has to consider that another session could be modifying the table as well, which also prevents parallelism.
If you are able to meet all of these requirements, however, then your operations (whether using built-in bulk operations as above, or rolling your own as below) will be able to run faster by taking advantage of the additional cores SQL Server has.
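One client-side way to take advantage of that (a hedged sketch, not from the answer above: the Categories_Staging heap, chunk size, and connection string are all made up for illustration) is to bulk copy several chunks into a heap staging table at the same time; each copy requests a table lock, and BU locks are compatible with one another, so the loads don't block each other:

// Hypothetical staging heap, e.g.:
//   CREATE TABLE dbo.Categories_Staging
//     (Category nvarchar(75) NULL, Description nvarchar(300) NULL);
// No IDENTITY column and no indexes, per the requirements above.
var allRows = TestingDataHelpers.GenerateTestingCategories(600000);
int chunkSize = 100000;   // arbitrary illustration value

var chunks = allRows
    .Select((row, index) => new { row, index })
    .GroupBy(x => x.index / chunkSize, x => x.row)
    .Select(g => g.ToList())
    .ToList();

Parallel.ForEach(chunks, chunk =>
{
    var table = SqlManagerHelpers.ToDataTable(chunk);
    using (var connection = new SqlConnection("Data Source=.;Initial Catalog=BenchmarkDb;User Id=sa;Password=password1234;"))
    {
        connection.Open();
        // TableLock on a heap takes a BU lock, so these concurrent copies can overlap.
        using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
        {
            bulkCopy.DestinationTableName = "dbo.Categories_Staging";
            bulkCopy.ColumnMappings.Add("CategoryName", "Category");
            bulkCopy.ColumnMappings.Add("Description", "Description");
            bulkCopy.WriteToServer(table);
        }
    }
});

A final INSERT ... SELECT (or a table swap) would then move the staged rows into dbo.Categories.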
Minimal logging
Minimal logging is the method by which you prevent the transaction log from overflowing. Some operations can be rolled back more easily than others or require less transaction log space than others. Maintaining the transaction log isn't free/cheap, so reducing how much of it is necessary also helps performance. In general, if you follow the rules you should be able to achieve minimal logging. At a high level, the following are minimally logged:
- An insert into a heap (i.e. a table without a clustered index) that has no non-clustered indexes, using a TABLOCK hint, having a high enough cardinality estimate (> ~1000 rows)
- An insert into a table with a clustered index that has no non-clustered indexes, without TABLOCK, having a high enough cardinality estimate (> ~1000 rows)
- Adding an index to a table, even if that table already has data.
Batches
Lastly, you can break a large chunk of work up into smaller batches. This can be useful if transaction logging is the main concern as each batch becomes its own transaction.
This is tricky to implement generically, and can be unpleasant to do yourself. Another major concern is with correctness of the data; if another user hits the database while you're not done with your batches, then they may get inconsistent results. This is a good use-case for doing the work in another table, then swapping tables out, as well as for snapshot isolation.
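If SqlBulkCopy is already in play, its BatchSize together with UseInternalTransaction gives you batching without writing the loop yourself; a hedged sketch (the batch size, catalog name, and options are illustrative, not recommendations):

var table = SqlManagerHelpers.ToDataTable(TestingDataHelpers.GenerateTestingCategories(600000));

using (var connection = new SqlConnection("Data Source=.;Initial Catalog=BenchmarkDb;User Id=sa;Password=password1234;"))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection,
        SqlBulkCopyOptions.TableLock | SqlBulkCopyOptions.UseInternalTransaction, null))
    {
        bulkCopy.DestinationTableName = "dbo.Categories";
        bulkCopy.BatchSize = 50000;     // each 50k-row batch commits as its own transaction
        bulkCopy.BulkCopyTimeout = 0;   // no timeout for a long-running load
        bulkCopy.ColumnMappings.Add("CategoryName", "Category");
        bulkCopy.ColumnMappings.Add("Description", "Description");
        bulkCopy.WriteToServer(table);
    }
}

Note the trade-off described above: because each batch commits on its own, a failure part-way through leaves the earlier batches in the table, and other readers can see them before the load finishes, which is exactly when the staging-table-and-swap approach earns its keep.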
Comment:
- A_V (Aug 28, 2019 at 0:22): Thanks for the answer! I'll get a read through these links. I've never used SqlBulkCopy, so we'll see how that goes.