Background: I built a tool which grabs a lot of data from multiple third party APIs and stores it in a SQL Server database. Then, I built a visualization to display this data in tabular and chart format.
Problem: I just recently migrated from SQL Server on EC2 (m1.medium) to RDS (db.m3.medium) and used this tool to get all my data into my RDS instance.
My application layer is PHP on EC2. Everything is working perfectly. After the migration I was able to insert ~300,000 rows into the DB via PHP and the ODBC driver. Today I ran a transform stored procedure that takes those ~300,000 rows from landing and moves them into dbo. On EC2 + SQL Server this script took at most 15 minutes. On RDS it has taken up to 5 hours.
My transform script simply creates temp tables, copies the data from landing into them, left-joins each temp table to its dbo table, and inserts the temp-table rows that have no match in dbo. In short, if the data is not in dbo, it inserts it.
Example logic is included below:
-- Stage a de-duplicated copy of the landing data.
SELECT DISTINCT *
INTO ##temptable
FROM lnd.landingtable;

-- Insert only the rows whose id is not already in dbo.realtable.
INSERT INTO dbo.realtable (
    id
    ,name
    ,createdBy
    ,createdDate
    ,updatedBy
    ,updatedDate
)
SELECT
    [Landing].id
    ,[Landing].name
    ,SYSTEM_USER
    ,GETDATE()
    ,SYSTEM_USER
    ,GETDATE()
FROM ##temptable AS [Landing]
LEFT JOIN dbo.realtable AS [DataTable] ON [Landing].id = [DataTable].id
WHERE [DataTable].id IS NULL;
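For comparison, the same insert-if-missing step can be sketched with NOT EXISTS instead of LEFT JOIN ... IS NULL, wrapped in SET STATISTICS so the I/O and timing numbers can be compared between the EC2 and RDS instances. This is only a sketch: it reuses the table and column names from the example above, the aliases l and d are illustrative, and it is not claimed to be faster on any given instance.

```sql
-- Report logical reads and CPU/elapsed time for the statement below.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Insert rows from the staging table whose id has no match in dbo.realtable.
INSERT INTO dbo.realtable (id, name, createdBy, createdDate, updatedBy, updatedDate)
SELECT
    l.id
    ,l.name
    ,SYSTEM_USER
    ,GETDATE()
    ,SYSTEM_USER
    ,GETDATE()
FROM ##temptable AS l
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.realtable AS d
    WHERE d.id = l.id
);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The statistics output appears in the Messages tab in SSMS; a large difference in logical reads between the two servers would point at the query plan or indexes rather than at storage throughput.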
I cannot figure out what I am doing wrong. I did not provision IOPS because of the cost; however, from what I have read online, this should not be necessary. The difference in performance is so dramatic that I cannot understand why anyone would use RDS. Obviously I am doing something majorly wrong. Any guidelines, links, or comments would be greatly appreciated!
-
15min for 300k rows is extremely long. You are doing something wrong in both cases. Post the execution plans. – usr, commented Apr 5, 2015 at 20:51
-
You cannot compare apples to oranges. You should provision IOPS on RDS to match what you had on EC2. Did you recreate your indexes? The Azure tool needs to have that option selected specifically. RDS is a good option if you don't want to worry about having a DBA manage your instance, but Amazon really dropped the ball on not providing the ability to perform a simple db restore. – datagod, commented May 17, 2016 at 12:32
2 Answers
If you want to remove those indexes, run this script on each database you restore:
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'';
SELECT @sql = @sql + N'DROP INDEX '
+ QUOTENAME(name) + ' ON '
+ QUOTENAME(OBJECT_SCHEMA_NAME([object_id]))
+ '.' + QUOTENAME(OBJECT_NAME([object_id])) + ';
'
FROM sys.indexes
WHERE index_id > 0
AND OBJECTPROPERTY([object_id], 'IsMsShipped') = 0
AND name LIKE '%ci_az%'
ORDER BY [object_id], index_id DESC;
EXEC sp_executesql @sql;
-
Thanks Thomas, I don't have the indices anymore to test this script but the logic is the same as my solution - remove the indices added by 'Azure'. – gr4y, commented Oct 17, 2014 at 13:00
It seems that the tool I used to migrate my DB from EC2 to RDS added a bunch of weird indexes which caused the performance of the DB to tank. Once I removed any and all elements of the DB with 'azure' in their name my performance returned to normal! The tool was extremely useful but if anyone happens upon this question know that you have to remove anything Azure created other than your original data.
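Before dropping anything, it may help to list what would be affected. The query below is a minimal sketch that only reads from the catalog; the '%azure%' pattern is an assumption based on the description above, and '%ci_az%' is the pattern used in the drop script in the other answer, so adjust both to whatever names the migration tool actually produced.

```sql
-- List user-table indexes whose names match the suspected tool-generated patterns.
-- Nothing is modified; this is for review only.
SELECT
    OBJECT_SCHEMA_NAME(i.[object_id]) AS schema_name
    ,OBJECT_NAME(i.[object_id]) AS table_name
    ,i.name AS index_name
    ,i.type_desc
    ,i.is_primary_key
    ,i.is_unique_constraint
FROM sys.indexes AS i
WHERE i.index_id > 0                                   -- skip heaps
    AND OBJECTPROPERTY(i.[object_id], 'IsMsShipped') = 0 -- skip system objects
    AND (i.name LIKE '%azure%' OR i.name LIKE '%ci_az%') -- assumed name patterns
ORDER BY schema_name, table_name, index_name;
```

Rows with is_primary_key or is_unique_constraint set to 1 back a constraint; DROP INDEX fails on those, and they have to be removed with ALTER TABLE ... DROP CONSTRAINT instead, which is worth a second look before doing so.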
-
It is completely normal (and expected) practice to disable indices during substantive bulk loading, and rebuild them upon completion. I would check closely why these indices were created before deleting them permanently, or you are likely to be back here wondering why your application is now slow and/or corrupt. – Pieter Geerkens, commented Sep 14, 2014 at 16:34