SSIS – Fastest or quickest way to move data on a local system – Interesting testing
Over the weekend I was doing some work to in order to catch up a lot of data that I had loaded incorrectly. Whilst I was moving this data around I for some reason or other opened up the Resource Monitor on my 2012 Server. What I saw below confused me quite a lot.
Which I will explain after the screenshot below.
Now the interesting part about the above screenshot is that when this was taken the data flow was moving data on the same SQL Server instance, same SQL Server Database, just between different tables. The entire SSIS package was also running on the same physical server. And the attached storage is super quick, where it can get up to about 600 MB\sec.
My confusion was how could the Network card be used when data is moving locally on the server. There is no ways that this should be happening. And the impact of this, is that my data flow task would not be running as quickly and efficiently as possible. So I began testing and trying to find out what was the cause of this and how to ensure that I could find a work around.
So I began my investigation, in the past I had always used the ADO.NET source and destinations in my data flow tasks. And I did read the white papers which suggested using the OLE DB source and destinations in your data flow tasks, but in the past I did a lot of testing and found the performance throughput to be negligible. And I am going to assume that it was because of the screenshot above, which was due to the movement of data was between two servers, so it had to go via the network.
I then completed a series of tests, not only to find out which would be the quickest and most efficient way to transfer the data, but also which Data Flow Source would NOT go via the network card.
Here are the outcome of my tests below:
|Data Flow Source||Data Flow Destination||Commit Size if applicable||Destination Config Settings||Time Taken in seconds||Total Rows||Throughput||Network Card transferred|
|OLE DB Source||OLE DB Destination||Commit Size = 0||Only Table Lock Settings||24,22||3102471||128105,9955||No|
|OLE DB Source||OLE DB Destination||Commit Size = 0||Table Lock Settings, check constraints||20,22||3102471||153450,9348||No|
|OLE DB Source||SQL Server Destination||Default||Defaults||28,47||3102471||108980,9962||No|
|ADO Net Source||ADO Net Destination||Default||Defaults||98,72||3102471||31427,29363||Yes|
|ADO Net Source||ADO Net Destination||10000 rows||Batch Size||98,75||3102471||31417,42785||Yes|
|ADO Net Source||ADO Net Destination||10000 rows||Shared Memory||99,49||3102471||31185,31437||No|
|ADO Net Source||ADO Net Destination||10000 rows||Shared Memory||99,51||3102471||31177,47965||No|
As you can see from the above tests, by far the quickest way for transferring the data is the OLE DB source. And if possible with the Commit Size of Zero.
And once again, I should have put into action what I had read in the Microsoft White Paper. For the life of me I cannot find the link to the white paper, but when I do I will update it with the link.
What I have found out from completing my testing is the following below:
- I found out that because I was configuring my ADO.NET Source and Destinations to be using Network Library called TCP/IP as shown below, SSIS would then transfer the data via TCP/IP. And due to it being configured this way it would then transfer the data via the network card.
- Even when changing the ADO.NET Source to use Shared Memory instead of TCP/IP it would still take a lot longer to move the data than the OLE DB Connection.
- It also was apparent in my testing the OLE DB was faster than the SQL Server Destination.
So in conclusion it is definitely quicker to use OLE DB for your source and destinations in your data flow tasks when you are moving data between tables on the same database.
The reason for this is that the OLE DB connection manager is smart enough to detect that if the databases and or tables are local, the to use the Shared Memory to move the data. Therefore making the transfer quick. If it is from different servers across the network then it will use TCP/IP in order to get the data.
And the larger the data that you want to move, the more of an impact this will have.
UPDATE: 13 May 2013
Just a quick update to blog I posted above. During the past week I was moving a lot of data again. And what I have found out and what I wanted to put into this blog here was the following:
- I had 2 servers one that was under memory pressure and a new server that was NOT under any memory pressure.
- Both of the servers were moving about 10 million rows, and both had an OLE DB Destination with the commit size set to zero.
- On the server with the memory pressure it kept on failing when trying to commit the 10 million rows.
- While on the server with NO memory pressure would commit the data fine.
What I concluded was that either SSIS or SQL Server does not have enough memory to take the entire dataset and commit it to the SQL Server, which was due to the memory pressure on the one server. Even though during the SSIS data flow task it looks like it is inserting the rows. When it fails there were no rows in the destination table.
I hope that this helps.