Skip to main content

Loading a CSV file of more than 10 GB in 7 minutes and 51 seconds

Exchanging files between systems has been around for a while and to my opinion it will not to disappear soon. The reasons that this type of data exchange is still around could be related to:
  • Security rules prevent to connect directly to the source system
  • The source system is outside the organization’s network so it is not possible to connect directly
  • There is no compatible data connector for the source system
  • etc.
Most systems can generate and read text files and the formats are rather trivial. The most common is the comma separated values format or better known as the CSV format. Everyone knows them and how to handle them. Tools like Excel can be used to examine them and every database system provides a connector to load them, including SQL Server.

CSV files do not escape on the fact that our data need is growing, thus so are CSV files. Due to that the processing of these files take more and more time and it happens nowadays more and more that IT departments ran out of their available daily batch time for loading and processing data.

Once the files are loaded the database knows how to handle the large volumes of data using parallelism and compression. Query execution times of 1 second on hundreds of millions of rows is rather common on a FastTrack Data Warehouse solution and this is in sharp contrast with the loading of a large CSV file, which can take several hours.

More and more our customers contact us with the question how to handle the problem of loading large files. This paper describe a general solution developed for a relative normal production environment with one ETL server using Integration Services and one SQL Server Fast Track Data Warehouse server.

I have put everything in a easy to read document that you can download here.

Comments

Popular posts from this blog

Privacy and the liberty to express yourself on LinkedIn

Unaware that LinkedIn has such a strong filtering policy that it does not allow me posting a completely innocent post on a Chinese extreme photography website I tried to post the following: "As an Mpx lover I was suprised to find out that the M from Million is now replaced by the B from Billion. This picture is 24 Bpx! Yes you read this well, 24 billion pixels.  Searching on the picture I stumbled on a fellow Nikon lover. If you want to search for him yourself you can find him here: http://www.bigpixel.cn/t/5834170785f26b37002af46a " In my eyes nothing is wrong with this post, but LinkedIn considers it as offending. I changed the lover words, but I could not post it.  Even taking a picture and post it will not let this pass:  Or my critical post on LinkedIn crazy posting policy: it will not pass and I cannot post it.  The technology LinkedIn shows here is an example what to expect in the near future.  Newspapers will have a unified reporting using

Windows Server 2016 with ... XBox extensions !!!

Microsoft must have been thinking that the live of a Windows administrator gets so boring that they need a distraction and they have integrated the XBox Live extensions standard in Windows Server 2016.  No kidding: take a look. I did not select it as a feature and it is there. As you can see the XBox Live service is started automatically even in mode manual (see error log).  To my opinion these 3 XBox Live services should never be available on a server. Unless the server is part of the XBox Live platform of cause but honestly I don't think that Microsoft will allow that.  Every IT manager with a serious Windows production environment would fire any administrator playing XBox Live on the production servers. So though it might be tempting don't do it Windows administrator. It is a trap! :-)

How to run SQL Server 2016 with In-Databasse R on Windows 2016 CTP5

For those who like me tried to run SQL Server 2016 with In-Database R might have run into the same problem as me: In-Database R or the LaunchPad service gives a timeout and won't start. I did several clean installations with different configuration options - for instance I like to put my data on another disk than the system disk - but in the end I tried to do the next, next, next, finish install to see if it something in the setup options is hard coded in there (yes, it happens developers!). For some reason this problem is related to Windows 2016 and not on Windows 2012R2 and I hope the SQL Server team will soon resolve these issues because they are in one word a bit sloppy.  There are 2 issues (maybe even 3 so I give this one also):  The R setup does not create the ExtensibilityLog directory in the "C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Log" directory The R setup sets the number of users in the SQL Server Launchpad service to 0 it is pos