Skip to main content

Storage Spaces in Microsoft Analytics Platform System or SQL Server Parellel Data Warehouse

Storage Spaces is a component available in Windows 2012. Microsoft has used this technique in their Microsoft Analytics Platform System or previously named SQL Server 2012 Parallel Data Warehouse. 

But as I will show they got stuck in the 'old way of thinking' which will only work (slightly?) better if we live in a perfect world, were all data is requested at the same time. In a lot of cases this is not so....

The hardware behind Microsoft Analytics Platform System

When looking closer at the hardware the Hewlett and Packard (HP) solution based on the DL360 Gen8, which has 2 PCIe slots, is equipped with one H221 SAS controller with two 4x6 Gb/s ports and one Melanox Infini Band 56 Gb/s controller. In the HP solution the JBOD is divided in two: every node manages 32 disks. 



Dell addresses the solution with the PowerEdge R620 which has maximum 3 PCIe slots but 2 JBODs so in order to address all 102 disks in the JBOD's they need to do the same as HP namely one SAS  4x6 Gb/s connections per JBOD. The Quanta solution resembles the Dell solution with 3 nodes and 2 JBOD's. 


The software architecture of Microsoft Analytics Platform System

Because the solution needs to be high available and therefore be capable to move from one defective node to another the whole SQL Server environment is visualised in a Windows 2012 VM. 

In order to provide Cluster Shared Volumes access to the JBOD's every node has also a Windows 2012 storage cluster VM up and running. If a server controller fails the node uses the storage services provided by the Storage VM on the second or in case of Dell and Quanta one of the other nodes. But this VM need to be accessed by the Infini Band network. This means that in case of a node failure performance of both nodes will be reduced to 50% in case of an HP solution and by 33% in case of a Dell/Quanta solution. 

The storage architecture of Microsoft Analytics Platform System 

The way storage is organised is pretty standard: every node addresses 16 mirrored data volumes build on several disks in the JBOD(s) and there is a general tempdb and LOG volume. This resembles more or less the traditional RAID1+0 or RAID10 solution. 

If you have read my other post on Storage Spaces I have my thoughts about this storage architecture. 

If your data is not so well organised for your query and for some reason you hit only one or a few volumes you also hit the physical limits of the mirror or RAID1+0 or RAID10 architecture. This is certainly the case when your query collects all data and writes them in the tempdb and then performs transformations in the tempdb on it. 

But wait: there is a better way to use Storage Spaces with SQL Server

My tests on existing Microsoft Fast Track reference architectures provided a 50% performance improvement using the same hardware and can handle way more simultaneous users



My newly developed ATOS Performance Solutions based on SQL Server 2014 with Windows 2012R2 Storage Spaces show that we are able serve at least a 100 users simultaneously with a reasonable good performance (from 12s to 90s). 

How did we do it?

Like I explained in my previous post we created one storage space for all the hard disks (80) and one for all the SSD disks (4 per JBOD) in the 4 JBOD's (4).  I created 8 virtual disks on the HDD storage pool using all the available disks and 2 virtual disks on the SSD storage pool. 

All virtual disks were created with 64 KB interleave size and the volumes formatted with the same 64KB block size. The data files were placed on the 8 HDD volumes and the tempdb files (12) and the LOG files were placed on a separate SSD volume. 

And the performance?

What we saw was when we used only one volume for sequential reads of 512KB we hit already 4.3 GB/s of I/O which is impossible to get when using a traditional RAID10 solution. Using more and more volumes we got up to 8.7 GB/s using only 4 JBOD's. 

With a traditional Fast Track reference solution we obtained 9.6 GB/s with 8 disk arrays and 32 LUN's or volumes. 

Loading the data was also way easier: we used only 4 parallel loading tasks in SQL Server Integration Services and obtained an effective data load rate of 700 MB/hour.  

We had obtained actually more than 1 TB/hour but we lost 300 MB/hour on the fact that since SQL Server 2012 data is first loaded in the tempdb before it is transferred to the actual table. This was all visible in the performance tools we used.

Using a Microsoft Fast Track Reference architecture way harder to obtain this value and the solution used is way harder to maintain in a production environment. 

It is all about latency!

The most important about this new architecture with Storage Spaces is that the latency was drastically reduced. A Fast Track reference architecture will provide the results with a 24ms+ latency and our ATOS Performance Solutions based on SQL Server 2014 was capable to return the data much faster with less processors (think about the license costs reduction!) and therefore could serve way more users (easily more than a 100 simultaneously in fact). 


The future architecture of Microsoft Analytics Platform System

I don't think that Microsoft Analytics Platform System as it exist now will continue: it is just too complex. Next to that it uses old 6 Gb/s storage hardware when everywhere there are 12 Gb/s solutions available and some JBOD's do not even support the use of SSD's (HP). Not very flexible then.

A new and more hybrid solution will be based on a Windows Storage Server solution with Cluster Shared Volumes. Probably one Cluster per rack. The storage cluster solution means that the servers used need to have more PCI slots to provide a resilient solution. 

On top of that will be running the 8 (HP) or 9 (Dell/Quanta) SQL Server compute nodes and because the CSV are highly available by design (maybe using two Melinox 56 Gb/s Infini Band cards per node) and a lower latency and a higher performance on the tempdb overall performance off the Microsoft Analytics Platform System increases beyond expectations.

Comments

Popular posts from this blog

Privacy and the liberty to express yourself on LinkedIn

Unaware that LinkedIn has such a strong filtering policy that it does not allow me posting a completely innocent post on a Chinese extreme photography website I tried to post the following: "As an Mpx lover I was suprised to find out that the M from Million is now replaced by the B from Billion. This picture is 24 Bpx! Yes you read this well, 24 billion pixels.  Searching on the picture I stumbled on a fellow Nikon lover. If you want to search for him yourself you can find him here: http://www.bigpixel.cn/t/5834170785f26b37002af46a " In my eyes nothing is wrong with this post, but LinkedIn considers it as offending. I changed the lover words, but I could not post it.  Even taking a picture and post it will not let this pass:  Or my critical post on LinkedIn crazy posting policy: it will not pass and I cannot post it.  The technology LinkedIn shows here is an example what to expect in the near future.  Newspapers will have a unified re...

Windows Server 2016 with ... XBox extensions !!!

Microsoft must have been thinking that the live of a Windows administrator gets so boring that they need a distraction and they have integrated the XBox Live extensions standard in Windows Server 2016.  No kidding: take a look. I did not select it as a feature and it is there. As you can see the XBox Live service is started automatically even in mode manual (see error log).  To my opinion these 3 XBox Live services should never be available on a server. Unless the server is part of the XBox Live platform of cause but honestly I don't think that Microsoft will allow that.  Every IT manager with a serious Windows production environment would fire any administrator playing XBox Live on the production servers. So though it might be tempting don't do it Windows administrator. It is a trap! :-)

How to run SQL Server 2016 with In-Databasse R on Windows 2016 CTP5

For those who like me tried to run SQL Server 2016 with In-Database R might have run into the same problem as me: In-Database R or the LaunchPad service gives a timeout and won't start. I did several clean installations with different configuration options - for instance I like to put my data on another disk than the system disk - but in the end I tried to do the next, next, next, finish install to see if it something in the setup options is hard coded in there (yes, it happens developers!). For some reason this problem is related to Windows 2016 and not on Windows 2012R2 and I hope the SQL Server team will soon resolve these issues because they are in one word a bit sloppy.  There are 2 issues (maybe even 3 so I give this one also):  The R setup does not create the ExtensibilityLog directory in the "C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Log" directory The R setup sets the number of users in the SQL Server Launchpad service to 0 it is pos...