Readers of my blog know that I share best practices and client experiences here. In the post Isilon as a TSM Backup Target – Analyses of a Real Deployment, I described the “before and after” situation of an Isilon deployment for an IBM Spectrum Protect (formerly Tivoli Storage Manager, TSM) solution. The results were simply a snapshot of what a production workload looked like and how throughput and the resulting backup windows evolved.
Interestingly, IBM, intending to position their IBM Spectrum Scale/Elastic Storage Server as the better solution, hired a marketing firm to run a performance benchmark on the IBM Elastic Storage Server (ESS), compare it against my post, and publish the result as a white paper. The results highlighted in this paper indicate that IBM Spectrum Scale is 11 times faster than a “similar” EMC Isilon configuration. I can only guess why IBM did not publish this themselves rather than paying a marketing firm to do it. I assume they are too serious to publish a comparison between snapshots of an averagely loaded production environment and a prepared benchmark designed to evaluate the maximum performance of their own solution.
Not a like-to-like workload comparison
The results published in my post by no means show any limits of the platform; they were influenced by external clients, additional server and network traffic, and so on. Also, nothing was said about server and storage utilization or any other potential limits or bottlenecks. It’s obvious that the authors of the white paper did not read my blog; otherwise it would be hard to explain how they could accept such a comparison.
In summary: IBM sponsored a white paper that compared an early customer workload of a new production environment with a well-prepared benchmark and called it a “similar environment”.
IBM used three times the number of disks plus Infiniband and called that a “similar configuration”
Besides the fact that it was not a benchmark-to-benchmark comparison, there are more things worth mentioning:
- The IBM GPFS Storage Server contained 348 NL-SAS disks, whereas the Isilon system contained just three NL400 nodes with 108 SATA disks in total.
- The TSM servers were connected to the GPFS storage servers through 56 Gb/s Infiniband, whereas 10 Gb/s Ethernet was used to connect to the Isilon nodes. I don’t need to emphasize here that Infiniband networks are not widely deployed in commercial production environments.
- The load on the IBM system was generated by TSM clients running co-located on the TSM servers; the benchmark data was not read from disk but pumped directly to the servers.
- In the environment I described in my blog post, the clients were connected via Ethernet, which adds latency and shares network resources, and they backed up real file systems incrementally.
The issue with IBM Elastic Storage Server is not performance – it’s the complexity
I worked 17 years for Big Blue, and there is no doubt that the core of ESS, Spectrum Scale (formerly GPFS), is a scalable filesystem. For good reason, it has a number of installations in High Performance Computing, where the expert skills required to bring a GPFS cluster up and keep it running are available. This is typically not the case in commercial environments, where IT staff have limited resources and the business requires simple-to-manage rather than complex solutions. I know various customers that have moved or are in transition from GPFS to Isilon, and they all prefer Isilon due to its simplicity of installation and maintenance.
Gartner’s view on the IBM Elastic Storage Server vs. EMC Isilon
Here is an original statement from Gartner regarding IBM Elastic Storage: “[..] Elastic Storage lacks features such as built-in de-duplication, compression and thin provisioning. Although IBM has made improvements by modeling the graphical user interface (GUI) after the popular XIV interface, overall manageability continues to be complex.”
In the same paper, they write about Isilon: “Among the distributed file systems for scalable capacity and performance on the market, Isilon stands out, with its easy-to-deploy clustered storage appliance approach and well-rounded feature sets. The product includes a tightly integrated file system, volume manager and data protection in one software layer.”
IDC findings
IDC evaluated the business impact of scale-out NAS solutions back in 2011 and found Isilon’s OPEX savings to be 48% over traditional solutions. I am confident that this is the result of Isilon’s integrated architecture and the resulting simplicity. That is affirmed by customers who responded to IDC in interviews:
“With Isilon, storage provisioning takes maybe an hour a year. Before it took three to four hours a week. Storage allocation used to take five to six hours a week, and now it takes six hours a year.”
“We're much more competitive because of Isilon and we're winning more jobs. Our three biggest competitors have all bought Isilon [solutions] since we did.”
“Isilon allows us to manage petabytes of storage with a tiny staff and scale easily so we don't have to worry about creating and managing volumes, or managing a bunch of other things that create costs.”
“As we grow, we can add a node in 60 seconds, which means we can take on large customers and also be more responsive to existing customers.”
Although IBM has now come up with a GUI that intends to make some management tasks more intuitive, that does not mean the architecture in general has been simplified. Let’s look at some details of why IBM Spectrum Scale is still complex.
The complexity of Elastic Storage Server – it starts from the bottom: RAID!
Even though GPFS Native RAID (GNR) has removed some limitations of traditional hardware RAID implementations, it’s still RAID, and it needs to be understood, configured and maintained. The Administration Guide for GNR has 262 pages – just for GNR. It looks like there are a lot of concepts and details to learn and consider before you can even start to implement the other components of the cluster. Some important GNR concepts are based on entities such as:
- Declustered arrays
- Recovery groups
- Pdisk-group fault tolerance
- Pdisk paths
- Log vdisks
- GPFS Native RAID vdisk configuration data (VCD)
- VCD spares
- RAID codes
- Block size
- Vdisk size
Vdisks are created within declustered arrays, and vdisk tracks are declustered across all of an array's pdisks. A recovery group may contain up to 16 declustered arrays. A declustered array can contain up to 256 pdisks (but the total number of pdisks in all declustered arrays within a recovery group cannot exceed 512). A pdisk may belong to only one declustered array. The name of a declustered array must be unique within a recovery group; that is, two recovery groups may each contain a declustered array named DA3, but a recovery group cannot contain two declustered arrays named DA3. The pdisks within a declustered array must all be of the same size and should all have similar performance characteristics.
As we can see, there are many low-level concepts that an administrator needs to understand before he or she can configure a reliable and balanced system. And the level of detail the admin needs to consider goes even further. For example, when creating a declustered array, several attributes need to be configured for each array [5, p. 24]:
- Data spares: the number of disks’ worth of equivalent spare space used for rebuilding vdisk data if pdisks fail. This defaults to one for arrays with nine or fewer pdisks, and two for arrays with 10 or more pdisks.
- VCD spares: the number of disks that can be unavailable while the GPFS Native RAID server continues to function with full replication of vdisk configuration data (VCD). This value defaults to the number of data spares. To enable pdisk-group fault tolerance, this parameter is typically set to a larger value during initial system configuration (for example, half of the number of pdisks in the declustered array + 1).
- Replace threshold: the number of disks that must fail before the declustered array is marked as needing to have disks replaced. The default is the number of data spares.
- Scrub duration: the number of days over which all the vdisks in the declustered array are scrubbed for errors. The default is 14 days.
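To give a flavor of what this looks like in practice, here is a sketch of a declustered array definition as it might appear in a recovery group stanza file. The attribute names follow the GNR documentation, but the values are purely illustrative, not a recommendation:

```
%da: daName=DA1
  spares=2
  vcdSpares=29
  replaceThreshold=2
  scrubDuration=14
```

Every declustered array in every recovery group needs such a definition, and the values have to be balanced against the number and type of pdisks in that array.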
Let’s look at the task level to understand the resulting management implications of the ESS complexity:
Tasks to set up GNR on ESS
Just to set up GNR on the Elastic Storage Server, you conceptually need to perform the following steps [5, p. 47]:
- Configuring GNR recovery groups on the ESS
- Preparing ESS recovery group servers
- Disk enclosure and HBA cabling
- Verifying that the GL4 building block is configured correctly
- Creating recovery groups on the ESS
- Configuring GPFS nodes to be recovery group servers
- Defining the recovery group layout
- Defining and creating the vdisks
- Creating NSDs from vdisks
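Expressed as commands, the later steps of this flow look roughly like the following sketch. The server names and stanza file names are hypothetical, this is not runnable as-is, and the exact options should be taken from the GNR Administration Guide [5]:

```
# Create the recovery groups from prepared stanza files that define
# the pdisks and declustered arrays; each group gets a primary and
# a backup server (hypothetical node names essio1/essio2)
mmcrrecoverygroup rgL -F rgL.stanza --servers essio1,essio2
mmcrrecoverygroup rgR -F rgR.stanza --servers essio2,essio1

# Define and create the log and data vdisks inside the recovery groups
mmcrvdisk -F vdisks.stanza

# Expose the vdisks as NSDs so a GPFS filesystem can be built on them
mmcrnsd -F vdisks.stanza
```

Each of these commands depends on stanza files that the administrator must author and keep consistent with the physical disk layout.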
Isilon has well-integrated RAID, Volume Manager and Filesystem for Simplicity
In contrast, Isilon has a well-thought-through concept of an integrated RAID level, volume manager and filesystem. There is no need to deal with any of the above concepts, as there are no RAID arrays or volumes/vdisks in Isilon. There is just a single filesystem that is available right after you boot up the appliance. Data as well as parity is spread across all nodes and disks in the system. The protection level can easily be set at cluster, pool, directory or even file level, and it can be changed at any time on the fly. The education required to administer an Isilon cluster takes just a couple of hours – if that.
Figure 1: OneFS has integrated RAID, Volume Manager and File System which makes it extremely easy to manage.
For further details on the Isilon and OneFS Architecture please refer to .
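By contrast, changing the requested protection level on Isilon is a one-liner. As a hedged example (the directory path is made up; `isi set` is the OneFS CLI command for per-file and per-directory settings):

```
# Recursively request N+2:1 protection for a directory tree;
# OneFS restripes the affected files in the background
isi set -R -p +2:1 /ifs/data/tsm
```

There are no arrays to rebuild or vdisks to redefine; the same change on GNR would mean recreating vdisks with a different RAID code.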
Adds to complexity: Lack of native Multi-Protocol support on the IBM Elastic Storage Server
As we have seen from the Edison Group paper, IBM installed GPFS clients on the TSM servers. On one hand, this provides efficient and fast communication by leveraging the proprietary NSD protocol. On the other hand, it is required due to the lack of any native standard protocol support on the ESS. If your applications require NFS or CIFS support, you need to add a protocol server that has a GPFS client installed and serves the application via NFS or SMB. This is not something I would consider a scale-out solution. How would things like cluster-wide locking be done in such a case? Install Samba and the clustered trivial database (CTDB)? That concept has already failed with SONAS and is not even supported by IBM. Accordingly, you must install and maintain GPFS clients on all your application servers that require access to the shared filesystem. You’d have Windows and/or Linux servers to maintain in addition to ESS.
In addition, you need a management server running xCAT and another one for the Hardware Management Console. Both interact with each other for various management and monitoring tasks.
You may agree that another full-time employee would be required to keep this zoo running, with many problems on the horizon:
- Managing the IBM Spectrum Scale filesystem, potentially hundreds of declustered RAID arrays, volumes, NSDs and clients
- Considering inter-dependencies at the OS level and between GPFS client/server versions and TSM versions
- Non-disruptive upgrades of all GPFS clients and servers (this might be possible in theory, but in practice I haven’t seen it)
- Cluster-aware locking
- Homogeneous monitoring, reporting, auditing, authentication, security, and so on
Isilon, on the other hand, has a rich set of natively supported protocols: NFSv3, NFSv4, SMB2, SMB3 with Multi-Channel, FTP, HTTP, NDMP, Swift and more. No external protocol servers are required, no plugins, no connectors. All protocols are implemented natively, even HDFS. Everything is tightly integrated in terms of authentication and authorization, supporting multiple instances of Active Directory, Kerberos, LDAP, NIS and local providers. For more info see .
The ESS dual-server building block vs. Isilon node types
The IBM Elastic Storage Server has been built around the concept of a ‘building block’. Each building block always contains two IBM Power S822L servers (of which one has a storage enclosure attached). You can choose between two model lines (GS and GL) with two types and various numbers of SAS storage enclosures. Whatever use case you are planning for, a building block always consists of the very same two servers. Within the rack you can scale to something like a petabyte of storage, but it will still be two servers. That looks to me like a legacy dual-controller concept rather than a scale-out architecture (at least within the rack; you might be able to add additional servers, but they’d require additional SAS enclosures and cabling). It also seems to me that the one-size-fits-all strategy (it’s always the same server type, regardless of the workload) might not be the most cost-efficient one. With Isilon you have the choice of four node types, ranging from very fast S-nodes to highest-density HD-nodes. You can choose the node type that fits your workload most efficiently. Nodes can be mixed, and data is placed or tiered by policies (the concept of tiering is quite similar in both GPFS and OneFS).
Using an HPC-like setup for backup/archive, or an easy-to-manage EMC Isilon cluster
As the evaluation report illustrates, IBM is selling an HPC-like solution with impressive performance numbers as a backup/archive solution. The solution is based on a cluster where all clients (in this case the TSM servers) have to be members of the cluster. The interconnect technology is Infiniband, well suited to and used in HPC environments. Besides the complexity that users need to manage, I’d be curious to see whether this is a cost-effective solution for backup and archive purposes.
The EMC Isilon solution is based on commodity components in a well-integrated, easy-to-use appliance. The interconnect technology between the TSM servers and the Isilon cluster is 10 Gigabit Ethernet, which is typically available in every datacenter. With dsmISI for IBM Spectrum Protect, the whole operation of IBM Spectrum Protect becomes even simpler and is optimized in a fully automated manner.
Further differences to consider
There are many more aspects to consider when customers look for a solution, for example:
- Reliability of the Architecture
- Solution certification for 3rd Party solutions
- Antivirus support
- Monitoring/Reporting Tools
- Integration into VMware and other environments
- Maturity of the solution, number of shipped systems
Summary
The competitive evaluation report of the Edison Group compared a real production environment workload profile with a specifically tailored benchmark that IBM performed to demonstrate that IBM Elastic Storage Server is the better backup target for IBM Spectrum Protect. Beyond not comparing a similar configuration or real Isilon maximum performance values, the evaluation report did not consider the biggest issue with IBM Elastic Storage Server and IBM Spectrum Scale: complexity. While it may deliver high throughput numbers, the solution requires a very high degree of skills and administration effort compared to EMC Isilon. A similar finding can be read in the Gartner research note.
Acknowledgements
Thanks to Matthias Radtke and Lars Henningsen for reviewing my writing and providing useful comments.
References and Further Reading
[1] The Edison Group: IBM® Spectrum Scale™ vs EMC Isilon for IBM® Spectrum Protect™ Workloads; A Competitive Test and Evaluation Report; https://www-01.ibm.com/marketing/iwm/iwm/web/….
[2] Gartner Research Note: Critical Capabilities for Scale-Out File System Storage, January 2015
[3] IDC MarketScape: Scale-Out File-Based Storage Market, January 2013
[4] IDC Lab Validation Brief: EMC Isilon Scale Out Data Lake Foundation, Essential Capabilities for Building Big Data Infrastructure, October 2014
[5] Product Documentation: GPFS Native RAID, Version 4 Release 1.0.5, Administration
[6] EMC Isilon – OneFS – A Technical Overview, White Paper, November 2013
[7] Quantifying the Business Benefits of Scale-Out NAS Solutions, IDC White Paper
[8] Isilon as a TSM Backup Target – Analyses of a Real Deployment, Blog Post, http://stefanradtke.blogspot.com/2014/06/isilon-as-tsm-backup-target-analyses-of.html
[9] How to optimize Tivoli Storage Manager operations with dsmISI and the OneFS Scale-Out File System, Blog Post, http://stefanradtke.blogspot.com.es/2015/06/how-to-optimize-tsm-operations-with.html