01 April 2015

Some Aspects of Online Data vs. Tape

We often get involved in discussions comparing the cost of keeping data online with the cost of data that has been backed up to tape. There are plenty of TCO tools, and you can use them to make either option look cheaper. Here are two examples:

  • An argument often made in favor of tape: its acquisition cost is about one third that of dense archive storage, so its overall TCO should be roughly a third of disk as well.
  • Well, that’s obviously only part of the story. The real question is: how long does it take to restore business-relevant data, and what does it cost if you have to wait days or weeks for that restore to complete?

The latter aspect is not fiction. I have had customers tell me that they stopped the restore of an important database after eleven days. The database was just 11 TB in size, but during a restore from tape you have no idea how fragmented it is or how far the restore has progressed. In the end, the customer lost a lot of money because the database was unavailable and the company could not process customer requests. It doesn’t take much creativity to think of use cases where you lose thousands of euros every minute.

This example shows how careful you need to be when someone presents a TCO study that proves one technology or the other is cheaper. A serious TCO calculation requires a real understanding of the use case and has to include all relevant aspects.
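To make this concrete, here is a minimal Python sketch of a TCO comparison that adds the business cost of waiting for data to the acquisition cost. Only the 1:3 acquisition price ratio and the eleven-day wait come from this post; the prices per TB, the capacity, the number of incidents and the downtime cost per hour are hypothetical, and treating online data as needing no restore at all is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    name: str
    acquisition_eur_per_tb: float  # hypothetical purchase price per TB
    hours_until_usable: float      # time until the data can be used again after an incident


def tco_eur(option: StorageOption, capacity_tb: float,
            incidents: int, downtime_eur_per_hour: float) -> float:
    """Acquisition cost plus the business cost of waiting for the data."""
    acquisition = option.acquisition_eur_per_tb * capacity_tb
    waiting = incidents * option.hours_until_usable * downtime_eur_per_hour
    return acquisition + waiting


def break_even_eur_per_hour(cheap: StorageOption, fast: StorageOption,
                            capacity_tb: float, incidents: int) -> float:
    """Downtime cost per hour above which the faster option has the lower TCO."""
    extra_acquisition = (fast.acquisition_eur_per_tb - cheap.acquisition_eur_per_tb) * capacity_tb
    extra_waiting_hours = incidents * (cheap.hours_until_usable - fast.hours_until_usable)
    return extra_acquisition / extra_waiting_hours


# Hypothetical figures: tape costs a third of the online archive per TB,
# and one restore takes eleven days; online data is assumed to need no restore.
tape = StorageOption("tape", acquisition_eur_per_tb=100.0, hours_until_usable=11 * 24)
online = StorageOption("online archive", acquisition_eur_per_tb=300.0, hours_until_usable=0.0)

for option in (tape, online):
    print(option.name, tco_eur(option, capacity_tb=100, incidents=1, downtime_eur_per_hour=1000))

print(round(break_even_eur_per_hour(tape, online, capacity_tb=100, incidents=1), 2))
```

With these assumed figures and a single incident, tape comes out at €274,000 and the online archive at €30,000, and tape stops being the cheaper option as soon as an hour of waiting costs more than about €76.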

However, one thing is clear about tape: the only thing you can do with the data is restore it to disk. That’s it. And you have to hope that the media will still be readable after some time, and that the maintenance tasks of the backup software and the tape library went well enough to keep the data readable.

Need to analyze or access data with Hadoop?

Furthermore, many companies have realized that their data often holds considerable value that they want to monetize. The keyword here is big data analytics with solutions like Hadoop. That, of course, is only possible if all your data is online and accessible to the relevant tools. Solutions like Isilon are ideal for this purpose because they allow you to keep active data and archive data on the same platform [1]. Policies move the data to the most (cost-)efficient storage media while its logical access path stays the same, so applications and users always find the data online and in the same place. Since Isilon is a multiprotocol system, you can even access the data via NFS, CIFS, HDFS, OpenStack Swift, FTP, etc.

 


Figure 1: Multiprotocol access to data on Isilon with policy-based tiering.
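The tiering idea can be sketched as follows. This is a hypothetical, age-based policy written in Python, not the SmartPools API or its configuration syntax; the tier names, age thresholds and file paths are assumptions. It only illustrates the principle described above: a policy changes where a file is stored, while the logical path that NFS, CIFS or HDFS clients use stays the same.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileEntry:
    path: str              # logical path seen by clients; the policy never changes it
    last_access: datetime
    tier: str = "performance"


# Hypothetical policy rules: (minimum age since last access, target tier).
# Rules are checked in order, so the oldest threshold must come first.
POLICY = [
    (timedelta(days=365), "archive"),
    (timedelta(days=30), "capacity"),
]


def target_tier(entry: FileEntry, now: datetime) -> str:
    age = now - entry.last_access
    for min_age, tier in POLICY:
        if age >= min_age:
            return tier
    return "performance"


def apply_policy(files: list[FileEntry], now: datetime) -> None:
    # Only the physical placement (tier) changes; entry.path is left untouched.
    for entry in files:
        entry.tier = target_tier(entry, now)


# Usage with hypothetical files and a fixed "now".
now = datetime(2015, 4, 1)
files = [
    FileEntry("/data/sales/2015-03.csv", now - timedelta(days=2)),
    FileEntry("/data/sales/2015-01.csv", now - timedelta(days=60)),
    FileEntry("/data/sales/2012-10.csv", now - timedelta(days=900)),
]
apply_policy(files, now)
for entry in files:
    print(entry.path, entry.tier)
```

A real implementation would also have to move the data physically between storage pools; the sketch reduces that step to updating the `tier` field so the separation between logical path and physical placement stays visible.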

I will probably write another post about the advantages of using Isilon for Hadoop, but this one is about tape vs. online data. There are several tools out there that you can use for your TCO calculation; with this little article I just wanted to remind you that there is more to consider than €/TB.

 

Further Reading:

[1] White Paper: Next Generation Storage Tiering With EMC Isilon SmartPools

[2] White Paper: Next-Generation Storage Efficiency with Isilon SmartDedupe

[3] White Paper: EMC Isilon OneFS: A Technical Overview
