Last week IBM announced it was building a storage array comprising 200,000 hard disk drives running together, providing 120PB of storage capacity in a single container. The array will use the IBM General Parallel File System (GPFS) to manage data across the disks. This is exciting news for several reasons:
- First, GPFS has in theory always been able to scale up to 120PB of data, but to my knowledge this has never been demonstrated. This latest announcement from IBM reinforces the message, and it will certainly be good to see it working and proven.
- Second, the fact that the product is container-based is also very interesting – both for customers that have already opted for the ‘containerised’ compute route, and for customers looking for instant processing power or storage (often customers can’t wait for a data centre upgrade).
- Third, I suspect the array will need to be built using an alternative data protection strategy to traditional hardware RAID 6. This is because disk failures – including multiple simultaneous failures and failures during RAID rebuild windows – would be unavoidable in a 200,000-disk installation. Commenting on the story in Technology Review, Bruce Hillsberg, director of storage research at IBM and leader of the project, says, “the inevitable failures that occur regularly in such a large collection of disks present another major challenge.”
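To see why failures at this scale are a matter of routine rather than exception, a quick back-of-envelope calculation helps. The 2% annualised failure rate (AFR) below is an illustrative assumption, not a figure from IBM's announcement:

```python
# Rough estimate of expected drive failures in a 200,000-disk array.
# AFR of 2% is an assumed, illustrative figure - real AFRs vary by
# drive model, age and operating conditions.
DISKS = 200_000
AFR = 0.02  # assumed annualised failure rate per drive

expected_failures_per_year = DISKS * AFR
expected_failures_per_day = expected_failures_per_year / 365

print(f"Expected failures per year: {expected_failures_per_year:.0f}")  # 4000
print(f"Expected failures per day:  {expected_failures_per_day:.1f}")   # ~11.0
```

At roughly eleven drive failures every day under these assumptions, a protection scheme that can tolerate overlapping failures and rebuild without long exposure windows is essential.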
We don’t have many customers demanding that GPFS manage 120PB of data right now (we have just announced Durham University’s 1.1PB storage system), but given the rapid rate at which companies are generating and storing data, I think this will be an essential product for many customers soon.
Please click the image below to open and view the gallery:
[picasaview album="201109_DurhamUniversity" instantview="yes"]