Distributed Storage: Scalable Storage for Public Clouds

The promise of cloud computing is straightforward: self-service, always available, pay-as-you-go and scalable (grow as you need). Users no longer pay for capacity they are not using; they buy what they need and add capacity as their business grows. For storage, this translates into three requirements: (a) a user can start with a storage environment of any size and scale it as needed, without practical limits; (b) the data is, and will remain, reliably available; (c) the service is competitively priced and billed on actual usage.

Cloud computing has helped create a lot of opportunities, driven by the many innovations brought to market over the past three years. Those innovations range from full cloud platforms to online end-user applications. There are about a dozen IaaS (cloud infrastructure) platforms on the market, and many more cloud-enabling technologies such as server and network virtualization products, cloud governance solutions, and metering and billing applications.

One area where innovation has not kept up is storage. Most cloud infrastructures run either on very expensive storage systems whose cost will not be sustainable, or on very basic infrastructures that lack the availability or scalability required by standard cloud SLAs. The few “big” cloud providers have built their own storage infrastructures, and it remains to be seen whether those infrastructures will meet the requirements as they grow.

To make things worse, recent evolutions in disk technology (bigger disks at lower prices) have made it even harder to meet the storage requirements of cloud environments. With larger disks, RAID, the de facto standard protection scheme, has lost much of its advantage: rebuilds after a disk failure take too long and leave the system less protected in the meantime, RAID scales badly, and its overhead makes it cost-inefficient.
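A quick back-of-the-envelope calculation shows why rebuild time grows with disk size. The Python sketch below is a lower bound under a loud assumption: the rebuild is limited only by a sustained write rate to the replacement disk (the 50 MB/s figure and the disk sizes are hypothetical). Real rebuilds compete with production I/O and usually take longer.

```python
def rebuild_hours(disk_bytes: float, rebuild_bytes_per_s: float) -> float:
    """Lower bound on the time needed to rewrite one failed disk, in hours.

    Assumes the rebuild is limited only by a sustained write rate to the
    replacement disk; competing production I/O is ignored.
    """
    return disk_bytes / rebuild_bytes_per_s / 3600


# Hypothetical figures: the same 50 MB/s rebuild rate for a 500 GB and a 2 TB disk.
RATE = 50e6
for size_tb in (0.5, 2.0):
    hours = rebuild_hours(size_tb * 1e12, RATE)
    print(f"{size_tb} TB disk: at least {hours:.1f} h to rebuild")
```

With these assumed numbers a 2 TB disk needs at least about 11 hours to rebuild, four times as long as a 500 GB disk. During that window a RAID5 group has no redundancy left and a RAID6 group can absorb only one more failure.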

Reliable storage is a fundamental requirement for providing online services, and quite a few storage vendors have been working on technologies to overcome the shortcomings of RAID. The solutions fall into three categories: new RAID schemes, software layers on top of traditional RAID, and distributed storage.

A lot of research has gone into creating new RAID schemes. RAID was a great technology, so why not keep it and improve it? Unfortunately, new RAID schemes do not solve RAID's fundamental problems: long rebuild times, poor reliability scores, limited scalability, and so on. Other suppliers are trying to improve storage reliability and efficiency by putting a software layer on top of traditional RAID. While some of these solutions may bring short-term benefits and reduce management costs to some extent, they are not a viable long-term solution.

A more recent innovation for large-scale storage is distributed storage. The concept is simple: reliability and availability are guaranteed by spreading data over the entire storage system, i.e., over “all” the available disks. Using erasure coding, data blocks are split into sub-blocks and extended with redundant sub-blocks, which are spread over different disks, appliances, etc. according to spread policies. The codec needs only a subset of the sub-blocks to reconstruct the original data block. One vendor compares it to a Sudoku puzzle, and that is a strong conceptual analogy.
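To make the "any k of n sub-blocks" idea concrete, here is a minimal Python sketch of a Reed-Solomon-style erasure code over the prime field GF(257). It is an illustration, not any vendor's codec: production systems typically use optimized codes over GF(2^8), and the names `encode`/`decode` and the parameters `k` (data sub-blocks) and `n` (total sub-blocks) are hypothetical. For every byte position, the k data symbols define a polynomial of degree below k, and the n - k parity sub-blocks store that polynomial's values at further points, so any k surviving sub-blocks pin it down, much like the constraints of a Sudoku.

```python
P = 257  # prime field modulus; every byte value 0..255 is a field element


def _interpolate_at(xs, ys, x):
    """Value at x of the unique polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(block: bytes, k: int, n: int) -> dict:
    """Split block into k data sub-blocks plus n - k parity sub-blocks (k <= n <= 257).

    Returns {sub_block_index: list of symbols}. Parity symbols may equal 256,
    so they are kept as ints rather than bytes.
    """
    size = -(-len(block) // k)  # symbols per sub-block (ceiling division)
    padded = block.ljust(size * k, b"\0")
    data = [list(padded[i * size:(i + 1) * size]) for i in range(k)]
    subs = {i: data[i] for i in range(k)}  # systematic: data sub-blocks stored as-is
    xs = list(range(k))
    for x in range(k, n):
        subs[x] = [_interpolate_at(xs, [d[c] for d in data], x) for c in range(size)]
    return subs


def decode(subs: dict, k: int, length: int) -> bytes:
    """Rebuild the original block (of the given length) from any k distinct sub-blocks."""
    if len(subs) < k:
        raise ValueError("fewer than k sub-blocks survive; data cannot be rebuilt")
    xs = sorted(subs)[:k]
    size = len(subs[xs[0]])
    data = []
    for x in range(k):
        if x in subs:
            data.append(subs[x])  # data sub-block survived, nothing to compute
        else:
            data.append([_interpolate_at(xs, [subs[xi][c] for xi in xs], x)
                         for c in range(size)])
    return bytes(v for sub in data for v in sub)[:length]


# Usage: 4 data + 3 parity sub-blocks (75% overhead); lose sub-blocks 0, 2 and 3.
block = b"distributed storage"
subs = encode(block, k=4, n=7)
survivors = {i: subs[i] for i in (1, 4, 5, 6)}
assert decode(survivors, k=4, length=len(block)) == block
```

The systematic layout (data sub-blocks stored unchanged) means reads need no decoding while all data sub-blocks are healthy; the interpolation work is only paid when something is lost.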

While we are still early on the adoption curve, distributed storage is being recognized as the logical next step. A few solutions are commercially available and are seeing growing success. Efficiency and reliability claims vary but look very promising. One vendor claims to need as little as 40% overhead to provide better reliability than RAID6 and replication (the numbers depend on the size of the infrastructure and the exact reliability desired). Some solutions can spin down disks or even full appliances, which has a significant impact on the energy bill.
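The overhead versus reliability trade-off can be made concrete with a little arithmetic. The Python sketch below computes, for a scheme that stores k units of data as n units in total, the storage overhead (n - k) / k and the probability of losing data when each of the n disks fails independently with probability p before it is repaired. The independence assumption, p = 0.01 and the three configurations (3-way replication, RAID6 as 8+2, an erasure-coded 10+4 spread) are hypothetical choices for illustration; they do not reproduce the vendor's figures.

```python
from math import comb


def overhead(k: int, n: int) -> float:
    """Extra raw capacity as a fraction of user data: n stored units per k data units."""
    return (n - k) / k


def loss_probability(k: int, n: int, p: float) -> float:
    """Probability that more than n - k of the n units fail, i.e. data is lost.

    Assumes independent failures, each with probability p within one repair window.
    """
    return sum(comb(n, f) * p**f * (1 - p) ** (n - f) for f in range(n - k + 1, n + 1))


# Hypothetical configurations: (label, k, n)
SCHEMES = [
    ("3-way replication", 1, 3),
    ("RAID6, 8+2", 8, 10),
    ("erasure code, 10+4", 10, 14),
]

for label, k, n in SCHEMES:
    print(f"{label:20s} overhead {overhead(k, n):5.0%}  "
          f"P(loss) {loss_probability(k, n, p=0.01):.1e}")
```

Under these assumptions the 10+4 layout stores 40% extra data yet tolerates any four simultaneous failures, whereas the RAID6 group tolerates two and 3-way replication costs 200%; with p = 0.01 it also has the lowest loss probability of the three. The model ignores correlated failures and rebuild duration, so treat it as intuition rather than a reliability estimate.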

The architecture of distributed storage systems makes it particularly easy to scale the infrastructure. In fact, distributed storage systems become more efficient and more reliable as the infrastructure grows: the more appliances and disks an infrastructure has, the wider the spread can be.
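To see why a larger infrastructure helps, consider a spread policy that places the n sub-blocks of each data block round-robin across appliances, so that no appliance holds more than ceil(n / appliances) of them. The Python sketch below uses a hypothetical policy and a hypothetical 10+4 code (the functions `spread` and `survives_appliance_loss` are illustrative names, not any product's API) and checks whether losing an entire appliance still leaves at least k sub-blocks.

```python
from math import ceil


def spread(n: int, appliances: int) -> list[int]:
    """Assign sub-block i to appliance i mod appliances (round-robin spread policy)."""
    return [i % appliances for i in range(n)]


def survives_appliance_loss(k: int, n: int, appliances: int) -> bool:
    """True if every single-appliance failure leaves at least k of the n sub-blocks."""
    placement = spread(n, appliances)
    worst_loss = max(placement.count(a) for a in range(appliances))
    return n - worst_loss >= k


# Hypothetical 10+4 code: any 10 of its 14 sub-blocks rebuild the data.
K, N = 10, 14
for appliances in (2, 3, 4, 7, 14):
    per_appliance = ceil(N / appliances)
    print(f"{appliances:2d} appliances: up to {per_appliance} sub-blocks each, "
          f"survives appliance loss: {survives_appliance_loss(K, N, appliances)}")
```

With this policy the 10+4 layout survives the loss of a whole appliance only once there are at least four appliances; on two or three appliances a single failure can take out five or more sub-blocks.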

Performance numbers again vary between solutions, but most products provide local caching (usually a mix of SSD and SATA) to improve read and write speeds. With online applications and virtual environments in mind, distributed storage systems typically support standard interfaces such as iSCSI, NFS and CIFS. Other typical features are out-of-band optimizations for data integrity verification and assurance, zero-copy snapshots and cloning.
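Zero-copy snapshots and clones are commonly built on copy-on-write metadata. The Python sketch below is a hypothetical in-memory illustration (`BlockStore`, `Volume` and their methods are not any product's API): a volume maps logical block numbers to immutable stored blocks, a snapshot or clone copies only that map, and later writes allocate new blocks instead of overwriting shared ones, so no user data is copied.

```python
from itertools import count


class BlockStore:
    """Store of immutable blocks, shared by all volumes, snapshots and clones."""

    def __init__(self):
        self.blocks = {}
        self._ids = count()

    def put(self, data: bytes) -> int:
        block_id = next(self._ids)
        self.blocks[block_id] = data
        return block_id


class Volume:
    def __init__(self, store: BlockStore, block_map=None, read_only=False):
        self.store = store
        self.block_map = dict(block_map or {})  # logical block number -> block id
        self.read_only = read_only

    def write(self, lbn: int, data: bytes) -> None:
        if self.read_only:
            raise PermissionError("snapshots are read-only")
        # Copy-on-write: allocate a new block rather than overwrite a shared one.
        self.block_map[lbn] = self.store.put(data)

    def read(self, lbn: int) -> bytes:
        return self.store.blocks[self.block_map[lbn]]

    def snapshot(self) -> "Volume":
        # Copies metadata (the map) only; the data blocks stay shared.
        return Volume(self.store, self.block_map, read_only=True)

    def clone(self) -> "Volume":
        # A writable copy that starts out sharing every block with its source.
        return Volume(self.store, self.block_map)


store = BlockStore()
vol = Volume(store)
vol.write(0, b"v1")
snap = vol.snapshot()
vol.write(0, b"v2")
print(vol.read(0), snap.read(0), len(store.blocks))
```

After the second write the volume reads `b"v2"`, the snapshot still reads `b"v1"`, and the store holds just two blocks. A real system would also share the map itself (for example through reference-counted tree nodes) and reclaim blocks no longer referenced; both are omitted here.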

Distributed storage is still a very new concept, and the widely adopted RAID will not disappear overnight. But as storage needs (especially online storage needs) grow, solutions based on distributed storage will see fast-growing interest from the market.

One interesting distributed storage implementation to have a look at: Amplidata

~ by tomleyden on November 25, 2010.
