Shifting The Storage Paradigm (Part One): The Evolution of Data

•February 26, 2014 • 1 Comment

The storage industry is going through a major paradigm shift caused by drastic changes in how we generate and consume data. As a result, we also have to fundamentally change how we store data: the market needs massive, online storage pools that can be accessed from anywhere, at any time. Object Storage has emerged as a solution to meet these changing needs, and it is currently a hot topic as it creates opportunities for new revenue streams.

In this three-part blog series, I will explore how storage has changed – creating a need for new methodologies – and why object storage is the prevalent platform for scale-out storage infrastructures.

To understand how storage has changed, let’s take a look at how data has evolved over the past three decades, paying special attention to data generation and consumption.

Transactional Data

In the 1980s and 1990s, the most valuable digital data was transactional data – database records, created and accessed through database applications. This led to the success of large database and database application companies. Transactional data continues to be important today, but nothing on the horizon suggests that database solutions will be unable to manage the relatively slow growth of structured information. From a storage point of view, the structured data challenge is handled well by block-based (SAN) storage platforms, designed to deliver the high IOPS needed to run large enterprise databases.

Unstructured Data

With the advent of the office suite, unstructured data became much more important than it had ever been before. By the mid-1990s, every office worker had a desktop computer with an office suite. E-mail allowed us to send those files around, and storage consumption went through the roof. Enterprises were soon challenged to build shared file storage infrastructures; backup and archiving became another challenge. Tiered storage was born. Storage was both hot and cool. Over the next two decades we would see plenty of innovations to manage fast-growing unstructured data sets – the file storage (NAS) industry skyrocketed.

But people can only generate so many office documents. The average PowerPoint file is probably three times as big today as it was back in 1999, but that is not even close to the data growth predictions we continue to hear (a doubling every year). Just as SANs have evolved sufficiently to cope with changing database requirements, NAS platforms would have been able to cope with the growth of unstructured data if it weren’t for the sensor-induced Big Data evolution of the past decade.

Big Data

The first mentions of Big Data refer to what we now understand as Big Data Analytics: scientists (mostly) were challenged to store research data from innovative information-sensing devices, captured for analytics purposes. Traditional databases would not scale sufficiently for this data, so alternative methods were needed. This led to innovations like Hadoop/MapReduce, which we also like to refer to as Big “semi-structured” Data: the data is not structured as in a database, but it is not really unstructured either.

Bigger Data

Information-sensing devices are not exclusive to scientific analytics environments, however. Smartphones, tablets, photo cameras and scanners – just to name a few – are all information-sensing devices, and they create the vast majority of all unstructured information generated today. In the past decade we have seen not only a massive increase in the popularity of these devices, but also continuous quality improvements. This led to more, and bigger, data. The result is a true data explosion of mostly immutable data: unlike office documents, most sensor data is never changed.

This immutable nature of unstructured data holds the key to solving the scalability problem of traditional file storage. Tune in to my next posts, where I will dive into how to leverage this aspect of enterprise data to develop an object storage solution for the shifting storage paradigm.

Optimizing Object Storage for and with Open Compute Project (OCP)

•February 7, 2014 • Leave a Comment

I had the honor of sharing DDN’s WOS with audiences at the Open Compute Project Summit last week. Figure this: a marketing guy is invited to get on stage and tell a room full of open source aficionados that proprietary software is the way to go, and that Swift is much more expensive when you look at the full TCO, including time to market, storage management and hardware overhead. True story. We have several TCO studies to demonstrate it.


I got plenty of advice on how to survive the event, the best piece being, “Don’t wear a button-down shirt, geeks don’t like that. A T-shirt is probably safest. Maybe a Star Trek T-shirt.” Since my Star Trek knowledge is limited to knowing that there is a guy with funny ears who can put you to sleep by grabbing your neck, and that everyone wears funny suits, I figured I might as well wear the button-down shirt. If you’re going to crash, you may as well crash with style.

But I didn’t crash. No one threw tomatoes; no one even booed. Maybe that has to do with how interesting proprietary object storage really is to the Open Compute community. It definitely had to do with the new partnership I was there to announce: WOS is now integrated with Hyve’s OCP-inspired hardware. I sensed even more excitement when I mentioned that WOS will go through the OCP certification program announced in the keynote on the first day of the event.

WOS + OCP = A Recipe for Object Storage Success

The key message of my presentation was that DDN’s WOS leverages OCP to optimize efficiency for object storage, while at the same time enabling the Open Compute community to build more efficient, scale-out object storage. The NO-FS (no file system) architecture of WOS lets OCP hardware customers fully leverage the benefits of the hardware and exploit its low-energy and high-density features.

On the first day of the conference, Facebook’s VP of Infrastructure had explained that efficiency and flexibility are the two main requirements at Facebook when making decisions on new infrastructure. This connected smoothly to my presentation, as I dedicated a good part of the time to explaining how WOS gives customers the flexibility to optimize their object storage infrastructure for any of the five key object storage requirements: efficiency, reliability, accessibility, scalability and, of course, performance. Stay tuned for more detailed information on this in the coming weeks.

The interesting thing about the integration of WOS with Hyve – and other future platforms – is that it also adds to the flexibility of WOS. While the WOS7000 will probably remain the fastest and densest object storage node in the industry, customers can now choose to build WOS clouds with OCP-compliant servers. The Hyve Ambient Series 1312 offers good density and allows for granular scaling (12 disks in a 1U node, with no controller nodes needed), while Hyve Open Vault provides more flexibility in balancing compute power against capacity – combining separate compute nodes with high-density JBODs and fully leveraging the OCP form factor.

Playtime is over: object storage just got serious. I’m convinced that WOS+Hyve will drastically change the object storage landscape. Our evidence is the serious interest we generated at the OCP Summit last week from some of the biggest names headquartered between San Jose and San Francisco.

SuperStorage

•November 21, 2013 • Leave a Comment

The temperature has been rising at DDN. Everybody has been working overtime, printing out datasheets, posters, presentations, slides, etc. This is by far the best part of the year: it’s the SC’13 conference in Denver, the annual get-together of “the real deal” in the scalable technology and high performance computing (HPC) industry. This is where it all happens – where research institutions, space centers, universities and government agencies come to find out what’s new in the industry and learn how to build bigger and faster supercomputers. And DDN is, of course, very much on their agenda.

HPC is our home turf; it’s what we are known for and where DDN’s success started in 1998. This is the industry for which we created, for example, EXAScaler and GRIDScaler: scale-out parallel file systems designed for supercomputers with the highest performance requirements.

In 2009, with over 10 years of experience in designing high-performance storage for supercomputers, DDN embarked on a new adventure: Web Object Scaler, or WOS as we call it, scale-out object storage for web and cloud applications. WOS was designed to store trillions of objects in a cost-efficient manner and serve petabytes of data to users spread all over the world. As object storage takes off, DDN WOS is staking its claim in a whole new industry and – a little bit to our surprise – also becoming a darling in HPC.

The scalability and performance numbers of WOS have caught the interest of universities and research centers, and we’re seeing deployments of WOS for projects like research collaboration and genomics archiving, for which the integration with iRODS is a great asset.

Another such project, which we announced today at SC13, is a major WOS storage cloud being deployed in the state of Florida. The Sunshine State Education & Research Computing Alliance (SSERCA – whose members include Florida State University, University of Florida, University of South Florida, University of Central Florida, Florida International University and University of Miami) is deploying WOS to provide thousands of researchers across all disciplines seamless access to cloud-based scientific computing, communication and educational resources. They will be able to search, retrieve and archive vital research securely stored in a multi-site WOS cloud, initially sized at 12PB but architected to scale far beyond that.

Innovation never stops at DDN: in September we launched GRIDScaler-WOS Bridge, which enables customers to integrate their GRIDScaler environments with WOS. Use cases abound: think of federating parallel file systems through the cloud, GRIDScaler archiving to an active Cloud Archive, or building a cloud for DR purposes.

Universities and research centers can now deploy WOS storage clouds similar to the one SSERCA is building, fully integrated with their GRIDScaler environments if they wish. But there is, of course, much more than that. WOS Bridge could, for example, be used on a multi-site medical campus to share sequencing data – and save the cost of acquiring multiple sequencers: one site could host a sequencer while scientists at other sites review the data through WOS Access. At SC’13, we have been showcasing live WOS-GRIDScaler storage tiering and data management demonstrations in our booth (#1307). We have also deployed an infrastructure that is representative of real-life use cases. And it’s been an interesting week in terms of the discussions we’ve been having about how our customers and prospects plan to use WOS.

Stay tuned – I’ll be sharing more on those use cases for WOS Bridge from the end user conversations we’ve had at SC13.

Object Storage from Dusk ‘till Dawn

•November 13, 2013 • Leave a Comment

Who said storage is boring? I’m not lying when I write that I’m on my way to a place in the desert, far away from regular society, for a secret meeting with the Top Guns of the Cloud & Web Industry. I’m one of a select group of Object Storage visionaries who have been invited to meet the industry’s greatest, listen to their challenges and teach them how object storage – when done well – can solve all of their problems.

There is no arguing that object storage is currently the most watched storage paradigm. Object storage is potentially an even more important evolution than virtualization and cloud computing were 5 years ago: it’s about data, information, companies’ most valuable assets. The challenge of building Exabyte-size storage clouds is no longer just a concern for Facebook, Google and Amazon.

The concept of object storage is not new: there have been several object storage architectures in the past, and AWS S3 was actually the first offering of AWS, even before EC2. The problem is that much of this early work gave object storage a bad name because it wasn’t done right. Consequently, object storage is today primarily seen as slow storage.

Digital data continues to double every two years, and over 80% of that data is unstructured. As a result, cost-efficient, high-performance storage for massive volumes of unstructured data is the holy grail of the storage industry. Many storage companies are trying to solve this challenge, but most approach the problem from the wrong angle – data protection – and don’t dig deep enough to find the root of the problem: the file system. File-locking mechanisms make file-based storage complex and difficult to scale, and the file system is mostly superfluous when storing massive amounts of unstructured data.

File-based storage became the de facto storage architecture when office documents still formed the bulk of unstructured data. “Filers” were designed to keep data organized in hierarchical directory structures; locking mechanisms would prevent multiple users from accessing the same file at once – and corrupting the data.

But today, most unstructured data is immutable data, like photos and video recordings, and applications have become much better than directory structures at keeping data organized. This is what object storage vendors need to leverage: only by taking away the file system and moving to pure object storage can you meet the performance requirements of scale-out web application providers.
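To make the contrast concrete, here is a minimal sketch of the object model, assuming a generic S3-compatible interface via boto3; the endpoint, bucket, credentials and keys are hypothetical, not any particular vendor’s API. The point is that an immutable object is written once under a flat key with its descriptive metadata and read back by that key – no directory tree, no locks, no in-place updates.

```python
import boto3

# Hypothetical S3-compatible object store endpoint and credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Write once: the photo is stored as an immutable object under a flat key,
# with descriptive metadata attached. No directory hierarchy, no file locks.
with open("IMG_2041.jpg", "rb") as f:
    s3.put_object(
        Bucket="photos",
        Key="2013/11/IMG_2041.jpg",
        Body=f,
        ContentType="image/jpeg",
        Metadata={"device": "smartphone", "owner": "tom"},
    )

# Reads address the object directly by its key; the application, not a file
# system tree, decides how data is organized and found.
obj = s3.get_object(Bucket="photos", Key="2013/11/IMG_2041.jpg")
photo_bytes = obj["Body"].read()
print(len(photo_bytes), obj["Metadata"])
```

Because objects like these are never rewritten in place, the platform can do without the locking and consistency machinery that makes file systems so hard to scale.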

I’m very much looking forward to learning more about what keeps cloud CTOs awake today and which concerns they still have about object storage. I’ve been told I’ll be near territory where people disappear and are never found again, near desert ghost towns, and in areas with trigger-happy people. Still, I’m hoping to post some interesting insights in a next article. I’ll make sure not to visit bars on desolate highways between Dusk & Dawn.

WOS and E=MC2

•October 25, 2013 • Leave a Comment

I’ve been blogging quite a bit at DDN these days, which doesn’t leave me with a lot of time to write here, so I thought I’d syndicate a few of my WOS articles here at Clouds & Beer:

When Albert Einstein attended the Hollywood premiere of the movie City Lights, the crowds cheered him wildly. Charlie Chaplin told him, “They cheer me because they all understand me, and they cheer you because no one understands you.” When Einstein won the Nobel Prize in Physics in 1921, it was for “his services to Theoretical Physics, and especially for his discovery of the law of the photoelectric effect.” Still, no one in the world truly understood his theory of relativity at the time. Object storage is no different from any other smart paradigm shift in history – technology or otherwise. In essence, it’s a pretty simple storage theory, but people have yet to fully grasp the real value of object storage, and time will tell whether they’re able to leverage the opportunities it enables.

While DDN probably won’t win the Nobel Prize in Physics anytime soon, we are getting recognition for how we’re changing the storage industry with Web Object Scaler (WOS). And these days, that recognition is coming directly from Albert Einstein – or at least the university he agreed to give his name to in 1953.

The Albert Einstein College of Medicine was founded as a department of Yeshiva University in 1951 to “prepare physicians and researchers to excel in the science and the art of medicine and basic, translational and clinical investigation.” Einstein was charmed by the fact that the school would “welcome students of all creeds and races.”

Fast forward to 2013: Shailesh Shenoy, director of engineering and operations for the Integrated Imaging Program (IIP) at Albert Einstein College of Medicine, had a storage problem to solve: the IIP’s microscopes were generating more data than his infrastructure could manage. We at DDN call that ‘a Big Data problem’. The IIP microscopes generate approximately 1 TB of specimen images per week. Like most research centers, the department has limited resources and IT staff. It therefore needed a storage system that would be easy to scale and manage, and that would provide integrated multi-site data protection and collaboration. More details about the use case can be found in this article by TechTarget’s Carol Sliwa.

At DDN, we are always very excited to learn and write about customer projects. This project is still in its infancy, but there are a few very interesting elements in it with regard to object storage:

– The data in this case is sensor data, more specifically state-of-the-art microscope data. Future upgrades of these “sensors” will provide even higher-resolution images, which will again impact the department’s storage requirements. As a result, data volumes for this project will increase in both object quantity and object size.

– Shailesh Shenoy understood early on that he needed to deploy a storage infrastructure with no scalability limitations. While his current needs could still be managed by conventional storage, he recognized that only object storage would allow him to avoid future migration projects. Also, as the storage for the IIP is managed by the engineering and operations team, management-intensive storage architectures were not an option.

– The biggest value of WOS for the IIP, however, is the straightforward application integration through the native WOS APIs. Instead of mounting shared network drives, users can now collaborate using a simple Java client that connects to the open source OMERO microscope image management system. And this is only the start, as Mr. Shenoy is planning to connect more applications to WOS. One shared storage pool for multiple applications is exactly what object storage was designed for; a minimal sketch of the idea follows below.
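To illustrate the “one shared pool, many applications” idea, here is a minimal sketch, again assuming a generic S3-compatible interface via boto3 rather than the native WOS or OMERO integration described above; the endpoint, bucket and keys are hypothetical. An acquisition client writes each image once, and an analysis tool elsewhere finds and reads the same objects by key and metadata, with no shared network mount in between.

```python
import boto3

# Hypothetical S3-compatible endpoint of the shared pool; credentials are
# assumed to come from the environment or a config file.
pool = boto3.client("s3", endpoint_url="https://objects.example.edu")

BUCKET = "iip-images"  # hypothetical bucket shared by all applications

def ingest(image_path: str, key: str, instrument: str) -> None:
    """Acquisition side: store one immutable specimen image with its metadata."""
    with open(image_path, "rb") as f:
        pool.put_object(
            Bucket=BUCKET,
            Key=key,
            Body=f,
            Metadata={"instrument": instrument, "study": "iip-2013"},
        )

def review(prefix: str) -> None:
    """Analysis side: a different application finds the same objects by key prefix."""
    listing = pool.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
    for entry in listing.get("Contents", []):
        head = pool.head_object(Bucket=BUCKET, Key=entry["Key"])
        print(entry["Key"], head["Metadata"])

# Example flow: one site ingests an image, a collaborator elsewhere reviews it.
ingest("specimen_0001.tif", "confocal-3/2013-10/specimen_0001.tif", "confocal-3")
review("confocal-3/2013-10/")
```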

What I found particularly interesting in Carol Sliwa’s coverage of this DDN success story is how it neutralizes the two key arguments against object storage: “Two knocks on object storage systems have been performance and proprietary APIs, but neither has been a problem for Shenoy. He said WOS performance has been good, and DDN’s proprietary WOSLib and HTTP/REST APIs don’t concern him. Some of IIP’s collaborators also use WOS.”

Such testimonials are very important in the object storage debate, and it’s great to see WOS solve the issues that other platforms have been creating. Once again, the NoFS, pure object storage design (WOS is the only object storage platform with no file systems anywhere in the architecture) is key to the success; in this case it helped meet the performance requirements of Shenoy’s project. Even more important are DDN’s efforts to provide easy application integration: by offering a wide selection of APIs (native REST, S3, application-specific APIs) and file system gateways, and by supporting third-party applications like iRODS, the university was able to put WOS into production in a very short time. We’re pretty sure that Albert Einstein would have been proud!

DDN probably won’t win the Nobel Prize in Physics, but we did come close this year! A Belgian co-winner is good enough for me!

Object Storage is THE Big Thing

•October 1, 2013 • Leave a Comment

In my opinion, Object Storage officially broke through this month. It went from being the “next” big thing to being “the” big thing. Why? Because Larry said so. Well, technically he didn’t: someone else had to say it for him, as he skipped the Oracle OpenWorld keynote to see Team USA win the America’s Cup in the most exciting regatta of all time. So I’ve been told. I wasn’t there and I’m not a sailor. I kite surf, like Richard Branson.

What does Larry Ellison have to do with object storage, and how is he an authority on the topic? Isn’t object storage about unstructured data rather than databases? Well, correct: I don’t expect a whole lot from Oracle’s object storage technology. They are a database company and they are pretty damn good at that, but they haven’t been exceptionally successful with the Sun storage legacy, other than the tape part of that business. Oracle’s object storage technology could have been good if they had recycled some of Sun’s Honeycomb technology, but it doesn’t look like that is going to be the case. The little information that is out there points to OpenStack Swift, and we all know how well Oracle and open source blend! Also, Swift isn’t exactly storage-efficient: the technology was designed to run on cheap, commodity hardware – not really Oracle’s game.

Oracle could have owned the ecosystem that has since been built around OpenStack. It was called the Sun Cloud, the first open alternative to Amazon Web Services. I was involved in the European launch of the Sun Cloud, and many of us never doubted it would be a success. But the compute cloud wasn’t quite ready for launch when Oracle killed the project, even though the storage cloud (REST, WebDAV, buckets, multi-tenancy, the whole shebang) was doing pretty well in the internal betas. Mind you, I’m talking 2009. Exactly one year after the Sun Cloud project was killed, the OpenStack initiative was started.

So, if not too much is to be expected from the Oracle Object Storage platform, why is it so important? Well, exactly five years ago, Larry Ellison had this to say about cloud computing. Those were the days when 80% of cloud discussions were arguments about whether cloud computing was the right path forward. About two years later, when the Oracle marketing machine was cloudifying its product line at full speed, cloud conversations had moved on to cloud and security, private versus public cloud, and cloud applications. No one doubted cloud computing anymore.

History does repeat itself, in spite of what these guys sing. There is no hilarious YouTube video of Larry saying object storage is just marketing buzz, but after two years of evangelization and discussion on whether object storage will replace file-based storage, Oracle is now “re”joining the game. Coincidentally, that announcement was made just a week after two of my favorite object storage critics changed their tunes (they still need a few final proof points, which I’m working on getting to them), and at exactly the moment when object storage conversations have shifted from object versus file storage to private versus public object storage, object storage security, and the effect object storage has on bandwidth consumption.

It’s times like these that make a technology marketer’s heart beat faster. We’re finally at cruising speed, ready to talk about what’s really interesting: use cases, integrated applications and compelling customer successes.

The Structure in Next Generation Object Storage

•June 30, 2013 • Leave a Comment

It’s about time to revive this blog: it has been almost six months since I joined DDN and it has been a roller coaster ride (I love roller coasters). I’ve learned a lot about how to build a better object storage platform, and I hope I’ve helped a lot of people here learn how to better explain the benefits, opportunities and use cases of object storage to customers. Speaking of customers, it’s been exciting to visit our major WOS accounts and see object storage in action at massive scale.

That said, last week (actually, almost two weeks back now) it was demonstrated again that we’re only halfway there. It was the time of year for GigaOm Structure, and the guys from the Next Generation Object Storage Summit (in)conveniently scheduled their object storage get-together the same week (avoid the overlap, guys). At NGOSS, a surprisingly large part of the conference was spent on the “what is object storage” debate. At Structure, my workshop on how the right type of object storage can resolve the scalability, efficiency and performance challenges of large-scale web applications was very well attended, even though a lot of it was basic object storage 101.

I had avoided the NGOSS event previously, as I didn’t see the point of just presenting to my competitors (half joking), but this event turned out to be very enlightening. It was good to learn more about Intel’s initiatives (adding erasure coding to Swift, and their CosBench “object storage benchmarking” project), and there were a few very interesting presentations and panels. As a matter of fact, I think the next event should have us spend some time discussing object storage performance in general. Performance has been the missing piece in many object storage conversations, yet it is oh so important. In times of whistleblowers, the Yottastor use case got a great deal of attention, and the discussion around government use of object storage was very exciting. Based on the number of questions Reuven Cohen of Forbes had for Bob Carlson at Yottastor, we can hope for some coverage of the solution on his blog some time soon. If not, I’ll dedicate a post to the Yottastor solution on this blog, so stay tuned.

The fun moment of the conference was when Chris Mellor, absent for good reasons, challenged us all through a video message: “show me where the file system fails and where object storage saves the day”. Chris, I would suggest you frame the question differently and not assume that file systems will ultimately die: just as tape will never die, the file system will continue to play a strong role in an integrated solution stack where object storage becomes one element of a larger data center framework. There will always be use cases for file storage, and some file systems will scale massively (and probably never break… DDN certainly has a number of high-scale use cases to point to). But it’s about avoiding file system complexity and overhead. It’s about storing large volumes of mostly immutable data more efficiently, and so on. We invite you to talk with University College London (a train ride away…) and learn what they’re using object storage for, their ambition to scale to 100PB and beyond, and the value of a scalable, efficient and portable platform…

Back to the essence: believe it or not, I still found the “what makes a storage platform an object storage platform” discussion the most interesting one. Is a REST API enough? (No.) Does it need to be pure object, without POSIX at the disk level? (To be efficient and to scale… of course.) Isn’t it defined by the benefits the platform provides? (Yes, that as well.) Industry analyst Ben Woo offered to take the lead in composing a checklist to help buyers decide whether a proposed object storage solution will do the job for them, and Deni Connor pointed out that a lot of the debate is also covered in their recent object storage survey report.

Coincidentally (or not), most of what was discussed at NGOSS covers topics that my readers have been reading about for some time. As these conferences confirm that there is still a lot of education to do, let me dedicate the next few posts to the essence of object storage. By the time the summer is over (if it ever gets started here in Belgium), it will all be crystal clear (kind of).