I started my career at a company that specialized in data migration, specifically migrations from flat-file database systems to relational databases. Those were the days when we thought relational databases were the future of data management and storage: soon, we’d be storing all our data in huge relational databases – or so we thought back then (http://www.orafaq.com/wiki/BLOB)… The NAS market boom had yet to start and object storage was definitely not on our agenda.
But I digress: the reason I’m reminiscing about my data migration days is how painful those projects were. The company I worked for was pretty good at what it did, but in spite of that every project was stressful at best, and more often a nightmare. Think of debugging code written 20 years earlier (dead men’s code), missing standards, incompatibilities of all kinds… In short, I’m an advocate of architectures and systems that won’t require migrations – EVER!
Today, I find myself in a very different environment: structured data became unstructured data, gigabytes became petabytes, and cloud, big data and IoT have made data more valuable than ever. The market is flooded with applications that enable companies to extract more value from their data. Tape archives are becoming less and less attractive – the disk storage market is booming. According to IDC, worldwide enterprise storage systems revenue grew 7.2 percent year over year to $38 billion in 2014, good for just shy of 100 exabytes of capacity.
The big unknown for companies planning to deploy long-term storage systems is how much capacity they will need. Most storage managers know how much storage they need today and may have a good idea of what their organization will need two years from now, but where will they be five years from now? Will data keep growing steadily, or will new innovations generate even more data? Or less? So how much storage should they buy initially, and how far should the acquired systems be able to scale? The answer is simple: you don’t know. To avoid having to migrate data from one system to another at some point in the future, companies need to design architectures that scale infinitely and will never require data migration. Here are the key requirements for such infrastructures:
Clearly, if you plan never to migrate your archive, the platform of choice needs to scale infinitely. This applies to total storage volume, but also to the number of files/objects.
Make sure the system is hardware independent. The platform needs to be hardware agnostic (i.e. you don’t have to certify hardware, so you benefit from new hardware models as soon as they are available AND you can optimize operational servers with more memory, bigger disks, etc.) and allow for a heterogeneous hardware infrastructure, which lets you gradually renew the hardware as needed.
Make sure you can upgrade, update and scale your architecture without downtime. This requires a complete separation of the software layer from the hardware. Especially for active archives at scale, you don’t want to have to take down your service for maintenance tasks.
Architecture flexibility: a free choice to deploy your architecture on one site or multiple sites and the ability to freely add more sites over time will guarantee that you can scale as needed or even gradually move data between sites without having to do an actual migration.
Not all the data in your archive is equally vital or needs the same access speed. A choice of data protection mechanisms enables you to optimize for availability, durability or cost-efficiency. Also, you may want to protect smaller data objects/files differently than large ones.
And finally, make sure your platform supports different data types and access protocols. This is especially important for archives that serve different workflows and workloads. Both object and file access are desirable.
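To make the data protection trade-off above concrete, here is a minimal sketch, not tied to any particular product, comparing the raw-capacity overhead of N-way replication with that of a k+m erasure code (the function names are my own, for illustration):

```python
# Illustrative sketch (vendor-neutral): raw-capacity overhead of
# N-way replication versus a k+m erasure code.

def replication_overhead(copies: int) -> float:
    """Raw bytes stored per usable byte with N-way replication."""
    return float(copies)

def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per usable byte with k data + m parity fragments."""
    return (data_fragments + parity_fragments) / data_fragments

# 3-way replication survives 2 failures at 3.0x raw capacity;
# a 9+3 erasure code survives 3 fragment losses at only ~1.33x.
print(replication_overhead(3))            # 3.0
print(round(erasure_overhead(9, 3), 2))   # 1.33
```

This is why a choice of protection mechanisms matters: replication keeps small objects cheap to read and repair, while erasure coding makes large archives far more cost-efficient per usable byte.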
(article written for the Scality blog)
Attending a Super Computing Conference is always a humbling experience. It’s the event where you meet scientists that are helping to solve the hardest problems in the world, such as the discovery of new drugs, genomics research to better discover and treat diseases, finding new energy sources or the precise prediction of the next superstorm. The SC Conference is the place where innovations that are making these life-changing discoveries possible are shared.
What started as an informal get-together of a couple hundred computer scientists 25 years ago has become a major tradeshow, attended by thousands of researchers and technologists. This is not an audience of trade show tire-kickers but one of supercomputing specialists who continuously push technology to its limits to solve some of the greatest scientific challenges.
SC14 was Scality’s second time at the show. Last year’s event led, among other wins, to Los Alamos National Laboratory (LANL) selecting the Scality RING as an active archive for the complex simulations used to monitor the health of the US nuclear stockpile.
Lesson 1: a half-exabyte customer reference in this industry gets your technology to go viral quickly.
The two key use cases Scality supports for the supercomputing industry are Active Archives and Distributed Computing. Scality RING for Active Archives enables HPC customers to offload data from their expensive Tier 1 storage to free up cycles for other high-performance workloads. The Distributed Computing use case focuses on the RING’s ability to run highly parallel and independent computing jobs.
The two use cases are distinct, each with specific characteristics, but at the same time they fit closely together: LANL uses the Scality RING both as scale-out archive storage and as high-performance storage for simulations.
Lesson 2: the HPC industry is very actively investigating how they can lower costs by running compute jobs on industry-standard servers.
Challenged by Amdahl’s law, HPC storage architects are working hard to shift to more parallel and distributed architectures in areas as diverse as genomics simulations, biology research and electronic design problems. In many areas of research, the quantity of data being manipulated is growing exponentially. This is certainly true of numerical simulations, which typically store short-term results on very high-performance “scratch” file systems before storing the data more permanently in researchers’ “home” directories or on tape. It’s also true in all fields using increasingly accurate physical sensors to acquire data, such as oil and gas seismic surveys, radar and satellite imaging, and electron microscopy.
Interest is growing in archiving these massive amounts of data. Having it readily accessible for further study increases its value and furthers the objectives of the research being performed.
A key benefit of the Scality RING is the ability to seamlessly integrate with existing high-performance storage systems. Scality is the only software-defined storage platform that supports mixed environments of GPFS, Lustre and object storage, with native support for NFS and SMB file access. HPC customers are leveraging Scality’s open source block driver to build data movers for their GPFS infrastructures and several technology partners are integrating their Lustre systems with the RING, using the Lustre HSM functionality.
Lesson 3: the RING is a perfect solution for “home” storage, but also supports some scratch storage requirements. Think of home storage with scratch performance capabilities.
The Scality RING offers unique flexibility in this industry. Using industry-standard servers, customers can build exabyte-scale storage systems that can be used for active archives, home storage and, to some extent, scratch. Thanks to some highly visible customer successes in the HPC industry, Scality has very quickly become the hottest storage technology in this market. One visitor, who had been doing his homework before visiting us at the show, described us as: “Scality? Well, that’s like Ceph that works, right?”
It’s a good start – we can live with that.
(article written for the Scality blog)
It has been six years since I wrote my first object storage blog, even though I didn’t mention object storage in that specific article. I had my head in the clouds as our early cloud startup was being acquired by Sun Microsystems when someone asked me to write a piece on EMC’s new baby, Atmos, so I took a spin at it. That was object storage before object storage.
A lot has happened since. I’ve traveled around the world to evangelize object storage, organized dozens of object storage panels and wrote hundreds of pages on why companies need object storage, object storage use cases and early customer success stories. Today, I’m on a mission to help grow the success of the Scality RING, probably the most successful object storage platform in the industry. Since I joined Scality six months ago, I’ve seen more happy customers than the four years before combined.
So what is different? Why is Scality so much more successful than the other object storage players? Is it a better time? Maybe. Do we have better sales people? Quite probably. Is the product so much better? Most definitely.
There is indeed a big change in the market: data sets have finally grown beyond petabytes. But, companies still need to feel the pain before they take action. Technology evangelists’ predictions may force them to think, but budgets for new technologies are only approved when things get out of control. This still does not explain why Scality in particular is becoming the rising star.
I like to explain the success of Scality with what I call “the object storage paradox,” which consists of two parts:
First, Scality has abandoned object storage. Scality was the first company to understand that in order to sell more object storage it had to … stop selling object storage. In the Scality RING architecture, object storage is an enabler: one component of a larger piece of software that enables companies to build massive-scale storage pools. The object storage layer abstracts the underlying hardware and lets customers deploy the RING on any industry-standard servers. To scale both storage capacity and performance to massive levels, however, the Scality RING software relies on a distributed, 100% parallel, scale-out architecture with a set of intelligent services for data access and presentation, data protection and systems management.
The second part of the paradox is that Scality was the first company to acknowledge that people still think – and will always be thinking – “files.” Yes, REST API’s are great – for certain use cases – but to enable a wider use of its technology, Scality supports native file system access to RING storage through a choice of file connectors and the integrated Scale-Out File System (SOFS). SOFS is a POSIX compliant virtual file system that provides file storage services without the need for external file gateways, as is commonly the case in other object storage solutions.
The SOFS file systems can be scaled out in capacity across as many storage nodes as needed to meet application requirements, and can be accessed through multiple NFS, FUSE, SMB or CDMI connectors to support application load. The fact that EMC just acquired file gateway provider Maginatics, after its earlier acquisition of TwinStrata, underscores the importance of native file system support.
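As a rough illustration of what dual object/file access implies, here is a hypothetical sketch of a mapping between an object’s (bucket, key) pair and the POSIX path a file connector might expose for the same data. The mount point and layout are invented for illustration; this is not the actual RING/SOFS naming scheme:

```python
# Hypothetical sketch: one stored item, two addresses. An object is named
# by (bucket, key) over REST, or by a POSIX path through a file connector.
# The "/mnt/ring" mount point and path layout are invented for this example.

def object_to_path(bucket: str, key: str, mount: str = "/mnt/ring") -> str:
    """Path a file connector might expose for an object."""
    return f"{mount}/{bucket}/{key}"

def path_to_object(path: str, mount: str = "/mnt/ring"):
    """Inverse mapping: recover (bucket, key) from a connector path."""
    rel = path[len(mount) + 1:]          # strip "<mount>/"
    bucket, _, key = rel.partition("/")  # first component is the bucket
    return bucket, key

p = object_to_path("archive", "2014/results.csv")
print(p)                  # /mnt/ring/archive/2014/results.csv
print(path_to_object(p))  # ('archive', '2014/results.csv')
```

The point of a native file system layer like SOFS is that this mapping lives inside the platform itself, so no external gateway has to maintain it.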
Object storage solves the biggest challenge the storage industry faces: to store massive volumes of data reliably, with the highest availability and in a cost-efficient way. The paradigm is a reaction to the scalability limitations of traditional (SAN/NAS) storage systems, but that doesn’t mean customers are willing to give up the benefits of file-based storage.
A simple analogy is the increasing success of the electric car. We all know that the use of internal combustion engines will soon be reaching its limits as our fossil fuel resources are not infinite. But that doesn’t mean we want to give up the convenience of cars and start using electric trains. Electric cars allow us to continue to enjoy the benefit of owning private transport but solve many of the problems of traditional cars (pollution and cost of gas). Scality’s software-defined storage takes a similar approach: it enables customers to consume storage in a way they have been used to over the past 3 decades, but with a better engine under the hood.
They say good things come to those who wait – but sometimes good things happen simultaneously and you don’t even have to wait to revel in them. While I was enjoying a week of snowboarding in Val d’Aosta, right by the legendary Matterhorn, DDN was announcing the new WOS 360 at the Next Generation Object Storage Summit. As you can read in this article by Chris Mellor for The Register, the newest version of the industry’s most complete and versatile object storage platform was very well received.
The sportier reader may know that snowboarders and skiers spend half of their time on ski lifts. This gave me plenty of time to think about storage challenges and new opportunities for object storage. My epiphany came to me on the slopes of sunny Italy.
Being a gadget freak, I hit the slopes with a GoPro high definition cam on my helmet. I tried to mount it onto my snowboard as well, but the result was a bit shaky to say the least. While high-definition cameras have been around for a while, I was surprised to see how many skiers and boarders carried a camera. Turns out I’m much later on the adoption curve than I had been thinking. At times I counted one camera for every 10 skiers/boarders queuing at the lifts. Obviously, my random counts are not very statistically correct, so let’s say one in every 100 people were actually carrying a cam to be on the safe side. I shot about 8 GB of data per day, which doubles easily in the post-processing, but let’s say that the average cam user is less fanatic than me and shoots about 2GB per day, which doubles in the post-processing.
According to the 2013 International Report on Snow and Mountain Tourism, “there are about 70 countries worldwide that offer open air ski areas” and “about 2000 ski resorts have been identified.” Also according to the report, there are approximately 400 million skier visits worldwide, a ski visit being a day trip, or part of a day. That figure is said to have been stable over the last 10 years.
And now comes the fun (math) part:
400 million ski visits means 4 million camera-carrying visitors; multiply by 4GB each and you get 16PB per ski season.
Add other action sports like surfing, mountain biking, speed bikes, skaters, etc. and we will soon see Exabytes of action sports data being shot every year.
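The back-of-the-envelope estimate above can be written out as a small sketch; all the inputs are the assumptions stated earlier (1 camera per 100 visitors, 2GB shot per day doubled in post-processing), not measured data:

```python
# The article's back-of-the-envelope math as a sketch; inputs are the
# stated assumptions, not measurements.

def seasonal_footage_pb(ski_visits: int, visitors_per_cam: int,
                        gb_per_cam: float) -> float:
    """Seasonal footage in petabytes, counting 1 PB as 1e6 GB."""
    cameras = ski_visits / visitors_per_cam  # camera-carrying visits
    return cameras * gb_per_cam / 1e6        # GB -> PB

# 400M visits, 1 camera per 100 visitors, 4 GB per camera-day
# (2 GB shot, doubled in post-processing).
print(seasonal_footage_pb(400_000_000, 100, 4.0))  # 16.0
```

Even with these deliberately conservative assumptions, a single season of one sport produces double-digit petabytes.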
I carry a MacBook with a 256GB SSD. Obviously, that’s fine for application data and documents, but definitely not for the tens of gigabytes of action footage I was shooting last week. It has been common practice for a few years already to store pictures in the cloud and to stream music, TV series and movies from online services, but I doubt there is a service ready to ingest petabytes of video footage from home users.
Strangely enough, there is no reason for such a service not to exist: Object Storage platforms like WOS make it possible to build profitable online storage infrastructures for large volumes of high definition video data. I hope to spark some ideas at the upcoming 2014 NAB Show.
Previously in this series, I explained the evolution of unstructured data and how storage requirements have changed over the past decade due to changes from above and below: the massive growth in unstructured data, mostly immutable data, requires cost-efficient, easy-to-scale storage architectures without the complexity of file systems. I noted that object storage was designed for this purpose, and that in addition to scalability and efficiency benefits, object storage also provides great benefits when it comes to accessibility as REST and other protocols make it very easy for applications to connect to the storage directly and give users access to their data through all sorts of mobile and web applications.
I also explained how information-sensing devices are not exclusive to scientific analytics environments: think of cameras and smart phones but also cheap network cameras for home security, thermostats that warm up our houses when we are on our way home, smart fridges or watering devices that allow us to keep our plants healthy and happy, even when we are on a vacation. This wave of innovation based on the capabilities to generate, process and leverage data in apps and devices is now popularly called The Internet of Things (IoT).
IoT is not a new concept. Wikipedia says: “The term Internet of Things was proposed by Kevin Ashton in 1999 though the concept has been discussed in the literature since at least 1991.” In its early stages, the concept related to the use of radio-frequency identification (RFID) and how “if all objects and people in daily life were equipped with identifiers, they could be managed and inventoried by computers.” Finally, Wikipedia adds that “equipping all objects in the world with minuscule identifying devices or machine-readable identifiers could transform daily life.” And this is exactly what is happening today, and also what makes IoT more important than ever.
Apple retail stores are full of “gadgets” that can make our lives easier, healthier and more enjoyable. Google has jumped on the gadget bandwagon with Google Glass. And in terms of sizing this gadget market, Gartner predicts there will be over 25 billion IoT devices by 2020.
So what does this have to do with the topic at hand – Object Storage?
The one thing all those IoT devices have in common is that they log, generate and process data and turn it into information that can help us to keep track of our workouts, optimize energy consumption or bring our household automation (which sounds so much better in Italian: “domotica”) to the next level.
The fact that all these IoT devices are connected to the Internet means that more and more data will be uploaded from those devices. A lot of it is very small data, but from a volume point of view, if we take the sum of all the data those devices are generating, we are talking about exabytes and exabytes of information in the form of unstructured data – almost too much to fathom!
Traditional storage is simply not capable of handling these types of data, so object storage has emerged as the logical paradigm for storing them. Designed for large volumes of data, it supports distributed environments, and the applications that run on these devices can integrate directly with the storage layer. Not all object storage platforms qualify for IoT data, however, for the following reasons:
- Small files in particular are a challenge for many object storage platforms: NoFS (No file System) becomes a key requirement
- Performance needs to scale in all dimensions, especially IOPS and latency.
- Different data types and different applications require different data protection mechanisms, as well as flexible architectures.
IoT has been around for a while but things are only getting started. Innovation doesn’t stand still, so who knows how data storage requirements will evolve over the next decade?
So here ends my three-blog series, which covered the evolution of digital data over the past 4 decades and illustrated how storage platforms have evolved to meet the requirements of new data types and applications. Object storage is designed for long-term storage strategies but we understand it will probably not be the end point.
In part one of this three-part series I summed up how the way we produce and consume data has evolved over the last three decades, creating a need for new storage methodologies that can help enterprises store and effectively manage massive pools of data. I concluded that the immutable nature of unstructured data storage holds the key to solving the scalability and availability problems of traditional file storage.
Unstructured data has traditionally been stored in file-based systems, which enable users to access files simultaneously and modify them. This is great functionality for office environments, where multiple users might indeed be updating each other’s spreadsheets, but it is complete overkill when storing data that will probably never be changed again. DDN developed our Web Object Scaler (WOS) solution with this “unchanging” aspect of data in mind.
WOS has none of the constructs of traditional file systems, which allows us to scale beyond petabytes as one single platform without any of the complexities file-based platforms have.
Object storage also addresses challenges associated with new data consumption patterns. File storage was designed to let users access shared pools of storage through network drives. But today, we access our data through a variety of applications on desktops and laptops, and increasingly on mobile devices like smartphones and tablets.
Object storage provides simple and fast interfaces – as opposed to slow gateways – for applications to access the storage directly. This enables new use cases like geographical data distribution, worldwide collaboration, multi-site DR and online, active archives.
Stay tuned for the third part of this blog series, in which I will dig deeper into new methods of data generation and consumption and illustrate how that new old hype, the Internet of Things (IoT), will further impact storage requirements. Thanks to a new wave of innovation, including connected devices, this paradigm – which in reality was first discussed over a decade ago – has become more relevant than ever.