Big Challenges with Unstructured Information

I had the honor of organizing a panel at the Createasphere Digital Asset Management Event in Beverly Hills late last month. Apart from the charms of the Beverly Hilton (Trader Vic’s, the bar by the pool, really gives you that 1960s Hollywood feeling), the really cool thing about the panel was that Robin Harris, a.k.a. StorageMojo, accepted our invitation to moderate it.

Robin kicked off with his great vision of how the Universe hates your data. I love how that line captures the fact that anything and everything out there is continuously trying to destroy your data: “entropy refers to the inherent tendency for any organized system to disorder”. Our data is, or should be, highly organized, so we have to keep putting energy into it to keep it that way (read more about StorageMojo’s “the Universe Hates your Data” thesis here and read more interesting StorageMojo stuff here).

I saw this as a first argument for object storage: it takes a lot of energy to keep your data organized in a file system, since it forces you to organize your data by folder, sub-folder, etc. Applications that talk directly to the storage through an object interface can do that organizing for you. It may seem counter-intuitive that one large pool of unorganized objects is a much easier way to keep your data organized than a well-organized file system, but a file system simply requires a lot of maintenance to stay organized (and to scale). Object storage is just a big bucket that scales a lot more easily, and it is far more straightforward and less time-consuming to put the logic for keeping the data organized in the application.
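To make the "big bucket" idea a little more concrete, here is a minimal in-memory sketch, assuming a flat key space in which the application, not a directory tree, keeps track of how objects are organized. The class `FlatObjectStore` and its methods are hypothetical illustrations, not the API of AmpliStor or any other product.

```python
import uuid
from collections import defaultdict


class FlatObjectStore:
    """Hypothetical flat object store: opaque IDs map to bytes, no folders."""

    def __init__(self):
        self._objects = {}              # object ID -> payload bytes
        self._index = defaultdict(set)  # (metadata key, value) -> set of object IDs

    def put(self, data: bytes, **metadata: str) -> str:
        # The store hands back an opaque ID; there is no path or parent folder.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = data
        # The application-side index records how the object is organized.
        for key, value in metadata.items():
            self._index[(key, value)].add(object_id)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]

    def find(self, **criteria: str) -> set:
        # Return the IDs of objects that match every metadata criterion.
        matches = None
        for key, value in criteria.items():
            ids = self._index.get((key, value), set())
            matches = set(ids) if matches is None else matches & ids
        return matches or set()


store = FlatObjectStore()
clip = store.put(b"frame data", title="pilot", kind="dailies")
store.put(b"poster", title="pilot", kind="artwork")
print(store.find(title="pilot", kind="dailies") == {clip})
```

In this sketch, reorganizing the collection means updating index entries rather than moving files between folders; in a real deployment that index would typically live in the application's own database.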

Robin closed his opening session with a few types of Big Data, including Big Science, Big Entertainment, Big Streams and Big Business. Big Data is all about the variety, volume and velocity of data. After this, I gave a quick overview of the analogy between Big Data for Analytics and Big Unstructured Data, as discussed here.

The title of the panel was “Big Challenges with Unstructured Information”; the objective was to demonstrate how the data created in many organizations has moved past terabytes into the petabyte scale, and how to determine the best way to store, manage and analyze all this information, turning it into logic and profit.

For this purpose I had selected a panel that represented the full Big Data stack: a user, an application provider, a data transfer provider and a storage provider.

David Sugg of Warner Bros. gave a brief talk on the end user’s view of the growing data sets. He dug into the massive Volumes of data he has to deal with at Warner Bros. and the challenges he faces in making that data available at the required Velocity.

Fernando Mesa of MarkLogic added Variety and Complexity to this, which meant that only halfway through the session we had already covered the three Vs of Big Data, plus Complexity.

Big Data is:

* Volume of internal and external data

* Velocity of data coming in and the speed of analysis required

* Variety of formats and sources of information, external and internal

* Complexity of meaning, context and structure.

Next up was Doug Davis of BitSpeed, who brought us up to date on the latest and greatest in fast data movement. They have some pretty interesting technology, and he explained how they try to address the fact that bandwidth is growing roughly according to Moore’s law, while digital data is growing much faster.

No surprise, my session tackled the benefits of Object Storage over file systems for storing these large volumes of data. File systems will not scale sufficiently and will actually become obsolete, as applications take over the role of the file system. We need scalable, efficient and durable object storage that provides a REST API through which applications can interface directly with the storage. More about this here.
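As a rough illustration of an application talking to storage over REST, the sketch below runs a tiny in-memory object server that accepts HTTP PUT and GET on object keys, plus a client using only the Python standard library. The handler, the `/media/clip-001` key and the helper names are hypothetical; this is not the REST API of AmpliStor, Amazon S3 or any other product.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

OBJECTS = {}  # hypothetical in-memory backing store: URL path -> bytes


class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Store the request body under the URL path, e.g. /media/clip-001.
        length = int(self.headers.get("Content-Length", 0))
        OBJECTS[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Return the stored bytes, or 404 if the key is unknown.
        data = OBJECTS.get(self.path)
        if data is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    # Port 0 lets the OS pick a free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put_object(base_url, key, data):
    req = urllib.request.Request(f"{base_url}/{key}", data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def get_object(base_url, key):
    with urllib.request.urlopen(f"{base_url}/{key}") as resp:
        return resp.read()


if __name__ == "__main__":
    server = start_server()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    put_object(base, "media/clip-001", b"frame data")
    print(get_object(base, "media/clip-001"))
    server.shutdown()
```

Note that the slash in `media/clip-001` is simply part of a flat key here; the server never creates or walks a directory.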

After the lightning pitches, it was time for the panel, which Robin Harris opened by inviting David to challenge the vendors in the panel. I was happy to be the vendor of choice. Here is a short abstract of the conversation. The tone was friendly throughout, and the topic was similar to Storagebod’s post, which you can read here.

(The conversation has been shortened a lot; I’m just trying to capture the essence.)

Isn’t Object Storage just a science project? I don’t think the file system can ever go away.

Well, there is a small bookstore called Amazon that has been pretty successful with its object-storage-based online storage service.

StorageMojo added that AWS is storing about 600 billion objects today (this site puts it at 762 billion already).

Yes, I have used it as well, but that service is so slow. There is no way its latency will ever be acceptable in our industry. We need speeds of up to 750 MB/s.

Well, that’s exactly why your industry needs to invest in building on-premises infrastructure. At that point I invited Paul Speciale, VP Products at Amplidata, to shed some light on the throughput performance of AmpliStor, our object storage system:

AmpliStor is based on a scale-out architecture, which also makes it possible to scale out performance by adding multiple Controllers, all sharing the same storage pool. Each Controller can individually deliver the full bandwidth of a 10GbE link in aggregate throughput when matched with at least 16 AmpliStor Storage Nodes (containing a total of 160 SATA spindles). A system with 10 Controllers would therefore scale to 100 Gb/s of aggregate bandwidth.
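The sizing arithmetic can be sketched as follows, treating David's 750 MB/s requirement as aggregate throughput. It assumes decimal units (1 MB = 10^6 bytes, 1 Gb = 10^9 bits), uses the 10GbE line rate per Controller with no protocol overhead, and derives 10 spindles per node from the 160-spindle, 16-node figure above; the function names are hypothetical.

```python
import math

GBPS_PER_CONTROLLER = 10   # one 10GbE link per Controller (line rate, from the text)
NODES_PER_CONTROLLER = 16  # minimum Storage Nodes per Controller for full bandwidth
SPINDLES_PER_NODE = 10     # 160 spindles / 16 nodes


def mb_per_s_to_gbps(mb_per_s: float) -> float:
    # Decimal units assumed: MB/s * 8 bits/byte / 1000 = Gb/s.
    return mb_per_s * 8 / 1000


def size_system(target_mb_per_s: float) -> dict:
    # Round up to whole Controllers, then add the Storage Nodes each one needs.
    needed_gbps = mb_per_s_to_gbps(target_mb_per_s)
    controllers = math.ceil(needed_gbps / GBPS_PER_CONTROLLER)
    nodes = controllers * NODES_PER_CONTROLLER
    return {
        "controllers": controllers,
        "storage_nodes": nodes,
        "spindles": nodes * SPINDLES_PER_NODE,
        "aggregate_gbps": controllers * GBPS_PER_CONTROLLER,
    }


print(size_system(750))
# → {'controllers': 1, 'storage_nodes': 16, 'spindles': 160, 'aggregate_gbps': 10}
```

On these assumptions, 750 MB/s works out to 6 Gb/s, within a single Controller's 10GbE link, while the 10-Controller, 100 Gb/s configuration corresponds to roughly 12,500 MB/s.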

That sounds interesting, but what about the applications? There are no applications that support Object Storage.

Not yet, and that is why object storage in the M&E industry will, as a transitional step, require a file system layer between the application and the storage. Eventually, however, as is the case with all the online cloud applications, the file system will become obsolete.



~ by tomleyden on March 5, 2012.
