Datameer Puts an Approachable Face to an Intimidating Hadoop

Hadoop was the official “buzz” at the O’Reilly Strata Conference 2012 in Santa Clara. We can all agree that Hadoop is a key component of a Big Data strategy, but without the “ease of use” factor, this critical component can become a disabler instead of an enabler. The same is true of a data warehouse environment: if business users do not have business intelligence tools to quickly and easily explore data, the data warehouse is ineffective. In these instances, organizations end up with silos of user-friendly MS Access databases and Excel spreadsheets.

Enter Datameer, the first and only packaged business intelligence solution for Apache Hadoop. The Datameer booth at Strata caught my eye, and I was very impressed with their value proposition because they focus on the end user, the person who will make or break the Big Data project. Datameer solutions address a broad spectrum of needs, from users in IT to business users who are looking for ways to quickly integrate and analyze unstructured data such as log files and social media activity. I sat down with Stefan Groschupf, CEO of Datameer, at Strata to learn more about how Datameer will help organizations transform their business with Big Data.

Who is Datameer and how is your company unique in the industry?

Datameer is the first business intelligence platform on top of Hadoop. We have a data integration suite that can connect to any kind of data source. We have a spreadsheet user interface that compiles into MapReduce jobs that run on top of Hadoop, and we have a visualization module that allows you to create beautiful visualizations with drag and drop.

So how do you see big data transforming business?

I always joke that big data is a big buzzword to make big money. I was one of the early guys at Nutch, the project that spun off Hadoop, and what is interesting to me is that with Hadoop, the limitations of storage and compute disappear. That allows us to do completely different things with data than we could do before.

Traditionally, we have a three-step process: ETL, data warehousing, and BI. With ETL, we try to pre-optimize data into certain structures that are well suited to a structured data warehouse. And then we have BI sitting on top of that. The problem is that it takes us six months to deploy this kind of infrastructure. And if a business user has a question, it takes a really long time, and it requires a lot of expertise to do the analytics. So now, with Hadoop, we don’t need to pre-optimize data anymore. We can always go back to the raw data.

So instead of schema on write, where we try to guess what questions the business user might ask, we can do schema on read and be very dynamic; the schema is always just a view of the data. We pull in raw data, because there are no limitations of storage and compute with Hadoop anymore, and then the business user, in the case of our product, just uses a spreadsheet to analyze that data.

And that’s the biggest change in BI in the last 30 years, because now we really have business agility. It’s not just that we have end user tools, but also that the underlying infrastructure supports asking any kind of question and always going back to the raw data, to any kind of data, so you can integrate your transaction systems with unstructured and semi-structured data.
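The schema-on-read idea described above can be illustrated with a minimal Python sketch. This is an editorial illustration only, not Datameer's or Hadoop's implementation: the sample log lines, the field names, and the helpers `read_with_schema` and `count_by` are all hypothetical. The point it shows is that the raw records are stored untouched, and each new question applies its own view of the fields at read time.

```python
import json
from collections import Counter

# Hypothetical raw event log, kept exactly as it arrived (no upfront schema).
RAW_LINES = [
    '{"user": "alice", "action": "login", "ms": 120}',
    '{"user": "bob", "action": "purchase", "ms": 340, "amount": 19.99}',
    '{"user": "alice", "action": "purchase", "ms": 95, "amount": 5.00}',
]

def read_with_schema(lines, fields):
    """Apply a schema at read time: project each raw record onto `fields`."""
    for line in lines:
        record = json.loads(line)
        # Fields absent from a record become None instead of failing the read.
        yield {name: record.get(name) for name in fields}

def count_by(rows, key):
    """Count rows per distinct value of `key`."""
    return Counter(row[key] for row in rows)

# Question 1: how many events of each action type?
actions = count_by(read_with_schema(RAW_LINES, ["action"]), "action")

# Question 2, asked later against the same raw data: total spend per user.
spend = {}
for row in read_with_schema(RAW_LINES, ["user", "amount"]):
    if row["amount"] is not None:
        spend[row["user"]] = spend.get(row["user"], 0.0) + row["amount"]

print(actions)
print(spend)
```

Neither question required reloading or restructuring the stored data; under schema on write, the second question would only be answerable if the `amount` column had been anticipated when the warehouse tables were designed.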

Business agility is key. If a business user wants more data, the IT guy has to get budget and time. This ends up being a six-month cycle, and by then the business problem is perhaps no longer relevant?

Right. So a common problem today is that companies have a very hard time acquiring and retaining customers. The way to make more money is to increase customer life cycle value. And so you really need to understand the interactions you have with your customers.

The problem today is that with traditional BI systems, you have a very long deployment cycle. It takes you a lot of time to answer certain questions, because you have to massage data into certain structures, and you need SQL. And usually, the people who have the questions are not the folks who know how to use the tools. The business user does not know how to use SQL.

So Hadoop changes this. It brings in real business agility by having unlimited storage and compute, so you really just solve your problem with a lot of cheap hardware, so to speak. And you win that agility by being able to ask any kind of question, always being able to go back to the raw data, and not having to spend months and months pre-optimizing data for RDBMS systems.

Thank you for your time. We look forward to hearing more from Datameer.

About the Author: Mona Patel