Hadoop Summit 2016 – Dublin, Ireland

The conference season is upon us, and this time we will be traveling to beautiful Dublin, Ireland for Hadoop Summit 2016 on April 13th and 14th, 2016. Of course we are thrilled to be a Diamond Sponsor, and we hope that you will stop by Booth #301 to talk to us about your Big Data and Analytics journey and how our Big Data Portfolio can help you along the way. We have great information to share with you about a Data Lake from EMC and why we feel it is the essential foundation for your analytics ecosystem. Our capabilities don't end at the Data Lake, though; we continue the Big Data journey with Data Lake Extensions, Big Data Systems, and Big Data Solutions. Stop by to find out more about the capabilities in our portfolio, as well as the Global Services we can provide to assist you with your Big Data use cases.

We wanted to provide you with an overview of the sessions we will be presenting over the two days.

General Session: April 13th @ 0945 – 0955

Speaker: Carey James, Director Business Development, EMC Big Data Solutions

Title: Gaining Richer Insights and Business Outcomes with the EMC Big Data Portfolio

Abstract: A recent study by IDC suggests that worldwide data is growing at a 5x rate and will reach as much as 40,000 exabytes by 2020. With data being generated at such enormous rates, it is essential that the way we interact with and understand these data sets brings out their value, revealing rich insights and truly enhancing business outcomes. However, faced with obstacles ranging from ingesting and indexing the right data from multiple sources, to being unable to retain data long enough, to waiting on IT to spin up sufficient resources, businesses can struggle early on in their Big Data initiatives. EMC knows these challenges first-hand because we have been through them ourselves. From this experience, we have learned two things: the power of data is a game changer, and the power of infrastructure is essential to discovering actionable insights. In this session, we will review the EMC Big Data Portfolio and the related services that can help get your own initiatives off the ground, no matter where you are on your Big Data journey.

Breakout Session: April 13th @ 1220 – 1300 in Liffey Hall 1

Title: Tame that Beast: How to Bring Operations, Governance, and Reliability to Hadoop

Speaker: Dr. Stefan Radtke, CTO EMEA, Emerging Technologies Division

Abstract: Many companies have created extremely powerful Hadoop use cases with highly valuable outcomes. The diverse adoption and application of Hadoop are producing an extremely robust ecosystem. However, teams often create silos around their Hadoop deployments, forgetting some of the hard-learned lessons IT has gained over the years. One often overlooked discipline is governance.

Does your company have good KPIs and measurements around what gets loaded into Hadoop? Do you have a good taxonomy and metadata tool? As your business grows, can your Hadoop instance support the 99.99% availability your operations require? If your primary data center goes down, can you replicate models and data to another facility? As Hadoop usage grows more prevalent, these questions are becoming increasingly common and urgent.

Breakout Session: April 14th @ 1500 – 1540 in Wicklow Hall 2B

Title: Hadoop Everywhere: Geo-Distributed Storage for Big Data

Speakers: Nikhil Joshi, Consultant Product Manager, and Vishrut Shah, Director of Engineering

Abstract: Traditionally, HDFS provides robust protection against disk failures, node failures, and rack failures, but the mechanisms for protecting data against entire-datacenter failures and outages leave much to be desired. Neither the storage substrate (HDFS) nor the applications on top of it (MapReduce, Hive, HBase, etc.) are capable of running across geographies and data centers. With Hadoop's increased enterprise adoption, there is a greater need to protect business-critical datasets in Hadoop clusters, motivated in large part by compliance, regulation, data protection, and business continuity planning. 'distcp', which has been the foundation for most Hadoop vendor backup and recovery solutions, just doesn't cut it when strong consistency is required or when there are more than two sites. Cloud-native applications (especially in IoT scenarios) generate humongous amounts of data all across the globe, and there is a need for a global storage infrastructure to reason over this corpus of data. It's time for Hadoop storage to break out of its single-datacenter confines. In this talk, we will discuss the challenges, approaches, and architectures involved in taking Hadoop storage global!

Topics covered: Hadoop Compatible Filesystems (HCFS), geo-distribution of data, disaster recovery, storage overhead, strong consistency, multi-protocol data access, and shared storage architectures.
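The HCFS angle deserves a moment's pause: because MapReduce, Hive, and HBase reach storage through Hadoop's FileSystem abstraction rather than talking to HDFS internals directly, an application can in principle be pointed at a different (even geo-distributed) storage substrate just by changing its filesystem URI. Here is a minimal Java sketch of that abstraction; the namenode address and path are hypothetical illustrations, not details from the session:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HcfsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The URI scheme selects the FileSystem implementation. "hdfs://namenode:8020"
        // is a hypothetical single-cluster address; an HCFS-compliant geo-distributed
        // store would plug in here under its own scheme, with no application changes.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path path = new Path("/data/events/sample.log");
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("sample record");
        }
        System.out.println("Wrote " + path + "; exists: " + fs.exists(path));
    }
}
```

This abstraction is also why 'distcp' became the default answer for cross-site protection: it is essentially a MapReduce job copying between two FileSystem URIs, which makes it simple but inherently asynchronous and pairwise, exactly the limitations the abstract calls out.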

Follow us at @EMCbigdata and get social with us on Twitter using #EMC and #HS16Dublin.

About the Author: Erin K. Banks

Erin K. Banks is a Product Marketing Director in the Telco Systems Business at Dell Technologies. Previously, she was the Director of Product Marketing for the Unstructured Data Solutions group, as well as the Messaging Director for Security Transformation, at Dell Technologies. She has been in the IT industry for almost 20 years, previously working at Dell EMC as Portfolio Marketing Director for Data Analytics. She has also worked at Juniper Networks in Technical Marketing for the Security Business Unit, and at VMware and EMC as an SE in the Federal Division, focused on Virtualization and Security. She holds both CISSP and CISA certifications. Erin has a BS in Electrical Engineering and is an author, blogger, and board member for Our SAM Foundation.