3 Types of Data Swamping and How to Avoid It

IoT technology offers many benefits when implemented the right way, but the sheer volume and complexity of the data it generates can make it hard for companies to decide how best to handle it. The number of sensors, embedded systems and connected devices making their way onto the edge of the network continues to grow. While everyone wants to tap the rich potential of IoT data, we’ve seen that the data flowing through so-called network tributaries can quickly overload traditional data lakes and analysis tools, leading to data swamping.


Data swamping is a costly condition that results from handling information inefficiently within an organization. By some estimates, nearly 99 percent of all collected data is never used. The consequences of data swamping are slow decisions and a costly burden on networks and storage. Data swamping can present itself in several interrelated ways:

1. Long looping: The goal of IoT is to make valuable, data-based decisions in a timely fashion. Despite the fast-moving requirements of modern organizations, some still backhaul all collected data to their on-premises data center (or cloud) for analytics, only to push decisions all the way back to the collection points at the edge. This is highly inefficient when decisions could, and sometimes must, be made locally and quickly. Relying on the cloud for immediate decisions also breaks down in use cases where a temporary loss of connectivity would disrupt critical operations. Robust IoT gateways, like the Dell Edge Gateway 5000 Series, can run in-memory analytics on incoming data streams to help companies make rapid decisions near the very edge of the network, while routing meaningful data to the right final repository for further action or longer-term storage.

2. Data hoarding: IoT data is often perishable and of little long-term value (think of binary building-automation readings that simply report whether the lights are on or off). Beyond the immediate need to execute rules against a given context, much IoT data is of low importance and can be discarded on the spot. Yet many organizations not only send this data long distances for decision making, but also store it with no clear vision for its future use. This transfer-and-hoard behavior creates complexity and data silos, which feed the data swamp. Another benefit of running local stream analytics at the edge is the ability to apply metadata at the point of ingestion, which helps ensure proper routing to a final repository and makes the data easier to find later.

3. Traffic density: Sending data long distances through networks not only risks missing decision-making opportunities but also incurs expensive provisioning, transfer and storage costs. This behavior is like constantly overnighting packages of junk to yourself only to stack them up in an expensive storage unit with no intended use. The expenses add up, especially on wide-area networks such as cellular and with high-bandwidth data such as video. Rather than sending live streams over the network, it is more efficient to analyze data locally and send only small messages representing meaningful events, for example, the detection of motion in a secure area or a warning of impending failure in remote equipment. The ideal solution is to make decisions as close as possible to where the action is taking place, so that network costs are incurred only for centralizing data that will truly be useful in the future.
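The traffic-density advice, sending events rather than raw streams, can be illustrated with a short Python sketch. It assumes a vibration sensor as the signal behind an "impending failure" warning; the function name `detect_events`, the window size and the threshold are hypothetical choices for illustration, not part of any Dell product API.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List


def detect_events(samples: Iterable[float], window: int, threshold: float) -> List[Dict]:
    """Reduce a raw sensor stream to a short list of threshold-crossing events.

    Hypothetical example: emits one event each time the rolling mean of the
    last `window` samples rises above `threshold`, instead of forwarding
    every raw sample upstream.
    """
    recent: Deque[float] = deque(maxlen=window)  # sliding window of raw samples
    above = False                                # were we already above threshold?
    events: List[Dict] = []
    for i, x in enumerate(samples):
        recent.append(x)
        if len(recent) < window:
            continue  # not enough history yet to compute a stable mean
        mean = sum(recent) / window
        # Emit only on the upward crossing, not for every sample above threshold.
        if mean > threshold and not above:
            events.append({"index": i, "event": "vibration_high", "mean": round(mean, 3)})
        above = mean > threshold
    return events


# 10,000 synthetic samples: normal vibration, a sustained spike, then normal again.
stream = [0.2] * 6000 + [1.5] * 1000 + [0.2] * 3000
events = detect_events(stream, window=50, threshold=1.0)
print(f"{len(stream)} samples in, {len(events)} event(s) out")
# prints: 10000 samples in, 1 event(s) out
```

Emitting only on the upward crossing keeps a sustained fault from turning back into a stream of messages; only the single event needs to cross the wide-area network.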

As the processing power of edge gateways has increased, these devices have become more than mere entry points for sensor data. Gateways can now make rapid decisions at the edge of the network, immediately upstream of the sensors. By acting like a data “spam filter,” gateways can help organizations act quickly on perishable insights, discard useless data on the spot and route meaningful data to central repositories for further analysis. One analogy I like to use for the value of in-the-moment edge analytics is picking up after yourself every day. If you put your keys, mail, groceries and so on away as soon as you get home, your house stays clean; if you just dump everything on the floor when you walk in, you quickly end up with an insurmountable mess!
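The “spam filter” idea can be sketched as a three-way triage that every reading passes through on the gateway: act now, discard, or tag and route. This is a minimal Python illustration under stated assumptions: the `EdgeFilter` class, the alarm limit, the reading categories and the repository names are all hypothetical, and a production gateway would express such rules through its own analytics engine.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple


class Action(Enum):
    ACT = "act"          # perishable insight: handle locally, right now
    DISCARD = "discard"  # low-value data: drop on the spot
    ROUTE = "route"      # meaningful data: tag and send to a central repository


@dataclass
class Reading:
    device_id: str
    kind: str      # assumed categories: "temperature" or "light"
    value: float


@dataclass
class EdgeFilter:
    """Hypothetical 'spam filter' for an edge gateway."""
    alarm_limit: float  # assumed temperature limit that triggers local action
    last_seen: Dict[str, float] = field(default_factory=dict)

    def triage(self, r: Reading) -> Tuple[Action, Optional[Dict[str, str]]]:
        previous = self.last_seen.get(r.device_id)
        self.last_seen[r.device_id] = r.value
        if r.kind == "temperature" and r.value > self.alarm_limit:
            return Action.ACT, None
        if previous == r.value:
            return Action.DISCARD, None  # unchanged since the last reading
        # Metadata applied at ingestion to guide routing and later retrieval.
        tags = {
            "device": r.device_id,
            "kind": r.kind,
            "repository": "timeseries" if r.kind == "temperature" else "events",
        }
        return Action.ROUTE, tags


f = EdgeFilter(alarm_limit=90.0)
stream = [
    Reading("t1", "temperature", 70.0),
    Reading("t1", "temperature", 70.0),
    Reading("t1", "temperature", 95.0),
    Reading("l7", "light", 1.0),
    Reading("l7", "light", 1.0),
]
decisions = [f.triage(r)[0] for r in stream]
# decisions: ROUTE, DISCARD, ACT, ROUTE, DISCARD
```

Exact equality is used for the discard rule because it suits binary on/off readings; a noisy analog signal would need a tolerance band instead.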

Dell’s new edge IoT gateways, combined with distributed analytics capabilities powered by Dell Statistica and assets from our ISV partners, are bringing analytics closer to data sources. Unlike many competing products, which require a failure-prone fan to run at full processing capacity in harsh industrial conditions, Dell’s gateways are designed to perform at their maximum potential at specified temperature extremes with zero airflow. This means organizations can deploy Dell gateways virtually anywhere to combine the benefits of cloud computing with powerful edge analytics. This can prevent the dreaded data swamp and deliver faster, more secure business insights while saving on the costly transfer of data to and from the cloud.

Swamp image via Creative Commons by shankar s.

About the Author: Jason Shepherd

Jason leads a team responsible for technology strategy, standardization, business model innovation and strategic ecosystem development within the Dell Technologies IoT and Edge Computing Solutions Division. His track record as a thought leader in the market is evidenced by his leadership in building up the award-winning Dell IoT partner program and establishing the vendor-neutral, open source EdgeX Foundry project to facilitate greater interoperability at the IoT edge. Jason was recognized as one of the Top 100 Industrial IoT influencers of 2018 and currently sits on the board of LF Edge, an umbrella project of complementary open source efforts facilitating open edge computing. He has spent his career at both Dell and tech startups in roles spanning CTO, engineering and marketing. He holds 14 granted and 13 pending US patents.