Computer Vision and Machine Intelligence: Advancing the Human/Machine Partnership

As we head into VMworld, hot on the heels of a successful Dell Technologies World, it seems like an appropriate time to provide an IoT update.

Computer Vision IoT

As you know, last October we unveiled our vision and strategy, a new Dell Technologies IoT Solutions Division, and IoT-specific incubation projects, including Project World Wide Herd for federated analytics. That project has now entered a formal technology preview for VMworld as it approaches production launch under a new name to be unveiled soon.

A lot has happened since that announcement, so settle in comfortably for a long but hopefully interesting read!

Engineered solutions, bundles and evolving IoT offers

I’m happy to report that our new IoT Solutions Division is fully operational. The division’s core charter is to engineer IoT and edge computing solutions that combine the power of the Dell Technologies hardware and software infrastructure portfolio with partner assets for specific use cases. Our ultimate goal is to make it easier for our customers to realize business value at scale.

In parallel, we’re also working to enable the channel by collaborating with partners to develop solution bundles that are easy to consume and deploy. Examples launching soon include solutions for cold-chain retail, Data Center Infrastructure Management (DCIM) and remote monitoring of oil and gas field assets.

In terms of existing purpose-built IoT offerings from across the portfolio, we continue to find new applications for our Dell Edge Gateways, with customers valuing their rugged utility combined with the global scale and support of a Tier 1 manufacturer. A few months back, VMware launched version 1.1 of Pulse IoT Center, which addresses the often-overlooked function of remotely managing heterogeneous things and gateways at scale.

Surveillance opening a view into much broader potential

All good stuff. Now, to coincide with the delivery of our first engineered solution for video surveillance, I want to put this announcement in the context of the broader Dell Technologies IoT/Edge vision and associated roadmap.

Surveillance is a use case within our broader computer vision strategy – the first milestone for our vision-vision, if you will. And so, this first engineered solution – while important for video surveillance – is also significant in the greater scheme of things.

The bottom line is that we see computer vision as a foundational enabler for many IoT use cases – after all, cameras are among the best sensors around. Further, applying analytics to these data feeds enables customers to more cost-effectively monitor events in the physical world and automate decision-making.

Our aim is to enable computer vision in a variety of use cases in addition to classic surveillance so that customers can “see more than meets the human eye” (enter Transformers theme music) and receive alerts and summaries based on important context.

But wait there’s more!

One foundation, different workload themes

As part of our roadmap of engineered solutions, we’re investing in a single Dell Technologies infrastructure foundation comprising hardware and software, spanning the edge to the cloud and addressing the needs of OT and IT on Day 0, 1 and 2 (before, during and after deployment). This work will be done against two key workload themes – computer vision and machine intelligence.

The components in this loosely-coupled infrastructure offer are engineered to work together to optimally support these two themes, depending on which elements are included and dialed in alongside value-add from our partners. Even though it’s ultimately one foundation, we distinguish between the two themes because each track has distinct functional properties that require different combinations of tools from the overall Dell Technologies portfolio and partner ecosystem.

Computer vision workloads

Computer vision workloads are enabled by cameras and imaging sensors (including thermal and infrared). They generally require different types of analytics tools than those used for structured machine data, have inherently heavy “northbound” content flow, and as a result drive high compute and storage needs by default.

No surprises here – I challenge you to find someone who thinks it’s a great idea to blindly stream 4K video over the internet rather than snapshots of critical events – at least, someone who doesn’t represent a telco, ISP or public cloud provider who wants to move and/or store your data for a price, only for you to pay again to get it back!
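
To put rough numbers on that point, here’s a quick back-of-envelope sketch. The ~20 Mbps bitrate and snapshot counts are my own illustrative assumptions – real 4K encoder settings vary widely – but the gap between continuous streaming and event snapshots holds regardless:

```python
# Back-of-envelope bandwidth math for one 4K surveillance camera.
# The bitrate (~20 Mbps H.264) and snapshot numbers are illustrative assumptions.

BITRATE_MBPS = 20            # assumed 4K H.264 stream bitrate
SECONDS_PER_DAY = 24 * 60 * 60

# Continuous streaming: megabits/day -> gigabytes/day
stream_gb_per_day = BITRATE_MBPS * SECONDS_PER_DAY / 8 / 1000
print(f"Continuous 4K stream: ~{stream_gb_per_day:.0f} GB/day")   # ~216 GB/day

# Event snapshots: say 500 events/day at ~2 MB per JPEG frame
snapshot_gb_per_day = 500 * 2 / 1000
print(f"Event snapshots:      ~{snapshot_gb_per_day:.1f} GB/day") # ~1 GB/day
```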

With vision-based solutions, “southbound” data flow has traditionally been limited to driving displays and speakers plus PTZ (Pan-Tilt-Zoom) control for motorized cameras – but more below on how this is changing with advanced robotics and autonomous vehicles.

Machine intelligence workloads

Machine intelligence, on the other hand, involves structured telemetry data from control systems and sensors, including those that can provide data that cameras cannot.

Of course, image-based sensors can provide extremely rich information about the physical world – not just video in the purest sense, but also the detection of attributes like acceleration and motion, the position of a robotic arm, the wobble of a motor shaft, temperature and visible gas emissions. However, last time I checked, cameras can’t measure parameters like voltage, current, pressure or oil particulates inside of a sealed engine block.

Simple telemetry-based sensors are also important when low-power operation (and especially long battery life) is required. Finally, in some cases it’s not feasible or desirable to install a camera to detect certain conditions due to privacy concerns. For example, how would you feel if a camera was used as a motion sensor in your bathroom to turn on the lights?

Another more universal consideration for machine intelligence is the notion of control, which drives unique requirements for latency and predictability – ranging from “hard” real time (super low-latency and deterministic, as in deploying your car’s airbag) to “soft” real time (reactions in seconds to minutes, if not more, and tolerant of slight variances in response time). Granted, vision-based systems that perform in hard real time are increasingly a necessity in robotics and autonomous vehicles.

Pi and the Sky

I like to talk about how we’re in the “AOL stage of IoT”, which puts developers in what I call “Pi and the Sky” mode when it comes to machine-based data – simply connecting their sensors or PLCs to some sort of Raspberry Pi-class device, and in turn to the public cloud, because it’s a cheap and easy way to get started.
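
For a concrete (if simplified) picture of that mode, here’s a minimal sketch of the pattern: read a sensor on a Pi-class device and publish every raw reading straight to a cloud MQTT broker. The broker address, topic and read_sensor() stub are all hypothetical placeholders:

```python
# "Pi and the Sky" in a nutshell: raw sensor readings straight to the cloud.
# Broker host, topic and the sensor stub below are hypothetical placeholders.
import json
import random
import time

import paho.mqtt.client as mqtt  # pip install paho-mqtt

def read_sensor() -> float:
    """Stand-in for a real sensor/PLC read (e.g. via GPIO or Modbus)."""
    return 20.0 + random.random() * 5.0

client = mqtt.Client()
client.connect("broker.example-cloud.com", 1883)  # placeholder cloud broker
client.loop_start()

while True:
    payload = json.dumps({"ts": time.time(), "temp_c": read_sensor()})
    client.publish("factory/line1/temperature", payload)
    time.sleep(5)  # every reading goes north, with no edge filtering at all
```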

While it’s widely accepted that a shift to edge computing is a necessity to support the sheer amount of data coming online, developers just getting started with capturing basic telemetry data from sensors don’t realize they have an edge data problem… yet. Accordingly, these deployments are often first addressed at the thin gateway edge and the cloud, unlike computer vision where data saturation happens pretty much immediately.

In short, the time scales for the adoption of edge computing at scale are different between computer vision and machine intelligence even though the same foundational computing infrastructure elements can apply for either theme.

EdgeX Foundry – enabling an open ecosystem for machine intelligence

Looking outside of the company, we see our investments in the EdgeX Foundry project as key to enabling the machine intelligence theme in our solution roadmap in a very open and scalable way.

There’s a lot of detail about the project online so I’ll keep it short here, but in case you haven’t heard about EdgeX, it’s an industry-wide, vendor-neutral open source project, hosted by the Linux Foundation and focused on facilitating open interoperability between commercial value-add solutions at the IoT edge. The idea is to maximize customer choice, making it easier for them to achieve faster ROI with their IoT deployments.

Even though EdgeX is a hardware- and OS-agnostic framework rather than an OS itself, one way I describe it is that it’s slated to do for IoT what Android did for mobile – create an open marketplace of interoperable devices and applications.
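
To make the “framework, not an OS” point concrete: EdgeX is a collection of loosely-coupled microservices fronted by REST APIs. Here’s a rough sketch of a device-side service pushing a reading into core data – the port and endpoint reflect the v1 API of this era, and the device and reading names are hypothetical, so verify against your release:

```python
# Sketch: pushing a device reading into EdgeX core data over REST.
# Port 48080 and /api/v1/event reflect the EdgeX v1 API of this era;
# verify against your deployed release before relying on them.
import requests  # pip install requests

event = {
    "device": "conveyor-plc-01",          # hypothetical registered device name
    "readings": [
        {"name": "motorRPM", "value": "1480"},
    ],
}

resp = requests.post("http://localhost:48080/api/v1/event", json=event)
resp.raise_for_status()
print("Event accepted, id:", resp.text)
```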

I’m proud to have been part of the team at Dell that first seeded the project with code in April 2017 that was developed over the course of two years with lots of feedback from partners and customers. The project has since taken on a life of its own in a growing community effort.

You can read about what the community has since accomplished and learn more about key project tenets and priorities for this year in my post here. Eric Brown of Linux.com also did a great write-up on the recent “California” code release as well as what’s in store for the project’s “Delhi” release in October.

Freedom of choice, but we’ll provide you with some great, interoperable choices

Continuing the topic of customer choice, each business unit across Dell Technologies has its own offers that are either IoT-specific or highly relevant to IoT and, more generally speaking, edge computing. Our strategy is to render each of these offers as independently valuable but also better together when used in integrated solutions combined with a choice of third-party value-add. Bottom line, we’re all about providing customers with choice and flexibility, not just now but into the future.

In order to enable this, we’re not only contributing to the EdgeX Foundry project as a community member but also leveraging the EdgeX framework internally to help federate our own solutions portfolio. Picture a set of building blocks with super-flexible, open glue, where customers can use all the enabling components together or separately in a mix-and-match approach.

Getting past the “AOL stage” to IoT scale and advanced class

The loosely-coupled nature of the EdgeX framework enables customers to have consistent tools for data ingestion, security and management, regardless of which devices they use, combined with their choice of on-prem and cloud applications. We believe the notion of decoupling – 1) the “infrastructure plane” from the “insights plane”, 2) the edge from the cloud and 3) domain knowledge from technology – in an open, interoperable fashion is the only way IoT can scale longer term in an inherently heterogeneous, multi-edge and multi-cloud world.

In fact, we anticipated the shift to edge computing when we started accelerating our IoT investment back in 2015, building on top of 20 years of experience serving embedded OEM customers. This is why we led with our purpose-built Dell Edge Gateways in addition to spinning up the internal Dell Project Fuse effort, which turned into the open source EdgeX Foundry project of today.

EdgeX goes commercial

Earlier this summer, Dell Technologies Capital also invested in IOTech – a vendor-neutral “Red Hat” of EdgeX that is commercializing the code as its core business model. IOTech’s first offering, “Edge Xpert”, will enable customers that want to benefit from the open EdgeX ecosystem to invest in their choice of plug-in value-add rather than having to expend resources to support the open source EdgeX baseline.

Further, IOTech is building a second commercial variant of EdgeX that will serve hard real-time use cases while using the same APIs at the perimeter so customers can re-use device and application services. This has the potential to change the game in the automation world.

Important to note is that while EdgeX is a key step towards enabling machine intelligence in a very open and scalable way, the framework is not suitable for video analytics today. That said, the community is in the process of adding support for binary data at the request of Hitachi Vantara at the most recent public face-to-face Technical Steering Committee (TSC) meeting. This will make it capable of passing through certain types of image data in a variety of use cases.

Committed to open standards for distributed computing

Overall, enabling distributed computing based on open standards is fundamental to our IoT strategy. We have been active participants in the Industrial Internet Consortium (IIC) and the OpenFog Consortium over the past several years and have recently joined the Automotive Edge Computing Consortium (AECC), which will address considerations for both computer vision and machine intelligence in the automotive space.

Advancing the human/machine partnership

Now that I’ve presented all the elements, let’s take it a step further – combining computer vision with machine intelligence to provide people with even richer automated insights into the physical world is especially powerful!

For example, think about pointing a camera at a manufacturing conveyor belt to inspect the quality of parts flying by while simultaneously ingesting data from the PLCs and sensors on the machinery producing those parts. By analyzing the mashup of this data, both the production supervisor and quality engineer would know what was happening with the machines that produced the part when the cameras detected the flaw. [Side note: “machine vision” is also a term used in this space, which is basically computer vision applied in manufacturing use cases such as quality control, process and robot control, etc.]
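
As an illustrative sketch of that mashup (the records, field names and two-second correlation window below are all hypothetical), joining vision-detected defects with PLC telemetry can be as simple as correlating the two streams on a timestamp window:

```python
# Sketch: correlating vision-detected defects with PLC telemetry by time.
# All records, field names and the 2-second window are hypothetical.
from datetime import datetime, timedelta

defects = [  # emitted by the camera-side inspection analytics
    {"ts": datetime(2018, 8, 20, 10, 15, 3), "part_id": "A-1042", "flaw": "crack"},
]

plc_telemetry = [  # sampled from the machinery producing the parts
    {"ts": datetime(2018, 8, 20, 10, 15, 2), "spindle_rpm": 3110, "temp_c": 71},
    {"ts": datetime(2018, 8, 20, 10, 14, 0), "spindle_rpm": 2980, "temp_c": 64},
]

WINDOW = timedelta(seconds=2)

for d in defects:
    # Pull the machine state captured closest to the moment of the defect
    context = [t for t in plc_telemetry if abs(t["ts"] - d["ts"]) <= WINDOW]
    print(f"Part {d['part_id']} ({d['flaw']}): machine state at flaw = {context}")
```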

This paradigm applies to nearly all other verticals and use cases in a similar fashion. In another example, a building facilities manager may merge event data from surveillance with machine data from devices like badge readers, motion sensors, beacons and thermostats.

AI and context-based reasoning

Of course, applying analytics to drive outcomes is a key part of computer vision (and machine intelligence, for that matter). Minor rant first – I always get a kick out of those who talk Artificial Intelligence (AI) when they’re really just working with a basic IFTTT (If This Then That) rules engine.

The promise of AI is, of course, true context-based reasoning. In other words, it’s about making sound judgement calls in a real-world situation with many inputs and many different potential outcomes. Ethics and morality come into play too but this post is shaping up to be long already so I’ll save those topics for another time.

It turns out it’s rather simple to tell a bad part from a good part on a manufacturing line with the right camera resolution and analytics tools – because it’s highly predictable (using the old “one of these is not like the others” trick).

It’s also pretty easy to retroactively search recorded video with tools that classify readily distinguishable objects, based on prescribed context. Like “show me everyone with red shirts in that area between the times of 1:00 and 1:15 last Wednesday” or “show me all the pink cars that turned left at that intersection today” (hopefully not that many!), and so forth.
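
Once detections are stored as structured metadata, that prescribed-context search boils down to simple filtering. A toy sketch, with an entirely hypothetical schema and values:

```python
# Sketch: retroactive search over stored detection metadata.
# The detection schema and values below are hypothetical.
from datetime import datetime

detections = [  # one row per classified object in archived video
    {"ts": datetime(2018, 8, 15, 13, 5), "label": "person",
     "shirt_color": "red", "zone": "entrance"},
    {"ts": datetime(2018, 8, 15, 13, 40), "label": "person",
     "shirt_color": "blue", "zone": "entrance"},
]

def search(label, start, end, **attrs):
    """Return detections matching a label, a time window and attribute filters."""
    return [d for d in detections
            if d["label"] == label
            and start <= d["ts"] <= end
            and all(d.get(k) == v for k, v in attrs.items())]

hits = search("person",
              datetime(2018, 8, 15, 13, 0), datetime(2018, 8, 15, 13, 15),
              shirt_color="red", zone="entrance")
print(hits)  # everyone in a red shirt in that area, 1:00-1:15
```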

Further, it’s fairly rudimentary these days to train a model to tell the difference between a car and a bicycle. The ability to recognize specific faces and even gauge relative demographics (e.g. age, gender, race) with no prior knowledge of the individual is also getting quite (eerily) good.
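
For a sense of just how rudimentary this has become, here’s a minimal transfer-learning sketch using a pretrained ResNet from torchvision. The ./data/train directory with car/ and bicycle/ subfolders is an assumption, and this is one common approach rather than the only one:

```python
# Minimal transfer-learning sketch: car vs. bicycle classifier.
# Assumes ./data/train contains car/ and bicycle/ subfolders of images.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_ds = datasets.ImageFolder("./data/train", transform=tfm)
loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

model = models.resnet18(pretrained=True)       # reuse ImageNet features
for p in model.parameters():
    p.requires_grad = False                    # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)  # new 2-class head

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):                         # a few epochs often suffices here
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```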

Animals tend to be a little trickier, and while I wouldn’t be too impressed if your algorithm could pick out a giraffe in a field of hyenas, you’re getting warmer if it can tell a cat from a dog, and you’re really starting to impress me if it can distinguish minor differences between animals within any given breed.

It’s all possible with the right image resolution, compute power and model training. It’s just that data scientists have been a little more focused on identifying humans than animals in the wild kingdom (but more on that below).

Making a judgement call

Diving deeper into context-based reasoning – how do I know that the person who may look a little shifty is really up to no good? For example, if an algorithm detects a person leaving behind a bag in a crowded public space, did they purposely drop off something that can do some harm or did they just accidentally lose their gym clothes? It turns out making these calls accurately isn’t so easy. Where are the pre-cogs from the movie Minority Report when you need them?

To avoid a false positive while making a proactive judgement call about a theft, I would need to know, in the moment, that the suspected offender has a criminal record with the authorities, or at least have a record of past questionable behavior in my store (via my private database) – and be able to analyze this historical behavior together with real-time context to definitively predict the theft before it happens.

It’s tricky, and we’ll likely have close to as many false positives as humans do, but nevertheless, AI will increasingly help us automate these judgement calls. Maybe not like a pre-cog hanging out with Tom Cruise, but at the very least, catching crooks red-handed on the way out the door.

The true power of IoT is triggering actions in real time across a variety of interrelated use cases and contexts. It’s about a system of systems and networked intelligence. And part of this networked intelligence is the notion of combining sensor-based data with business data, whether it be your ERP, CRM, social networking tool du jour, or otherwise. Within the Dell Technologies family, Boomi can help with this data fusion!

Retail customer tracking… wait, make that trending

Looking at automating judgement calls through sensor-driven analytics in a different context, how do I know that the person who just walked into my retail store is a big spender, meaning I should summarily roll out the red carpet? [Aside: I found it funny at the National Retail Federation (NRF) event a few years ago when apparently the industry decided to no longer call following customers’ patterns in stores “tracking”, rebranding it instead as customer “trending” – after all, the latter does sound a little less creepy.]

In case you didn’t know – a sophisticated retailer knows for a fact that you stood in front of that end cap for 46 seconds debating but not purchasing those cookies on special. All because you actually went to the freezer aisle to grab a pint of Ben and Jerry’s Chunky Monkey ice cream instead… at full price… for the third time this week!

Kind of creepy, right?

Privacy goes out the window with sufficient value

However, I find that as much as people talk about privacy concerns, it all goes out the window if sufficient value is received. If I had told you ten years ago that you’d leave location-based services (LBS) on your phone all the time, you would have thought I was crazy. But guess what – the majority of people with smartphones today do just that. Where else do you think all those red and yellow traffic lines come from? (Waze, of course, is the ultimate opt-in way of capturing this data.) And then there are the always-on Alexas of the world.

It’s also about context – you know when you shop for something online and then your life becomes all about that product for like three months after? Literally, every single nook and cranny of your browser is an incessant carousel of that product. Did you forget something? No, I didn’t forget it – I chose not to buy it!

However, here’s where context matters. As much as I find that web phenomenon supremely annoying, let’s be honest: if I were to walk into a brick-and-mortar retail store (which means, by the way, that I just signaled strong intent to buy something), I sure would be happy to receive a personalized coupon on the spot. Even though it would still be a little creepy that they knew it was me who walked in, and that the coupon was for Ben and Jerry’s ice cream.

Using our AI powers for good

Sure, we’ll have some false positives along the way, and privacy will unfortunately be violated (both unintentionally and intentionally) from time to time. Of course, GDPR is working to address this – more on that another time. However, ultimately technology is about driving human progress, and as long as we use our powers for good not evil, I think we’ll be just fine.

Speaking of the wild kingdom and doing good: as part of its “Data for Good” program, one of our analytics partners, SAS®, is collaborating with the non-profit WildTrack – an organization that uses non-invasive techniques to monitor and protect endangered species.

With the help of SAS® technology – built on top of Dell PowerEdge server infrastructure – WildTrack is using extensive data on endangered species like cheetahs and leopards to improve conservation efforts. By collecting footprint images and analyzing them with a customized statistical model in SAS, the team gains insights into the density, distribution and habits of these species – the software can identify species, sex, age-class and individual animals from their footprints. WildTrack researchers are also using Dell Latitude Rugged Laptops at their field sites in Namibia for managing their data.

Given enough data, AI models built with SAS deep learning can be trained to perform human-like tasks such as identifying footprint images and recognizing patterns much as indigenous trackers in Africa do – but with the added ability to apply these skills at a much larger scale and more rapid pace. WildTrack has developed algorithms to monitor a range of species, from the Amur tiger and giant panda in China to the black rhino in Namibia and the mountain lion in the Americas, and in partnership with SAS is actively expanding its program to roll out more innovative techniques in the future.

This is very cool stuff, and of course, even more interesting is when you combine all of the above with emerging technologies like blockchain and AR/VR, but more on that in future blogs!

From DVR to automated real-time insights

Now to be fair, computer vision has been around a long time – in the same way that many use cases that are now called IoT have. However, as with IoT, we’re at an inflection point where available technology, not to mention an ever-increasing demand for real-time data, is making these trends accessible to the masses and soon to be pervasive.

In the case of computer vision, this is especially due to the advent of better tools, including drag-and-drop interfaces to train models (almost “Pinterest-like”) and co-processing via GPUs and FPGAs to greatly accelerate analytics workloads. While we’re getting closer to the art of the possible, we have to stay vigilant about the privacy issues and false positives I spoke about earlier.

And yet, today, it’s true that something on the order of 90% of surveillance deployments are really just fancy DVR at best, with best-in-class analytics being retroactive, context-based search of archives after an event such as a crime.

Surely you’ve seen a TV or movie scene with some security guard eating out of a bucket of fried chicken while staring at a wall of fuzzy 8” black and white CCTV screens. This has been the norm (perhaps minus the KFC) for many years. However, it’s getting cost prohibitive to put people in front of screens to try to capture critical events in real time, plus in order to drive new business outcomes (such as offering a customer a deal when they walk into your store) you need to act in the moment. Here’s where computer vision kicks in.

A clear path to innovation

We’re seeing more and more sensors coming online in general, and the introduction of 4K video is enabling new analytics-driven outcomes because of the more granular detail that can be captured from the physical world compared to traditionally lower-resolution CCTV cameras. Like detecting a license plate number or specific face from a long distance, or a slight bulge in someone’s jacket that wasn’t there when they walked into the store, or that it was actually Cherry Garcia and not Chunky Monkey ice cream that I grabbed.

We’re also seeing increasingly creative usage of drones equipped with sophisticated sensing capabilities, not only outdoors for inspection of infrastructure like oil pipelines and bridges but also indoors to take inventory in a warehouse by flying around after-hours scanning barcodes on packages.

Shallow learning

And with all of this vision-based data, we’re seeing deep learning techniques move closer and closer to the device edge in addition to more and more silicon purpose-built for AI. Another recent Dell Technologies Capital investment was in Graphcore – a startup developing AI-optimized silicon that can be used for both learning and inference.

Whether Graphcore’s processor sits at the edge (for example, in an autonomous vehicle), in the cloud or somewhere in between simply depends on the use case. We’re seeing similar investment and M&A activity across the market, including Intel’s acquisition of Movidius.

The deepest of deep learning will always happen in the cloud, where we effectively have infinitely scalable compute, but we’ll continue to see more and more models being pushed to and even trained at the extreme edge. I jokingly call this “shallow learning”.

Invest now so you don’t pay for it later

All of the use cases I’ve talked about and more are driving new architectures, including an increasing need for edge compute. Net-net, even when a customer thinks they’ll be fine pumping data to the cloud for a simple IoT use case, or leveraging fancy DVR for surveillance with brute-force reactive search, we advise them to invest in more compute headroom at the edge now.

My message is – don’t get caught later having to rip and replace infrastructure in order to drop in new AI workloads for real-time analytics. You need to equip yourself now to stay competitive as the world transforms around you.

One infrastructure for distributed computing

In a perfect world, a user who is leveraging drones to capture high-resolution aerial imagery would love to be able to fully process that data in the drone itself, but unfortunately the laws of physics can be pesky at times.

However, at the very least they’ll increasingly want to pre-process that data in the drone, then send smaller data to a server cluster in a nearby field base station (perhaps in a truck or Modular Data Center) for further processing, and finally backhaul only the most meaningful information to the cloud for additional processing and archiving. Compare this to sending huge image files directly over a cellular, or worse, satellite connection to some distant cloud.
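
Here’s a simplified sketch of that three-tier flow. The function names, edge-detail threshold and stubbed base-station and cloud steps are all hypothetical placeholders for whatever processing a real deployment would do:

```python
# Sketch of the drone -> base station -> cloud tiering described above.
# Functions, thresholds and the stubbed steps are hypothetical placeholders.
import cv2  # pip install opencv-python

def preprocess_on_drone(frame):
    """Downscale and keep only frames worth transmitting at all."""
    small = cv2.resize(frame, (640, 360))          # shrink 4K to a thumbnail
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # Hypothetical heuristic: only keep frames with enough edge detail
    edges = cv2.Canny(gray, 100, 200)
    return small if edges.mean() > 10 else None

def analyze_at_base_station(thumbnail):
    """Heavier processing on the field server cluster (stubbed here)."""
    return {"anomaly": True, "score": 0.93}        # placeholder result

def backhaul_to_cloud(summary):
    """Ship only compact, meaningful results over the WAN (stubbed here)."""
    print("uploading to cloud:", summary)

frame = cv2.imread("aerial_frame.jpg")             # stand-in for a live capture
assert frame is not None, "expected a sample frame on disk"
candidate = preprocess_on_drone(frame)
if candidate is not None:
    result = analyze_at_base_station(candidate)
    if result["anomaly"]:
        backhaul_to_cloud(result)
```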

And in any event, with GPU- or FPGA-based acceleration, an imaging workload that once took weeks to process on a traditional host CPU can now take days, if not hours.

In all cases, our goal at Dell Technologies is to equip customers with the right scalable, secure, manageable and open infrastructure today so they can grow into new use cases over time, simply by adding more devices and pushing new applications via containers or VMs to any compute node spanning the edge to the cloud.

We build the guts so our partners can bring the glory

Important to stress again is that at Dell Technologies, we’re all about the underlying infrastructure. The last thing we want to do is compete with partners and customers that were deploying IoT use cases before it was called IoT.

As such, we’re curating an ecosystem of Technology and Services partners that provide domain- and use-case specific offerings on top of our engineered infrastructure foundation to deliver complete outcome-oriented solutions for customers. Our foundation will be equally applicable to scale out any use case when paired with the right partner software, hardware and services, and EdgeX provides an open way to facilitate increasing plug-and-play interoperability across the board over time.

Simplifying surveillance as the first stop on the vision train

My colleagues in the Dell Technologies IoT Solutions Division have done a fantastic job pulling together our first targeted solution for surveillance through close collaboration with the Dell Technologies portfolio Business Units, our analytics and camera partners and the open source EdgeX community. This will continuously evolve into a foundation supporting myriad use case-specific solutions spanning both the computer vision and machine intelligence themes.

As part of this effort, we have validated various sizings of the combined solution in our labs and made sure that it’s not only simple to deploy but also highly reliable. While there are great benefits of virtualizing compute, networking and storage for scalability and performance, we also want to make sure that we offer rock-solid uptime so customers don’t experience service dropouts in critical moments.

History repeats itself

I’ll close with a story of how I believe computer vision is taking a similar path to the POTS-to-VOIP (Plain Ol’ Telephone Service to Voice over IP) transition in the enterprise.

Looking back, despite OT (Operations Technology, e.g. facilities within a building) traditionally owning the phone system, the business ultimately directed IT to take on the support of emerging VOIP technology due to the savings involved and flexibility gained.

Similarly with video, OT has historically owned CCTV systems, but the technologies involved with where we’re headed (compute, storage, networking, AI etc.) are driving a need for IT skills to deploy and manage computer vision solutions at scale.

As I like to say, IoT starts in OT but scales in IT.

On that note, I talk quite a bit about the importance of OT and IT convergence, including jokingly highlighting how this trend makes for the preeminent IoT conference Venn diagram (for more on this check out Part Two of my five-part blog series on digital transformation and the need for a cloud-native edge).

At Dell Technologies, we’re here to work with our partners and both OT and IT organizations alike to make this transition as smooth as possible so customers can benefit from entirely new business outcomes in spades.

Thoughts? I’d love to hear your comments and questions. Join the IoT conversation on Twitter: @defshepherd

Learn more about Dell Technologies at www.delltechnologies.com. Learn more about Dell EMC OEM at www.dellemc.com/oem. Keep in touch about ongoing developments in the Internet of Things. Join the Dell EMC OEM IoT Showpage here.

About the Author: Jason Shepherd

Jason leads a team responsible for technology strategy, standardization, business model innovation and strategic ecosystem development within the Dell Technologies IoT and Edge Computing Solutions Division. His proven track record as a thought leader in the market is evidenced by his leadership in building up the award-winning Dell IoT partner program and establishing the vendor-neutral, open source EdgeX Foundry project to facilitate greater interoperability at the IoT edge. Jason was recognized as one of the Top 100 Industrial IoT influencers of 2018 and currently sits on the board of LF Edge – an umbrella project of complementary open source efforts facilitating open edge computing. He has spent his career at both Dell and tech startups in roles spanning CTO, engineering and marketing. He holds 14 granted and 13 pending US patents.