Meet Groq — and Its Amazing Tensor Processing Unit

A young California company has launched a simplified processing architecture with software-orchestrated control to bring predictable performance to compute-intensive workloads.

One of the great things about working for a world-class company like Dell EMC is the chance to get a close-up view of startup companies that are bringing exciting new technologies to the market. This is the case with a company called Groq, a Silicon Valley startup that offers a groundbreaking streaming Tensor Processing Unit (TPU) architecture for compute-intensive and inferencing workloads.

Known as a “secretive semiconductor startup,” Groq is focused on helping organizations overcome some of the fundamental problems associated with compute-intensive applications, including machine learning and artificial intelligence. To gain the maximum value from data that grows larger by the millisecond, Groq believes we need a smarter approach to the underlying compute processing architecture.

As Groq points out in a white paper on its innovative tensor streaming architecture, machine learning computations like inferencing put unprecedented demands on processors, as well as the software developers who need to make it all work. To perform more operations per second, chips have become larger and more complex, with multiple cores, multiple threads, on-chip networks and complicated control circuitry, Groq notes. To squeeze higher levels of performance out of silicon, chip designers have integrated more and more components and building blocks on the chips, driving up the complexity.

To break out of this cycle of growing complexity and gain the benefits of AI, Groq believes organizations need a simpler and more scalable processing architecture that can sustainably accelerate the performance of compute-intensive workloads. And that all boils down to a less complex chip design. To that end, Groq is introducing a new processing architecture designed for the unique performance requirements of machine learning applications and other compute-intensive workloads.

Inspired by a software-first mindset, Groq’s overall product architecture provides an innovative and unique approach to accelerated computation. With the company’s integrated circuit architecture, which is optimized to run TensorFlow, the compiler choreographs the operation of the hardware. All execution planning happens in software, freeing up valuable silicon space for additional processing capabilities.
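To make the idea of software-planned execution a little more concrete, here is a toy sketch in Python. It is not Groq's compiler or API; the `Op` type, the unit names ("memory", "matrix", "vector") and the cycle counts are all invented for illustration. It only shows the general principle: if every operation has a fixed, known latency, a compiler can assign each one a start cycle ahead of time, so the total run time is known before anything executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    unit: str          # hypothetical functional unit, e.g. "matrix"
    cycles: int        # assumed fixed latency in cycles
    deps: tuple = ()   # names of ops that must finish first


def schedule(ops):
    """Assign each op a start cycle at "compile time".

    Assumes ops are listed in dependency order. Returns the plan as
    (start_cycle, unit, op_name) tuples plus the total cycle count.
    """
    finish = {}     # op name -> cycle at which it completes
    unit_free = {}  # unit -> first cycle at which the unit is idle
    plan = []
    for op in ops:
        # An op can start once its inputs are ready and its unit is idle.
        ready = max((finish[d] for d in op.deps), default=0)
        start = max(ready, unit_free.get(op.unit, 0))
        finish[op.name] = start + op.cycles
        unit_free[op.unit] = finish[op.name]
        plan.append((start, op.unit, op.name))
    return plan, max(finish.values(), default=0)


# A made-up four-step program: two loads, a matrix multiply, an activation.
program = [
    Op("load_w", "memory", 2),
    Op("load_x", "memory", 2),
    Op("matmul", "matrix", 4, deps=("load_w", "load_x")),
    Op("relu", "vector", 1, deps=("matmul",)),
]

plan, total_cycles = schedule(program)
for start, unit, name in plan:
    print(f"cycle {start}: {unit} runs {name}")
print(f"total: {total_cycles} cycles")
```

Because the plan is computed entirely in software, running the scheduler twice on the same program gives the same cycle-by-cycle timeline. That is the sense in which performance becomes predictable, at least in this simplified model.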

Groq’s breakthrough chip design reduces the complexity of hardware-focused development, so developers can concentrate on the algorithms that turn massive amounts of data into business value — instead of spending their time adapting their solutions to the complexities of the hardware. The simpler hardware also saves developer resources by eliminating the need for profiling, while making it easier to deploy AI solutions at scale.

The bottom line? Groq says that the tight control provided by its simplified chip architecture leads to the deployment of better and faster machine learning models using industry-standard frameworks, along with fast and predictable performance for data-intensive workloads.

This is all exciting stuff. And it’s great to know that Dell EMC is working with this visionary startup that promises to advance the frontiers of artificial intelligence.

About the Author: Janet Morss

Janet Morss previously worked at Dell Technologies, specializing in machine learning (ML) and high-performance computing (HPC) product marketing.