At Sensors Converge 2024 in Santa Clara, California, Ceva launched the Ceva-NeuPro-Nano family of self-contained neural processing units (NPUs). The NeuPro-Nano is an NPU IP family designed for third-party processor makers to integrate into their systems-on-chip (SoCs) or microcontrollers.
All About Circuits’ Dale Wilson meets with Ceva at Sensors Converge 2024.
Architected to function without a separate coprocessor, the NPUs bring machine learning (ML) to applications that were previously too resource-constrained to support AI functionality. The IP's self-sufficiency opens up new application possibilities for ML by reducing design costs, power requirements, and physical footprint.
Ceva-NeuPro-Nano Supports TinyML
The Ceva-NeuPro-Nano architecture is designed for AI smart sensors and edge AI devices operating in the domain known as tiny machine learning (TinyML).
Edge AI applications that benefit from TinyML. Image used courtesy of Ceva
TinyML brings machine learning and targeted artificial intelligence to IoT devices that are limited in size, processing power, and resources. While broad-based and generative AI requires massive data sets covering a wide variety of interconnected information parameters, TinyML applications use narrow-focus, small data sets. For example, a TinyML thermostat would likely need a data set that covered little more than environmental parameters (like temperature or humidity) and the user's behavior preferences.
Built for these small knowledge units, Ceva’s NPUs will be programmed with key neural community performance, corresponding to characteristic extraction, management code, and DSP code. It helps probably the most superior machine studying knowledge sorts and operators, together with native transformer computation, sparsity acceleration, and quick quantization.
Lightweight Deployment Requirements
As an IP core, the Ceva-NeuPro-Nano family can be deployed as an internal part of an existing SoC solution or even as a stand-alone processing unit. It is purpose-built to help edge AI and TinyML solution providers get their products to market faster at a lower per-unit cost.
Combining neural processing with a scalar processing unit on a single core for reduced overall resource requirements. Image used courtesy of Ceva
The Ceva-NeuPro-Nano family supports up to 64 int8 multiply-accumulate operations (MACs) per cycle and 4-bit through 32-bit integer math. The family offers sparsity acceleration, the ability to reduce computational load when a matrix contains numerous zero or insignificant values. It can also accelerate non-linear activation types.
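The idea behind sparsity acceleration can be shown with a minimal sketch (illustrative Python, not a model of Ceva's hardware): when a weight is zero, its multiply contributes nothing, so a sparsity-aware engine can skip it and spend cycles only on the non-zero entries.

```python
import numpy as np

def dense_mac(weights, activations):
    """Reference dense dot product: every weight costs one MAC."""
    return sum(int(w) * int(a) for w, a in zip(weights, activations))

def sparse_mac(weights, activations):
    """Sparsity-aware dot product: zero weights are skipped, so the
    multiply count scales with the number of non-zero weights."""
    acc, macs = 0, 0
    for w, a in zip(weights, activations):
        if w != 0:  # hardware would detect and skip these automatically
            acc += int(w) * int(a)
            macs += 1
    return acc, macs

w = np.array([3, 0, 0, -2, 0, 0, 0, 1], dtype=np.int8)  # mostly zeros
a = np.array([1, 5, 2, 4, 7, 3, 9, 6], dtype=np.int8)
acc, macs = sparse_mac(w, a)  # same result as dense, far fewer multiplies
```

Here the sparse path produces the identical accumulator value while performing 3 multiplies instead of 8; pruned TinyML models with high zero ratios benefit proportionally.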
The edge NPU can be deployed in single-core form with less than 10 MB of flash and less than 500 KB of code plus dynamic data memory. The model weights (parameters) can range from 10 KB to 10 MB. The minimum computational resources are 10 GOPS, and the system, equipped with automated on-the-fly energy tuning, is optimized to run at 10 mW or less.
Ceva supports the NeuPro-Nano with a complete AI SDK for use in its NeuPro Studio. It also supports open AI frameworks to reduce development time.
The New NPU Uses Ceva-NetSqueeze AI Compression
TinyML applications are often battery-powered and may even be hosted inside a sensor. Edge AI developers often try to repurpose licensed DSPs for TinyML, but doing so doesn't deliver sufficient neural processing per watt. One of the major bottlenecks is decompressing stored model weights and feeding those decompressed weights into an NPU engine. This compression/decompression cycle consumes a disproportionate amount of the system's resources.
To address this bottleneck, Ceva developed NetSqueeze AI compression technology, employed in Ceva-NeuPro-Nano, to directly process compressed model weights without first decompressing the models. Skipping the decompression step reduces processing power and time requirements. It also cuts memory footprint by up to 80% compared to systems that require intermediate decompression.
“With NetSqueeze, you don’t have to decompress, and you don’t have to fill up the memory buffers with the model weights,” said Chad Lucien, Ceva’s VP and general manager of the sensors and audio business unit. “We’re able to actually process the compressed model weights directly to speed things up and reduce the overall memory footprint, which delivers a huge positive impact to cost.”
https://www.allaboutcircuits.com/news/tiny-but-mighty-ceva-reveals-new-npus-tinyml-devices/