Engineering Tiny Machine Learning for the Edge

February 6, 2020

As developers face the challenge of making complex AI and machine learning applications work on edge-computing devices, options to support Tiny ML are emerging.

The edge is all about intelligence, but those smarts must be squeezed into ever-tinier form factors.

Developers of artificial intelligence (AI) applications must make sure that each new machine learning (ML) model they build is optimized for fast inferencing on one or more target platforms. Increasingly, these target environments are edge devices such as smartphones, smart cameras, drones, and embedded appliances, many of which have severely constrained processing, memory, storage, and other local hardware resources.

The hardware constraints of smaller devices are problematic for the deep neural networks at the heart of more sophisticated AI apps. Many neural-net models can be quite large and complex. As a result, the processing, memory, and storage requirements for executing those models locally on edge devices may prove excessive for some mass-market applications that require low-cost commoditized chipsets. In addition, the limited, intermittent wireless bandwidth available to some deployed AI-enabled endpoints may cause long download latencies associated with retrieving the latest model updates necessary to keep their pattern-recognition performance sharp. 

Edge AI is a ‘model once, run optimized anywhere’ paradigm

Developers of AI applications for edge deployment are doing their work in a growing range of frameworks and deploying their models to myriad hardware, software, and cloud environments. This complicates the task of making sure that each new AI model is optimized for fast inferencing on its target platform, a burden that has traditionally required manual tuning. Few AI developers are specialists in the hardware platforms into which their ML models will be deployed.

Increasingly, these developers rely on their tooling to automate the tuning and pruning of their models’ neural-network architectures, hyperparameters, and other features to fit the hardware constraints of target platforms without unduly compromising the predictive accuracy for which an ML model was built.


Over the past several years, open-source AI-model compilers have come to market to ensure that the toolchain automatically optimizes AI models for fast, efficient edge execution without compromising model accuracy. These model-once, run-optimized-anywhere compilers include the AWS NNVM Compiler, Intel’s nGraph, Google’s XLA, and NVIDIA’s TensorRT 3. In addition, AWS provides SageMaker Neo, and Google offers TensorFlow integration with NVIDIA’s TensorRT for inferencing optimization on various edge target platforms.

Tweaking tinier math into AI edge processors

Some have started to call this the “TinyML” revolution. This refers to a wave of new approaches that enable on-device AI workloads to be executed by compact runtimes and libraries installed on ultra-low-power, resource-constrained edge devices.

One key hurdle is that many chip-level AI operations, such as the calculations involved in training and inferencing, must be performed serially rather than in parallel, which is very time consuming. In addition, these are computationally expensive processes that drain device batteries rapidly. The usual workaround, uploading data to be processed by AI running in a cloud data center, introduces its own latencies and may, as a result, be a non-starter for performance-sensitive edge AI apps such as interactive gaming.

One recent event in the advance of TinyML was Apple’s acquisition of Xnor.ai, a Seattle startup specializing in low-power, edge-based AI tools. Xnor.ai launched in 2017 with $2.6 million in seed funding, with a follow-up $12 million Series A financing round a year later. Spun off from the Allen Institute for Artificial Intelligence, the three-year-old startup’s technology embeds AI on the edge, enabling facial recognition, natural language processing, augmented reality, and other ML-driven capabilities to be executed on low-power devices rather than relying on the cloud.

Xnor.ai’s technology makes AI more efficient by allowing data-driven machine learning, deep learning, and other AI models to run directly on resource-constrained edge devices — including smartphones, Internet of Things endpoints, and embedded microcontrollers — without relying on data centers or network connectivity. Its solution replaces AI models’ complex mathematical operations with simpler, rougher, less precise binary equivalents.
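To make the concept concrete, here is a minimal sketch of the general binarized-network idea (an illustration only, not Xnor.ai’s actual implementation): once weights and activations are constrained to +1/-1, a dot product collapses to an XNOR plus a bit count over packed bits, which commodity CPUs handle very cheaply.

```python
# Illustrative sketch of binarized (XNOR-style) arithmetic; not Xnor.ai's implementation.
import numpy as np

def binarize(x):
    """Quantize a real-valued vector to +1/-1 by sign."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def pack_bits(v):
    """Pack a +1/-1 vector into an integer: bit set where the value is +1."""
    return int("".join("1" if x > 0 else "0" for x in v), 2)

def xnor_popcount_dot(w_bits, a_bits, n):
    """Dot product of two packed +1/-1 vectors.
    XNOR marks matching positions; matches minus mismatches = 2*matches - n."""
    matches = bin(~(w_bits ^ a_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

rng = np.random.default_rng(0)
n = 64
w, a = binarize(rng.standard_normal(n)), binarize(rng.standard_normal(n))

# The bit-twiddling result matches the ordinary dot product on the +1/-1 vectors.
assert xnor_popcount_dot(pack_bits(w), pack_bits(a), n) == int(np.dot(w, a))
```

In a real binarized network, only the inner multiply-accumulate loops use this trick; scaling factors and typically the first and last layers stay in higher precision to preserve accuracy.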

Xnor.ai’s approach can boost the speed and efficiency of AI model execution by several orders of magnitude, letting models run on edge devices for hours at a time. It greatly reduces the CPU workloads typically associated with such edge-based AI functions as object recognition, photo tagging, and speech recognition and synthesis, and it does so on a single CPU core without appreciably draining device batteries. It strikes a trade-off between model efficiency and accuracy while keeping real-time, device-level calculations within acceptable confidence levels.

Building tinier neural-net architectures into machine learning models

Another key milestone in the development of TinyML was Amazon Web Services’ recent release of the open-source AutoGluon toolkit, an ML pipeline automation tool that includes a feature known as “neural architecture search.”

Neural architecture search finds the most compact, efficient neural-net structure for a specific AI inferencing task. It helps ML developers optimize the structure, weights, and hyperparameters of an ML model’s algorithmic “neurons,” and it allows AI developers of all skill levels to automatically optimize the accuracy, speed, and efficiency of new or existing models for inferencing on edge devices and other deployment targets.

Available from the project website or GitHub, AutoGluon can automatically generate a high-performance ML model from as few as three lines of Python code. It taps into available compute resources and uses reinforcement-learning algorithms to search for the best-fit, most compact, and top-performing neural-network architecture for its target environment. It can also interface with existing AI DevOps pipelines via APIs to automatically tweak an existing ML model and thereby improve its performance on inferencing tasks.
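As a rough illustration of the “few lines of Python” claim, the sketch below uses AutoGluon’s tabular interface; the exact API has changed across releases, and the label column and file names here are placeholders rather than details from the article.

```python
# Minimal AutoGluon sketch (assumes a recent release; "target" and the CSV paths are placeholders).
from autogluon.tabular import TabularDataset, TabularPredictor

train = TabularDataset("train.csv")                      # load training data
predictor = TabularPredictor(label="target").fit(train)  # search over models and build an ensemble
predictions = predictor.predict(TabularDataset("test.csv"))
```

The neural-architecture-search capability described above operates at a lower level of the toolkit, where, roughly speaking, architecture choices such as layer widths and depths can be declared as searchable hyperparameters.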

There are also commercial implementations of neural architecture search tools on the market. A solution from Montreal-based AI startup Deeplite can automatically optimize a neural network for high-performance inferencing on a range of edge-device hardware platforms. It does this without requiring manual inputs or guidance from scarce, expensive data scientists.

Compressing AI neural nets and data to fit edge resources

Compression of AI algorithms and data will prove pivotal to mass adoption. One Stanford research project is exploring approaches for compressing neural networks so they can use less powerful processors, less memory, less storage, and less bandwidth at the device level, while minimizing trade-offs to their pattern-discovery accuracy. The approach involves pruning “unimportant” neural connections, reweighting the remaining connections, and applying a more efficient encoding of the model.
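The pruning step is easy to picture. Below is a minimal magnitude-based pruning sketch (a generic illustration, not the cited project’s code): connections whose weights are smallest in absolute value are treated as unimportant and zeroed out, after which the surviving weights would be fine-tuned and stored in a compact sparse encoding.

```python
# Generic magnitude-based pruning sketch; not the cited research project's code.
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
layer = rng.standard_normal((256, 128)).astype(np.float32)  # one dense layer's weights

pruned, mask = prune_by_magnitude(layer, sparsity=0.9)
print(f"weights kept: {mask.mean():.1%}")  # roughly 10% survive

# Reweighting (fine-tuning the survivors) and the more efficient encoding of the
# resulting sparse model are the follow-on steps described above; omitted here.
```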

A related project called Succinct is striving to produce more efficient compression of locally acquired data for caching on resource-constrained mobile and IoT endpoints. It allows deep neural nets and other AI models to operate against sensor data stored in flat files and to execute search queries, counts, and other operations directly on compressed, cached local data.

Data-compression schemes such as this will enable endpoint-embedded neural networks to continue to ingest sufficient amounts of sensor data to detect subtle patterns. These techniques will also help endpoints to rapidly consume sufficient cached training data for continual fine-tuning of the accuracy of their core pattern-discovery functions. And superior data compression will reduce solid-state data-caching resource requirements at the endpoints.

Benchmarking AI performance on tinier edge processing nodes

The proof of any TinyML initiative is in the pudding of performance. As the edge AI market matures, industry-standard TinyML benchmarks will rise in importance as a way to substantiate vendor claims to being the fastest, most resource-efficient, and lowest-cost option.

In the past year, the MLPerf benchmarks took on greater competitive significance, as everybody from Nvidia to Google boasted of their superior performance on these. As the decade wears on, MLPerf benchmark results will figure into solution providers’ TinyML positioning strategies wherever edge AI capabilities are essential.

Another industry framework comes from the Embedded Microprocessor Benchmark Consortium (EEMBC). Its MLMark suite benchmarks ML workloads running on optimized chipsets in power-constrained edge devices. The suite encompasses real-world ML workloads from virtual assistants, smartphones, IoT devices, smart speakers, IoT gateways, and other embedded/edge systems to identify the performance potential and power efficiency of processor cores used to accelerate ML inferencing jobs. It measures inferencing performance, neural-net spin-up time, and power efficiency across low-, moderate-, and high-complexity inferencing tasks, and it is agnostic to ML front-end frameworks, back-end runtime environments, and hardware-accelerator targets.

The edge AI industry confronts daunting challenges in producing a one-size-fits-all benchmark for TinyML performance.

For starters, any general-purpose benchmarks would have to address the full range of heterogeneous multidevice system architectures (such as drones, autonomous vehicles, and smart buildings) and commercial systems-on-a-chip platforms (such as smartphones and computer-vision systems) into which AI apps will be deployed in edge scenarios.

Also, benchmarking suites may not be able to keep pace with the growing assortment of AI apps being deployed to every type of mobile, IoT or embedded device. In addition, innovative edge-based AI inferencing algorithms, such as real-time browser-based human-pose estimation, will continue to emerge and evolve rapidly, not crystallizing into standard approaches long enough to warrant creating standard benchmarks.

Last but not least, the range of alternative training and inferencing workflows (on the edge, at the gateway, in the data center, and so on) makes it unlikely that any one benchmarking suite can do them all justice.

So, it’s clear that the ongoing creation of consensus practices, standards, and tools for TinyML is no puny undertaking.

James Kobielus is Futurum Research’s research director and lead analyst for artificial intelligence, cloud computing, and DevOps.
