MCU AI

Embedded AI on Microcontrollers

Running machine learning inference on STM32, ESP32, and ARM Cortex-M microcontrollers — the same hardware that manages sensors, actuators, and communication. No external AI accelerator required.

The Engineering Problem

MCU-class inference is not a matter of shrinking a cloud model and flashing it to hardware. The constraints are severe and non-negotiable: a microcontroller with 256KB of RAM cannot run a model that requires 4MB of working memory, regardless of how the problem is framed.

Effective embedded AI engineering starts from the hardware constraints and works backward to the model architecture. Target accuracy is balanced against memory footprint, inference latency, and power consumption from the beginning of the model development process — not at the end.
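The C sketch below makes the 256KB-versus-4MB point concrete. It estimates peak activation memory for a strictly sequential model and checks that estimate against the RAM left over after firmware. It assumes a simple ping-pong allocation, where only one layer's input, output, and scratch buffers are live at once. The `layer_mem_t` struct and both functions are hypothetical. Real runtimes such as TensorFlow Lite for Microcontrollers use their own memory planner, so treat the result as a first-pass estimate, not the arena size a runtime will report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-layer description for a strictly sequential model
 * (no skip connections). Sizes are in bytes; for INT8 tensors that is
 * simply the element count. */
typedef struct {
    const char *name;
    size_t input_bytes;
    size_t output_bytes;
    size_t scratch_bytes; /* kernel-specific temporary buffer, 0 if none */
} layer_mem_t;

/* In sequential execution a layer's input and output must be live at the
 * same time, but earlier activations can be released. Under that
 * ping-pong assumption, peak working memory is the largest
 * (input + output + scratch) over all layers. */
size_t peak_activation_bytes(const layer_mem_t *layers, size_t n) {
    assert(layers != NULL || n == 0);
    size_t peak = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t live = layers[i].input_bytes + layers[i].output_bytes +
                      layers[i].scratch_bytes;
        if (live > peak) {
            peak = live;
        }
    }
    return peak;
}

/* Returns 1 if the activation estimate fits in the RAM that remains after
 * reserving firmware_bytes for stacks, drivers, and application state. */
int fits_in_ram(size_t peak_bytes, size_t total_ram_bytes, size_t firmware_bytes) {
    if (firmware_bytes >= total_ram_bytes) {
        return 0;
    }
    return peak_bytes <= total_ram_bytes - firmware_bytes;
}
```

Consider a hypothetical INT8 model whose layers map a 96x96x1 input to 48x48x8, 24x24x16, 12x12x32, and finally 4 outputs. Its largest live set is 27,648 bytes, reached by each of the first two layers. That fits comfortably in 256KB with 64KB reserved for firmware. Skip connections and kernel scratch buffers raise the peak, which is why on-target RAM profiling still matters.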

WIRL Engineering designs embedded AI systems where the MCU selection, firmware architecture, model design, and quantization strategy are co-optimized. Each component is chosen in the context of the others.

Engineering Challenges

Flash and RAM measured in kilobytes: model weights must fit in flash and activations in RAM

No floating-point unit on lower-tier MCUs such as Cortex-M0+ parts: integer-only INT8 inference is not optional

Often a single core with sequential execution: inference must share CPU time with firmware task scheduling

No general-purpose operating system: the model runtime must work bare-metal or under an RTOS

Power consumption: inference duty cycles must fit within battery budgets

Temperature and environmental variation: model accuracy must hold across deployment conditions
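On a core without an FPU, INT8 inference keeps every runtime operation in integer arithmetic. TensorFlow Lite uses an affine scheme, real ≈ scale × (q - zero_point). Under that scheme, each layer's int32 accumulator must be rescaled by M = s_in × s_w / s_out into the output's int8 range. The C sketch below shows one way to do this. M is converted offline into a Q31 multiplier and a right shift, and the device then only multiplies, shifts, and clamps. The function names are hypothetical. The rounding (round half toward +infinity) is a simplification, so results are not guaranteed to be bit-exact with CMSIS-NN or TFLM kernels.

```c
#include <assert.h>
#include <stdint.h>

/* Offline (host-side) step. Expresses a real multiplier 0 < m < 1 as
 * m ~= m0 * 2^-31 * 2^-right_shift, where m0 is a Q31 value in
 * [2^30, 2^31). Avoids libm; in practice this runs in the conversion
 * toolchain, not on the MCU. */
void quantize_multiplier(double m, int32_t *m0, int *right_shift) {
    assert(m > 0.0 && m < 1.0);
    int shift = 0;
    double frac = m;
    while (frac < 0.5) {            /* normalize frac into [0.5, 1) */
        frac *= 2.0;
        ++shift;
    }
    int64_t q = (int64_t)(frac * 2147483648.0 + 0.5);   /* round to Q31 */
    if (q == ((int64_t)1 << 31)) {  /* rounding reached 1.0: renormalize */
        q >>= 1;
        --shift;
    }
    assert(shift >= 0 && shift <= 31);
    *m0 = (int32_t)q;
    *right_shift = shift;
}

/* Device-side step: integer-only rescale of an int32 accumulator to int8.
 * Computes round(acc * m0 / 2^(31 + right_shift)) + out_zero_point and
 * saturates to [-128, 127]. Assumes >> on a negative int64 is an
 * arithmetic shift (implementation-defined in C, but true on mainstream
 * compilers for Cortex-M). */
int8_t requantize(int32_t acc, int32_t m0, int right_shift, int32_t out_zero_point) {
    assert(right_shift >= 0 && right_shift <= 31);
    int64_t prod = (int64_t)acc * (int64_t)m0;       /* |prod| < 2^62 */
    int total_shift = 31 + right_shift;
    int64_t rounding = (int64_t)1 << (total_shift - 1);
    int64_t scaled = (prod + rounding) >> total_shift; /* round half up */
    int64_t q = scaled + out_zero_point;
    if (q < -128) q = -128;                          /* saturate to int8 */
    if (q > 127) q = 127;
    return (int8_t)q;
}
```

For example, with M = 0.25 the offline step yields m0 = 2^30 and a right shift of 1. An accumulator of 100 then requantizes to 25, or to 35 with an output zero point of 10, and values outside -128..127 saturate.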

Scope of Work

Model architecture selection for MCU resource constraints
TensorFlow Lite for Microcontrollers deployment
INT8 and INT4 post-training quantization
CMSIS-NN kernel optimization for Cortex-M targets
Edge Impulse project development and export
Bare-metal and RTOS inference integration
Flash footprint and RAM profiling
Inference latency benchmarking on target hardware
Production firmware integration and testing
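INT4 quantization halves weight storage relative to INT8 by packing two 4-bit values into each byte. The sketch below packs signed 4-bit weights in the range -8..7 and reads them back with sign extension. The nibble order (even index in the low nibble) is an assumption chosen for illustration. Runtimes and toolchains define their own packed layouts, so a real deployment must match whatever the target kernel expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack n signed 4-bit values (each in -8..7) into dst, two per byte.
 * Hypothetical layout: element 2i goes in the low nibble, element 2i+1
 * in the high nibble. dst must hold at least (n + 1) / 2 bytes. */
void pack_int4(const int8_t *src, size_t n, uint8_t *dst) {
    for (size_t i = 0; i < n; i += 2) {
        assert(src[i] >= -8 && src[i] <= 7);
        uint8_t lo = (uint8_t)src[i] & 0x0F;      /* keep 4-bit two's complement */
        uint8_t hi = 0;
        if (i + 1 < n) {
            assert(src[i + 1] >= -8 && src[i + 1] <= 7);
            hi = (uint8_t)src[i + 1] & 0x0F;
        }
        dst[i / 2] = (uint8_t)(lo | (hi << 4));
    }
}

/* Read element i back as int8_t, sign-extending the 4-bit nibble. */
int8_t unpack_int4(const uint8_t *packed, size_t i) {
    uint8_t nibble = (i % 2 == 0) ? (uint8_t)(packed[i / 2] & 0x0F)
                                  : (uint8_t)(packed[i / 2] >> 4);
    /* (x ^ 8) - 8 maps 0..7 to 0..7 and 8..15 to -8..-1 */
    return (int8_t)((nibble ^ 0x08) - 0x08);
}
```

Unpacking costs a mask or shift plus a sign extension per weight, so INT4 trades some compute for flash. Benchmarking on the target hardware shows whether that trade pays off for a given model.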

Supported Hardware Targets

ESP32-S3

Dual-core Xtensa LX7 @ 240MHz, 8MB PSRAM available, vector instructions for neural-network acceleration

STM32H7

Cortex-M7 @ 480MHz, DTCM RAM, hardware FPU, ART Accelerator

Nordic nRF9160

Cortex-M33, integrated LTE-M/NB-IoT modem, low-power design

STM32U5 / Cortex-M33

Cortex-M33 @ 160MHz, TrustZone, ultra-low-power run modes

RP2040

Dual Cortex-M0+, PIO for sensor interfacing, cost-effective deployment

Technology Stack

TensorFlow Lite for MCUs
Edge Impulse SDK
CMSIS-NN
ONNX Runtime
C / C++
ARM Compiler
FreeRTOS
Zephyr
Python (training)
TVM (optional)


AI Integration for Your Hardware Platform

Discuss your MCU target and application requirements with our engineering team.

