The recent ubiquitous adoption of remote conferencing has been accompanied
by omnipresent frustration with distorted or otherwise unclear voice
communication. Audio enhancement can compensate for low-quality input
signals from, for example, small true wireless earbuds, by applying noise
suppression techniques. Such processing relies on voice activity detection
(VAD) with low latency and the added capability of discriminating the
wearer's voice from others, a task of significant computational complexity.
The tight energy budget of devices as small as modern earphones, however,
requires any system attempting to tackle this problem to do so with minimal
power and processing overhead, while not relying on speaker-specific voice
samples and training due to usability concerns.
This paper presents the design and implementation of a custom research
platform for low-power wireless earbuds based on novel, commercially
available MEMS bone-conduction microphones. Such microphones can record the wearer's
speech with much greater isolation, enabling personalized voice activity
detection and further audio enhancement applications. Furthermore, the
paper evaluates a proposed low-power personalized speech detection
algorithm, based on bone-conduction data and a recurrent neural network
running on the implemented research platform, and compares it to an
approach based on traditional microphone input. The bone-conduction system
detects speech within 12.8 ms at an accuracy of 95%. Different SoC choices are
contrasted; the final implementation, based on the cutting-edge Ambiq
Apollo 4 Blue SoC, achieves 2.64 mW average power consumption at 14 µJ per
inference, reaching 43 h of battery life on a miniature 32 mAh Li-ion cell
without duty cycling.
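
The battery-life figure can be sanity-checked with a short calculation. The sketch below is illustrative only: it assumes a nominal cell voltage of 3.6 V and lossless power conversion, neither of which is stated above, and the function name is hypothetical.

```python
def battery_life_hours(capacity_mah: float, nominal_voltage_v: float,
                       avg_power_mw: float) -> float:
    """Estimate runtime from cell capacity and average power draw.

    Assumes the full rated capacity is usable and ignores regulator
    losses, so the result is an upper-bound style estimate.
    """
    energy_mwh = capacity_mah * nominal_voltage_v  # mAh * V = mWh
    return energy_mwh / avg_power_mw


# 32 mAh cell and 2.64 mW average power are taken from the text;
# the 3.6 V nominal voltage is an assumption.
hours = battery_life_hours(32.0, 3.6, 2.64)
print(f"{hours:.1f} h")
```

Under these assumptions the estimate falls between 43 h and 44 h, in line with the reported 43 h. A different nominal voltage or a non-ideal regulator would shift the result accordingly.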