ST-5: Show and Tell Demo 5
Wed, 6 May, 16:30 - 18:30 (UTC +2)
Location: Exhibition Hall
ST-5.1: Sub-Nyquist DoA Estimation of an Ultrasound Source in a Sector of Interest
Our prototype demonstrates direction-of-arrival (DoA) estimation of an ultrasound source from a limited number of microphone measurements. The demo leverages prior knowledge that the sound source lies within a known sector of interest. In practice, this sector is typically identified from an initial scan with wide beams (as done in this demo) or from known source motion statistics. The scientific challenge is to estimate the DoA of any source within the sector of interest using only a specified number of active microphones. We propose an integer-programming-based approach that optimizes the set of active microphones so that the aliasing artifacts within the sector of interest are minimized. Although the optimized configuration produces stronger aliasing artifacts outside the sector, these out-of-sector artifacts are irrelevant because the source is known to lie within the identified sector.
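The selection step can be pictured with the following minimal Python sketch, which replaces the integer program with a brute-force search over all four-element subsets of a uniform eight-microphone line; the spacing, frequency, and sector bounds are illustrative placeholders rather than the actual testbed parameters.

    # Sketch: choose 4 of 8 microphones to minimize aliasing within a sector of interest.
    # Brute-force search stands in for the integer program; parameters are illustrative.
    import numpy as np
    from itertools import combinations

    c, f = 343.0, 40e3                     # speed of sound (m/s), carrier frequency (Hz)
    lam = c / f
    positions = np.arange(8) * lam         # deliberately coarse spacing -> spatial aliasing

    def steering(pos, theta_deg):
        return np.exp(2j * np.pi * pos * np.sin(np.deg2rad(theta_deg)) / lam)

    sector = np.arange(-10.0, 10.5, 0.5)   # sector of interest (degrees)
    target = 0.0                           # look direction at the sector centre

    best_set, best_peak = None, np.inf
    for subset in combinations(range(8), 4):
        pos = positions[list(subset)]
        a0 = steering(pos, target)
        # normalized matched-filter response over the sector, mainlobe region excluded
        resp = np.array([abs(np.vdot(a0, steering(pos, th))) / pos.size for th in sector])
        peak = resp[np.abs(sector - target) > 2.0].max()
        if peak < best_peak:
            best_set, best_peak = subset, peak

    print("active microphones:", best_set, "worst in-sector alias level:", round(best_peak, 3))

An actual integer-programming solver would replace the exhaustive search when the candidate grid of microphone positions is larger.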
Our live demonstration comprises a 40 kHz ultrasound testbed with a single speaker (transmitter) and an array of eight microphones (receive channels). To make the demo interactive, we will show the ultrasound signal power using the FFTWave Android application. Furthermore, the receive microphone array is mechanically steered (rather than electronically beam-steered) so that visitors can observe how the sector of interest is identified. The received signals at the microphone array are processed on a Teensy 4.1 to estimate inter-microphone phase differences. These phase differences are then used for matched-filter-based DoA estimation. In our experiment, we activate only four of the eight available microphones to demonstrate sub-Nyquist sampling. Using these four measurements, we show on a laptop the matched-filter output of our optimized configuration for DoA estimation. A live plot displays the aliasing artifacts both within and outside the sector of interest. Visitors will also have the opportunity to enter their own choice of active microphones to observe the aliases, and they can visualize how the resolution of the DoA estimate varies with the number of microphones. A video of our setup is available here: https://www.youtube.com/watch?v=-2LHcuZD1S0.
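The matched-filter DoA step itself can be sketched as follows, under the same illustrative assumptions as above: measured inter-microphone phases are correlated against a dictionary of candidate steering phases, and the peak of the correlation gives the estimate, while out-of-sector peaks correspond to the aliases shown in the live plot.

    # Sketch: matched-filter DoA estimation from inter-microphone phase differences.
    # A narrowband 40 kHz source is assumed; geometry and noise level are illustrative.
    import numpy as np

    c, f = 343.0, 40e3
    lam = c / f
    active = np.array([0, 2, 3, 7]) * lam        # positions of the four active microphones (illustrative)

    def phases(theta_deg):
        return 2 * np.pi * active * np.sin(np.deg2rad(theta_deg)) / lam

    true_doa = 4.0                                # ground-truth direction (degrees)
    measured = np.exp(1j * (phases(true_doa) + 0.1 * np.random.randn(active.size)))

    grid = np.arange(-90.0, 90.25, 0.25)          # candidate directions
    score = np.array([abs(np.vdot(np.exp(1j * phases(th)), measured)) for th in grid])
    print("estimated DoA:", grid[np.argmax(score)], "degrees")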
ST-5.2: Photo-Driven Multimodal Conversational AI for Reminiscence-Based Cognitive Training and Longitudinal Cognitive Assessment
We present an interactive multimodal cognitive training and assessment system that leverages photo-driven conversational AI to support cognitive enhancement and early detection of cognitive decline. The proposed demo focuses on reminiscence-based cognitive training using a large-scale historical photo and video database that reflects past lifestyles, everyday objects, cultural contexts, and autobiographical memories. By selecting photos from the database, users engage in natural spoken dialogues generated by a multimodal generative AI model that adaptively tailors questions according to each user’s cognitive level, response context, and prior interaction history.
The framework builds upon reminiscence therapy, a clinically validated cognitive intervention in which narrating personal past experiences stimulates memory, emotion, and communication. During reminiscence-based interaction, recalling and describing past events primarily activates episodic long-term memory, which stores autobiographical experiences associated with temporal and contextual cues. As users reconstruct these memories through dialogue, semantic long-term memory is simultaneously engaged to support conceptual understanding and language processing, while working memory is recruited to maintain conversational flow, comprehend questions, and organize responses. Through this process, multiple cognitive systems—including attention, language generation, executive function, and emotional regulation—are integratively stimulated.
The system integrates visual understanding, automatic speech recognition, natural language generation, and cognitive signal analysis to automatically generate clinically informed questions that stimulate memory recall, attention, language, and executive functions. Unlike conventional repetitive question-and-answer cognitive training, the proposed approach emphasizes personalized, context-aware, and emotionally engaging conversations grounded in autobiographical memory cues and multisensory associations.
To support continuous monitoring, the system quantitatively analyzes linguistic, acoustic, and response-pattern signals extracted from spoken interactions and compares them with historical cognitive assessment records. This enables intuitive visualization of individual cognitive trajectories over time, supports early identification of cognitive decline trends, and provides timely and interpretable feedback for preventive intervention and personalized training adjustment.
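As an illustration of the kind of response-pattern signals involved, the sketch below computes a few per-session linguistic features and a simple trend check over past sessions; the feature set, field names, and trend rule are placeholders rather than the system's actual metrics.

    # Sketch: illustrative per-session features and a simple longitudinal trend check.
    # The feature set and the trend rule are placeholders, not the system's actual metrics.
    import numpy as np

    def session_features(transcript, speech_dur_s, pause_dur_s, latency_s):
        words = transcript.lower().split()
        return {
            "speech_rate_wpm": 60.0 * len(words) / max(speech_dur_s, 1e-6),
            "pause_ratio": pause_dur_s / max(speech_dur_s + pause_dur_s, 1e-6),
            "type_token_ratio": len(set(words)) / max(len(words), 1),
            "response_latency_s": latency_s,
        }

    def declining_trend(history, key, window=5):
        # Fit a line over the most recent sessions; a negative slope only flags a trend
        # for review against the historical record, it is not a diagnosis.
        y = np.array([h[key] for h in history[-window:]], dtype=float)
        return y.size >= window and np.polyfit(np.arange(y.size), y, 1)[0] < 0

    history = [session_features("we used to buy bread at the old market ...", 42.0, 11.0, 1.8)]
    print(history[0], declining_trend(history, "speech_rate_wpm"))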
This work extends our ICASSP 2025 Show and Tell system, which focused on spoken-language-based screening of mild cognitive impairment, toward an active and interactive cognitive training paradigm. The unified integration of reminiscence-based cognitive theory, adaptive multimodal conversational AI, and longitudinal cognitive signal analysis demonstrates how multimodal signal processing can be operationalized in practical, user-centered cognitive healthcare systems beyond laboratory settings.
ST-5.3: Plug-and-Play Latent Diffusion for Ultrasound Inverse Imaging – Show and Tell Demonstration
Pathological tissues exhibit acoustic property variations (e.g., speed of sound and attenuation) relative to healthy tissue. Recovering spatial maps of these properties provides valuable diagnostic information, but ultrasound imaging in the presence of strong acoustic contrasts (e.g., bone vs. soft tissue) becomes a challenging ultrasound inverse problem due to severe reflections and multiple scattering. This demo showcases a physics-guided plug-and-play latent diffusion approach that reconstructs speed of sound maps directly from measured channel data, enabling stable imaging under strong scattering conditions.
The demonstration shows how modern diffusion-based generative priors can be combined with physics-based models to tackle challenging nonlinear inverse problems [1], with direct relevance to medical ultrasound imaging. Our key innovation is physics-guided latent diffusion inference for ultrasound inverse imaging: iterative latent refinement integrates (i) the learned generative prior and (ii) the measurement-consistency constraint from the forward acoustic model, enabling stable reconstructions under severe multiple scattering.
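The iterative refinement can be pictured with the following toy sketch: a placeholder denoising prior alternates with a gradient step toward measurement consistency under a toy linear forward operator. All operators, step sizes, and dimensions are stand-ins for the demo's actual latent diffusion model and acoustic forward solver.

    # Sketch: one possible plug-and-play refinement loop combining a generative prior
    # step with a measurement-consistency step. All quantities are toy stand-ins.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 64))             # toy linear stand-in for the acoustic forward model
    x_true = rng.standard_normal(64)              # toy latent representing the speed-of-sound map
    y = A @ x_true                                # "measured" channel data

    def denoise_prior(z, t):
        # Placeholder for the pretrained latent denoiser evaluated at noise level t.
        return z / (1.0 + 0.02 * t)

    z = rng.standard_normal(64)
    for t in range(50, 0, -1):                    # reverse-time latent refinement
        z = denoise_prior(z, t)                   # (i) generative prior step
        z -= 0.005 * A.T @ (A @ z - y)            # (ii) measurement-consistency step
    print("channel-data residual:", round(float(np.linalg.norm(A @ z - y)), 3))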
The demonstration platform is built on a Verasonics research ultrasound system with 16 individually addressable custom immersion transducers (300 kHz center frequency) arranged in a circular configuration with a 10 cm aperture diameter. The Verasonics platform performs synchronized transmission/reception and stores raw channel data for reconstruction. To provide intuitive insight into the measurement process, the platform additionally displays example received waveforms in real time. We scan tissue-mimicking phantoms containing bone-like structures and edema-like inclusions and reconstruct speed-of-sound maps from the time-domain channel measurements. Attendees can interact with a custom GUI to (1) visualize raw channel data, calibration, and preprocessing, and (2) run reconstructions and compare them with baseline methods.
Reference:
[1] R. Guo, Y. Zhang, Y. Kvich, T. Huang, M. Li, and Y. C. Eldar, “Plug-and-Play Latent Diffusion for Electromagnetic Inverse Scattering with Application to Brain Imaging,” arXiv preprint arXiv:2509.04860, 2025.
ST-5.4: Full Wave Inversion for Pulse-Echo Ultrasound Linear Arrays
Acoustic tissue parameters such as the speed of sound (SoS) and attenuation can serve as localized biomarkers for pathological tissue conditions such as fibrosis, tumors, and inflammation. Changes in these parameters relative to their counterparts in healthy tissue can provide diagnostic information beyond structural imaging. However, conventional clinical pulse-echo systems use B-mode imaging, which captures qualitative anatomical structure but cannot quantitatively map the underlying acoustic properties throughout the medium. In the proposed demonstration, we show for the first time how local acoustic properties can be inferred from the raw data of a commercial pulse-echo ultrasound system by carrying out a physical full wave inversion (FWI).
The demonstration is based on an efficient FWI algorithmic framework that enables SoS reconstruction from the raw channel data sensed by a linear-array ultrasound probe. Specifically, it will be carried out on the ArtUs portable commercial research ultrasound system, manufactured by Telemed UAB, allowing attendees to scan phantoms and get the look and feel of standard beamformed images. In parallel, the raw channel data will be collected, and a custom GUI will be used to visualize and demonstrate the algorithmic steps leading to the reconstruction of the SoS maps. The reconstruction itself is an extremely ill-posed inverse problem, which we solve using regularization, ADMM optimization, and efficient computational frameworks for both inversion and calibration.
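The optimization backbone can be illustrated with a small ADMM sketch on a toy linearized problem: an l1-regularized least-squares objective split into a least-squares update, a soft-thresholding update, and a dual update. The operator, problem sizes, and weights below are illustrative and stand in for the full wave model and the calibration stage.

    # Sketch: ADMM for a regularized inversion, with a toy linearized forward operator
    # standing in for the full wave model. Sizes and weights are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 100, 60
    A = rng.standard_normal((m, n))                # toy linearized forward operator (stand-in)
    x_true = np.zeros(n); x_true[40:55] = 0.05     # sparse slowness perturbation
    y = A @ x_true + 0.01 * rng.standard_normal(m)

    # minimize 0.5*||Ax - y||^2 + lam*||z||_1  subject to  x = z   (ADMM splitting)
    lam, rho = 0.1, 1.0
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    lhs, Aty = A.T @ A + rho * np.eye(n), A.T @ y
    for _ in range(200):
        x = np.linalg.solve(lhs, Aty + rho * (z - u))                      # x-update
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)    # soft threshold
        u += x - z                                                         # dual update
    print("relative reconstruction error:",
          round(float(np.linalg.norm(z - x_true) / np.linalg.norm(x_true)), 3))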
The demonstration serves as an initial guide and proof of concept for FWI of pulse-echo ultrasound data; we plan to generalize it to clinical scenarios by incorporating model-based AI into the FWI framework. We will allow attendees to scan phantoms mimicking clinical targets using B-mode imaging, thus demonstrating some of the challenges in bringing inverse imaging to the clinic.
ST-5.5: From Wearables to Generative Insight: A Multimodal Framework for AI-Augmented Cardiology Assistance with Single-lead Electrocardiogram
We present an interactive demonstration of an edge-native Cardiac Clinical Guidance System (CCGS) that translates raw ECG signals into real-time, personalized clinical insights, exemplifying a “Signal-to-Semantics” paradigm. Using a Polar H10 wearable, the system performs real-time signal processing for artefact suppression and feature extraction, enabling accurate detection of cardiac arrhythmias and other cardiac anomalies. These signals are fused with multimodal patient data, including symptoms (e.g., fatigue, fever, chest pain), blood reports (blood glucose, complete blood count, cholesterol), blood pressure measurements, imaging (e.g., echocardiogram), other reports such as treadmill tests, and personal and family history of cardiac disease, to generate a comprehensive cardiac risk profile incorporating standard clinical scores (ASCVD, QRISK3, Framingham, CHA₂DS₂-VASc). BioMistral-7B, a domain-specific medical small language model (MSLM), then produces tailored guidance, including lifestyle advice, medication suggestions, and diagnostic recommendations. The system runs on an Android smartphone interface with GPU-accelerated inference for responsive performance. Designed as a clinical companion rather than a replacement for medical professionals, the CCGS supports personalized care and informed decision-making.
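As an example of the standard scores involved, CHA₂DS₂-VASc reduces to a simple point sum over patient attributes; the sketch below uses illustrative field names, while the other scores and the fusion with ECG-derived features follow their published formulas and are omitted here.

    # Sketch: the CHA2DS2-VASc point sum, one of the standard scores listed above.
    # Field names are illustrative; the demo's actual patient record schema may differ.
    def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                     stroke_or_tia, vascular_disease):
        score = 0
        score += 2 if age >= 75 else (1 if age >= 65 else 0)
        score += 1 if female else 0
        score += 1 if chf else 0
        score += 1 if hypertension else 0
        score += 1 if diabetes else 0
        score += 2 if stroke_or_tia else 0
        score += 1 if vascular_disease else 0
        return score

    print(cha2ds2_vasc(age=68, female=True, chf=False, hypertension=True,
                       diabetes=False, stroke_or_tia=False, vascular_disease=False))  # -> 3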
Novelty and Impact:
This demonstration features a novel integration of biosignal processing with generative AI, showing how signal analysis can serve as the foundation for semantic-level clinical reasoning. It highlights how signal processing and established risk scoring enable a small language model to produce actionable health insights. The approach integrates signal processing and generative AI for intelligent clinical guidance and demonstrates the critical role of signal processing in real-world healthcare AI applications.
Impact to Signal Processing Community:
By bridging low-level physiological data with high-level semantic inference, our work positions signal processing as a key enabler of context-aware AI systems. It showcases new opportunities for signal processing professionals to contribute to intelligent, conversational AI-driven smart healthcare solutions that yield real-time, clinically meaningful guidance and support.
Interactive Demo:
Attendees will engage directly with the system by wearing a Polar H10 chest strap to capture their ECG signals. The system will display real-time cardiac analytics and generate anonymized clinical reports, accessible either on participants’ Android smartphones via a companion application or on a dedicated demo tablet provided at the venue.
ST-5.6: Size Doesn’t Matter: Interactive Acoustic Imaging System Design Using a 1024-Channel Ultrasound Array
As a research lab specializing in in-air ultrasound, we present an interactive demonstration of our latest sensor technology, showcasing our expertise in this field. We propose a live demonstration of our HiRIS sensor, a massive 1024-channel in-air ultrasonic array [10.1109/ACCESS.2024.3385232]. While its dense aperture offers artifact-free imaging with high dynamic range, this exhibit highlights its broader contribution as a reconfigurable validation platform for array signal processing. By treating the 1024 elements as a dense candidate grid, the system can function as a virtual, programmable aperture. This allows the experimental validation of arbitrary sparse array geometries and beamforming pipelines using real-world acoustic data. Typically, validating these geometries relies heavily on simulation, which often fails to capture real-world performance.
The setup of our demonstrator consists of the HiRIS sensor facing a set of closely positioned, adjustable acoustic sources. Visitors will interact via a computer interface (e.g., a tablet) to define the virtual array geometry by selecting from standard topologies (e.g., regular grid, spiral, Poisson-disk sampling, random, concentric) or drawing custom patterns. In addition, users can choose among different beamforming and imaging algorithms, such as Delay-and-Sum, Delay-Multiply-and-Sum, MUSIC, and MVDR, enabling a direct exploration of how array geometry and processing method jointly influence performance. The system will compute the resulting beam patterns on the spot using our fast signal-processing pipeline. This allows a direct comparison of spatial resolution, peak-to-sidelobe ratio, and grating-lobe suppression against the ground truth of the full 1024-element array, which will also be visualized for the visitor.
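A minimal far-field, narrowband sketch of this comparison is given below: the delay-and-sum beam pattern of a user-selected sub-aperture is evaluated against the full 32 x 32 reference. The half-wavelength grid, frequency, and subset size are illustrative and do not reflect the HiRIS hardware parameters.

    # Sketch: narrowband far-field beam pattern of a selected sub-aperture versus the
    # full array under delay-and-sum weighting. Geometry and frequency are illustrative.
    import numpy as np

    c, f = 343.0, 40e3
    lam = c / f
    nx = ny = 32                                    # 32 x 32 = 1024 elements
    gx, gy = np.meshgrid(np.arange(nx), np.arange(ny))
    pos = np.stack([gx.ravel(), gy.ravel()], axis=1) * (lam / 2)   # half-wavelength grid

    def beam_pattern_db(elements, az_deg):
        # Delay-and-sum response steered to broadside, swept over azimuth.
        k = 2 * np.pi / lam
        phase = k * np.outer(np.sin(np.deg2rad(az_deg)), pos[elements, 0])
        resp = np.abs(np.exp(1j * phase).sum(axis=1)) / len(elements)
        return 20 * np.log10(np.maximum(resp, 1e-6))

    angles = np.linspace(-90, 90, 721)
    full = beam_pattern_db(np.arange(nx * ny), angles)
    subset = np.random.default_rng(2).choice(nx * ny, size=128, replace=False)  # e.g. a random 128-element design
    sparse = beam_pattern_db(subset, angles)
    mask = np.abs(angles) > 5                       # exclude the mainlobe region
    print("peak sidelobe (dB)  full:", round(full[mask].max(), 1),
          " sparse:", round(sparse[mask].max(), 1))

In the demo itself, the comparison is carried out on measured data from the 1024-channel array rather than on this idealized plane-wave model.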
To encourage visitor engagement, we introduce an optimization challenge: participants must design a geometry that maximizes source separation while minimizing the active channel count, optionally leveraging advanced beamforming methods to compensate for sparsity. This interactive benchmark effectively illustrates the fundamental trade-offs between array density, algorithmic complexity, and imaging quality. A live leaderboard will track the most efficient designs, encouraging participants to find the best solutions in terms of both hardware sparsity and imaging fidelity.
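One possible scoring rule for such a leaderboard, rewarding source separation while penalizing the active channel count, is sketched below; the weighting is purely illustrative and is not the metric used in the demo.

    # Sketch: one possible leaderboard score trading off imaging fidelity against the
    # number of active channels. The weighting below is illustrative only.
    def leaderboard_score(source_separation_db, n_active, n_total=1024, weight=0.5):
        # Higher separation (in dB between resolved source peaks and the background)
        # is better; fewer active channels is better.
        sparsity_bonus = 1.0 - n_active / n_total
        return source_separation_db + weight * 100.0 * sparsity_bonus

    print(leaderboard_score(source_separation_db=18.0, n_active=96))    # sparse design
    print(leaderboard_score(source_separation_db=22.0, n_active=1024))  # full array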