DEMO-2B: Show and Tell Demos II-B
Thu, 18 Apr, 13:10 - 15:10 (UTC +9)
Location: Hall D2: Podium Pitch Room B
DEMO-2B.1: Model-Based Deep Learning Real Time Quantitative Ultrasound for Thyroid Nodules Assessment
In this demo, we present Model-Based Quantitative Ultrasound (MB-QUS), a model-based neural network solution that reconstructs real-time maps of quantitative physical properties (QPPs) from ultrasound (US) signals [1]. The signals are extracted from a US scan of a thyroid phantom to visualize tissue QPPs.
Thyroid nodules - abnormal growths of thyroid cells that form lumps within the thyroid gland - are common incidental findings. While most of these nodules are benign, a small proportion contains thyroid cancer.
Traditional B-mode ultrasound imaging is a key tool for thyroid nodule evaluation, as it is non-invasive and radiation-free. However, distinguishing malignant from benign nodules in US is difficult, resulting in many unnecessary biopsies. Quantitative US (QUS) imaging can overcome this by mapping various QPPs of the scanned tissue, which are indicators of tissue microarchitecture.
Full Waveform Inversion (FWI) is one approach to QUS imaging. FWI is an optimization method that estimates the QPPs of the medium by iteratively minimizing the difference between measured channel data and data predicted by a wave simulator. However, FWI is time-consuming and often fails to converge, yielding unsatisfactory results. In this demo, we introduce MB-QUS, a model-based deep learning approach to QUS. Using a shallow U-net architecture, the method learns the gradients of the loss within the FWI algorithm and then applies a single gradient descent step followed by a ReLU activation to reconstruct the QPPs in real time and in complex scenarios.
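As a rough illustration of this unrolled update, the sketch below applies one learned-gradient step with a ReLU projection; the network, tensor shapes, and step size are placeholders and not the actual MB-QUS implementation.

```python
# Minimal sketch of one learned-gradient FWI update (hypothetical shapes and names;
# the actual MB-QUS shallow U-net and step size are not specified in the abstract).
import torch
import torch.nn as nn

class ShallowGradientNet(nn.Module):
    """Stand-in for the shallow U-net that predicts the FWI loss gradient."""
    def __init__(self, channels=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, qpp_map, data_residual):
        # Concatenate the current QPP estimate with the measured-vs-simulated residual.
        return self.net(torch.cat([qpp_map, data_residual], dim=1))

def mb_qus_step(qpp_map, data_residual, grad_net, step_size=0.1):
    """One model-based update: gradient descent step followed by a ReLU activation."""
    learned_grad = grad_net(qpp_map, data_residual)
    return torch.relu(qpp_map - step_size * learned_grad)

grad_net = ShallowGradientNet()
qpp = torch.rand(1, 1, 64, 64)        # e.g. a speed-of-sound map estimate
residual = torch.randn(1, 1, 64, 64)  # channel-data mismatch projected onto the image grid
qpp_next = mb_qus_step(qpp, residual, grad_net)
```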
Our demo platform consists of a thyroid phantom containing thyroid nodules and a Verasonics US system for scanning the phantom and extracting US signal samples. The purpose of this demo is to show, in a clinical setting, the advantage of using model-based deep learning for medical applications, and in particular for efficiently solving physics-based wave equations.
DEMO-2B.2: Non-Contact Spirometry Test Using mmWave FMCW Radar
Spirometry is the most commonly used pulmonary function test. It measures the volume and flow of inhaled and exhaled air, enabling physicians to diagnose a range of pulmonary pathologies such as asthma, COPD, and pulmonary fibrosis, and it can also be used to check lung function before surgery. However, it requires a mouthpiece, a nose clip, operator expertise, training, patient cooperation, and motor coordination, all of which limit its use, especially with children. Over the past decade, substantial effort has been invested in developing advanced and alternative ways of measuring lung function. However, these methods are bulky and obtrusive and require uncomfortable contact with the patient’s body. By exploiting polynomial modeling and sparse recovery techniques, we show that radar can be used as a remote, non-contact tool for assessing pulmonary function, circumventing the technical challenges of spirometry tests.
Because this is an innovative tool in the field, there is a lack of data available for algorithm development, since collecting it would require human trials combining spirometers and radar. To this end, we have developed a dedicated phantom that simulates spirometry maneuvers to support algorithm development. A further advantage is that the phantom enables enlarging the data sets used to train the algorithm.
Our demonstration platform consists of a vibration generator for producing mechanical thoracic displacements based on realistic spirometry maneuvers, a flat circular metal plate (24 cm diameter), a TI IWR1443 mmWave radar sensor, and a dedicated experimental setup. Through a dedicated GUI, both on-site and online attendees can observe the contact-free extraction of spirometry parameters based solely on the mechanical displacements.
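For intuition only, the sketch below converts an FMCW phase track into a displacement waveform and differentiates a polynomial fit to obtain a flow-like signal; the wavelength, chirp rate, and synthetic motion are assumptions, and this is not the demonstrated algorithm.

```python
# Illustrative sketch (not the authors' algorithm): recovering a displacement
# waveform from FMCW radar phase and fitting a polynomial to estimate flow.
import numpy as np

fs = 100.0                      # slow-time (chirp) rate in Hz, assumed
wavelength = 0.0039             # ~77 GHz mmWave wavelength in metres
t = np.arange(0, 10, 1 / fs)

# Phase of the range bin containing the chest/plate (here: synthetic breathing motion).
displacement_true = 0.002 * np.sin(2 * np.pi * 0.25 * t)      # 2 mm chest displacement
phase = 4 * np.pi * displacement_true / wavelength + 0.05 * np.random.randn(t.size)

displacement = np.unwrap(phase) * wavelength / (4 * np.pi)    # back to metres

# Smooth with a low-order polynomial and differentiate to obtain a flow-like
# signal (an actual volume estimate would need a calibrated scale factor).
coeffs = np.polyfit(t, displacement, deg=9)
flow_like = np.polyval(np.polyder(coeffs), t)
print("peak displacement [mm]:", 1e3 * np.abs(displacement).max())
```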
DEMO-2B.3: TherapyView: Visualizing Therapy Sessions with Temporal Topic Modeling and AI-Generated Arts
We present TherapyView, a demonstration system that helps therapists visualize the dynamic content of past and current treatment sessions. It is enabled by state-of-the-art neural topic modeling techniques that analyze the topical tendencies of various psychiatric conditions and by a deep learning-based image generation engine that provides a visual summary in real time. The system incorporates temporal modeling to provide a time-series representation of topic similarities at turn-level resolution, together with AI-generated artworks conditioned on dialogue segments that concisely represent the content covered in the session, offering interpretable insights for therapists to optimize their strategies and enhance the effectiveness of psychotherapy. Evaluated in both online and offline settings on a large-scale psychotherapy dataset, this interactive data visualization system provides a first effective proof of concept for AI-augmented therapy tools with real-time processing of the speech and linguistic features of doctor-patient interactions, as well as an in-depth understanding of the patient's mental state, enabling more effective treatment in critical clinical settings.
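As a rough illustration of turn-level topic tracking, the sketch below uses a classical NMF topic model as a stand-in for the neural topic models used in TherapyView; the example turns and component count are invented.

```python
# Hedged sketch of turn-level temporal topic tracking (NMF as a stand-in;
# TherapyView's actual neural topic models are not shown here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

turns = [
    "I have been feeling anxious about work deadlines.",
    "The anxiety gets worse at night and I cannot sleep.",
    "Let's talk about the sleep routine you mentioned last week.",
    "I tried the breathing exercise and it helped a little.",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(turns)

topic_model = NMF(n_components=2, random_state=0)
turn_topics = topic_model.fit_transform(X)          # topic mixture per turn

# Similarity of each turn's topic mixture to the previous turn: a simple
# time series a therapist could inspect for topical shifts.
similarities = [
    cosine_similarity(turn_topics[i:i + 1], turn_topics[i - 1:i])[0, 0]
    for i in range(1, len(turns))
]
print(similarities)
```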
DEMO-2B.4: Harnessing Adaptive TEM for ECG: A Sub-Nyquist Approach to Heart Rate Monitoring
Time Encoding Machines (TEMs) offer an innovative approach to signal measurement. The conventional TEM, known as an integrate-and-fire time encoding machine, processes an input signal by adding a pre-determined bias that keeps the signal positive. It then records the instant at which the integral of the biased signal, accumulated since the previous firing instant, reaches a given threshold. This system requires pre-selecting parameters tailored to the specific input signal to ensure accurate recovery. Previous research using this method has successfully reconstructed a broad range of signals, including bandlimited (BL) and finite-rate-of-innovation (FRI) signals, while highlighting its power efficiency and independence from a global clock.
However, the model requires the bias to exceed the signal's maximum absolute value to ensure positivity, so it must be adjusted for the signal at hand. Moreover, when the signal amplitude exceeds the minimum threshold, a high bias can degrade performance because it shortens the time intervals between recorded instants. To overcome these limitations, we introduce an adaptive TEM approach. This method dynamically adjusts the bias level during measurement, eliminating the need for signal-specific bias settings and reducing the bias in regions with higher signal values to enhance overall recovery quality. In our demonstration, we will present a specialized board developed in our lab, designed to exhibit the capabilities of the adaptive TEM. The system demonstrates real-time sampling and reconstruction of signals with enhanced efficiency and no need for manual setup. The key to its improved performance is the variable bias applied during the measurement process, a feature integral to the adaptive TEM's design.
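The toy simulation below illustrates the firing mechanism and an adaptive-bias variant; the bias rule, threshold, and test signal are assumptions for illustration and do not reflect the board's actual circuitry.

```python
# Minimal simulation sketch of an integrate-and-fire TEM with an adaptive bias
# (illustrative only; the lab's hardware and bias-adaptation rule are not shown here).
import numpy as np

def if_tem(signal, dt, bias, threshold):
    """Record firing times where the integral of (signal + bias) crosses the threshold."""
    integral, times = 0.0, []
    for n, x in enumerate(signal):
        integral += (x + bias[n]) * dt
        if integral >= threshold:
            times.append(n * dt)
            integral -= threshold          # reset by subtracting the threshold
    return np.array(times)

fs = 10_000.0
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

fixed_bias = np.full_like(x, 2.0)                   # must exceed max |x|
adaptive_bias = 2.0 - 1.5 * np.clip(x, 0, None)     # assumed rule: lower bias where x is large

spikes_fixed = if_tem(x, 1 / fs, fixed_bias, threshold=0.02)
spikes_adaptive = if_tem(x, 1 / fs, adaptive_bias, threshold=0.02)
print(len(spikes_fixed), len(spikes_adaptive))
```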
DEMO-2B.5: Integrated Access and Backhauling in 5G-NTN using OpenAirInterface5G
Recently, integrated access and backhaul (IAB) has been standardized by 3GPP as part of 5G-NR networks. The key concept of IAB is to multiplex the 5G access link with the backhaul in time, frequency, and space. However, the current standardization defines IAB only for terrestrial networks (TN), while non-terrestrial networks (NTN) have yet to be considered in such standardization efforts. While there is a plethora of theoretical work on 5G-NTN IAB, the practical aspects are often neglected. It is desirable to have tools with which researchers can develop and test the efficiency of 5G-NTN IAB before performing live satellite tests. To this end, a proof-of-concept demonstrator has been developed using OpenAirInterface5G, an open-source Software Defined Radio implementation of 3GPP Release-17-compliant 5G-NTN. It enables direct access to 5G services via GEO and LEO satellite links.

This demo showcases the functionality of multiplexing radio access and backhauling on the same frequency band via a GEO satellite. One on-ground NTN-gNB associated with a ground-based 5G Core Network (CN) provides access services to a remote NTN-UE via a GEO satellite (emulated using a satellite channel emulator) at frequency F. The remote NTN-UE further provides an IP tunnel connection to a collocated ground-based gNB termed TN-gNB. This IP tunnel, in effect, connects TN-gNB to the same 5G-CN, thus acting as a backhaul for TN-gNB. Finally, another ground-based UE, termed TN-UE, connects to TN-gNB and receives a data connection via the same remote 5G-CN. In summary, two remote UEs are served by the same 5G-CN, demonstrating the multiplexing of radio access and backhaul. Note that the NTN-gNB/UE uses OpenAirInterface 5G-NTN, while the TN-gNB/UE uses terrestrial 5G. At ICASSP 2023, we successfully demonstrated the radio-access functionality of OpenAirInterface5G-NTN, and the current work is the next step, showcasing recent developments in this field of research.
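For orientation only, the sketch below encodes the demo's connectivity chain and shows that both UEs reach the same 5G-CN through the shared GEO link; it is not an OpenAirInterface5G configuration and the node names simply mirror the abstract.

```python
# Purely illustrative topology sketch of the demo setup (node names from the abstract;
# no OpenAirInterface5G configuration is shown or implied here).
links = {
    "TN-UE":   "TN-gNB",
    "TN-gNB":  "NTN-UE",          # IP tunnel used as backhaul
    "NTN-UE":  "GEO-emulator",
    "GEO-emulator": "NTN-gNB",
    "NTN-gNB": "5G-CN",
}

def path_to_core(node):
    """Follow the uplink chain until the ground-based 5G Core Network is reached."""
    hops = [node]
    while hops[-1] != "5G-CN":
        hops.append(links[hops[-1]])
    return hops

print(" -> ".join(path_to_core("NTN-UE")))   # access path over the GEO satellite
print(" -> ".join(path_to_core("TN-UE")))    # same GEO link reused as backhaul
```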
DEMO-2B.6: Biometric Authentication using Surface Electromyographic Signals
As technology advances, privacy concerns, including security breaches in biometric authentication systems, have increased. Current systems, relying on fingerprint, iris, or face scans, are susceptible to spoofing attacks using fake biometric data. This includes 3D models of fingerprints and forged images or videos for facial recognition. Cost concerns for more secure biometrics like iris scans also persist.
Surface electromyography (sEMG) biometric authentication is a cost-efficient and non-invasive technique that is considered far more robust and difficult to forge than other biometrics. This is because, unlike static and unique biometrics such as fingerprints and facial features, sEMG is dynamic and acquired in real time, so replicating a specific muscle activation pattern is very challenging. Another unique feature of sEMG is that it also provides liveness detection: involuntary or forced muscle movements do not produce the sEMG signals required to authenticate the user and would most likely produce noise-like patterns. Forced or involuntary authentication attempts are therefore rejected.
To demonstrate this research idea, we have prepared a low-cost prototype. The prototype contains modules for signal acquisition, signal processing, and user authentication based on a k-nearest neighbors (KNN) machine learning model. It features a UI that lets users register their details, after which they are prompted to record their sEMG for a specific gesture five times using the signal acquisition hardware module. The system then applies advanced signal processing techniques: unique features of the acquired signals are extracted and combined into a feature vector for each user. These feature vectors are used to train the ML model, which learns to associate each feature vector with its username. The trained model can then identify a user from a new set of sEMG signals.
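As a rough sketch of this pipeline, the example below extracts a few common time-domain sEMG features and enrols two synthetic users in a KNN classifier; the features, window length, and data are illustrative and not the prototype's exact processing chain.

```python
# Minimal sketch of sEMG feature extraction and KNN identification (illustrative
# features and synthetic signals; the prototype's exact pipeline is not detailed here).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def emg_features(window):
    """Common time-domain sEMG features: MAV, RMS, waveform length, zero crossings."""
    return np.array([
        np.mean(np.abs(window)),                 # mean absolute value
        np.sqrt(np.mean(window ** 2)),           # root mean square
        np.sum(np.abs(np.diff(window))),         # waveform length
        np.sum(np.diff(np.sign(window)) != 0),   # zero crossings
    ])

rng = np.random.default_rng(0)
users = ["alice", "bob"]
X, y = [], []
for label, user in enumerate(users):
    for _ in range(5):                                              # five enrolment gestures per user
        window = rng.normal(scale=0.2 + 0.3 * label, size=1000)     # synthetic sEMG window
        X.append(emg_features(window))
        y.append(user)

clf = KNeighborsClassifier(n_neighbors=3).fit(np.array(X), y)
new_window = rng.normal(scale=0.5, size=1000)
print("identified as:", clf.predict(emg_features(new_window).reshape(1, -1))[0])
```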
DEMO-2B.7: Wideband transparency with hearing protection and spatial cue preservation
Elevear presents a real-time demonstrator which showcases wideband transparency in True Wireless Stereo (TWS) headphones and is designed to provide hearing protection while preserving spatial cues.
Existing headphones often fail to mitigate the discomfort caused by loud sounds. Our solution efficiently manages sudden as well as sustained loud noises (> 80 dB SPL) while preserving spatial integrity. We also tackle the occlusion effect, common in hearables, which leads to muffled sound perception, akin to speaking underwater. This effect, which varies among individuals, occurs when the ear canal is obstructed and results in an amplification of internal bone-conducted sounds. Our technology creates a compensation signal that cancels the occlusion effect and improves the naturalness of sound. The algorithms run on a specialized digital signal processor connected to commercially available headphones, giving direct access to both the speakers and the microphones in the headphones.
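To give a flavour of level-dependent protection, the toy sketch below attenuates audio blocks whose estimated level exceeds a threshold; the calibration constant and limiter rule are invented for illustration and are not Elevear's algorithm.

```python
# Toy sketch of level-dependent attenuation for hearing protection (a simple block
# limiter; Elevear's adaptive algorithm and occlusion filter are not reproduced here).
import numpy as np

def protect(block, fs, threshold_db=80.0, calib_db_fs_to_spl=100.0):
    """Attenuate a block whose estimated SPL exceeds the protection threshold."""
    rms = np.sqrt(np.mean(block ** 2)) + 1e-12
    level_spl = 20 * np.log10(rms) + calib_db_fs_to_spl   # assumed mic calibration offset
    if level_spl <= threshold_db:
        return block                                       # transparency: pass through
    gain_db = threshold_db - level_spl                     # bring the level back to the threshold
    return block * 10 ** (gain_db / 20)

fs = 48_000
t = np.arange(0, 0.02, 1 / fs)
loud_block = 0.5 * np.sin(2 * np.pi * 1000 * t)            # a loud 1 kHz tone block
print(protect(loud_block, fs).max())
```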
The novelty of our research lies in the elevated level of naturalness and enriched user experience, which are highlighted by natural directional hearing, a balanced open-ear sensation at comfortable loudness levels, and reliable protection against loud environmental noises.
We provide the community with an interactive demo, allowing users to toggle between various modes in active noise cancellation headphones. Attendees can explore our adaptive hearing protection technology, occlusion cancellation, and other proposed enhancements. We aim to facilitate an open discussion about current limitations affecting the ideal user experience.
On-site, we will bring our real-time demo kit. Attendees will use headphones to listen to ambient sounds while being exposed to different loud noises in their surroundings. They can then evaluate the efficiency of Elevear’s adaptive hearing protection and occlusion effect cancellation.
DEMO-2B.8: Context-aware User Preference Learning Demonstrator for Multi-modal Hearing Aids
Since the rise of deep learning (DL), speech enhancement (SE) models have excelled in diverse noise conditions. Yet they may introduce sonic artifacts, sound unnatural, and limit the audibility of ambient sounds. Hearing aid (HA) users desire customized SE systems that align with personal preferences. Our demo presents a context-aware preference learning SE (PLSE) model for future multi-modal HAs. Using fuzzy inference and deep neural networks, it leverages user preferences to enhance sound quality. The system estimates the signal-to-noise ratio (SNR) and predicts the acoustic scene, integrating these inferences to determine the target SNR for an audio-visual SE (AVSE) system. The personalized model rivals the state of the art (SOTA) by reducing noise in challenging conditions and scaling the SE output for individualized HA experiences. Subjective results show significant improvement over non-individualized SE models.
The demo involves distinct training and testing phases. In training, users interact with a setup based on a Raspberry Pi, a monitor, and an Nvidia GPU. The testing phase employs a fuzzy inference model for dynamic transitions between camera and microphone inputs, aligned with SE preference profiles. Profiles come from pre-trained or custom models. Our interactive demonstrator adjusts the AVSE model's SNR based on joint SSNR and acoustic scene prediction. Users observe preference generalization across diverse scenes using the GRID-CHIME3 dataset, visualizing the current scene post-session through a low-dimensional feature representation.
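As a hedged illustration of how fuzzy inference might map an estimated SNR and a predicted scene to a target SNR, the sketch below uses invented membership functions and rules; it is not the trained PLSE model.

```python
# Illustrative fuzzy rule base mapping estimated SNR and predicted acoustic scene to a
# target SNR for the AVSE system (memberships, rules, and scene boosts are assumptions).
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def target_snr(estimated_snr_db, scene):
    low = tri(estimated_snr_db, -10, -5, 5)      # degree to which the input SNR is "low"
    high = tri(estimated_snr_db, 0, 10, 25)      # degree to which the input SNR is "high"
    # Scene-dependent preference (assumed): more enhancement in busy scenes.
    scene_boost = {"street": 8.0, "cafe": 6.0, "home": 3.0}.get(scene, 5.0)
    # Weighted (Sugeno-style) combination of two rules:
    #   IF SNR is low  THEN target = estimated + scene_boost
    #   IF SNR is high THEN target = estimated + 1 dB (keep ambient sound audible)
    w = low + high + 1e-9
    return (low * (estimated_snr_db + scene_boost) + high * (estimated_snr_db + 1.0)) / w

print(target_snr(-3.0, "street"))   # noisy street: larger target gain
print(target_snr(12.0, "home"))     # quiet home: near-transparent
```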
DEMO-2B.9: Towards Low-Energy Low-Latency Multimodal Open Master Hearing Aid
The demo will showcase an innovative two-point-neuron-inspired audio-visual (AV) open Master Hearing Aid (openMHA) framework for on-chip energy-efficient speech enhancement (SE). The developed system is compared against state-of-the-art cepstrum-based audio-only (A-only) SE and conventional point-neuron-inspired deep neural network (DNN)-driven multimodal (MM) SE models. Pilot experiments [1] demonstrate that the proposed system outperforms audio-only SE in terms of speech quality and intelligibility and performs comparably to a conventional point-neuron-inspired DNN-based SE model with significantly reduced energy consumption during both training and inference. In addition, the demo will showcase a comparative evaluation against a number of state-of-the-art baseline AV SE models, both with and without openMHA integration, in terms of their streaming SE performance (including audible speech quality and intelligibility gains), hearing-aid latency, and energy consumption trade-offs. In particular, three widely used baseline AV SE models ([2], [3], [4]) will be comparatively evaluated and demonstrated for evaluation and feedback by hearing and speech researchers, innovators, and industry experts at ICASSP.
References
[1] A. Adeel, A. Adetomi, K. Ahmed, A. Hussain, T. Arslan, and W. A. Phillips, "Unlocking the potential of two-point cells for energy-efficient and resilient training of deep nets," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 7, no. 3, pp. 818-828, 2023.
[2] M. Gogate, K. Dashtipour, A. Adeel, and A. Hussain, "CochleaNet: A robust language-independent audio-visual model for real-time speech enhancement," Information Fusion, vol. 63, pp. 273-285, 2020.
[3] T. Hussain, M. Gogate, K. Dashtipour, and A. Hussain, "Towards intelligibility-oriented audio-visual speech enhancement," arXiv preprint arXiv:2111.09642, 2021.
[4] B. Shi, A. Mohamed, and W.-N. Hsu, "Learning lip-based audio-visual speaker embeddings with AV-HuBERT," arXiv preprint arXiv:2205.07180, 2022.