Tutorials
Presented by: Fei Chen, Yu Tsao
Part I: Fundamentals of objective speech intelligibility and quality assessment
- Background of speech assessment and objective speech assessment metrics
- Methodology of existing objective speech assessment metrics
- Approaches to improve the power of objective speech assessment metrics
- Progress in developing non-intrusive speech assessment metrics
- Design of speech assessment metrics for special listening conditions (noise-suppression processing with nonlinear distortions) and specific listeners (hearing-impaired listeners)
Part II: Deep learning-based assessment metrics and their applications
- Traditional assessment metrics and their limitations
- The advantages of deep learning-based assessment metrics
- Deep learning-based assessment metrics with advanced input, model architectures, and training criteria
- Well-known datasets and deep learning-based assessment metrics
- Applying deep learning-based assessment metrics to speech signal processing systems
Presented by: Laura Balzano, Qing Qu, Peng Wang, Zhihui Zhu
The Neural Collapse phenomenon has garnered significant attention in both practical and theoretical fields of deep learning, as is evident from the extensive research on the topic. The presenters' own works have made key contributions to this body of research. Below is a summary of the tutorial outline. The first half focuses on the structure of the representations appearing in the last layer; the second half generalizes the study to intermediate layers.
1. Prevalence of Neural Collapse & Global Optimality
The tutorial starts by introducing the Neural Collapse phenomenon in the last layer and its universality in deep network training, and lays out the mathematical foundations for understanding its cause based upon the simplified unconstrained feature model (UFM). We then generalize and explain this phenomenon and its implications under data imbalance.
2. Optimization Theory of Neural Collapse
We provide a rigorous explanation of the emergence of Neural Collapse from an optimization perspective and demonstrate its impacts on algorithmic choices, drawing on recent works. Specifically, we conduct a global landscape analysis under the UFM to show that benign landscapes are prevalent across various loss functions and problem formulations. Furthermore, we demonstrate the practical algorithmic implications of Neural Collapse on training deep neural networks.
3. Progressive Data Compression & Separation Across Intermediate Layers
We open the black box of deep representation learning by introducing a law that governs how real-world deep neural networks separate data according to their class membership from the bottom layers to the top layers. We show that each layer roughly improves a certain measure of data separation by an equal multiplicative factor. We demonstrate its universality by showing its prevalence across different network architectures, datasets, and training losses.
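To make "a certain measure of data separation" concrete, below is a minimal NumPy sketch of one commonly used separation measure (the within-/between-class scatter ratio); this is our own illustration, and the exact metric used in the tutorial may differ. Applied to the features extracted at each layer of a trained network, such a measure is what exhibits the roughly equal multiplicative improvement per layer.

```python
import numpy as np

def separation(H, labels):
    """Within-/between-class scatter ratio trace(S_W @ pinv(S_B)) / C.

    H: (n_samples, d) features of one layer; labels: (n_samples,) class ids.
    Smaller values indicate better class separation.
    """
    classes = np.unique(labels)
    n, d = H.shape
    global_mean = H.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        Hc = H[labels == c]
        mu_c = Hc.mean(axis=0)
        S_W += (Hc - mu_c).T @ (Hc - mu_c) / n
        S_B += np.outer(mu_c - global_mean, mu_c - global_mean) * Hc.shape[0] / n
    return np.trace(S_W @ np.linalg.pinv(S_B)) / len(classes)

# Toy check: well-separated synthetic features score lower than poorly separated ones.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 50)
means = np.eye(16)[:4]                                   # 4 well-spread class means in R^16
H_sep = means[labels] * 3.0 + 0.1 * rng.normal(size=(200, 16))
H_mixed = means[labels] * 0.3 + 1.0 * rng.normal(size=(200, 16))
print(separation(H_sep, labels) < separation(H_mixed, labels))   # True
```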
4. Theory & Applications of Progressive Data Separation
Finally, we delve into theoretical understandings of the structures in the intermediate layers via studying the learning dynamics of gradient descent. In particular, we reveal that there are certain parsimonious structures in the gradient dynamics, so that a certain measure of data separation exhibits layer-wise linear decay from shallow to deep layers. We then demonstrate the practical implications of this phenomenon for transfer learning and the study of foundation models, leading to efficient fine-tuning methods with reduced overfitting.
Presented by: Jun Qi, Ying-Jer Kao, Samuel Yen-Chi Chen, Mohammadreza Noormandipour
- Introduction to Quantum Computing and Tensor Networks
- Principles of Quantum Computing
- Fundamentals of Tensor Networks
- The intersection of Quantum Computing and Tensor Networks
- Foundations of Quantum Machine Learning
- Parametrized Quantum Circuits
- Data Encoding
- Quantum Neural Networks and Quantum Kernels
- Quantum Neural Networks
- Variational Quantum Circuits
- Quantum Convolutional Neural Network
- Quantum Long Short-Term Memory
- Quantum Reinforcement Learning
- Hybrid Quantum-Classical Machine Learning Architecture
- Training Quantum Machine Learning with Gradient Descent
- Variational Quantum Circuits
- Tensor Networks for Quantum Machine Learning
- Fundamentals of Tensor Networks
- Tensor Networks as Quantum Machine Learning Models
- Learnability of Quantum Tensor Networks
- Hybrid Quantum Tensor Networks and Quantum Neural Networks
- Conclusion and Open Questions
- Prospects of Quantum Tensor Networks
- Concluding Remarks
Presented by: Dirk Slock, Christo K. Thomas
Part I Approximate Bayesian Techniques
- Variational Bayes
- Variational Free Energy
- Variational Bayes (VB)
- Mean field and EM algorithms
- Expectation Propagation
- Factor Graph models
- Bethe Free Energy
- Belief Propagation
- Expectation Propagation
- Convergent Alternating Constrained Minimization
- ADMM
- Algorithm unfolding
- Relation to Deep NNs
- LMMSE case: multistage Wiener Filter
- Compressed Sensing
- LASSO, OMP etc.
- Sparse Bayesian Learning (SBL)
Part II Generalized Linear models
- xAMP (AMP, GAMP, VAMP, GUAMP,…)
- convergent GAMP
- Large System Analysis (iid, Haar)
- Bayes optimality
- Stein’s Unbiased Risk Estimation
- SBL example: EM, VB, SURE
Part III Bilinear models
- Cell-Free Massive MIMO setting
- MAP and MMSE estimates
- CRB variations
- EP variations: Factor Level, Variable Level
Part IV Adaptive Kalman filtering
- Dynamic SBL setting
- Bayesian CRB
- EM, VB, …, applied to Kalman filtering
Presented by: Kush R. Varshney
- Overview of traditional (non-LLM) trustworthy machine learning based on the book “Trustworthy Machine Learning” by the presenter
- Definitions of trustworthiness and safety in terms of aleatoric and epistemic uncertainty
- AI fairness
- Human-centered explainability
- Adversarial robustness
- Control-theoretic view of transparency and governance
- What are the new risks?
- Information-related risks
- Hallucination, lack of factuality, lack of faithfulness
- Lack of source attribution
- Leakage of private information
- Copyright infringement and plagiarism
- Interaction-related risks
- Hateful, abusive, and profane language
- Bullying and gaslighting
- Inciting violence
- Prompt injection attacks
- Brief discussion of moral philosophy
- How to change the behavior of LLMs
- Data curation and filtering
- Supervised fine-tuning
- Parameter-efficient fine-tuning, including low-rank adaptation
- Reinforcement learning with human feedback
- Model reprogramming and editing
- Prompt engineering and prompt tuning
- How to mitigate risks in LLMs and make them safer
- Methods for training data source attribution based on influence functions
- Methods for in-context source attribution based on post hoc explainability methods
- Equi-tuning, fair infinitesimal jackknife, and fairness reprogramming
- Aligning LLMs to unique user-specified values and constraints stemming from use-case requirements, social norms, laws, industry standards, etc. via policy elicitation, parameter-efficient fine-tuning, and red team audits
- Orchestrating multiple possibly conflicting values and constraints
Presented by: Yao Xie, Xiuyuan Cheng
Introduction
- Generative model in estimation and inference problems
- The problem of generation and conditional generation
- Overview of neural network methods
Mathematical background
- Partial differential equation (PDE)
- Stochastic differential equation (SDE)
- Langevin dynamics (see the sketch after this list)
- Continuity equation
- Samplers
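As a pointer to the kind of sampler covered in this background part (see the forward reference above), here is a minimal unadjusted Langevin dynamics sketch; the target density, step size, and function names are our own toy choices.

```python
import numpy as np

def langevin_samples(grad_U, x0, step=0.05, n_steps=20000, rng=None):
    """Unadjusted Langevin dynamics for p(x) ~ exp(-U(x)):
    x_{k+1} = x_k - step * grad U(x_k) + sqrt(2 * step) * xi_k,  xi_k ~ N(0, I)."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
        traj[k] = x
    return traj

# Toy target: standard 2-D Gaussian, U(x) = 0.5 * ||x||^2, so grad U(x) = x.
samples = langevin_samples(lambda x: x, x0=np.array([3.0, -3.0]))
burned = samples[2000:]                                  # discard burn-in
print("mean ~ 0:", np.round(burned.mean(axis=0), 2),
      "| var ~ 1:", np.round(burned.var(axis=0), 2))
```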
Diffusion model and ODE flow
- SDE and ODE approaches for normalizing flow
- Score matching
- Fokker-Planck equation and transport equation
- Deterministic and random backward process sampler
- Score-based and flow-based forward process
- Wasserstein gradient flow by flow network
Neural ODE and continuous normalizing flow (CNF)
- From ResNet to CNF
- Computation of the exact likelihood
- The computational challenges in high-dimension
Learning of interpolating distributions
- The problem of distribution interpolation from data
- Learning of dynamic Optimal Transport flow
- Density ratio estimation based on flow network
Evaluation of generative models
- Differential comparison of distributions in high-dimensions
- Two-sample test
- Goodness-of-fit test
- Theoretical guarantee for kernel and neural network tests
Applications
- Image generation: MNIST, CIFAR
- Generative model for sequence data and adversarial samplers
- Uncertainty quantification for graph prediction using invertible graph neural networks (GNN)
Open problems and discussion
Presented by: Tianyi Chen, Xiaodong Cui, Lisha Chen
Part I - Introduction and Background
- New challenges of learning under multiple objectives
- Two optimization toolboxes to address those challenges
- History of bilevel and multi-objective optimization
Part II - Bilevel Optimization for Learning with Ordered Objectives
- Solution concepts and metrics of optimality
- Implicit gradient-based methods for bilevel optimization
- Value function-based methods for bilevel optimization
Part III - Multi-objective Optimization for Learning with Competing Objectives
- Solution concepts and metrics of optimality
- Dynamic weighting-based methods for multi-objective optimization
- Generalization bounds on multi-objective optimization algorithms
Part IV - Applications to Automatic Speech Recognition
- Automatic Speech Recognition Opportunities and Challenges
- Recursive pre-training and fine-tuning with limited labels
- Multilingual training for low-resource speech recognition
Part V - Open Research Directions
Presented by: Keshab K. Parhi
Engineering practical and reliable quantum computers and communication systems requires: (a) protection of quantum states from decoherence, and (b) overcoming the reliability issues due to faulty gates. The half-day tutorial will provide a detailed overview of the new developments related to quantum ECCs and fault-tolerant computing. Specific topics include: (a) Introduction to quantum gates and circuits, (b) Shor’s 9-qubit ECC and stabilizer formalism for quantum ECCs, (c) Systematic method for construction of quantum ECC circuits, (d) Optimization of quantum ECC circuits in terms of number of multiple-qubit gates, and (e) Nearest neighbor compliant (NNC) quantum ECC circuits. Descriptions of the topics are listed below; a minimal bit-flip-code sketch follows the list.
- Introduction to quantum gates and circuits.
- Shor’s 9-qubit code and stabilizer formalism – Bit flip codes, phase flip codes, Shor’s 9-qubit code, Stabilizer formalism.
- Systematic method for construction of quantum ECC circuits – Encoder circuit, Syndrome measurement circuit, 5-qubit code encoder and decoder circuit, Steane code encoder and decoder circuit.
- Optimization of quantum ECC circuits in terms of number of multiple-qubit gates – Circuit equivalence rules, Optimization of circuits using circuit equivalence rules, Optimization using group theoretic matrix equivalence.
- Nearest-neighbor compliant quantum circuits – Various IBM architectures, nearest neighbor compliance, swap gates, minimization of swap gates for NNC circuits.
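To give a flavour of the syndrome-based correction discussed above (and referenced at the end of the introductory paragraph), here is a minimal NumPy simulation of the 3-qubit bit-flip code; it is our own illustration and does not use any particular quantum SDK.

```python
import numpy as np

def ket(n_qubits, bits):
    """Computational basis state |bits> as a length-2**n statevector (qubit 0 = leftmost bit)."""
    v = np.zeros(2 ** n_qubits, dtype=complex)
    v[int(bits, 2)] = 1.0
    return v

def apply_x(state, qubit, n_qubits=3):
    """Apply a bit-flip (Pauli-X) on the given qubit."""
    out = np.zeros_like(state)
    for idx, amp in enumerate(state):
        out[idx ^ (1 << (n_qubits - 1 - qubit))] = amp
    return out

def syndrome(state, n_qubits=3):
    """Parities of the stabilizers Z0Z1 and Z1Z2 (well defined for code states with X errors)."""
    idx = int(np.argmax(np.abs(state)))                   # any basis state in the support
    bits = [(idx >> (n_qubits - 1 - q)) & 1 for q in range(n_qubits)]
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])

# Encode a|0> + b|1> into a|000> + b|111>, corrupt one qubit, then correct it.
a, b = 0.6, 0.8
logical = a * ket(3, "000") + b * ket(3, "111")
corrupted = apply_x(logical, 1)                           # X error on qubit 1
lookup = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}  # syndrome -> faulty qubit
faulty = lookup[syndrome(corrupted)]
recovered = apply_x(corrupted, faulty) if faulty is not None else corrupted
print("recovered == encoded:", np.allclose(recovered, logical))   # True
```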
Presented by: Sam Buchanan, Yi Ma, Druv Pai, Yaodong Yu
During the past decade, machine learning and high-dimensional data analysis have experienced explosive growth, due in major part to the extensive successes of deep neural networks. Despite their numerous achievements in disparate fields such as computer vision and natural language processing, which have led to their involvement in safety-critical data processing tasks (such as autonomous driving and security applications), such deep networks have remained mostly mysterious to their end users and even their designers. For this reason, the machine learning community continually places higher emphasis on explainable and interpretable models, those whose outputs and mechanisms are understandable by their designers and even end users. The research community has recently responded to this task with vigor, having developed various methods to add interpretability to deep learning. One such approach is to design deep networks which are fully white-box ab initio, namely designed through mechanisms which make each operator in the deep network have a clear purpose and function towards learning and/or transforming the data distribution. This tutorial will discuss classical and recent advances in constructing white-box deep networks from this perspective. We now present the Tutorial Outline:
- [Yi Ma] Introduction to high-dimensional data analysis (45 min): In the first part of the tutorial, we will discuss the overall objective of high-dimensional data analysis, that is, learning and transforming the data distribution towards template distributions with relevant semantic content for downstream tasks (such as linear discriminative representations (LDR), expressive mixtures of semantically-meaningful incoherent subspaces). We will discuss classical methods such as sparse coding through dictionary learning as particular instantiations of this learning paradigm when the underlying signal model is linear or sparsely generated. This part of the presentation involves an interactive Colab on sparse coding.
- [Sam Buchanan] Layer-wise construction of deep neural networks (45 min): In the second part of the tutorial, we will introduce unrolled optimization as a design principle for interpretable deep networks. As a simple special case, we will examine several unrolled optimization algorithms for sparse coding (especially LISTA and “sparseland”), and show that they exhibit striking similarities to current deep network architectures; a minimal ISTA unrolling sketch is given after this outline. These unrolled networks are white-box and interpretable ab initio. This part of the presentation involves an interactive Colab on simple unrolled networks.
- [Druv Pai] White-box representation learning via unrolled gradient descent (45 min): In the third part of the tutorial, we will focus on the special yet highly useful case of learning the data distribution and transforming it to an LDR. We will discuss the information theoretic and statistical principles behind such a representation, and design a loss function, called the coding rate reduction, which is optimized at such a representation. By unrolling the gradient ascent on the coding rate reduction, we will construct a deep network architecture, called the ReduNet, where each operator in the network has a mathematically precise (hence white-box and interpretable) function in the transformation of the data distribution towards an LDR. Also, the ReduNet may be constructed layer-wise in a forward-propagation manner, that is, without any back-propagation required. This part of the presentation involves an interactive Colab on the coding rate reduction.
- [Yaodong Yu] White-box transformers (45 min): In the fourth part of the tutorial, we will show that by melding the perspectives of sparse coding and rate reduction together, we can obtain sparse linear discriminative representations, encouraged by an objective which we call sparse rate reduction. By unrolling the optimization of the sparse rate reduction, and parameterizing the feature distribution at each layer, we will construct a deep network architecture, called CRATE, where each operator is again fully mathematically interpretable, we can understand each layer as realizing a step of an optimization algorithm, and the whole network is a white box. The design of CRATE is very different from ReduNet, despite optimizing a similar objective, demonstrating the flexibility and pragmatism of the unrolled optimization paradigm. Moreover, the CRATE architecture is extremely similar to the transformer, and many of the layer-wise interpretations of CRATE can be used to interpret the transformer, showing the benefits in interpretability from such-derived networks may carry over to understanding current deep architectures which are used in practice. We will highlight in particular the powerful and interpretable representation learning capability of these models for visual data by showing how segmentation maps for visual data emerge in their learned representations with no explicit additional regularization or complex training recipes.
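As promised in the second bullet above, here is a minimal ISTA sketch for sparse coding; it is our own toy under standard assumptions, with the ISTA weights fixed in closed form (in LISTA the matrices and thresholds below would instead be learned per layer).

```python
import numpy as np

def soft_threshold(z, theta):
    """Elementwise shrinkage: the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def unrolled_ista(y, A, lam, n_layers=200):
    """Solve min_x 0.5*||y - A x||^2 + lam*||x||_1 with n_layers identical 'layers'.

    Each layer computes x <- soft_threshold(W2 @ x + W1 @ y, theta), which is exactly
    the form of a recurrent/unrolled network layer.
    """
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part
    W1 = A.T / L                             # input-injection weights
    W2 = np.eye(A.shape[1]) - A.T @ A / L    # recurrent weights
    theta = lam / L
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(W2 @ x + W1 @ y, theta)
    return x

# Toy check: recover a sparse code from a random dictionary.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.normal(size=5)
x_hat = unrolled_ista(A @ x_true, A, lam=0.05)
print("relative error:", round(float(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)), 3))
```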
Presented by: Baihan Lin
In recent years, reinforcement learning and bandits have transformed a wide range of real-world applications, including healthcare, finance, recommendation systems, robotics and computer vision, and, last but not least, speech and language processing. While most speech and language applications of reinforcement learning algorithms are centered around improving deep network training with its flexible optimization properties, there is still much ground to explore in utilizing the benefits of reinforcement learning, such as its reward-driven adaptability, state representations, temporal structures and generalizability. In this one-session tutorial, we will overview the recent advancements of reinforcement learning and bandits and discuss how they can be employed to solve various speech and natural language processing problems with models that are interpretable and scalable, especially in emerging topics such as large language models.
First, we briefly introduce the basic concepts of reinforcement learning and bandits, as well as the major variant problem settings in this machine learning domain. Second, we translate various speech and language tasks into reinforcement learning problems and show the key challenges. Third, we introduce some reinforcement learning and bandit techniques and their varieties for speech and language tasks and their machine learning formulations. Fourth, we present several state-of-the-art applications of reinforcement learning in different fields of speech and language. Lastly, we will discuss some open problems in reinforcement learning and bandits to show how to further develop more advanced algorithms for speech and language research in the future.
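For readers new to bandits, the following is a minimal epsilon-greedy multi-armed bandit sketch; the arms and Bernoulli rewards are a toy of our own, standing in for, e.g., candidate responses scored by a user-satisfaction signal.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])     # unknown expected reward of each arm
n_arms, epsilon, T = len(true_means), 0.1, 2000
counts = np.zeros(n_arms)
values = np.zeros(n_arms)                  # running reward estimates

for t in range(T):
    if rng.random() < epsilon:             # explore a random arm
        arm = int(rng.integers(n_arms))
    else:                                  # exploit the current best estimate
        arm = int(np.argmax(values))
    reward = rng.binomial(1, true_means[arm])
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update

print("estimated arm values:", np.round(values, 2))       # roughly [0.2, 0.5, 0.8]
```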
As the second iteration of this tutorial, the topic will emphasize additional coverage in new developments in large language models and deep reinforcement learning. The audience can refer to two resources after the tutorial: (1) a review paper by the author on arXiv covering many topics in this tutorial, and (2) an upcoming Springer book by the author on the same topic to be released this December, which includes more case studies, hands-on examples and additional coverage on recent advancements in large language models.
The outline of the tutorial, along with the topics and subtopics covered, is as follows:
1. Introduction (5 min)
- The significance of RL and Bandits in speech and language processing (real-world use cases, potential advantages)
- Challenges and opportunities in integrating these techniques
2. A Concise Tutorial of Reinforcement Learning and Bandits (85 min)
- Preliminaries
- Multi-Armed Bandits (MAB)
- Contextual Bandits
- Reinforcement Learning (RL)
- Inverse Reinforcement Learning (IRL)
- Imitation Learning and Behavioral Cloning
BREAK (20 min)
3. Reinforcement Learning Formulation for Speech and Language Applications (45 min)
- Automatic Speech Recognition (ASR)
- Speaker Recognition and Diarization
- Spoken Language Understanding (SLU)
- Natural Language Understanding (NLU)
- Sequence Generation and Text-to-Speech (TTS) Synthesis
- Natural Language Generation (NLG)
- Large Language Models (LLM)
- Conversational Recommendation Systems (CRS)
4. Emerging Reinforcement Learning Strategies (15 min)
- Deep Reinforcement Learning and Bandits
- Batched and Offline Reinforcement Learning
- Optimization with Customized Rewards
- Graph Structures and Attention Mechanisms
- Transfer Learning in Reinforcement Learning
5. Conclusions, Open Questions and Challenges (10 min)
- Multi-agent Settings in Speech and Language
- Multi-objective Training and Human Priors
- Societal Implications of Using RL for Responsible AI
- Summary and Additional Resources
Overall takeaways for our attendees:
- Comprehensive understanding of RL and Bandits and their applications in speech and language processing.
- Practical knowledge and hands-on experience with implementing these techniques
- Insight into emerging strategies and challenges in the field
- Open discussions and problem-solving sessions for practical application
Presented by: Petros Maragos
Tropical geometry is a relatively recent field in mathematics and computer science combining elements of algebraic geometry and polyhedral geometry. The scalar arithmetic of its analytic part pre-existed (since the 1980s) in the form of max-plus and min-plus semiring arithmetic used in finite automata, nonlinear image processing, convex analysis, nonlinear control, and idempotent mathematics.
Tropical geometry recently emerged successfully in the analysis and extension of several classes of problems and systems in both classical machine learning and deep learning. Such areas include (1) Deep Neural Networks (DNNs) with piecewise-linear (PWL) activation functions, (2) Morphological Neural Networks, (3) Neural Network Minimization, (4) Optimization (e.g. dynamic programming) and Probabilistic Dynamical Systems, and (5) Nonlinear regression with PWL functions. Areas (1), (2) and (3) have many novel elements and have recently been applied to image classification problems. Area (4) offers new perspectives on several areas of optimization. Area (5) is also novel and has many applications.
The proposed tutorial will cover the following topics:
Elements from Tropical Geometry and Max-Plus Algebra (Brief). We will first summarize introductory ideas and objects of tropical geometry, including tropical curves and surfaces and Newton polytopes. We will also provide a brief introduction to the max-plus algebra that underlies tropical geometry. This will involve scalar and vector/signal operations defined over a class of nonlinear spaces and optimal solutions of systems of max-plus equations. Tropical polynomials will be defined and related to classical polynomials through Maslov dequantization. Then, the above introductory concepts and tools will be applied to analyzing and/or providing solutions for problems in the following broad areas of machine learning.
Neural Networks with Piecewise-linear (PWL) Activations. Tropical geometry recently emerged in the study of deep neural networks (DNNs) and variations of the perceptron operating in the max-plus semiring. Standard activation functions employed in DNNs, including the ReLU activation and its “leaky” variants, induce neural network layers which are PWL convex functions of their inputs and create a partition of space well-described by concepts from tropical geometry. We will illustrate a purely geometric approach for studying the representation power of DNNs -- measured via the concept of a network's “linear regions” -- under the lens of tropical geometry.
Morphological Neural Networks. Recently there has been a resurgence of networks whose layers operate with max-plus arithmetic (inspired by the fundamental operators of morphological image processing). Such networks enjoy several promising aspects including faster training and capability of being pruned to a large degree without severe degradation of their performance. We will present several aspects from this emerging class of neural networks from some modern perspectives by using ideas from tropical geometry and mathematical morphology. Subtopics include methods for their training and pruning resulting in sparse representations.
Neural Network Minimization. The field of tropical algebra is closely linked with the domain of neural networks with PWL activations, since their output can be described via tropical polynomials in the max-plus semiring. In this tutorial, we will briefly present methods based on approximation of the NN tropical polynomials and their Newton Polytopes via either (i) a form of approximate division of such polynomials, or (ii) the Hausdorff distance of tropical zonotopes, in order to minimize networks trained for multiclass classification problems. We will also present experimental evaluations on known datasets, which demonstrate a significant reduction in network size, while retaining adequate performance.
Approximation Using Tropical Mappings. Tropical Mappings, defined as vectors of tropical polynomials, can be used to express several interesting approximation problems in ML. We will focus on three closely related optimization problems: (a) the tropical inversion problem, where we know the tropical mapping and the output, and search for the input; (b) the tropical regression problem, where we know the input-output pairs and search for the tropical mapping; and (c) the tropical compression problem, where we know the output, and search for an input and a tropical mapping that represent the data in reduced dimensions. There are several potential applications including data compression, data visualization, recommendation systems, and reinforcement learning. We will present a unified theoretical framework, where tropical matrix factorization has a central role, a complexity analysis, and solution algorithms for this class of problems. Problem (b) will be further detailed under PWL regression (see next).
Piecewise-linear (PWL) Regression. Fitting PWL functions to data is a fundamental regression problem in multidimensional signal modeling and machine learning, since approximations with PWL functions have proven analytically and computationally very useful in many fields of science and engineering. We focus on functions that admit a convex representation as the maximum of affine functions (e.g. lines, planes), represented with max-plus tropical polynomials. This allows us to use concepts and tools from tropical geometry and max-plus algebra to optimally approximate the shape of curves and surfaces by fitting tropical polynomials to data, possibly in the presence of noise; this yields polygonal or polyhedral shape approximations. For this convex PWL regression problem we present optimal solutions w.r.t. $\ell_p$ error norms and efficient algorithms.
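As a toy illustration of the max-plus fitting idea (our own 1-D sketch with a fixed slope dictionary, not the tutorial's optimal algorithms), the snippet below fits the intercepts of a max-of-affine model by the max-plus greatest-subsolution formula, giving a convex PWL under-approximation of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-2, 2, 100))
y = x ** 2 + 0.05 * rng.normal(size=x.size)       # convex target with mild noise

# For each fixed slope c_i, the largest intercept a_i with c_i*x_j + a_i <= y_j for all j
# is a_i = min_j (y_j - c_i*x_j): the max-plus "greatest subsolution".
slopes = np.linspace(-4, 4, 9)                    # fixed dictionary of slopes
intercepts = np.array([np.min(y - c * x) for c in slopes])

def tropical_poly(t):
    """Evaluate the max-plus polynomial max_i (c_i*t + a_i): a convex PWL function."""
    return np.max(slopes[:, None] * t[None, :] + intercepts[:, None], axis=0)

print("max fitting error:", np.round(np.max(np.abs(tropical_poly(x) - y)), 3))
```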
Presented by: Danilo Mandic, Harry Davies
The Hearables paradigm, that is, in-ear sensing of neural function and vital signs, is an emerging solution for 24/7 discreet health monitoring. The tutorial starts by introducing our own Hearables device, which is based on an earplug with embedded electrodes and optical, acoustic, mechanical and temperature sensors. We show how such a miniaturised embedded system can be used to reliably measure the Electroencephalogram (EEG), Electrocardiogram (ECG), Photoplethysmography (PPG), respiration, temperature, blood oxygen levels, and behavioural cues. Unlike standard wearables, such an inconspicuous Hearables earpiece benefits from the relatively stable position of the ear canal with respect to vital organs to operate robustly during daily activities. However, this comes at the cost of weaker signal levels and exposure to noise. This opens novel avenues of research in Machine Intelligence for eHealth, with numerous challenges and opportunities for algorithmic solutions. We describe how our Hearables sensor can be used, inter alia, for the following applications:
- Automatic sleep scoring based on in-ear EEG, as sleep disorders are a major phenomenon underlying a range of general health problems, from endocrinology through to depression and dementia.
- Screening for chronic obstructive pulmonary disease based on in-ear PPG, in the battle against the third leading cause of death worldwide, with an emphasis on developing countries that often lack access to hospital-based examinations.
- Continuous 24/7 ECG from a headphone with the ear-ECG, as cardiac diseases are the most common cause of death yet often remain undetected, since until the emergence of Hearables it was only possible to record the ECG in a clinic and not in the community.
For the Hearables to provide a paradigm shift in eHealth, they require domain-aware Machine Intelligence to detect, estimate, and classify the notoriously weak physiological signals from the ear canal. To this end, the second part of our tutorial is focused on interpretable AI. This is achieved through a first-principles matched-filtering explanation of convolutional neural networks (CNNs), introduced by us. We next revisit the operation of CNNs and show that their key component – the convolutional layer – effectively performs matched filtering of its inputs with a set of templates (filters, kernels) of interest. This serves as a vehicle to establish a compact matched-filtering perspective of the whole convolution-activation-pooling chain, which allows for a theoretically well-founded and physically meaningful insight into the overall operation of CNNs. This is shown to help mitigate their interpretability and explainability issues, together with providing intuition for further developments and novel physically meaningful ways of their initialisation. Interpretable networks are pivotal in the integration of AI into medicine, by dispelling the black-box nature of deep learning and allowing clinicians to make informed decisions based on network outputs. We demonstrate this in the context of Hearables by expanding on the following key findings (a minimal matched-filter sketch follows the list):
- We argue from first principles that convolutional neural networks operate as matched filters.
- Through this lens, we further examine network weights, activation functions and pooling operations.
- We detail the construction of a fully interpretable convolutional neural network designed for R-peak detection, demonstrating its operation as a matched filter and analysing the convergence of its filter weights to an ECG pattern.
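The following is the minimal matched-filter sketch referred to above: a single convolution of a synthetic, noisy ECG-like trace with a template "kernel", followed by thresholding, locates the R-peaks. It is our own toy, not the tutorial's deep matched filter.

```python
import numpy as np

fs = 250                                        # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
peak_times = np.arange(0.5, 10, 0.8)            # ~75 bpm
ecg = sum(np.exp(-0.5 * ((t - pt) / 0.01) ** 2) for pt in peak_times)   # narrow spikes as R-peaks
noisy = ecg + 0.2 * np.random.default_rng(1).normal(size=t.size)

# The "kernel" is a template of the waveform of interest (here the same narrow bump),
# zero-meaned as learned convolutional filters tend to be.
tau = np.arange(-0.05, 0.05, 1 / fs)
template = np.exp(-0.5 * (tau / 0.01) ** 2)
template -= template.mean()

# Convolution with the flipped template == cross-correlation == matched filtering.
response = np.convolve(noisy, template[::-1], mode="same")
is_peak = (response > 0.5 * response.max()) \
          & (response >= np.roll(response, 1)) & (response >= np.roll(response, -1))
print("detected R-peaks (s):", np.round(t[np.flatnonzero(is_peak)], 2))
```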
Owing to their unique Collocated Sensing nature, Hearables record a rich admixture of information from several physiological variables, motion and muscle artefacts, and noise. For example, even a standard Electroencephalogram (EEG) measurement contains a weak ECG and muscle artefacts, which are typically treated as bad data and are subsequently discarded. In the quest to exploit all the available information (no data is bad data), the final section of the tutorial focuses on a novel class of encoder-decoder networks which, taking advantage of the collocation of information, maximise data utility. We introduce the novel concept of a Correncoder and demonstrate its ability to learn a shared latent space between the model input and output, making it a deep-NN generalisation of partial least squares (PLS). The key topics of the final section of this tutorial are as follows (a short PLS sketch follows the list):
- A thorough explanation of Partial Least Squares (Projection on Latent Spaces) regression, and the lens of interpreting deep learning models as an extension of PLS.
- An introduction to the Correncoder and Deep Correncoder, a powerful yet efficient deep learning framework to extract correlated information between input and references.
- Real-world examples of the Correncoder to Hearables data are presented, ranging from transforming Photoplethysmography (PPG) into respiratory signals, through to making sense from artefacts and decoding implanted brain electrical signals into movement.
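Below is the short PLS sketch referred to above, assuming scikit-learn is available; the two synthetic blocks X and Y (standing in for, say, PPG features and a respiratory reference) are our own toy data, generated from shared latent factors.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Two data blocks driven by the same hidden factors, plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(500, 20))
Y = latent @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(500, 3))

# PLS finds paired latent directions that maximise the covariance between the blocks;
# the Correncoder can be viewed as a deep, nonlinear generalisation of this idea.
pls = PLSRegression(n_components=2)
pls.fit(X, Y)
X_scores, Y_scores = pls.transform(X, Y)        # projections onto the shared latent space
corrs = [np.corrcoef(X_scores[:, k], Y_scores[:, k])[0, 1] for k in range(2)]
print("latent-space correlations:", np.round(corrs, 3))
```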
In summary, this tutorial details how the marriage of the emerging but crucial sensing modality of Hearables and customised interpretable deep learning models can maximise the utility of wearables data for healthcare applications, with a focus on the long-term monitoring of chronic diseases. Wearable in-ear sensing for automatic screening and monitoring of disease has the potential for immense global societal impact, and for personalised healthcare out-of-clinic and in the community – the main aims of future eHealth.
The presenters are a perfect match for the topic of this tutorial: Prof Mandic’s team are pioneers of Hearables, and the two presenters have been working together over the last several years on the links between Signal Processing, Embedded Systems and Connected Health; the presenters also hold three international patents in this area.
Tutorial Outline
The tutorial will involve both the components of the Hearables paradigm and the interpretable AI solutions for 24/7 wearable sensing in the real world. The duration will be over 3 hours, with the following topics covered:
- The Hearables paradigm. Here, we will cover the biophysics supporting in-ear sensing of neural function and vital signs, together with the corresponding COMSOL Multiphysics simulations and the real-world recordings of the Electroencephalogram (EEG), Electrocardiogram (ECG), Photoplethysmogram (PPG), respiration, blood oxygen level (SpO2), temperature, movement and sound – all from an earplug with embedded sensors. (40 minutes)
- Automatic Sleep Staging and Cognitive Load Estimation from Hearables. Here we demonstrate two real-world applications of Hearables, with in-ear polysomnography enabling unobtrusive in-home sleep monitoring, and robust tracking of cognitive workload during memory tasks and gaming and their links with dementia. (30 minutes)
- Interpretable Convolutional Neural Networks (CNN). This section explains CNNs through the lens of the matched filter (MF), a seven-decade old core concept in signal processing theory. This section of the tutorial finishes with the example of a deep Matched Filter that is designed for robust R-peak detection in noisy Ear-ECG. (40 minutes)
- Physiologically informed data augmentation. Here we build upon our pioneering work on screening for chronic obstructive pulmonary disease (COPD) with in-ear PPG, by detailing an apparatus designed to simulate COPD in healthy individuals. We demonstrate the advantages of using domain knowledge within such an apparatus when producing surrogate data in deep-learning models. (20 minutes)
- An introduction to the Correncoder. Here we introduce a new rethinking of the classic encoder-decoder structure, with the aim of extracting correlated information between two signals. At each stage, we mirror this model with the method of Projection on Latent Spaces (PLS) showing that this deep learning framework can be interpreted as a deep generalisable PLS. We show multiple real-world applications of such a framework in the context of wearable E-health. (40 minutes)
- No data is bad data. In this final section of the tutorial, we reject the null hypothesis that data containing artefacts should be discarded, with examples from ear-EEG signal processing. We demonstrate that in many cases rich information can be determined from artefacts, and that with the Correncoder framework we can achieve artefact removal in real time. (20 minutes)
Presented by: Christos Thrampoulidis, Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi
Part I: Motivation and Overview
I.1 The Transformer Revolution:
Our tutorial begins by providing an in-depth account of the Transformer architecture and its extensive array of applications. We place special emphasis on examples most relevant to the signal-processing audience, including speech analysis, time-series forecasting, image processing, and most recently, wireless communication systems. Additionally, we introduce and review essential concepts associated with Transformers' training, such as pre-training, fine-tuning, and prompt-tuning, while also discussing the Transformers' emerging abilities, such as in-context learning and reasoning.
I.2 A Signal-Processing-Friendly Introduction to the Attention Mechanism:
We then dive into a comprehensive explanation of the Transformer block's structure. Our primary focus is on the Attention mechanism, which serves as the fundamental distinguishing feature from conventional architectures like fully connected, convolutional, and residual neural networks. To facilitate the signal-processing community's understanding, we introduce a simplified attention model that establishes an intimate connection with problems related to sparse signal recovery and matrix factorization. Using this model as a basis, we introduce critical questions regarding its capabilities in memorizing lengthy sequences, modeling long-range dependencies, and training effectively.
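For concreteness, here is a minimal single-head scaled dot-product self-attention sketch in NumPy; this is the generic textbook form rather than the simplified model introduced in the tutorial, and all names are our own.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention on a sequence X of shape (T, d).

    Each output token is a data-adaptive convex combination of the value vectors,
    with weights given by a row-stochastic similarity (attention) matrix.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (T, T) attention map
    return A @ V, A

rng = np.random.default_rng(0)
T, d = 6, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
print(out.shape, np.allclose(A.sum(axis=1), 1.0))   # (6, 8) True
```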
Part II: Efficient Inference and Adaptation: Quadratic attention bottleneck and Parameter-efficient tuning (PET)
II.1 Kernel viewpoint, low-rank/sparse approximation, Flash-attn (system level, implementation):
Transformers struggle with long sequences due to quadratic self-attention complexity. We review recently-proposed efficient implementations aimed to tackle this challenge, while often achieving superior or comparable performance to vanilla Transformers. First, we delve into approaches that approximate quadratic-time attention using data-adaptive, sparse, or low-rank approximation schemes. Secondly, we overview the importance of system-level improvements, such as FlashAttention, where more efficient I/O awareness can greatly accelerate inference. Finally, we highlight alternatives which replace self-attention with more efficient problem-aware blocks to retain performance.
II.2 PET: Prompt-tuning, LoRA adapter (Low-rank projection):
In traditional Transformer pipelines, models undergo general pre-training followed by task-specific fine-tuning, resulting in multiple copies for each task, increasing computational and memory demands. Recent research focuses on parameter-efficient fine-tuning (PET), updating a small set of task-specific parameters, reducing memory usage, and enabling mixed-batch inference. We highlight attention mechanisms' key role in PET, discuss prompt-tuning, and explore LoRA, a PET method linked to low-rank factorization, widely studied in signal processing.
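A minimal sketch of the low-rank adaptation idea discussed above (illustrative NumPy, not any particular library's API): the pre-trained weight stays frozen, and only the small factors A and B are trained and communicated per task.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8

W0 = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)   # pre-trained weight, kept frozen
A = 0.01 * rng.normal(size=(r, d_in))                 # trainable low-rank factor
B = np.zeros((d_out, r))                              # trainable; zero init => no change at start

def adapted_forward(x):
    """Forward pass: frozen projection plus the scaled low-rank residual B @ A."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print("matches frozen model before tuning:", np.allclose(adapted_forward(x), W0 @ x))
print("trainable params:", A.size + B.size, "vs frozen:", W0.size)
```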
II.3 Communication and Robustness gains in Federated Learning:
We discuss the use of large pretrained transformers in mobile ML settings with emphasis on federated learning. Our discussion emphasizes the ability of transformers to adapt in a communication efficient fashion via PET methods: (1) Use of large models shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scaling allows clients to run more local SGD epochs which can significantly reduce the number of communication rounds. (2) PET methods, by design, enable >100× less communication in bits while potentially boosting robustness to client heterogeneity and small sample size.
BREAK I
Part III: Approximation, Optimization, and Generalization Fundamentals
III.1 Approximation and Memorization Abilities:
We discuss Transformers as sequence-to-sequence models with a fixed number of parameters, independent of sequence length. Despite parameter sharing, Transformers exhibit universal approximation capabilities for sequence-to-sequence tasks. We delve into key results regarding Transformer models' approximation abilities, examining the impact of depth versus width. We also address their memorization capacity, emphasizing the trade-off between model size and the number of memorized sequence-to-sequence patterns. Additionally, we discuss the link between Transformers and associative memories, a topic of interest within the signal processing community.
III.2 Optimization dynamics: Transformer as Support Vector Machines:
In this section, we present a fascinating emerging theory that elucidates how the attention layer learns, during training, to distinguish 'good' sequence elements (those most relevant to the prediction task) while suppressing 'bad' ones. This separation is formally framed as a convex optimization program, similar to classical support-vector machines (SVMs), but with a distinct operational interpretation that relates to the problems of low-rank and sparse signal recovery. This unique formulation allows us to engage the audience with a background in signal processing, as it highlights an implicit preference within the Transformer to promote sparsity in the selection of sequence elements—a characteristic reminiscent of traditional sparsity-selection mechanisms such as the LASSO.
III.3 Generalization dynamics:
Our discussion encompasses generalization aspects related to both the foundational pretraining phase and subsequent task performance improvements achieved through prompt tuning. To enhance our exploration, we will introduce statistical data models that extend traditional Gaussian mixture models, specifically tailored to match the operational characteristics of the Transformer. Our discussion includes an overview and a comprehensive list of references to a set of tools drawn from high-dimensional statistics and recently developed learning theories concerning the neural tangent kernel (NTK) and the deep neural network's feature learning abilities.
BREAK II
Part IV: Emerging abilities, in-context learning, reasoning
IV.1 Scaling laws and emerging abilities:
We begin the last part of the tutorial by exploring the intriguing world of scaling laws and their direct implications on the emerging abilities of Transformers. Specifically, we will delve into how these scaling laws quantitatively impact the performance, generalization, and computational characteristics of Transformers as they increase in size and complexity. Additionally, we draw connections between the scaling laws and phase transitions, a concept familiar to the signal processing audience, elucidating via examples in the literature how Transformers' behavior undergoes critical shifts as they traverse different scales.
IV.2 In-context learning (ICL): Transformers as optimization algorithms
We delve into the remarkable capability of ICL, which empowers Transformers to engage in reasoning, adaptation, and problem-solving across a wide array of machine learning tasks through the use of straightforward language prompts, closely resembling human interactions. To illustrate this intriguing phenomenon, we will provide concrete examples spanning both language-based tasks and mathematically structured, analytically tractable tasks. Furthermore, we present findings that shed light on an intriguing perspective of in-context learning: the Transformer's capacity to autonomously learn and implement gradient descent steps at each layer of its architectural hierarchy. In doing so, we establish connections to deep-unfolding techniques, which have garnered popularity in applications such as wireless communications and solving inverse problems.
IV.3 Primer on Reasoning:
The compositional nature of human language allows us to express fine-grained tasks/concepts. Recent innovations such as prompt-tuning, instruction-tuning, and various prompting algorithms are enabling the same for language models and catalyzing their ability to accomplish complex multi-step tasks such as mathematical reasoning or code generation. Here, we first introduce important prompting strategies that catalyze reasoning such as chain-of-thought, tree-of-thought, and self-evaluation. We then demonstrate how these methods boost reasoning performance as well as the model’s ability to evaluate its own output, contributing to trustworthiness. Finally, by building on the ICL discussion, we introduce mathematical formalisms that shed light on how reasoning can be framed as “acquiring useful problem solving skills” and “composing these skills to solve new problems”.
Conclusions, outlook, and open problems
We conclude the tutorial by going over a list of important and exciting open problems related to the fundamental understanding of Transformer models, while emphasizing how this research creates opportunities for enhancing architecture and improving algorithms & techniques. This will bring the audience to the very forefront of fast-paced research in this area.
Presented by: Shiwei Liu, Olga Saukh, Zhangyang (Atlas) Wang, Arijit Ukil, and Angshul Majumdar
This tutorial will provide a comprehensive overview of recent breakthroughs in sparsity in the emerging area of large language models (LLMs), showcasing progress and posing challenges, and it endeavors to provide insights into improving the affordability and understanding of LLMs through sparsity. The outline of this tutorial is fourfold: (1) a thorough overview/categorization of sparse neural networks; (2) the latest progress of LLM compression via sparsity; (3) the caveat of sparsity in LLMs; and finally (4) the benefits of sparsity beyond model efficiency.
The detailed outline is given below:
Tutorial Introduction. Presenter: Zhangyang (Atlas) Wang.
Part 1: Overview of sparse neural networks. Presenter: Shiwei Liu.
We will first provide a brief overview and categorization of existing works on sparse neural networks. As one of the most classical concepts in machine learning, the pristine goal of sparsity in neural networks is to reduce inference costs. However, the research focus on sparsity has undergone a significant shift from post-training sparsity to prior-training sparsity over the past few years, due to the latter's promise of end-to-end resource saving from training to inference. Researchers have tackled many interlinked concepts such as pruning [13], the Lottery Ticket Hypothesis [14], Sparse Training [15,16], Pruning at Initialization [17], and Mixture of Experts [18]. However, the shift of interest only occurred in the last few years, and the relationships among different sparse algorithms in terms of their scopes, assumptions, and approaches are highly intricate and sometimes ambiguous. Providing a comprehensive and precise categorization of these approaches is timely for this newly shaped research community.
Part 2: Scaling up sparsity to LLMs: latest progress. Presenter: Shiwei Liu.
In the context of gigantic LLMs, sparsity is becoming even more appealing to accelerate both training and inference. We will showcase existing attempts that address sparse LLMs, encompassing weight sparsity, activation sparsity, and memory sparsity. For example, SparseGPT [8] and Essential Sparsity [9] shed light on prominent weight sparsity in LLMs, while the unveiling of "Lazy Neuron" [13] and "Heavy Hitter Oracle" [10] exemplifies activation sparsity and token sparsity. Specifically, Essential Sparsity discovers a consistent pattern across various settings: 30%-50% of weights from LLMs can be removed by naive one-shot magnitude pruning for free, without any significant drop in performance. Ultimately, these observations suggest that sparsity is also an emergent property in the context of LLMs, with great potential to improve the affordability of LLMs.
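To make "one-shot magnitude pruning" concrete, here is a minimal global magnitude-pruning sketch on random matrices; it is our own toy, not SparseGPT or any LLM-specific method.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with smallest magnitude, globally."""
    flat = np.concatenate([w.ravel() for w in weights])
    threshold = np.quantile(np.abs(flat), sparsity)       # global magnitude threshold
    masks = [np.abs(w) > threshold for w in weights]      # True = keep
    return [w * m for w, m in zip(weights, masks)], masks

rng = np.random.default_rng(0)
layers = [rng.normal(size=(256, 256)), rng.normal(size=(256, 64))]
pruned, masks = magnitude_prune(layers, sparsity=0.5)
kept = sum(int(m.sum()) for m in masks) / sum(m.size for m in masks)
print(f"fraction of weights kept: {kept:.2f}")            # ~0.50
```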
Coffee Break.
Part 3: The caveat of sparsity in LLMs: What tasks are we talking about? Presenter: Zhangyang (Atlas) Wang.
While sparsity has demonstrated its success in LLMs, the evaluations commonly used in the sparse-LLM literature are often restricted to simple datasets such as GLUE, SQuAD, WikiText-2, and PTB, and/or simple one-turn questions/instructions. Such (over-)simplified evaluations may potentially camouflage some unexpected predicaments of sparse LLMs. To depict the full picture of sparse LLMs, we highlight two recent works, SMC-Bench [11] and the "Junk DNA Hypothesis", that unveil the failures of (magnitude-based) pruned LLMs on harder language tasks, indicating a strong correlation between a model's "prunability" and the difficulty of its target downstream task.
Part 4: Sparsity beyond efficiency. Presenter: Olga Saukh.
In addition to efficiency, sparsity has been found to boost many other performance aspects such as robustness, uncertainty quantification, data efficiency, multitasking and task transferability, and interpretability [19]. We will mainly focus on the recent progress in understanding the relation between sparsity and robustness. The research literature spans multiple subfields, including empirical and theoretical analysis of adversarial robustness [20], regularization against overfitting, and noisy-label resilience for sparse neural networks. By outlining these different aspects, we aim to offer a deep dive into how network sparsity affects the multi-faceted utility of neural networks in different scenarios.
Part 5: Demonstration and Hands-on Experience. Presenter: Shiwei Liu.
The Expo consists of three main components: Firstly, an implementation tutorial will be presented via a typical laptop offering step-by-step guidance in building and training sparse neural networks from scratch. Secondly, a demo will be given to showcase how to prune LLaMA-7B on a single A6000 GPU. Thirdly, we will create and maintain user-friendly open-source implementation for sparse LLMs, ensuring participants have ongoing resources at their disposal. To encourage ongoing engagement and learning, we will make all content and materials readily accessible through the tutorial websites.
Presented by: Moe Z. Win, Andrea Conti
The availability of real-time high-accuracy location awareness is essential for current and future wireless applications, particularly those involving the Internet-of-Things and the beyond-5G ecosystem. Reliable localization and navigation of people, objects, and vehicles – Localization-of-Things (LoT) – is a critical component for a diverse set of applications including connected communities, smart environments, vehicle autonomy, asset tracking, medical services, military systems, and crowd sensing. The coming years will see the emergence of network localization and navigation in challenging environments with sub-meter accuracy and minimal infrastructure requirements.
We will discuss the limitations of traditional positioning, and move on to the key enablers for high-accuracy location awareness. Topics covered will include: fundamental bounds, cooperative algorithms for 5G and B5G standardized scenarios, and network experimentation. Fundamental bounds serve as performance benchmarks, and as a tool for network design. Cooperative algorithms are a way to achieve dramatic performance improvements compared to traditional non-cooperative positioning. To harness these benefits, system designers must consider realistic operational settings; thus, we present the performance of B5G localization in 3GPP-compliant settings. We will also present LoT enablers, including reconfigurable intelligent surfaces, which promise to provide a dramatic gain in terms of localization accuracy and system robustness in next generation networks.
Presented by: Huck Yang, Pin-Yu Chen, Hung-yi Lee, Kai-Wei Chang, Cheng-Han Chiang
- Introduction and Motivation for Studying Parameter-Efficient learning
To be presented by Dr. Huck Yang
- Background: Large-scale Pre-trained and Foundation Models
- Definition and Theory of parameter-efficient learning
- Basics of Pre-trained Model Representation Errors Analysis
- Editing Models with Task Arithmetic
- Advanced Settings of Task Vectors
- Multimodal Weights Merging
- BERT + Hubert for ASR
- ViT + AST for Acoustic Modeling
- In-Context Learning
- Frozen Model Adaptation through long context windows
- Multimodal Weights Merging
- New Approaches on Neural Model Reprogramming
To be presented by Dr. Pin-Yu Chen, IBM Research AI
- Reprogramming for Medical Images and DNA with 1B+ LLM (ICML 23)
- Prompting Large Language Models
To be presented by Cheng-Han Chiang and Prof. Hung-yi Lee
- Connection between prompting and parameter-efficient learning
- Prompting large language models for reasoning
- ReAct, Plan-and-Solve, Tree-of-Thought prompting
- Faithfulness and robustness of LLM reasonings
- Using LLMs for tool use
- Automatic evaluation using large language models by prompting
- LLM evaluation and G-Eval
- Parameter-Efficient Learning for Speech Processing
To be presented by Kai-Wei Chang and Prof. Hung-yi Lee
- Adapting text Large Language Models for Speech Processing
- Adapting text LLM (e.g. LLaMA) for spoken language modeling
- Prompting and Instruction Tuning on Speech Pre-trained Models
- Semantic and acoustic tokens for speech language models
- Prompting and instruction tuning for various speech processing tasks
- Conclusion and Open Questions
To be presented by Prof. Hung-yi Lee
- Lessons learned: a signal processor wandering in the land of large-scale models
- Available resources and code for research in parameter-efficient learning
Presented by: Nir Shlezinger, Sangwoo Park, Tomer Raviv, and Osvaldo Simeone
Wireless communication technologies are subject to escalating demands for connectivity, latency, and throughput. To facilitate meeting these performance requirements, emerging technologies such as mmWave and THz communication, holographic MIMO, spectrum sharing, and RISs are currently being investigated. While these technologies may support desired performance levels, they also introduce substantial design and operating complexity. For instance, holographic MIMO hardware is likely to introduce non-linearities on transmission and reception; the presence of RISs complicates channel estimation; and classical communication models may no longer apply in novel settings such as the mmWave and THz spectrum, due to violations of far-field assumptions and lossy propagation. These considerations notably affect transceiver design.
Traditional transceiver processing design is model-based, relying on simplified channel models, which may no longer be adequate to meet the requirements of next-generation wireless systems. The rise of deep learning as an enabler technology for AI has revolutionized various disciplines, including computer vision and natural language processing (NLP). The ability of deep neural networks (DNNs) to learn mappings from data has spurred growing interest in their usage for transceiver design. DNN-aided transceivers have the ability to succeed where classical algorithms may fail. They can learn a detection function in scenarios having no well-established physics-based mathematical model, a situation known as model-deficit; or when the model is too complex to give rise to tractable and efficient model-based algorithms, a situation known as algorithm-deficit.
Despite their promise, several core challenges arise from the fundamental differences between wireless communications and traditional AI domains such as computer vision and NLP. The first challenge is attributed to the nature of the devices employed in communication systems. Wireless communication transceivers are highly constrained in terms of their compute and power resources, while deep learning inherently relies on the availability of powerful devices, e.g., high-performance computing servers. A second challenge stems from the nature of the wireless communication domain. Communication channels are dynamic, implying that the task, dictated by the data distribution, changes over time. This makes the standard pipeline of data collection, annotation, and training highly challenging. Specifically, DNNs rely on (typically labeled) data sets to learn from the underlying unknown, but stationary, data distributions. This is not the case for wireless transceivers, whose processing task depends on the time-varying channel, restricting the size of the training data set representing the task. These challenges imply that successfully applying AI for transceiver design requires deviating from conventional deep learning approaches. To this end, there is a need to develop communication-oriented AI techniques that are not only of high performance for a given channel, but also light-weight, interpretable, flexible, and adaptive.
In the proposed tutorial we shall present in a pedagogic fashion the leading approaches for designing practical and effective deep transceivers that address the specific limitations imposed by the use of data- and resource-constrained wireless devices and by the dynamic nature of the communication channel. We advocate that AI-based wireless transceiver design requires revisiting the three main pillars of AI, namely, (i) the architecture of AI models; (ii) the data used to train AI models; and (iii) the training algorithm that optimizes the AI model for generalization, i.e., to maximize performance outside the training set (either on the same distribution or for a completely new one). For each of these AI pillars, we survey candidate approaches from the recent literature. We first discuss how to design light-weight trainable architectures via model-based deep learning. This methodology hinges on the principled incorporation of model-based processing, obtained from domain knowledge on optimized communication algorithms, within AI architectures. Next, we investigate how labeled data can be obtained without impairing spectral efficiency, i.e., without increasing the pilot overhead. We show how transceivers can generate labeled data by self-supervision, aided by existing communication algorithms; and how they may further enrich data sets via data augmentation techniques tailored for such data. We then cover training algorithms designed to meet requirements in terms of efficiency, reliability, and robust adaptation of wireless communication systems, avoiding overfitting from limited training data while limiting training time. These methods include communication-specific meta-learning as well as generalized Bayesian learning and modular learning.
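As one concrete instance of the model-based deep learning theme (a sketch under our own simplifying assumptions, not one of the tutorial's specific receivers), the snippet below unfolds projected-gradient MIMO detection into a fixed number of "layers" whose per-layer step sizes are the only quantities one would learn from pilots.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tx, n_rx, K = 4, 8, 10
H = rng.normal(size=(n_rx, n_tx)) / np.sqrt(n_rx)     # known channel matrix
x_true = rng.choice([-1.0, 1.0], size=n_tx)           # transmitted BPSK symbols
y = H @ x_true + 0.05 * rng.normal(size=n_rx)         # noisy observation

# K unfolded "layers": x <- clip(x - delta_k * H^T (H x - y), -1, 1).
deltas = np.full(K, 0.5)                              # in practice, learned per layer from data
x = np.zeros(n_tx)
for k in range(K):
    x = np.clip(x - deltas[k] * H.T @ (H @ x - y), -1.0, 1.0)

x_hat = np.sign(x)
print("detected:", x_hat, "| symbol errors:", int(np.sum(x_hat != x_true)))
```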
Tutorial outline:
- Introduction and motivation
- Dramatic success of deep learning
- Gains of deep learning for wireless communications and sensing
- Overcoming model deficiency
- Overcoming algorithm deficiency
- Applications
- The fundamental differences between wireless technologies and conventional AI domains and their associated challenges
- Nature of the devices
- Nature of the domain
- The need for AI that is light-weight, flexible, adaptive, and interpretable
- Tutorial goal + outline
- Deep Learning Aided Systems in Dynamic Environments
- System model and main running example of deep learning aided receivers
- Overview of existing approaches for handling dynamic tasks
- Joint learning
- Estimated environment parameters as input
- Online learning
- Pros and cons of each approach
- When and why should AI-aided systems be trained on device?
- Paradigm shift in AI needed to enable such operations:
- Go beyond design of parametric models
- Holistic treatment of machine learning algorithms:
- Architecture
- Data
- Training
- Architecture:
- The family of mappings one can learn
- From black-box, highly parameterized architectures to light-weight interpretable machine learning systems via domain knowledge
- Model-based deep learning methodologies
- Deep unfolding and its forms:
- Learned hyperparameters
- Learned objective
- DNN conversion
- DNN-aided inference
- Issues for future research
- Data:
- Data for learning the task under the current environment
- From few pilots to large labeled data sets
- Self-supervision:
- Codeword level
- Decision-level (see the sketch at the end of this outline)
- Active learning
- Data augmentation
- Complete data enrichment pipeline
- Issues for future research
- Training:
- Tuning parametric architecture from data
- Train rapidly with limited data, possibly exploiting model-based architectures
- Deciding when to train using concept drift
- Meta-learning:
- Gradient-based meta-learning
- Hypernetwork-based meta-learning
- Bayesian learning:
- End-to-end Bayesian learning
- Model-based aware Bayesian learning
- Continual Bayesian learning
- Modular learning for model-based deep architectures
- Issues for future research
- Summary:
- Additional aspects of federated learning not discussed in this tutorial
- Hardware-aware and power-aware AI
- Collaborative flexible AI for mobile wireless devices
- Conclusions
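The Data portion of the outline above refers to decision-level self-supervision; the sketch below illustrates that idea under simple assumptions (BPSK symbols, a differentiable detector such as the unfolded one sketched earlier, and hard decisions reused as pseudo-labels). It is a sketch of the concept, not the presenters' exact pipeline.

```python
import torch

def online_self_supervised_step(detector, optimizer, y, H):
    """One online adaptation step that reuses the detector's own hard decisions
    as pseudo-labels (illustrative decision-level self-supervision, BPSK case)."""
    with torch.no_grad():
        pseudo_labels = torch.sign(detector(y, H))   # quantize soft outputs to {-1, +1}
    optimizer.zero_grad()
    loss = ((detector(y, H) - pseudo_labels) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A codeword-level variant would instead re-encode successfully decoded codewords into labels, filtering out unreliable decisions before adaptation.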
Presented by: Sijia Liu, Zhangyang Wang, Tianlong Chen, Pin-Yu Chen, Mingyi Hong, Wotao Yin
Part 1: Introduction of ZO-ML
- Preliminary Concepts and Mathematical Foundations
- Basic mathematical tools and formulations
- Why ZO over FO: Limitations of Traditional Gradient-Based Optimization
- Emerging challenges and drawbacks of relying solely on FO gradient-based methods
- Survey of Practical Applications and Use Cases
- Overview of applications that benefit from ZO-ML
Part 2: Foundations of ZO-ML
- Algorithmic Landscape of ZO-ML
- A rundown of the primary algorithms and methods in ZO-ML (a minimal gradient-estimator sketch follows this outline)
- Convergence and Query Complexity
- Understanding the provable properties of ZO-ML
- Scaling ZO-ML: Practical Techniques and Implementations
- Tips and tricks for ZO-ML algorithms at scale
- Extending ZO-ML across Learning Paradigms
- How does ZO-ML adapt to various ML paradigms?
Break
Part 3: Applications of ZO-ML
- Prompt Learning in FMs
- Fine-tuning and Personalization in FMs via ZO-ML
- ZO-ML in the Context of AI Robustness, Efficiency, and Automation
Part 4: Demo Expo
- Introducing the ZO-ML Toolbox
- A guided tour of our specialized toolbox for ZO-ML
- Benchmarking with ZO algorithms
- An introduction to ZO performance metrics and benchmark applications
- Practical Demos: Utilizing ZOT for Parameter-Efficient Fine-Tuning (PEFT) and Adversarial Defense
- Live demonstrations showcasing the utility of ZO-ML
Part 5: Conclusion and Q&A
- Wrap-Up: Key Takeaways from the Tutorial
- Future Horizons: SP and ML Opportunities and Challenges
- Resources for Deeper Exploration
- A curated list of essential ZO-ML resources
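As a concrete companion to the Foundations part above, the sketch below shows a two-point randomized gradient estimator, the basic primitive behind many ZO-ML algorithms. The function names, smoothing radius, and toy objective are illustrative assumptions, not the presenters' toolbox API.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_directions=20, rng=None):
    """Two-point randomized gradient estimate of f at x using only function queries.

    For each random unit direction u, the directional derivative is approximated by
    (f(x + mu*u) - f(x - mu*u)) / (2*mu); the averaged estimate is rescaled by the
    dimension, the usual factor for sphere-sampled estimators.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_directions):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)
        grad += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return grad * (x.size / num_directions)

# Plain ZO gradient descent on a black-box objective (a stand-in quadratic here):
f = lambda z: float(np.sum((z - 1.0) ** 2))
x = np.zeros(10)
for _ in range(200):
    x -= 0.05 * zo_gradient(f, x)
```

This query-only loop is exactly what makes ZO methods attractive when first-order gradients are unavailable or too expensive, as in the black-box prompt-learning applications of Part 3.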
Presented by: Ehsan Variani, Georg Heigold, Ke Wu, Michael Riley
The first part of this talk focuses on the mathematical modeling of existing neural ASR criteria. We introduce a modular framework that can explain all the existing criteria, such as Cross Entropy (CE), Connectionist Temporal Classification (CTC), Recurrent Neural Network Transducer (RNN-T), Hybrid Autoregressive Transducer (HAT), and Listen, Attend and Spell (LAS). We also introduce the LAttice-based Speech Transducer library (LAST), which provides efficient implementations of these criteria and allows the user to mix and match different components to create new training criteria. A simple Colab notebook is presented to engage the audience in using LAST to implement a simple ASR model on a digit recognition task.
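To make one of these criteria concrete ahead of the Colab, here is a minimal PyTorch sketch of CTC training on a toy digit-recognition setup. It uses torch.nn.CTCLoss with random features as stand-ins; it is not the LAST library API, which the tutorial itself introduces.

```python
import torch
import torch.nn as nn

# Toy setup: 2 utterances, 50 frames, 11 output symbols (blank at index 0 + 10 digits).
num_frames, batch, num_classes = 50, 2, 11
encoder = nn.LSTM(input_size=40, hidden_size=64)        # e.g., 40-dim filterbank input
projection = nn.Linear(64, num_classes)

features = torch.randn(num_frames, batch, 40)            # (T, N, feature_dim)
encoded, _ = encoder(features)
log_probs = projection(encoded).log_softmax(dim=-1)      # (T, N, C), as CTCLoss expects

targets = torch.tensor([[3, 1, 4], [2, 7, 0]])           # padded digit label sequences
target_lengths = torch.tensor([3, 2])                    # second utterance has 2 labels
input_lengths = torch.full((batch,), num_frames)

ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                           # gradients for encoder + projection
```

Swapping the loss (e.g., for an RNN-T or HAT criterion) while keeping the encoder is the kind of mix-and-match modularity the framework above formalizes.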
The second half of the talk focuses on some practical problems in ASR modeling and some principled solutions. The problems are:
- Language model integration: this part mainly focuses on principled ways of adding language models within the noisy-channel formulation of ASR. We introduce ways to estimate internal language models for different ASR models and approaches to integrate external language models during first-pass decoding or second-pass rescoring (a representative decoding score is sketched below).
- Streaming ASR: we explain the main theoretical reason why streaming ASR models perform much worse than their non-streaming counterparts and present two solutions. The main focus will be on the problem of label bias and how the local-normalization assumption in existing ASR training criteria exacerbates it. Finally, we present a way to measure modeling latency and how to optimize models with respect to it.
- Time alignment: the main question this section tries to answer is how to improve the time alignment of ASR models. We further show how the solution can lead to simpler ASR decoding methods.
- Speech representation: which features can be extracted from an ASR system for down-stream tasks while preserving the following properties: A) back-propagation: the down-stream model can fine-tune the up-stream ASR model if paired data exist; B) robustness: changing the up-stream ASR system does not require retraining the down-stream model. We will present several speech representations with these properties.
- Semi-supervised training: how to extend the supervised training criteria to take advantage of unlabeled speech and text data. We show a detailed formulation of the semi-supervised criteria and present several experimental results.
For all the problems above, the audience will have a chance to use the LAST library and the Colab notebook to evaluate the effectiveness of the solutions themselves during the tutorial.
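As an illustration of the noisy-channel integration discussed in the first bullet above, a commonly used decoding score combines the end-to-end ASR model with an estimated internal language model (ILM) and an external language model (ELM). This is a generic sketch of internal-LM subtraction, not necessarily the exact rule used in the tutorial:

$$
\hat{y} \;=\; \arg\max_{y}\; \log p_{\text{ASR}}(y \mid x) \;-\; \lambda_{\text{ILM}} \log p_{\text{ILM}}(y) \;+\; \lambda_{\text{ELM}} \log p_{\text{ELM}}(y),
$$

where the interpolation weights $\lambda_{\text{ILM}}$ and $\lambda_{\text{ELM}}$ are typically tuned on held-out data.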
Presented by: Xing Liu, Tarig Ballal, Jose A. Lopez-Salcedo, Gonzalo Seco-Granados, Tareq Al-Naffouri
- Background Material
- Introduction to PNT
- Introduction to satellite constellations (LEO, MEO, and GEO)
- Legacy PNT using GNSS.
- GNSS PNT shortcomings.
- LEO Constellation Basics
A closer look at LEO constellations and their main characteristics: orbits, geometry, velocity, coverage, etc. We will focus more on the signaling aspects, such as modulation schemes, coding techniques, channel characteristics, and receiver design. We will contrast LEO attributes with those of GNSS, highlighting potential strengths and weaknesses.
- PNT using LEO constellations
In this section, we will cover the main techniques for PNT based on LEO satellite signals. We will distinguish between the following main groups of methods:
- PNT based on signals of opportunity (SoP) from LEO satellites designed for other (non-PNT) purposes.
- PNT based on dedicated LEO satellite signals.
- PNT based on 5G non-terrestrial networks (NTNs).
We will discuss the pros and cons of each of these categories. For each category, we will discuss the following topics:
- The signal models.
- The main signal parameters (known as observations) that are useful for navigation.
- The methods that can be applied to acquire the signal parameters.
We will conclude this section by presenting a general model for each observation type. These observation models will be used in the following section to develop specific techniques and algorithms for LEO-based PNT.
- LEO-based PNT Techniques
Here we will provide detailed descriptions of algorithms that can be used, or that have been proposed, for LEO PNT. We will establish a connection with GNSS-based techniques. We will cover the following topics:
- Doppler-based techniques (a minimal least-squares sketch follows this list).
- Pseudorange-based techniques.
- Carrier-phase-based techniques.
- A variety of PNT filtering techniques.
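To give a flavor of the Doppler-based techniques in the list above, the sketch below fits a receiver position and a clock-drift term to Doppler-derived range-rate observations with scipy. The static-receiver assumption, random satellite geometry, and noise level are illustrative; real LEO PNT uses many more measurements, multiple epochs, and careful error modeling.

```python
import numpy as np
from scipy.optimize import least_squares

def range_rate(state, sat_pos, sat_vel):
    """Predicted range rate for a static receiver.
    state = [x, y, z, clock_drift]; sat_pos, sat_vel are (N, 3) ECEF arrays."""
    rx, drift = state[:3], state[3]
    los = sat_pos - rx                                  # line-of-sight vectors
    dist = np.linalg.norm(los, axis=1)
    return np.sum(los * sat_vel, axis=1) / dist + drift

def doppler_fix(measured_rates, sat_pos, sat_vel, x0):
    """Least-squares position / clock-drift estimate from range-rate observations."""
    residual = lambda s: range_rate(s, sat_pos, sat_vel) - measured_rates
    return least_squares(residual, x0, x_scale='jac').x

# Simulated LEO-like geometry: 8 satellites around 550 km altitude, ~7.5 km/s speeds.
rng = np.random.default_rng(0)
true_state = np.array([4.0e6, 3.0e6, 3.5e6, 120.0])     # position [m] and drift [m/s]
sat_pos = true_state[:3] + rng.normal(scale=2.0e6, size=(8, 3)) + np.array([0.0, 0.0, 5.5e5])
sat_vel = rng.normal(scale=7.5e3, size=(8, 3))
meas = range_rate(true_state, sat_pos, sat_vel) + rng.normal(scale=0.1, size=8)
print(doppler_fix(meas, sat_pos, sat_vel, x0=np.array([3.9e6, 3.1e6, 3.4e6, 0.0])))
```

The same least-squares skeleton extends to pseudorange and carrier-phase processing by swapping in the corresponding observation model from the previous section.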
- Simulations and demonstrations
In this section of the tutorial, we will present results from extensive simulations to highlight various aspects of LEO PNT. We will make our simulation codes freely accessible in the public domain.
- Opportunities and Challenges
The final part of the tutorial will highlight the most prominent research directions and challenges that might be of interest to the community.
- Summary
A summary of the tutorial highlighting the takeaway messages.
- References
We will provide an extensive list of references.
Presented by: Byung-Jun Yoon, Youngjoon Hong
Generative AI models have emerged as a groundbreaking paradigm that can generate, modify, and interpret complex data patterns, ranging from images and sounds to structured datasets. In the realm of signal processing, these models have the potential to revolutionize how we understand, process, and leverage signals. Their capabilities span from the generation of synthetic datasets to the enhancement and restoration of signals, often achieving results that traditional methods can't match. Thus, understanding and harnessing the power of generative AI is not just an academic endeavor; it's becoming an imperative for professionals and researchers who aim to stay at the forefront of the signal processing domain.
The last few years have witnessed explosive growth in the development and adoption of generative AI models. With the introduction of architectures like GANs, VAEs, and newer transformer-based models, the AI research community is regularly setting new performance benchmarks. The signal processing community has also begun to exploit these advancements. The year 2024 presents a crucial juncture where the convergence of AI and signal processing is no longer a future possibility but an ongoing reality. Thus, a tutorial on this topic is not just timely but urgently needed.
While there have been numerous tutorials and courses on generative AI in the context of computer vision or natural language processing, its application in the pure signal and data processing domain is less explored. This tutorial is unique in its comprehensive approach, combining theory, practical methods, and a range of applications specifically tailored for the signal processing community. Attendees will not only learn about the core concepts but will also gain hands-on exposure to the theory and application of generative AI techniques.
Generative AI provides a fresh lens through which to approach longstanding challenges in signal processing. This tutorial will introduce:
- New Ideas: Concepts like latent space exploration, variational inference, and diffusion models, which can provide new insights into signal representation and transformation (a minimal VAE sketch follows this list).
- New Topics: Areas where generative AI has found success, such as data augmentation, signal enhancement, and anomaly detection in signals.
- New Tools: Practical demonstrations and hands-on sessions using state-of-the-art software libraries and tools tailored for generative AI in signal processing.
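To make the "New Ideas" above tangible, here is a minimal PyTorch sketch of a variational autoencoder for fixed-length 1-D signals, illustrating variational inference and a latent space that can be explored or sampled. The architecture sizes, signal length, and synthetic sinusoid data are illustrative assumptions, not material from the tutorial.

```python
import torch
import torch.nn as nn

class SignalVAE(nn.Module):
    """Tiny VAE for fixed-length 1-D signals (e.g., 256-sample windows)."""

    def __init__(self, signal_len=256, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(signal_len, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, signal_len)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.decoder(z), mu, logvar

def elbo_loss(x, recon, mu, logvar):
    # Negative ELBO: reconstruction error + KL divergence to the standard normal prior.
    recon_term = ((recon - x) ** 2).sum(dim=1).mean()
    kl_term = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return recon_term + kl_term

# One training step on noisy sinusoids standing in for real 1-D signals:
model = SignalVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
t = torch.linspace(0, 1, 256)
batch = torch.sin(2 * torch.pi * torch.randint(2, 10, (32, 1)) * t) + 0.1 * torch.randn(32, 256)
optimizer.zero_grad()
recon, mu, logvar = model(batch)
loss = elbo_loss(batch, recon, mu, logvar)
loss.backward()
optimizer.step()
```

Once trained, sampling the latent variable from the prior and decoding it produces synthetic signals, and moving through the latent space gives the kind of controlled transformation that underlies applications such as data augmentation and signal enhancement mentioned above.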
In conclusion, by bridging the gap between the advancements in generative AI and the vast potential applications in signal processing, this tutorial promises to equip attendees with knowledge and tools that can redefine the boundaries of what's possible in the field.