List of Accepted Papers

The following is the list of accepted ICASSP 2024 papers, sorted by paper title. Use your web browser's search feature to find your paper number. Notifications have also been sent to all authors by email. If you have not received your notification by email, please contact us at info@2024.ieeeicassp.org.

Paper Number Paper Title
1315 “IT IS OKAY TO BE UNCOMMON”: QUANTIZING SOUND EVENT DETECTION NETWORKS ON HARDWARE ACCELERATORS WITH UNCOMMON SUB-BYTE SUPPORT
3377 1-D SPATIAL ATTENTION IN BINARIZED CONVOLUTIONAL NEURAL NETWORKS
4343 2D Human Pose Estimation Calibration and Keypoint Visibility Classification
9562 3D AUTOMATED QUANTITATIVE CALCULATIONS BASED ON CT IMAGES OF THE HIP JOINT
11925 3D CBCT CHALLENGE 2024: IMPROVED CONE BEAM CT RECONSTRUCTION USING SWINIR-BASED SINOGRAM AND IMAGE ENHANCEMENT
9987 3D Hand Joint and Grasping Estimation for Teleoperation System
1248 3-D Near-field Localization by Jointly Exploiting Spatial and Temporal Information Based on a Nonuniform Cross Array
5765 3D PARALLELISM FOR TRANSFORMERS VIA INTEGER PROGRAMMING
11565 3D PERCEPTUAL SOUNDFIELD RECONSTRUCTION VIA VIRTUAL MICROPHONE SYNTHESIS
10067 3D POINT CLOUD SEMANTIC SEGMENTATION BASED ON DIFFUSION MODEL
6588 3D POSE ESTIMATION FROM MONOCULAR VIDEO WITH CAMERA-BONE ANGLE REGULARIZATION ON THE IMAGE FEATURE
2858 3DSAM: SEGMENT ANYTHING IN NERF
4353 3M-TRANSFORMER: A MULTI-STAGE MULTI-STREAM MULTIMODAL TRANSFORMER FOR EMBODIED TURN-TAKING PREDICTION
2738 3S-TSE: EFFICIENT THREE-STAGE TARGET SPEAKER EXTRACTION FOR REAL-TIME AND LOW-RESOURCE APPLICATIONS
9281 6DOF SELD: SOUND EVENT LOCALIZATION AND DETECTION USING MICROPHONES AND MOTION TRACKING SENSORS ON SELF-MOTIONING HUMAN
10265 A 3D VIRTUAL TRY-ON METHOD WITH GLOBAL-LOCAL ALIGNMENT AND DIFFUSION MODEL
7550 A BAYESIAN APPROACH TO HIGH-ORDER LINK PREDICTION
4797 A BINARY BP DECODING USING POSTERIOR ADJUSTMENT FOR QUANTUM LDPC CODES
3674 A BI-PYRAMID MULTIMODAL FUSION METHOD FOR THE DIAGNOSIS OF BIPOLAR DISORDERS
8415 A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames
11526 A BP Method for Track-Before-Detect
2465 A CCM-BASED JOINT DOA-FREQUENCY ESTIMATION AND SIGNAL RECOVERY WITH EFFICIENT SUB-NYQUIST SAMPLING
4570 A Chat About Boring Problems: Studying GPT-based text normalization
4413 A Closer Look at Wav2Vec2 Embeddings for On-device Single-channel Speech Enhancement
7266 A codec-based approach for video life-cycle characterization in social networks
8328 A COMPARATIVE ANALYSIS OF POETRY READING AUDIO: SINGING, NARRATING, OR SOMEWHERE IN BETWEEN?
6150 A COMPARATIVE STUDY ON ANNOTATION QUALITY OF CROWDSOURCING AND LLM VIA LABEL AGGREGATION
7551 A COMPARISON OF PARAMETER-EFFICIENT ASR DOMAIN ADAPTATION METHODS FOR UNIVERSAL SPEECH AND LANGUAGE MODELS
10309 A complete method for the 3D reconstruction of axonal pathways from 2 orthogonal 3D OCT images of the lamina cribrosa
1527 A COMPREHENSIVE ANALYSIS OF BIASES AND CUES IN NLU DATASETS AND MODELS WITH ICQ
4480 A COMPREHENSIVE FRAMEWORK FOR OCCLUDED HUMAN POSE ESTIMATION
4430 A COMPUTATIONALLY EFFICIENT SEMI-BLIND SOURCE SEPARATION APPROACH FOR NONLINEAR ECHO CANCELLATION BASED ON AN ELEMENT-WISE ITERATIVE SOURCE STEERING
8850 A CONCEPT FOR A SLAM BACK END HARDWARE ACCELERATOR
2992 A CONTRARIO PARADIGM FOR YOLO-BASED INFRARED SMALL TARGET DETECTION
4762 A CONVERGENT PRIMAL-DUAL DEEP PLUG-AND-PLAY ALGORITHM FOR CONSTRAINED IMAGE RESTORATION
9599 A Counterfactual Inspired Framework for Quantifying Edge Effects on GNNs Fairness
4905 A CROSS SEARCH METHOD FOR DATA AUGMENTATION IN NEURAL MACHINE TRANSLATION
1858 A crowdsourcing approach to video quality assessment
11463 A CTC ALIGNMENT-BASED NON-AUTOREGRESSIVE TRANSFORMER FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
4646 A Deep Representation Learning-based Speech Enhancement Method Using Complex Convolution Recurrent Variational Autoencoder
6415 A DENSENET-BASED METHOD FOR DECODING AUDITORY SPATIAL ATTENTION WITH EEG
1378 A DENSITY-GUIDED TEMPORAL ATTENTION TRANSFORMER FOR INDISCERNIBLE OBJECT COUNTING IN UNDERWATER VIDEOS
7461 A DETAILED AUDIO-TEXT DATA SIMULATION PIPELINE USING SINGLE-EVENT SOUNDS
2700 A DISTRIBUTED JOINT INTEGRATED PROBABILISTIC DATA ASSOCIATION (JIPDA) FILTER WITH SOFT OBJECT ASSOCIATION
8136 A DUAL-PATH FRAMEWORK WITH FREQUENCY-AND-TIME EXCITED NETWORK FOR ANOMALOUS SOUND DETECTION
3263 A FACIAL EXPRESSION TRANSFER METHOD BASED ON 3DMM AND DIFFUSION MODELS
4860 A FAST BLIND DEBLURRING ALGORITHM USING LOCAL GRADIENT PRODUCT PRIOR
7284 A FAST, PERFORMANT, SECURE DISTRIBUTED TRAINING FRAMEWORK FOR LLM
7167 A FEDERATED GRAPH TO EMBEDDING APPROACH FOR KNOWLEDGE GRAPH COMPLETION
6498 A Fine-Grained Attribute Pre-labeling Method based on Label Dependency and Feature Similarity Dynamics
3175 A FINE-GRAINED TRI-MODAL INTERACTION MODEL FOR MULTIMODAL SENTIMENT ANALYSIS
5517 A FLEXIBLE ONLINE FRAMEWORK FOR PROJECTION-BASED STFT PHASE RETRIEVAL
11461 A FORMAT COMPLIANT ENCRYPTION METHOD FOR 3D OBJECTS ALLOWING HIERARCHICAL DECRYPTION
7959 A FOUNDATION MODEL FOR MUSIC INFORMATICS
5969 A FRAMEWORK FOR PORTRAIT STYLIZATION WITH SKIN-TONE AWARENESS AND NUDITY IDENTIFICATION
11929 A FULLBAND NEURAL NETWORK FOR AUDIO PACKET LOSS CONCEALMENT
6563 A fully differentiable model for unsupervised singing voice separation
5990 A GENERAL FRAMEWORK FOR ROTATION INVARIANT POINT CLOUD ANALYSIS
7292 A Generative Adversarial Framework for Dialogue Generation with Neural Architecture Search
7381 A GIBBS SAMPLER FOR BAYESIAN NONPARAMETRIC STATE-SPACE MODELS
9005 A GRAPH NEURAL NETWORK BASED APPROACH FOR FAULT DELINEATION IN SEISMIC DATA USING GRAPH TOTAL VARIATION AND MULTIGRAPH
1578 A GRAPH NEURAL NETWORK BASED FUSION OF MRI-DERIVED BRAIN NETWORK AND CLINICAL DATA FOR GLIOBLASTOMA SURVIVAL PREDICTION
8207 A GRAPH-PREDICTION-BASED APPROACH FOR DEBIASING UNDERREPORTED DATA
3273 A GREEN LEARNING APPROACH TO SPOOFED SPEECH DETECTION
3728 A GUIDED UPSAMPLING NETWORK FOR SHORT WAVE INFRARED IMAGES USING GRAPH REGULARIZATION
1469 A Hierarchical multi-proxy Loss with Dynamic Main-proxy for Deep Metric Learning
3912 A HYBRID CNN-TRANSFORMER FOR FOCAL LIVER LESION CLASSIFICATION
8486 A HYBRID DEEP-ONLINE LEARNING BASED METHOD FOR ACTIVE NOISE CONTROL IN WAVE DOMAIN
8251 A Hybrid Slow-time Coding Framework for Automotive MIMO Radar
2302 A JOINT DATA COMPRESSION AND TIME-DELAY ESTIMATION METHOD FOR DISTRIBUTED SYSTEMS VIA EXTREMUM ENCODING
4796 A JOINT LOOK ON LUNAR SATELLITE AND COOPERATIVE SURFACE PNT
1640 A KEYLESS EXTRACTION FRAMEWORK TARGETING AT DEEP LEARNING BASED IMAGE-WITHIN-IMAGE MODELS
3715 A Learning Resource Recommendation Algorithm Based on Online Learning Behavior
6868 A LEARNING-BASED MULTI-NODE FUSION POSITIONING METHOD USING WEARABLE INERTIAL SENSORS
3180 A LEARNING-BASED SYSTEM FOR AUTOMATIC INTENTIONAL NON-ADHERENCE DETECTION FROM DOSING VIDEOS
3481 A Lightweight Change Detection Method Based on Feature Interaction and Transformer for High Resolution Remote Sensing Images
10170 A LIGHTWEIGHT HYBRID MULTI-CHANNEL SPEECH EXTRACTION SYSTEM WITH DIRECTIONAL VOICE ACTIVITY DETECTION
2931 A LIGHT-WEIGHT STATE DETECTION MODEL FOR KALMAN-FILTER-BASED ACOUSTIC FEEDBACK CANCELLATION WITH RAPID RECOVERY FROM ABRUPT PATH CHANGES
8389 A LOW-LATENCY FFT-IFFT CASCADE ARCHITECTURE
8111 A Machine-Learning Model for Detecting Depression, Anxiety, and Stress from Speech
6005 A META-PRECONDITIONING APPROACH FOR DEEP Q-LEARNING
3105 A METHOD FOR BILEVEL OPTIMIZATION WITH CONVEX LOWER-LEVEL PROBLEM
3053 A METHOD FOR X-RAY IMAGE LANDMARKS LOCALIZATION USING CYCLIC COORDINATE-GUIDED STRATEGY
7492 A MODIFIED CRAMÉR-RAO BOUND FOR DISCRETE-TIME MARKOVIAN DYNAMIC SYSTEMS
11903 A MULTI-FILTER AND MULTI-SCALE U-NET FOR CONE-BEAM COMPUTED TOMOGRAPHY WITH HARDWARE CONSTRAINTS
3062 A MULTIMODAL APPROACH TO DEVICE-DIRECTED SPEECH DETECTION WITH LARGE LANGUAGE MODELS
5867 A MULTI-SCALE BIMODAL FUSION NETWORK FOR ROBUST AND ACCURATE ONLINE HANDWRITING RECOGNITION
4185 A MULTISCALE OBJECTIVE FUNCTION FOR CAMERA COLOR CORRECTION
4798 A NEAR-FIELD SOURCE LOCALIZATION METHOD FOR UNIFORM/SPARSE CENTRALLY SYMMETRIC RECTANGULAR ARRAYS
7410 A NEURAL SYNTAX PARSER FOR CORONARY ARTERY ANATOMICAL LABELING IN CORONARY CT ANGIOGRAPHY
7347 A Neurophysiological-Auditory "Listen Receipt" for Communication Enhancement
1565 A New Fourth-Order Sparse Array Generator Based on Sum-Difference Co-array Analysis
4557 A New Perspective on Understanding Resolution Limit via An Asymptotic Study of Christoffel-Darboux Kernel based Spectrum Estimator
9353 A New Pre-training Paradigm for Offline Multi-agent Reinforcement Learning with Suboptimal Data
5016 A new similarity-based relational knowledge distillation method
5623 A NOVEL 3-D FOCUSING SCHEME FOR DISTRIBUTED SAR TOMOGRAPHY
11910 A Novel Approach to WaveNet Architecture for RF Signal Separation with Learnable Dilation and Data Augmentation
7147 A NOVEL ARCHITECTURE OF DEEP FEATURE-BASED GAUSSIAN PROCESSES WITH AN ENSEMBLE OF KERNELS
7183 A NOVEL CASCADE INSTRUCTION TUNING METHOD FOR BIOMEDICAL NER
3358 A Novel Contrastive Diffusion Graph Convolutional Network for Few-Shot Skeleton-Based Action Recognition
10198 A NOVEL CROSS-SENSOR SELF-SUPERVISED LEARNING METHOD FOR ROTATING MACHINERY FAULT DIAGNOSIS
1907 A NOVEL DEMODULATION AND SELECTION PILOT POWER TRADE-OFF FOR CODEBOOK-BASED IRS WITH IMPERFECT CHANNEL ESTIMATES
8073 A NOVEL DISCRETE FRACTIONAL COMPLEX HADAMARD TRANSFORM FOR MEDICAL IMAGE ENCRYPTION
7081 A NOVEL ITERATIVE THRESHOLDING ALGORITHM FOR ARCTANGENT REGULARIZATION PROBLEM
5357 A NOVEL LOCAL-GLOBAL FEATURE FUSION FRAMEWORK FOR BODY-WEIGHT EXERCISE RECOGNITION WITH PRESSURE MAPPING SENSORS
6674 A NOVEL MEDICAL IMAGE FUSION FRAMEWORK INTEGRATING MULTI-SCALE ENCODER-DECODER WITH DISCRETE WAVELET DECOMPOSITION
9506 A Novel Multi-atlas Fusion Model Based On Contrastive Learning For Functional Connectivity Graph Diagnosis
8931 A NOVEL MULTIMODAL SENTIMENT ANALYSIS MODEL BASED ON GATED FUSION AND MULTI-TASK LEARNING
3150 A NOVEL RESIDUAL-GUIDED LEARNING METHOD FOR IMAGE STEGANOGRAPHY
5134 A One-Class Approach to Detect Super-Resolution Satellite Imagery with Spectral Features
8638 A parameterized generative adversarial network using cyclic projection for explainable medical image classifications
6132 A PLS-INTEGRATED LASSO METHOD WITH APPLICATION IN INDEX TRACKING
3216 A PRACTICAL ONLINE MULTICHANNEL DEREVERBERATION APPROACH WITH DATA-REUSE TECHNIQUE
7041 A PRIOR DRIVEN SEMI-SUPERVISED VITGAN FOR IMAGE RECOLORIZATION
2226 A PROBABILITY GRADIENT BASED APPROACH FOR SAMPLING BOUNDARIES OF IN-DOMAIN DATA
4564 A Prompt-based Method With Multi-View Optimization for Open Relation Extraction
10384 A Property-Guided Diffusion Model for Generating Molecular Graphs
5455 A RAY-TRACING BASED FINGERPRINTING METHOD FOR PASSIVE LOCALIZATION IN URBAN NLOS ENVIRONMENT
8948 A REAL-TIME ACTIVE SPEAKER DETECTION SYSTEM INTEGRATING AN AUDIO-VISUAL SIGNAL WITH A SPATIAL QUERYING MECHANISM
9012 A REAL-TIME LYRICS ALIGNMENT SYSTEM USING CHROMA AND PHONETIC FEATURES FOR CLASSICAL VOCAL PERFORMANCE
7076 A REAL-TIME VIDEO QUALITY METRIC FOR HTTP ADAPTIVE STREAMING
3574 A RECONSTRUCTION-BASED FEATURE ADAPTATION FOR ANOMALY DETECTION WITH SELF-SUPERVISED MULTI-SCALE AGGREGATION
2450 A Reduced-Reference Quality Assessment Metric for Textured Mesh Digital Humans
1463 A Relation-Aware Heterogeneous Graph Transformer on Dynamic Fusion for Multimodal Classification Tasks
8720 A RIEMANNIAN-BASED JOINT DESIGN FRAMEWORK OF MIMO RADAR TRANSMIT WAVEFORM AND RECEIVE FILTER VIA INFORMATION THEORY
4345 A ROBUST AND SCALABLE METHOD WITH AN ANALYTIC SOLUTION FOR MULTI-SUBJECT FMRI DATA ANALYSIS
5643 A ROBUST AUDIO DEEPFAKE DETECTION SYSTEM VIA MULTI-VIEW FEATURE
11503 A ROBUST FRAMEWORK TO DESIGN OPTIMAL SENSOR LOCATIONS FOR TOA OR RSS SOURCE LOCALIZATION TECHNIQUES
4076 A Robust GLRT Detector against Missing Data in Cooperative Sensing
8800 A ROBUST PITCH-FUSION MODEL FOR SPEECH EMOTION RECOGNITION IN TONAL LANGUAGES
4433 A ROBUST QUANTILE HUBER LOSS WITH INTERPRETABLE PARAMETER ADJUSTMENT IN DISTRIBUTIONAL REINFORCEMENT LEARNING
2018 A SALIENCY ENHANCED FEATURE FUSION BASED MULTISCALE RGB-D SALIENT OBJECT DETECTION NETWORK
7436 A SCALABLE SPARSE TRANSFORMER MODEL FOR SINGING MELODY EXTRACTION
11921 A SELF-SUPERVISED LEARNING APPROACH FOR DETECTING NON-PSYCHOTIC RELAPSES USING WEARABLE-BASED DIGITAL PHENOTYPING
3439 A SELF-SUPERVISED PRESSURE MAP HUMAN KEYPOINT DETECTION APPROCH: OPTIMIZING GENERALIZATION AND COMPUTATIONAL EFFICIENCY ACROSS DATASETS
9120 A SEPARATION PRIORITY PIPELINE FOR SINGLE-CHANNEL SPEECH SEPARATION IN NOISY ENVIRONMENTS
2383 A SEQUENTIAL AVERAGING PLUG-AND-PLAY METHOD FOR IMAGE RESTORATION VIA FIXED-POINT PROJECTION
10168 A SIMPLE AND EFFECTIVE METHOD FOR ANOMALY DETECTION ON ATTRIBUTED GRAPHS VIA FEATURE CONSISTENCY
2801 A Smoothed Bregman Proximal Gradient Algorithm for Decentralized Nonconvex Optimization
1461 A SOFT CONTRASTIVE LEARNING-BASED PROMPT MODEL FOR FEW-SHOT SENTIMENT ANALYSIS
8613 A SOUND APPROACH: USING LARGE LANGUAGE MODELS TO GENERATE AUDIO DESCRIPTIONS FOR EGOCENTRIC TEXT-AUDIO RETRIEVAL
8645 A SPATIAL LONG-TERM ITERATIVE MASK ESTIMATION APPROACH FOR MULTI-CHANNEL SPEAKER DIARIZATION AND SPEECH RECOGNITION
2161 A SPEAKER RECOGNITION METHOD BASED ON STABLE LEARNING
10394 A SPECTRAL ANALYSIS OF GRAPH NEURAL NETWORKS ON DENSE AND SPARSE GRAPHS
2824 A STATISTICAL CHARACTERIZATION OF COMMUNICATION PERFORMANCE IN RIS-AIDED NETWORKS
7297 A STEERED RESPONSE POWER APPROACH WITH BILINEAR PREDICTION-BASED TRADE-OFF PREWHITENING FOR SPEAKER LOCALIZATION
1690 A STOCHASTIC GRADIENT APPROACH FOR COMMUNICATION EFFICIENT CONFEDERATED LEARNING
2770 A Stochastic Proximal WMMSE for Ergodic Sum Rate Maximization
10275 A STUDY OF MISPRONUNCIATION DETECTION AND DIAGNOSIS BASED ON META-LEARNING
2806 A STUDY OF MULTICHANNEL SPATIOTEMPORAL FEATURES AND KNOWLEDGE DISTILLATION ON ROBUST TARGET SPEAKER EXTRACTION
3187 A STUDY ON COMBINING NON-PARALLEL AND PARALLEL METHODOLOGIES FOR MANDARIN-ENGLISH CROSS-LINGUAL VOICE CONVERSION
4187 A STUDY ON GRAPH EMBEDDING FOR SPEAKER RECOGNITION
2325 A STUDY ON THE ADVERSE IMPACT OF SYNTHETIC SPEECH ON SPEECH RECOGNITION
9834 A Supervised Information Enhanced Multi-granularity Contrastive Learning Framework for EEG based Emotion Recognition
6974 A TARGETED ADVERSARIAL ATTACK METHOD FOR MULTI-CLASSIFICATION MALICIOUS TRAFFIC DETECTION
11909 A TIME-FREQUENCY BAND-SPLIT NEURAL NETWORK FOR REAL-TIME FULL-BAND PACKET LOSS CONCEALMENT
4295 A TRANSFORMER APPROACH FOR POLYPHONIC AUDIO-TO-SCORE TRANSCRIPTION
1566 A TRI-DYNAMIC PREPROCESSING FRAMEWORK FOR UGC VIDEO COMPRESSION
5109 A TWO-STAGE DEHAZING FRAMEWORK BASED ON INVERTED IMAGE CURVE-ENHANCEMENT
9417 A TWO-STAGE FRAMEWORK IN CROSS-SPECTRUM DOMAIN FOR REAL-TIME SPEECH ENHANCEMENT
11917 A U-NET ARCHITECTURE FOR TIME-FREQUENCY INTERFERENCE SIGNAL SEPARATION OF RF WAVEFORMS
10178 A UNIFIED DNN-BASED SYSTEM FOR INDUSTRIAL PIPELINE SEGMENTATION
1226 A UNIFIED FRAMEWORK FOR MULTI-INTENT SPOKEN LANGUAGE UNDERSTANDING WITH PROMPTING
2091 A UNIFIED FRONT-END FRAMEWORK FOR ENGLISH TEXT-TO-SPEECH SYNTHESIS
7036 A UNIFIED LOSS FUNCTION TO TACKLE INTER-CLASS AND INTRA-CLASS DATA IMBALANCE IN SOUND EVENT DETECTION
1047 A VARIABLE SMOOTHING FOR NONCONVEXLY CONSTRAINED NONSMOOTH OPTIMIZATION WITH APPLICATION TO SPARSE SPECTRAL CLUSTERING
7832 A WASSERSTEIN GRAPH DISTANCE BASED ON DISTRIBUTIONS OF PROBABILISTIC NODE EMBEDDINGS
8952 A weighted-variance variational autoencoder model for speech enhancement
8961 AAT: ADAPTING AUDIO TRANSFORMER FOR VARIOUS ACOUSTICS RECOGNITION TASKS
11506 Absolute Security in Terahertz Wireless Links
3666 ACCELERATED RECOVERY OF SPECTRALLY SPARSE SIGNALS VIA MODIFIED PROXIMAL GRADIENT IN HANKEL SPACE
9936 ACCELERATING GRADIENT DESCENT FOR OVER-PARAMETERIZED ASYMMETRIC LOW-RANK MATRIX SENSING VIA PRECONDITIONING
2107 ACCENT-SPECIFIC VECTOR QUANTIZATION FOR JOINT UNSUPERVISED AND SUPERVISED TRAINING IN ACCENT ROBUST SPEECH RECOGNITION
9373 Accurate and Robust Scene Text Recognition via Adversarial Training
6057 ACCURATE GIGAPIXEL CROWD COUNTING BY ITERATIVE ZOOMING AND REFINEMENT
8578 ACCURATE INTERPOLATION OF SCATTERED DATA VIA LEARNING RELATION GRAPH
7271 ACOUSTIC BPE FOR SPEECH GENERATION WITH DISCRETE TOKENS
9014 ACTIVATION COMPRESSION OF GRAPH NEURAL NETWORKS USING BLOCK-WISE QUANTIZATION WITH IMPROVED VARIANCE MINIMIZATION
2333 ACTIVE EXPLAINABLE RECOMMENDATION WITH LIMITED LABELING BUDGETS
5967 ACTIVE LEARNING FOR SOUND EVENT CLASSIFICATION USING BAYESIAN NEURAL NETWORKS WITH GAUSSIAN VARIATIONAL POSTERIOR
2933 ACTIVE LEARNING WITH CORE-SET SAMPLING AND SCALE-SENSITIVE LOSS FOR 3D OBJECT DETECTION
7978 ACTIVE NOISE CONTROL OVER 3D SPACE WITH A DYNAMIC NOISE SOURCE
4052 ACTIVE NOISE CONTROL OVER A LARGE REGION WITH MULTIPLE SPHERICAL MICROPHONE ARRAYS IN WAVE DOMAIN
9895 ACTIVITY RECOGNITION METHOD BASED ON KERNEL SUPERVISED LAPLACIAN EIGENMAPS
6794 ADAFL: ADAPTIVE CLIENT SELECTION AND DYNAMIC CONTRIBUTION EVALUATION FOR EFFICIENT FEDERATED LEARNING
9832 ADAMER-CTC: CONNECTIONIST TEMPORAL CLASSIFICATION WITH ADAPTIVE MAXIMUM ENTROPY REGULARIZATION FOR AUTOMATIC SPEECH RECOGNITION
1820 AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis
5503 ADAPTER-BASED INCREMENTAL LEARNING FOR FACE FORGERY DETECTION
8853 ADAPTING FRECHET AUDIO DISTANCE FOR GENERATIVE MUSIC EVALUATION
4814 ADAPTING LARGE LANGUAGE MODEL WITH SPEECH FOR FULLY FORMATTED END-TO-END SPEECH RECOGNITION
6768 ADAPTING PITCH-BASED SELF SUPERVISED LEARNING MODELS FOR TEMPO ESTIMATION
4508 Adaptive Chroma Block Vector Derivation From Luma for Screen Content Coding
1939 Adaptive Confidence Multi-View Hashing for Multimedia Retrieval
5188 ADAPTIVE DATA AUGMENTATION FOR ASPECT SENTIMENT QUAD PREDICTION
3306 ADAPTIVE FOURIER DECOMPOSITION BASED SIGNAL EXTRACTION ON WEAK ELECTROMAGNETIC FIELD
10109 Adaptive Gaussian Regularization Constrained Sparse Subspace Clustering for Image Segmentation
1537 ADAPTIVE GRID 2-D DIRECTION OF ARRIVAL ESTIMATION METHOD USING AN INTEGRATED DICTIONARY
4698 ADAPTIVE HEAD POSE ESTIMATION WITH REAL-TIME STRUCTURED LIGHT
4588 ADAPTIVE IMAGE-ENHANCED KNOWLEDGE GRAPH COMPLETION
7628 ADAPTIVE JOINT CHANNEL ESTIMATION/DATA DETECTION IN FLEXIBLE MULTICARRIER MIMO SYSTEMS - A TENSOR-BASED APPROACH
3959 ADAPTIVE KALMANNET: DATA-DRIVEN KALMAN FILTER WITH FAST ADAPTATION
2075 ADAPTIVE MULTI-ARMED BANDIT LEARNING FOR TASK OFFLOADING IN MOBILE EDGE COMPUTING
8653 Adaptive Multi-Exposure Fusion for Enhanced Neural Radiance Fields
7636 ADAPTIVE MULTIVIEW COMMUNITY-PRESERVED GRAPH CONVOLUTIONAL NETWORK FOR MULTIATLAS-BASED FUNCTIONAL CONNECTIVITY ANALYSIS
3672 Adaptive Multi-View Joint Contrastive Learning on Graphs
10002 ADAPTIVE ORDER AGGREGATOR AND EXTRACTOR GRAPH NEURAL NETWORK
4135 Adaptive parameter sharing for multi-agent reinforcement learning
8588 ADAPTIVE PEDESTRIAN TRAJECTORY PREDICTION VIA TARGET-DIRECTED ANGLE AUGMENTATION
1621 ADAPTIVE PROMPT CONSTRUCTION METHOD FOR RELATION EXTRACTION
7010 ADAPTIVE QUANTIZATION WITH MIXED-PRECISION BASED ON LOW-COST PROXY
7844 Adaptive Reweighted Sparse Belief Propagation Decoding for Polar Codes
8655 ADAPTIVE SECONDARY TRANSFORM SETS FOR VIDEO CODING BEYOND AV1
5443 Adaptive Sensor Selection With Deterministic Priors for DoA Tracking
8681 ADAPTIVE SPATIAL-TEMPORAL HYPERGRAPH FUSION LEARNING FOR NEXT POI RECOMMENDATION
5017 ADAPTIVE SPEECH EMOTION REPRESENTATION LEARNING BASED ON DYNAMIC GRAPH
8672 Adaptive Super Resolution For One-Shot Talking-Head Generation
3635 ADAPTIVE VIDEO WATERMARKING WITH PERCEPTUAL GUARANTEE AND EFFICIENCY OPTIMIZATION
7216 ADAPTIVE-AVG-POOLING BASED ATTENTION VISION TRANSFORMER FOR FACE ANTI-SPOOFING
8313 ADDRESSING CONFOUNDS IN FUNCTIONAL CONNECTIVITY ANALYSES OF CALCIUM IMAGING
7480 Addressing Data Scarcity In Voice Disorder Detection with Self-Supervised Models
5478 ADHD DIAGNOSIS AND BIOMARKER DETECTION BASED ON MULTIMODAL GRAPH CONVOLUTIONAL NEURAL NETWORK
2191 ADIFT: ZERO-SHOT GENERATIVE MODEL ADAPTION VIA ADAPTIVE DOMAIN-INVARIANT FEATURE TRANSFER
4307 Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks
11911 ADVANCING THE FRONTIERS OF DEEP LEARNING FOR LOW-DOSE 3D CONE-BEAM COMPUTED TOMOGRAPHY (CT) RECONSTRUCTION
11534 Adversarial Continual Learning to Transfer Self-Supervised Speech Representations for Voice Pathology Detection
8316 Adversarial Domain Adaptation for Classification with Nested Dichotomies
9652 ADVERSARIAL JAMMING FOR AUTOENCODER DISTRIBUTION MATCHING
4633 ADVERSARIAL LEARNING ON COMPRESSED POSTERIOR SPACE FOR NON-ITERATIVE SCORE-BASED END-TO-END TEXT-TO-SPEECH
9094 Adversarial Robustness of Convolutional Models Learned in the Frequency Domain
6368 ADVERSARIAL SPEECH FOR VOICE PRIVACY PROTECTION FROM PERSONALIZED SPEECH GENERATION
4062 ADVSHADOW: EVADING DEEPFAKE DETECTION VIA ADVERSARIAL SHADOW ATTACK
3141 ADVSV: AN OVER-THE-AIR ADVERSARIAL ATTACK DATASET FOR SPEAKER VERIFICATION
8559 AdvTTS: Adversarial Text-to-Speech Synthesis Attack on Speaker Identification Systems
8708 AEAM3D: ADVERSE ENVIRONMENT-ADAPTIVE MONOCULAR 3D OBJECT DETECTION VIA FEATURE EXTRACTION REGULARIZATION
8391 AEGIS-Net: Attention-guided Multi-Level Feature Aggregation for Indoor Place Recognition
8790 Aerial-IRS-Assisted Load Balancing in Downlink Networks
7713 AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
2588 AHRNet: Attention and Heatmap-based Regressor for Hand Pose Estimation and Mesh Recovery
7621 AINUR: HARMONIZING SPEED AND QUALITY IN DEEP MUSIC GENERATION THROUGH LYRICS-AUDIO EMBEDDINGS
4678 ALIGN, ADAPT AND INJECT: AUDIO-GUIDED IMAGE GENERATION, EDITING AND STYLIZATION
3362 All Neural Kronecker Product Beamforming for Speech Extraction with Large-scale Microphone Arrays
5296 ALLEVIATING HALLUCINATIONS VIA SUPPORTIVE WINDOW INDEXING IN ABSTRACTIVE SUMMARIZATION
3709 AlphaRotate: A Rotation Detection Benchmark using TensorFlow
11535 ALTERNATING LEAST-SQUARES-BASED MICROPHONE ARRAY PARAMETER ESTIMATION FOR A SINGLE-SOURCE REVERBERANT AND NOISY ACOUSTIC SCENARIO
4142 AMBISONICS NETWORKS - THE EFFECT OF RADIAL FUNCTIONS REGULARIZATION
6831 AN ACCURATE AND EFFICIENT NEURAL NETWORK FOR OCTA VESSEL SEGMENTATION AND A NEW DATASET
3235 AN ACTIVE NOISE CONTROL SYSTEM BASED ON SOUNDFIELD INTERPOLATION USING A PHYSICS-INFORMED NEURAL NETWORK
3826 AN ADAPTER-BASED UNIFIED MODEL FOR MULTIPLE SPOKEN LANGUAGE PROCESSING TASKS
7430 AN ADAPTIVE ALGORITHM FOR TRACKING THIRD-ORDER COUPLED CANONICAL POLYADIC DECOMPOSITION
8738 AN ANCHOR LEARNING APPROACH FOR CITATION FIELD LEARNING
10024 An Asymptotically Achievable Rate Bound for Establishing High-Fidelity Entanglements in Quantum Networks
9668 AN ATTENTION-ENHANCED RETENTIVE BROAD LEARNING SYSTEM FOR SUBJECT-GENERIC EMOTION RECOGNITION FROM EEG SIGNALS
11874 AN AUDIO-QUALITY-BASED MULTI-STRATEGY APPROACH FOR TARGET SPEAKER EXTRACTION IN THE MISP 2023 CHALLENGE
8453 AN AUDIO-TEXTUAL DIFFUSION MODEL FOR CONVERTING SPEECH SIGNALS INTO ULTRASOUND TONGUE IMAGING DATA
5497 AN EFFECTIVE MIXTURE-OF-EXPERTS APPROACH FOR CODE-SWITCHING SPEECH RECOGNITION LEVERAGING ENCODER DISENTANGLEMENT
8216 AN EFFICIENT ALGORITHM FOR CLUSTERED MULTI-TASK COMPRESSIVE SENSING
5849 AN EFFICIENT ALGORITHM FOR MULTIUSER SUM-RATE MAXIMIZATION OF LARGE-SCALE Active RIS-AIDED MIMO SYSTEM
8339 AN EFFICIENT ALTERNATING RIEMANNIAN/PROJECTED GRADIENT DESCENT ASCENT ALGORITHM FOR FAIR PRINCIPAL COMPONENT ANALYSIS
3158 An Efficient and Interpretable Speech Enhancement Network via Deep Dictionary Learning
8763 AN EFFICIENT HIERARCHICAL BLOCK COORDINATE DESCENT METHOD FOR TIME-VARYING GRAPHICAL LASSO
6921 An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection
9411 An Efficient Transformer for Demosaicing via Compressed Multi-branch Attention Mechanism
1383 AN EMPIRICAL INVESTIGATION OF DOMAIN ADAPTATION ABILITY FOR CHINESE SPELLING CHECK MODELS
7060 AN EMPIRICAL STUDY ON THE IMPACT OF POSITIONAL ENCODING IN TRANSFORMER-BASED MONAURAL SPEECH ENHANCEMENT
1955 AN END-TO-END EEG CHANNEL SELECTION METHOD WITH RESIDUAL GUMBEL SOFTMAX FOR BRAIN-ASSISTED SPEECH ENHANCEMENT
3887 AN ERROR SELF-CORRECTED DOA ESTIMATION MODEL FOR SPARSE ARRAY BASED ON ANM
7610 An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging
6584 AN EXPERIMENTAL COMPARISON OF NOISE-ROBUST TEXT-TO-SPEECH SYNTHESIS SYSTEMS BASED ON SELF-SUPERVISED REPRESENTATION
3484 AN EXPLAINABLE PROXY MODEL FOR MULTILABEL AUDIO SEGMENTATION
7170 AN EXPLICIT MULTI-MODAL FUSION METHOD FOR SIGN LANGUAGE TRANSLATION
4004 AN INITIAL INVESTIGATION OF NEURAL REPLAY SIMULATOR FOR OVER-THE-AIR ADVERSARIAL PERTURBATIONS TO AUTOMATIC SPEAKER VERIFICATION
7379 AN INTERPRETABLE AND GENERALIZABLE SPEECH DETECTOR BASED ON A CNN-LSTM FRAMEWORK
7025 AN INVESTIGATION OF DISTRIBUTION ALIGNMENT IN MULTI-GENRE SPEAKER RECOGNITION
3126 AN MVDR-EMBEDDED U-NET BEAMFORMER FOR EFFECTIVE AND ROBUST MULTICHANNEL SPEECH ENHANCEMENT
6904 AN OPTIMIZED INTERLEAVED OFDM CHIRP ORTHOGONAL WAVEFORM DESIGN FOR DECHIRPED MINIATURE MMW MIMO RADAR
9579 AN UNSUPERVISED SEGMENTATION OF VOCAL BREATH SOUNDS
10434 Analysis and Utilization of Hidden Information in Model Inversion Attacks
1473 Analysis of An Elliptic Localization Algorithm Using Fixed Point Iteration
7938 ANALYSIS OF HIGH-ORDER BRAIN NETWORKS RESOLVED IN TIME AND FREQUENCY USING CP DECOMPOSITION
7397 Analysis of the Memorization and Generalization Capabilities of AI Agents: Are Continual Learners Robust?
3991 ANALYSIS OF THE SINR IN LEO-PNT SYSTEMS WITH 5G PRS MULTIPLEXING: INTEGRATION OF PRS AND NTN
10440 Analytical performance assessment of 2-D Tensor ESPRIT in terms of physical parameters
10419 ANALYZING ADVERSARIAL VULNERABILITIES OF GRAPH LOTTERY TICKETS
6902 ANCHOR-GUIDED GAN WITH CONTRASTIVE LOSS FOR LOW-RESOURCE OUT-OF-DOMAIN DETECTION
7368 ANIM-400K: A LARGE-SCALE DATASET FOR AUTOMATED END TO END DUBBING OF VIDEO
5084 ANM-BASED SOURCE LOCALIZATION UNDER MIXED FIELD
9884 ANOMALOUS SOUND DETECTION BY FEATURE-LEVEL ANOMALY SIMULATION
2734 ANOMALY DETECTION FROM A FREQUENCY PERSPECTIVE: M-BAND WAVELET PACKET ANOMALY DETECTION NETWORK
1886 ANOMALY-AWARE SEMANTIC SELF-ALIGNMENT FRAMEWORK FOR VIDEO-BASED PERSON RE-IDENTIFICATION
9025 ANONYMIZING SPEAKER VOICES: EASY TO IMITATE, DIFFICULT TO RECOGNIZE?
3373 ANTI-DECEPTION JAMMING POWER OPTIMIZATION STRATEGY FOR MULTI-TARGET TRACKING TASKS IN MULTI-RADAR SYSTEMS
7517 APOLLO'S UNHEARD VOICES: GRAPH ATTENTION NETWORKS FOR SPEAKER DIARIZATION AND CLUSTERING FOR FEARLESS STEPS APOLLO COLLECTION
9190 APPLICATION OF SNNs MODEL BASED ON MULTI-DIMENSIONAL ATTENTION IN DRONE RADIO FREQUENCY SIGNAL CLASSIFICATION
7243 APPLYING HYBRID QUANTUM LSTM FOR INDOOR LOCALIZATION BASED ON RSSI
4434 AQF: Assessing the Quality of Hyperspectral Reconstruction with a Learnable Metric
3229 ARBITRARY STYLE TRANSFER BASED ON CONTENT INTEGRITY AND STYLE CONSISTENCY ENHANCEMENT
2796 Arbitrary Style Transfer with Prototype-based Channel Alignment
4006 ARCHITECTURE-AGNOSTIC ITERATIVE BLACK-BOX CERTIFIED DEFENSE AGAINST ADVERSARIAL PATCHES
9279 ARE DEEP NEURAL NETWORKS ROBUST TO NAMED ENTITIES? AN ADVERSARIAL ATTACK AND DEFENSE PERSPECTIVE
7809 ARE SNNS TRULY ENERGY-EFFICIENT? - A HARDWARE PERSPECTIVE
2735 ARE SOFT PROMPTS GOOD ZERO-SHOT LEARNERS FOR SPEECH RECOGNITION?
3477 ARFA: AN ASYMMETRIC RECEPTIVE FIELD AUTOENCODER MODEL FOR SPATIOTEMPORAL PREDICTION
3705 ARRAY GEOMETRY OPTIMIZATION FOR REGION-OF-INTEREST NEAR-FIELD BEAMFORMING
3839 ASFORMER: LEARNING FROM ADJACENT SCALE
2595 ASPED: AN AUDIO DATASET FOR DETECTING PEDESTRIANS
6798 AS-PVAD: A FRAME-WISE PERSONALIZED VOICE ACTIVITY DETECTION NETWORK WITH ATTENTIVE SCORE LOSS
4129 ASSESSING GNSS CARRIER-TO-NOISE-DENSITY RATIO ESTIMATION IN THE PRESENCE OF MEACONER INTERFERENCE
6255 ASSESSING VIBROACOUSTIC SOUND MASSAGE THROUGH THE BIOSIGNAL OF HUMAN SPEECH: EVIDENCE OF IMPROVED WELLBEING
4914 ASYMMETRIC CLEAN SEGMENTS-GUIDED SELF-SUPERVISED LEARNING FOR ROBUST SPEAKER VERIFICATION
7905 Asymptotic Behavior of Super-resolution Sparse Bayesian Learning
9820 ASYMPTOTICALLY TIGHT MISSPECIFIED BAYESIAN CRAMÉR-RAO BOUND
9715 ASYNCHRONOUS DIFFUSION LEARNING WITH AGENT SUBSAMPLING AND LOCAL UPDATES
4073 ATTA-NET: ATTENTION AGGREGATION NETWORK FOR AUDIO-VISUAL EMOTION RECOGNITION
1968 ATTENTION DECOUPLING FOR QUERY-BASED OBJECT DETECTION
8911 ATTENTION IS ALL YOU NEED FOR BLIND ROOM VOLUME ESTIMATION
5174 ATTENTION-BASED SPATIAL-FREQUENCY INFORMATION NETWORK FOR UNDERWATER SINGLE IMAGE SUPER-RESOLUTION
5409 ATTENTION-DRIVEN MULTICHANNEL SPEECH ENHANCEMENT IN MOVING SOUND SOURCE SCENARIOS
2297 ATTENTION-GUIDED ADAPTATION FOR CODE-SWITCHING SPEECH RECOGNITION
3689 ATTENTIONLUT: ATTENTION FUSION-BASED CANONICAL POLYADIC LUT FOR REAL-TIME IMAGE ENHANCEMENT
7925 AttHear: Explaining Audio Transformers Using Attention-Aware NMF
9405 ATTRIBUTE-AWARE AMPLIFICATION OF FACIAL FEATURE SEQUENCES FOR FACIAL EMOTION RECOGNITION
1767 Attribute-aware Head Swapping Guided by 3D Modeling
3185 Attribution-based Scanline Perturbation Attack on 3D Detectors of LiDAR Point Clouds
5085 ATTR-INT: A SIMPLE AND EFFECTIVE ENTITY ALIGNMENT FRAMEWORK FOR HETEROGENEOUS KNOWLEDGE GRAPHS
9804 AUDIO DEEPFAKE DETECTION WITH SELF-SUPERVISED WAVLM AND MULTI-FUSION ATTENTIVE CLASSIFIER
9821 AUDIO DIFFERENCE LEARNING FOR AUDIO CAPTIONING
4683 AUDIO MATCH CUTTING: FINDING AND CREATING MATCHING AUDIO TRANSITIONS IN MOVIES AND VIDEOS
9719 Audio prompt tuning for universal sound separation
7378 AUDIO TRANSFORMER FOR SYNTHETIC SPEECH DETECTION VIA FORMANT MAGNITUDE AND PHASE ANALYSIS
1114 AUDIO-AIDED LEARNING FRAMEWORK FOR IMAGE CLASSIFICATION WITH LIMITED TRAINING IMAGES
3101 AUDIO-FREE PROMPT TUNING FOR LANGUAGE-AUDIO MODELS
7774 Audio-Journey: Open Domain Latent Diffusion Based Text-to-Audio Generation
7441 AudioSR: Versatile Audio Super-resolution at Scale
3782 AUDIO-VISUAL ACTIVE SPEAKER EXTRACTION FOR SPARSELY OVERLAPPED MULTI-TALKER SPEECH
4865 AUDIO-VISUAL CHILD-ADULT SPEAKER CLASSIFICATION IN DYADIC INTERACTIONS
7799 AUDIOVISUAL SPEAKER SEPARATION WITH FULL- AND SUB-BAND MODELING IN THE TIME-FREQUENCY DOMAIN
7075 AUDIO-VISUAL SPEECH RECOGNITION IN-THE-WILD: MULTI-ANGLE VEHICLE CABIN CORPUS AND ATTENTION-BASED METHOD
6243 AUDITORY CORTEX-INSPIRED SPECTRAL ATTENTION MODULATION FOR BINAURAL SOUND LOCALIZATION IN HRTF MISMATCH
8024 Augment on Manifold: Mixup Regularization with UMAP
8356 AUGMENTING CONFORMERS WITH STRUCTURED STATE-SPACE SEQUENCE MODELS FOR ONLINE SPEECH RECOGNITION
7650 AUGMENTING TRANSFORMER AUTOENCODERS WITH PHENOTYPE CLASSIFICATION FOR ROBUST DETECTION OF PSYCHOTIC RELAPSES
7949 AUGSUMM: TOWARDS GENERALIZABLE SPEECH SUMMARIZATION USING SYNTHETIC LABELS FROM LARGE LANGUAGE MODELS
6396 AUTOCALI: ENHANCING AOA-BASED INDOOR LOCALIZATION THROUGH AUTOMATIC PHASE CALIBRATION
1102 AUTOFGNN: A FRAMEWORK FOR EXTRACTING ALL FREQUENCY INFORMATION FROM LARGE-SCALE GRAPHS
9686 Automated Labeling of Automotive Radar Azimuth Multipath
6254 AUTOMATIC CHANNEL SELECTION AND SPATIAL FEATURE INTEGRATION FOR MULTI-CHANNEL SPEECH RECOGNITION ACROSS VARIOUS ARRAY TOPOLOGIES
9249 AUTOMATIC DESIGN OF ADAPTER ARCHITECTURES FOR ENHANCED PARAMETER-EFFICIENT FINE-TUNING
3517 AUTOMATIC DETECTION OF SLEEPINESS-RELATED SYNDROMES AND SYMPTOMS USING VOICE AND SPEECH BIOMARKERS
2278 Automatic Recognition of Gesture Identity and Onset of Cued-Speech
8519 AUTOMATIC SPEECH RECOGNITION TUNED FOR CHILD SPEECH IN THE CLASSROOM
1434 AUTOMATIC TEMPORAL ALIGNMENT FOR PITCH ESTIMATION EVALUATION
8270 AUTOMOTIVE RADAR INTERFERENCE CHARACTERIZATION: FMCW OR PMCW?
8271 AUTOMOTIVE RADAR INTERFERENCE MITIGATION VIA SINR MAXIMIZATION
5615 AUTOMOTIVE RADAR POINT CLOUD PARAMETRIC DENSITY ESTIMATION USING CAMERA IMAGES
3290 AUTONOMOUS GENERATIVE FEATURE REPLAY FOR NON-EXEMPLAR CLASS-INCREMENTAL LEARNING
7570 AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
4358 AUTOREGRESSIVE 3D SHAPE COMPLETION VIA SPHERE-GUIDED DISENTANGLED REPRESENTATION
7667 AUTOSEN: IMPROVING AUTOMATIC WIFI HUMAN SENSING THROUGH CROSS-MODAL AUTOENCODER
8564 AutoSGM: A Unified Lowpass Regularization Framework for Accelerated Learning
4665 AutoST: Training-free Neural Architecture Search for Spiking Transformers
4347 AV2WAV: DIFFUSION-BASED RE-SYNTHESIS FROM CONTINUOUS SELF-SUPERVISED FEATURES FOR AUDIO-VISUAL SPEECH ENHANCEMENT
7520 AV-SUPERB: A MULTI-TASK EVALUATION BENCHMARK FOR AUDIO-VISUAL REPRESENTATION MODELS
5450 AXIS ORDER INVARIANCE LEARNED FROM POINT CLOUDS
3670BAE-Net: A Low complexity and high fidelity bandwidth-adaptive neural network for speech super-resolution
6499BALANCED AND DISCRIMINATIVE CONTRASTIVE LEARNING FOR CLASS-IMBALANCED MEDICAL IMAGES
9104Balanced Learning for Multi-Domain Long-tailed Speaker Recognition
8159Balancing Easy and Hard Distortions: A Multi-Rate Knowledge Distillation Strategy for Blind Image Quality Assessment
9743Balancing Representation Abstractions and Local Details Preservation for 3D Point Cloud Quality Assessment
7478Balancing Speaker-Rater Fairness for Gender-Neutral Speech Emotion Recognition
8020BALLISTOCARDIOGRAM-BASED HEART RATE VARIABILITY ESTIMATION FOR STRESS MONITORING USING CONSUMER EARBUDS
2230BANDWIDTH-EFFICIENT INFERENCE FOR NEURAL IMAGE COMPRESSION
7668BASS ACCOMPANIMENT GENERATION VIA LATENT DIFFUSION
7876BATCH SUBSTITUTION CALIBRATION OF A MEMS MICROPHONE ARRAY: IMPACT OF SENSOR PERFORMANCE DISPERSION ON DIRECTIVITY ESTIMATION
1703Bayesian Activity Detection for Massive Connectivity in Cell-Free IoT Networks
8617BAYESIAN LEARNING-BASED KALMAN SMOOTHING FOR LINEAR DYNAMICAL SYSTEMS WITH UNKNOWN SPARSE INPUTS
4049BAYESIAN OPTIMIZATION WITH GAUSSIAN PROCESSES FOR ROBUST LOCALIZATION
11478Bayesian Tensor Tucker Completion With a Flexible Core
7710BAYESIAN TOPOLOGY INFERENCE ON PARTIALLY KNOWN NETWORKS FROM INPUT-OUTPUT PAIRS
4291BAYESIAN-BOOSTED METALOC: EFFICIENT TRAINING AND GUARANTEED GENERALIZATION FOR INDOOR LOCALIZATION
5689BCC: BIDIRECTIONAL CONSISTENCY CONSTRAINT METHOD FOR HIERARCHICAL TEXT CLASSIFICATION
6501Beamforming Design and Performance Evaluation for RIS-aided Localization using LEO Satellite Signals
3326Beamforming Through Online Convex Combination of Differential Beamformers
11480BeamSync: Over-The-Air Synchronization for Distributed Massive MIMO Systems
2485BEAST: ONLINE JOINT BEAT AND DOWNBEAT TRACKING BASED ON STREAMING TRANSFORMER
5332BENCHMARKING ADVERSARIAL ROBUSTNESS OF IMAGE SHADOW REMOVAL WITH SHADOW-ADAPTIVE ATTACKS
9173BETA QUANTILE REGRESSION FOR ROBUST ESTIMATION OF UNCERTAINTY IN THE PRESENCE OF OUTLIERS
6244BEVLOC: END-TO-END 6-DOF LOCALIZATION VIA CROSS-MODALITY CORRELATION UNDER BIRD’S EYE VIEW
4166BEVOXSEG: BEV-VOXEL REPRESENTATION FOR FAST AND ACCURATE CAMERA-BASED 3D SEGMENTATION
2937BEYOND EMPIRICAL WINDOWING: AN ATTENTION-BASED APPROACH FOR TRUST PREDICTION IN AUTONOMOUS VEHICLES
7426BEYOND SIMPLE TEXT STYLE TRANSFER: UNVEILING COMPOUND TEXT STYLE TRANSFER WITH PROMPT-BASED PRE-TRAINED LANGUAGE MODELS
4769Beyond the Limit of Weight-Sharing: Pioneering Space-Evolving NAS with Large Language Models
3954BEYOND THE SNOWFALL: ENHANCING SNOWY DAY OBJECT DETECTION THROUGH PROGRESSIVE RESTORATION AND MULTI-FEATURE FUSION
6126BFRFormer: Transformer-based generator for Real-World Blind Face Restoration
2607BI-DIRECTIONAL MOTION ATTENTION WITH CONTRASTIVE LEARNING FOR FEW-SHOT ACTION RECOGNITION
1892BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
4511BINARY SIGNAL ALIGNMENT: OPTIMAL SOLUTION IS POLYNOMIAL-TIME AND LINEAR-TIME SOLUTION IS QUASI-OPTIMAL
7856Binaural Angular Separation Network
2969BINAURAL RENDERING OF HETEROGENEOUS SOUND SOURCES WITH EXTENT
3955BINAURAL ROOM TRANSFER FUNCTION INTERPOLATION VIA SYSTEM INVERSION
9962BINAURAL SOUND SOURCE LOCALIZATION USING A HYBRID TIME AND FREQUENCY DOMAIN MODEL
4176BINAURAL SPEECH ENHANCEMENT USING DEEP COMPLEX CONVOLUTIONAL TRANSFORMER NETWORKS
3697BINAURALMUSIC: A DIVERSE DATASET FOR IMPROVING CROSS-MODAL BINAURAL AUDIO GENERATION
4910Biomimetic Mappings for Active Sonar Object Recognition in Clutter
8605BLENDA: DOMAIN ADAPTIVE OBJECT DETECTION THROUGH DIFFUSION-BASED BLENDING
3775BLIND BEAMFORMING FOR INTELLIGENT REFLECTING SURFACE: A REINFORCEMENT LEARNING APPROACH
2831BLIND DECONVOLUTION OF SPARSE GRAPH SIGNALS IN THE PRESENCE OF PERTURBATIONS
5690Blind Estimation of Audio Effects using an Auto-Encoder Approach and Differentiable Digital Signal Processing
1234BLIND INPAINTING WITH OBJECT-AWARE DISCRIMINATION FOR ARTIFICIAL MARKER REMOVAL
7299BLIND SEPARATION OF NOISY MIXTURES OVER GALOIS FIELDS
5721BLOCK ADAPTIVE SUBSPACE PURSUIT METHOD FOR WALL CLUTTER MITIGATION
11941BMMSNet: Bidirectional Mapping and Multilevel Similarity Comparison for EEG-Speech Match-Mismatch Problem
7226BNMTRANS: A BRAIN NETWORK SEQUENCE-DRIVEN MANIFOLD-BASED TRANSFORMER FOR COGNITIVE IMPAIRMENT DETECTION USING EEG
6381BOOSTING ADVERSARIAL ROBUSTNESS DISTILLATION VIA HYBRID DECOMPOSED KNOWLEDGE
7509Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints
7702BOOSTING IMAGE QUALITY ASSESSMENT PERFORMANCE: UNSUPERVISED SCORE FUSION BY DEEP MAXIMUM A POSTERIORI ESTIMATION
8754BOOSTING LLMS WITH ONTOLOGY-AWARE PROMPT FOR NER DATA AUGMENTATION
9505Boosting of Implicit Neural Representation-based Image Denoiser
1365BOOSTING PRUNED NETWORKS WITH LINEAR OVER-PARAMETERIZATION
8857BOOSTING SPEECH ENHANCEMENT WITH CLEAN SELF-SUPERVISED FEATURES VIA CONDITIONAL VARIATIONAL AUTOENCODERS
2909BOOSTING UNKNOWN-NUMBER SPEAKER SEPARATION WITH TRANSFORMER DECODER-BASED ATTRACTOR
5216BOOSTING ZERO-SHOT HUMAN-OBJECT INTERACTION DETECTION WITH VISION-LANGUAGE TRANSFER
8827BOOSTING ZERO-SHOT NODE CLASSIFICATION VIA DEPENDENCY CAPTURE AND DISCRIMINATIVE FEATURE LEARNING
6945BOOTSTRAP PREDICTIVE CODING: INVESTIGATING A NON-CONTRASTIVE SELF-SUPERVISED LEARNING APPROACH
4453BOUNDARY-DRIVEN ACTIVE LEARNING FOR ANOMALY DETECTION IN TIME SERIES DATA STREAMS
8910BOUNDING BOX-GUIDED PSEUDO POINT CLOUDS EARLY-FUSION AND DENSITY OPTIMIZE FOR 3D OBJECT DETECTION
2228BPDO: Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection
3201BRAIN STRUCTURE-FUNCTION INTERACTION NETWORK FOR FLUID COGNITION PREDICTION
1220BrainFC-CGAN: A Conditional Generative Adversarial Network for Brain Functional Connectivity Augmentation and Aging Synthesis
4712BRANCHFORMER-BASED TDNN FOR AUTOMATIC SPEAKER VERIFICATION
6511BRAVEN: IMPROVING SELF-SUPERVISED PRE-TRAINING FOR VISUAL AND AUDITORY SPEECH RECOGNITION
1007Breaking Speaker Recognition with PaddingBack
1265Breaking the Barrier: Selective Uncertainty-based Active Learning for Medical Image Segmentation
4587BREAST ULTRASOUND COMPUTER-AIDED DIAGNOSIS USING STRUCTURE-AWARE TRIPLET PATH NETWORKS
4881Bregman Graph Neural Network
8268BRIDGING THE DOMAIN GAP ARISING FROM TEXT DESCRIPTION DIFFERENCES FOR STABLE TEXT-TO-IMAGE GENERATION
8516BRIDGING THE GAP: A SELF-LEARNING MODEL USING IMPLICIT KNOWLEDGE FOR CHINESE SPELLING CORRECTION
9594Bridging the Gap: Sketch to Color Diffusion Model with Semantic Prompt Learning
8238BRIDGING THE GAPS OF BOTH MODALITY AND LANGUAGE: SYNCHRONOUS BILINGUAL CTC FOR SPEECH TRANSLATION AND SPEECH RECOGNITION
2409BRINGING THE DISCUSSION OF MINIMA SHARPNESS TO THE AUDIO DOMAIN: A FILTER-NORMALISED EVALUATION FOR ACOUSTIC SCENE CLASSIFICATION
8947Broadband Personal Sound Zone Control in the Presence of Nonlinearities
11872BS-PLCNET: BAND-SPLIT PACKET LOSS CONCEALMENT NETWORK WITH MULTI-TASK LEARNING FRAMEWORK AND MULTI-DISCRIMINATORS
7904Buffered Gaussian Modeling for Vectorized HD Map Construction
2179BUILD A 50+ HOURS CHINESE MANDARIN CORPUS FOR CHILDREN’S SPEECH RECOGNITION
7377Building Lane-Level Maps from Aerial Images
11916BUMBLEBEE YOUR WAY TO RECOVERY: TRANSFORMING THE APPROACH TO DETECTION OF MENTAL HEALTH RELAPSES
2982BWSNET: AUTOMATIC PERCEPTUAL ASSESSMENT OF AUDIO SIGNALS
7549BYTEHUM: FAST AND ACCURATE QUERY-BY-HUMMING IN THE WILD
3043CAGEN: CONTROLLABLE ANOMALY GENERATOR USING DIFFUSION MODEL
6682CAG-FPN: CHANNEL SELF-ATTENTION GUIDED FEATURE PYRAMID NETWORK FOR OBJECT DETECTION
1826CALSeg: Improving Calibration of Medical Image Segmentation Via Variational Label Smoothing
1515CAMERA CALIBRATION USING A SINGLE VIEW OF A SYMMETRIC OBJECT
5217CAMERA-RADAR ASSOCIATION FOR DATA ANNOTATION
2399CAN CHATGPT SERVE AS A MULTI-CRITERIA DECISION MAKER? A NOVEL APPROACH TO SUPPLIER EVALUATION
2517CAN LARGE-SCALE VOCODED SPOOFED DATA IMPROVE SPEECH SPOOFING COUNTERMEASURE WITH A SELF-SUPERVISED FRONT END?
7798CAN LLM FIND THE GREEN CIRCLE? INVESTIGATION AND HUMAN-GUIDED TOOL MANIPULATION FOR COMPOSITIONAL GENERALIZATION
3975CAN SYNTHETIC DATA BOOST THE TRAINING OF DEEP ACOUSTIC VEHICLE COUNTING NETWORKS?
2475CAN WE TRUST EXPLAINABLE AI METHODS ON ASR? AN EVALUATION ON PHONEME RECOGNITION
8435Can Whisper perform speech-based in-context learning?
4852CAPTION UNIFICATION FOR MULTI-VIEW LIFELOGGING IMAGES BASED ON IN-CONTEXT LEARNING WITH HETEROGENEOUS SEMANTIC CONTENTS
7660CAPTURING DETAIL VARIATIONS FOR LIGHTWEIGHT NEURAL RADIANCE FIELDS
6880CARDINALITY-CONSTRAINED BINARY QUADRATIC OPTIMIZATION VIA EXTREME POINT PURSUIT, WITH APPLICATION TO THE DENSEST K-SUBGRAPH PROBLEM
7105CARTOONDIFF: TRAINING-FREE CARTOON IMAGE GENERATION WITH DIFFUSION TRANSFORMER MODELS
9726CAUSALITY-INSPIRED SINGLE-SOURCE DOMAIN GENERALIZATION FOR FACE ANTI-SPOOFING
3380CAUSALLY UNCOVERING BIAS IN VIDEO MICRO-EXPRESSION RECOGNITION
3383CAUSALME: BALANCING BI-MODALITIES IN VISUAL QUESTION ANSWERING
4174CAUSAL-STORY: LOCAL CAUSAL ATTENTION UTILIZING PARAMETER-EFFICIENT TUNING FOR VISUAL STORY SYNTHESIS
6080CC-DA: CROSS-DOMAIN CONSISTENCY DATA AUGMENTATION FOR 3D TUMOR SEGMENTATION
4167C-CLAPA: IMPROVING TEXT-AUDIO CROSS DOMAIN RETRIEVAL WITH CAPTIONING AND AUGMENTATIONS
6238CDA-MBPO: CORRECTED DATA AGGREGATION FOR MODEL-BASED POLICY OPTIMIZATION
2693CDCNet: A FAST and LIGHTWEIGHT DEHAZING NETWORK WITH COLOR DISTORTION CORRECTION
3394CDUMA: An Adaptive Approach for Mitigating Confounder for MCQA
1532CED: Consistent ensemble distillation for audio tagging
7050CEDNET: A CONTINUOUS EMOTION DETECTION NETWORK FOR NATURALISTIC STIMULI USING MEG SIGNALS
4685CEMOAE: A DYNAMIC AUTOENCODER WITH MASKED CHANNEL MODELING FOR ROBUST EEG-BASED EMOTION RECOGNITION
10015CENET: CONTENT-AWARE ENHANCED NETWORK FOR PRACTICAL SCENE PARSING
4669CENTER OF PRESSURE ESTIMATION BY ANALYZING WALKING VIDEOS
8921CGN: A SIMPLE YET EFFECTIVE MULTI-CHANNEL GATED NETWORK FOR LONG-TERM TIME SERIES FORECASTING
1573CHANGENET: MULTI-TEMPORAL ASYMMETRIC CHANGE DETECTION DATASET
4339CHANNEL ESTIMATION AND PREDICTION IN WIRELESS COMMUNICATIONS ASSISTED BY SEMI-PASSIVE RIS
5652CHANNEL ESTIMATION IN UNDERDETERMINED SYSTEMS UTILIZING VARIATIONAL AUTOENCODERS
1446CHANNEL-SPATIAL TRANSFORMER FOR EFFICIENT IMAGE SUPER-RESOLUTION
8171Character Attribute Extraction from Movie Scripts using LLMs
2199CHAT: Cascade Hole-Aware Transformers with Geometric Spatial Consistency for Accurate Monocular Endoscopic Depth Estimation
1647CHILD FER: DOMAIN-AGNOSTIC FACIAL EXPRESSION RECOGNITION IN CHILDREN USING A SECONDARY IMAGE DIFFUSION MODEL
5999CHUNKED ATTENTION-BASED ENCODER-DECODER MODEL FOR STREAMING SPEECH RECOGNITION
3388CIF-RNNT: Streaming ASR via Acoustic Word Embeddings with Continuous Integrate-and-Fire and RNN-Transducers
3303CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
5214CKT-RCM: CLIP-BASED KNOWLEDGE TRANSFER AND RELATIONAL CONTEXT MINING FOR UNBIASED PANOPTIC SCENE GRAPH GENERATION
8317CLAF: CONTRASTIVE LEARNING WITH AUGMENTED FEATURES FOR IMBALANCED SEMI-SUPERVISED LEARNING
7363CLAP4EMO: CHATGPT-ASSISTED SPEECH EMOTION RETRIEVAL WITH NATURAL LANGUAGE SUPERVISION
9224CLASS: CONTINUAL LEARNING APPROACH FOR SPEECH SUPER-RESOLUTION
7413CLASSIFICATION-ORIENTED SEMANTIC WIRELESS COMMUNICATIONS
6170CLASS-INCREMENTAL LEARNING FOR MULTI-LABEL AUDIO CLASSIFICATION
7260CLASS-WISE BUFFER MANAGEMENT FOR INCREMENTAL OBJECT DETECTION: AN EFFECTIVE BUFFER TRAINING STRATEGY
10279CLIENT-FREE FEDERATED UNLEARNING VIA TRAINING RECONSTRUCTION WITH ANCHOR SUBSPACE CALIBRATION
7315CLINICAL SCORES PREDICTION AND MEDICATION ADJUSTMENT FOR COURSE OF PARKINSON'S DISEASE
2620CLIP-BASED SYNERGISTIC KNOWLEDGE TRANSFER FOR TEXT-BASED PERSON RETRIEVAL
5672CLIP-FONT: SEMANTIC SELF-SUPERVISED FEW-SHOT FONT GENERATION WITH CLIP
5963CLIP-MSA: INCORPORATING INTER-MODAL DYNAMICS AND COMMON KNOWLEDGE TO MULTIMODAL SENTIMENT ANALYSIS WITH CLIP
1386CLIPRerank: An Extremely Simple Method for Improving Ad-hoc Video Search
11558Closed-Loop Training for Projected GAN
6566CLOSE-RANGE DIRECTION OF ARRIVAL ESTIMATION IN THE PRESENCE OF CLOCK JITTER
6201CLPSD: DETECTING ETHEREUM PHISHING SCAMS BASED ON CURRICULUM LEARNING
1443CLT: COOPERATIVE LOTTERY TICKET HYPOTHESIS IN LIVE STREAMING SALES PREDICTION
10446CLUSTER-GUIDED UNSUPERVISED DOMAIN ADAPTATION FOR DEEP SPEAKER EMBEDDING
9858CM-PIE: CROSS-MODAL PERCEPTION FOR INTERACTIVE-ENHANCED AUDIO-VISUAL VIDEO PARSING
3816CNFA: Conditional Normalizing Flow for Query-Limited Attack
9306CODING FOR THE UNSOURCED B-CHANNEL WITH ERASURES: ENHANCING THE LINKED LOOP CODE
7009COGNITIVE VIRTUAL SENSING TECHNIQUE FOR FEEDFORWARD ACTIVE NOISE CONTROL
5509COLLABORATIVE WATERMARKING FOR ADVERSARIAL SPEECH SYNTHESIS
4330COLLD: CONTRASTIVE LAYER-TO-LAYER DISTILLATION FOR COMPRESSING MULTILINGUAL PRE-TRAINED SPEECH ENCODERS
5219COLOR AGNOSTIC CROSS-SPECTRAL DISPARITY ESTIMATION
1599ColorFlow: A Conditional Normalizing Flow for Image Colorization
6836Combining Conformer and Dual-Path-Transformer Networks for Single Channel Noisy Reverberant Speech Separation
6903COMMIN: SEMANTIC IMAGE COMMUNICATIONS AS AN INVERSE PROBLEM WITH INN-GUIDED DIFFUSION MODELS
11484COMMON-SLOPE MODELING OF LATE REVERBERATION
9520Communication Efficient Private Federated Learning Using Dithering
8205COMMUNICATION-EFFICIENT DECENTRALIZED DYNAMIC KERNEL LEARNING
3451COMMUNICATION-EFFICIENT FEDERATED LEARNING THROUGH ADAPTIVE WEIGHT CLUSTERING AND SERVER-SIDE DISTILLATION
7479Communication-Efficient Federated Optimization over Semi-Decentralized Networks
3060COMMUNICATION-EFFICIENT LAPLACE MECHANISM FOR DIFFERENTIAL PRIVACY VIA RANDOM QUANTIZATION
1416Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
7564COMMUNICATION-ORIENTED AUTOMATIC ASSESSMENT SYSTEM FOR ACCENTED SPOKEN CHINESE IN READ-ALOUD TASKS
10226COMPACT AND DE-BIASED NEGATIVE INSTANCE EMBEDDING FOR MULTI-INSTANCE LEARNING ON WHOLE-SLIDE IMAGE CLASSIFICATION
3044COMPARABLE DEMONSTRATIONS ARE IMPORTANT IN IN-CONTEXT LEARNING: A NOVEL PERSPECTIVE ON DEMONSTRATION SELECTION
8923COMPARATIVE STUDY OF TOKENIZATION ALGORITHMS FOR END-TO-END OPEN VOCABULARY KEYWORD DETECTION
8418COMPARING AND COMBINING AUDIO PROCESSING AND DEEP LEARNING FEATURES FOR CLASSIFICATION OF HEARTBEAT SOUNDS
7423Comparing data-driven and handcrafted features for dimensional emotion recognition
7402COMPARISON OF CONDITIONS FOR OMNIDIRECTIONAL VIDEO WITH SPATIAL AUDIO IN TERMS OF SUBJECTIVE QUALITY AND IMPACTS ON OBJECTIVE METRICS RESOLVING POWER
4335COMPARISON OF FREQUENCY-FUSION MECHANISMS FOR BINAURAL DIRECTION-OF-ARRIVAL ESTIMATION FOR MULTIPLE SPEAKERS
1392Complementary Fusion Network based on Frequency Hybrid Attention for Pansharpening
6905Complex Bounded Component Analysis: Identifiability and Algorithm
3795COMPLEXITY REDUCTION OF TEMPLATE MATCHING-BASED REFERENCE PICTURE PADDING IN VIDEO CODING
8497Complexity Scaling for Speech Denoising
3521COMPOSITE FEDERATED LEARNING WITH HETEROGENEOUS DATA
11544COMPRESSION OF HIGHER-ORDER AMBISONIC SIGNALS USING DIRECTIONAL AUDIO CODING
9531COMPUTATIONAL COMPLEXITY OF ASYNCHRONOUS POLICY ITERATION FOR TWO-PLAYER ZERO-SUM MARKOV GAMES
3382COMPUTING AN ENTIRE SOLUTION PATH OF A NONCONVEXLY REGULARIZED CONVEX SPARSE MODEL
9035CONCEALING MEDICAL CONDITION BY NODE TOGGLING IN ASR FOR DEMENTIA PATIENTS
6848CONCENTRATED REASONING AND UNIFIED RECONSTRUCTION FOR MULTI-MODAL MEDIA MANIPULATION
4017CONCSS: CONTRASTIVE-BASED CONTEXT COMPREHENSION FOR DIALOGUE-APPROPRIATE PROSODY IN CONVERSATIONAL SPEECH SYNTHESIS
3933CONFIDENCE-AWARE SPATIAL-TEMPORAL ATTENTION GRAPH CONVOLUTIONAL NETWORK FOR SKELETON-BASED EXPERT-NOVICE LEVEL CLASSIFICATION
7850CONFORMALIZED MULTIMODAL UNCERTAINTY REGRESSION AND REASONING
1995Conformer is all you need for visual speech recognition
3624CONGESTION-AWARE DISTRIBUTED TASK OFFLOADING IN WIRELESS MULTI-HOP NETWORKS USING GRAPH NEURAL NETWORKS
3545Conjugate Gradient Based Adaptive Algorithm for Nonlinear AEC
9578CONNECTING SPEECH ENCODER AND LARGE LANGUAGE MODEL FOR ASR
6467CONSIDERING TEMPORAL CONNECTION BETWEEN TURNS FOR CONVERSATIONAL SPEECH SYNTHESIS
6859CONSISTENT AND RELEVANT: RETHINK THE QUERY EMBEDDING IN GENERAL SOUND SEPARATION
7366ConsPrompt: Exploiting Contrastive Samples for Few-shot Prompt Learning
6009CONTACTLESS RADAR HEART RATE VARIABILITY MONITORING VIA DEEP SPATIO-TEMPORAL MODELING
7078CONTENT-BASED OBJECTIVE EVALUATION OF ARTIFICIALLY GENERATED SIGN LANGUAGE VIDEOS
7148CONTEXT-AWARE AND CONTRASTIVENESS-DRIVEN FEATURE LEARNING FOR CROSS-DOMAIN FEW-SHOT HYPERSPECTRAL IMAGE CLASSIFICATION
10169CONTEXT-AWARE DUAL ATTENTION NETWORK FOR MULTIMODAL SARCASM DETECTION
7268CONTEXT-AWARE PREFERENCE LEARNING SYSTEM BASED ON DIRICHLET PROCESS GAUSSIAN MIXTURE MODEL
9727CONTEXT-AWARE TRANSFORMER FOR SINGLE IMAGE RAIN STREAKS REMOVAL
8862CONTEXT-GUIDED AND SYNTACTIC AUGMENTED DUAL GRAPH CONVOLUTIONAL NETWORK FOR ASPECT-BASED SENTIMENT ANALYSIS
9617CONTEXTUAL BIASING METHODS FOR IMPROVING RARE WORD DETECTION IN AUTOMATIC SPEECH RECOGNITION
2008Contextual Biasing of Named-Entities with Large Language Models
8630CONTEXTUAL HUMAN OBJECT INTERACTION UNDERSTANDING FROM PRE-TRAINED LARGE LANGUAGE MODEL
4546CONTEXTUALIZED AUTOMATIC SPEECH RECOGNITION WITH ATTENTION-BASED BIAS PHRASE BOOSTED BEAM SEARCH
9098CONTINUAL LEARNING WITH CLASS-LEVEL MINIMALLY INTERFERED UPDATE
3161Continuous Review and Timely Correction: Enhancing the Resistance to Noisy Labels via Self-Not-True Distillation
7027CONTRASTIVE DEEP NONNEGATIVE MATRIX FACTORIZATION FOR COMMUNITY DETECTION
1403CONTRASTIVE LEARNING FOR REGRESSION ON HYPERSPECTRAL DATA
7146CONTRASTIVE LEARNING WITH AUDIO DISCRIMINATION FOR CUSTOMIZABLE KEYWORD SPOTTING IN CONTINUOUS SPEECH
1295CONTRASTIVE LEARNING WITH BIDIRECTIONAL TRANSFORMERS FOR KNOWLEDGE TRACING
6941CONTRASTIVE LEARNING WITH HIGH-QUALITY AND LOW-QUALITY AUGMENTED DATA FOR QUERY-FOCUSED SUMMARIZATION
7145CONTRASTIVE LOSS BASED FRAME-WISE FEATURE DISENTANGLEMENT FOR POLYPHONIC SOUND EVENT DETECTION
4544CONTRASTIVE SPEAKER EMBEDDING WITH SEQUENTIAL DISENTANGLEMENT
9258CONTRMIX: PROGRESSIVE MIXED CONTRASTIVE LEARNING FOR SEMI-SUPERVISED MEDICAL IMAGE SEGMENTATION
8858CONTROLCAP: CONTROLLABLE CAPTIONING VIA NO-FUSS LEXICON
7653CONTROLLABLE PROSODY GENERATION WITH PARTIAL INPUTS
3149CONTROLLABLE SEMANTIC LINGUISTIC STEGANOGRAPHY VIA SUMMARIZATION GENERATION
4411CONTROLLABLE SPEAKING STYLES USING A LARGE LANGUAGE MODEL
11938CONVCONCATNET: A DEEP CONVOLUTIONAL NEURAL NETWORK TO RECONSTRUCT MEL SPECTROGRAM FROM THE EEG
7594CONVERGENT PLUG-AND-PLAY USING CONTRACTIVE DENOISERS
3710CONVERSATION CLIQUE-BASED MODEL FOR EMOTION RECOGNITION IN CONVERSATION
8375CONVERSATIONAL CO-SPEECH GESTURE GENERATION VIA MODELING DIALOG INTENTION, EMOTION AND CONTEXT WITH DIFFUSION MODELS
8956CONVNEXT-TTS AND CONVNEXT-VC: CONVNEXT-BASED FAST END-TO-END SEQUENCE-TO-SEQUENCE TEXT-TO-SPEECH AND VOICE CONVERSION
11456CONVOLUTIONAL FILTERS AND NEURAL NETWORKS WITH NONCOMMUTATIVE ALGEBRAS
1999Co-occurrence Graph-Enhanced Hierarchical Prediction of ICD Codes
6616COOKING-CLIP: CONTEXT-AWARE LANGUAGE-IMAGE PRETRAINING FOR ZERO-SHOT RECIPE GENERATION
1910Cooperative Sensing via Matrix Factorization of the Partially Received Sample Covariance Matrix
9630COORDINATE-BASED NEURAL NETWORK FOR FOURIER PHASE RETRIEVAL
2400COPHTC: CONTRASTIVE LEARNING WITH PROMPT TUNING FOR HIERARCHICAL TEXT CLASSIFICATION
6982COQ: AN EMPIRICAL FRAMEWORK FOR MULTI-HOP QUESTION ANSWERING EMPOWERED BY LARGE LANGUAGE MODELS
8053CORAAL QA: A Dataset and Framework for Open Domain Spontaneous Speech Question Answering from Long Audio Files
1857CORE BODY TEMPERATURE AND ITS ROLE IN DETECTING ACUTE STRESS: A FEASIBILITY STUDY
2156CORN: CO-TRAINED FULL- AND NO-REFERENCE SPEECH QUALITY ASSESSMENT
4516CORNER DETECTION BASED ON A ROTATION-INVARIANT AND NOISE-INSENSITIVE CURVATURE MEASUREMENT
8621Corpus Synthesis for Zero-shot ASR Domain Adaptation using Large Language Models
1518CORRECTING FAULTY ROAD MAPS BY IMAGE INPAINTING
4412CORRECTION FOCUSED LANGUAGE MODEL TRAINING FOR SPEECH RECOGNITION
2005CORRELATION-BASED MACHINE LEARNING TECHNIQUES FOR CHANNEL ESTIMATION WITH FLUID ANTENNAS
8905CO-SALIENT OBJECT DETECTION VIA DISCRIMINATIVE PROTOTYPES CONTRAST
1329CoSLR: Contrastive Chinese Sign Language Recognition with Prior Knowledge and Multi-tasks Joint Learning
10005COST AWARE UNTARGETED POISONING ATTACK AGAINST GRAPH NEURAL NETWORKS
7977Counting Network for Learning from Majority Label
7890COUPLED BLOCK-TERM TENSOR DECOMPOSITION FOR NEAR-FIELD LOCALIZATION IN MULTI-STATIC MIMO RADAR SYSTEMS
9708Coupling Self-Supervised and Supervised Contrastive Learning for Multiple Classification of Cervical Cytological Whole Slide Images
11492Covariance Matrix Recovery From One-Bit Data With Non-Zero Quantization Thresholds: Algorithm and Performance Analysis
9099COVERAGE ANALYSIS FOR MMWAVE UAV NETWORKS WITH STATIC AND DYNAMIC BLOCKAGES
11553COVID-19 Detection via Fusion of Modulation Spectrum and Linear Prediction Speech Features
4729CPAUG: REFINING COPY-PASTE AUGMENTATION FOR SPEECH ANTI-SPOOFING
10199CPMSVD: Cross-Project Multiclass Software Vulnerability Detection via Fused Deep Feature and Domain Adaptation
9376CRAMER-RAO BOUND FOR ADMITTANCE MATRIX ESTIMATION UNDER LAPLACIAN CONSTRAINTS
1124CRC-AIDED LEARNED ENSEMBLES OF BELIEF-PROPAGATION POLAR DECODERS
6849CREATING PERSONALIZED SYNTHETIC VOICES FROM ARTICULATION IMPAIRED SPEECH USING AUGMENTED RECONSTRUCTION LOSS
1582Credible Teacher for Semi-Supervised Object Detection in Open Scene
8753CRESTYLER: TEXT-GUIDED SINGLE IMAGE STYLE TRANSFER METHOD BASED ON CNN AND RESTORMER
3000CroCFuN: Cross-modal Conditional Fusion Network for Pansharpening
1879CROSS BRANCH FEATURE FUSION DECODER FOR CONSISTENCY REGULARIZATION-BASED SEMI-SUPERVISED CHANGE DETECTION
8411Cross Modal Training For ASR Error Correction With Contrastive Learning
9143Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
3621CROSS-AGE CONTRASTIVE LEARNING FOR AGE-INVARIANT FACE RECOGNITION
3869CROSS-ATTENTION WATERMARKING OF LARGE LANGUAGE MODELS
11858CROSS-ATTENTION-GUIDED WAVENET FOR MEL SPECTROGRAM RECONSTRUCTION IN THE ICASSP 2024 AUDITORY EEG CHALLENGE
1121Cross-Camera Human Motion Transfer by Time Series Analysis
1284CROSS-DOMAIN CROSS-TASK TRANSFER MOBILE TOUCH-STROKE AUTHENTICATION
7143CROSS-IMAGE DISTILLATION FOR SEMI-SUPERVISED SEMANTIC SEGMENTATION
7208CROSS-LINGUAL LEARNING IN MULTILINGUAL SCENE TEXT RECOGNITION
11873Cross-lingual Text-to-Speech via Hierarchical Style Transfer
3459CROSS-MODAL ALIGNMENT FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING BASED ON MOMENTUM CONTRASTIVE LEARNING
9690CROSS-MODAL MULTISCALE DIFFERENCE-AWARE NETWORK FOR JOINT MOMENT RETRIEVAL AND HIGHLIGHT DETECTION
7698CROSS-MODAL MULTI-TASKING FOR SPEECH-TO-TEXT TRANSLATION VIA HARD PARAMETER SHARING
2881CROSS-MODAL PARALLEL TRAINING FOR IMPROVING END-TO-END ACCENTED SPEECH RECOGNITION
3815Cross-Modal Synthesis of Structural MRI and Functional Connectivity Networks via Conditional ViT-GANs
9369CROSS-MODALITY AND WITHIN-MODALITY REGULARIZATION FOR AUDIO-VISUAL DEEPFAKE DETECTION
7789Cross-speaker encoding network for multi-talker speech recognition
3731CROSS-SUBJECT EEG EMOTION RECOGNITION BASED ON INTERCONNECTED DYNAMIC DOMAIN ADAPTATION
3703CROSS-TARGET STANCE DETECTION BY EXPLOITING TARGET ANALYTICAL PERSPECTIVES
5935CROSS-TRIGGERING ISSUE IN AUDIO EVENT DETECTION AND MITIGATION
8460CROSSWORD: A SEMANTIC APPROACH TO TEXT COMPRESSION VIA MASKING
4207CROWD MODELING AND CONTROL VIA COOPERATIVE ADAPTIVE FILTERING
8502Crowdsourced and Automatic Speech Prominence Estimation
9588Crowdsourced multilingual speech intelligibility testing
7740CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
7845CRYPTO-MINE: Cryptanalysis via Mutual Information Neural Estimation
6196CSCNET: CLASS-SPECIFIED CASCADED NETWORK FOR COMPOSITIONAL ZERO-SHOT LEARNING
4392CSI-Free Over-the-Air Decentralized Learning over Frequency Selective Channels
10370CSNET: CONTRASTIVE SIAMESE NETWORK FOR ROBUST SLU
6838CST-FORMER: TRANSFORMER WITH CHANNEL-SPECTRO-TEMPORAL ATTENTION FOR SOUND EVENT LOCALIZATION AND DETECTION
3276CT AND MRI FUSION WITH ANISOTROPIC GUIDED FILTERING
3165CUBIC KNOWLEDGE DISTILLATION FOR SPEECH EMOTION RECOGNITION
8457CUFFLESS BLOOD PRESSURE ESTIMATION USING MAGNETIC FLUX IN A RING FORM FACTOR
3164Curricular Contrastive Regularization for Speech Enhancement with Self-supervised Representations
5958Customising General Large Language Models for Specialised Emotion Recognition Tasks
3196Customized Treatment Per Pixel for Blind Image Super-Resolution
4148CutDEM: Depth-Aware Enhanced Multi-View Image Mixing for Light Field Super-Resolution
8898CUTransNet: Transformers to Make Strong Encoders for Multi-Task Vision Perception of Autonomous Driving
9809CYCLIC MISSPECIFIED CRAMER-RAO BOUND FOR PERIODIC PARAMETER ESTIMATION
1781D3: DUAL-DOMAIN DEFENSES FOR BYZANTINE-RESILIENT DECENTRALIZED RESOURCE ALLOCATION
9454DACR: DISTRIBUTION-AUGMENTED CONTRASTIVE RECONSTRUCTION FOR TIME-SERIES ANOMALY DETECTION
2898DAMP: DISTRIBUTION-AWARE MAGNITUDE PRUNING FOR BUDGET-SENSITIVE GRAPH CONVOLUTIONAL NETWORKS
1213DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation
2815DarkShot: Lighting Dark Images with Low-Compute and High-Quality
8730DATA AUGMENTATION VIA SUBGROUP MIXUP FOR IMPROVING FAIRNESS
4931DATA DRIVEN GRAPHEME-TO-PHONEME REPRESENTATIONS FOR A LEXICON-FREE TEXT-TO-SPEECH
1947DATA-AIDED CHANNEL ESTIMATION UTILIZING GAUSSIAN MIXTURE MODELS
8101DATA-DRIVEN CONVEX REGULARIZERS FOR INVERSE PROBLEMS
4830DATA-DRIVEN LATTICES FOR VECTOR QUANTIZATION
2132DATA-FREE WATERMARK FOR DEEP NEURAL NETWORKS BY TRUNCATED ADVERSARIAL DISTILLATION
6934DATA-SCARCE CONDITION MODELING REQUIRES MODEL-BASED PRIOR REGULARIZATION
2639Dataset Distillation with Channel Efficient Process
3469DBS: Differentiable Budget-aware Searching for channel pruning
4726DCL-NET: DUAL CONTRASTIVE LEARNING NETWORK FOR SEMI-SUPERVISED MULTI-ORGAN SEGMENTATION
2962DCS: DEBIASED CONTRASTIVE LEARNING WITH WEAK SUPERVISION FOR TIME SERIES CLASSIFICATION
6030DCTTS: DISCRETE DIFFUSION MODEL WITH CONTRASTIVE LEARNING FOR TEXT-TO-SPEECH GENERATION
4823DDD: A PERCEPTUALLY SUPERIOR LOW-RESPONSE-TIME DNN-BASED DECLIPPER
7562DDI-COCO: A DATASET FOR UNDERSTANDING THE EFFECT OF COLOR CONTRAST IN MACHINE-ASSISTED SKIN DISEASE DETECTION
2356DDN-Net: Deep Residual Shrinkage Denoising Networks with Channel-wise Adaptively Soft Thresholds for Automated Major Depressive Disorder Identification
8068DE NOVO MOLECULE GENERATION WITH GRAPH LATENT DIFFUSION MODEL
4883DEBIASING RECOMMENDERS THROUGH PERSONALIZED POPULARITY-AWARE MARGINS
6644Debris sensing based on LEO constellation: an intersatellite channel parameter estimation approach
2044DECENTRALIZED GENERALIZED APPROXIMATE MESSAGE-PASSING FOR TREE-STRUCTURED NETWORKS
3025DECENTRALIZED LOW RANK MATRIX RECOVERY FROM COLUMN-WISE PROJECTIONS BY ALTERNATING GD AND MINIMIZATION
3325DECENTRALIZING COHERENT JOINT TRANSMISSION PRECODING VIA DETERMINISTIC EQUIVALENTS
2422DECOUPLED SELF-ADAPTIVE DISTRIBUTION REGULARIZATION FOR FEW-SHOT IMAGE CLASSIFICATION
8326DECOUPLED SPATIAL AND TEMPORAL PROCESSING FOR RESOURCE EFFICIENT MULTICHANNEL SPEECH ENHANCEMENT
9380Decoupling and Refilling: A Simple Data Augmentation Method for Aspect Term Extraction
3668Deep convolution network based super resolution DOA estimation with Toeplitz and sparse prior
2776DEEP FUSION OF SHIFTED MLP AND CNN FOR MEDICAL IMAGE SEGMENTATION
9301DEEP INCM RECONSTRUCTION FOR ADAPTIVE BEAMFORMING
4341DEEP LEARNING AMR MODEL INFERENCE ACCELERATION WITH CFU FOR EDGE SYSTEMS
8430Deep learning based single-shot profilometry by three-channel binary-defocused projection
7371DEEP LEARNING INVERSION OF OCEAN WAVE SPECTRUM FROM SAR SATELLITE OBSERVATIONS
4036DEEP MANIFOLD TRANSFORMATION FOR PROTEIN REPRESENTATION LEARNING
10200DEEP NEIGHBOR LAYER AGGREGATION FOR LIGHTWEIGHT SELF-SUPERVISED MONOCULAR DEPTH ESTIMATION
2127DEEP NEURAL NETWORK MODELS TRAINED WITH A FIXED RANDOM CLASSIFIER TRANSFER BETTER ACROSS DOMAINS
6650Deep Optimization of relay networks - Using Relays as Neurons
11457DEEP ORDINAL REGRESSION FRAMEWORK FOR NO-REFERENCE IMAGE QUALITY ASSESSMENT
4337Deep Plug-and-Play Algorithm for Unsaturated Imaging
9252Deep regression for biological age estimation in multiple organs: Investigations on 40,000 subjects of the UK Biobank
4924Deep Reinforcement Learning for Energy Minimization in Multi-RIS-Aided Cell-Free MEC Networks
3801DEEP RESIDUAL W-UNIT LEARNING WITH SEMANTIC EMBEDDING FOR AUTOMATIC PULMONARY CT ARTERY-VEIN SEPARATION
7333DEEP UNFOLDED ANNEALED STEIN PARTICLE FILTER FOR VEHICLE TRACKING
8633DEEP UNROLLING NETWORK FOR SAR IMAGE DESPECKLING
9607DEEP VARIATIONAL PRIVACY FUNNEL: GENERAL MODELING WITH APPLICATIONS IN FACE RECOGNITION
1717Deep Versatile Hyperspectral Reconstruction Model from a Snapshot Measurement with Arbitrary Masks
11479DEEPCOMBOSAD: SPECTRO-TEMPORAL CORRELATION BASED SPEECH ACTIVITY DETECTION FOR NATURALISTIC AUDIO STREAMS
7870DEEPGRE: GLOBAL ROBUSTNESS EVALUATION OF DEEP NEURAL NETWORKS
2940DEEPOREDNET: CONTRASTIVE LEARNING-BASED ATTENTION-WEIGHTED DUAL CHANNEL RESIDUAL NETWORK FOR OCULAR REDNESS ASSESSMENT
2610DEFENDING AGAINST CLEAN-IMAGE BACKDOOR ATTACK IN MULTI-LABEL CLASSIFICATION
5855DEFOCUSSR: An EFFICIENT FRAMEWORK FOR DEFOCUS IMAGE SUPER-RESOLUTION GUIDED BY DEPTH INFORMATION
3489DEFORMATION AND PENETRATION HYBRID DETECTION-NET FOR PARCELS INSPECTION IN INDUSTRIAL SUPPLY CHAIN
1759DEFORMMLP: DYNAMIC LARGE-SCALE RECEPTIVE FIELD MLP NETWORKS FOR HUMAN MOTION PREDICTION
8687DEGAN: DISCRIMINATION ENHANCED GAN FOR PERCEPTUAL-ORIENTED SUPER-RESOLUTION
4648DELAY EMBEDDING FOR MATRIX GRAPHICAL MODEL LEARNING FROM DEPENDENT DATA
11546Delayless Generative Fixed-filter Active Noise Control based on Deep Learning and Bayesian Filter
9387DELINEATION OF PROSTATE CANCER VIA ENHANCED AI-BASED ALGORITHM IN ULTRASOUND IMAGES
2252DELVING DEEPER INTO VULNERABLE SAMPLES IN ADVERSARIAL TRAINING
8973DEMENTIA ASSESSMENT USING MANDARIN SPEECH WITH AN ATTENTION-BASED SPEECH RECOGNITION ENCODER
11919DEMUCS for Data-Driven RF Signal Denoising
9026Denoising Diffusion Probabilistic Models for Action-Conditioned 3D Motion Generation
4691Depth-guided dominant plane perception for unsupervised homography estimation
8779DESIGN OF SPATIAL-SLOW-TIME CONSTANT-MODULUS WAVEFORM TRANSMISSION AND RECEIVE ADAPTIVE FILTER FOR DUAL-FUNCTION RADAR COMMUNICATIONS WITH RECONFIGURABLE INTELLIGENT SURFACE
7808DETECTING CHECK-WORTHY CLAIMS IN POLITICAL DEBATES, SPEECHES, AND INTERVIEWS USING AUDIO DATA
7687DETECTING CONTINUOUS GRAVITATIONAL WAVES USING GENERATED TRAINING DATA
11892DETECTING GAMMA-BAND RESPONSES TO THE SPEECH ENVELOPE FOR THE ICASSP 2024 AUDITORY EEG DECODING SIGNAL PROCESSING GRAND CHALLENGE
9050DETECTION AND ATTRIBUTION OF MODELS TRAINED ON GENERATED DATA
6079DETECTION IN COMPLEX SCENES USING RGB AND DEPTH MULTIMODAL FEATURE FUSION
9002DETECTION OF EPILEPTIC SEIZURES IN LONG EEG RECORDINGS USING AN ANOMALY DETECTOR WITH ARTIFACT REJECTION
2879DETECTOR DESIGN FOR DISTRIBUTED MULTICHANNEL RADAR SENSORS IN COLORED INTERFERENCE ENVIRONMENTS
5842DETERMINED BSS BY COMBINATION OF IVA AND DNN VIA PROXIMAL AVERAGE
4567DETS: End-to-End Single-Stage Text-to-Speech via Hierarchical Diffusion Gan Models
3376DF-VTON: Dense Flow Guided Virtual Try-On Network
6426DGLP: INCORPORATING ORIENTATION INFORMATION FOR ENHANCED LINK PREDICTION IN DIRECTED GRAPHS
4940DG-RAINDIFF: DEPTH-GUIDED DYNAMIC MESSAGE PASSING DIFFUSION MODEL FOR MIXTURE OF RAIN REMOVAL
5223DIACORRECT: ERROR CORRECTION BACK-END FOR SPEAKER DIARIZATION
3819DIAGNOSIS OF AUTISM SPECTRUM DISORDER BASED ON CONTRASTIVE FUNCTIONAL CONNECTIVITY GRAPH LEARNING NETWORK
1610DIAGONALIZE INTEGRAL GRAPH BY DCT
8896DIALCLIP: EMPOWERING CLIP AS MULTI-MODAL DIALOG RETRIEVER
7936DIALOG MODELING IN AUDIOBOOK SYNTHESIS
4419DIARIST: STREAMING SPEECH TRANSLATION WITH SPEAKER DIARIZATION
8291DIB-X: FORMULATING EXPLAINABILITY PRINCIPLES FOR A SELF-EXPLAINABLE MODEL THROUGH INFORMATION THEORETIC LEARNING
4078DICETRACK: LIGHTWEIGHT DICE CLASSIFICATION ON RESOURCE-CONSTRAINED PLATFORMS WITH OPTIMIZED DEEP LEARNING MODELS
5738DIFFDUB: PERSON-GENERIC VISUAL DUBBING USING INPAINTING RENDERER WITH DIFFUSION AUTO-ENCODER
9504DIFFERENTIABLE QUANTUM ARCHITECTURE SEARCH FOR JOB SHOP SCHEDULING PROBLEM
3427Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval
11561Differentiable Uncalibrated Imaging
4591DIFFERENTIAL BEAMFORMING WITH NULL CONSTRAINTS FOR SPHERICAL MICROPHONE ARRAYS
8916DIFFERENTIALLY PRIVATE FEDERATED FRANK-WOLFE
4622DIFFEVENT: EVENT RESIDUAL DIFFUSION FOR IMAGE DEBLURRING
4946DIFF-HOD: DIFFUSION MODEL FOR OBJECT DETECTION IN HAZY WEATHER CONDITIONS
8353DiffRadar: High-quality mmWave Radar Perception with Diffusion Probabilistic Model
8995DIFFRENT: A DIFFUSION MODEL FOR RECORDING ENVIRONMENT TRANSFER OF SPEECH
4374DIFFSC: SEMANTIC COMMUNICATION FRAMEWORK WITH ENHANCED DENOISING THROUGH DIFFUSION PROBABILISTIC MODELS
8715DIFFSTOCK: PROBABILISTIC RELATIONAL STOCK MARKET PREDICTIONS USING DIFFUSION MODELS
2673DIFF-SV: A UNIFIED HIERARCHICAL FRAMEWORK FOR NOISE-ROBUST SPEAKER VERIFICATION USING SCORE-BASED DIFFUSION PROBABILISTIC MODELS
5717DIFFUSION MODELS FOR AUDIO SEMANTIC COMMUNICATION
2729DIFFUSION OPTIMISTIC LEARNING FOR MIN-MAX OPTIMIZATION
1622DIFFUSION-BASED ADVERSARIAL PURIFICATION FOR ROBUST DEEP MRI RECONSTRUCTION
1572DIFFUSION-BASED POSE REFINEMENT AND MULTI-HYPOTHESIS GENERATION FOR 3D HUMAN POSE ESTIMATION
3014DIFFUSION-BASED SPEECH ENHANCEMENT IN MATCHED AND MISMATCHED CONDITIONS USING A HEUN-BASED SAMPLER
9097Diffusion-based Speech Enhancement with a Weighted Generative-Supervised Learning Loss
3244DIFFUSION-BASED SPEECH ENHANCEMENT WITH JOINT GENERATIVE AND PREDICTIVE DECODERS
1581DiffusionInst: Diffusion Model for Instance Segmentation
8427DIGITAL PATHOLOGY IMAGE DEBLURRING VIA LOCAL FOCUS QUALITY ASSESSMENT
8164DIGITAL TASK-ORIENTED COMMUNICATION WITH HARDWARE-LIMITED TASK-BASED QUANTIZATION
3399DI-MVS: LEARNING EFFICIENT MULTI-VIEW STEREO WITH DEPTH-AWARE ITERATIONS
2843DIRECT POSITION DETERMINATION BY COVARIANCE-FITTING ON THE RIEMANNIAN MANIFOLD OF HERMITIAN POSITIVE DEFINITE MATRICES
7841DIRECTED SCATTERING FOR KNOWLEDGE GRAPH-BASED CELLULAR SIGNALING ANALYSIS
3302DIRECTIONAL GAIN BASED NOISE COVARIANCE MATRIX ESTIMATION FOR MVDR BEAMFORMING
8627DISCOVERING MALICIOUS SIGNATURES IN SOFTWARE FROM STRUCTURAL INTERACTIONS
8077DISCRETE AUDIO REPRESENTATION AS AN ALTERNATIVE TO MEL-SPECTROGRAMS FOR SPEAKER AND SPEECH RECOGNITION
4867DISCRIMINANT PIXEL-DIFFERENCE VECTOR HASHING OF SPATIAL-TEMPORAL LOCAL BINARY PATTERNS FOR DYNAMIC TEXTURE RECOGNITION
10445DISCRIMINATIVE FREQUENCY INFORMATION LEARNING FOR END-TO-END SPEECH ANTI-SPOOFING
7530DISCRIMINATIVE SEMI-SUPERVISED FEATURE SELECTION VIA A CLASS-CREDIBLE PSEUDO-LABEL LEARNING FRAMEWORK
7483DISCRIMINATIVE TRAINING OF VBX DIARIZATION
4985DISENTANGLE ESTIMATION OF CAUSAL EFFECTS FROM CROSS-SILO DATA
5893DISENTANGLED GRAPH REPRESENTATION WITH CONTRASTIVE LEARNING FOR RUMOR DETECTION
7159DISENTANGLEMENT NETWORK: DISENTANGLE THE EMOTIONAL FEATURES FROM ACOUSTIC FEATURES FOR SPEECH EMOTION RECOGNITION
9762DISENTANGLING THE SPECTRAL PROPERTIES OF THE HODGE LAPLACIAN: NOT ALL SMALL EIGENVALUES ARE EQUAL
3862DISTILL VISION TRANSFORMERS TO CNNS VIA TEACHER COLLABORATION
1046DISTILLING DISTRIBUTIONAL UNCERTAINTY FROM A GAUSSIAN PROCESS
7193DISTILLING HUBERT WITH LSTMS VIA DECOUPLED KNOWLEDGE DISTILLATION
1589Distributed Decision-Making for Community Structured Networks
11469Distributed Self-Localization for Acoustic Transceiver Networks
11471DISTRIBUTED SENSOR SELECTION FOR SPEECH ENHANCEMENT WITH ACOUSTIC SENSOR NETWORKS
8272DISTRIBUTED STOCHASTIC CONTEXTUAL BANDITS FOR PROTEIN DRUG INTERACTION
6594DISTRIBUTED VECTOR APPROXIMATE MESSAGE PASSING
6922Distribution-aware Contrastive Learning for Robust Medical Image Segmentation
4445DITW: a high-performance Deep-Independent Template-based Watermarking
8873Diversifying Cross-Domain Few-shot Learning via Multimodal Image Editing
8744Diversity based core-set selection for text-to-speech with linguistic and acoustic features
9751Diversity-aware Buffer for Coping with Temporally Correlated Data Streams in Online Test-time Adaptation
1696DJCM: A Deep Joint Cascade Model for Singing Voice Separation and Vocal Pitch Estimation
1140DMEL: THE DIFFERENTIABLE LOG-MEL SPECTROGRAM AS A TRAINABLE LAYER IN NEURAL NETWORKS
4086DMKD: IMPROVING FEATURE-BASED KNOWLEDGE DISTILLATION FOR OBJECT DETECTION VIA DUAL MASKING AUGMENTATION
1677DMT: Comprehensive Distillation with Multiple Self-supervised Teachers
9226DO LEARNED SPEECH SYMBOLS FOLLOW ZIPF’S LAW?
8951DO SELF-SUPERVISED SPEECH AND LANGUAGE MODELS EXTRACT SIMILAR REPRESENTATIONS AS HUMAN BRAIN?
2469DOA ESTIMATION FOR SWITCH-ELEMENT ARRAYS BASED ON SPARSE REPRESENTATION
8944Does Audio Deepfake Detection Rely on Artifacts?
9584DOES VIDEO SUMMARIZATION REQUIRE VIDEOS? QUANTIFYING THE EFFECTIVENESS OF LANGUAGE IN VIDEO SUMMARIZATION
9534DOMAIN ADAPTIVE GRAPH CLASSIFICATION
7947DOMAIN GENERALIZATION WITH FOURIER TRANSFORM AND SOFT THRESHOLDING
2180Domain-adaptive and Subgroup-specific Cascaded Temperature Regression for Out-of-distribution Calibration
7639DOMAIN-ADAPTIVE SEMANTIC SEGMENTATION EMERGES FROM VISION-LANGUAGE SUPERVISED DOMAIN-DEBIASED SELF-TRAINING
3013DOMAINDIFF: BOOST OUT-OF-DISTRIBUTION GENERALIZATION WITH SYNTHETIC DATA
9181DOMAIN-SLOT AWARE CONTRASTIVE LEARNING FOR IMPROVED DIALOGUE STATE TRACKING
3361DOMAIN-WISE INVARIANT LEARNING FOR PANOPTIC SCENE GRAPH GENERATION
9885DONE: DYNAMIC NEURAL REPRESENTATION VIA HYPERPLANE NEURAL ODE
10249DOUBLE REVERSE REGULARIZATION NETWORK BASED ON SELF-KNOWLEDGE DISTILLATION FOR SAR OBJECT CLASSIFICATION
10106DP-MAE: A DUAL-PATH MASKED AUTOENCODER BASED SELF-SUPERVISED LEARNING METHOD FOR ANOMALOUS SOUND DETECTION
7834DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction
8434DRIVER SCANPATH PREDICTION BASED ON INVERSE REINFORCEMENT LEARNING
3403Drop Sparse Convolution for 3D Object Detection
8927DROPFL: CLIENT DROPOUT ATTACKS AGAINST FEDERATED LEARNING UNDER COMMUNICATION CONSTRAINTS
1405DROPOUT MULTI-HEAD ATTENTION FOR SINGLE IMAGE SUPER-RESOLUTION
6781DRSM: EFFICIENT NEURAL 4D DECOMPOSITION FOR DYNAMIC RECONSTRUCTION IN STATIONARY MONOCULAR CAMERAS
6411DSIS: a novel (k,n) threshold deniable secret image sharing scheme with lossless recovery
7877DT-NERF: DECOMPOSED TRIPLANE-HASH NEURAL RADIANCE FIELDS FOR HIGH-FIDELITY TALKING PORTRAIT SYNTHESIS
4733DUAL CONTRASTIVE LEARNING GUIDED PATHOLOGICAL IMAGE RE-STAINING
1715Dual Directional Complementary Gradient Fusion and Deep Refinement for Hyperspectral Image Super Resolution
8556DUAL LEVEL INTENT-SLOT INTERACTION FOR IMPROVED MULTI-INTENT SPOKEN LANGUAGE UNDERSTANDING
4212DUAL PARAMETER-EFFICIENT FINE-TUNING FOR SPEAKER REPRESENTATION VIA SPEAKER PROMPT TUNING AND ADAPTERS
7228Dual Rank-1 Tensor Attention Module for Convolutional Neural Networks
7572DUAL-CHANNEL UNLIMITED SAMPLING FOR BANDPASS SIGNALS
4810DUAL-COLOR GRANULARITY ALIGNMENT FOR TEXT-BASED PERSON SEARCH
11867DUAL-DOMAIN NEURAL NETWORKS FOR CLINICAL AND LOW-DOSE CBCT RECONSTRUCTION
6917DualGCN-MIL: Whole Slide Image Classification Based on Double Relationship Graph Learning
6027DUAL-MIX FOR CROSS-MODAL RETRIEVAL WITH NOISY LABELS
8112DUAL-PATH MINIMUM-PHASE AND ALL-PASS DECOMPOSITION NETWORK FOR SINGLE CHANNEL SPEECH DEREVERBERATION
10264DUAL-STREAM CONTRASTIVE PREDICTIVE NETWORK WITH JOINT HANDCRAFTED FEATURE VIEW FOR SAR SHIP CLASSIFICATION
4981DUALVC 2: DYNAMIC MASKED CONVOLUTION FOR UNIFIED STREAMING AND NON-STREAMING VOICE CONVERSION
1354DUNET: A ROBUST END-TO-END DEEP NEURAL NETWORK FRAMEWORK FOR IMBALANCED CLASSIFICATION
5628DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis
1866DURRNet: Deep Unfolded Single Image Reflection Removal Network with Joint Prior
9886DUST: DUAL-GRAINED SYNTAX-AWARE TRANSFORMER NETWORK FOR CHINESE NAMED ENTITY RECOGNITION
8312Dynamic ASR pathways: An Adaptive Masking Approach Towards Efficient Pruning of a Multilingual ASR Model
5556DYNAMIC BANDWIDTH VARIATIONAL MODE DECOMPOSITION
5597Dynamic Clustering and Cluster Contrastive Learning for Unsupervised Person Re-ID with Feature Distribution Alignment
5805DYNAMIC DATA SAMPLER FOR CROSS-LANGUAGE TRANSFER LEARNING IN LARGE LANGUAGE MODELS
1897DYNAMIC FREQUENCY DOMAIN GRAPH CONVOLUTIONAL NETWORK FOR TRAFFIC FORECASTING
1528DYNAMIC LABEL SMOOTHING STRATEGY FOR BIOSIGNAL CLASSIFICATION
2030DYNAMIC MODEL STRUCTURE ADJUSTMENT TO REALIZE QUANTUM CONTINUAL LEARNING BASED ON QUANTUM DATA
5536DYNAMIC MULTI-SCALE CONTEXT AGGREGATION FOR CONVERSATIONAL ASPECT-BASED SENTIMENT QUADRUPLE ANALYSIS
2792Dynamic Mutual-Activated Transformer for Human Motion Prediction
3575DYNAMIC PRIVACY ALLOCATION FOR LOCALLY DIFFERENTIALLY PRIVATE FEDERATED LEARNING WITH COMPOSITE OBJECTIVES
7837Dynamic random feature Gaussian Processes for Bayesian optimization of time-varying functions
3843DYNAMIC REPLAY TRAINING FOR CLASS-INCREMENTAL LEARNING
7883Dynamic Speech Emotion Recognition using a Conditional Neural Process
1259DYNAMIC VIDEO FRAME INTERPOLATION WITH INTEGRATED DIFFICULTY PRE-ASSESSMENT
8147DYNAMIC-SUPERB: TOWARDS A DYNAMIC, COLLABORATIVE, AND COMPREHENSIVE INSTRUCTION-TUNING BENCHMARK FOR SPEECH
2348EARLY DIAGNOSING PARKINSON'S DISEASE VIA A DEEP LEARNING MODEL BASED ON AUGMENTED FACIAL EXPRESSION DATA
8902ECHOCARDIOGRAPHY VIDEO SYNTHESIS FROM END DIASTOLIC SEMANTIC MAP VIA DIFFUSION MODEL
4936ECIL-MU: EMBEDDING BASED CLASS INCREMENTAL LEARNING AND MACHINE UNLEARNING
3653ECM-OPCC: EFFICIENT CONTEXT MODEL FOR OCTREE-BASED POINT CLOUD COMPRESSION
3069EC-NAS: ENERGY CONSUMPTION AWARE TABULAR BENCHMARKS FOR NEURAL ARCHITECTURE SEARCH
8877ECPNET: AN ENHANCED CURVE PERCEPTION NETWORK FOR LANE DETECTION
1864Edge Attention Learning for Efficient Camouflaged Object Detection
4849EDGE DEPLOYABLE DISTRIBUTED EVOLUTIONARY OPTIMIZATION BASED CALIBRATION METHOD FOR NEURAL QUANTIZATION
9959EDM: Synthetic data from exemplar diffusion model improves non-communicable diseases detection
8157ED-TTS: MULTI-SCALE EMOTION MODELING USING CROSS-DOMAIN EMOTION DIARIZATION FOR EMOTIONAL SPEECH SYNTHESIS
5644EEG EMOTION RECOGNITION BASED ON DYNAMICAL GRAPH ATTENTION NETWORK
3659EEG-BASED FAST AUDITORY ATTENTION DETECTION IN REAL-LIFE SCENARIOS USING TIME-FREQUENCY ATTENTION MECHANISM
8887EFFECT OF BEAMPATTERN ON MATRIX COMPLETION WITH SPARSE ARRAYS
7386EFFECT OF TARGET SIGNALS AND DELAYS ON SPATIALLY SELECTIVE ACTIVE NOISE CONTROL FOR OPEN-FITTING HEARABLES
3327Effective Connectivity-based Multi-View Feature Learning Method for Dementia Diagnosis with fNIRS Signal
9338EFFECTIVE IMAGE TAMPERING LOCALIZATION VIA ENHANCED TRANSFORMER AND CO-ATTENTION FUSION
9722EFFECTIVE INTERNAL LANGUAGE MODEL TRAINING AND FUSION FOR FACTORIZED TRANSDUCER MODEL
10153EFFICIENT 3D POSITION ESTIMATION IN BADMINTON SCENE
4394Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
8127EFFICIENT ADAPTER TUNING OF PRE-TRAINED SPEECH MODELS FOR AUTOMATIC SPEAKER VERIFICATION
3941Efficient Architecture Search for Real-time Instance Segmentation
9927EFFICIENT BLACK-BOX SPEAKER VERIFICATION MODEL ADAPTATION WITH REPROGRAMMING AND BACKEND LEARNING
11547EFFICIENT CODED MULTI-PARTY COMPUTATION AT EDGE NETWORKS
9734EFFICIENT CONTENT RECONSTRUCTION FOR HIGH DYNAMIC RANGE IMAGING
4888EFFICIENT FEDERATED LEARNING WITH SMOOTH AGGREGATION FOR NON-IID DATA FROM MULTIPLE EDGES
5937EFFICIENT FUNCTIONAL LINK ADAPTIVE FILTERS BASED ON NEAREST KRONECKER PRODUCT DECOMPOSITION
1344EFFICIENT FUSION OF DEPTH INFORMATION FOR DEFOCUS DEBLURRING
6814EFFICIENT HIERARCHICAL STRIPE ATTENTION FOR LIGHTWEIGHT IMAGE SUPER-RESOLUTION
9176EFFICIENT HIGH-PERFORMANCE BARK-SCALE NEURAL NETWORK FOR RESIDUAL ECHO AND NOISE SUPPRESSION
6098EFFICIENT JOINT RECTIFICATION OF PHOTOMETRIC AND GEOMETRIC DISTORTIONS IN DOCUMENT IMAGES
8436Efficient Learned Image Compression with Selective Kernel Residual Module and Channel-wise Causal Context Model
4304EFFICIENT LEARNING ON SUCCESSIVE TEST TIME AUGMENTATION
5366Efficient Multi-Channel Speech Enhancement with Spherical Harmonics Injection for Directional Encoding
8398Efficient Personal Voice Activity Detection With Wake Word Reference Speech
9893EFFICIENT POINT CLOUD ATTRIBUTE COMPRESSION FRAMEWORK USING ATTRIBUTE-GUIDED GRAPH FOURIER TRANSFORM
10159EFFICIENT POINT CLOUD ATTRIBUTE COMPRESSION USING RICH PARALLELIZABLE CONTEXT MODEL
4236Efficient Polyp Segmentation Via Integrity Learning
1448Efficient PoseNet with Coarse to Fine Transformer
6898EFFICIENT QUANTUM RECURRENT REINFORCEMENT LEARNING VIA QUANTUM RESERVOIR COMPUTING
3344EFFICIENT SCENE TEXT IMAGE SUPER-RESOLUTION WITH SEMANTIC GUIDANCE
9000EFFICIENT VIDEO AND AUDIO PROCESSING WITH LOIHI 2
6373EiffHDR: AN EFFICIENT NETWORK FOR MULTI-EXPOSURE HIGH DYNAMIC RANGE IMAGING
6106EIGENDECOMPOSITION-BASED SPATIAL-TEMPORAL ATTENTION FOR BRAIN COGNITIVE STATES IDENTIFICATION
5359EK-NET: REAL-TIME SCENE TEXT DETECTION WITH EXPAND KERNEL DISTANCE
3974ELECTROENCEPHALOGRAM HELPS FEW-SHOT LEARNING
8423ELECTROENCEPHALOGRAM SENSOR DATA COMPRESSION USING AN ASYMMETRICAL SPARSE AUTOENCODER WITH A DISCRETE COSINE TRANSFORM LAYER
4670ELECTROLARYNGEAL SPEECH INTELLIGIBILITY ENHANCEMENT THROUGH ROBUST LINGUISTIC ENCODERS
4190Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision
4038ELEVATING VISUAL PROMPTING IN TRANSFER LEARNING VIA PRUNED MODEL ENSEMBLES: NO RETRAIN, NO PAIN
2581ELLIPSE DETECTION BASED ON CONTRAST-GUIDED ARC ENHANCEMENT
7399ELLIPSE DETECTION BASED ON STRUCTURE-PRESERVING ANISOTROPIC EDGE EXTRACTION
7198EMALG: AN ENHANCED MANDARIN LOMBARD GRID CORPUS WITH MEANINGFUL SENTENCES
1241Embedded Feature Similarity Optimization with Specific Parameter Initialization for 2D/3D Medical Image Registration
8603EMBEDDED GRAPH REPRESENTATION FOR INTER-FRAME CODING OF DYNAMIC MESHES
7109EMOCONV-DIFF: DIFFUSION-BASED SPEECH EMOTION CONVERSION FOR NON-PARALLEL AND IN-THE-WILD DATA
4477EMOHRNET: HIGH-RESOLUTION NEURAL NETWORK BASED SPEECH EMOTION RECOGNITION
4195EMORED: A DATASET FOR RELATION EXTRACTION IN TEXTS WITH EMOTICONS
8212EMOTALKER: EMOTIONALLY EDITABLE TALKING FACE GENERATION VIA DIFFUSION MODEL
1880EMOTION NEURAL TRANSDUCER FOR FINE-GRAINED SPEECH EMOTION RECOGNITION
5697EMOTION-ALIGNED CONTRASTIVE LEARNING BETWEEN IMAGES AND MUSIC
7429EMOTION-AWARE CONTRASTIVE ADAPTATION NETWORK FOR SOURCE-FREE CROSS-CORPUS SPEECH EMOTION RECOGNITION
7301EMOTVR: A HYBRID MODEL TO ESTIMATE CONTINUOUS-TIME AND CONTINUOUS-LEVEL EMOTION FROM ELECTROENCEPHALOGRAPHY
2624Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
4151EMPLOYING REAL TRAINING DATA FOR DEEP NOISE SUPPRESSION
1706EMPOWERING VISION-LANGUAGE MODELS FOR REASONING ABILITY THROUGH LARGE LANGUAGE MODELS
7944ENABLING DEVICE CONTROL PLANNING CAPABILITIES OF SMALL LANGUAGE MODEL
8288ENABLING ORIENTATION-FREE MMWAVE-BASED VITAL SIGN SENSING WITH MULTI-DOMAIN SIGNAL ANALYSIS
8643ENABLING SECURE WIRELESS COMMUNICATIONS VIA MOVABLE ANTENNAS
7066ENCLAP: COMBINING NEURAL AUDIO CODEC AND AUDIO-TEXT JOINT EMBEDDING FOR AUTOMATED AUDIO CAPTIONING
3713Encoder-minimal and Decoder-minimal Framework for Remote Sensing Image Dehazing
4378Encoding Seasonal Climate Predictions with Modular Neural Network
4338ENCODING TIME AND ENERGY MODEL FOR SVT-AV1 BASED ON VIDEO COMPLEXITY
7447END-TO-END LEARNING OF GAUSSIAN MIXTURE PROPOSALS USING DIFFERENTIABLE PARTICLE FILTERS AND NEURAL NETWORKS
11541End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations
7924END-TO-END PERSONALIZED CUFF-LESS BLOOD PRESSURE MONITORING USING ECG AND PPG SIGNALS
7223End-to-end real time tracking of children's reading with pointer network
4203end-to-end spatially-constrained multi-perspective fine-grained image captioning
8872END-TO-END SPEECH RECOGNITION CONTEXTUALIZATION WITH LARGE LANGUAGE MODELS
5872END-TO-END SPEECH TRANSLATION WITH MUTUAL KNOWLEDGE DISTILLATION
10323ENERGY EFFICIENT WAKE-UP SOLUTION FOR LARGE-SCALE INTERNET OF UNDERWATER THINGS NETWORKS
7008ENERGY-AWARE RESOLUTION SELECTION FOR PER-TITLE ENCODING
9680ENERGY-BASED MODELS FOR SPEECH SYNTHESIS
4030ENERGY-EFFICIENT DECENTRALIZED LEARNING VIA GRAPH SPARSIFICATION
5253ENERGY-SAVING CELL-FREE MASSIVE MIMO PRECODERS WITH A PER-AP WIDEBAND KRONECKER CHANNEL MODEL
8176ENGINEERING THE NEURAL COLLAPSE GEOMETRY OF SUPERVISED-CONTRASTIVE LOSS
6541ENHANCED AXLE-BASED VEHICLE CLASSIFICATION USING ANGLE-BASED MICRO-DOPPLER SIGNATURE
4757Enhanced Channel Estimation in mm-Wave MIMO Systems Leveraging Integrated Communication and Sensing
5998ENHANCED COLOR PALETTE MODELING FOR LOSSLESS SCREEN CONTENT COMPRESSION
5000ENHANCED DEEP REINFORCEMENT LEARNING FOR PARCEL SINGULATION IN NON-STATIONARY ENVIRONMENTS
8218ENHANCED KPI ANOMALY DETECTION: AN UNSUPERVISED HYBRID MODEL WITH DYNAMIC THRESHOLD
3135Enhanced low-rank and sparse Tucker decomposition for image completion
2486ENHANCED SCREEN SHOOTING RESILIENT DOCUMENT WATERMARKING
3878ENHANCED TRANSFER LEARNING WITH EFFICIENT MODELING AND ADAPTIVE FUSION OF KNOWLEDGE VIA PROMPT TUNING
1818Enhanced Unsupervised Domain Adaptation with Dual-attention between Classification and Domain Alignment
4554Enhancing Adversarial Robustness of DNNs via Weight Decorrelation in Training
4537ENHANCING ADVERSARIAL TRAINING WITH PRIOR KNOWLEDGE DISTILLATION FOR ROBUST IMAGE COMPRESSION
2715ENHANCING ADVERSARIAL TRANSFERABILITY IN OBJECT DETECTION WITH BIDIRECTIONAL FEATURE DISTORTION
9877ENHANCING AOA ESTIMATION VIA PHASE MODELING OF BLUETOOTH 5 CTE SIGNALS
10039ENHANCING ARGUMENTATIVE RELATION CLASSIFICATION BY MULTI-GRANULARITY RETRIEVAL AND HETEROGENEOUS GRAPH REASONING
5806ENHANCING AUDIO GENERATION DIVERSITY WITH VISUAL INFORMATION
3288ENHANCING AUDIO-VISUAL QUESTION ANSWERING WITH MISSING MODALITY VIA TRANS-MODAL ASSOCIATIVE LEARNING
4506Enhancing Code-switching Speech Recognition with Interactive Language Biases
4736ENHANCING CONVERSATION SMOOTHNESS IN LANGUAGE LEARNING CHATBOTS: AN EVALUATION OF GPT4 FOR ASR ERROR CORRECTION
6886ENHANCING CROSS-DOMAIN DETECTION: ADAPTIVE CLASS-AWARE CONTRASTIVE TRANSFORMER
9647ENHANCING DOCUMENT-LEVEL EVENT EXTRACTION VIA STRUCTURE-AWARE HETEROGENEOUS GRAPH WITH MULTI-GRANULARITY SUBSENTENCES
7752ENHANCING END-TO-END CONVERSATIONAL SPEECH TRANSLATION THROUGH TARGET LANGUAGE CONTEXT UTILIZATION
4540Enhancing Event Sequence Modeling with Contrastive Relational Inference
6729ENHANCING EXPRESSIVENESS IN DANCE GENERATION VIA INTEGRATING FREQUENCY AND MUSIC STYLE INFORMATION
8551ENHANCING GAN PERFORMANCE THROUGH NEURAL ARCHITECTURE SEARCH AND TENSOR DECOMPOSITION
3644ENHANCING GENDER PRIVACY WITH PHOTO-REALISTIC FUSION OF DISENTANGLED SPATIAL SEGMENTS
8914ENHANCING GENERALIZATION IN MEDICAL VISUAL QUESTION ANSWERING TASKS VIA GRADIENT-GUIDED MODEL PERTURBATION
2078ENHANCING GENERALIZATION OF INVISIBLE FACIAL PRIVACY CLOAK VIA GRADIENT ACCUMULATION
3294ENHANCING GENERATIVE ASPECT-BASED SENTIMENT ANALYSIS WITH RELATION-LEVEL SUPERVISION AND PROMPT
9585ENHANCING HEALTHCARE WITH EOG: A NOVEL APPROACH TO SLEEP STAGE CLASSIFICATION
9929ENHANCING HYPERSPECTRAL ANOMALY DETECTION BY DIFFERENCE-OF-CONVEX SPARSE ANOMALY MODELING
7903ENHANCING IMAGE-TEXT MATCHING WITH ADAPTIVE FEATURE AGGREGATION
6144ENHANCING LOW-LATENCY SPEAKER DIARIZATION WITH SPATIAL DICTIONARY LEARNING
4628ENHANCING MULTILINGUAL SPEECH RECOGNITION THROUGH LANGUAGE PROMPT TUNING AND FRAME-LEVEL LANGUAGE ADAPTER
8247ENHANCING MULTILINGUAL TTS WITH VOICE CONVERSION BASED DATA AUGMENTATION AND POSTERIOR EMBEDDING
1996Enhancing Multi-task Models for Recommendation with Tensor Trace Norm
4817ENHANCING NOISY LABEL LEARNING VIA UNSUPERVISED CONTRASTIVE LOSS WITH LABEL CORRECTION BASED ON PRIOR KNOWLEDGE
2038ENHANCING NOTE-LEVEL SINGING TRANSCRIPTION MODEL WITH UNLABELED AND WEAKLY LABELED DATA
1139ENHANCING PERFORMANCE OF COARSENED GRAPHS WITH GRADIENT-MATCHING
8610ENHANCING PRE-TRAINED ASR SYSTEM FINE-TUNING FOR DYSARTHRIC SPEECH RECOGNITION USING ADVERSARIAL DATA AUGMENTATION
8915Enhancing Quantised End-to-End ASR Models via Personalisation
9036ENHANCING REALISM IN 3D FACIAL ANIMATION USING CONFORMER-BASED GENERATION AND AUTOMATED POST-PROCESSING
4970ENHANCING REINFORCEMENT LEARNING VIA CAUSALLY CORRECT INPUT IDENTIFICATION AND TARGETED INTERVENTION
4005ENHANCING SEMANTIC COMMUNICATION WITH DEEP GENERATIVE MODELS: AN OVERVIEW
2412ENHANCING SHORT- AND LONG-TERM SEA SURFACE TEMPERATURE FORECASTING WITH A STATIC AND DYNAMIC LEARNABLE PERSONALIZED GRAPH CONVOLUTION NETWORK
8670ENHANCING SPATIAL AUDIO GENERATION WITH SOURCE SEPARATION AND CHANNEL PANNING LOSS
4416ENHANCING SPEAKER DIARIZATION WITH LARGE LANGUAGE MODELS: A CONTEXTUAL BEAM SEARCH APPROACH
10118ENHANCING STEGANOGRAPHY OF GENERATIVE IMAGE BASED ON IMAGE RETOUCHING
2122ENHANCING TARGETED TRANSFERABILITY VIA FEATURE SPACE FINE-TUNING
2528ENHANCING THE DOMAIN ROBUSTNESS OF SELF-SUPERVISED PRE-TRAINING WITH SYNTHETIC IMAGES
5910Enhancing Two-stage Finetuning for Speech Emotion Recognition Using Adapters
4848ENHANCING VIOLIN FINGERING GENERATION THROUGH AUDIO-SYMBOLIC FUSION
5124ENRICHING MUSIC DESCRIPTIONS WITH A FINETUNED-LLM AND METADATA FOR TEXT-TO-MUSIC RETRIEVAL
2298Entwined Inversion: Tune-Free Inversion for Real Image Faithful Reconstruction and Editing
2601Environmental sound synthesis from vocal imitations and sound event labels
8608EOFD-NET: EDGE OPTIMIZATION AND FEATURE DENOISING FOR WEAKLY SUPERVISED DEEP NUCLEI SEGMENTATION WITH POINT ANNOTATIONS
6074EPA: NEURAL COLLAPSE INSPIRED ROBUST OUT-OF-DISTRIBUTION DETECTOR
3264ESA: EXPERT-AND-SAMPLES-AWARE INCREMENTAL LEARNING UNDER LONGTAIL DISTRIBUTION
5097ESIHGNN: EVENT-STATE INTERACTIONS INFUSED HETEROGENEOUS GRAPH NEURAL NETWORK FOR CONVERSATIONAL EMOTION RECOGNITION
5922ESTGN: ENHANCED SELF-MINED TEXT GUIDED SUPER-RESOLUTION NETWORK FOR SUPERIOR IMAGE SUPER RESOLUTION
8363ESTIMATING DIRECTED SPECTRAL INFORMATION FLOW BETWEEN MULTI-RESOLUTION TIME SERIES
1824ESTIMATING EXERCISE-INDUCED FATIGUE FROM THERMAL FACIAL IMAGES
3541ESTIMATING SYMPTOMS AND CLINICAL SIGNS INSTEAD OF DISORDERS: THE PATH TOWARD THE CLINICAL USE OF VOICE AND SPEECH BIOMARKERS IN PSYCHIATRY
6224ESTIMATION OF IMPULSE RESPONSES FOR A MOVING SOURCE USING OPTIMAL TRANSPORT REGULARIZATION
8962ESTIMATION OF SPECTRAL LINES USING EXPECTATION PROPAGATION
8178ESVC: COMBINING ADAPTIVE STYLE FUSION AND MULTI-LEVEL FEATURE DISENTANGLEMENT FOR EXPRESSIVE SINGING VOICE CONVERSION
7616ETP: Learning Transferable ECG Representations via ECG-Text Pre-training
10028Evaluation of an Improved ultrasonic imaging Helmet for observing Articulatory data
9457EVIDENCE-AWARE MULTIMODAL CHINESE SOCIAL MEDIA RUMOR DETECTION
4370EVOLUTION BACKCASTING OF EDGE FLOWS FROM PARTIAL OBSERVATIONS USING SIMPLICIAL VECTOR AUTOREGRESSIVE MODELS
7992Exact classification of NMR spectra from NMR signals
2991Exploiting A Quantum Multiple Kernel Learning Approach for Low-Resource Spoken Command Recognition
8693EXPLOITING AUDIO-VISUAL FEATURES WITH PRETRAINED AV-HUBERT FOR MULTI-MODAL DYSARTHRIC SPEECH RECONSTRUCTION
9972EXPLOITING MODALITY-SPECIFIC FEATURES FOR MULTI-MODAL MANIPULATION DETECTION AND GROUNDING
2459EXPLOITING SPATIAL-TEMPORAL DATA FOR SLEEP STAGE CLASSIFICATION VIA HYPERGRAPH LEARNING
4432EXPLORATION OF VISUAL PROMPT IN GROUNDED PRE-TRAINED OPEN-SET DETECTION
9977EXPLORING ADAPTERS WITH CONFORMERS FOR CHILDREN'S AUTOMATIC SPEECH RECOGNITION
3620EXPLORING CONSISTENT SPATIO-TEMPORAL DISTORTION AND STABLE 3-D DCT COEFFICIENTS FOR ROBUST BLIND VIDEO WATERMARKING
6901EXPLORING LABEL HIERARCHY IN DIALOGUE INTENT CLASSIFICATION
8714Exploring large scale pre-trained models for robust machine anomalous sound detection
1594Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework
7899EXPLORING META INFORMATION FOR AUDIO-BASED ZERO-SHOT BIRD CLASSIFICATION
8239EXPLORING MULTI-MODAL CONTROL IN MUSIC-DRIVEN DANCE GENERATION
7362Exploring Object-centered External Knowledge for Fine-grained Video Paragraph Captioning
9665EXPLORING PHONETIC CONTEXT-AWARE LIP-SYNC FOR TALKING FACE GENERATION
8528EXPLORING SELF-EXPLAINABLE STREET-LEVEL IP GEOLOCATION WITH GRAPH INFORMATION BOTTLENECK
8198EXPLORING SELF-SUPERVISED CONTRASTIVE LEARNING OF SPATIAL SOUND EVENT REPRESENTATION
8057Exploring Soft Prompt Initialization Strategy for Few-shot Continual Text Classification
6220EXPLORING SPATIO-TEMPORAL DISCRIMINATIVE CUES FOR GROUP ACTIVITY RECOGNITION VIA CONTRASTIVE LEARNING
6823EXPLORING SPEECH RECOGNITION, TRANSLATION, AND UNDERSTANDING WITH DISCRETE SPEECH UNITS: A COMPARATIVE STUDY
4124EXPLORING TARGETED UNIVERSAL ADVERSARIAL ATTACK FOR DEEP HASHING
7330EXPLORING THE UTILITY OF CLIP PRIORS FOR VISUAL RELATIONSHIP PREDICTION
2794Exponentially Consistent Nonparametric Clustering of Data Streams with Composite Distributions
8804EXPRESSION DOMAIN TRANSLATION NETWORK FOR CROSS-DOMAIN HEAD REENACTMENT
9678EXPRESSIVE ACOUSTIC GUITAR SOUND SYNTHESIS WITH AN INSTRUMENT-SPECIFIC INPUT REPRESENTATION AND DIFFUSION OUTPAINTING
10439Extended Depth-of-Field Lensless Imaging Using an Optimized Radial Mask
5889EXTENDING IMPLICIT NEURAL REPRESENTATIONS FOR TEXT-TO-IMAGE GENERATION
5531EXTENDING LARGE LANGUAGE MODELS FOR SPEECH AND AUDIO CAPTIONING
3495EXTENDING MULTILINGUAL ASR TO NEW LANGUAGES USING SUPPLEMENTARY ENCODER AND DECODER COMPONENTS
6956EXTENDING MULTILINGUAL SPEECH SYNTHESIS TO 100+ LANGUAGES WITHOUT TRANSCRIBED DATA
9141EXTENDING WHISPER WITH PROMPT TUNING TO TARGET-SPEAKER ASR
7391EXTENSION OF CLIFFORD DATA REGRESSION METHODS FOR QUANTUM ERROR MITIGATION
3640External Division of Two Proximity Operators: An Application to Signal Recovery with Structured Sparsity
7401EXTREME ENCODER OUTPUT FRAME RATE REDUCTION: IMPROVING COMPUTATIONAL LATENCIES OF LARGE END-TO-END MODELS
7584EXTREMELY LIGHT-WEIGHT LEARNING BASED LDR TO PQ HDR CONVERSION USING BERNSTEIN CURVES
6983Extrinsic versus APP information feedback in turbo VEP MU-MIMO receivers: optimization via deep unfolding
7123EYE MOTION MATTERS FOR 3D FACE RECONSTRUCTION
1097F1-EV SCORE: MEASURING THE LIKELIHOOD OF ESTIMATING A GOOD DECISION THRESHOLD FOR SEMI-SUPERVISED ANOMALY DETECTION
5129F2GNN: AN ADAPTIVE FILTER WITH FEATURE SEGMENTATION FOR GRAPH-BASED FRAUD DETECTION
10073FACE RECOGNITION USING LENSLESS CAMERA
9906FACE RECONSTRUCTION FROM PARTIALLY LEAKED FACIAL EMBEDDINGS
6986Facial Aesthetic Enhancement Network for Asian Faces Based on Differential Facial Aesthetic Activations
4553FACIAL MICRO-MOTION-AWARE MIXUP FOR MICRO-EXPRESSION RECOGNITION
9358FACILITATING MESSAGE PASSING WITH POTENTIAL LINKS FOR KNOWLEDGE GRAPH COMPLETION
8334FACT-AWARE SUMMARIZATION WITH CONTRASTIVE LEARNING FOR FEW-SHOT DIALOGUE STATE TRACKING
5186FAIRNESS-AWARE JOB SCHEDULING FOR MULTI-JOB FEDERATED LEARNING
3138FALL PREDICTION BY A SPATIO-TEMPORAL MULTI-CHANNEL CAUSAL MODEL FROM WEARABLE SENSORS DATA
2032FAMIM: A Novel Frequency-Domain Augmentation Masked Image Model Framework for Domain Generalizable Face Anti-Spoofing
8717FAST ALGORITHM DESIGN FOR THE CONSTANT-ENVELOPE PRECODING IN MASSIVE MIMO COMMUNICATIONS WITH INTERFERENCE EXPLOITATION
1244FAST ALIGNMENT ALGORITHM FOR CRYO-EM PARTICLE IMAGES BASED ON HARMONIC ANALYSIS
10139FAST AND ACCURATE ROOT CAUSE ANALYSIS BASED ON SIGNALLING MESSAGES FOR 5G NETWORKS
6285FAST AND EFFICIENT SEQUENTIAL RADAR PARAMETER ESTIMATION IN MIMO-OTFS SYSTEMS
3076FAST AND PHYSICALLY ENRICHED DEEP NETWORK FOR JOINT LOW-LIGHT ENHANCEMENT AND IMAGE DEBLURRING
7623FAST APPROXIMATION OF THE GENERALIZED SLICED-WASSERSTEIN DISTANCE
9455Fast Cross-modality Knowledge Transfer via a Contextual Autoencoder Transformation
10424Fast Dynamics of Brain-wide Patterns on Neuronal Oscillations
8365FAST GRAPH-BASED DENOISING FOR POINT CLOUD COLOR INFORMATION
10084Fast Intra mode prediction algorithms for SCBs in VVC SCC
4681FAST PERSONALIZED TEXT TO IMAGE SYNTHESIS WITH ATTENTION INJECTION
9063FAST TEST ERROR RATES FOR GRADIENT-BASED ALGORITHMS ON SEPARABLE DATA
2780FASTGAT: SIMPLE AND EFFICIENT GRAPH ATTENTION NEURAL NETWORK WITH GLOBAL-AWARE ADAPTIVE COMPUTATIONAL NODE ATTENTION
7422FASTINJECT: INJECTING UNPAIRED TEXT DATA INTO CTC-BASED ASR TRAINING
2958FASTMANDARIN: EFFICIENT LOCAL MODELING FOR NATURAL MANDARIN SPEECH SYNTHESIS
3070FAVANO: FEDERATED AVERAGING WITH ASYNCHRONOUS NODES
9084FCC-MF: DETECTING VIOLENCE IN AUDIO-VISUAL CONTEXT WITH FRAME-WISE CLUSTER CONTRAST AND MODALITY-STAGE FLOODING
6991FDA-MIMO Radar Using Ambiguity Function for Target Two-Dimensional Localization
5638FDC-NERF: LEARNING POSE-FREE NEURAL RADIANCE FIELDS WITH FLOW-DEPTH CONSISTENCY
2329FDIG: A Fine-grained Data Integration approach for Group Recommendation
3223FDNET: A NOVEL MULTIVARIATE TIME SERIES CLASSIFICATION MODEL THROUGH FUSING FEATURE AND DIFFERENCE
10261FEARLESS STEPS APOLLO: TEAM COMMUNICATIONS BASED DEVELOPMENT FOR SCIENCE, TECHNOLOGY, EDUCATION, AND HISTORICAL PRESERVATION
3378Feature Mixing-based Active Learning for Multi-label Text Classification
2370FEATURE-CONSTRAINED AND ATTENTION-CONDITIONED DISTILLATION LEARNING FOR VISUAL ANOMALY DETECTION
2163Feature-Distribution Perturbation and Calibration for Generalized ReID
4357FEDAQT: ACCURATE QUANTIZED TRAINING WITH FEDERATED LEARNING
2827FEDERATED CINN CLUSTERING FOR ACCURATE CLUSTERED FEDERATED LEARNING
2915Federated Dataset Dictionary Learning for Multi-Source Domain Adaptation
8131Federated Learning of Tensor Generalized Linear Models with Low Separation Rank
1592Federated Learning on Distributed Graphs considering Multiple Heterogeneities
8062FEDERATED LEARNING UNDER RESTRICTED USER AVAILABILITY
9580Federated Learning via Consensus Mechanism on Heterogeneous Data: A New Perspective on Convergence
3096Federated Learning with Instance-Dependent Noisy Label
3894FEDERATED PAC-BAYESIAN LEARNING ON NON-IID DATA
8565FEDERATED QUANTUM MACHINE LEARNING WITH DIFFERENTIAL PRIVACY
8066FedKA: Federated Knowledge Augmentation for Multi-Center Medical Image Segmentation on Non-IID Data
7814FedLion: Faster Adaptive Federated Optimization with Fewer Communication
3073FedMM: Federated Multi-Modal Learning with Modality Heterogeneity in Computational Pathology
9494FED-SDS: ADAPTIVE STRUCTURED DYNAMIC SPARSITY FOR FEDERATED LEARNING UNDER HETEROGENEOUS CLIENTS
2635FedSODA: Federated Cross-assessment and Dynamic Aggregation for Histopathology Segmentation
9939FEWER-TOKEN NEURAL SPEECH CODEC WITH TIME-INVARIANT CODES
9182FEW-SHOT ANOMALOUS SOUND DETECTION BASED ON ANOMALY MAP ESTIMATION USING PSEUDO ABNORMAL DATA
1794FFT-BASED SELECTION AND OPTIMIZATION OF STATISTICS FOR ROBUST RECOGNITION OF SEVERELY CORRUPTED IMAGES
7455FIBA: FEDERATED INVISIBLE BACKDOOR ATTACK
10134Filamentary Convolution for Spoken Language Identification: A Brain-Inspired Approach
6465FILTER-ENHANCED HYPERGRAPH TRANSFORMER FOR MULTI-BEHAVIOR SEQUENTIAL RECOMMENDATION
3274FINCGAN: A GAN FRAMEWORK OF IMBALANCED NODE CLASSIFICATION ON HETEROGENEOUS GRAPH NEURAL NETWORK
6432Finding Representative Sampling Subsets on Graphs via Submodularity
9259FINE-GRAINED DISCREPANCY CONTRASTIVE LEARNING FOR ROBUST FAKE NEWS DETECTION
4855FINE-GRAINED DISENTANGLED REPRESENTATION LEARNING FOR MULTIMODAL EMOTION RECOGNITION
7786FINE-GRAINED ENGINE FAULT SOUND EVENT DETECTION USING MULTIMODAL SIGNALS
4048FINE-GRAINED FEATURES ALIGNMENT AND FUSION FOR TEXT-VIDEO CROSS-MODAL RETRIEVAL
8402Fine-Granularity Face Sketch Synthesis
6068FINE-TUNE THE PRETRAINED ATST MODEL FOR SOUND EVENT DETECTION
7657FINE-TUNING SELF-SUPERVISED MODELS FOR LANGUAGE IDENTIFICATION USING ORTHONORMAL CONSTRAINT
4442FIRNET: FUNDAMENTAL FREQUENCY CONTROLLABLE FAST NEURAL VOCODER WITH TRAINABLE FINITE IMPULSE RESPONSE FILTER
8141FIRST-SHOT UNSUPERVISED ANOMALOUS SOUND DETECTION WITH UNKNOWN ANOMALIES ESTIMATED BY METADATA-ASSISTED AUDIO GENERATION
7900Fixed Inter-Neuron Covariability Induces Adversarial Robustness
9068FLARE-FREE VISION: EMPOWERING UFORMER WITH DEPTH INSIGHTS
3925FLATTENING SINGULAR VALUES OF FACTORIZED CONVOLUTION FOR MEDICAL IMAGES
1347FLEXIBLE KEYWORD SPOTTING BASED ON HOMOGENEOUS AUDIO-TEXT EMBEDDING
1408Flipping Consistent and Counterfactual Attention Network for Facial Expression Recognition
7029FLOW DYNAMICS CORRECTION FOR ACTION RECOGNITION
7157FOCUS FUSION NETWORK FOR VISIBLE AND INFRARED IMAGE FUSION
7588Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
6920FOLLOWING THE EMBEDDING: IDENTIFYING TRANSITION PHENOMENA IN WAV2VEC 2.0 REPRESENTATIONS OF SPEECH AUDIO
3599FORECASTING TORSIONAL RESONANCE IN ELECTRIC VEHICLES BY LEARNING A QUANTILE REGRESSOR
2314Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble
8090FOUNDATION MODEL ASSISTED AUTOMATIC SPEECH EMOTION RECOGNITION: TRANSCRIBING, ANNOTATING, AND AUGMENTING
4032FOURIER DOMAIN APPROACH FOR GALAXY SPECTRA DECONTAMINATION AND DECONVOLUTION
8445Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention
10228FPGNET: SINGLE IMAGE DERAINING WITH HIGH-FREQUENCY CHANNEL AND FREQUENCY DOMAIN PRIOR GUIDANCE
4513FPN WITH GMM BASED FEATURE ENHANCEMENT STRATEGY FOR OBJECT DETECTION IN REMOTE SENSING IMAGES
11496Fractional Fourier Transform in Time Series Prediction
7747FRACTURE ASSEMBLY WITH SEGMENTATION AND ITERATIVE REGISTRATION
6826FRAME-LEVEL EMOTIONAL STATE ALIGNMENT METHOD FOR SPEECH EMOTION RECOGNITION
4260Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified Spoofing Detection
3291FRAME-WISE STREAMING END-TO-END SPEAKER DIARIZATION WITH NON-AUTOREGRESSIVE SELF-ATTENTION-BASED ATTRACTORS
2757FREETALKER: CONTROLLABLE SPEECH AND TEXT-DRIVEN GESTURE GENERATION BASED ON DIFFUSION MODELS FOR ENHANCED SPEAKER NATURALNESS
3045FREEZE THE BACKBONES: A PARAMETER-EFFICIENT CONTRASTIVE APPROACH TO ROBUST MEDICAL VISION-LANGUAGE PRE-TRAINING
4175FREGRAD: LIGHTWEIGHT AND FAST FREQUENCY-AWARE DIFFUSION VOCODER
6963FREMAX: A SIMPLE METHOD TOWARDS TRULY SECURE GENERATIVE LINGUISTIC STEGANOGRAPHY
7494FREQ2TIME: WEAKLY SUPERVISED LEARNING OF CAMERA-BASED RPPG FROM HEART RATE
7202FREQUENCY ANALYSIS AND FILTER DESIGN FOR DIRECTED GRAPHS WITH POLAR DECOMPOSITION
1736FREQUENCY AWARE AND GRAPH FUSION NETWORK FOR POLYP SEGMENTATION
6946FREQUENCY ESTIMATION VIA SUB-NYQUIST UNLIMITED SAMPLING
8934FREQUENCY MASKING FOR UNIVERSAL DEEPFAKE DETECTION
8446FREQUENCY-DOMAIN SIGNAL RECONSTRUCTION FOR DYNAMIC TIME-DOMAIN WEIGHTING HYBRID PRECODING WITH BEAM SQUINT
2470Friends to Help: Saving Federated Learning from Client Dropout
9284FROM COARSE TO FINE: EFFICIENT TRAINING FOR AUDIO SPECTROGRAM TRANSFORMERS
2725FROM CONVOLUTIONAL SPARSE CODING TO *-NMF FACTORIZATION OF TIME-FREQUENCY COEFFICIENTS
7234FROM GAME THEORY TO VISUAL RECOGNITION: ADVANCING DNN ROBUSTNESS
7963FROM RIR TO BRIR: A SPARSE RECOVERY BEAMFORMING APPROACH FOR VIRTUAL BINAURAL SOUND RENDERING
3633FSD: AN INITIAL CHINESE DATASET FOR FAKE SONG DETECTION
3820FSPEN: AN ULTRA-LIGHTWEIGHT NETWORK FOR REAL TIME SPEECH ENHANCEMENT
3741FUNCODEC: A FUNDAMENTAL, REPRODUCIBLE AND INTEGRABLE OPEN-SOURCE TOOLKIT FOR NEURAL SPEECH CODEC
4576Functional Emotion Transformer for EEG-assisted Cross-Modal Emotion Recognition
7833Functional Invariants to Watermark Large Transformers
8377FUNCTIONALLY SIMILAR MULTI-LABEL KNOWLEDGE DISTILLATION
6308FUNDAMENTAL LIMITS OF DIRECTION FINDING IN DISTRIBUTED ARRAYS EXPLOITING AUXILIARY SOURCES
9304FUNDAMENTAL PERFORMANCE BOUNDS FOR CARRIER PHASE POSITIONING IN LEO-PNT SYSTEM
2621FUR-API: DATASET AND BASELINES TOWARD REALISTIC API ANOMALY DETECTION
3867FURTHER RESULTS ON THE DESIGN OF REAL-VALUED WIDEBAND BEAMFORMERS USING ADAPTIVE-ARRAY-THEORY-INSPIRED WEIGHTED LEAST SQUARES
9364FUSDOM: COMBINING IN-DOMAIN AND OUT-OF-DOMAIN KNOWLEDGE FOR CONTINUOUS SELF-SUPERVISED LEARNING
2550Fusing Modality-Specific Representations and Decisions for Multimodal Emotion Recognition
7241FUSING MULTI-LEVEL FEATURES FROM AUDIO AND CONTEXTUAL SENTENCE EMBEDDING FROM TEXT FOR INTERVIEW-BASED DEPRESSION DETECTION
5573FUSING STRUCTURE AND APPEARANCE FEATURES IN FACIAL EXPRESSION RECOGNITION TRANSFORMER
9878FUSION OF AUDIO AND VISUAL EMBEDDINGS FOR SOUND EVENT LOCALIZATION AND DETECTION
7948Fusion of Multi-resolution Seismic Tomography Maps with Physics-informed Probability Graphical Models
2150FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models
4738FW-SHAPLEY: REAL-TIME ESTIMATION OF WEIGHTED SHAPLEY VALUES
1615G2G: GENERALIZED LEARNING BY CROSS-DOMAIN KNOWLEDGE TRANSFER FOR FEDERATED DOMAIN GENERALIZATION
1746G2PU: Grapheme-to-Phoneme Transducer with Speech Units
7756GAMAFLOW: ESTIMATING 3D SCENE FLOW VIA GROUPED ATTENTION AND GLOBAL MOTION AGGREGATION
3536GaP-aug: Gamma Patch-Wise Correction Augmentation Method for Respiratory Sound Classification
3528GASS: GENERALIZING AUDIO SOURCE SEPARATION WITH LARGE-SCALE DATA
8078GBSD: GENERATIVE BOKEH WITH STAGE DIFFUSION
9846GCC-PHAT RE-IMAGINED - A U-NET FILTER FOR AUDIO TDOA PEAK-SELECTION
6454GCIA: A BLACK-BOX GRAPH INJECTION ATTACK METHOD VIA GRAPH CONTRASTIVE LEARNING
1608GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition
4147GENEFORMER: LEARNED GENE COMPRESSION USING TRANSFORMER-BASED CONTEXT MODELING
5185General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level
11876GENERAL SPEECH RESTORATION USING TWO-STAGE GENERATIVE ADVERSARIAL NETWORKS
9367GENERALIZABLE TWO-BRANCH FRAMEWORK FOR IMAGE CLASS-INCREMENTAL LEARNING
7863Generalization of self-supervised learning-based representations for cross-domain speech emotion recognition
1564GENERALIZED DETERMINISTIC-RANDOM TRADEOFF OF INTEGRATED SENSING AND COMMUNICATIONS: THE SENSING-OPTIMAL OPERATING POINT
8011GENERALIZED HOLE-FILLING STRATEGY FOR OVERLAPPING HOLE-EXISTING COPRIME ARRAYS FOR DOA ESTIMATION
7848Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
5472GENERALIZED SPECAUGMENT VIA MULTI-RECTANGLE INVERSE MASKING FOR ACOUSTIC SCENE CLASSIFICATION
7168Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization
2759Generating High-quality Adversarial Examples with Universal Perturbation-Based Adaptive Network and Improved Perceptual Loss
8932GENERATING PERSONA-AWARE EMPATHETIC RESPONSES WITH RETRIEVAL-AUGMENTED PROMPT LEARNING
10023GENERATING STEREOPHONIC MUSIC WITH SINGLE-STAGE LANGUAGE MODELS
7649GENERATION OR REPLICATION: AUSCULTATING AUDIO LATENT DIFFUSION MODELS
9530Generation-based Target Speech Extraction with Speech Discretization and Vocoder
2445GENERATIVE AI-AIDED JOINT TRAINING-FREE SECURE SEMANTIC COMMUNICATIONS VIA MULTI-MODAL PROMPTS
5119GENERATIVE CONTEXT-AWARE FINE-TUNING OF SELF-SUPERVISED SPEECH MODELS
8060GENERATIVE DE-QUANTIZATION FOR NEURAL SPEECH CODEC VIA LATENT DIFFUSION
4244Generative Extension Positive Pairs and Improving Sample Selection Based on Contrastive Learning for Unsupervised Person Re-identification
7557GEODESIC INTERPOLATION OF FRAME-WISE SPEAKER EMBEDDINGS FOR THE DIARIZATION OF MEETING SCENARIOS
8937Geometry Compression Artifact Removal for V-PCC over a Wide Bitrate Range
3457GEOMETRY-CORRECTED GEODESIC MOTION MODELING WITH PER-FRAME CAMERA MOTION FOR 360-DEGREE VIDEO COMPRESSION
8582GESTURE GENERATION VIA DIFFUSION MODEL WITH ATTENTION MECHANISM
11475GFANC-Kalman: Generative Fixed-Filter Active Noise Control with CNN-Kalman Filtering
9257GFMAE: Self-Supervised GNN-Free Masked AutoEncoders
5005GI-PIP: DO WE REQUIRE IMPRACTICAL AUXILIARY DATASET FOR GRADIENT INVERSION ATTACKS?
7033GLA-GRAD: A GRIFFIN-LIM EXTENDED WAVEFORM GENERATION DIFFUSION MODEL
3562GLANCE, FOCUS AND REFINEMENT NETWORK FOR REMOTE SENSING CHANGE DETECTION
6216GLANCING FUTURE FOR SIMULTANEOUS MACHINE TRANSLATION
1497GLAND INSTANCE SEGMENTATION BY FULL RESOLUTION MULTI-SCALE DILATION RESIDUAL NETWORKS
10177GLAND SEGMENTATION VIA DUAL ENCODERS AND BOUNDARY-ENHANCED ATTENTION
3535GLMAE: GRAPH REPRESENTATION LEARNING METHOD COMBINING GENERATIVE LEARNING AND MASKING AUTOENCODER
5172GLMB 3D SPEAKER TRACKING WITH VIDEO-ASSISTED MULTI-CHANNEL AUDIO OPTIMIZATION FUNCTIONS
2165Global Convergence of Alternating Direction Method of Multipliers for Invex Objective Losses
1668GLOBAL OPTIMIZATION OF ACTIVE RIS IN LINEAR TIME
11550Global Optimization of Long-Term Average Proportional Fair Throughput via Convex Reformulation
3239Globally Optimal Beamforming Design for Integrated Sensing and Communication Systems
1938GLOCAL CASCADING NETWORK FOR TOPIC ENHANCED VISUAL STORYTELLING
8043GMM-RESNET2: ENSEMBLE OF GROUP RESNET NETWORKS FOR SYNTHETIC SPEECH DETECTION
7184GMM-RESNEXT: COMBINING GENERATIVE AND DISCRIMINATIVE MODELS FOR SPEAKER VERIFICATION
6173GMTR: Graph Matching Transformers
6861GM-VRC: SEMANTIC TOPOLOGICAL DATA ENSEMBLE APPROACH FOR EEG SIGNAL CLASSIFICATION
7878GPT-4 DRIVEN CINEMATIC MUSIC GENERATION THROUGH TEXT PROCESSING
1691GPTCN: GATED PARALLEL TRANSFORMER CONVOLUTIONAL NETWORKS FOR DOWNSTREAM-TASK USER REPRESENTATION LEARNING ON APP USAGE
4300GR0: Self-supervised Global Representation Learning for Zero-shot Voice Conversion
4538GRADIENT AND BRIGHTNESS GUIDED LOW-LIGHT ENHANCEMENT WITH ATTENTION-BASED SELF-PACED LEARNING
8532Gradient Inversion Attacks on Acoustic Signals: Revealing Security Risks in Audio Recognition Systems
6947Gradient Reactivation Enhanced Causal Attention for Out-Of-Distribution Generalizable Graph Classification
5876GRADIENT WEIGHTING FOR SPEAKER VERIFICATION IN EXTREMELY LOW SIGNAL-TO-NOISE RATIO
3421GRADIENT-AWARE LOGIT ADJUSTMENT LOSS FOR LONG-TAILED CLASSIFIER
6847GRADIENT-BASED DIMENSIONALITY REDUCTION FOR SPEECH EMOTION RECOGNITION USING DEEP NETWORKS
7633GRADUALLY SPATIO-TEMPORAL FEATURE ACTIVATION FOR TARGET TRACKING
3832Granger Connectivity Analysis as a Block-Term Tensor Regression for eSport Players
11514GRAPH ATTENTION FOR AUTOMATED AUDIO CAPTIONING
8029Graph Convolutional Neural Networks in the Companion Model
8289GRAPH IDENTIFICATION AND UPPER CONFIDENCE EVALUATION FOR CAUSAL BANDITS WITH LINEAR MODELS
6174GRAPH LOCAL-SMOOTH DICTIONARY LEARNING
8660Graph Networks Stand Strong: Enhancing Robustness via Stability Constraints
9469GRAPH NEURAL NETWORKS ARE MORE POWERFUL THAN WE THINK
8509Graph Signal Processing: The 2D Companion Model
1086GRAPH-AWARE MULTI-VIEW FUSION FOR RUMOR DETECTION ON SOCIAL MEDIA
8882Graph-based Environment Representation for Vision-and-Language Navigation in Continuous Environments
7849GRAPH-BASED PERMUTATION PATTERNS FOR THE ANALYSIS OF TASK-RELATED FMRI SIGNALS ON DTI NETWORKS IN MILD COGNITIVE IMPAIRMENT
8152Graph-enhanced Hybrid Sampling for Multi-armed Bandit Recommendation
5839Graphical Inference in Non-Markovian Linear-Gaussian State-space Models
11455GRAPHON POOLING FOR REDUCING DIMENSIONALITY OF SIGNALS AND CONVOLUTIONAL OPERATORS ON GRAPHS
3907Gravitated Latent Space Loss Generated by Metric Tensor for High-Dynamic Range Imaging
9389GRIDLESS PARAMETER ESTIMATION IN PARTLY CALIBRATED RECTANGULAR ARRAYS
6486GROUNDED-INSTRUCT-PIX2PIX: IMPROVING INSTRUCTION BASED IMAGE EDITING WITH AUTOMATIC TARGET GROUNDING
11462GROUSE: A TASK AND MODEL AGNOSTIC WAVELET-DRIVEN FRAMEWORK FOR MEDICAL IMAGING
3943G-SharP: Globally Shared Kernel with Pruning for Efficient CNNs
8780GSTNet: Gait Spatio-Temporal Network for Gait Recognition Using Millimeter-Wave Radar
6951GTCRN: A Speech Enhancement Model Requiring Ultralow Computational Resources
1369GuessKT: Improving Knowledge Tracing via Considering Guess Behaviors
1960GUIDED CIRCULAR DECOMPOSITION AND CROSS-MODAL RECOMBINATION FOR MULTIMODAL SENTIMENT ANALYSIS
4826HADGEO: IMAGE BASED 3-DOF CROSS-VIEW GEO-LOCALIZATION WITH HARD SAMPLE MINING
5575HAFFORMER: A HIERARCHICAL ATTENTION-FREE FRAMEWORK FOR ALZHEIMER’S DISEASE DETECTION FROM SPONTANEOUS SPEECH
4198HAFORMER: HETEROGENEOUS AGGREGATION TRANSFORMER FOR SINGLE IMAGE DERAINING
11545HALF-INVERTED ARRAY DESIGN SCHEME FOR LARGE HOLE-FREE FOURTH-ORDER DIFFERENCE CO-ARRAYS
9523HALTINGVT: ADAPTIVE TOKEN HALTING TRANSFORMER FOR EFFICIENT VIDEO RECOGNITION
7307HARDWARE IMPAIRMENTS-AWARE DESIGN OF NONCOHERENT GRASSMANNIAN CONSTELLATIONS
7967Hardware-Algorithm Co-design Enabling Processing-in-Pixel-in-Memory (P^2M) for Neuromorphic Vision Sensors
7879HARDWARE-LIMITED TIME CONSTANT ESTIMATION USING A WEIGHTED LINEAR REGRESSION
7887Harmonic Retrieval for Non-Circular Coherent Signals via Double Decoupled Atomic Norm Minimization
6916HARNESSING THE POWER OF LARGE VISION LANGUAGE MODELS FOR SYNTHETIC IMAGE DETECTION
7761HAROOD: HUMAN ACTIVITY CLASSIFICATION AND OUT-OF-DISTRIBUTION DETECTION WITH SHORT-RANGE FMCW RADAR
8664HAZY REMOTE SENSING IMAGES SEMANTIC SEGMENTATION FOR WEAKLY ANNOTATION BASED ON SALIENCY-AWARE ALIGNMENT STRATEGY
6116HDPNeRF: Hybrid Depth Priors for Neural Radiance Fields from Sparse Input Views
1738HDRTVFormer: Efficient SDRTV-to-HDRTV via Affine Transformation and Spatial-aware Transformer
10426HEALTHY AGING IS MARKED BY ENTROPY REDUCTION IN CORTICAL SPONTANEOUS ACTIVITY
2516HEARING LOSS DETECTION FROM FACIAL EXPRESSIONS IN ONE-ON-ONE CONVERSATIONS
7908Heart Rate Variability Estimation with Dynamic Fine Filtering and Global-Local Context Outlier Removal
8494HEAR-YOUR-ACTION: HUMAN ACTION RECOGNITION BY ULTRASOUND ACTIVE SENSING
5399HENET: HYPERBOLIC-BASED ENCODER-DECODER NETWORK FOR WORD SPOTTING IN HISTORICAL MONGOLIAN DOCUMENTS
7107HETEROGENEOUS FACE RECOGNITION USING DOMAIN INVARIANT UNITS
4251HEURISTIC-DRIVEN, TYPE-SPECIFIC EMBEDDING IN PARALLEL SPACES FOR ENHANCING KNOWLEDGE GRAPH REASONING
9687Hierarchical Attacks on Large-Scale Graph Neural Networks
4839Hierarchical cross-modality knowledge transfer with sinkhorn attention for CTC-based ASR
3529HIERARCHICAL EMOTION PREDICTION AND CONTROL IN TEXT-TO-SPEECH SYNTHESIS
8256HIERARCHICAL HOME ACTION UNDERSTANDING WITH IMPLICIT AND EXPLICIT PRIOR KNOWLEDGE
9753HIERARCHICAL METADATA INFORMATION CONSTRAINED SELF-SUPERVISED LEARNING FOR ANOMALOUS SOUND DETECTION UNDER DOMAIN SHIFT
2708HIERARCHICAL SPEAKER REPRESENTATION FOR TARGET SPEAKER EXTRACTION
2730HIERARCHICAL VAE BASED SEMANTIC COMMUNICATIONS FOR POMDP TASKS
8277High Accuracy Device Localization in Indoor mmWave Networks Exploiting Channel Sparsity and Virtual Anchor Mapping
7349HIGH RESOLUTION GUITAR TRANSCRIPTION VIA DOMAIN ADAPTATION
3027HIGH RESOLUTION IMAGE QUALITY DATABASE
5114HIGH-ACCURACY ANXIETY DISORDER IDENTIFICATION THROUGH SUBSPACE-ENHANCED HYPERGRAPH NEURAL NETWORK
8126Higher Order Multiple Graph Filtering for Structured Graph Learning
11465High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks
4276High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models
1380Highlight removal network based on an improved dichromatic reflection model
7319HIGH-ORDER TENSOR POOLING WITH ATTENTION FOR ACTION RECOGNITION
4701HIGH-RESOLUTION THROUGH-WALL IMAGING USING DATA FUSION AND REASONING
3729HIM: DISCOVERING IMPLICIT RELATIONSHIPS IN HETEROGENEOUS SOCIAL NETWORKS
2379HINT-ENHANCED IN-CONTEXT LEARNING WAKES LARGE LANGUAGE MODELS UP FOR KNOWLEDGE-INTENSIVE TASKS
5535HIQ: ONE-SHOT NETWORK QUANTIZATION FOR HISTOPATHOLOGICAL IMAGE CLASSIFICATION
11512HISTORICAL AUDIO SEARCH AND PRESERVATION: FINDING WALDO WITHIN THE FEARLESS STEPS APOLLO 11 NATURALISTIC AUDIO CORPUS
8868HLS-FGVC: HIERARCHICAL LABEL SEMANTICS ENHANCED FINE-GRAINED VISUAL CLASSIFICATION
3491HM-CONFORMER: A CONFORMER-BASED AUDIO DEEPFAKE DETECTION SYSTEM WITH HIERARCHICAL POOLING AND MULTI-LEVEL CLASSIFICATION TOKEN AGGREGATION METHODS
8072HMM-Based CSI Embedding For Trajectory Recovery from RSS Measurements of Non-Cooperative Devices
1450HMNet: Hierarchical Microscale-aware Network for Infrared Small Target Detection
7827HODGE-AWARE CONTRASTIVE LEARNING
8851HOICS: Zero-Shot HOI Detection via Compatibility Self-Learning
4366HOT-FIXING WAKE WORD RECOGNITION FOR END-TO-END ASR VIA NEURAL MODEL REPROGRAMMING
2684HOURGLASS-AVSR: DOWN-UP SAMPLING-BASED COMPUTATIONAL EFFICIENCY MODEL FOR AUDIO-VISUAL SPEECH RECOGNITION
1379How Can Personalized Context Help? Exploring Joint Retrieval of Passage and Personalized Context
4802HOW DOES END-TO-END SPEECH RECOGNITION TRAINING IMPACT SPEECH ENHANCEMENT ARTIFACTS?
7272HOW SECURE IS THE TIME-MODULATED ARRAY-ENABLED OFDM DIRECTIONAL MODULATION?
10254HOW TO BRIDGE GRAPH AND SEQUENCE PATTERNS IN SESSION-BASED RECOMMENDATION? A SELF-SUPERVISED METHOD
10437HOW TO DISTURB NETWORK RECONNAISSANCE: A MOVING TARGET DEFENSE APPROACH BASED ON DEEP REINFORCEMENT LEARNING
7537HRTF Recommendation Based on the Predicted Binaural Colouration Model
7269HUBERTOPIC: ENHANCING SEMANTIC REPRESENTATION OF HUBERT THROUGH SELF-SUPERVISION UTILIZING TOPIC MODEL
1709Human Guided Cross-Modal Reasoning with Semantic Attention Learning for Visual Question Answering
8143HUMAN MOTION CAPTURE DATA SEGMENTATION BASED ON ST-GCN
4495HUMAN MOTION GENERATION VIA CONDITIONED GMVAE WITH TUNET
7094Human Perception-Guided Meta-training for Few-shot NeRF
2267HUMTRANS: A NOVEL OPEN-SOURCE DATASET FOR HUMMING MELODY TRANSCRIPTION AND BEYOND
3001Hybrid Attention Time-Frequency Analysis Network for Single-Channel Speech Enhancement
1874HYBRID CONVOLUTION-TRANSFORMER FOR LIGHTWEIGHT SINGLE IMAGE SUPER-RESOLUTION
2264HYBRID DOMAIN LEARNING TOWARDS LIGHT FIELD SPATIAL SUPER-RESOLUTION USING HETEROGENEOUS IMAGING
5132Hybrid Module with Multiple Receptive Fields and Self-Attention Layers for Medical Image Segmentation
5107HYPERBOLIC DIFFUSION PROCRUSTES ANALYSIS FOR INTRINSIC REPRESENTATION OF HIERARCHICAL DATA SETS
7816HYPERBOLIC DISTANCE-BASED SPEECH SEPARATION
6786HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks
9342HYPERGRAPH TRANSFORMER FOR SEMI-SUPERVISED CLASSIFICATION
2732HYPERGRAPH-ENHANCED SELF-SUPERVISED ROBUST GRAPH LEARNING FOR SOCIAL RECOMMENDATION
8986Hypergraph-MLP: Learning On Hypergraphs Without Message Passing
11518HYPERPIXELS: FLEXIBLE 4D OVER-SEGMENTATION FOR DENSE AND SPARSE LIGHT FIELDS
6267Hyperspectral Image Reconstruction using Hierarchical Neural Architecture Search from a Snapshot Image
11927HYPERSPECTRAL RECONSTRUCTION OF SKIN THROUGH FUSION OF SCATTERING TRANSFORM FEATURES
11895HYPERSPECTRAL SKIN VISION CHALLENGE: CAN YOUR CAMERA SEE BEYOND YOUR SKIN?
11914HYSAT++: HYBRID SPECTRAL-WISE ATTENTION TRANSFORMER FOR SKIN SPECTRAL RECONSTRUCTION
9475HYSENSE: HYBRID EVENT OCCURRENCE DETECTION METHOD FOR IOT DEVICES
5699HYSTOC: OBTAINING WORD CONFIDENCES FOR FUSION OF END-TO-END ASR SYSTEMS
2442I3FDM: Iris Inpainting via Inverse Fusion of Diffusion Models
11961ICASSP 2024 Auditory EEG Decoding Challenge
11865ICASSP 2024 SPEECH SIGNAL IMPROVEMENT CHALLENGE
11900ICMC-ASR: THE ICASSP 2024 IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE
2308IDENTIFIABILITY ANALYSIS OF SENSOR ARRAYS WITH SENSORS OFF HALF-WAVELENGTH GRID
10012IDENTIFIABILITY STUDY OF NEAR-FIELD AUTOMOTIVE SAR
8035Identifying Attack-Specific Signatures in Adversarial Examples
2926IFNET: IMAGING AND FOCUSING NETWORK FOR HANDHELD MMWAVE DEVICES
3369IFNET: INTEGRATING DATA AUGMENTATION AND DECOUPLED ATTENTION FUSION FOR 3D OBJECT DETECTION
4395IHT-Inspired Neural Network for Single-Snapshot DOA Estimation with Sparse Linear Arrays
1827Image Aesthetics Assessment via Learnable Queries
4010Image Attribution by Generating Images
4635IMAGE AUGMENTATION WITH CONTROLLED DIFFUSION FOR WEAKLY-SUPERVISED SEMANTIC SEGMENTATION
5591Image Coding for Analytics via Adversarially Augmented Adaptation
4617Image Harmonization Based on Hierarchical Dynamics
2343Image Mixing and Gradient Smoothing to Enhance the SAR Image Attack Transferability
7189IMAGE RESTORATION WITH GENERALIZED L2 LOSS AND CONVERGENT PLUG-AND-PLAY PRIOR
3919IMAGE RETRIEVAL WITH COMPOSED QUERY BY MULTI-SCALE MULTI-MODAL FUSION
8629IMAGE STEGANOGRAPHY WITH DEEP ORTHOGONAL FUSION OF MULTI-SCALE CHANNEL ATTENTION
3330IMAGE2POINTS: A 3D POINT-BASED CONTEXT CLUSTERS GAN FOR HIGH-QUALITY PET IMAGE RECONSTRUCTION
4838IMAGING AN EVOLVING BLACK HOLE BY LEVERAGING SHARED STRUCTURE
1151IMFIT: Normal Estimation Via Learning Neural Implicit Surface
6602IMITATING THE HUMAN VISUAL SYSTEM FOR SCANPATH PREDICTING
9486IMPACT OF SAMPLING STRATEGIES ON THE MONITORING OF CLIMATE REGIME SHIFTS WITH A LEARNING DATA ASSIMILATION METHOD
1698IMPLICIT ENHANCEMENT OF TARGET SPEAKER IN SPEAKER-ADAPTIVE ASR THROUGH EFFICIENT JOINT OPTIMIZATION
2472IMPLICIT FOREGROUND-GUIDED NETWORK FOR ANOMALY DETECTION AND LOCALIZATION
3752Implicit Neural Multiple Description for DNA-based data storage
1893IMPLICIT NEURAL REPRESENTATION FOR LOW-OVERHEAD GRAPH-BASED HOLOGRAPHIC-TYPE COMMUNICATIONS
6058Implicit-Knowledge-Guided Align before Understanding for KB-VQA
9370Importance of negative sampling in weak label learning
9508IMPORTANCE SAMPLING BASED FEDERATED UNSUPERVISED REPRESENTATION LEARNING
7404IMPOSING EARLY AND ASYMPTOTIC CONSTRAINTS ON LIGME WITH APPLICATION TO NONCONVEX ENHANCEMENT OF FUSED LASSO MODELS
6837IMPROVE DEEP FOREST WITH LEARNABLE LAYERWISE AUGMENTATION POLICY SCHEDULES
9991IMPROVED CHILDREN'S AUTOMATIC SPEECH RECOGNITION COMBINING ADAPTERS AND SYNTHETIC DATA AUGMENTATION
9473IMPROVED IMAGE CAPTIONING VIA KNOWLEDGE GRAPH-AUGMENTED MODELS
6066IMPROVED SCREEN CONTENT CODING IN VVC USING SOFT CONTEXT FORMATION
2569Improving acoustic echo cancellation by exploring speech and echo affinity with multi-head attention
4396IMPROVING ACOUSTIC ECHO CANCELLATION FOR VOICE ASSISTANTS USING NEURAL ECHO SUPPRESSION AND MULTI-MICROPHONE NOISE REDUCTION
8039IMPROVING ASR CONTEXTUAL BIASING WITH GUIDED ATTENTION
3351IMPROVING ATTENTION-BASED END-TO-END SPEECH RECOGNITION BY MONOTONIC ALIGNMENT ATTENTION MATRIX RECONSTRUCTION
1863IMPROVING AUDIO CAPTIONING MODELS WITH FINE-GRAINED AUDIO FEATURES, TEXT EMBEDDING SUPERVISION, AND LLM MIX-UP AUGMENTATION
6695IMPROVING BIOMEDICAL ENTITY LINKING WITH RETRIEVAL-ENHANCED LEARNING
7261IMPROVING CHINESE SPELLING CORRECTION WITH TEXT-PHONETICS DIFFERENTIATION AND ADAPTIVE FUSION
8142IMPROVING CONTINUAL LEARNING OF ACOUSTIC SCENE CLASSIFICATION VIA MUTUAL INFORMATION OPTIMIZATION
1966Improving Cross-domain Few-shot Classification with Multilayer Perceptron
11883IMPROVING DATA-DRIVEN RF SIGNAL SEPARATION WITH SOI-MATCHED AUTOENCODERS
4011IMPROVING DESIGN OF INPUT CONDITION INVARIANT SPEECH ENHANCEMENT
7059IMPROVING DOMAIN GENERALIZATION IN SPEECH EMOTION RECOGNITION WITH WHISPER
4015IMPROVING KINYARWANDA SPEECH RECOGNITION VIA SEMI-SUPERVISED LEARNING
9662IMPROVING LANGUAGE MODEL-BASED ZERO-SHOT TEXT-TO-SPEECH SYNTHESIS WITH MULTI-SCALE ACOUSTIC PROMPTS
2067Improving Learned Video Compression by Exploring Spatial Redundancy
7211IMPROVING LIMITED SUPERVISED FOOT ULCER SEGMENTATION USING CROSS-DOMAIN AUGMENTATION STRATEGIES
7326Improving Long Text Understanding with Knowledge Distilled from Summarization Model
7408IMPROVING MEDICAL DIALOGUE GENERATION WITH ABSTRACT MEANING REPRESENTATIONS
2196Improving Motion Deblur by Multi-Output Learning
7305IMPROVING MULTI-MODAL EMOTION RECOGNITION USING ENTROPY-BASED FUSION AND PRUNING-BASED NETWORK ARCHITECTURE OPTIMIZATION
8889IMPROVING MULTI-SPEAKER ASR WITH OVERLAP-AWARE ENCODING AND MONOTONIC ATTENTION
2760IMPROVING MUSIC SOURCE SEPARATION WITH SIMO STEREO BAND-SPLIT RNN
7652IMPROVING NEURAL DIARIZATION THROUGH SPEAKER ATTRIBUTE ATTRACTORS AND LOCAL DEPENDENCY MODELING
4656IMPROVING OPEN-SET RECOGNITION WITH BAYESIAN METRIC LEARNING
4262Improving Oral Reading Fluency Assessment through Sub-sequence Matching of Acoustic Word Embeddings
8883IMPROVING RADIOLOGY REPORT GENERATION WITH D^2-NET: WHEN DIFFUSION MEETS DISCRIMINATOR
7063Improving Short Utterance Anti-Spoofing with AASIST2
7169Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation
9093IMPROVING SPEECH ATTENUATION IN HEADPHONES USING HARMONIC MODEL DECOMPOSITION AND MULTIPLE-FREQUENCY ANC
1865Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
8746IMPROVING SPEECH RECOGNITION FOR AFRICAN AMERICAN ENGLISH WITH AUDIO CLASSIFICATION
7194IMPROVING SPEED/ACCURACY TRADEOFF FOR ONLINE STREAMING ASR VIA REAL-VALUED AND TRAINABLE STRIDES
9208IMPROVING TARGET SOUND EXTRACTION WITH TIMESTAMP KNOWLEDGE DISTILLATION
1359IMPROVING VGG-STYLE CONVNET FOR JPEG STEGANALYSIS
2795IMPROVING VISION-INSPIRED KEYWORD SPOTTING USING DYNAMIC MODULE SKIPPING IN STREAMING CONFORMER ENCODER
2904IMPROVING VISUAL QUALITY AND TRANSFERABILITY OF ADVERSARIAL ATTACKS ON FACE RECOGNITION SIMULTANEOUSLY WITH ADVERSARIAL RESTORATION
9022INAPPROPRIATE PAUSE DETECTION IN DYSARTHRIC SPEECH USING LARGE-SCALE SPEECH RECOGNITION
6555INCOMPLETE MULTI-VIEW CLUSTERING VIA INFERENCE AND EVALUATION
8188INCOMPLETE MULTI-VIEW REPRESENTATION LEARNING THROUGH ANCHOR GRAPH-BASED GCN AND INFORMATION BOTTLENECK
1678Incomplete Observations Bias Suppression for Abductive Natural Language Inference
1612In-Context Learning for Few-Shot Nested Named Entity Recognition
2147IN-CONTEXT PROMPT EDITING FOR CONDITIONAL AUDIO GENERATION
8461INCPROMPT: TASK-AWARE INCREMENTAL PROMPTING FOR REHEARSAL-FREE CLASS-INCREMENTAL LEARNING
6483Incremental Tensor Decomposition for Few Shot Neural Radiance Field
7917Inducing Inductive Bias in Vision Transformer for EEG Classification
5879INFERENCE OF GENETIC EFFECTS VIA APPROXIMATE MESSAGE PASSING
7637INFERENCE OF TIME–VARYING GRAPH TOPOLOGIES VIA GAUSSIAN PROCESSES
6212INFERRING THE GRAPH OF NETWORKED DYNAMICAL SYSTEMS UNDER PARTIAL OBSERVABILITY AND SPATIALLY COLORED NOISE
9401Inferring Time Varying Signals Over Uncertain Graphs
7939INNOVATIVE METHODS FOR NON-DESTRUCTIVE INSPECTION OF HANDWRITTEN DOCUMENTS
2515INPUTMIX: A STRATEGY TO REGULARIZE AND BALANCE MULTI-MODALITY AND MULTI-VIEW MODEL LEARNING
2529Instant Photorealistic Neural Radiance Fields Stylization
8705INTEGRATED LOCALIZATION AND COMMUNICATION IN 3GPP INDUSTRIAL ENVIRONMENTS
2941INTEGRATED SENSING AND COMMUNICATION IN UNLICENSED MMWAVE BANDS: JOINT BEAMFORMING TRAINING AND ENERGY ALLOCATION
7004INTEGRATING LANGUAGE MODELS WITH SYMBOLIC FORMULAS FOR FIRST-ORDER LOGIC REASONING
2352INTEGRATING SENSING, COMMUNICATION, AND COMPUTATION IN THE SKY
5790INTELLIGENT CARDIAC AUSCULTATION FOR MURMUR DETECTION VIA PARALLEL-ATTENTIVE MODELS WITH UNCERTAINTY ESTIMATION
11532INTER-FREQUENCY PHASE DIFFERENCE FOR PHASE RECONSTRUCTION USING DEEP NEURAL NETWORKS AND MAXIMUM LIKELIHOOD
8378INTER-MODALITY AND INTRA-SAMPLE ALIGNMENT FOR MULTI-MODAL EMOTION RECOGNITION
4760INTERNAL LOCATION ASSISTANCE FOR TEMPORAL ACTION PROPOSAL GENERATION
1971INTERPRETABLE FACE AGING: ENHANCING CONDITIONAL ADVERSARIAL AUTOENCODERS WITH LIME EXPLANATIONS
6915INTERPRETABLE MULTIMODAL OUT-OF-CONTEXT DETECTION WITH SOFT LOGIC REGULARIZATION
9518INTERPRETABLE POLICY EXTRACTION WITH NEURO-SYMBOLIC REINFORCEMENT LEARNING
2784INTERPRETING MEMORIZATION IN DEEP LEARNING FROM DATA DISTRIBUTION
11467INTERPRETING THE CONTRIBUTION OF SENSORS IN BLIND SOURCE EXTRACTION BY MEANS OF SHAPLEY VALUES
3020In-the-Wild Physiological-based Stress Detection Using Federated Strategy
1846INTRODUCING MULTILINGUAL PHONETIC INFORMATION TO SPEAKER EMBEDDING FOR SPEAKER VERIFICATION
5589INVARIANT MOTION REPRESENTATION LEARNING FOR 3D TALKING FACE SYNTHESIS
8975INVARIANTOODG: LEARNING INVARIANT FEATURES OF POINT CLOUDS FOR OUT-OF-DISTRIBUTION GENERALIZATION
11460INVERSE IMAGE FREQUENCY FOR LONG-TAILED IMAGE RECOGNITION
2689Inversive-Reasoning Augmentation for Natural Language Inference
6433INVERTEDFONTNET: FONT WATERMARKING BASED ON PERTURBING STYLE MANIFOLD
2600Invertible Mosaic Image Hiding Network for Very Large Capacity Image Steganography
1637INVERTIBLE VOICE CONVERSION WITH PARALLEL DATA
8018Investigating End-to-end ASR Architectures for Long form Audio Transcription
7444INVESTIGATING PERSONALIZATION METHODS IN TEXT TO MUSIC GENERATION
4982Investigating Salient Representations and Label Variance in Dimensional Speech Emotion Analysis
7993INVESTIGATING SELF-SUPERVISED DEEP REPRESENTATIONS FOR EEG-BASED AUDITORY ATTENTION DECODING
5301INVESTIGATING THE CLUSTERS DISCOVERED BY PRE-TRAINED AV-HUBERT
4922IPCL: ITERATIVE PSEUDO-SUPERVISED CONTRASTIVE LEARNING TO IMPROVE SELF-SUPERVISED FEATURE REPRESENTATION
9583IPHONMATCHNET: ZERO-SHOT USER-DEFINED KEYWORD SPOTTING USING IMPLICIT ACOUSTIC ECHO CANCELLATION
2825IRLSG: INVARIANT REPRESENTATION LEARNING FOR SINGLE-DOMAIN GENERALIZATION IN MEDICAL IMAGE SEGMENTATION
8496IRREGULARITY-AWARE BANDLIMITED APPROXIMATION FOR GRAPH SIGNAL INTERPOLATION
1282IRS-Assisted Covert Communication with a BPP Distributed Warden outside a Safety Zone
8468IRS-Assisted Joint Sensing and Communication Design for Autonomous Driving
4479ISAC Beamforming Optimization for Robust Transmission in Dynamic mmWave MIMO Networks
5396ITERATIVE AUTOREGRESSIVE GENERATION FOR ABSTRACTIVE SUMMARIZATION
4418ITERATIVELY PRECONDITIONED GUIDANCE OF DENOISING (DIFFUSION) MODELS FOR IMAGE RESTORATION
1212J-MAE: JIGSAW MEETS MASKED AUTOENCODERS IN X-RAY SECURITY INSPECTION
2626JM-CLIP: A Joint Modal Similarity Contrastive Learning Model for Video-Text Retrieval
7225JOINT ADMISSION CONTROL AND BEAMFORMER DESIGN FOR MOBILE USERS: STAY HERE OR MOVE TO A BETTER POSITION?
7312JOINT BEAMFORMING AND COMPRESSION DESIGN FOR PER-ANTENNA POWER CONSTRAINED COOPERATIVE CELLULAR NETWORKS
8507Joint Blind Deconvolution and Demixing of Sparse Signals via Factorization and Nonconvex Optimization
7759JOINT CHANNEL ESTIMATION AND DATA DETECTION IN MASSIVE MIMO SYSTEMS BASED ON DIFFUSION MODELS
2380JOINT CLASSIFICATION OF HYPERSPECTRAL AND LIDAR DATA USING CROSS-MODAL HIERARCHICAL FREQUENCY FUSION NETWORK
7968Joint Computing and Communication Resource Allocation for TDMA-Based Binary Computation Offloading
8084JOINT DEMOSAICING AND DENOISING WITH DOUBLE DEEP IMAGE PRIORS
11542JOINT DEREVERBERATION AND BEAMFORMING WITH BLIND ESTIMATION OF THE SHAPE PARAMETER OF THE DESIRED SOURCE PRIOR
1784JOINT DOA ESTIMATION AND DISTORTED SENSOR DETECTION UNDER ENTANGLED LOW-RANK AND ROW-SPARSE CONSTRAINTS
5687Joint Embedding Learning and Latent Subspace Probing for Cross-domain Few-shot Keyword Spotting
4687Joint End-to-End Spoken Language Understanding and Automatic Speech Recognition Training based on Unified Speech-to-Text Pre-training
2791JOINT INDSCAL DECOMPOSITION MEETS BLIND SOURCE SEPARATION
4771JOINT INFERENCE OF SPEAKER DIARIZATION AND ASR WITH MULTI-STAGE INFORMATION SHARING
1105JOINT LEARNING OF IDENTITY AND VEIN FEATURES FOR ENHANCED REPRESENTATIONS IN VASCULAR BIOMETRICS
1531JOINT MULTI-BAND DOA ESTIMATION USING LOW-RANK MATRIX RECOVERY
2548JOINT MULTI-FACTS REASONING NETWORK FOR COMPLEX TEMPORAL QUESTION ANSWERING OVER KNOWLEDGE GRAPH
7566JOINT MUSIC AND LANGUAGE ATTENTION MODELS FOR ZERO-SHOT MUSIC TAGGING
7778JOINT NEAR-FIELD TARGET TRACKING AND COMMUNICATIONS WITH FULL DUPLEX HOLOGRAPHIC MIMO
7919Joint Ranging and Phase Offset Estimation of Multiple Aviation Vehicles using Secondary Radar
9797Joint Robust Optimal Transmit and Receive Beamforming Designs for a DFRC System for the MIMO Radar and Secondary Multicast Communication in a Cognitive Radio Network
11522JOINT SEPARATION AND LOCALIZATION OF MOVING SOUND SOURCES BASED ON NEURAL FULL-RANK SPATIAL COVARIANCE ANALYSIS
7028JOINT SIGNAL INTERPOLATION / TIME-VARYING GRAPH ESTIMATION VIA SMOOTHNESS AND LOW-RANK PRIORS
9651Joint Signal Recovery and Graph Learning from Incomplete Time-Series
9027JOINT SPATIO-TEMPORAL FILTERING OF MOTION IMAGERY EEG SIGNALS FOR DATA ALIGNMENT IN TRANSFER LEARNING
7910JOINT TRANSMIT PRECODERS AND PASSIVE REFLECTION BEAMFORMER DESIGN IN IRS-AIDED IOT NETWORKS
4593JOINT UNSUPERVISED AND SUPERVISED TRAINING FOR AUTOMATIC SPEECH RECOGNITION VIA BILEVEL OPTIMIZATION
6840JOINTLY LEARNING SELECTION MATRICES FOR TRANSMITTERS, RECEIVERS AND FOURIER COEFFICIENTS IN MULTICHANNEL IMAGING
1958JOINT-SEMANTICS MULTI-SIMILARITY HASHING FOR CROSS-MODAL RETRIEVAL
3208JPEG ENCRYPTION WITH DC PREDICTION AND RUN-BASED RS PAIRS PERMUTATION
3050JPIS: A JOINT MODEL FOR PROFILE-BASED INTENT DETECTION AND SLOT FILLING WITH SLOT-TO-INTENT ATTENTION
7262KALMAN FILTER FOR TRACKING NETWORK DYNAMIC
8698Kalman Filtering with Unlimited Sensing
7438KC-Prompt: End-to-end Knowledge-Complementary Prompting for Rehearsal-free Continual Learning
5015KD-Former: Transformer Knowledge Distillation for Image Matting
4464KEEP DECODING PARALLEL WITH EFFECTIVE KNOWLEDGE DISTILLATION FROM LANGUAGE MODELS TO END-TO-END SPEECH RECOGNISERS
8472KEEP KNOWLEDGE IN PERCEPTION: ZERO-SHOT IMAGE AESTHETIC ASSESSMENT
7734KENET: KNOWLEDGE-ENHANCED DOC-LABEL ATTENTION NETWORK FOR MULTI-LABEL TEXT CLASSIFICATION
10125KEY POINTS CENTERED SPARSE HASHING FOR CROSS-MODAL RETRIEVAL
8970Killing it with Zero-Shot: Adversarially Robust Novelty Detection
8299K-Means Clustering based on Chebyshev Polynomial Graph Filtering
4747KNN-CTC: ENHANCING ASR VIA RETRIEVAL OF CTC PSEUDO LABELS
3311KNOWLEDGE-AWARE PROMPT LEARNING FRAMEWORK FOR KOREAN-CHINESE MICROBLOG SENTIMENT ANALYSIS
7941KNOWLEDGE-BASED CONVOLUTIONAL NEURAL NETWORK FOR THE SIMULATION AND PREDICTION OF TWO-PHASE DARCY FLOWS
11600Kronecker-Product Beamforming with Sparse Concentric Circular Arrays
11877KS-NET: MULTI-BAND JOINT SPEECH RESTORATION AND ENHANCEMENT NETWORK FOR 2024 ICASSP SSI CHALLENGE
9981L1-aware Multilingual Mispronunciation Detection Framework
6322LABCLIP: LABEL-ENHANCED CLIP FOR IMPROVING ZERO-SHOT TEXT CLASSIFICATION
5192LABEL CORRECTION FOR SKETCH-BASED 3D SHAPE RETRIEVAL
5355LABEL DEPENDENCIES-AWARE SET PREDICTION NETWORKS FOR MULTI-LABEL TEXT CLASSIFICATION
5113LABEL RECTIFIED AND GRAPH ADAPTIVE SEMI-SUPERVISED REGRESSION FOR ELECTRODE SHIFTED GESTURE RECOGNITION
7068LABEL-AWARE AUXILIARY LEARNING FOR DIALOGUE STATE TRACKING
2006LACVIT: A LABEL-AWARE CONTRASTIVE FINE-TUNING FRAMEWORK FOR VISION TRANSFORMERS
9803LANGUAGE GUIDED ADVERSARIAL PURIFICATION
1299Language Model is a Branch Predictor for Simultaneous Machine Translation
4012LANGUAGE-DRIVEN OPEN-VOCABULARY 3D SEMANTIC SEGMENTATION WITH KNOWLEDGE DISTILLATION
2339Language-Driven Ordinal Learning for Imbalanced Head Pose Estimation
2258LANGUAGE-FREE COMPOSITIONAL ACTION GENERATION VIA DECOUPLING REFINEMENT
1288Language-guided Few-shot Semantic Segmentation
9515LANGUAGE-ORIENTED COMMUNICATION WITH SEMANTIC CODING AND KNOWLEDGE DISTILLATION FOR TEXT-TO-IMAGE GENERATION
3746LANGWAVE: REALISTIC VOICE GENERATION BASED ON HIGH-ORDER LANGEVIN DYNAMICS
7156LARGE COVARIANCE MATRIX ESTIMATION BASED ON FACTOR MODELS VIA NONCONVEX OPTIMIZATION
4795LARGE LANGUAGE MODEL-BASED EMOTIONAL SPEECH ANNOTATION USING CONTEXT AND ACOUSTIC FEATURE FOR SPEECH EMOTION RECOGNITION
4406LARGE LANGUAGE MODELS AS A PROXY FOR HUMAN EVALUATION IN ASSESSING THE COMPREHENSIBILITY OF DISORDERED SPEECH TRANSCRIPTION
2883LARGE LANGUAGE MODELS AUGMENTED RATING PREDICTION IN RECOMMENDER SYSTEM
1629Large Scale Self-Supervised Pretraining for Active Speaker Detection
4690LARGE-SCALE MULTI-VIEW MULTIPLE CLUSTERING
3317Latent Degradation Representation Constraint for Single Image Deraining
5161LATENT FILLING: LATENT SPACE DATA AUGMENTATION FOR ZERO-SHOT SPEECH SYNTHESIS
3582LCB-NET: LONG-CONTEXT BIASING FOR AUDIO-VISUAL SPEECH RECOGNITION
7585LEAKY WAVEGUIDE ANTENNAS FOR DOWNLINK WIDEBAND THZ COMMUNICATIONS
9222LEARN FROM ZOOM: DECOUPLED SUPERVISED CONTRASTIVE LEARNING FOR WCE IMAGE CLASSIFICATION
1724LEARN TO CLUSTER FACES WITH BETTER SUBGRAPHS
6123LEARN TO TRACK-BEFORE-DETECT VIA NEURAL DYNAMIC PROGRAMMING
4264Learnable Statistical Moments Pooling for Automatic Modulation Classification
1402LEARNED ISTA WITH ERROR-BASED THRESHOLDING FOR ADAPTIVE SPARSE CODING
4102LEARNED LAYERED CODING FOR SUCCESSIVE REFINEMENT IN THE WYNER-ZIV PROBLEM
6383LEARNED VIDEO COMPRESSION WITH SPATIAL-TEMPORAL OPTIMIZATION
7343LEARNING A CONVEX PATCH-BASED SYNTHESIS MODEL VIA DEEP EQUILIBRIUM
3794LEARNING A LOW-RANK FEATURE REPRESENTATION: ACHIEVING BETTER TRADE-OFF BETWEEN STABILITY AND PLASTICITY IN CONTINUAL LEARNING
2166Learning Active Subspaces for Effective and Scalable Uncertainty Quantification in Deep Neural Networks
8114LEARNING AROUSAL-VALENCE REPRESENTATION FROM CATEGORICAL EMOTION LABELS OF SPEECH
2143LEARNING AUDIO CONCEPTS FROM COUNTERFACTUAL NATURAL LANGUAGE
8656LEARNING CONTEXTUALIZED REPRESENTATION ON DISCRETE SPACE VIA HIERARCHICAL PRODUCT QUANTIZATION
9339Learning Density Regulated and Multi-view Consistent Unsigned Distance Fields
6056LEARNING DISCRIMINATIVE STYLE REPRESENTATIONS FOR UNSUPERVISED AND FEW-SHOT ARTISTIC PORTRAIT DRAWING GENERATION
8240LEARNING DISENTANGLED SPEECH REPRESENTATIONS WITH CONTRASTIVE LEARNING AND TIME-INVARIANT RETRIEVAL
4294LEARNING DYNAMICS OF LOW-PRECISION CLIPPED SGD WITH MOMENTUM
3543LEARNING EMOTION-INVARIANT SPEAKER REPRESENTATIONS FOR SPEAKER VERIFICATION
8307Learning Fine-Grained Information Alignment for Calibrated Cross-Modal Retrieval
3840LEARNING FROM EASY TO HARD: MULTI-TASK LEARNING WITH DATA SCHEDULING
4494LEARNING FROM TAXONOMY: MULTI-LABEL FEW-SHOT CLASSIFICATION FOR EVERYDAY SOUND RECOGNITION
2357LEARNING GENERALIZABLE VISUAL REPRESENTATIONS VIA SELF-SUPERVISED INFORMATION BOTTLENECK
9095Learning Graphs and Simplicial Complexes from Data
1508LEARNING HYBRID NEGATIVE PROBABILITY MODEL FOR WEAKLY-SUPERVISED WHOLE SLIDE IMAGE RECOGNITION
1438Learning Inference-Time Drift Sensor-Actuator for Domain Generalization
1507Learning Invariant Representation with Consistency and Diversity for Semi-supervised Source Hypothesis Transfer
2756LEARNING MULTIPLEX GRAPH WITH INTER-LAYER COUPLING
1549LEARNING MULTISCALE CONSISTENCY FOR SELF-SUPERVISED ELECTRON MICROSCOPY INSTANCE SEGMENTATION
7459LEARNING ONTOLOGY INFORMED REPRESENTATIONS WITH CONSTRAINTS FOR ACOUSTIC EVENT DETECTION
5635LEARNING REPRESENTATIONS FROM EXPLAINABLE AND CONNECTIONIST APPROACHES FOR VISUAL QUESTION ANSWERING
8759LEARNING SEMANTIC INFORMATION FROM RAW AUDIO SIGNAL USING BOTH CONTEXTUAL AND PHONETIC REPRESENTATIONS
7350LEARNING SIGNALS AND GRAPHS FROM TIME-SERIES GRAPH DATA WITH FEW CAUSES
8359LEARNING SPATIO-TEMPORAL RELATIONS WITH MULTI-SCALE INTEGRATED PERCEPTION FOR VIDEO ANOMALY DETECTION
7716LEARNING SPEAKER-LISTENER MUTUAL HEAD ORIENTATION BY LEVERAGING HRTF AND VOICE DIRECTIVITY ON HEADPHONES
9615LEARNING SPECTRAL CANONICAL F-CORRELATION REPRESENTATION FOR FACE SUPER-RESOLUTION
2104Learning Speech Representation From Contrastive Token-Acoustic Pretraining
11562LEARNING STOCHASTIC GRAPH NEURAL NETWORKS WITH CONSTRAINED VARIANCE
9807Learning the Barankin Lower Bound on DOA estimation error
11450Learning to Bound: A Generative Cramér-Rao Bound
3508Learning with Non-Uniform Label Noise: A Cluster-Dependent Weakly Supervised Approach
5586Least-Effort Adversarial Attack Against Gait-based Identity Recognition System
3173LEFORMER: A HYBRID CNN-TRANSFORMER ARCHITECTURE FOR ACCURATE LAKE EXTRACTION FROM REMOTE SENSING IMAGERY
8338Lesion-aware Open Set Medical Image Recognition with Domain Shift
7412LESS PEAKY AND MORE ACCURATE CTC FORCED ALIGNMENT BY LABEL PRIORS
1025LEVERAGE CAUSAL GRAPHS AND RUMOR-REFUTING TEXTS FOR INTERPRETABLE RUMOR ANALYSIS
9265LEVERAGING BIASES IN LARGE LANGUAGE MODELS: “BIAS-KNN” FOR EFFECTIVE FEW-SHOT LEARNING
9516Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition
11902LEVERAGING EFFECTIVE LANGUAGE AND SPEAKER CONDITIONING IN INDIC TTS FOR LIMMITS 2024 CHALLENGE
4555Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition
8369LEVERAGING LARGE LANGUAGE MODELS FOR EXPLOITING ASR UNCERTAINTY
8336Leveraging Large Pretrained Models for Line-by-Line Spoken Program Recognition
9964LEVERAGING NOISY LABELS OF NEAREST NEIGHBORS FOR LABEL CORRECTION AND SAMPLE SELECTION
2849Leveraging Redundancy in Feature for Efficient Learned Image Compression
4381LEVERAGING SELF-SUPERVISED SPEECH REPRESENTATIONS FOR DOMAIN ADAPTATION IN SPEECH ENHANCEMENT
3956LEVERAGING SOUND LOCALIZATION TO IMPROVE CONTINUOUS SPEAKER SEPARATION
5102LEVERAGING SPEECH PTM, TEXT LLM, AND EMOTIONAL TTS FOR SPEECH EMOTION RECOGNITION
9730LEVERAGING TENSOR SUBSPACE PRIOR: ENHANCED SUM OF NUCLEAR NORM MINIMIZATION FOR TENSOR COMPLETION
2764LEVERAGING TIMESTAMP INFORMATION FOR SERIALIZED JOINT STREAMING RECOGNITION AND TRANSLATION
8785Leveraging Visual Handicaps for Text-based Reinforcement Learning
4728LIBRIHEAVY: A 50,000 HOURS ASR CORPUS WITH PUNCTUATION CASING AND CONTEXT
3738LIGHTCODEC: A HIGH FIDELITY NEURAL AUDIO CODEC WITH LOW COMPUTATION COMPLEXITY
7039Lighting Image/Video Style Transfer Methods by Iterative Channel Pruning
4613Lightweight high-resolution Subject Matting in the Real World
3272Lightweight Multi-Axial Transformer with Frequency Prompt for Single Channel Speech Enhancement
3916LIKELIHOOD CONSENSUS 2.0: REDUCING INTERAGENT COMMUNICATION IN DISTRIBUTED BAYESIAN TARGET TRACKING
11898LIMMITS’24: MULTI-SPEAKER, MULTI-LINGUAL INDIC TTS WITH VOICE CLONING
10438Linear Complexity Gibbs Sampling for Generalized Labeled Multi-Bernoulli Filtering
11466Linearly-involved Moreau-Enhanced-over-Subspace Model: Debiased Sparse Modeling and Stable Outlier-Robust Regression
5853LIPSCHITZ-CONSTRAINED CONVOLUTIONAL LAYERS USING CONVEX PROJECTION
2817LITEVSR: EFFICIENT VISUAL SPEECH RECOGNITION BY LEARNING FROM SPEECH REPRESENTATIONS OF UNLABELED DATA
4047LIVE ITERATIVE PTYCHOGRAPHY WITH PROJECTION-BASED ALGORITHMS
1712LK-UNET: LARGE KERNEL DESIGN FOR 3D MEDICAL IMAGE SEGMENTATION
9699LLET: LIGHTWEIGHT LEXICON-ENHANCED TRANSFORMER FOR CHINESE NER
2821LOCAL AND GLOBAL FEATURE ADAPTIVE ADJUSTMENT NETWORK FOR REMOTE SENSING IMAGE SCENE CLASSIFICATION
6997local and global: text matching via syntax graph calibration
9603Local Contrast Prior-Guided Cross Aggregation Model for Effective Infrared Small Target Detection
1251LOCAL DISTANCE CORRELATION EMBEDDING FOR TIME-SERIES ANALYSIS ON RIEMANNIAN MANIFOLDS
10353LOCAL INFORMATION GUIDED GLOBAL INTEGRATION FOR INFRARED SMALL TARGET DETECTION
8005LOCAL OPTIMIZATION NETWORKS FOR MULTI-VIEW MULTI-PERSON HUMAN POSTURE ESTIMATION
2119Locality-Enhanced Transformer for Semantic Segmentation of High-Resolution Remote Sensing Images
7765LOCALIZATION AND TRACKING OF GOLD NANOPARTICLES USING MMWAVE FMCW RADAR
1994LOCALIZATION IN SENSOR NETWORKS USING DISTRIBUTED LOW-RANK MATRIX COMPLETION
1914LOCALIZING ACOUSTIC ENERGY IN SOUND FIELD SYNTHESIS BY DIRECTIONALLY WEIGHTED EXTERIOR RADIATION SUPPRESSION
3184LOCATION OPTIMIZATION FOR RIS AIDED MMWAVE DOWNLINK NETWORK
6896LOCSELECT: TARGET SPEAKER LOCALIZATION WITH AN AUDITORY SELECTIVE HEARING MECHANISM
9924LoFi User Scheduling for Multiuser MIMO Wireless Systems
6938LOFT: LATENT SPACE OPTIMIZATION AND GENERATOR FINE-TUNING FOR DEFENDING AGAINST DEEPFAKES
6326LONG TERM MEMORY-ENHANCED VIA CAUSAL REASONING FOR TEXT-TO-VIDEO RETRIEVAL
7857LONGITUDINAL MODELING OF DEPRESSION SHIFTS USING SPEECH AND LANGUAGE
3846LONG-TERM ACTION ANTICIPATION BASED ON CONTEXTUAL ALIGNMENT
7951LONG-TERM SOCIAL INTERACTION CONTEXT: THE KEY TO EGOCENTRIC ADDRESSEE DETECTION
2496Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling
8901Loop Structure-Aware Learning for Fully Automated Pulmonary Fissure Completeness Assessment
4857LOSS MASKING IS NOT NEEDED IN DECODER-ONLY TRANSFORMER FOR DISCRETE-TOKEN-BASED ASR
2670LOSSY COMPRESSION OF ADJACENCY MATRICES BY GRAPH FILTER BANKS
7175Low Bitrate Loss Resilience Scheme For a Speech Enhancing Neural Codec
11913LOW DOSE CBCT DENOISING USING A 3D U-NET
4145LOW OVERHEAD DMG SENSING FOR VITAL SIGNS DETECTION
2375LOW REDUNDANT ATTENTION NETWORK FOR EFFICIENT IMAGE SUPER-RESOLUTION
9716Low-Complexity GLRT Based Quickest Detection with Unknown Parameters
4390LOW-COMPLEXITY VECTOR SOURCE CODING FOR DISCRETE LONG SEQUENCES WITH UNKNOWN DISTRIBUTIONS
4138LOW-LATENCY SPEECH ENHANCEMENT VIA SPEECH TOKEN GENERATION
8822LOW-LIGHT RAW IMAGE ENHANCEMENT ON A DATASET SUFFERING LIGHT EFFECTS
11504LOW-PAPR OFDM WAVEFORM DESIGN FOR RADAR AND COMMUNICATION SYSTEMS
7318LOW-RANK COMPLETION BASED NORMAL GUIDED LIDAR POINT CLOUD UP-SAMPLING
2519LOW-RANK CONSTRAINED MULTICHANNEL SIGNAL DENOISING CONSIDERING CHANNEL-DEPENDENT SENSITIVITY INSPIRED BY SELF-SUPERVISED LEARNING FOR OPTICAL FIBER SENSING
2398LVC-LGMC: JOINT LOCAL AND GLOBAL MOTION COMPENSATION FOR LEARNED VIDEO COMPRESSION
10019LV-SEGFORMER: TOWARDS MORE ACCURATE LEAF-VEIN SEGMENTATION WITH TRANSFORMER
8485M$^3$ARL: Moment-Embedded Mean-Field Multi-Agent Reinforcement Learning for Continuous Action Space
1355M$^3$TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for More Uplift Modeling
7579M2BART: MULTILINGUAL AND MULTIMODAL ENCODER-DECODER PRE-TRAINING FOR ANY-TO-ANY MACHINE TRANSLATION
4499M2SUM: MULTI-GRANULARITY SCALE-ADAPTIVE VIDEO SUMMARIZER TOWARDS INFORMATIVE CONTEXT REPRESENTATION LEARNING
6683M3DSYNTH: A DATASET OF MEDICAL 3D IMAGES WITH AI-GENERATED LOCAL MANIPULATIONS
8731M3SUM: A NOVEL UNSUPERVISED LANGUAGE-GUIDED VIDEO SUMMARIZATION
1843M3TQA: MULTI-VIEW, MULTI-HOP AND MULTI-STAGE REASONING FOR TEMPORAL QUESTION ANSWERING
4194MACCN: MULTI-MODAL ADAPTIVE CO-ATTENTION FUSION CONTRASTIVE LEARNING NETWORKS FOR FAKE NEWS DETECTION
3404MaDE: MULTI-SCALE DECISION ENHANCEMENT FOR MULTI-AGENT REINFORCEMENT LEARNING
3571MADRL-BASED UAVS TRAJECTORY DESIGN WITH ANTI-COLLISION MECHANISM IN VEHICULAR NETWORKS
3155MAINLOBE DECEPTIVE JAMMER SUPPRESSION USING FDA-MIMO RADAR IN THE PRESENCE OF MULTIPATH PROPAGATION
8482MAML-BASED 24-HOUR PERSONALIZED BLOOD PRESSURE ESTIMATION FROM WRIST PHOTOPLETHYSMOGRAPHY SIGNALS IN FREE-LIVING CONTEXT
6000MANTICORE: AN UNSUPERVISED INTRUSION DETECTION SYSTEM BASED ON CONTRASTIVE LEARNING IN 5G NETWORKS
3990MAPACHE: MASKED PARALLEL TRANSFORMER FOR ADVANCED SPEECH EDITING AND SYNTHESIS
3886MAPFLOW: MULTI-AGENT PEDESTRIAN TRAJECTORY PREDICTION USING NORMALIZING FLOW
5076Mask6D: Masked Pose Priors For 6D Object Pose Estimation
4210MaskMark: Robust Neural Watermarking for Real and Synthetic Speech
9313MaskSTR: Guide Scene Text Recognition Models with Masking
8481MAS-NET: MIXED-FEATURE ATTENTION SIAMESE NETWORK FOR CHANGE DETECTION ON REMOTE SENSING IMAGES
6040MATCHA-TTS: A FAST TTS ARCHITECTURE WITH CONDITIONAL FLOW MATCHING
2557MATPR-UNET: A MULTI ATTENTION TWO-PATH RESIDUAL UNET FOR FOCAL CORTICAL DYSPLASIA LESIONS SEGMENTATION
4336MATRIX FACTORIZATION IN TROPICAL AND MIXED TROPICAL-LINEAR ALGEBRAS
7388MAX-AST: COMBINING CONVOLUTION, LOCAL AND GLOBAL SELF-ATTENTIONS FOR AUDIO EVENT CLASSIFICATION
2606MAXIMAL CODING RATE REDUCTION FOR GRAPH EMBEDDINGS
11551MAXIMUM LIKELIHOOD-BASED GRIDLESS DOA ESTIMATION USING STRUCTURED COVARIANCE MATRIX RECOVERY AND SBL WITH GRID REFINEMENT
4372MAXIMUM-ENTROPY ADVERSARIAL AUDIO AUGMENTATION FOR KEYWORD SPOTTING
8362MAX-MARGIN TRANSDUCER LOSS: IMPROVING SEQUENCE-DISCRIMINATIVE TRAINING USING A LARGE-MARGIN LEARNING STRATEGY
8257Max-min Beamforming for Multi-User Massive MIMO Systems: An Alternating Projection-Based Approach
7720MCM-CSD: MULTI-GRANULARITY CONTEXT MODELING WITH CONTRASTIVE SPEAKER DETECTION FOR EMOTION RECOGNITION IN REAL-TIME CONVERSATION
5311MDAVIF: A MULTI-DOMAIN ACOUSTICAL-VISUAL INFORMATION FUSION MODEL FOR DEPRESSION RECOGNITION FROM VLOG DATA
5180MDRT: MULTI-DOMAIN SYNTHETIC SPEECH LOCALIZATION
4402MDX-GAN: ENHANCING PERCEPTUAL QUALITY IN MULTI-CLASS SOURCE SEPARATION VIA ADVERSARIAL TRAINING
2867MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization
5054MEDICAL VISION-LANGUAGE REPRESENTATION LEARNING WITH CROSS-MODAL MULTI-TEACHER CONTRASTIVE DISTILLATION
9717MELS-TTS: MULTI-EMOTION MULTI-LINGUAL MULTI-SPEAKER TEXT-TO-SPEECH SYSTEM VIA DISENTANGLED STYLE TOKENS
8383MEMORY EFFICIENT CORNER DETECTION FOR EVENT-DRIVEN DYNAMIC VISION SENSORS
8783MEMORY SELF-CALIBRATED NETWORK FOR VISUAL GROUNDING
2630Memory-augmented Dual-domain Unfolding Network for MRI reconstruction
6496MEMORY-AUGMENTED ONLINE VIDEO ANOMALY DETECTION
9897MEMORY-AUGMENTED SPEECH-TO-TEXT TRANSLATION WITH MULTI-SCALE CONTEXT TRANSLATION STRATEGY
7817MEPE: A Minimalist Ensemble Policy Evaluation Operator for Deep Reinforcement Learning
2538MERG: Multi-dimensional Edge Representation Generation Layer for Graph Neural Networks
3363MERTECH: INSTRUMENT PLAYING TECHNIQUE DETECTION USING SELF-SUPERVISED PRETRAINED MODEL WITH MULTI-TASK FINETUNING
2935MESH-RTUME: UNIVERSAL MANIFOLD EMBEDDING FOR ESTIMATING 3D RIGID TRANSFORMATIONS OF SURFACES
5844META REPRESENTATION LEARNING METHOD FOR ROBUST SPEAKER VERIFICATION IN UNSEEN DOMAINS
1758META STRUCTURE SEARCH FOR LINK WEIGHT PREDICTION IN HETEROGENEOUS GRAPH
4169META-AF ECHO CANCELLATION FOR IMPROVED KEYWORD SPOTTING
2676META-KNOWLEDGE ENHANCED DATA AUGMENTATION FOR FEDERATED PERSON RE-IDENTIFICATION
1852META-LEARNING WITH VERSATILE LOSS GEOMETRIES FOR FAST ADAPTATION USING MIRROR DESCENT
8034METASURFACE-BASED RECEIVERS WITH 1-BIT ADCS FOR MULTI-USER UPLINK COMMUNICATIONS
4899MF-AED-AEC: SPEECH EMOTION RECOGNITION BY LEVERAGING MULTIMODAL FUSION, ASR ERROR DETECTION, AND ASR ERROR CORRECTION
2930MFT-PCQA: Multi-modal Fusion Transformer for No-reference Point Cloud Quality Assessment
2194MGRL: MUTUAL-GUIDANCE REPRESENTATION LEARNING FOR TEXT-TO-IMAGE PERSON RETRIEVAL
4943MHPS: MULTIMODALITY-GUIDED HIERARCHICAL POLICY SEARCH FOR KNOWLEDGE GRAPH REASONING
2856Micro-expression Recognition by Fusing Action Unit Detection and Spatio-temporal Features
9321MICROPHONE CONVERSION: MITIGATING DEVICE VARIABILITY IN SOUND EVENT CLASSIFICATION
7493MICROPHONE SUBSET SELECTION FOR THE WEIGHTED PREDICTION ERROR ALGORITHM USING A GROUP SPARSITY PENALTY
9539MIDI-Voice: Expressive Zero-shot Singing Voice Synthesis via MIDI-driven Priors
4041MIMO IMAGING METHOD WITH ITERATIVE-BASED SUPER-RESOLUTION FOR AUTOMOTIVE RADAR
2100MINIMALLY-SUPERVISED SPEECH SYNTHESIS WITH CONDITIONAL DIFFUSION MODEL AND LANGUAGE MODEL: A COMPARATIVE STUDY OF SEMANTIC CODING
11497MINIMIZING LOW-RANK MODELS OF HIGH-ORDER TENSORS: HARDNESS, SPAN, TIGHT RELAXATION, AND APPLICATIONS
8977MIR-MLPOP: A MULTILINGUAL POP MUSIC DATASET WITH TIME-ALIGNED LYRICS AND AUDIO
5742MISA: UNVEILING THE VULNERABILITIES IN SPLIT FEDERATED LEARNING
1921MISSPECIFIED TIME-DELAY AND DOPPLER ESTIMATION OVER NON GAUSSIAN SCENARIOS
8441MITIGATE REPLICATION AND COPYING IN DIFFUSION MODELS WITH GENERALIZED CAPTION AND DUAL FUSION ENHANCEMENT
7782MITIGATING DATA INJECTION ATTACKS ON FEDERATED LEARNING
5127MITIGATING INTRA-CLASS VARIANCE IN FEW-SHOT POINT CLOUD CLASSIFICATION
1462MITIGATING OPTIMIZATION CONFLICT IN DOMAIN ADVERSARIAL NEURAL NETWORK VIA UNCERTAINTY-AWARE
3121MIXED GRAPH SIGNAL ANALYSIS OF JOINT IMAGE DENOISING / INTERPOLATION
1104MIXED INFORMED TRANSFORMER FOR FEW-SHOT MEDICAL IMAGE SEGMENTATION
4904MIXED PRECISION NEURAL QUANTIZATION WITH MULTI-OBJECTIVE BAYESIAN OPTIMIZATION FOR ON-DEVICE DEPLOYMENT
8686MIXED-ATTENTION AUTO ENCODER FOR MULTI-CLASS INDUSTRIAL ANOMALY DETECTION
6202MLCA-AVSR: MULTI-LAYER CROSS ATTENTION FUSION BASED AUDIO-VISUAL SPEECH RECOGNITION
9248MLMTD: A MULTI-LAYER MALICIOUS TRAFFIC DETECTION MODEL BASED ON MULTI-BRANCH OCTAVE CONVOLUTION AND ATTENTION MECHANISM
6042MLPs Compass: What is learned when MLPs are combined with PLMs?
7784MMAFLOW: MATCHING-GUIDED MOTION AGGREGATION FOR OPTICAL FLOW ESTIMATION
10205MMBAT: A MULTI-TASK FRAMEWORK FOR MMWAVE-BASED HUMAN BODY RECONSTRUCTION AND TRANSLATION PREDICTION
4197mmCount: Stationary Crowd Counting System Based on Commodity Millimeter-wave Radar
6777MMHSV: A MULTIMODAL HANDWRITTEN SIGNATURE VERIFICATION FUSING DYNAMIC AND STATIC FEATURE
7247MMRBN: RULE-BASED NETWORK FOR MULTIMODAL EMOTION RECOGNITION
6884MMS: MORPHOLOGY-MIXUP STYLIZED DATA GENERATION FOR SINGLE DOMAIN GENERALIZATION IN MEDICAL IMAGE SEGMENTATION
8991MODAL CONSENSUS AND CONTEXTUAL SEPARATION FOR WEAKLY SUPERVISED TEMPORAL ACTION LOCALIZATION
7898MODALITY DROP-OUT FOR MULTIMODAL DEVICE DIRECTED SPEECH DETECTION USING VERBAL AND NON-VERBAL FEATURES
3042MODALITY RE-BALANCE FOR VISUAL QUESTION ANSWERING: A CAUSAL FRAMEWORK
2614Modality-dependent sentiments exploring for multi-modal sentiment classification
4765MODEL-BASED LABEL-TO-IMAGE DIFFUSION FOR SEMI-SUPERVISED CHOROIDAL VESSEL SEGMENTATION
1397MODEL-BASED LEARNING FOR LOCATION-TO-CHANNEL MAPPING
9758MODELING INTRAPERSONAL AND INTERPERSONAL INFLUENCES FOR AUTOMATIC ESTIMATION OF THERAPIST EMPATHY IN COUNSELING CONVERSATION
7030Modeling pseudo-speaker uncertainty in voice anonymization
7044MODELING QUASI-PERIODIC DEPENDENCY VIA SELF-SUPERVISED PRE-TRAINING FOR RESPIRATORY SOUND CLASSIFICATION
2099MODELING ROUTE REPRESENTATION WITH MIXED-SCALE HIERARCHICAL TRANSFORMER
11476MODELING THE IMPACT OF INTER-RATER DISAGREEMENT ON SLEEP STATISTICS USING DEEP GENERATIVE LEARNING
1778MODULO SAMPLING AND RECOVERY IN SHIFT-INVARIANT SPACES
4359MOMA: MIXTURE-OF-MODALITY-ADAPTATIONS FOR TRANSFERRING KNOWLEDGE FROM IMAGE MODELS TOWARDS EFFICIENT AUDIO-VISUAL ACTION RECOGNITION
6785MOMENTUM-IMBUED LANGEVIN DYNAMICS (MILD) FOR FASTER SAMPLING
11908MONAI FOR DEEP-LEARNING BASED CBCT RECONSTRUCTION
8096MONOSTATIC DMG PASSIVE SENSING WITH HYPOTHESIS TESTING
10418Monte Carlo Self-Training For Speech Recognition
5896MOS-FAD: IMPROVING FAKE AUDIO DETECTION VIA AUTOMATIC MEAN OPINION SCORE PREDICTION
8844MOSIC: MULTIMODAL SEMANTIC INTEGRATED COMMUNICATION FOR HEALTH MONITORING IN IOT SCENARIOS
2707MOSSFORMER2: COMBINING TRANSFORMER AND RNN-FREE RECURRENT NETWORK FOR ENHANCED TIME-DOMAIN MONAURAL SPEECH SEPARATION
7580Motif-Matching Based Sub-Braingraph Level Networks for Noisy Resting-State fMRI Analysis
6860MOTION LATENT DIFFUSION FOR STOCHASTIC TRAJECTORY PREDICTION
3524MOTION TRANSFER-DRIVEN INTRA-CLASS DATA AUGMENTATION FOR FINGER VEIN RECOGNITION
8631Motion-Tolerant Radar-based Heart Sound Detection
1694MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding
9745MSFD: Multi-Scale Feature Distillation for Semantic Segmentation
7213MSFR: Stance Detection based on Multi-aspect Semantic Feature Representation via Hierarchical Contrastive Learning
3284MSG-BART: Multi-granularity Scene Graph-Enhanced Encoder-Decoder Language Model for Video-grounded Dialogue Generation
8488MS-SENET: ENHANCING SPEECH EMOTION RECOGNITION THROUGH MULTI-SCALE FEATURE FUSION WITH SQUEEZE-AND-EXCITATION BLOCKS
2644MSSTNET: A MULTI-SCALE SPATIO-TEMPORAL CNN-TRANSFORMER NETWORK FOR DYNAMIC FACIAL EXPRESSION RECOGNITION
11907MST--: A MODIFICATION OF MST++ FOR NARROW DOMAIN HYPERSPECTRAL RECONSTRUCTION
8182MTA: A LIGHTWEIGHT MULTILINGUAL TEXT ALIGNMENT MODEL FOR CROSS-LANGUAGE VISUAL WORD SENSE DISAMBIGUATION
3059MTDIFFUSION: MULTI-TASK DIFFUSION MODEL WITH DUAL-UNET FOR FOLEY SOUND GENERATION
1617MTIDNET: A MULTIMODAL TEMPORAL INTEREST DETECTION NETWORK FOR VIDEO SUMMARIZATION
4105MTRGL: Effective Temporal Correlation Discerning through Multi-modal Temporal Relational Graph Learning
2717MULTI-AGENT 3D SEISMIC EXPLORATION USING ADAPT-THEN-COMBINE FULL WAVEFORM INVERSION IN A HARDWARE-IN-THE-LOOP SYSTEM
1351MULTI-AGENT EXPLORATION VIA SELF-LEARNING AND SOCIAL LEARNING
3829MULTI-AGENT SPARSE INTERACTION MODELING IS AN ANOMALY DETECTION PROBLEM
4014Multi-Antenna ISAC Receiver with n-Tuple Blind Deconvolution
6985Multi-Attention Enhanced Discriminator for GAN-Based Anomalous Sound Detection
6167Multi-band speech tensor decomposition for interactive feature extraction in early dysphagia screening
1184MULTI-BEAM MULTIPLEXING DESIGN WITH PHASE-ONLY EXCITATION BASED ON HYBRID BEAMFORMING ARCHITECTURES
7498Multicast Transmission Design with Enhanced DoF for MIMO Coded Caching Systems
2169MULTICAST WITH MULTIPLE WARDENS IN IRS-AIDED COVERT DFRC SYSTEM
2300MULTI-CHANNEL MOSRA: MEAN OPINION SCORE AND ROOM ACOUSTICS ESTIMATION USING SIMULATED DATA AND A TEACHER MODEL
2097MULTI-CMGAN+/+: LEVERAGING MULTI-OBJECTIVE SPEECH QUALITY METRIC PREDICTION FOR SPEECH ENHANCEMENT
3465Multi-dimension Queried and Interacting Network for Stereo Image Deraining
9654MULTI-DIMENSIONAL GEOMETRIC FEATURE-BASED CALIBRATION METHOD FOR LIDAR AND CAMERA FUSION
1551Multidimensional Scaling-Based TDOA Localization in Modified Polar Representation
4265Multi-dimensional Speech Quality Assessment in Crowdsourcing
9997Multi-grained Multimodal Interaction Network for Sentiment Analysis
7624MULTI-INTEREST LEARNING FOR MULTI-MODAL PAPER RECOMMENDATION
9323MULTI-LABEL ABNORMALITY CLASSIFICATION FROM 12-LEAD ECG USING A 2D RESIDUAL U-NET
8846MULTI-LAYER RELATION KNOWLEDGE DISTILLATION FOR FINGERPRINT RESTORATION
6316MULTI-LEVEL AUGMENTATION CONSISTENCY LEARNING AND SAMPLE SELECTION FOR SEMI-SUPERVISED DOMAIN GENERALIZATION
5449MULTI-LEVEL CONTRASTIVE LEARNING FOR HYBRID CROSS-MODAL RETRIEVAL
4320MULTI-LEVEL GRAPH LEARNING FOR AUDIO EVENT CLASSIFICATION AND HUMAN-PERCEIVED ANNOYANCE RATING PREDICTION
9371MULTI-LEVEL SPATIAL-TEMPORAL FEATURE AGGREGATION AND ALIGNMENT-BASED SELECTIVE RESIDUAL DENSE PROPAGATION MODULE FOR HDR VIDEO RECONSTRUCTION
3579MULTI-LINEAR KERNEL REGRESSION AND IMPUTATION VIA MANIFOLD LEARNING: THE DYNAMIC MRI CASE
7802MULTILINGUAL AND FULLY NON-AUTOREGRESSIVE ASR WITH LARGE LANGUAGE MODEL FUSION: A COMPREHENSIVE STUDY
2129MULTILINGUAL AUDIO-VISUAL SPEECH RECOGNITION WITH HYBRID CTC/RNN-T FAST CONFORMER
4061MULTILINGUAL DISTILWHISPER: EFFICIENT DISTILLATION OF MULTI-TASK SPEECH MODELS VIA LANGUAGE-SPECIFIC EXPERTS
3079MULTILINGUAL TRANSLITERATION FOR PAN-INDIC KEYBOARD INPUT
2733MULTI-MICROPHONE NOISE DATA AUGMENTATION FOR DNN-BASED OWN VOICE RECONSTRUCTION FOR HEARABLES IN NOISY ENVIRONMENTS
7398MULTIMODAL BREATHING RATE ESTIMATION USING FACIAL MOTION AND RPPG FROM RGB CAMERA
4226MULTI-MODAL CONTINUAL PRE-TRAINING FOR AUDIO ENCODERS
3195MULTI-MODAL EMOTION RECOGNITION USING MULTIPLE ACOUSTIC FEATURES AND DUAL CROSS-MODAL TRANSFORMER
8684MULTI-MODAL GPT-4 AIDED ACTION PLANNING AND REASONING FOR SELF-DRIVING VEHICLES
1676MULTIMODAL GRAPH-BASED AUDIO-VISUAL EVENT LOCALIZATION
7801MULTIMODAL IMAGING FEATURE EXTRACTION WITH REFERENCE CANONICAL CORRELATION ANALYSIS UNDERLYING INTELLIGENCE
6914MULTIMODAL MODELING FOR SPOKEN LANGUAGE IDENTIFICATION
5761Multimodal Multi-view Spectral-Spatial-Temporal Masked Autoencoder for Self-supervised Emotion Recognition
5106MULTIMODAL SENTIMENT ANALYSIS BASED ON 3D STEREOSCOPIC ATTENTION
10047MULTIMODAL SURVIVAL ENSEMBLE NETWORK: INTEGRATING GENOMIC AND HISTOPATHOLOGICAL INSIGHTS FOR ENHANCED CANCER PROGNOSIS
10075Multimodal Transformer Distillation for Audio-Visual Synchronization
4436MULTIMODAL TRANSFORMER WITH A LOW-COMPUTATIONAL-COST GUARANTEE
5979MULTI-MODALITY ACTION RECOGNITION BASED ON DUAL FEATURE SHIFT IN VEHICLE CABIN MONITORING
1442MULTI-MODALITY CONDITIONAL DIFFUSION MODEL FOR TIME SERIES FORECASTING OF LIVE SALES VOLUME
4592MULTI-MODALITY SPEECH RECOGNITION DRIVEN BY BACKGROUND VISUAL SCENES
8229MULTI-MODEL WIRELESS FEDERATED LEARNING WITH DOWNLINK BEAMFORMING
5830MULTI-OBJECT EDITING IN PERSONALIZED TEXT-TO-IMAGE DIFFUSION MODEL VIA SEGMENTATION GUIDANCE
8848MULTI-OBJECT TRACKING FOR UNMANNED AERIAL VEHICLES BASED ON MULTI-FRAME FEATURE FUSION
8385MULTI-OBJECTIVE PROGRESSIVE CLUSTERING FOR SEMI-SUPERVISED DOMAIN ADAPTATION IN SPEAKER VERIFICATION
4706MULTI-PERSON RESPIRATION RATE ESTIMATION WITH SINGLE PAIR OF TRANSMIT AND RECEIVE ANTENNA
4423MULTIPLE OBJECT TRACKING BASED ON OCCLUSION-AWARE EMBEDDING CONSISTENCY LEARNING
1583MULTIPLE PLAYER TRACKING WITH 3D PROJECTION AND SPATIO-TEMPORAL INFORMATION IN MULTI-VIEW SPORTS VIDEOS
2072MULTIPLE REPRESENTATION TRANSFER FROM LARGE LANGUAGE MODELS TO END-TO-END ASR SYSTEMS
9956MULTI-RATE VARIABLE-LENGTH CSI COMPRESSION FOR FDD MASSIVE MIMO
6268MULTI-RELATIONAL GRAPH DIFFUSION NEURAL NETWORK WITH PARALLEL RETENTION FOR STOCK TRENDS CLASSIFICATION
5023Multiscale Attention Distillation for Object Detection
2703MULTISCALE AUGMENTED NORMALIZING FLOWS FOR IMAGE COMPRESSION
11449MULTISCALE COARSE-TO-FINE GUIDED SCREENSHOT DEMOIRÉING
3228MULTI-SCALE FUSION OF GATED NEIGHBORHOOD ATTENTION TRANSFORMERS FOR SINGLE IMAGE DERAINING
7003MULTISCALE MATCHING DRIVEN BY CROSS-MODAL SIMILARITY CONSISTENCY FOR AUDIO-TEXT RETRIEVAL
9260MULTI-SCALE PERMUTATION ENTROPY FOR AUDIO DEEPFAKE DETECTION
2332MULTISCALE SCORING MODEL FOR ENHANCED URBAN PERCEPTION EVALUATION
11495Multi-Scale Spectral Loss Revisited
3580MULTI-SCALE SUB-BAND CONSTANT-Q TRANSFORM DISCRIMINATOR FOR HIGH-FIDELITY VOCODER
6829MULTI-SENSOR MULTI-SCAN RADAR SENSING OF MULTIPLE EXTENDED TARGETS
2513Multi-Signal Fusion of Social Diffusion Graph with Bi-directional Semantic Consistency
2141Multi-source DOA estimation with statistical coverage guarantees
2945Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning
7893MULTI-SOURCE DOMAIN ADAPTATION WITH TRANSFORMER-BASED FEATURE GENERATION FOR SUBJECT-INDEPENDENT EEG-BASED EMOTION RECOGNITION
2367Multi-Source Domain Generalization for ECG-based Cognitive Load Estimation: Adversarial Invariant and Plausible Uncertainty Learning
3960Multi-Source Dynamic Interactive Network Collaborative Reasoning Image Captioning
2176Multi-Source Unsupervised Transfer Components Learning for Cross-Domain Speech Emotion Recognition
3694MULTI-SPEAKER LOCALIZATION IN THE CIRCULAR HARMONIC DOMAIN ON SMALL APERTURE MICROPHONE ARRAYS USING DEEP CONVOLUTIONAL NETWORKS
10443Multispectral Filter Array Design by Optimal Sphere Packing
7927MULTISPECTRAL RF IMAGING USING MULTIPLE NARROW-BAND FMCW SIGNALS
8668MULTI-STAGE CONTRASTIVE REGRESSION FOR ACTION QUALITY ASSESSMENT
8721MULTI-STAGE LEARNING FOR RADAR PULSE ACTIVITY SEGMENTATION
9707MULTI-STAGE PROGRESSIVE REFINEMENT AND ROI CONTEXT ENHANCEMENT NETWORK FOR SMALL LOGO DETECTION
11878MULTI-STAGE TRAINING FOR CROSS-DOMAIN FULL-BAND AUDIO PACKET LOSS CONCEALMENT
1486Multistatic passive detection of cyclostationary signals
11502Multi-stream Acoustic Modelling using Raw Real and Imaginary Parts of the Fourier Transform
10314MULTITARGET TRACKING IN THE PRESENCE OF VELOCITY AMBIGUITY FOR AUTOMOTIVE RADAR
10157MULTI-TASK CASCADED ATTENTION NETWORK FOR BRAIN TUMOR SEGMENTATION AND CLASSIFICATION
4375MULTITASK CLASSIFICATION OF ANTIMICROBIAL PEPTIDES FOR SIMULTANEOUS ASSESSMENT OF ANTIMICROBIAL PROPERTY AND STRUCTURAL FOLD
4328MULTI-TASK LEARNING FOR FRONT-END TEXT PROCESSING IN TTS
5284MULTI-TASK PSEUDO-LABEL LEARNING FOR NON-INTRUSIVE SPEECH QUALITY ASSESSMENT MODEL
3693MULTI-TASK SELF-SUPERVISED LEARNING FOR MEDICAL IMAGE SEGMENTATION
9419MULTITASK SPEECH RECOGNITION AND SPEAKER CHANGE DETECTION FOR UNKNOWN NUMBER OF SPEAKERS
2696MULTI-TEACHER DISTILLATION FOR INCREMENTAL OBJECT DETECTION
8667MULTIVARIATE DENSITY ESTIMATION USING LOW-RANK FEJER-RIESZ FACTORIZATION
2195Multivariate Fourier Distribution Perturbation: Domain Shifts with Uncertainty in Frequency Domain
3241MULTIVARIATE TIME SERIES FORECASTING WITH CAUSAL-TEMPORAL ATTENTION NETWORK
2786MULTI-VIEW INTERACTIVE COMPROMISE LEARNING FOR GROUP RECOMMENDATION
6478MULTI-VIEW MIDIVAE: FUSING TRACK- AND BAR-VIEW REPRESENTATIONS FOR LONG MULTI-TRACK SYMBOLIC MUSIC GENERATION
1840MULTI-VIEW SPEAKER EMBEDDING LEARNING FOR ENHANCED STABILITY AND DISCRIMINABILITY
4979MULTI-VIEW SPECTROGRAM TRANSFORMER FOR RESPIRATORY SOUND CLASSIFICATION
5130Multi-view Subspace Clustering with Consensus Graph Contrastive Learning
6471MULTIWAY-ADAPTER: ADAPTING MULTIMODAL LARGE LANGUAGE MODELS FOR SCALABLE IMAGE-TEXT RETRIEVAL
6789MULTI-WEATHER DEGRADATION-AWARE TRANSFORMER FOR IMAGE RESTORATION
3437MUSIC AUTO-TAGGING WITH ROBUST MUSIC REPRESENTATION LEARNED VIA DOMAIN ADVERSARIAL TRAINING
11942MUSIC ENHANCEMENT WITH DEEP FILTERS: A TECHNICAL REPORT FOR THE ICASSP 2024 CADENZA CHALLENGE
4106MUSIC SOURCE SEPARATION BASED ON A LIGHTWEIGHT DEEP LEARNING FRAMEWORK (DTTNET: DUAL-PATH TFC-TDF UNET)
3058MUSIC SOURCE SEPARATION WITH BAND-SPLIT ROPE TRANSFORMER
1482MUSIC UNDERSTANDING LLAMA: ADVANCING TEXT-TO-MUSIC GENERATION WITH QUESTION ANSWERING AND CAPTIONING
7866MUSICLDM: ENHANCING NOVELTY IN TEXT-TO-MUSIC GENERATION USING BEAT-SYNCHRONOUS MIXUP STRATEGIES
3920MUSIC-TO-DANCE POSES: LEARNING TO RETRIEVE DANCE POSES FROM MUSIC
1926MuSR: Multi-Scale 3D Scenes Reconstruction based on Monocular Video
7235MUTUAL INFORMATION ASSISTED GRAPH CONVOLUTION NETWORK FOR COLD-START RECOMMENDATION
8722Mutual information based Noise Scale optimization for Gradient Leakage Resistant Federated Learning
1101MUTUAL INFORMATION-BASED FAIR ACTIVE LEARNING
1431MUTUALITY ATTRIBUTE MAKES BETTER VIDEO ANOMALY DETECTION
1544MUTUALREG: MUTUAL LEARNING FOR UNSUPERVISED MEDICAL IMAGE REGISTRATION
3293MVITP: MULTI-VIEW IMAGE-TEXT PERCEPTION FOR FEW-SHOT REMOTE SENSING IMAGE CLASSIFICATION
7289NAC: MITIGATING NOISY CORRESPONDENCE IN CROSS-MODAL MATCHING VIA NEIGHBOR AUXILIARY CORRECTOR
2022NATURAL LANGUAGE SUPERVISION FOR GENERAL-PURPOSE AUDIO REPRESENTATIONS
7699NEAR-FIELD LOCALIZATION WITH 1-BIT QUANTIZED HYBRID A/D RECEPTION
9771NEAR-FIELD MIMO CHANNEL RECONSTRUCTION VIA LIMITED GEOMETRY FEEDBACK
7512NEAR-FIELD NEURAL RENDERING GUIDED BY SINGLE-SHOT PHOTOMETRIC STEREO
9253NEBNET: EXPLOITING NODE-EDGE BI-LEVEL NETWORK FOR GENE EXPRESSION PREDICTION
10272Neighborhood-Enhanced Multimodal Collaborative Filtering for Item Cold Start Recommendation
4699NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis
1660NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation
3977NERI: IMPLICIT NEURAL REPRESENTATION OF LIDAR POINT CLOUD USING RANGE IMAGE SEQUENCE
4277NEURAL AMBISONICS ENCODING FOR COMPACT IRREGULAR MICROPHONE ARRAYS
9378NEURAL CONCATENATIVE SINGING VOICE CONVERSION: RETHINKING CONCATENATION-BASED APPROACH FOR ONE-SHOT SINGING VOICE CONVERSION
1683Neural Network Training Strategy to Enhance Anomaly Detection Performance: A Perspective on Reconstruction Loss Amplification
9228NEURAL NETWORK-BASED SYMBOLIC REGRESSION FOR EMPIRICAL MODELING OF THE BEHAVIOR OF A PLANETARY GEARBOX
4784NEURAL NETWORK-BASED VIRTUAL MICROPHONE ESTIMATION WITH VIRTUAL MICROPHONE AND BEAMFORMER-LEVEL MULTI-TASK LOSS
9761Neural Ordinary differential equations with Trainable solvers
7056NEURAL SPEAKER DIARIZATION USING MEMORY-AWARE MULTI-SPEAKER EMBEDDING WITH SEQUENCE-TO-SEQUENCE ARCHITECTURE
7812Neural Stochastic Differential Equations with Change Points: A Generative Adversarial Approach
8741NEURAL2SPEECH: A TRANSFER LEARNING FRAMEWORK FOR NEURAL-DRIVEN SPEECH RECONSTRUCTION
6667NEUROHEED+: IMPROVING NEURO-STEERED SPEAKER EXTRACTION WITH JOINT AUDITORY ATTENTION DETECTION
7762NEUROMORPHIC SENSING MEETS UNLIMITED SAMPLING
8791NEW INTENT DISCOVERY WITH MULTI-VIEW CLUSTERING
9186NEWTONALIZED ORTHOGONAL MATCHING PURSUIT FOR MIXED FAR-FIELD AND NEAR-FIELD SOURCE LOCALIZATION
5282NEXT-TDNN: MODERNIZING MULTI-SCALE TEMPORAL CONVOLUTION BACKBONE FOR SPEAKER VERIFICATION
7136NIIRF: NEURAL IIR FILTER FIELD FOR HRTF UPSAMPLING AND PERSONALIZATION
7828NLSIT: A NON-LOCAL STEREO INTERACTION TRANSFORMER FOR STEREO IMAGE SUPER-RESOLUTION
7573NOISE MASKING ATTACKS AND DEFENSES FOR PRETRAINED SPEECH MODELS
4046NOISE2ONE: ONE-SHOT IMAGE DENOISING WITH LOCAL IMPLICIT LEARNING
9122Noise-Aware Speech Separation with Contrastive Learning
4541NOISE-BERT: A UNIFIED PERTURBATION-ROBUST FRAMEWORK WITH NOISE ALIGNMENT PRE-TRAINING FOR NOISY SLOT FILLING TASK
3763NOISE-DISENTANGLED GRAPH CONTRASTIVE LEARNING VIA LOW-RANK AND SPARSE SUBSPACE DECOMPOSITION
9500NOISE-RESISTANT GRAPH NEURAL NETWORK FOR NODE CLASSIFICATION
7440NOISE-ROBUST DSP-ASSISTED NEURAL PITCH ESTIMATION WITH VERY LOW COMPLEXITY
6788NOISE-ROBUST ZERO-SHOT TEXT-TO-SPEECH SYNTHESIS CONDITIONED ON SELF-SUPERVISED SPEECH-REPRESENTATION MODEL WITH ADAPTERS
8038NOISY IMAGE RESTORATION BASED ON CONDITIONAL ACCELERATION SCORE APPROXIMATION
3333NOISY-ARCMIX: ADDITIVE NOISY ANGULAR MARGIN LOSS COMBINED WITH MIXUP FOR ANOMALOUS SOUND DETECTION
3029NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping
7117NOMAD: UNSUPERVISED LEARNING OF PERCEPTUAL EMBEDDINGS FOR SPEECH ENHANCEMENT AND NON-MATCHING REFERENCE AUDIO QUALITY ASSESSMENT
8623Non Commutative Convolutional Signal Models in Neural Networks: Stability to Small Deformations
3435NONASYMPTOTIC PERFORMANCE LIMITS OF LOW-LATENCY SECURE INTEGRATED SENSING AND COMMUNICATION SYSTEMS
1817NON-INTRUSIVE SPEECH INTELLIGIBILITY PREDICTION FOR HEARING-IMPAIRED USERS USING INTERMEDIATE ASR FEATURES AND HUMAN MEMORY MODELS
5583NON-INTRUSIVE SPEECH QUALITY ASSESSMENT WITH MULTI-TASK LEARNING BASED ON TENSOR NETWORK
6552NON-ITERATIVE PYRAMID NETWORK FOR UNSUPERVISED DEFORMABLE MEDICAL IMAGE REGISTRATION
11548Nonlinear Graph Wavelets via Medianfication
4074NONLINEARITY DETECTION AND COMPENSATION FOR EEG-BASED SPEECH TRACKING
10189NON-STATIONARY BANDITS WITH PERIODIC BEHAVIOR: HARNESSING RAMANUJAN PERIODICITY TRANSFORMS TO CONQUER TIME-VARYING CHALLENGES
1042Non-uniform Frequency Spacing for Regularization-free Gridless DOA
2572NORMALIZATION IS ALL YOU NEED: ROBUST FULL-RANGE CONTACTLESS SPO2 ESTIMATION ACROSS USERS
2080NPRF: NEURAL PAINTED RADIOSITY FIELDS FOR NEURAL IMPLICIT RENDERING AND SURFACE RECONSTRUCTION
5770NTT SPEAKER DIARIZATION SYSTEM FOR CHIME-7: MULTI-DOMAIN, MULTI-MICROPHONE END-TO-END AND VECTOR CLUSTERING DIARIZATION
4964Nuclear-norm Maximization for Low-Rank Updates
6660NUV-DOA: NUV PRIOR-BASED BAYESIAN SPARSE RECONSTRUCTION WITH SPATIAL FILTERING FOR SUPER-RESOLUTION DOA ESTIMATION
5088NWS: NATURAL TEXTUAL BACKDOOR ATTACKS VIA WORD SUBSTITUTION
4451OADAS: OPTIMIZING GLOBAL PERTURBATION ATTACKS WITH DUAL-PATH ATTRIBUTION SYNERGY
8795Object Correlation Matrix For Two-Stage Object Detection Network
9700OBJECT DETECTION ORIENTED PRIVACY-PRESERVING FRAME-LEVEL VIDEO ANOMALY DETECTION
7611Object Trajectory Estimation with Multi-Band Wi-Fi Neural Dynamic Fusion
1791OBJECT-CONDITIONED BAG OF INSTANCES FOR FEW-SHOT PERSONALIZED INSTANCE RECOGNITION
5288ODAQ: OPEN DATASET OF AUDIO QUALITY
8870OFDM WAVEFORM DESIGN WITH GOOD CORRELATION LEVEL AND PEAK-TO-MEAN ENVELOPE POWER RATIO FOR THE JOINT MIMO RADAR AND COMMUNICATIONS
3341OFFLINE REINFORCEMENT LEARNING BASED ON NEXT STATE SUPERVISION
1967OFFLINE REINFORCEMENT LEARNING WITH GENERATIVE ADVERSARIAL NETWORKS AND UNCERTAINTY ESTIMATION
1207OFFLINE REINFORCEMENT LEARNING WITH POLICY GUIDANCE AND UNCERTAINTY ESTIMATION
5430OLKAVS: AN OPEN LARGE-SCALE KOREAN AUDIO-VISUAL SPEECH DATASET
5200OMNIDIRECTIONAL MULTI-ROTOR AERIAL VEHICLE POSE OPTIMIZATION: A NOVEL APPROACH TO PHYSICAL LAYER SECURITY
7277ON ESTIMATING LINK PREDICTION UNCERTAINTY USING STOCHASTIC CENTERING
5277ON FINE-TUNING PRE-TRAINED SPEECH MODELS WITH EMA-TARGET SELF-SUPERVISED LOSS
7894On Generalized Signature Graphs
4929ON HRTF NOTCH FREQUENCY PREDICTION USING ANTHROPOMETRIC FEATURES AND NEURAL NETWORKS
7182ON IMPROVED DISTRIBUTED RANDOM RESHUFFLING OVER NETWORKS
10447ON MEASURES OF UNCERTAINTY IN CLASSIFICATION
10050ON OPTIMIZING TIMESTEPS OF AN EDM BASED DIFFUSION SAMPLING PROCEDURE
2262ON REAL-TIME MULTI-STAGE SPEECH ENHANCEMENT SYSTEMS
6973ON THE CHOICE OF THE OPTIMAL TEMPORAL SUPPORT FOR AUDIO CLASSIFICATION WITH PRE-TRAINED EMBEDDINGS
10449ON THE CONTRACTIVITY OF PLUG-AND-PLAY OPERATORS
8480On the Convergence of Hierarchical Federated Learning with Gradient Quantization and Imperfect Transmission
5784ON THE CONVERGENCE OF SINGLE-TIMESCALE MULTI-SEQUENCE STOCHASTIC APPROXIMATION WITHOUT FIXED POINT SMOOTHNESS
3067ON THE DESIGN OF PLANAR DIFFERENTIAL MICROPHONE ARRAYS WITH SPECIFIED BEAMWIDTH OR SIDELOBE LEVEL
4161ON THE EFFECT OF DATA-AUGMENTATION ON LOCAL EMBEDDING PROPERTIES IN THE CONTRASTIVE LEARNING OF MUSIC AUDIO REPRESENTATIONS
3450ON THE EQUIVALENCE OF DYNAMIC MODE DECOMPOSITION AND COMPLEX NONNEGATIVE MATRIX FACTORIZATION
10435ON THE ESTIMATION OF TSALLIS ENTROPY AND A NOVEL INFORMATION MEASURE BASED ON ITS PROPERTIES
6996ON THE GENERALIZATION ERROR OF BYZANTINE-RESILIENT DECENTRALIZED LEARNING
8243ON THE IMPORTANCE OF NEURAL WIENER FILTER FOR RESOURCE EFFICIENT MULTICHANNEL SPEECH ENHANCEMENT
2145ON THE OPEN PROMPT CHALLENGE IN CONDITIONAL AUDIO GENERATION
8895ON THE PRIVACY OF FEDERATED CLUSTERING: A CRYPTOGRAPHIC VIEW
9549ON THE RELATION BETWEEN INTERNAL LANGUAGE MODEL AND SEQUENCE DISCRIMINATIVE TRAINING FOR NEURAL TRANSDUCERS
8808ON THE RESILIENCE OF ONLINE FEDERATED LEARNING TO MODEL POISONING ATTACKS THROUGH PARTIAL SHARING
6053ON THE ROLE OF ROOM ACOUSTICS IN AUDIO PRESENTATION ATTACK DETECTION
1782ON THE TRADEOFF BETWEEN PRIVACY PRESERVATION AND BYZANTINE-ROBUSTNESS IN DECENTRALIZED LEARNING
7825ON TIME-ENCODED SAMPLING FOR MULTIGENERATOR SHIFT INVARIANT SPACES
11454On Training Speech Separation Models With Various Numbers of Speakers
9887ON UNIQUE LOCALIZATION OF UNCORRELATED CONSTANT-MODULUS SOURCES USING SPARSE LINEAR ARRAYS
4649ON-DEVICE CONSTRAINED SELF-SUPERVISED LEARNING FOR KEYWORD SPOTTING VIA QUANTIZATION AWARE PRE-TRAINING AND FINE-TUNING
7452ONE MODEL TO RULE THEM ALL? TOWARDS END-TO-END JOINT SPEAKER DIARIZATION AND SPEECH RECOGNITION
7061ONE-BIT QUANTIZATION ROBUST TO ANGLE-OF-ARRIVALS FOR UNIFORM LINEAR ANTENNA ARRAY
5599ONE-CLASS KNOWLEDGE DISTILLATION FOR SPOOFING SPEECH DETECTION
6439ONE-EPOCH TRAINING WITH SINGLE TEST SAMPLE IN TEST TIME FOR BETTER GENERALIZATION OF COUGH-BASED COVID-19 DETECTION MODEL
5829ONE-SHOT SENSITIVITY-AWARE MIXED SPARSITY PRUNING FOR LARGE LANGUAGE MODELS
2798One-stage Deep Stereo Network
2585ONE-STAGE TRAINING GENERATIVE PARADIGM FOR GENERALIZED ZERO-SHOT LEARNING
10107One-Step Late Fusion Multi-view Clustering with Compressed Subspace
4250Online Auditing of Information Flow
5529Online Caching with Switching Cost and Operational Long-term Constraints: An Online Learning Approach
7964ONLINE MOUSE BEHAVIOR DETECTION BY HISTORICAL DEPENDENCY AND TYPICAL INSTANCES
6084ONLINE SPEAKER DIARIZATION OF MEETINGS GUIDED BY SPEECH SEPARATION
3552ONLINE TARGET SOUND EXTRACTION WITH KNOWLEDGE DISTILLATION FROM PARTIALLY NON-CAUSAL TEACHER
4642Open-set DeepFake Detection to fight the Unknown
2492OPENTE: OPEN-STRUCTURE TABLE EXTRACTION FROM TEXT
7122Open-vocabulary Keyword-spotting with Adaptive Instance Normalization
3426Open-Vocabulary Skeleton Action Recognition with Diffusion Graph Convolutional Network and Pre-Trained Vision-Language Models
6178OPINE: Leveraging A Optimization-Inspired Deep Unfolding Method for Multi-channel Speech Enhancement
1081OPNet: Deep Occlusion Perception Network with Boundary Awareness for Amodal Instance Segmentation
5899OPTIMAL ANN-SNN CONVERSION WITH GROUP NEURONS
8361OPTIMAL BEAMFORMING STRUCTURE FOR RATE SPLITTING MULTIPLE ACCESS
3430OPTIMAL BER MINIMUM PRECODER DESIGN FOR OTFS-BASED ISAC SYSTEMS
8987OPTIMAL STRUCTURE OF RECEIVE BEAMFORMING FOR OVER-THE-AIR COMPUTATION
9081Optimal transport distances for directed, weighted graphs: a case study with cell-cell communication networks
3292OPTIMIZING k IN kNN GRAPHS WITH GRAPH LEARNING PERSPECTIVE
11863OPTIMIZING MUSIC SOURCE SEPARATION IN COMPLEX AUDIO ENVIRONMENTS THROUGH PROGRESSIVE SELF-KNOWLEDGE DISTILLATION
7554Optimizing Synchronization Delay for Digital Twin over Wireless Networks
7325Optimizing Trading Strategies in Quantitative Markets using Multi-Agent Reinforcement Learning
11473OSIN: OBJECT-CENTRIC SCENE INFERENCE NETWORK FOR UNSUPERVISED VIDEO ANOMALY DETECTION
11464Outlier Censoring via Block Sparse Learning
5508OUTLIER-ROBUST FEATURE SELECTION WITH L2,1-NORM MINIMIZATION AND GROUP ROW-SPARSITY INDUCED CONSTRAINTS
4104OUT-OF-DISTRIBUTION DETECTION FOR LEARNING-BASED CHEST X-RAY DIAGNOSIS
8514P2DT: MITIGATING FORGETTING IN TASK-INCREMENTAL LEARNING WITH PROGRESSIVE PROMPT DECISION TRANSFORMER
3131PaCaS-WAA: Patch-based Contrastive Semi-supervised Learning with Wavelet Guidance and Adaptive Augmentation for Tumour Segmentation
9352PANORAMIC IMAGE INPAINTING WITH GATED CONVOLUTION AND CONTEXTUAL RECONSTRUCTION LOSS
2522PARALINGUISTICS-ENHANCED LARGE LANGUAGE MODELING OF SPOKEN DIALOGUE
5527PARALLEL AUGMENTATION AND DUAL ENHANCEMENT FOR OCCLUDED PERSON RE-IDENTIFICATION
7737PARAMETER EFFICIENT AUDIO CAPTIONING WITH FAITHFUL GUIDANCE USING AUDIO-TEXT SHARED LATENT REPRESENTATION
4724Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation
11540PARAMETER ESTIMATION PROCEDURES FOR DEEP MULTI-FRAME MVDR FILTERING FOR SINGLE-MICROPHONE SPEECH ENHANCEMENT
4272Parameter Estimation via Expectation Maximization - Expectation Consistent Algorithm
4263PARAMETER-EFFICIENT ADAPTATION FOR COMPUTATIONAL IMAGING
6716Pareto Graph Self-Supervised Learning
7632PARODY DETECTION USING SOURCE-TARGET ATTENTION WITH TEACHER-FOURCED LYRICS
1407PART REPRESENTATION LEARNING WITH TEACHER-STUDENT DECODER FOR OCCLUDED PERSON RE-IDENTIFICATION
9555PARTIAL CONVOLUTIONAL BASED-RADIO MAP RECONSTRUCTION FOR URBAN ENVIRONMENTS WITH INACCESSIBLE AREAS
3848PARTIALLY OBSERVABLE MODEL-BASED LEARNING FOR ISAC RESOURCE ALLOCATION
7180PASTE AND HARMONIZE VIA DENOISING: SUBJECT-DRIVEN IMAGE EDITING WITH FROZEN PRE-TRAINED DIFFUSION MODEL
7080Patch Inherent Feature Guided Mask Selection for Image Compression
9039PATCH-LEVEL KNOWLEDGE DISTILLATION AND REGULARIZATION FOR MISSING MODALITY MEDICAL IMAGE SEGMENTATION
2425Patch-wise Augmentation for Anomaly Detection and Localization
7892PATIENT-ADAPTIVE AND LEARNED MRI DATA UNDERSAMPLING USING NEIGHBORHOOD CLUSTERING
11882PATIENT-SPECIFIC MODELING OF DAILY ACTIVITY PATTERNS FOR UNSUPERVISED DETECTION OF PSYCHOTIC AND NON-PSYCHOTIC RELAPSES
9772PAVITS: EXPLORING PROSODY-AWARE VITS FOR END-TO-END EMOTIONAL VOICE CONVERSION
3602PECER: EMPATHETIC RESPONSE GENERATION VIA DYNAMIC PERSONALITY EXTRACTION AND CONTEXTUAL EMOTIONAL REASONING
7142PECR: PARAMETER-EFFICIENT TRANSFER LEARNING WITH CROSS-MODAL REPRESENTATION LEARNING FOR REMOTE SENSING VISUAL QUESTION ANSWERING
10442PENDANTSS: PEnalized Norm-Ratios Disentangling Additive Noise, Trend and Sparse Spikes
7727PERCEIVING MULTI-LAYER REPRESENTATIONS FOR NO-REFERENCE IMAGE QUALITY ASSESSMENT
9460PERCEPTUAL QUALITY EVALUATION FOR FASTER PLAYBACK VIDEOS
7563PERCEPTUALLY-MOTIVATED SPATIAL AUDIO CODEC FOR HIGHER-ORDER AMBISONICS COMPRESSION
7476PERFORMANCE AND ENERGY BALANCE: A COMPREHENSIVE STUDY OF STATE-OF-THE-ART SOUND EVENT DETECTION SYSTEMS
1322PERFORMANCE CONDITIONING FOR DIFFUSION-BASED MULTI-INSTRUMENT MUSIC SYNTHESIS
4832PERIOCULAR BIOMETRICS ENHANCEMENT THROUGH MULTIMODAL EMBEDDINGS AND CLASSIFIER ADAPTATION
10206PERIODGRAD: TOWARDS PITCH-CONTROLLABLE NEURAL VOCODER BASED ON A DIFFUSION PROBABILISTIC MODEL
3813Permutation-alignment method using manifold optimization for frequency-domain blind source separation
5955PERSONA EXTRACTION THROUGH SEMANTIC SIMILARITY FOR EMOTIONAL SUPPORT CONVERSATION GENERATION
11926PERSONALISED ANOMALY DETECTORS AND PROTOTYPICAL REPRESENTATIONS FOR RELAPSE DETECTION FROM WEARABLE-BASED DIGITAL PHENOTYPING
1861PERSONALIZATION OF CTC-BASED END-TO-END SPEECH RECOGNITION USING PRONUNCIATION-DRIVEN SUBWORD TOKENIZATION
7654Personalized Federated Learning with Attention-based Client Selection
6534PERSONALIZED LOCAL DIFFERENTIALLY PRIVATE FEDERATED LEARNING WITH ADAPTIVE CLIENT SAMPLING
7026PERSONALIZED NEURAL SPEECH CODEC
7254PERSONALIZED OVER-THE-AIR FEDERATED LEARNING WITH PERSONALIZED RECONFIGURABLE INTELLIGENT SURFACES
9692PFCF-NET: A NETWORK BASED ON PROGRESSIVE FEATURE INTERACTION AND CROSS-SCALE FEATURE FUSION FOR REMOTE SENSING CHANGE DETECTION
6961PFDM: Parser-Free Virtual Try-on via Diffusion Model
6854PHASE CONTINUITY-AWARE SELF-ATTENTIVE RECURRENT NETWORK WITH ADAPTIVE FEATURE SELECTION FOR ROBUST VAD
8473PHASE LEARNING BASED ON INTERACTIVE PERCEPTION FOR LIMITED-SAMPLE RESIDENTIAL AREA SEMANTIC SEGMENTATION
9853PHASE RECONSTRUCTION IN SINGLE CHANNEL SPEECH ENHANCEMENT BASED ON PHASE GRADIENTS AND ESTIMATED CLEAN-SPEECH AMPLITUDES
5621Phase Retrieval by Tensor Total Least Squares
8349PHASE-SPACE-GUIDED DEEP LEARNING FOR TIME SERIES FORECASTING
7528PHISANET: PHONETICALLY INFORMED SPEECH ANIMATION NETWORK
3643PHONEME-AWARE ENCODING FOR PREFIX-TREE-BASED CONTEXTUAL ASR
9357Photovoltaic power forecasting using sky images and sun motion
7986PhyOT: Physics-informed object tracking in surveillance cameras
7354Physically-constrained block-term tensor decomposition for polarimetric image recovery
10448PHYSICS-GUIDED DEEP SCATTER ESTIMATION BY WEAK SUPERVISION FOR QUANTITATIVE SPECT
7718PHYSICS-GUIDED VARIATIONAL GRAPH AUTOENCODER FOR AIR QUALITY INFERENCE
8103PIANO TRANSCRIPTION WITH HARMONIC ATTENTION
9195PILOT LENGTH MINIMIZATION VIA AP-UE CLUSTERING IN CELL-FREE SYSTEMS
7242PIXEL-SUPERPIXEL CONTRASTIVE LEARNING AND PSEUDO-LABEL CORRECTION FOR HYPERSPECTRAL IMAGE CLUSTERING
4255PJSCC: A PUNCTURING-BASED JOINT SOURCE CHANNEL CODING SCHEME WITH HIERARCHICAL DOWN-SAMPLING LAYER
5337PLS: UNSUPERVISED DOMAIN ADAPTATION FOR 3D OBJECT DETECTION VIA PSEUDO-LABEL SIZES
6350Plug-and-Play Algorithm coupled with Low-Rank Quadratic Envelope Regularization for Compressive Spectral Imaging
8926PLUG-AND-PLAY MVDR BEAMFORMING FOR SPEECH SEPARATION
8203PMDI: COMBINING PARAMETRIC-MODEL AND DEPTH-AWARE IMPLICIT FUNCTION FOR SINGLE-VIEW HUMAN RECONSTRUCTION
1129PMMWDECONV: UNSUPERVISED DATA-CONSISTENT BLIND PASSIVE MILLIMETER-WAVE IMAGE DECONVOLUTION WITH GLOBAL CONTEXT PRIORS
4655PN-DetX: A Dedicated Framework for Pulmonary Nodule Detection in X-ray Images
6972POISONING-FREE DEFENSE AGAINST BLACK-BOX MODEL EXTRACTION
10042PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models
2520POLARDB: FORMULA-DRIVEN DATASET FOR PRE-TRAINING TRAJECTORY ENCODERS
6506POLITICAL TWEET SENTIMENT ANALYSIS FOR PUBLIC OPINION POLLING
11451PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation
1920POSE-HMR: HEURISTIC TRANSFORMER WITH POSTURAL PRIOR CONSTRAINTS FOR 3D HUMAN MESH RECONSTRUCTION
7411POSITION-AWARE ACTIVE LEARNING FOR MULTI-MODAL ENTITY ALIGNMENT
9264POSITIVE TRANSFER OF THE WHISPER SPEECH TRANSFORMER TO HUMAN AND ANIMAL VOICE ACTIVITY DETECTION
8990Posterior Sampling Algorithms for Unsupervised Speech Enhancement with Recurrent Variational Autoencoder
7153POSTERIOR VARIANCE-PARAMETERISED GAUSSIAN DROPOUT: IMPROVING DISENTANGLED SEQUENTIAL AUTOENCODERS FOR ZERO-SHOT VOICE CONVERSION
4387POST-TRAINING EMBEDDING ALIGNMENT FOR DECOUPLING ENROLLMENT AND RUNTIME SPEAKER RECOGNITION MODELS
3608POWER-AWARE TASK-BASED LEARNING OF NEUROMORPHIC ADCS
8933PRACTICAL CHALLENGE AND SOLUTION FOR IRS-AIDED INDOOR LOCALIZATION SYSTEM
1188Predict and Interpret Health Risk using EHR through Typical Patients
1252PREDICTING ADVERSE EVENTS FOR PATIENTS WITH TYPE-1 DIABETES VIA SELF-SUPERVISED LEARNING
3275PREDICTING FALL EVENTS BY A SPATIO-TEMPORAL TOPOLOGICAL NETWORK WITH MULTIPLE WEARABLE SENSORS
9462PREDICTING RTMS TREATMENT EFFECTS USING OPEN-LOOP CONTROL AND NEURAL MANIFOLD
4809PREDICTION-CORRECTION LINE SEGMENT DETECTION
3501PRE-ECHO REDUCTION IN TRANSFORM AUDIO CODING VIA TEMPORAL ENVELOPE CONTROL WITH MACHINE LEARNING BASED ESTIMATION
3205PRE-POST INTERACTION LEARNING FOR BRAIN TUMOR SEGMENTATION WITH MISSING MRI MODALITIES
6656PRE-TRAINED ACOUSTIC-AND-TEXTUAL MODELING FOR END-TO-END SPEECH-TO-TEXT TRANSLATION
7891PRIORITIZING DATA ACQUISITION FOR END-TO-END SPEECH MODEL IMPROVEMENT
2330PRIVACY LEAKAGE IN GRAPH SIGNAL TO GRAPH MATCHING PROBLEMS
7767PRIVACY PRESERVING FEDERATED LEARNING FROM MULTI-INPUT FUNCTIONAL PROXY RE-ENCRYPTION
4709PRIVACY PRESERVING GAZE ESTIMATION VIA FEDERATED LEARNING ADAPTED TO EGOCENTRIC VIDEO
5131PRIVACY-AWARE JOINT SOURCE-CHANNEL CODING FOR IMAGE TRANSMISSION BASED ON DISENTANGLED INFORMATION BOTTLENECK
9114PRIVACY-PRESERVING ATTENTION-WEIGHTED MULTI-SOURCE DOMAIN ADAPTATION FOR EEG MOTOR IMAGERY
3997PRIVACY-PRESERVING DEEP LEARNING USING DEFORMABLE OPERATORS FOR SECURE TASK LEARNING
3365PRIVACY-PRESERVING DISTRIBUTED OPTIMISATION USING STOCHASTIC PDMM
2790ProAug: Prototype-Based Augmentation for Long-Tailed Image Classification
7250PROBABILISTIC SIMPLEX COMPONENT ANALYSIS VIA VARIATIONAL AUTO-ENCODING
3084PROBABILISTIC SPIKE TRAIN INFERENCE
9538PROBABILITY-AWARE WORD-CONFUSION-NETWORK-TO-TEXT ALIGNMENT APPROACH FOR INTENT CLASSIFICATION
1465PROBMCL: SIMPLE PROBABILISTIC CONTRASTIVE LEARNING FOR MULTI-LABEL VISUAL CLASSIFICATION
7602PROFILE-ERROR-TOLERANT TARGET-SPEAKER VOICE ACTIVITY DETECTION
9230PROGRESSIVE IMAGE SYNTHESIS FROM SEMANTICS TO DETAILS WITH DENOISING DIFFUSION GAN
4637PROGRESSIVE LEARNING BASED KNOWLEDGE DISTILLATION FOR LOW RESOLUTION CEREBRAL MICROBLEED SEGMENTATION
6762PROGRESSIVE UNSUPERVISED DOMAIN ADAPTATION FOR ASR USING ENSEMBLE MODELS AND MULTI-STAGE TRAINING
10076PROGRESSIVELY LEARNING FROM MACRO-EXPRESSIONS FOR MICRO-EXPRESSION RECOGNITION
4128PRO-HAN: A HETEROGENEOUS GRAPH ATTENTION NETWORK FOR PROFILE-BASED SPOKEN LANGUAGE UNDERSTANDING
2103Promoting Independence of Depression and Speaker Features for Speaker Disentanglement in Speech-based Depression Detection
3308PROMPTASR FOR CONTEXTUALIZED ASR WITH CONTROLLABLE STYLE
4110PROMPT-BASED PERSONALIZED FEDERATED LEARNING FOR MEDICAL VISUAL QUESTION ANSWERING
4917Prompt-driven Target Speech Diarization
7403PROMPTFORMER: PROMPTED CONFORMER TRANSDUCER FOR ASR
7690PROMPTING AUDIOS USING ACOUSTIC PROPERTIES FOR EMOTION REPRESENTATION
6562PROMPTING LABEL EFFICIENCY IN FEDERATED GRAPH LEARNING VIA PERSONALIZED SEMI-SUPERVISION
5572PROMPTING LARGE LANGUAGE MODELS WITH FINE-GRAINED VISUAL RELATIONS FROM SCENE GRAPH FOR VISUAL QUESTION ANSWERING
7953Prompting Large Language Models with Speech Recognition Abilities
2407PROMPTING TO PROMPT FOR REHEARSAL-FREE CLASS INCREMENTAL LEARNING
9691PROMPTTTS++: CONTROLLING SPEAKER IDENTITY IN PROMPT-BASED TEXT-TO-SPEECH USING NATURAL LANGUAGE DESCRIPTIONS
3424PROMPTVC: FLEXIBLE STYLISTIC VOICE CONVERSION IN LATENT SPACE DRIVEN BY NATURAL LANGUAGE PROMPTS
7395PROPOSAL DISTILLATION OF MULTI-MODAL FEATURE AGGREGATION NETWORK FOR VIDEO OBJECT DETECTION
4990PROTOTYPE CALIBRATION WITH SYNTHESIZED SAMPLES FOR ZERO-SHOT CHINESE CHARACTER RECOGNITION
4239Prototype-Guided Masking for Unsupervised Domain Adaptation
3083PROVABLE RANDOMIZED COORDINATE DESCENT FOR MATRIX COMPLETION
3612PROXIMAL BELLMAN MAPPINGS FOR REINFORCEMENT LEARNING AND THEIR APPLICATION TO ROBUST ADAPTIVE FILTERING
1445PseKD: Phase-shift Encoded Knowledge Distillation for Oriented Object Detection in Remote Sensing Images
5049Pseudo Labels Regularization for Imbalanced Partial-Label Learning
3112Pseudo-outlier synthesis using q-Gaussian distributions for out-of-distribution detection
4770PU-EdgeFormer++: an Advanced Hierarchical Edge Transformer for Arbitrary-Scale Point Cloud Upsampling using Distance Fields
6619PUSH4REC: TEMPORAL AND CONTEXTUAL TREND-AWARE TRANSFORMER PUSH NOTIFICATION RECOMMENDER
4806PVCG: Prompt-based Vision-aware Classification and Generation for Multi-modal Rumor Detection
3883PVITNET: AN EFFECTIVE APPROACH FOR ANDROID MALWARE DETECTION USING PYRAMID FEATURE PROCESSING AND VISION TRANSFORMER
4752Pyramid: A Heterogeneous Data Integration Algorithm Based on Hierarchical Graph
4403QUANTIFYING SPATIAL AUDIO QUALITY IMPAIRMENT
4329QUANTIFYING THE EFFECT OF SIMULATOR-BASED DATA AUGMENTATION FOR SPEECH RECOGNITION ON AUGMENTED REALITY GLASSES
8040QUANTIZATION NOISE MASKING IN PERCEPTUAL NEURAL AUDIO CODER
7922QUANTIZED DECODER IN LEARNED IMAGE COMPRESSION FOR DETERMINISTIC RECONSTRUCTION
11472QUANTIZED RADIO MAP ESTIMATION USING TENSOR AND DEEP GENERATIVE MODELS
11556Quantum Algorithm for Signal Denoising
8261QUANTUM FEDERATED LEARNING WITH QUANTUM NETWORKS
6031QUANTUM INSPIRED IMAGE AUGMENTATION APPLICABLE TO WAVEGUIDES AND OPTICAL IMAGE TRANSFER VIA ANDERSON LOCALIZATION
7487QUANTUM PRIVACY AGGREGATION OF TEACHER ENSEMBLES (QPATE) FOR PRIVACY PRESERVING QUANTUM MACHINE LEARNING
5632Quantum Ranging Enhanced TDoA Localization
3834QUANTUM TOPIC MODEL: TOPIC MODELING USING VARIATIONAL QUANTUM CIRCUITS
8059QUAPPROX: A FRAMEWORK FOR BENCHMARKING THE APPROXIMABILITY OF VARIATIONAL QUANTUM CIRCUIT
7612RADAR PERCEPTION WITH SCALABLE CONNECTIVE TEMPORAL RELATIONS FOR AUTONOMOUS DRIVING
4550RADAR RECOGNITION IN THE WILD: ENHANCING RADAR EMITTER RECOGNITION THROUGH AUTO-CORRELATION MODEL-AGNOSTIC META LEARNING
10288RadarDiff: Improving Sea Clutter Suppression using Diffusion Models for Radar images
6006RADEMACHER COMPLEXITY REGULARIZATION FOR CORRELATION-BASED MULTIVIEW REPRESENTATION LEARNING
7065RADIO SLAM WITH HYBRID SENSING FOR MIXED REFLECTION TYPE ENVIRONMENTS
11886RAD-NET: A REPAIRING AND DENOISING NETWORK FOR SPEECH SIGNAL IMPROVEMENT
2121RANDOMIZED MAXIMUM LIKELIHOOD VIA HIGH-DIMENSIONAL BAYESIAN OPTIMIZATION
9442RANKING ENHANCED FINE-GRAINED CONTRASTIVE LEARNING FOR RECOMMENDATION
4772RANKING OF VISUAL TRACKERS USING ROBUST ERROR NORMS
10132RAPID CHANGE LOCALIZATION IN DYNAMIC GRAPHICAL MODELS
1279RAPID HYBRID MODULAR RECEIVE BEAMFORMING VIA LEARNED OPTIMIZATION
8940Rate-Quality based Rate Control Model for Neural Video Compression
9663RATING-AUGMENTED NO-REFERENCE POINT CLOUD QUALITY ASSESSMENT USING MULTI-TASK LEARNING
2244RCIF: TOWARDS ROBUST DISTRIBUTED DNN COLLABORATIVE INFERENCE UNDER HIGHLY LOSSY NETWORKS
4703RDANET: REJECT DOMAIN ATTENTION NETWORK FOR CONFUSED FACIAL EXPRESSION RECOGNITION
4933RD-COST REGRESSION SPEED UP TECHNIQUE FOR VVC INTRA BLOCK PARTITIONING
4677RD-NeRF: Neural Robust Distilled Feature Fields for Sparse-view Scene Segmentation
1571READ, SPELL AND REPEAT: SCENE TEXT RECOGNITION WITH VISION-LANGUAGE CIRCULAR REFINEMENT
9860REAL-ORIENTED OBJECT DETECTION DRIVEN BY INTELLIGENT STOCKBREEDING
3927REAL-TIME LOW-LATENCY MUSIC SOURCE SEPARATION USING HYBRID SPECTROGRAM-TASNET
1598REAL-TIME MULTI-HUMAN PARSING ON EMBEDDED DEVICES
4684REAL-TIME PRIVACY-PRESERVING FALL RISK ASSESSMENT WITH A SINGLE BODY-WORN TRACKING CAMERA
4364REAL-TIME STEREO SPEECH ENHANCEMENT WITH SPATIAL-CUE PRESERVATION BASED ON DUAL-PATH STRUCTURE
11901REBUILD, REGENERATE: A GATED TEMPORAL CONVOLUTION BASED GAN FOR SPEECH SIGNAL IMPROVEMENT
7661RECAP: RETRIEVAL-AUGMENTED AUDIO CAPTIONING
7577RECENT ADVANCES IN SCALABLE ENERGY-EFFICIENT AND TRUSTWORTHY SPIKING NEURAL NETWORKS: FROM ALGORITHMS TO TECHNOLOGY
2342RECOGNITION-GUIDED DIFFUSION MODEL FOR SCENE TEXT IMAGE SUPER-RESOLUTION
10052Reconstruction of sound field through diffusion models
4267Recovering from Privacy-Preserving Masking with Large Language Models
10187RECOVERING MISSING NODE FEATURES WITH LOCAL STRUCTURE-BASED EMBEDDINGS
7749RECURSIVE-TAIL-FISTA FOR SPARSE SIGNAL RECOVERY
3005Redefining Night Vision: The Power of MSR-Driven Neural ISP
2876REDUCED-DIMENSIONAL DECOMPOSITION AND EIGENSPACE RECONSTRUCTION OF COHERENT SOURCES WITH ARBITRARY RECTANGLE ARRAYS
6438REDUCING THE COMPLEXITY OF NORMALIZING FLOW ARCHITECTURES FOR POINT CLOUD ATTRIBUTE COMPRESSION
3961REFERENCE LINE NETWORK: ON SIMULTANEOUS GAUSSIAN LINE DETECTION AND CONNECTION GRAPH INFERENCE
3863Refinement Bird's Eye View Feature for 3D Lane Detection with Dual-Branch View Transformation Module
4460REFINING 3D HUMAN MESH VIA MODEL-FREE OFFSETS ESTIMATION
7829Refining Text Input for Augmentative and Alternative Communication (AAC) Devices: Analysing Language Model Layers for Optimisation
9298REFLECTION REMOVAL USING RECURRENT POLARIZATION-TO-POLARIZATION NETWORK
3207REFLOW-TTS: A RECTIFIED FLOW MODEL FOR HIGH-FIDELITY TEXT-TO-SPEECH
2869REGION-ADAPTIVE VIDEO SHARPENING VIA RATE-PERCEPTION OPTIMIZATION
2287REGIR: REFINED GEOMETRY FOR SINGLE-IMAGE IMPLICIT CLOTHED HUMAN RECONSTRUCTION
3016REGULARIZED CONDITIONAL ALIGNMENT FOR MULTI-DOMAIN TEXT CLASSIFICATION
9545Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection
5249Reinforcement Learning Compensated Filter for Multi-agents Cooperative Localization
7631REINFORCEMENT LEARNING-GUIDED OPTOGENETIC STIMULATION POLICIES FOR ROBUST FUNCTIONAL NETWORK DISCOVERY
9602RELATIONAL GRAPH-BRIDGED IMAGE-TEXT INTERACTION: A NOVEL METHOD FOR MULTI-MODAL RELATION EXTRACTION
4840REMIXED2REMIXED: DOMAIN ADAPTATION FOR SPEECH ENHANCEMENT BY NOISE2NOISE LEARNING WITH REMIXING
11930REMIXING MUSIC FOR HEARING AIDS USING ENSEMBLE OF FINE-TUNED SOURCE SEPARATORS
11879RENET: A TIME-FREQUENCY DOMAIN GENERAL SPEECH RESTORATION NETWORK FOR ICASSP 2024 SPEECH SIGNAL IMPROVEMENT CHALLENGE
3570RENYI DIFFERENTIAL PRIVACY IN THE SHUFFLE MODEL: ENHANCED AMPLIFICATION BOUNDS
9064RENYI DIVERGENCES LEARNING FOR EXPLAINABLE CLASSIFICATION OF SAR IMAGE PAIRS
4680REPARAMETERIZATION HEAD FOR EFFICIENT MULTI-INPUT NETWORKS
3950Representation and Boundary enhancement for Action Segmentation using Transformer
1433REPRESENTATION LEARNING ACROSS FEATURE AND TOPOLOGY VIEWS WITH OUTPUT CORRECTION FOR GRAPH CONVOLUTIONAL NETWORKS
11560Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
4308REPURPOSING MU-MIMO DOWNLINK FOR JOINT WIRELESS COMMUNICATIONS AND IMAGING VIA VIRTUAL USERS
9451RESIDUAL DENSE SWIN TRANSFORMER FOR CONTINUOUS DEPTH-INDEPENDENT ULTRASOUND IMAGING
5123RESIDUALTRANSFORMER: RESIDUAL LOW-RANK LEARNING WITH WEIGHT-SHARING FOR TRANSFORMER LAYERS
2812RESOURCE-CONSTRAINED STEREO SINGING VOICE CANCELLATION
4462RESOURCE-EFFICIENT SEPARATION TRANSFORMER
2994RETAINING INFORMATIVE LATENT VARIABLES IN PROBABILISTIC SEGMENTATION
5310RETHINKING NORMALS: DIRECTION GUIDED POINT CLOUD RECOGNITION
8616RETHINKING SESSION VARIABILITY: LEVERAGING SESSION EMBEDDINGS FOR SESSION ROBUSTNESS IN SPEAKER VERIFICATION
8260RETHINKING TARGETED ADVERSARIAL ATTACKS FOR NEURAL MACHINE TRANSLATION
7942Retrieval Augmented End-to-End Spoken Dialog Models
3735RETRIEVAL-AUGMENTED TEXT-TO-AUDIO GENERATION
7125RETRIEVAL-GENERATION SYNERGY AUGMENTED LARGE LANGUAGE MODELS
7971REVEALING EMOTIONAL CLUSTERS IN SPEAKER EMBEDDINGS: A CONTRASTIVE LEARNING STRATEGY FOR SPEECH EMOTION RECOGNITION
5314REVERSIBLE JUMP MARKOV CHAIN MONTE CARLO FOR PULSE FITTING
4662REVISE THE NLU: A PROMPTING STRATEGY FOR ROBUST DIALOGUE SYSTEM
11557Revisiting Deep Generalized Canonical Correlation Analysis
7934REVISITING SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATION FROM A MUTUAL INFORMATION PERSPECTIVE
8966REVISITING THE EQUIVALENCE OF IN-CONTEXT LEARNING AND GRADIENT DESCENT: THE IMPACT OF DATA DISTRIBUTION
10286REWEIGHTED ATOMIC NORM MINIMIZATION FOR ONE-BIT MULTICHANNEL SPECTRAL COMPRESSED SENSING
2444RGB IMAGES ENHANCING HYPERSPECTRAL IMAGE DENOISING WITH DIFFUSION MODEL
11912RGBT2HS-Net: Reconstructing a hyper-spectral volume from an RGB-T stack via an attention-powered multiresolution framework
7813Riemannian Diffusion Adaptation over Graphs with Application to Online Distributed PCA
4021RIS LOCALIZATION AND SPATIALLY WIDEBAND FILTERING EFFECTS
8454RISK-MANAGED SPARSE INDEX TRACKING VIA MARKET GRAPH CLUSTERING
3267RK-core: An established methodology for exploring the hierarchical structure within datasets
2273RL-EMO: A REINFORCEMENT LEARNING FRAMEWORK FOR MULTIMODAL EMOTION RECOGNITION
1900RL-LOGO: Deep Reinforcement Learning Localization for Logo Recognition
9483ROBUST AND IMPERCEPTIBLE COMMERCIAL CAMERA-SCREEN COMMUNICATION WITH 60HZ REFRESH RATE
1519ROBUST BEAMFORMING FOR DFRC SYSTEMS IN COMPLEX ENVIRONMENTS
7352ROBUST CROSS-DOMAIN SPEAKER VERIFICATION WITH MULTI-LEVEL DOMAIN ADAPTERS
9769Robust decoding of the auditory attention from EEG recordings through graph convolutional networks
8702Robust DOA estimation from deep acoustic imaging
3440ROBUST FACE RECOGNITION BASED ON AN ANGLE-AWARE LOSS AND MASKED AUTOENCODER PRE-TRAINING
8792Robust Lightweight Depth Estimation Model via Data-free Distillation
10333ROBUST LOCALIZATION OF KEY FOB USING CHANNEL IMPULSE RESPONSE OF ULTRA WIDE BAND SENSORS FOR KEYLESS ENTRY SYSTEM
3052ROBUST LOW-RANK CORRELATION FITTING
11459ROBUST MULTISTATIC TARGET LOCALIZATION IN THE PRESENCE OF NLOS ERRORS AND OUTLIERS
7671ROBUST NEAR-FIELD BEAMFORMING FOR MILLIMETER WAVE COMMUNICATION SYSTEM WITH APERTURE PERTURBATION
9566Robust Recovery of Joint Sparse signals via Simultaneous Orthogonal Matching Pursuit
4288Robust regression analysis based on the K-divergence
1804ROBUST SELF-SUPERVISED LEARNING WITH CONTRAST SAMPLES FOR NATURAL LANGUAGE UNDERSTANDING
2577ROBUST SINGLE-PARTICLE CRYO-EM IMAGE DENOISING AND RESTORATION
6210ROBUST SPEAKER PERSONALISATION USING GENERALIZED LOW-RANK ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION
2036ROBUST SPOOF SPEECH DETECTION BASED ON MULTI-SCALE FEATURE AGGREGATION AND DYNAMIC CONVOLUTION
9479ROBUST SYMBOL-LEVEL PRECODING VIA A SYMBOL-PERTURBED ZERO-FORCING STRUCTURE
6965ROBUST WAKE WORD SPOTTING WITH FRAME-LEVEL CROSS-MODAL ATTENTION BASED AUDIO-VISUAL CONFORMER
3074ROBUSTNESS AGAINST ADVERSARIAL ATTACKS VIA LEARNING CONFINED ADVERSARIAL POLYTOPES
4746ROBUSTNESS EVALUATION OF MACHINE LEARNING MODELS FOR ROBOT ARM ACTION RECOGNITION IN NOISY ENVIRONMENTS
8013ROBUSTTSVAR: A ROBUST TIME SERIES VARIANCE ESTIMATION ALGORITHM
8106RoFi: Robust WiFi Intrusion Detection via Distribution Matching
11552ROTOR NOISE-AWARE NOISE COVARIANCE MATRIX ESTIMATION FOR UNMANNED AERIAL VEHICLE AUDITION
5602RSED: Zero-shot Relation Triplet Extraction via Relation Selection and Entity Boundary Detection
4886RTLBP - AN EFFICIENT LOCAL PATTERN FOR FACIAL IMAGES RETRIEVAL
11482RTSNet: Learning to Smooth in Partially Known State-Space Models
3179RVAE-EM: GENERATIVE SPEECH DEREVERBERATION BASED ON RECURRENT VARIATIONAL AUTO-ENCODER AND CONVOLUTIVE TRANSFER FUNCTION
3936RVDNET: A TWO-STAGE NETWORK FOR REAL-WORLD VIDEO DESNOWING WITH DOMAIN ADAPTATION
3048S2E: Towards an End-to-End Entity Resolution Solution from Acoustic Signal
2432SADA: SAUDI AUDIO DATASET FOR ARABIC
9848SADE: A Speaker-Aware Dual Encoding Model based on DiagBERT for Medical Triage and Pre-diagnosis
6598SALIENCY PREDICTION OF SPORTS VIDEOS: A LARGE-SCALE DATABASE AND A SELF-ADAPTIVE APPROACH
10374SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
10140SAM: A Self-adaptive Attention Module for Context-Aware Recommendation System
3926SAM-Deblur: Let Segment Anything Boost Image Deblurring
7131SAMF: SMALL-AREA-AWARE MULTI-FOCUS IMAGE FUSION FOR OBJECT DETECTION
9231SAM-GEBD: ZERO-COST APPROACH FOR GENERIC EVENT BOUNDARY DETECTION
3359SAM-GUIDED ENHANCED FINE-GRAINED ENCODING WITH MIXED SEMANTIC LEARNING FOR MEDICAL IMAGE CAPTIONING
3893SAM-OCTA: A Fine-Tuning Strategy for Applying Foundation Model to OCTA Image Segmentation Tasks
6906Sampling and Recovery of Signals over Product Cell Structures
9879SAMVG: A MULTI-STAGE IMAGE VECTORIZATION MODEL WITH THE SEGMENT-ANYTHING MODEL
3109Sandwiched Lo-res Simulation for Scalable Flood Modeling
7206SAR2NDVI: Pre-training for SAR-to-NDVI Image Translation
1122SASA: Saliency-Aware Self-Adaptive Snapshot Compressive Imaging
1320SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
1276SBM: Smoothness-based Minimization for Domain Generalization
7914SCALABLE AND EFFICIENT SPEECH ENHANCEMENT USING MODIFIED COLD DIFFUSION: A RESIDUAL LEARNING APPROACH
4850Scalable Ensemble-based Detection Method Against Adversarial Attacks For Speaker Verification
3234Scalable Model-Based Gaussian Process Clustering
3298SCALE-AWARE COMPETITION NETWORK FOR PALMPRINT RECOGNITION
2553Scale-free and Task-generic Attack: Generating Photo-realistic Adversarial Patterns with Patch Quilting Generator
11939SCALING NVIDIA's MULTI-SPEAKER MULTI-LINGUAL TTS SYSTEMS WITH ZERO-SHOT TTS TO INDIC LANGUAGES
9656Scaling Results for Robust Distributed Estimation in Sensor Networks using Order Statistics
9481SCANPCGC: LEARNING-BASED LOSSLESS POINT CLOUD GEOMETRY COMPRESSION USING SEQUENTIAL SLICE REPRESENTATION
6876SCENE SKETCH-TO-IMAGE SYNTHESIS BASED ON MULTI-OBJECT CONTROL
8692SC-MAD: MIXTURES OF HIGHER-ORDER NETWORKS FOR DATA AUGMENTATION
8187SCNet: Sparse Compression Network for Music Source Separation
8784SCORE CALIBRATION BASED ON CONSISTENCY MEASURE FACTOR FOR SPEAKER VERIFICATION
7973SCORE: SELF-SUPERVISED CORRESPONDENCE FINE-TUNING FOR IMPROVED CONTENT REPRESENTATIONS
4582SCORE-BASED DIFFUSION MODELS FOR PHOTOACOUSTIC TOMOGRAPHY IMAGE RECONSTRUCTION
2138ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter
9216SCRN: A SPECTROGRAM CONVOLUTIONAL RECURRENT NETWORK FOR AOA ESTIMATION USING BLUETOOTH 5
3615SDEMG: Score-Based Diffusion Model for Surface Electromyographic Signal Denoising
7954SD-HUBERT: SENTENCE-LEVEL SELF-DISTILLATION INDUCES SYLLABIC ORGANIZATION IN HUBERT
2120SDIF-DA: A SHALLOW-TO-DEEP INTERACTION FRAMEWORK WITH DATA AUGMENTATION FOR MULTI-MODAL INTENT DETECTION
8641SDRNET: SALIENCY-GUIDED DYNAMIC RESTORATION NETWORK FOR RAIN AND HAZE REMOVAL IN NIGHTTIME IMAGES
2679SEACO-PARAFORMER: A NON-AUTOREGRESSIVE ASR SYSTEM WITH FLEXIBLE AND EFFECTIVE HOTWORD CUSTOMIZATION ABILITY
8122SEA-GNN: SEQUENCE EXTENSION AUGMENTED GRAPH NEURAL NETWORK FOR SEQUENTIAL RECOMMENDATION
3253SEAM MASK GUIDED PARTIAL RECONSTRUCTION WITH QUANTUM-INSPIRED LOCAL AGGREGATION FOR DEEP IMAGE STITCHING
4059Search for gravitational wave probes - A self-supervised learning for pulsars based on signal contexts
1671Search Robust and Adaptable Architecture
8008SEC2SEC CO-ATTENTION TRANSFORMER FOR VIDEO-BASED APPARENT AFFECTIVE PREDICTION
7771SECP: A SPEECH ENHANCEMENT-BASED CURATION PIPELINE FOR SCALABLE ACQUISITION OF CLEAN SPEECH
4101SECTOR-BASED INTERFERENCE CANCELLATION FOR ROBUST KEYWORD SPOTTING APPLICATIONS USING AN INFORMED MPDR BEAMFORMER
6264SECURE ENERGY EFFICIENCY FAIRNESS MAXIMIZATION IN BACKSCATTER THROUGHPUT CONSTRAINED UAV-ASSISTED DATA COLLECTION
2859SECURELY AND EFFICIENTLY OUTSOURCING NEURAL NETWORK INFERENCE VIA PARALLEL MSB EXTRACTION
3935Security Equivalence Assessment Between Cloud Standards by Mapping of Control Items
9572SEEING THROUGH THE CONVERSATION: AUDIO-VISUAL SPEECH SEPARATION BASED ON DIFFUSION MODEL
1060SEEKING SIMILARITIES WHILE REMOVING DIFFERENCES: GRAPH NEURAL NETWORKS BASED ON NODE CORRELATION
8529SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
6120SEGLLM: TOPIC-ORIENTED CALL SEGMENTATION VIA LLM-BASED CONVERSATION SYNTHESIS
3513Segment Anything Model guided Semantic Knowledge Learning for Remote Sensing Change Detection
1277Segment Anything Model Meets Image Harmonization
5603SEGMENT THEN MATCH: FIND THE CARRIER BEFORE REASONING IN SCENE-TEXT VQA
9054SEGMENTATION-DRIVEN INFRARED AND VISIBLE IMAGE FUSION VIA TRANSFORMER-ENHANCED ARCHITECTURE SEARCHING
1844Segmented Error Minimisation (SEMI) for Robust Training of Deep Learning Models with Non-linear Shifts in Reference Data
9774SELECTING N-LOWEST SCORES FOR TRAINING MOS PREDICTION MODELS
11549Selective Acoustic Feature Enhancement for Speech Emotion Recognition with Noisy Speech
10128Selective Domain-invariant Feature for Generalizable Deepfake Detection
10113SELECTIVE USER FORWARDED CELL-FREE MASSIVE MIMO WITH QUANTIZED SYMBOLS
8871SELF KNOWLEDGE DISTILLATION BASED ON LAYER-WISE WEIGHTED FEATURE IMITATION FOR EFFICIENT OBJECT DETECTION
9196SELF-ADAPTIVE SCALE HANDLING FOR FORECASTING TIME SERIES WITH SCALE HETEROGENEITY
3704SELF-DISTILLED DYNAMIC FUSION NETWORK FOR LANGUAGE-BASED FASHION RETRIEVAL
1748SELF-KNOWLEDGE DISTILLATION WITH LEARNING FROM ROLE-MODEL SAMPLES
10421SELF-MOTION AS SUPERVISION FOR EGOCENTRIC AUDIOVISUAL LOCALIZATION
10238SELF-SUPERVISED ADAPTIVE AV FUSION MODULE FOR PRE-TRAINED ASR MODELS
6576SELF-SUPERVISED ADAPTIVE PRE-TRAINING OF MULTILINGUAL SPEECH MODELS FOR LANGUAGE AND DIALECT IDENTIFICATION
3908Self-supervised Cross-level Consistency Learning for Fundus Image Classification
3117Self-supervised Domain Exploration with an Optimal Transport Regularization for Open Set Cross-domain Speech Emotion Recognition
8394SELF-SUPERVISED DUAL GENERATIVE NETWORKS FOR EDGE-PRESERVING IMAGE SMOOTHING
2323Self-Supervised Face Image Restoration with a One-Shot Reference
1174SELF-SUPERVISED LEARNING FOR ANOMALOUS SOUND DETECTION
3847SELF-SUPERVISED LEARNING FOR SLEEP STAGE CLASSIFICATION WITH TEMPORAL AUGMENTATION AND FALSE NEGATIVE SUPPRESSION
7943SELF-SUPERVISED MODELS OF SPEECH INFER UNIVERSAL ARTICULATORY KINEMATICS
2320SELF-SUPERVISED MULTI-SCALE HIERARCHICAL REFINEMENT METHOD FOR JOINT LEARNING OF OPTICAL FLOW AND DEPTH
6725Self-Supervised Path Planning in UAV-aided Wireless Networks based on Active Inference
1954SELF-SUPERVISED PRETRAINING FOR ROBUST PERSONALIZED VOICE ACTIVITY DETECTION IN ADVERSE CONDITIONS
8422SELF-SUPERVISED PULSE-AWARE INTERPRETABLE DISENTANGLED ECG REPRESENTATION LEARNING
9990SELF-SUPERVISED REINFORCEMENT LEARNING FOR OUT-OF-DISTRIBUTION RECOVERY VIA AUXILIARY REWARD
8787SELF-SUPERVISED SPATIALLY VARIANT PSF ESTIMATION FOR ABERRATION-AWARE DEPTH-FROM-DEFOCUS
9492Self-supervised Speaker Verification Employing a Novel Clustering Algorithm
8148SELF-SUPERVISED SPEAKER VERIFICATION WITH ADAPTIVE THRESHOLD AND HIERARCHICAL TRAINING
11937SELF-SUPERVISED SPEECH REPRESENTATION AND CONTEXTUAL TEXT EMBEDDING FOR MATCH-MISMATCH CLASSIFICATION WITH EEG RECORDING
3806SELF-TRAINING DOMAIN ADAPTATION VIA WEIGHT TRANSMISSION BETWEEN GENERATORS
6979SELM: Speech Enhancement Using Discrete Tokens and Language Models
6618SEMANTIC DISTILLATION AND STRUCTURAL ALIGNMENT NETWORK FOR FAKE NEWS DETECTION
7037SEMANTIC ENRICHMENT FOR VIDEO QUESTION ANSWERING WITH GATED GRAPH NEURAL NETWORKS
8812SEMANTIC LATENT DECOMPOSITION WITH NORMALIZING FLOWS FOR FACE EDITING
3505SEMANTIC PROXIMITY ALIGNMENT: TOWARDS HUMAN PERCEPTION-CONSISTENT AUDIO TAGGING BY ALIGNING WITH LABEL TEXT DESCRIPTION
8640SEMANTIC RECONSTRUCTION OF CONTINUOUS LANGUAGE FROM MEG SIGNALS
4067SEMANTIC SECURITY: A DIGITAL WATERMARK METHOD FOR IMAGE SEMANTIC PRESERVATION
8878SEMANTIC SEGMENTATION FOR MULTI-SCENE REMOTE SENSING IMAGES WITH NOISY LABELS BASED ON UNCERTAINTY PERCEPTION
4126SEMANTIC-ENHANCED SUPERVISED CONTRASTIVE LEARNING
7108SEMANTIC-GUIDED NETWORK WITH CONTRASTIVE LEARNING FOR VIDEO CAPTION
7738SEMANTICMAPPER: REGION-SPECIFIC DOMAIN ADAPTATION FOR 3D SHAPES THROUGH LEXICAL DELINEATION
9406SEMANTIC-PRESERVING IMAGE CODING BASED ON CONDITIONAL DIFFUSION MODELS
7389SEMANTICS DRIVEN MULTI-VIEW KNOWLEDGE GRAPH EMBEDDING FOR CROSS-LINGUAL ENTITY ALIGNMENT
8712SemDA: Communication-efficient Data Aggregation Through Distributed Semantic Transmission
7155SEMI-AUTOREGRESSIVE STREAMING ASR WITH LABEL CONTEXT
1626SEMI-BLIND ESTIMATION OF DIRECT-TO-REVERBERANT ENERGY RATIO USING RESIDUAL ENERGY TEST STATISTICS
1169SEMI-DECOUPLED 6D POSE ESTIMATION VIA MULTI-MODAL FEATURE FUSION
3897SEMI-SUPERVISED DOMAIN ADAPTATION FOR EEG-BASED SLEEP STAGE CLASSIFICATION
5606SEMI-SUPERVISED METRICS-BASED SELF-TRAINING ROOT CAUSE ANALYSIS FOR CLOUD-NATIVE SYSTEMS WITH CLASS-IMBALANCED DATA
1168SEMI-SUPERVISED SOUND EVENT DETECTION WITH LOCAL AND GLOBAL CONSISTENCY REGULARIZATION
5960SEMI-SUPERVISED VOLUMETRIC MEDICAL IMAGE SEGMENTATION VIA CLASS PROTOTYPE GUIDED DISTRIBUTION-ALIGNED REPRESENTATION LEARNING
1768SENSI-BERT: TOWARDS SENSITIVITY DRIVEN FINE-TUNING FOR PARAMETER-EFFICIENT LANGUAGE MODEL
3355SENSING WITH RANDOM SIGNALS
4525SENSING-AIDED COMMUNICATION CHANNEL ESTIMATION WITH TENSOR-BASED MOVING TARGET LOCALIZATION
4791SENSING-ASSISTED DISTRIBUTED USER SCHEDULING AND BEAMFORMING IN MULTI-CELL MMWAVE NETWORKS
8331SEQUENCE OF LINEAR PROGRAM FOR ROBUST PHASE RETRIEVAL
1855SEQUENTIAL ACQUISITION OF FEATURES AND EXPERTS FOR DATUM-WISE CLASSIFICATION
8374SEQUENTIAL DETECTION OF ANOMALIES IN NOISY OUTPUTS OF AN UNKNOWN FUNCTION USING GAUSSIAN AND YULE-SIMON PROCESSES
9431SEQUENTIAL MONTE CARLO GRAPH CONVOLUTIONAL NETWORK FOR DYNAMIC BRAIN CONNECTIVITY
4108Sequential Wasserstein Uncertainty Sets for Minimax Robust Online Change Detection
4377SERC-GCN: SPEECH EMOTION RECOGNITION IN CONVERSATION USING GRAPH CONVOLUTIONAL NETWORKS
7106SE-SIS: shadow-embeddable lossless secret image sharing for greyscale images
1883S-Evaluator: Enhance Factual Consistency Evaluator with Adversarial Data Synthesized by Large Language Model
9029SG2SC: A GENERATIVE SEMANTIC COMMUNICATION FRAMEWORK FOR SCENE UNDERSTANDING-ORIENTED IMAGE TRANSMISSION
7011SGM: A DATASET FOR 3D GARMENT RECONSTRUCTION FROM SINGLE HAND-DRAWN SKETCH
3865SGT: SELF-GUIDED TRANSFORMER FOR FEW-SHOT SEMANTIC SEGMENTATION
4558Shapley Value Guided Extractive Text Summarization
1609SHIFT OPERATOR AND SEPARATION FILTER FOR DIFFERENT PERIOD MIXED SIGNALS USING COMPANION MATRIX
8806Shifted-rectangle-window Based Transformer for Non-displaced Femoral Neck Fracture Diagnosis
3520SIANet: Support Information-Aware Network for Category-Agnostic Pose Estimation
3212SICRN: ADVANCING SPEECH ENHANCEMENT THROUGH STATE SPACE MODEL AND INPLACE CONVOLUTION TECHNIQUES
5064SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model
7227SIGNAL RECONSTRUCTION FROM NONIDEAL SAMPLES IN FRACTIONAL FOURIER TRANSFORM DOMAIN
11924SIGNAL SEPARATION IN RADIO SPECTRUM USING SELF-ATTENTION MECHANISM
2510Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
7031SIGNIFICANT ASR ERROR DETECTION FOR CONVERSATIONAL VOICE ASSISTANTS
6399SimFall: A Data Generator For RF-based Fall Detection
4181SIMILAR BUT FASTER: MANIPULATION OF TEMPO IN MUSIC AUDIO EMBEDDINGS FOR TEMPO PREDICTION AND SEARCH
3324SIMILARITY KNOWLEDGE DISTILLATION WITH CALIBRATED MASK
2868SimMKD: Simple Mask-Flow Keypoint Detection for both typhoon detection and typhoon eye location
4039Simple Contrastive Representation Learning for Time Series Forecasting
2054SIMULTANEOUS INTERIOR AND EXTERIOR SOUND FIELD SYNTHESIS USING CYLINDRICAL AND SPHERICAL LOUDSPEAKER ARRAYS
2507SIMULTANEOUS POSITIONING AND TRACKING USING DYNAMIC FACTOR GRAPHS AND GEOMETRIC AVERAGE FUSION
8175SINGFAKE: SINGING VOICE DEEPFAKE DETECTION
3964SINGLE AND FEW-STEP DIFFUSION FOR GENERATIVE SPEECH ENHANCEMENT
10444SINGLE CHANNEL MULTIPLE SIGNAL CLASSIFICATION USING PSEUDO-DOPPLER
2995Single Image Reflection Removal using Feature Difference Enhancement
5941SINGLE-CHANNEL BLIND DEREVERBERATION BASED ON RANK-1 MATRIX LIFTING IN TIME-FREQUENCY DOMAIN
7773Single-pixel imaging of dynamic flows using Neural ODE regularization
2647SINGLE-SOURCE DOMAIN GENERALIZATION IN FUNDUS IMAGE SEGMENTATION VIA MODERATING AND INTERPOLATING INPUT SPACE AUGMENTATION
11944SINGLE-STAGE TTS WITH ADAPTED VOCODER AND CROSS-ATTENTION: TALTECH SYSTEMS FOR THE LIMMITS’24 CHALLENGE
11862SIR-PROGRESSIVE AUDIO-VISUAL TF-GRIDNET WITH ASR-AWARE SELECTOR FOR TARGET SPEAKER EXTRACTION IN MISP 2023 CHALLENGE
8113SITUATIONAL SIGNAL PROCESSING WITH ECOLOGICAL MOMENTARY ASSESSMENT: LEVERAGING ENVIRONMENTAL CONTEXT FOR COCHLEAR IMPLANT USERS
4870Situation-aware adaptive transmit beamforming for automotive radars
1663SJTU-TMQA: A QUALITY ASSESSMENT DATABASE FOR STATIC MESH WITH TEXTURE MAP
2622SKETCH-BASED 3D SHAPE RETRIEVAL WITH MULTI-VIEW FUSION TRANSFORMER
8606SKETCHED COLUMN-BASED MATRIX APPROXIMATION WITH SIDE INFORMATION
7002SKILLNET-X: A MULTILINGUAL MULTITASK MODEL WITH SPARSELY ACTIVATED SKILLS
1630SKIN TONE DISENTANGLEMENT IN 2D MAKEUP TRANSFER WITH GRAPH NEURAL NETWORKS
8075SKIP-STEP CONTRASTIVE PREDICTIVE CODING FOR TIME SERIES ANOMALY DETECTION
4908SLIDESPEECH: A LARGE SCALE SLIDE-ENRICHED AUDIO-VISUAL CORPUS
7609SLOWFAST NETWORK FOR CONTINUOUS SIGN LANGUAGE RECOGNITION
9464SMALL OBJECT DETECTION ON THE WATER SURFACE BASED ON RADAR AND CAMERA FUSION
1684SMALL-FOOTPRINT AUTOMATIC SPEECH RECOGNITION SYSTEM USING TWO-STAGE TRANSFER LEARNING BASED SYMMETRIZED TERNARY WEIGHT NETWORK
8447Small-Footprint Convolutional Neural Network with reduced feature map for Voice Activity Detection
10412SMMA-NET: AN AUDIO CLUE-BASED TARGET SPEAKER EXTRACTION NETWORK WITH SPECTROGRAM MATCHING AND MUTUAL ATTENTION
3490SMOOTH START: A UNIFIED APPROACH FOR GRADUAL TRANSITION FROM COLD TO OLD IN RECOMMENDER SYSTEMS
7711SNAPSHOT PROMPT ENSEMBLE FOR PARAMETER-EFFICIENT SOFT PROMPT TRANSFER
6044SNORE SOUND FEATURES BASED ON PERCUSSIVE ENHANCING AND POSITIONAL ENCODING COMBINED WITH MULTI-TASK LEARNING FOR OSAHS DETECTION
1951SOCIAL LEARNING WITH ADAPTIVE MODELS
2289SOCIAL LODE: HUMAN TRAJECTORY PREDICTION WITH LATENT ODES
9670SOD-UAV: small object detection for unmanned aerial vehicle images via improved YOLOv7
4812SOFT ALIGNMENT OF MODALITY SPACE FOR END-TO-END SPEECH TRANSLATION
2112SOFT DYNAMIC TIME WARPING WITH VARIABLE STEP WEIGHTS
4441Soft Image Segmentation using Gradient Graph Laplacian Regularizer
10088SOLUTION AND ANALYSIS FOR 3-D LOCALIZATION IN CLOSED-FORM INTEGRATING SA AND TDOA MEASUREMENTS
6686SO-NET: MODEL-AGNOSTIC SEQUENTIAL HAND POSE OPTIMIZATION FRAMEWORK
3651SORTING, REASONING, AND EXTRACTION: AN EASY-TO-HARD REASONING FRAMEWORK FOR DOCUMENT-LEVEL EVENT ARGUMENT EXTRACTION
11505SOUND FIELD INTERPOLATION FOR ROTATION-INVARIANT MULTICHANNEL ARRAY SIGNAL PROCESSING
1099SOUNDLOCD: AN EFFICIENT CONDITIONAL DISCRETE CONTRASTIVE LATENT DIFFUSION MODEL FOR TEXT-TO-SOUND GENERATION
8177SOURCE-FREE DOMAIN ADAPTATION FOR MILLIMETER WAVE RADAR BASED HUMAN ACTIVITY RECOGNITION
8547SOURCE-FREE ONLINE DOMAIN ADAPTIVE SEMANTIC SEGMENTATION OF SATELLITE IMAGES UNDER IMAGE DEGRADATION
1959SourceP: Detecting Ponzi Schemes on Ethereum with Source Code
4289Space-Time Adaptive Processing for radars in Connected and Automated Vehicular Platoons
8467SPARSE BAYESIAN LEARNING-BASED DIRECT LOCALIZATION FOR DISTRIBUTED SENSOR ARRAYS WITH UNKNOWN GAIN AND PHASE ERRORS
6319SPARSE BAYESIAN SYNTHETIC APERTURE PROCESSING BASED DOA ESTIMATION WITH DEFORMED TOWED ARRAYS
1310SPARSE CHANNEL REPRESENTATION AND ESTIMATION IN NEAR FIELD COMMUNICATIONS
7685SPARSE PCA WITH FALSE DISCOVERY RATE CONTROLLED VARIABLE SELECTION
4522Sparse Regularization based on Reverse Ordered Weighted L1-norm and Its Application to Edge-preserving Smoothing
8856SPARSE SOUND FIELD REPRESENTATION USING COMPLEX ORTHOGONAL MATCHING PURSUIT
2353SPARSE, WEIGHT-CONSTRAINED ARRAYS WITH O(N) APERTURE FOR REDUCED MUTUAL COUPLING
7276SPARSELY SHARED LORA ON WHISPER FOR CHILD SPEECH RECOGNITION
5620SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer
1740SPASE: SPAtial Saliency Explanation for time series models
9346SPATIAL FORMATION-GUIDED NETWORK FOR GROUP ACTIVITY RECOGNITION
7940SPATIAL SCAPER: A LIBRARY TO SIMULATE AND AUGMENT SOUNDSCAPES FOR SOUND EVENT LOCALIZATION AND DETECTION IN REALISTIC ROOMS
7567SPATIALCODEC: NEURAL SPATIAL SPEECH CODING
2493SPATIAL-TEMPORAL INTERACTION DECODING TRANSFORMER FOR UNSUPERVISED MULTIVARIATE TIME SERIES ANOMALY DETECTION
5845SPATIO-TEMPORAL ACTION DETECTION WITH A MOTION SENSE AND SEMANTIC CORRECTION FRAMEWORK
4629SPATIO-TEMPORAL CORRELATION LEARNING FOR MULTIPLE OBJECT TRACKING
1750SPATIO-TEMPORAL DATA MINING WITH INFORMATION INTEGRITY PROTECTION: GRAPH SIGNAL BASED AIR QUALITY PREDICTION
8002SPATIOTEMPORAL GROUP ANOMALY DETECTION VIA GRAPH TOTAL VARIATION ON TENSORS
3144SPCL-MER: SUPERVISED PROTOTYPICAL CONTRASTIVE LEARNING FOR MICRO-EXPRESSION RECOGNITION
8831SPDG-NET: SEMANTICS PRESERVING DOMAIN AUGMENTATION THROUGH STYLE INTERPOLATION FOR MULTI-SOURCE DOMAIN GENERALIZATION
7688SPEAK WHILE YOU THINK: STREAMING SPEECH SYNTHESIS DURING TEXT GENERATION
3090SPEAKER ADAPTATION FOR ENHANCEMENT OF BONE-CONDUCTED SPEECH
6612SPEAKER ANONYMIZATION USING NEURAL AUDIO CODEC LANGUAGE MODELS
11474SPEAKER ANONYMIZATION USING ORTHOGONAL HOUSEHOLDER NEURAL NETWORK
2942SPEAKER-ADAPTIVE LIPREADING VIA SPATIO-TEMPORAL INFORMATION LEARNING
10193SPEAKER-CENTRIC MULTIMODAL FUSION NETWORKS FOR EMOTION RECOGNITION IN CONVERSATIONS
7017SPECDIFF-GAN: A SPECTRALLY-SHAPED NOISE DIFFUSION GAN FOR SPEECH AND MUSIC SYNTHESIS
4942SPEC-NERF: MULTI-SPECTRAL NEURAL RADIANCE FIELDS
10030SPECTRAL ANALYSIS OF VOWELS AND FRICATIVES AT VARIED LEVELS OF DYSARTHRIA SEVERITY FOR AMYOTROPHIC LATERAL SCLEROSIS
10077Spectral Graph Neural Networks with Generalized Laguerre Approximation
8764SPECTROGRAM SMOOTHING FOR ESTIMATION OF THE EVOLUTIONARY SPECTRA OF UNIFORMLY MODULATED PROCESSES
10158SPECTRO-SPATIAL HYPERSPECTRAL IMAGE RECONSTRUCTION FROM INTERFEROMETRIC ACQUISITIONS
8098SPECTRUMNET: SPECTRUM-BASED TRAJECTORY ENCODE NEURAL NETWORK FOR PEDESTRIAN TRAJECTORY PREDICTION
7800SPEECH COLLAGE: CODE-SWITCHED AUDIO GENERATION BY COLLAGING MONOLINGUAL CORPORA
11559Speech Dereverberation With Frequency Domain Autoregressive Modeling
7753Speech Emotion Recognition with Distilled Prosodic and Linguistic Affect Representations
7064Speech enhancement in hearing aids using target speech presence estimation based on a delayed remote microphone signal
9293SPEECH FOUNDATION MODELS ON INTELLIGIBILITY PREDICTION FOR HEARING-IMPAIRED LISTENERS
9194SPEECH GUIDED MASKED IMAGE MODELING FOR VISUALLY GROUNDED SPEECH
3726SPEECH RELATIONSHIP LEARNING FOR CROSS-CORPUS SPEECH EMOTION RECOGNITION
7074Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition
9016SPEECHDPR: END-TO-END SPOKEN PASSAGE RETRIEVAL FOR OPEN-DOMAIN SPOKEN QUESTION ANSWERING
1088SPEECH-DRIVEN EMOTIONAL 3D TALKING FACE ANIMATION USING EMOTIONAL EMBEDDINGS
2604SPGFUSION: A SEMANTIC PRIOR GUIDED INFRARED AND VISIBLE IMAGE FUSION NETWORK
1940SPGM: Prioritizing local features for enhanced speech separation performance
4476Spiking Structured State Space Model for Monaural Speech Enhancement
9045Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks
9532SPIRAL SHAPE MATTERS: NOVEL BIO-INSPIRED COCHLEAR CEPSTRUM
8186SponTTS: modeling and transferring spontaneous style for TTS
9243SPOOFING ATTACK AUGMENTATION: CAN DIFFERENTLY-TRAINED ATTACK MODELS IMPROVE GENERALISATION?
6542SPTESLEEPNET: AUTOMATIC SLEEP STAGING MODEL BASED ON STRIP PATCH EMBEDDINGS AND TRANSFORMER ENCODER
1491SPY-WATERMARK: ROBUST INVISIBLE WATERMARKING FOR BACKDOOR ATTACK
2917SRCodec: Split-residual vector quantization for neural speech codec
9489SRECT: Machine-specific Spatial-resolution Enhancement in Computed Tomography
7012SR-HUBERT: AN EFFICIENT PRE-TRAINED MODEL FOR SPEAKER VERIFICATION
1520SRP-UOD: MULTI-BRANCH HYBRID NETWORK FRAMEWORK BASED ON STRUCTURAL RE-PARAMETERIZATION FOR UNDERWATER SMALL OBJECT DETECTION
8774SR-VFA: ACCURATE SELF-REFINED FACE ALIGNMENT IN VIDEOS
1336SSHNN: SEMI-SUPERVISED HYBRID NAS NETWORK FOR ECHOCARDIOGRAPHIC IMAGE SEGMENTATION
6294SSL-NET: A SYNERGISTIC SPECTRAL AND LEARNING-BASED NETWORK FOR EFFICIENT BIRD SOUND CLASSIFICATION
1770SSR-GPCST: DEEP LEARNING MODELS BASED ON FUNCTIONAL CONNECTIVITY MAPS IN AUTISM RESEARCH
3842SSTA: Salient Spatially Transformed Attack
7454STABILITY OF GRAPH CONVOLUTIONAL NEURAL NETWORKS THROUGH THE LENS OF SMALL PERTURBATION ANALYSIS
4369STABLE DISTILLATION: REGULARIZING CONTINUED PRE-TRAINING FOR LOW-RESOURCE AUTOMATIC SPEECH RECOGNITION
1135STABLE KNOWLEDGE TRANSFER FOR CONTRASTIVE DISTILLATION
1714Stable Optimization for Large Vision Model Based Deep Image Prior in Cone-Beam CT Reconstruction
8327STABLEMISS+: PREDICTION WITH INCOMPLETE DATA UNDER AGNOSTIC MASK DISTRIBUTION SHIFT
4813STACK-AND-DELAY: A NEW CODEBOOK PATTERN FOR MUSIC GENERATION
8493Stage-Regularized Neural Stein Critics for Testing Goodness-of-Fit of Generative Models
4085STAR: DISTILLING SPEECH TEMPORAL RELATION FOR LIGHTWEIGHT SPEECH SELF-SUPERVISED LEARNING MODELS
8518STATE-AUGMENTED INFORMATION ROUTING IN COMMUNICATION SYSTEMS WITH GRAPH NEURAL NETWORKS
7895STATEFUL CONFORMER WITH CACHE-BASED INFERENCE FOR STREAMING AUTOMATIC SPEECH RECOGNITION
6877Statistical and Computational Limits of Detecting and Recovering Hidden Submatrices
8646STEALTHY BACKDOOR ATTACK TOWARDS FEDERATED AUTOMATIC SPEAKER VERIFICATION
8321STEIN VARIATIONAL GRADIENT DESCENT-BASED DETECTION FOR RANDOM ACCESS WITH PREAMBLES IN MTC
7555StemGen: A music generation model that listens
7128STEREO-MATCHING KNOWLEDGE DISTILLED MONOCULAR DEPTH ESTIMATION FILTERED BY MULTIPLE DISPARITY CONSISTENCY
4783STEREOPHONIC MUSIC SOURCE SEPARATION WITH SPATIALLY-INFORMED BRIDGING BAND-SPLIT NETWORK
9384Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification
6533Stochastic Configuration Networks for Laboratory Seismic Time-to-Failure Prediction
1116STOFNET: SUPER-RESOLUTION TIME OF FLIGHT NETWORK
6910STORYTTS: A HIGHLY EXPRESSIVE TEXT-TO-SPEECH DATASET WITH RICH TEXTUAL EXPRESSIVENESS ANNOTATIONS
9517Straightforward adaptation of particle filter to fish eye images for top view pedestrian tracking
9061STRATEGIC ARMS WITH SIDE COMMUNICATION PREVAIL OVER LOW-REGRET MAB ALGORITHMS
1032STREAMING ACTIVE LEARNING FOR REGRESSION PROBLEMS USING REGRESSION VIA CLASSIFICATION
4397STREAMING ANCHOR LOSS: AUGMENTING SUPERVISION WITH TEMPORAL SIGNIFICANCE
4781StreamVC: Real-Time Low-Latency Voice Conversion
10217STRING SOUND SYNTHESIZER ON GPU-ACCELERATED FINITE DIFFERENCE SCHEME
11528STRONG LABELING OF SOUND EVENTS USING CROWDSOURCED WEAK LABELS AND ANNOTATOR COMPETENCE ESTIMATION
6875Structure matters: analyzing videos via graph neural networks for social media platform attribution
1224Structure-Aware In-Air Handwritten Text Recognition With Graph-Guided Cross-Modality Translator
6689STRUCTURE-INFORMED POSITIONAL ENCODING FOR MUSIC GENERATION
6950STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting
7278STUDY OF ABUSE DETECTION IN CONTINUOUS SPEECH FOR INDIAN LANGUAGES
1882Style Adaptation for Domain-Adaptive Semantic Segmentation
8710Style Factorization: Explore Diverse Style Variation for Domain Generalization
5611STYLECAP: AUTOMATIC SPEAKING-STYLE CAPTIONING FROM SPEECH BASED ON SPEECH AND LANGUAGE SELF-SUPERVISED LEARNING MODELS
8611STYLESPEECH: SELF-SUPERVISED STYLE ENHANCING WITH VQ-VAE-BASED PRE-TRAINING FOR EXPRESSIVE AUDIOBOOK SPEECH SYNTHESIS
11870SUB-BAND AND FULL-BAND INTERACTIVE U-NET WITH DPRNN FOR DEMIXING CROSS-TALK STEREO MUSIC
2710SUBDIVISION FEATURES-GUIDED BRAIN MRI SUPER-RESOLUTION VIA FORWARD AND BACKWARD PROPAGATION
8168SUBGROUP IDENTIFICATION THROUGH MULTIPLEX COMMUNITY STRUCTURE WITHIN FUNCTIONAL CONNECTIVITY NETWORKS
7220Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference
3250Subspace-Based Co-Array Processing for Nested Arrays Without Eigendecomposition
7881SUBSPACE-BASED DETECTION IN OFDM ISAC SYSTEMS UNDER DIFFERENT CONSTELLATIONS
8695SUBTYPE-SPECIFIC BIOMARKERS OF ALZHEIMER’S DISEASE FROM ANATOMICAL AND FUNCTIONAL CONNECTOMES VIA GRAPH NEURAL NETWORKS
6240SUMMARIZING COMMUNITY-BASED QUESTION-ANSWER PAIRS WITH FOCUS RECTIFICATION
11947SUMMARY ON THE MULTIMODAL INFORMATION-BASED SPEECH PROCESSING (MISP) 2023 CHALLENGE
6809SUNFLOWER STRATEGY FOR BAYESIAN RELATIONAL DATA ANALYSIS
3594SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
11468SUPERIORIZED ADAPTIVE PROJECTED SUBGRADIENT METHOD WITH APPLICATION TO MIMO DETECTION
4342Supplementing Missing Visions via Dialog for Scene Graph Generations
5541SURFACE-CONSTRAINED PROGRESSIVE FEATURE PRESERVING POINT CLOUD COMPRESSION
8943SVAD: A ROBUST, LOW-POWER, AND LIGHT-WEIGHT VOICE ACTIVITY DETECTION WITH SPIKING NEURAL NETWORKS
1457SWEEPMM: A HIGH-QUALITY MULTIMODAL DATASET FOR SWEEPING ROBOTS IN HOME SCENARIOS FOR VISION-LANGUAGE MODEL
6386SYLLABLE LEVEL FEATURES FOR PARKINSON'S DISEASE DETECTION FROM SPEECH
1272Symmetric Consistency with Cross-Domain Mixup for Cross-modality Cardiac Segmentation
8248SYMMETRIC VAR(1) MODELLING WITH GUARANTEED STABILITY
6456SYNCFUSION: MULTIMODAL ONSET-SYNCHRONIZED VIDEO-TO-AUDIO FOLEY SYNTHESIS
2154SYNCHFORMER: EFFICIENT SYNCHRONIZATION FROM SPARSE CUES
9955SYNONYM REPLACEMENT AND GENERATION ENHANCEMENT FOR DOCUMENT AUGMENTATION
2541SYNTHE-SEES: FACE BASED TEXT-TO-SPEECH FOR VIRTUAL SPEAKER
6567SYNTHESIZING Aβ-PET VIA AN IMAGE AND LABEL CONDITIONING LATENT DIFFUSION MODEL FOR DETECTING AMYLOID STATUS
3018SYNTHESIZING BLACK-BOX ANTI-FORENSICS DEEPFAKES WITH HIGH VISUAL QUALITY
3093SYNTHETIC CONVERSATIONS IMPROVE MULTI-TALKER ASR
9082Synthia's Melody: A Benchmark Framework for Unsupervised Domain Adaptation in Audio
8215SYNTHTAB: LEVERAGING SYNTHESIZED DATA FOR GUITAR TABLATURE TRANSCRIPTION
6416SYNVOX2: TOWARDS A PRIVACY-FRIENDLY VOXCELEB2 DATASET
9501TA2P: Task-Aware Adaptive Pruning Method for Image Classification on Edge Devices
3915TACKLING ELECTRODE SHIFT IN GESTURE RECOGNITION WITH HD-EMG ELECTRODE SUBSETS
1021TACOS: LEARNING TEMPORALLY STRUCTURED EMBEDDINGS FOR FEW-SHOT KEYWORD SPOTTING WITH DYNAMIC TIME WARPING
10022Tag Antenna Structure Calibrated Backscattering Signal Detection
3243TAIL CLASSES MATTER: LONG-TAILED OBJECT DETECTION REVISITED
6614TALDS-Net: Task-Aware Adaptive Local Descriptors Selection for Few-shot Image Classification
9657TALKING FACE GENERATION FOR IMPRESSION CONVERSION CONSIDERING SPEECH SEMANTICS
9571TALKNCE: IMPROVING ACTIVE SPEAKER DETECTION WITH TALK-AWARE CONTRASTIVE LEARNING
1309TAMING PROMPT-BASED DATA AUGMENTATION FOR LONG-TAILED EXTREME MULTI-LABEL TEXT CLASSIFICATION
7417TARGET LOCALIZATION BASED ON MULTISTATIC MIMO RADAR VIA DOUBLE COUPLED CANONICAL POLYADIC DECOMPOSITION
5824TARGET OPTIMIZATION DIRECTION GUIDED TRANSFER LEARNING FOR IMAGE
10171TARGET SIGNAL POWER IMPROVEMENT AND CLUTTER SUPPRESSION VIA BEAMFORMING FOR INTEGRATED SENSING AND COMMUNICATION SYSTEMS
3147TARGET SPEAKER EXTRACTION BY DIRECTLY EXPLOITING CONTEXTUAL INFORMATION IN THE TIME-FREQUENCY DOMAIN
2997TARGET SPEECH EXTRACTION WITH PRE-TRAINED SELF-SUPERVISED LEARNING MODELS
7921TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit
5684TASK INDICATING TRANSFORMER FOR TASK-CONDITIONAL DENSE PREDICTIONS
7382TASK ORIENTED DIALOGUE AS A CATALYST FOR SELF-SUPERVISED AUTOMATIC SPEECH RECOGNITION
7990TASK SELECTION AND ASSIGNMENT FOR MULTI-MODAL MULTI-TASK DIALOGUE ACT CLASSIFICATION WITH NON-STATIONARY MULTI-ARMED BANDITS
8428Task vector algebra for ASR models
5101TASK-WISE PROMPT QUERY FUNCTION FOR REHEARSAL-FREE CONTINUAL LEARNING
2452TB-RESNET: BRIDGING THE GAP FROM TDNN TO RESNET IN AUTOMATIC SPEAKER VERIFICATION WITH TEMPORAL-BOTTLENECK ENHANCEMENT
2891TCMP: END-TO-END TOPOLOGICALLY CONSISTENT MAGNITUDE PRUNING FOR MINIATURIZED GRAPH CONVOLUTIONAL NETWORKS
3265TCNAS: TRANSFORMER ARCHITECTURE EVOLVING IN CODE CLONE DETECTION
10270TD-GPT: TARGET PROTEIN-SPECIFIC DRUG MOLECULE GENERATION GPT
6073TDT-KWS: FAST AND ACCURATE KEYWORD SPOTTING USING TOKEN-AND-DURATION TRANSDUCER
4775TEMPLATE-GUIDED DATA AUGMENTATION FOR UNBIASED SCENE GRAPH GENERATION
8950TEMPO ESTIMATION AS FULLY SELF-SUPERVISED BINARY CLASSIFICATION
2313TEMPORAL CONDITIONAL CODING FOR DYNAMIC POINT CLOUD GEOMETRY COMPRESSION
2042TEMPORAL CONVOLUTION SHRINKAGE NETWORK FOR KEYWORD SPOTTING
6927TEMPORAL INCONSISTENCY-BASED ACTIVE LEARNING
1871TEMPORAL KNOWLEDGE GRAPH EMBEDDING USING HOUSEHOLDER TRANSFORMATIONS
5718TEMPORAL RELATIONAL CONTEXT LEARNING FOR EXTRAPOLATION REASONING ON TEMPORAL KNOWLEDGE GRAPHS
7568TEMPORALLY-GUIDED TOTAL VARIATION FOR ROBUST SPATIOTEMPORAL FUSION OF SATELLITE IMAGES
4063Temporal-Spatial Prediction: pre-training on diverse datasets for EEG classification
7341T-ENFP: AN EFFICIENT TRANSFORMER ENCODER-BASED SYSTEM FOR DRIVING BEHAVIOR CLASSIFICATION
8104TEN-GUARD: TENSOR DECOMPOSITION FOR BACKDOOR ATTACK DETECTION IN DEEP NEURAL NETWORKS
5843Tensor decomposition-based data fusion for biomarker extraction from multiple EEG experiments
9792Tensor Graph Decomposition for Temporal Networks
3982Tensor Low-rank Approximation of Finite-horizon Value Functions
2310TENSOR RECONSTRUCTION-BASED SPARSE ARRAY 2-D DOA ESTIMATION OF MIXED COHERENT AND UNCORRELATED SIGNALS
8579TENSOR-GUIDED INTERPOLATION FOR OFF-GRID POWER SPECTRUM MAP CONSTRUCTION
2800TENSORIAL CONVOLUTIVE BLIND SOURCE SEPARATION
9775Test-Time Distribution Learning Adapter For Cross-Modal Visual Reasoning
10293Text Region Multiple Information Perception Network for Scene Text Detection
1797TEXT2AVATAR: TEXT TO 3D HUMAN AVATAR GENERATION WITH CODEBOOK-DRIVEN BODY CONTROLLABLE ATTRIBUTE
2059Text-Driven 3D Human Generation via 2D Image Collections
4720TEXT-DRIVEN TALKING FACE SYNTHESIS BY REPROGRAMMING AUDIO-DRIVEN MODELS
5094TEXT-ONLY UNSUPERVISED DOMAIN ADAPTATION FOR NEURAL TRANSDUCER-BASED ASR PERSONALIZATION USING SYNTHESIZED DATA
2483TEXTROLSPEECH: A TEXT STYLE CONTROL SPEECH CORPUS WITH CODEC LANGUAGE TEXT-TO-SPEECH MODELS
4034TEXTUAL TOKENS CLASSIFICATION FOR MULTI-MODAL ALIGNMENT IN VISION-LANGUAGE TRACKING
4348Texture and normal map estimation for 3D face reconstruction
7196Texture-Unet: A Texture-Aware Network for Bone Marrow Smear Whole-slide Image Region of Interest Segmentation
2570TEXT-VIDEO COMPLETION NETWORKS WITH MOTION COMPENSATION AND ATTENTION AGGREGATION
7320T-FOLEY: A CONTROLLABLE WAVEFORM-DOMAIN DIFFUSION MODEL FOR TEMPORAL-EVENT-GUIDED FOLEY SOUND SYNTHESIS
5081TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification
6958THE 2ND CLARITY PREDICTION CHALLENGE: A MACHINE LEARNING CHALLENGE FOR HEARING AID INTELLIGIBILITY PREDICTION
11948THE 2ND E-PREVENTION CHALLENGE: PSYCHOTIC AND NON-PSYCHOTIC RELAPSE DETECTION USING WEARABLE-BASED DIGITAL PHENOTYPING
9320THE COLLABORATION OF 3D CONVOLUTIONS AND CRO-TSM IN LIPREADING
11889The Data-Driven Radio Frequency Signal Separation Challenge
8694The Devil is in Details: Delving into Lite FFN Design for Vision Transformers
8459THE DOUBLE-EDGED SWORD OF AI SAFETY: BALANCING ANOMALY DETECTION AND OOD GENERALIZATION VIA MODEL ANCHORING
7679THE EFFECTS OF LOUDNESS AND SMILING ON TIMBRE FEATURES: IMPLICATIONS FOR CHARISMATIC VOICES IN MANDARIN, GERMAN AND DANISH
11869THE FAWAISPEECH SYSTEM FOR MULTI-CHANNEL SPEECH RECOGNITION IN ICMC-ASR CHALLENGE
11856THE FOSAFER SYSTEM FOR THE ICASSP2024 IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE
11881THE ICASSP 2024 AUDIO DEEP PACKET LOSS CONCEALMENT GRAND CHALLENGE
11918The ICASSP SP Cadenza Challenge: Music Demixing/Remixing For Hearing Aids
2633THE JOINT GRID-FREE DOA AND POLARIZATION ESTIMATION ALGORITHM BASED ON ATOMIC NORM MINIMIZATION
9142THE MULTIMODAL INFORMATION BASED SPEECH PROCESSING (MISP) 2023 CHALLENGE: AUDIO-VISUAL TARGET SPEAKER EXTRACTION
8139The power of few: accelerating and enhancing data reweighting with coreset selection
8022THE RAO, WALD, AND LIKELIHOOD-RATIO TESTS UNDER GENERALIZED SELF-CONCORDANCE
11851THE ROYALFLUSH AUTOMATIC SPEECH DIARIZATION AND RECOGNITION SYSTEM FOR IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE
3181THE SELECTIVITY AND COMPETITION OF THE MIND’S EYE IN VISUAL PERCEPTION
11905THE THU-HCSI MULTI-SPEAKER MULTI-LINGUAL FEW-SHOT VOICE CLONING SYSTEM FOR LIMMITS’24 CHALLENGE
11893THE USTC SYSTEM FOR CADENZA 2024 CHALLENGE
11852THE USTC-NERCSLIP SYSTEMS FOR THE ICMC-ASR CHALLENGE
11880The XMUSPEECH SYSTEM FOR AUDIO-VISUAL TARGET SPEAKER EXTRACTION IN MISP 2023 CHALLENGE
8924Theme-enhanced Hard Negative Sample Mining for Open-domain Question Answering
10373Think as People: Context-driven Multi-image News Captioning with Adaptive Dual Attention
11519Third-Order Nested Array: An Optimal Geometry For Third-Order Cumulants Based Array Processing
7463THREE-DIMENSIONAL DECOUPLED ATOMIC NORM MINIMIZATION
7251THREE-DIMENSIONAL SOUND WAVE PROPAGATION REPRODUCTION BY CE-FDTD SIMULATION APPLYING ACTUAL RADIATION CHARACTERISTICS
2839Three-dimensional Spatial-Temporal Near-Field Passive Localization Based on an Exact Spatial Propagation Model
4008Through-the-Wall Radar Imaging with wall clutter removal via Riemannian optimization on the fixed-rank manifold
7428TIA: A TEACHING INTONATION ASSESSMENT DATASET IN REAL TEACHING SITUATIONS
8273TIMBRE-TRAP: A LOW-RESOURCE FRAMEWORK FOR INSTRUMENT-AGNOSTIC MUSIC TRANSCRIPTION
5511Time Changed Normalizing Flows for accurate SDE modeling
2943TIME-INTERVAL VISUAL SALIENCY PREDICTION IN MAMMOGRAM READING
4310TIME-MODULATED INTELLIGENT REFLECTING SURFACE FOR WAVEFORM SECURITY
4611TITAN: BRINGING THE DEEP IMAGE PRIOR TO IMPLICIT REPRESENTATIONS
7149TNFORMER: SINGLE-PASS MULTILINGUAL TEXT NORMALIZATION WITH A TRANSFORMER DECODER MODEL
2159TODM: TRAIN ONCE DEPLOY MANY EFFICIENT SUPERNET-BASED RNN-T COMPRESSION FOR ON-DEVICE ASR MODELS
1875TOKEN-BASED SPATIOTEMPORAL REPRESENTATION OF THE EVENTS
2160TokenMotion: Motion-Guided Vision Transformer for Video Camouflaged Object Detection Via Learnable Token Selection
3739TOPOLOGICAL NEURAL NETWORKS OVER THE AIR
3872TOPOLOGY-DEPENDENT PRIVACY BOUND FOR DECENTRALIZED FEDERATED LEARNING
8458Topology-Regularized Self-Knowledge Distillation for Transductive-Inductive Learning of Brain Disorder Diagnosis
1979Touring sampling with pushforward maps
1145Toward Quantifiable Face Age Transformation
3461TOWARD SUFFICIENT SPATIAL-FREQUENCY INTERACTION FOR GRADIENT-AWARE UNDERWATER IMAGE ENHANCEMENT
9139TOWARDS 3D COMPUTATIONAL PERISCOPY WITH AN ORDINARY CAMERA: A SEPARABLE NON-LINEAR LEAST SQUARES FORMULATION
2292TOWARDS A UNIFIED VIEW OF ADVERSARIAL TRAINING: A CONTRASTIVE PERSPECTIVE
4361Towards a World-English Language Model for On-Device Virtual Assistants
8833TOWARDS AN INTERPRETABLE REPRESENTATION OF SPEAKER IDENTITY VIA PERCEPTUAL VOICE QUALITIES
7249TOWARDS AN OBJECTIVE QUALITY METRIC FOR INTERPOLATED DIRECTIONAL ROOM IMPULSE RESPONSES
1856Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks
3589TOWARDS AUTOMATIC DATA AUGMENTATION FOR DISORDERED SPEECH RECOGNITION
7617Towards Building the FederatedGPT: Federated Instruction Tuning
1027TOWARDS CONTROLLED TABLE-TO-TEXT GENERATION WITH SCIENTIFIC REASONING
9383TOWARDS DISEASE-AWARE SELF-SUPERVISED DYNAMIC BRAIN NETWORK LEARNING FOR MENTAL DIAGNOSIS
2336Towards Efficient Modeling and Inference in Multi-Dimensional Gaussian Process State-Space Models
9901TOWARDS ENABLING DPOAE ESTIMATION ON SINGLE-SPEAKER EARBUDS
4314TOWARDS END-TO-END SPOKEN GRAMMATICAL ERROR CORRECTION
7014TOWARDS FASTER END-TO-END DATA TRANSMISSION OVER VOICE CHANNELS
2360Towards Generic Deepfake Detection with Dynamic Curriculum
8511TOWARDS HIGH RESOLUTION WEATHER MONITORING WITH SOUND DATA
4710TOWARDS HIGH-PERFORMANCE AND LOW-LATENCY FEATURE-BASED SPEAKER ADAPTATION OF CONFORMER SPEECH RECOGNITION SYSTEMS
3631TOWARDS IMPROVING SPEECH EMOTION RECOGNITION USING SYNTHETIC DATA AUGMENTATION FROM EMOTION CONVERSION
6437TOWARDS INTELLIGENT DESIGN: A SELF-DRIVEN FRAMEWORK FOR COLLOCATED CLOTHING SYNTHESIS LEVERAGING FASHION STYLES AND TEXTURES
9502TOWARDS INTERPRETABILITY OF AUTOMATIC PHONEME ANALYSIS IN CLEFT LIP AND PALATE SPEECH
7040Towards Multi-domain Face Landmark Detection with Synthetic Data from Diffusion Model
8875TOWARDS OMNISCIENT FEATURE ALIGNMENT FOR VIDEO RESCALING
4674TOWARDS OPTIMAL VOICE DISENTANGLEMENT WITH WEAK SUPERVISION
4120TOWARDS OPTIMIZED MULTI-CHANNEL MODULO-ADCS: MODULI SELECTION STRATEGIES AND BIT DEPTH ANALYSIS
3107TOWARDS PRACTICAL AND EFFICIENT IMAGE-TO-SPEECH CAPTIONING WITH VISION-LANGUAGE PRE-TRAINING AND MULTI-MODAL TOKENS
2683TOWARDS RESOURCE-EFFICIENT AND SECURE FEDERATED MULTIMEDIA RECOMMENDATION
4755TOWARDS ROBUST MULTIMODAL PROMPTING WITH MISSING MODALITIES
2885TOWARDS UNIVERSAL SPEECH DISCRETE TOKENS: A CASE STUDY FOR ASR AND TTS
6022TOWARDS VIDEO-TEXT RETRIEVAL ADVERSARIAL ATTACK
3370T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image
1725TRACKING BEYOND THE UNAMBIGUOUS RANGE WITH MODULO SINGLE-PHOTON LIDAR
7042TRACKING OF MULTIPLE SPAWNING TARGETS WITH HETEROGENEOUS SENSORS FOR SEABED-TO-SPACE SITUATIONAL AWARENESS
4675TraDeS++: Enhancing Multi-Object Tracking of Real Low Confidence Targets Using a Pyramid-like Self-Attention Model
4360Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing
10441Training a Radial Basis Function Network Under Transformed Probability Measure
2155TRAINING AUDIO CAPTIONING MODELS WITHOUT AUDIO
9291TRAINING GENERATIVE ADVERSARIAL NETWORK-BASED VOCODER WITH LIMITED DATA USING AUGMENTATION-CONDITIONAL DISCRIMINATOR
4206Training Ultra-Low-Latency Spiking Neural Networks from Scratch
4327TRAJECTORY SET EMPOWERED HYPERGRAPH TRANSFORMER FOR MOBILE SENSOR BASED TRAFFIC PREDICTION
1271TRANSAVS: END-TO-END AUDIO-VISUAL SEGMENTATION WITH TRANSFORMER
3828TransCycle: A Data Augmentation Method For 3D Human Pose Estimation
7862Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
9891TRANSENTENCE: SPEECH-TO-SPEECH TRANSLATION VIA LANGUAGE-AGNOSTIC SENTENCE-LEVEL SPEECH ENCODING WITHOUT LANGUAGE-PARALLEL DATA
9048TRANSFER THE LINGUISTIC REPRESENTATIONS FROM TTS TO ACCENT CONVERSION WITH NON-PARALLEL DATA
8649Transferable Models for Bioacoustics with Human Language Supervision
4183Transferring Structure Knowledge: A New Task to Fake news Detection Towards Cold-Start Propagation
2468TRANSFORMER MODEL WITH MULTI-TYPE CLASSIFICATION DECISIONS FOR INTRUSION ATTACK DETECTION OF TRACK TRAFFIC AND VEHICLE
6825TRANSFORMER-INSPIRED LIGHTWEIGHT MODEL FOR EFFICIENT TIME SERIES FORECASTING
7384TRANSFORMING CARDIOVASCULAR HEALTH: A TRANSFORMER-BASED APPROACH TO CONTINUOUS, NON-INVASIVE BLOOD PRESSURE ESTIMATION VIA RADAR SENSING
3880TRANSLATOTRON 3: SPEECH TO SPEECH TRANSLATION WITH MONOLINGUAL DATA
6021TRANSMIT BEAMPATTERN OPTIMIZATION FOR MIMO-ISAC SYSTEMS WITH HYBRID BEAMFORMING
9741TRANSMITTING DATA THROUGH RECONFIGURABLE INTELLIGENT SURFACE: A SPATIAL SIGMA-DELTA MODULATION APPROACH
3480TRANSMUSIC: A TRANSFORMER-AIDED SUBSPACE METHOD FOR DOA ESTIMATION WITH LOW-RESOLUTION ADCS
4296TREE NETWORK DESIGN FOR FASTER DISTRIBUTED MACHINE LEARNING PROCESS WITH DISTRIBUTED DUAL COORDINATE ASCENT
9965TREE OF UNCERTAIN THOUGHTS REASONING FOR LARGE LANGUAGE MODELS
9280TREEMIL: A MULTI-INSTANCE LEARNING FRAMEWORK FOR TIME SERIES ANOMALY DETECTION WITH INEXACT SUPERVISION
1496TREND-HEURISTIC REINFORCEMENT LEARNING FRAMEWORK FOR NEWS-ORIENTED STOCK PORTFOLIO MANAGEMENT
3406TRET: Two Stream-based Regionally Enhanced Transformers for Person Re-identification
1761TRLS: A Time Series Representation Learning Framework via Spectrogram for Medical Signal Processing
2609TRUSTED DEEP DOMAIN ADAPTATION WITH UNCERTAINTY MEASURE BASED ON EVIDENCE THEORY
5312TRUST-SER: ON THE TRUSTWORTHINESS OF FINE-TUNING PRE-TRAINED SPEECH EMBEDDINGS FOR SPEECH EMOTION RECOGNITION
6926T-SOT FNT: STREAMING MULTI-TALKER ASR WITH TEXT-ONLY DOMAIN ADAPTATION CAPABILITY
8109TURN-TAKING AND BACKCHANNEL PREDICTION WITH ACOUSTIC AND LARGE LANGUAGE MODEL FUSION
7797TWO-EDGE-RESOLVED 3D NON-LINE-OF-SIGHT IMAGING: A FISHER INFORMATION EQUALIZED DISCRETIZATION
3833TWO-STAGE ACOUSTIC ECHO CANCELLATION NETWORK WITH DUAL-PATH ALIGNMENT
11861TWO-STAGE NEURAL NETWORK MODEL WITH PACKET LOSS DETECTION FOR ICASSP 2024 PLC CHALLENGE
6370TWO-STAGE TRANSFER LEARNING FOR FUSION AND CLASSIFICATION OF AIRBORNE HYPERSPECTRAL IMAGERY
1997TWO-STEP KNOWLEDGE DISTILLATION FOR TINY SPEECH ENHANCEMENT
4890TYPE-AWARE DECODING VIA EXPLICITLY AGGREGATING EVENT INFORMATION FOR DOCUMENT-LEVEL EVENT EXTRACTION
4081U2R: UNDERWATER ULTRASONIC REFLECTION WAVE DATASET TOWARD POSE-INVARIANT MATERIAL RECOGNITION
2477UAMIX-MAE: EFFICIENT TUNING OF PRETRAINED AUDIO TRANSFORMERS WITH UNSUPERVISED AUDIO MIXTURES
4072UAV Operation Time Minimization for Wireless-Powered Data Collection
8395UAV-based Dynamic Object Tracking with Radio Map
2963ULTRA LOW COMPLEXITY DEEP LEARNING BASED NOISE SUPPRESSION
1760ULTRA-LIGHTWEIGHT NEURAL DIFFERENTIAL DSP VOCODER FOR HIGH QUALITY SPEECH SYNTHESIS
4785ULTRA-LOW DELAY LOSSLESS COMPRESSION OF HIGHER ORDER AMBISONICS
2728UNAD: UNIVERSAL ANATOMY-INITIALIZED NOISE DISTRIBUTION LEARNING FRAMEWORK TOWARDS LOW-DOSE CT DENOISING
5126UNCERTAINTY QUANTIFICATION IN DEEP LEARNING BASED KALMAN FILTERS
7672UNCERTAINTY-GUIDED CONTRASTIVE LEARNING FOR SINGLE SOURCE DOMAIN GENERALISATION
7274Uncertainty-guided Person Search model with Auxiliary Shallow Feature Exploration
8637Uncertainty-Guided Physics-Driven Deep Learning Reconstruction via Cyclic Measurement Consistency
2884UNCOVERING STRONG TIES: A STUDY OF INDIRECT SYBIL ATTACK ON SIGNED SOCIAL NETWORK
3375UNDERLYING-COMPLEMENTARITY AND SURROUNDING-CORRESPONDENCE FOR MULTI-VIEW CLUSTERING
7151UNDERSTANDING DATA AUGMENTATION FROM A ROBUSTNESS PERSPECTIVE
6550Understanding Gaussian Noise Mismatch: A Hellinger Distance Approach
3047UNDERSTANDING PROBE BEHAVIORS THROUGH VARIATIONAL BOUNDS OF MUTUAL INFORMATION
8479UNeC: UNSUPERVISED EXPLORING IN CONTROLLABLE SPACE
8234UNIDEAL: CURRICULUM KNOWLEDGE DISTILLATION FEDERATED LEARNING
4599UNIDIRECTIONAL BRAIN-COMPUTER INTERFACE: ARTIFICIAL NEURAL NETWORK ENCODING NATURAL IMAGES TO fMRI RESPONSE IN THE VISUAL CORTEX
4609Unified Analysis of Correlation-Aware Joint Sparse Support Recovery with l_0-Norm Constraint
1838UNIFIED PRETRAINING TARGET BASED VIDEO-MUSIC RETRIEVAL WITH MUSIC RHYTHM AND VIDEO OPTICAL FLOW INFORMATION
3197UNIFIED PROBABILITY DISTRIBUTIONS OF GENERALIZED COMPOSITE FADING WITH INVERSE-TYPE DISTRIBUTIONS OF LARGE-SCALE SHADOWING/FLUCTUATIONS
7442UNIFIED SPEECH AND GESTURE SYNTHESIS USING FLOW MATCHING
1421Unified sRGB Real Noise Synthesizing with Adaptive Feature Modulation
5098Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations
3504UNIMODAL AGGREGATION FOR CTC-BASED SPEECH RECOGNITION
4238UNINTENDED MEMORIZATION IN LARGE ASR MODELS, AND HOW TO MITIGATE IT
5564UNITARY APPROXIMATE MESSAGE PASSING FOR MATRIX FACTORIZATION
8598UNIT-DSR: DYSARTHRIC SPEECH RECONSTRUCTION SYSTEM USING SPEECH UNIT NORMALIZATION
8854UNIVERSAL ADVERSARIAL ATTACK AGAINST SPEAKER RECOGNITION MODELS
7795UNIX-Encoder: A Universal X-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing
5068Unlabelled Sensing with Priors: Algorithm and Bounds
2057UNLEASHING TRIGGER-FREE EVENT DETECTION: REVEALING EVENT CORRELATIONS VIA A CONTRASTIVE DERANGEMENT FRAMEWORK
9159UNLOCKING DEEP LEARNING: A BP-FREE APPROACH FOR PARALLEL BLOCK-WISE TRAINING OF NEURAL NETWORKS
2415UNRAVEL ANOMALIES: AN END-TO-END SEASONAL-TREND DECOMPOSITION APPROACH FOR TIME SERIES ANOMALY DETECTION
5892UNRAVELING EXPLAINABLE REINFORCEMENT LEARNING USING BEHAVIOR TREE STRUCTURES
7152UNRESTRICTED GLOBAL-PHASE-BIAS AWARE SINGLE-CHANNEL SPEECH ENHANCEMENT WITH CONFORMER-BASED METRIC GAN
9684UNROLLED PROXIMAL GRADIENT DESCENT METHOD FOR NON-NEGATIVE LEAST SQUARES PROBLEM
2242UNSUPERVISED ACCENT ADAPTATION THROUGH MASKED LANGUAGE MODEL CORRECTION OF DISCRETE SELF-SUPERVISED SPEECH UNITS
2311UNSUPERVISED ACOUSTIC SCENE MAPPING BASED ON ACOUSTIC FEATURES AND DIMENSIONALITY REDUCTION
6532UNSUPERVISED ANOMALY DETECTION FOR MULTIVARIATE TIME SERIES USING DIFFUSION MODEL
1115UNSUPERVISED CONTINUAL LEARNING OF IMAGE REPRESENTATION VIA REMEMORY-BASED SIMSIAM
1219UNSUPERVISED DISPARITY ESTIMATION FOR LIGHT FIELD VIDEOS
8874UNSUPERVISED EXTRACTIVE DIALOGUE SUMMARIZATION IN HYPERDIMENSIONAL SPACE
7726UNSUPERVISED HARMONIC PARAMETER ESTIMATION USING DIFFERENTIABLE DSP AND SPECTRAL OPTIMAL TRANSPORT
5007UNSUPERVISED HUMAN ACTIVITY RECOGNITION VIA LARGE LANGUAGE MODELS AND ITERATIVE EVOLUTION
2887UNSUPERVISED LEARNING BASED END-TO-END DELAYLESS GENERATIVE FIXED-FILTER ACTIVE NOISE CONTROL
5764UNSUPERVISED LEARNING OF FACIAL OPTICAL FLOW VIA OCCLUSION-AWARE GLOBAL-LOCAL MATCHING
1292UNSUPERVISED LEARNING OF NEURAL SEMANTIC MAPPINGS WITH THE HUNGARIAN ALGORITHM FOR COMPOSITIONAL SEMANTICS
4323UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION
4040UNSUPERVISED MULTI-DOMAIN DATA SELECTION FOR ASR FINE-TUNING
7324Unsupervised Multiple Choices Question Answering via Universal Corpus
7920UNSUPERVISED MULTIPLE DOMAIN TRANSLATION THROUGH CONTROLLED DISENTANGLEMENT IN VARIATIONAL AUTOENCODER
7497UNSUPERVISED OPTIMAL POWER FLOW USING GRAPH NEURAL NETWORKS
7192UNSUPERVISED PITCH-TIMBRE DISENTANGLEMENT OF MUSICAL INSTRUMENTS USING A JACOBIAN DISENTANGLED SEQUENTIAL AUTOENCODER
11885UNSUPERVISED RELAPSE DETECTION USING WEARABLE-BASED DIGITAL PHENOTYPING FOR THE 2ND E-PREVENTION CHALLENGE
8620UNSUPERVISED REMOTE SENSING HAZE REMOVAL BASED ON SALIENCY-GUIDED TRANSMISSION REFINEMENT
9017Unsupervised Speech Enhancement with Diffusion-based Generative Models
4604UNSUPERVISED SPEECH RECOGNITION WITH N-SKIPGRAM AND POSITIONAL UNIGRAM MATCHING
5793UNSUPERVISED TOPIC-CONDITIONAL EXTRACTIVE SUMMARIZATION
7532UPDATED CORPORA AND BENCHMARKS FOR LONG-FORM SPEECH RECOGNITION
9637UPLINK SYMBOL DETECTION IN DYNAMIC TDD MIMO SYSTEMS WITH AP-AP INTERFERENCE
2781URBAN TRAFFIC FLOW FORECASTING BASED ON SPATIAL-TEMPORAL GRAPH CONTRASTIVE LEARNING
8183USEE: UNIFIED SPEECH ENHANCEMENT AND EDITING WITH CONDITIONAL DIFFUSION MODELS
3319USER-ASSISTED NETWORKED SENSING IN OFDM CELLULAR NETWORK WITH ERRONEOUS ANCHOR POSITION INFORMATION
7047USING CLUSTERING TO IMPROVE THE PERFORMANCE OF FEW-SHOT LEARNING
1628Using Temporal Consistency for Compressed Sensing in High-Resolution mmWave Sounding
4257USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
7380USM-SCD: MULTILINGUAL SPEAKER CHANGE DETECTION BASED ON LARGE PRETRAINED FOUNDATION MODELS
11490Utility-driven Joint Caching and Bitrate Allocation for Real-Time Immersive Videos
8342UTILIZING SECOND-ORDER INFORMATION IN NOISY INFORMATION-SHARING ENVIRONMENTS FOR DISTRIBUTED OPTIMIZATION
11452Variable-Wise Diagonal Preconditioning for Primal-Dual Splitting: Design and Applications
7846Variance Reduction Can Improve Trade-off In Multi-Objective Learning
7744VARIATIONAL ANALYSIS OF ADVERSARIAL REGULARIZATION FOR SOLVING INVERSE PROBLEMS
6016VARIATIONAL CONNECTIONIST TEMPORAL CLASSIFICATION FOR ORDER-PRESERVING SEQUENCE MODELING
7864VCD: A Video Conferencing Dataset for Video Compression
9250V-DDPM: MRI RICIAN NOISE REMOVAL MODEL BASED ON VST AND DDPM
7681VECTOR APPROXIMATE MESSAGE PASSING FOR NOT SO LARGE N.I.I.D. GENERALIZED I/O LINEAR MODELS
6422VECTOR APPROXIMATE MESSAGE PASSING WITH ARBITRARY I.I.D. NOISE PRIORS
2667VECTOR NONLINEAR HAWKES MODEL WITH INHIBITION
9008VECTOR QUANTIZATION KNOWLEDGE TRANSFER FOR END-TO-END TEXT IMAGE MACHINE TRANSLATION
8172VFD-NET: VOCODER FINGERPRINTS DETECTION FOR FAKE AUDIO
1688VGDIFFZERO: TEXT-TO-IMAGE DIFFUSION MODELS CAN BE ZERO-SHOT VISUAL GROUNDERS
8284VIC-KD: VARIANCE-INVARIANCE-COVARIANCE KNOWLEDGE DISTILLATION TO MAKE KEYWORD SPOTTING MORE ROBUST AGAINST ADVERSARIAL ATTACKS
7215VIDEO ANOMALY PREDICTION: PROBLEM, DATASET AND METHOD
3720Video-language Graph Convolutional Network for Human Action Recognition
2274View Crafting for Instance-Level Representation from Scene Images
3160VIEWING WRITING AS VIDEO: OPTICAL FLOW BASED MULTI-MODAL HANDWRITTEN MATHEMATICAL EXPRESSION RECOGNITION
6122VILAS: EXPLORING THE EFFECTS OF VISION AND LANGUAGE CONTEXT IN AUTOMATIC SPEECH RECOGNITION
11554VIRTUAL BASS ENHANCEMENT VIA MUSIC DEMIXING
11888Vision Transformer MST++: Efficient Hyperspectral Skin Reconstruction
9808VISION TRANSFORMER WITH 2D EXPLICIT POSITION ENCODING
5048VISION-SENSOR ATTENTION BASED CONTINUAL MULTIMODAL EGOCENTRIC ACTIVITY RECOGNITION
2752VISUAL ADAPT FOR RGBD TRACKING
1891VISUAL PROMPT TUNING FOR WEAKLY SUPERVISED PHRASE GROUNDING
3123VISUAL SPEECH RECOGNITION FOR LANGUAGES WITH LIMITED LABELED DATA USING AUTOMATIC LABELS FROM WHISPER
5065VISUAL-LINGUISTIC REPRESENTATION LEARNING WITH DEEP CROSS-MODALITY FUSION FOR REFERRING MULTI-OBJECT TRACKING
2640VISUALLY DEHALLUCINATIVE INSTRUCTION GENERATION
3581Visually Guided Binaural Audio Generation with Cross-modal Consistency
1530VK-G2T: VISION AND CONTEXT KNOWLEDGE ENHANCED GLOSS2TEXT
7024VL-FAS: DOMAIN GENERALIZATION VIA VISION-LANGUAGE MODEL FOR FACE ANTI-SPOOFING
3892VMCC-NET: UNCOVERING CHALLENGING REGIONS IN SEMI-SUPERVISED MEDICAL IMAGE SEGMENTATION WITH VOXEL MASK BASED CYCLIC-CONSISTENCY NETWORK
1836VOCAL FOLD DYNAMICS FOR AUTOMATIC DETECTION OF AMYOTROPHIC LATERAL SCLEROSIS FROM VOICE
7164VOICE ANONYMIZATION FOR ALL - BIAS EVALUATION OF THE VOICE PRIVACY CHALLENGE BASELINE SYSTEMS
2004VOICE TOXICITY DETECTION USING MULTI-TASK LEARNING
5036VOICEFLOW: EFFICIENT TEXT-TO-SPEECH WITH RECTIFIED FLOW MATCHING
9312VoiceLDM: Text-to-Speech with Environmental Context
7606VOLUMETRIC 3D POINT CLOUD ATTRIBUTE COMPRESSION: LEARNED POLYNOMIAL BILATERAL FILTER FOR PREDICTION
2345VoxBlink: A Large Scale Speaker Verification Dataset on Camera
9267VOXMM: RICH TRANSCRIPTION OF CONVERSATIONS IN THE WILD
7874VOXTLM: UNIFIED DECODER-ONLY MODELS FOR CONSOLIDATING SPEECH RECOGNITION, SYNTHESIS AND SPEECH, TEXT CONTINUATION TASKS
3771VRDMG: VOCAL RESTORATION VIA DIFFUSION POSTERIOR SAMPLING WITH MULTIPLE GUIDANCE
3371VT-REID: LEARNING DISCRIMINATIVE VISUAL-TEXT REPRESENTATION FOR POLYP RE-IDENTIFICATION
7958Vulnerability of Face Age Verification to Replay Attacks
5306WATER LEAK DETECTION VIA DOMAIN ADAPTATION
3576WATERDIFF: PERCEPTUAL IMAGE WATERMARKS VIA DIFFUSION MODEL
2545WAV2VEC-VC: VOICE CONVERSION VIA HIDDEN REPRESENTATIONS OF WAV2VEC 2.0
8500WAVELET-DECOUPLING CONTRASTIVE ENHANCEMENT NETWORK FOR FINE-GRAINED SKELETON-BASED ACTION RECOGNITION
10071Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing
3081WAVELET-INSPIRED MULTISCALE GRAPH CONVOLUTIONAL RECURRENT NETWORK FOR TRAFFIC FORECASTING
2695WAVER: WRITING-STYLE AGNOSTIC TEXT-VIDEO RETRIEVAL VIA DISTILLING VISION-LANGUAGE MODELS THROUGH OPEN-VOCABULARY KNOWLEDGE
2063Weakly Semi-supervised Tool Detection in Minimally Invasive Surgery Videos
2758Weakly Supervised Few-Shot Segmentation through Textual Prompt
8891WEAKLY-SUPERVISED CROWD COUNTING WITH TOKEN ATTENTION AND FUSION: A SIMPLE AND EFFECTIVE BASELINE
3945WFTNET: EXPLOITING GLOBAL AND LOCAL PERIODICITY IN LONG-TERM TIME SERIES FORECASTING
2938WHAT DO NEURAL NETWORKS LISTEN TO? EXPLORING THE CRUCIAL BANDS IN SPEECH ENHANCEMENT USING SINC-CONVOLUTION
2043WHAT DO SELF-SUPERVISED SPEECH AND SPEAKER MODELS LEARN? NEW FINDINGS FROM A CROSS MODEL LAYER-WISE ANALYSIS
9672WHEN GREEN LEARNING MEETS FEDERATED LEARNING: TOWARD DISTRIBUTED LEARNING WITH LOW COMPLEXITY AND MODEL HETEROGENEITY
8965WHEN TRAINING-FREE NAS MEETS VISION TRANSFORMERS: A NEURAL TANGENT KERNEL PERSPECTIVE
9880WHICH IS THE BETTER TEACHER ACTION? A NEW RANKING MODEL AND DATASET
5374WHISPER-BASED TRANSFER LEARNING FOR ALZHEIMER DISEASE CLASSIFICATION: LEVERAGING SPEECH SEGMENTS WITH FULL TRANSCRIPTS AS PROMPTS
11487WHY DO ANGULAR MARGIN LOSSES WORK WELL FOR SEMI-SUPERVISED ANOMALOUS SOUND DETECTION?
1117Widrow-Hoff LMS Adaline Demonstrator for Schools and Colleges
7777Wi-Fi based Indoor Monitoring enhanced by Multimodal Fusion
8225WiFiAct: Enhancing Human Sensing Through Environment Robust Preprocessing and Bayesian Self-Supervised Learning
7824WiGig-based Joint Multi-Person Positioning and Respiration Sensing
1657Window-based Convolutional Sparse Coding: Towards A Unified Framework
5080X-CAUNET: CROSS-COLOR CHANNEL ATTENTION WITH UNDERWATER IMAGE-ENHANCING TRANSFORMER
11875XIMALAYA ASDR SYSTEM FOR ICASSP 2024 IN-CAR MULTI-CHANNEL (ICMC) ASR CHALLENGE
2433XMP: A Cross-Attention Multi-Scale Performer for File Fragment Classification
8596YOLO-MED: MULTI-TASK INTERACTION NETWORK FOR BIOMEDICAL IMAGES
5836ZE-FESG: A ZERO-SHOT FEATURE EXTRACTION METHOD BASED ON SEMANTIC GUIDANCE FOR NO-REFERENCE VIDEO QUALITY ASSESSMENT
3999ZERO- AND FEW-SHOT SOUND EVENT LOCALIZATION AND DETECTION
1424ZERO RESOURCE CODE-SWITCHED SPEECH BENCHMARK USING SPEECH UTTERANCE PAIRS FOR MULTIPLE SPOKEN LANGUAGES
2740ZERO SHOT AUDIO TO AUDIO EMOTION TRANSFER WITH SPEAKER DISENTANGLEMENT
8138Zero-Shot Co-salient object detection Framework
9568Zero-shot Imitation Policy via Search in Demonstration Dataset
4270ZERO-SHOT INTENT CLASSIFICATION USING A SEMANTIC SIMILARITY AWARE CONTRASTIVE LOSS AND LARGE LANGUAGE MODEL
3262ZERO-SHOT OBJECT DETECTION WITH PARTITIONED CONTRASTIVE FEATURE ALIGNMENT
8834ZIGZAG ATTENTION: A STRUCTURAL AWARE MODULE FOR LANE DETECTION
6709ZIV-ZAKAI BOUND FOR DOA ESTIMATION WITH GAIN-PHASE ERROR