List of Accepted Papers

The following list of accepted ICASSP 2026 papers is sorted by paper title. You can use the search feature of your web browser to find your paper number. Notifications have also been sent to all authors by email. If you have not received your notification by email, please contact us at papers@2026.ieeeicassp.org.

Paper Number Paper Title
5047 $\ell_0$-ESP: EFFICIENT STRUCTURED PRUNING FOR CONVOLUTIONAL NEURAL NETWORK COMPRESSION BASED ON $\ell_0$-NORM OPTIMIZATION
6153 $S^3$: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models
10540 (P)rior(D)yna(F)low: A Priori Dynamic Workflow Construction via Multi-Agent Collaboration
7398 AutoACC: LLM-Driven Irregular Operator Optimization for Inference Acceleration on RISC-V Edge Devices
13205 CAUSAL BLIND SOURCE SEPARATION: UNMIXING MULTIVARIATE SIGNALS BY DISCOVERING THEIR LATENT GENERATIVE GRAPH
19138 1-BIT UNLIMITED SAMPLING BEYOND FOURIER DOMAIN: LOW-RESOLUTION SAMPLING OF QUANTIZATION NOISE
15769 2 IN 1: A DUAL-PURPOSE APPROACH FOR EO-SAR SHIP DETECTION WITH SOURCE-FREE DOMAIN ADAPTATION
5997 2025 URGENT SPEECH ENHANCEMENT CHALLENGE MULTILINGUAL P.808 LISTENING TESTS: APPROACH AND RESULTS
11610 2I-Instruct: Generative Joint Empathy Detection and Empathy Intent Classification via Inter-Task and Inter-Instance Interactions
2596 3D MESH GRID ROOM IMPULSE RESPONSES MEASURED WITH A LINEAR MICROPHONE ARRAY AND SUPPRESSION OF FRAME REFLECTIONS
16784 3D MESH STEGANOGRAPHY ALGORITHM BASED ON NON-ADDITIVE DISTORTION MINIMIZATION
14330 3D MOTION SYNTHESIS FROM SPARSE TRACKING WITH AUTOREGRESSIVE TEMPORAL WINDOWS
1630 3D SCENE FLOW RECONSTRUCTION FOR DYNAMIC DEBLURRING WITH BOKEH RENDERING
16854 3D-AWARE SEMANTIC ALIGNMENT: JOINT GLOBAL AND LOCAL MODELING FOR 3D FEW-SHOT ANOMALY DETECTION
11322 3D-Aware Shadow Generation for Composite Image
16864 3DIFFUSIONDET: DIFFUSION MODEL FOR 3D OBJECT DETECTION WITH ROBUST LIDAR-CAMERA FUSION
17458 3DME: DUAL-BRANCH ENCODER WITH PROGRESSIVE MASKING FOR 3D MEDICAL FOUNDATION ENCODING MODEL
7748 3GeM Pooling: Direction-Aware and Compact Global Descriptors for Visual Place Recognition
18432 3-KEY-INPUT: EXPLORING THE THEORETICAL MINIMUM KEYS FOR TEXT ENTRY
12641 A BAYESIAN APPROACH TO SINGING SKILL EVALUATION USING SEMITONE PITCH HISTOGRAM AND MCMC-BASED GENERATED QUANTITIES
4027 A BENCHMARK DATASET AND BASELINE FRAMEWORK FOR ACTION RECOGNITION IN POWER CONSTRUCTION SAFETY
13360 A Benchmark for Joint Dialogue Satisfaction, Emotion Recognition, and Emotion State Transition Prediction
12535 A BIMODAL APPROACH FOR DETECTING FATIGUE USING SPEECH AND PERSONAL ASSESSMENTS IN COLLEGE STUDENTS
17989 A Broadband Unit-Circle MVDR Beamformer with Spatial Adaptive Canceller
13692 A CENTRALIZED PLANNING WITH DECENTRALIZED EXECUTION FRAMEWORK FOR COUNTER-UAV OPERATIONS IN URBAN ENVIRONMENTS
12538 A Class of Finitely LMI Representable Worst-Case SINR Maximization Problems of Robust Adaptive Beamforming for General-Rank Signal Models
6787 A COASTAL WIND SPEED RECONSTRUCTION ALGORITHM INTEGRATING LIMITED BUOY OBSERVATIONS AND LAYERED REFINEMENT
5990 A COCKTAIL-PARTY BENCHMARK: MULTI-MODAL DATASET AND COMPARATIVE EVALUATION RESULTS
6227 A COMPARATIVE STUDY ON HOW DATA NORMALIZATION AFFECTS ZERO-SHOT GENERALIZATION IN TIME SERIES FOUNDATION MODELS
9885 A Comprehensive Benchmark for Evaluating Video Colorization and Color Propagation Methods
9587 A Comprehensive Ecosystem for Open-Domain Customized Video Generation
18898 A COMPREHENSIVE GUIDE TO MULTISET CANONICAL CORRELATION ANALYSIS AND ITS APPLICATION TO JOINT BLIND SOURCE SEPARATION
16901 A Conflict-Free SpDMM Accelerator for GCN Inference on FPGA
17419 A CONSISTENT LEARNING DEPRESSION DETECTION FRAMEWORK INTEGRATING MULTI-VIEW ATTENTION
8036 A CONVERSATIONAL ENTITY LINKING METHOD BASED ON SENTENCE LEVEL AND TOKEN LEVEL DUAL EVALUATION
12657 A CONVEX DEMIXING APPROACH FOR HYBRID-FIELD CHANNEL ESTIMATION OF XL-MIMO SYSTEMS VIA ATOMIC NORM MINIMIZATION
6197 A DATA DRIVEN DESIGN FOR OPTIMAL SAMPLED SYNCHRONIZATION OF CHAOTIC SYSTEMS
14137 A Data-Centric Framework for Scientific Natural Language Inference via LLM-Driven Information-Theoretic Augmentation
8493 A Data-Driven Framework for Personal Sound Zone Control Addressing Loudspeaker Nonlinearities
12572 A Data-Informed Adaptive Convolution Kernel Learning Method for Image Fusion
14313 A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks
17728 A DECOMPOSITION-BASED STATE SPACE MODEL FOR MULTIVARIATE TIME-SERIES FORECASTING
19143 A Deep Generative Model for Five-Class Sleep Staging with Arbitrary Sensor Input
12174 A DEEP LEARNING-BASED APPROACH TO TRAFFIC ACCIDENT EVIDENCE EXTRACTION
17585 A DISCRETE WAVELET TRANSFORM-BASED LIGHTWEIGHT TRANSFORMER MODEL FOR INTELLIGENT FAULT DIAGNOSIS
3513 A DISTRIBUTION MATCHING APPROACH TO NEURAL PIANO TRANSCRIPTION WITH OPTIMAL TRANSPORT
17290 A DUAL-BRANCH FRAMEWORK FOR SEMANTIC CHANGE DETECTION WITH BOUNDARY AND TEMPORAL AWARENESS
4204 A DUAL-CHANNEL ASR-LLM ARCHITECTURE WITH A PROGRESSIVE TRAINING STRATEGY FOR LOW-RESOURCE SPEECH RECOGNITION
9650 A DUAL-CONTEXT FUSION MODEL FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS
14453 A DUAL-MODULATION FRAMEWORK FOR RGB-T CROWD COUNTING VIA SPATIALLY MODULATED ATTENTION AND ADAPTIVE FUSION
6280 A DUAL-PATH APPROACH TO OPTIMIZING LLMS: ENTROPY CONSTRAINT FOR EXPLOITATION AND NEURAL PERTURBATION FOR EXPLORATION
2777 A DUAL-PATH MAMBA WITH FIXED AND VARIABLE PATCHES FOR TIME SERIES FORECASTING
15436 A Dynamic Dual-Backbone Model for Adaptive Vehicle-Pedestrian Detection
18000 A Dynamic Gated Cross-Attention Framework for Audio-Text Apparent Personality Analysis
18866 A FAST ALGORITHM FOR COMPUTATION OF GENERAL INTEGER-ORDER HANKEL TRANSFORMS
9493 A FEATURE-OPTIMIZED AUDIO WATERMARKING ALGORITHM WITH ADAPTIVE EMBEDDING STRENGTH
9577 A FINE-GRAINED MASK-GUIDED MULTIMODAL FRAMEWORK FOR WEAKLY SUPERVISED INSTANCE SEGMENTATION OF ROCK MICROGRAPHS
4341 A FINE-GRAINED MODALITY ALIGNMENT MODEL FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS
16172 A FRAMEWORK FOR BIPARTITE GRAPH STRUCTURE LEARNING THROUGH EIGENVECTOR PARTITIONING
14631 A FRAMEWORK FOR CONTROLLED MULTI-SPEAKER AUDIO SYNTHESIS FOR ROBUSTNESS EVALUATION OF SPEAKER DIARISATION SYSTEMS
11990 A FRAMEWORK FOR TEXT-TO-SEMANTIC SEGMENTATION MAP GENERATION
9009 A GAME-THEORETIC APPROACH FOR DISTRIBUTED MEC-ENABLED COLLABORATIVE INFERENCE IN AIGC NETWORKS
15117 A GENERALIZATION STRATEGY FOR SPEECH QUALITY PREDICTION: FROM DOMAIN-SPECIFIC TO UNIFIED DATASETS
14537 A Generative Model for Controllable Feature Heterophily in Graphs
14495 A GENERATIVE-FIRST NEURAL AUDIO AUTOENCODER
17060 A Graph-Based Framework for Detecting Small Noisy Targets: Theory and Analysis
1874 A High Performance Hardware Accelerator For Fully Homomorphic Encryption and Application to Neural Networks
16402 A Hybrid Convolution-Mamba Network With Tone-Octave Contrastive Learning For Stratified Semi-supervised Singing Melody Extraction
11874 A HYBRID GRID-BASED METHOD FOR VIDEO REPRESENTATION
14715 A JOINT SPATIAL TIME-FREQUENCY ATTENTION FOR LEAKAGE DETECTION IN WATER DISTRIBUTION NETWORKS
12155 A KEYWORD QUERY SYSTEM FOR NON-PUBLIC DATABASE SCHEMAS BASED ON SEMANTIC-ENHANCED INVERTED INDEXING
12707 A Latent Drift-Guided Replay Method for Robust Continual Learning in Medical Imaging
5364 A LEARNING-BASED AUTOMOTIVE SOUND FIELD REPRODUCTION METHOD USING PLANE-WAVE DECOMPOSITION AND MULTI-POSITION CONSTRAINT
11730 A LIGHTWEIGHT FOURIER-BASED NETWORK FOR BINAURAL SPEECH ENHANCEMENT WITH SPATIAL CUE PRESERVATION
11236 A LIGHTWEIGHT NETWORK WITH ADAPTIVE CONTEXT AND FREQUENCY-SPATIAL SYNERGY FOR HUMAN POSE ESTIMATION
18683 A LIGHT-WEIGHT PRNU-BASED CAMERA-DEVICE AUTHENTICATION BASED ON DEVICE-SPECIFIC IMAGE DOWNSAMPLING
16437 A LIGHTWEIGHT SEMANTIC SEGMENTATION SYSTEM FOR 3D MEDICAL IMAGE
16847 A LLM-Driven Acoustic Semantic Enriched Framework For Underwater Acoustic Target Recognition
5025 A long-form single-speaker real-time MRI speech dataset and benchmark
9994 A LOW-COMPLEXITY EQUALIZER DESIGN FOR OTFS MODULATION IN DOUBLY-DISPERSIVE CHANNELS
6542 A Low-Rank Angular Domain Sampling SVD Approximation for Massive MIMO Signal Processing
1848 A Malicious Policy Detection Approach Enhanced by Threat Knowledge in LLM-Based Embodied Robots
15620 A MARITIME SMALL TARGET DETECTION METHOD USING RANDOM FOREST WITH KMD FEATURE ENHANCEMENT AND COST-SENSITIVE LEARNING
16114 A Memory-Augmented Dual-Stream Framework to achieve Long-Horizon Generalization in Robotic Manipulation
9876 A mixed precision FFT with applications in MRI
7079 A MODEL-HETEROGENEOUS FEDERATED UNLEARNING METHOD VIA NEGATIVE KNOWLEDGE DISTILLATION
12074 A MODIFIED CONCEFT FRAMEWORK WITH OPTIMAL MULTITAPER WEIGHTS FOR ROBUST SYNCHROSQUEEZING
11796 A MODIFIED YOLO WITH DUAL-BRANCH ATTENTION FOR HIGH-ACCURACY DETECTION OF CORN SEEDLINGS IN UAV IMAGES
18916 A MOUSE DYNAMICS AUTHENTICATION SYSTEM WITH A RECURRENCE PLOT IMAGE REPRESENTATION AND A VISION TRANSFORMER FRAMEWORK
10952 A MULTI-AGENT SYSTEM FOR ZERO-SHOT CONTROLLABLE IMAGE CAPTIONING
6764 A MULTI-FREQUENCY CONTINUOUS-SHARE TRADING ALGORITHM WITH GARCH AND DEEP REINFORCEMENT LEARNING
13142 A MULTIMODAL DEPTH-AWARE METHOD FOR EMBODIED REFERENCE UNDERSTANDING
9763 A multi-prototypes graph-based clustering algorithm with entropy regularization
1163 A MULTI-ROUND INFERENCE BASED MACHINE READING COMPREHENSION MODEL FOR EMOTION CAUSE PAIR EXTRACTION
6781 A MULTI-SCALE SPATIALLY COLLABORATIVE FREQUENCY-GUIDED NETWORK FOR IMAGE DERAINING
18230 A MULTI-TASK APPROACH TOWARDS ROBUST VIETNAMESE AUDIO-BASED TOXIC SPAN DETECTION
10053 A Multi-View Fusion Framework for Audio-Visual Multi-Speaker Tracking
16598 A Neural Operator for Spatiotemporal Significant Wave Height Prediction Based on Spectral Residual Region Partitioning
10291 A NEW ADAPTIVE HYBRID REPRESENTATION METHOD FOR SINGLE-CELL DATA
6629 A NEW METHOD AND DATASET FOR CLASSROOM TEACHING STAGE SEGMENTATION
11472 A NEW WEIGHT-TYING ARCHITECTURE OF NONNEGATIVE NEURAL NETWORK: CONVERGENT PLUG-AND-PLAY IMAGE RESTORATION BY MONOTONE LIPSCHITZ-GRADIENT DENOISER
13540 A NONITERATIVE PHASE RETRIEVAL CONSIDERING THE ZEROS OF STFT MAGNITUDE
6692 A NON-OVERLAPPING HAWKES MODELING FOR TESTING GRANGER CAUSALITY
18973 A NONPARAMETRIC VARIABLE FORGETTING FACTOR RECURSIVE LEAST-SQUARES ALGORITHM
13000 A NO-REFERENCE SCREEN CONTENT IMAGE QUALITY ASSESSMENT METHOD BASED ON REGIONAL DISTORTION PERCEPTION
10347 A Noval Monte Carlo Gradient Method Based on Meta-learning for Effective Step-size Selection in Active Noise Control
16281 A NOVEL ARBITRARY RECOVERABLE MULTI-IMAGE HIDING ALGORITHM BASED ON POLARIZATION THEORY
10958 A NOVEL ARRAY DESIGN WITH INCREASED DEGREES OF FREEDOM BY EMPLOYING THE VERTICAL MOTION OF UNIFORM CIRCULAR ARRAY
3503 A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech
9932 A NOVEL BAYESIAN EM-LIKE ALGORITHM FOR FAST COMPTON CAMERA IMAGING
10825 A novel intrinsic Cramér-Rao bound for exact Gaussian distribution on Lie groups
5261 A Novel Iterative OTFS Detector based on Local L-MMSE and Global Message Passing
8330 A NOVEL MULTIBEAM TIME-DIVISION ISAC APPROACH WITH ACCURATE SENSING
11194 A NOVEL MULTI-SCALE FEATURE FUSION METHOD FOR REAL-TIME DANGEROUS DRIVING BEHAVIOR DETECTION IN REAL-WORLD DRIVING SCENARIOS
1176 A NOVEL MULTISCALE ORDER-FREQUENCY SPECTRAL CORRELATION ESTIMATOR FOR ANGLE-TIME CYCLOSTATIONARY SIGNALS
2876 A NOVEL SELF-CORRECTING DIRECT POSITION DETERMINATION IN ASYNCHRONOUS SENSOR NETWORKS
17299 A Novel Underwater Integrated Communication and Positioning Algorithm Based on OAM-OFDM
12521 A NUMERICALLY STABLE HOUSEHOLDER-BASED EX-RLS ALGORITHM
14171 A PARAMETER-EFFICIENT MULTI-SCALE CONVOLUTIONAL ADAPTER FOR SYNTHETIC SPEECH DETECTION
13453 A PARAMETRIC POWER MODEL OF UPPER MID-BAND (FR3) BASE STATIONS FOR 6G
9018 A PERSONALIZED FRAMEWORK FOR AUTOMATED AUDIO TUNING ON SHORT-FORM VIDEO PLATFORMS
3523 A PERSONALIZED REAL-TIME PROACTIVE VOICE MEMORY ASSISTANT
17078 A PSEUDOINVERSE-BASED MOMENTUM FISTA FOR SPARSE SIGNAL RECOVERY
9499 A Query-based End-to-End Transformer for Third-person Human Gaze Analysis via Joint Fine-tuning Strategy
1032 A Queueing Model for Memory Controller Scheduler Subject to DRAM Column Access Timing Constraints
11458 A RANDOM MATRIX PERSPECTIVE OF ECHO STATE NETWORKS: FROM PRECISE BIAS–VARIANCE CHARACTERIZATION TO OPTIMAL REGULARIZATION
16607 A ROBUST KNN APPROACH FOR MULTI-CLASS LARYNGEAL DISEASE DETECTION USING MFCC FEATURES
10501 A ROBUST METHOD FOR GEAR FAILURE DETECTION AND SEVERITY ESTIMATION BASED ON MULTI-SENSOR PHYSICAL FEATURE FUSION AND DOMAIN ADAPTATION
6089 A SCALED POISSON BAYESIAN MODEL FOR VIRAL EPIDEMIC MONITORING
10207 A SIMILARITY-GUIDED AGGREGATION NETWORK FOR SOUND EVENT LOCALIZATION AND DETECTION WITH SOURCE DISTANCE ESTIMATION
6190 A SPECTRAL-GUIDED LATENT PHYSICS SOLVER FOR PDE PROBLEMS
8048 A SPEECH-DRIVEN PARADIGM FOR PHYSICS-INFORMED MODELING OF COUPLED MICRO-SPEAKERS
10889 A Stabilized Hybrid Active Noise Control Algorithm of GFANC and FxNLMS with Online Clustering
5854 A STAGE-WISE LEARNING STRATEGY WITH FIXED ANCHORS FOR ROBUST SPEAKER VERIFICATION
13295 A State-Dependent Markov Diffusion Process for Generative Speech Enhancement
13120 A STUDY OF DATA SELECTION STRATEGIES FOR PRE-TRAINING SELF-SUPERVISED SPEECH MODELS
15434 A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection
2629 A SUPPORT VECTOR APPROACH IN SEGMENTED REGRESSION FOR MAP-ASSISTED NON-COOPERATIVE SOURCE LOCALIZATION
7202 A TASK-AWARE DUAL-LEVEL SELF-SUPERVISED LEARNING METHOD FOR EFFECTIVE SOUND EVENT DETECTION
5888 A TEXT-TO-TEXT ALIGNMENT ALGORITHM FOR BETTER EVALUATION OF MODERN SPEECH RECOGNITION SYSTEMS
11136 A Training-Free Framework for High-Fidelity Appearance Transfer via Diffusion Transformers
9484 A TWO-PHASE HYBRID TASK SCHEDULING ALGORITHM WITH ROUTE PLANNING
14911 A Two-Stage Globally-Diverse Adversarial Attack for Vision-Language Pre-training Models
5558 A Unified Four-Stage Dynamic Cycle for Robust Federated Fine-Tuning of Large Language Models
12990 A UNIFIED HARDWARE ACCELERATOR FOR PRIVACY PRESERVING LLMS CLIENT-SIDE BASED ON CKKS HOMOMORPHIC ENCRYPTION
10380 a Unified Rate Control Method for Spinning and Non-Spinning LiDAR Point Cloud Compression
4067 A Unified SVD-Modal Solution for Sparse Sound Field Reconstruction with Hybrid Spherical-Linear Microphone Arrays
3844 A UNITARY QUANTUM PROCESS TOMOGRAPHY METHOD BASED ON A DENSITY MATRIX DIAGONALIZATION
16472 A UNSUPERVISED DOMAIN ADAPTATION FRAMEWORK FOR SEMI-SUPERVISED MELODY EXTRACTION USING CONFIDENCE MATRIX REPLACE AND NEAREST NEIGHBOUR SUPERVISION
1768 A User-Item Aware Encoding Framework for Short Video
15722 A WAVELET-BASED GRAPH DYNAMICAL CONFIDENCE INFORMATION BOTTLENECK NETWORK FOR CLASS-IMBALANCED NODE CLASSIFICATION
13865 A Wavelet-Based Network with Multi-Scale Feature Complementarity Enhancement for Salient Object Detection in Optical Remote Sensing Images
14895 A WAVELET–QUATERNION NEURAL MODULE FOR UNIVERSAL VISUAL BACKBONES
5128 A3D: ADVANCED ADVERSARIAL ATTACK AS DETECTION FRAMEWORK FOR EDGE DEVICES
14511 ABC-EVAL: BENCHMARKING LARGE LANGUAGE MODELS ON SYMBOLIC MUSIC UNDERSTANDING AND INSTRUCTION FOLLOWING
17318 ABRACADDBRA: TOUCH-GUIDED OBJECT ADDITION BY DECOUPLING PLACEMENT AND EDITING SUBTASKS
17197 ABS-HUNET: AN ULTRA-LIGHTWEIGHT SPEECH ENHANCEMENT MODEL WITH ADAPTIVE BAND-SPLIT AND HALF-UNET DESIGN
16546 ACAVCAPS: ENABLING LARGE-SCALE TRAINING FOR FINE-GRAINED AND DIVERSE AUDIO UNDERSTANDING
17182 Accelerated Approximate Message Passing
11289 Accelerated Sinkhorn Algorithms for Partial Optimal Transport
12336 Accelerated training of Gaussian processes using banded square exponential covariances
13081 ACCELERATING 3D GAUSSIAN SPLATTING VIA WAVELET-GUIDED SCHEDULING
14708 ACCELERATING FEDERATED LEARNING THROUGH DROPOUT OF RENEWABLE NEURON PARAMETERS
3280 ACCELERATING KBQA VIA LOGICAL-QUESTION BIDIRECTIONAL RERANKING
18106 ACCELERATING VEHICULAR FEDERATED LEARNING VIA CONVERGENCE-AWARE HIERARCHICAL SCHEDULING
3563 ACCELGS: AN ACCELERATION FRAMEWORK FOR LARGE-SCALE 3D GAUSSIAN SPLATTING TRAINING
16467 ACCENT-INVARIANT AUTOMATIC SPEECH RECOGNITION VIA SALIENCY-DRIVEN SPECTROGRAM MASKING
15083 ACCEPTANCE-GUIDED ADAPTIVE SPECULATIVE DECODING FOR EFFICIENT LARGE LANGUAGE MODEL INFERENCE
14448 ACCLID: ACCENT-AWARE LANGUAGE IDENTIFICATION FOR ROBUST MULTILINGUAL SPEECH RECOGNITION
1226 ACD-CLIP: DECOUPLING REPRESENTATION AND DYNAMIC FUSION FOR ZERO-SHOT ANOMALY DETECTION
3084 Achieving Linear Speed-Up for Distributed Inexact-ADMM
6230 ACHIEVING PARETO OPTIMALITY IN GAMES VIA SINGLE-BIT FEEDBACK
9307 ACIR-MACL: EFFECTIVE MULTIMODAL SENTIMENT ANALYSIS VIA ATTENTION-BASED CAUSAL INTERVENTION REGULARIZATION AND MULTI-ASPECT CONTRASTIVE LEARNING
16264 ACM: MULTIPLE ATTRIBUTES CONTRASTIVE MECHANISM FOR VALUE DECOMPOSITION IN MULTI-AGENT REINFORCEMENT LEARNING
15340 ACOUSTIC AND FACIAL MARKERS OF PERCEIVED CONVERSATIONAL SUCCESS IN SPONTANEOUS SPEECH
17105 ACOUSTIC FEEDBACK CANCELLATION IN HEARING AIDS EXPLOITING AN INERTIAL SENSOR
1182 ACOUSTIC NON-STATIONARITY OBJECTIVE ASSESSMENT WITH HARD LABEL CRITERIA FOR SUPERVISED LEARNING MODELS
19118 Acoustic Prompt Tuning: Empowering Large Language Models With Audition Capabilities
14026 ACOUSTIC TELEPORTATION VIA DISENTANGLED NEURAL AUDIO CODEC REPRESENTATIONS
3260 ACTION-AWARE QUERY SELECTION AND AMBIGUOUS SNIPPET DISAMBIGUATION FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION
7518 ActionHSMR: Sequence-based 3D Human Pose and Mesh Estimation with Temporal Consistency
12678 ACTIVE INFERENCE FRAMEWORK FOR CLOSED-LOOP SENSING, COMMUNICATION, AND CONTROL IN UAV SYSTEMS
14428 ACTIVE JAMMER LOCALIZATION VIA ACQUISITION-AWARE PATH PLANNING
12135 ACTIVE SENSING BASED BEAM ALIGNMENT FOR BACKSCATTER COMMUNICATION
7206 ACTIVE SEQUENTIAL HYPOTHESIS TESTING WITH NON-HOMOGENEOUS COSTS
4719 ACTIVE-EDIT: HIGH-FIDELITY 3D EDITING FROM A HANDFUL OF TASK-RELEVANT
2507 ACTIVEPARAM: SELECTIVE PARAMETERIZATION FOR EFFICIENT AND ROBUST RETRIEVAL-AUGMENTED GENERATION
2815 ACTIVITY RECOGNITION USING INAUDIBLE ACOUSTIC FMCW
12383 ADAEVOL: DYNAMIC ADAPTER MERGING FOR EFFECTIVE CONTINUAL LEARNING AND KNOWLEDGE TRANSFER IN LARGE LANGUAGE MODELS
11440 ADAFLOW: EFFICIENT LONG VIDEO EDITING VIA ADAPTIVE ATTENTION SLIMMING AND KEYFRAME SELECTION
4334 AdaNODEs: Test Time Adaptation for Time Series Forecasting Using Neural ODEs
6768 AdaParse: A Structured Lipschitz Regularization Framework for Robust Reinforcement Learning
3172 AdaPrune: A Two-Stage Filter-Select Methods for Visual Token Pruning in Specialized VLMs
12619 ADAPTER-STATE SHARING CLIP FOR PARAMETER-EFFICIENT MULTIMODAL SARCASM DETECTION
14046 ADAPTING DIARIZATION-CONDITIONED WHISPER FOR END-TO-END MULTI-TALKER SPEECH RECOGNITION
14392 ADAPTING WHISPER FOR PADDING-FREE INFERENCE USING AN ENCODER ATTENTION MASK AND KNOWLEDGE DISTILLATION
11235 Adaptive and Balanced Re-initialization for Long-timescale Continual Test-time Domain Adaptation
6151 Adaptive Closed-Form DOA Estimation in the Spherical Harmonics Domain
3105 ADAPTIVE COMPRESSED INTEGRATE-AND-FIRE TIME ENCODING MACHINE
10761 Adaptive Defense against Stationary Test-Time Attacks on Classifiers
15571 ADAPTIVE DETERMINISTIC FLOW MATCHING FOR TARGET SPEAKER EXTRACTION
17454 ADAPTIVE DISTILLATION FOR LM-GNN ALIGNMENT IN SEMI-SUPERVISED TEXT-ATTRIBUTED GRAPH NODE CLASSIFICATION
3680 ADAPTIVE EMBEDDING FUSION WITH CONTRASTIVE LEARNING FOR ROBUST FULLY FEW-SHOT CLASS-INCREMENTAL AUDIO CLASSIFICATION
16632 ADAPTIVE FEW-SHOT CHANNEL STATE INFORMATION PHYSICAL LAYER AUTHENTICATION FOR LEO CONSTELLATIONS
15754 Adaptive Graph Coarsening for Efficient GNN Training
11568 ADAPTIVE GUIDANCE SEMANTICALLY ENHANCED VIA MULTIMODAL LLM FOR EDGE-CLOUD OBJECT DETECTION
6013 ADAPTIVE METAHEURISTIC-OPTIMIZED STOCHASTIC RESONANCE NETWORK FOR DOA ESTIMATION IN LOW-SNR UNDERWATER ENVIRONMENTS
16731 Adaptive Multi-Scale Correlation Meta-Network for Few-Shot Remote Sensing Image Classification
5818 ADAPTIVE PER-CHANNEL ENERGY NORMALIZATION FRONT-END FOR ROBUST AUDIO SIGNAL PROCESSING
3175 ADAPTIVE REPRESENTATION REFINEMENT FOR ROBUST FINE-GRAINED FEW-SHOT IMAGE CLASSIFICATION
11963 Adaptive Retrieval-Augmented Generation via Contrastive Learning on Implicit Feedback
15076 ADAPTIVE RFS TRACKING FOR SWARM UAV-BORNE RADARS USING RANGE-DOPPLER MEASUREMENTS
14016 ADAPTIVE ROTARY STEERING WITH JOINT AUTOREGRESSION FOR ROBUST EXTRACTION OF CLOSELY MOVING SPEAKERS IN DYNAMIC SCENARIOS
6623 ADAPTIVE RUNGE-KUTTA DYNAMICS FOR SPATIOTEMPORAL PREDICTION
4193 Adaptive Score Calibration for Content-Based Image Retrieval
8237 ADAPTIVE SHARED EXPERTS WITH LORA-BASED MIXTURE OF EXPERTS FOR MULTI-TASK LEARNING
14308 ADAPTIVE SPATIAL GOODNESS ENCODING: SCALING THE FORWARD-FORWARD ALGORITHM FOR CONVOLUTIONAL NEURAL NETWORKS
3040 ADAPTIVE SPEAKER EMBEDDING SELF-AUGMENTATION FOR PERSONAL VOICE ACTIVITY DETECTION WITH SHORT ENROLLMENT SPEECH
12021 ADAPTIVE SPECTRAL GRAPH PARTITIONING FOR PORTFOLIO OPTIMISATION
18183 ADAPTIVE SPECTRAL WEIGHTING IN SAGITTAL-PLANE SOUND LOCALIZATION: A RELIABILITY-DRIVEN APPROACH
15336 ADAPTIVE TASK-INCREMENTAL LEARNING FOR UNDERWATER ACOUSTIC RECOGNITION BASED ON MIXTURE-OF-EXPERTS ADAPTER
3774 ADAPTIVE TOPOLOGICAL CONSTRAINT ENHANCED PEDESTRIAN TRAJECTORY PREDICTION
14925 ADAPTIVE VOLUMETRIC VIDEO STREAMING WITH IMAGE-BASED RENDERING
17829 Adaptive Waveform Design for Cognitive FDA Radar Using AT-WWB
11823 ADAPTIVE WORLD MODEL WITH LATENT GENERATION ALGORITHM FOR DEEP REINFORCEMENT LEARNING IN PORTFOLIO OPTIMIZATION
5629 ADAPTIVEDIFFUSEMOTION: ADAPTIVE MULTI-TASK DIFFUSION MODEL FOR SPEECH-DRIVEN HOLISTIC MOTION GENERATION
2125 ADAPTIVELY TAMING ESTIMATION BIAS FOR DEEP REINFORCEMENT LEARNING WITH MULTI-OBJECTIVE OPTIMIZATION
13060 ADAPTIVELY WEIGHTED MULTI-MODAL JOINT ENTROPY WITH DYNAMIC ALLOCATION AND FAULT-TOLERANT FUSION FOR INDUSTRIAL DIAGNOSTICS
13528 ADAPTIVE-VOCO: COMPLEXITY-AWARE VISUAL TOKEN COMPRESSION FOR VISION-LANGUAGE MODELS
15222 AD-DINOv3: Enhancing DINOv3 for Zero-Shot Anomaly Detection with Anomaly-Aware Calibration
13152 ADD-RAG: AGENT-DRIVEN DYNAMIC RAG WITH ADAPTIVE RETRIEVAL STRATEGIES AND MULTI-RETRIEVER COLLABORATION FOR ENHANCED GENERATION
12179 ADDRESSING GRADIENT MISALIGNMENT IN DATA-AUGMENTED TRAINING FOR ROBUST SPEECH DEEPFAKE DETECTION
4310 ADEPT: An Entropy-Driven Dual-Strategy Agent for Interactive Video Retrieval
9747 ADH-VA: ADAPTIVE DIRECTED-HYPERGRAPH CONVOLUTION WITH VA CONTRASTIVE LEARNING FOR MULTIMODAL CONVERSATIONAL EMOTION RECOGNITION
4462 ADORE: ASYMMETRIC RELATIONAL DISTILLATION WITH RERANKING FOR INSTANCE LEVEL IMAGE RETRIEVAL
11682 ADP-NET: AN ASYMMETRIC DUAL-BRANCH NETWORK FOR DIBR HOLE FILLING
14642 ADREC: TRAINING AN AUTONOMOUS DECISION-MAKING RECOMMENDATION AGENT THROUGH BEHAVIOR CLONING
16668 ADVANCED MODELING OF INTERLANGUAGE SPEECH INTELLIGIBILITY BENEFIT WITH L1-L2 MULTI-TASK LEARNING USING DIFFERENTIABLE K-MEANS FOR ACCENT-ROBUST DISCRETE TOKEN-BASED ASR
15857 ADVANCING FINE-GRAINED SENTIMENT ANALYSIS IN COMPLEX CONTEXTS: A NEW BENCHMARK AND INTERPRETATION-ENHANCED APPROACH
13115 ADVANCING LLM-BASED MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION WITH GLOBAL CROSS-CHANNEL ATTENTION AND SENTENCE-ORDERED FIRST-IN FIRST-OUT SERIALIZED OUTPUT TRAINING
16771 Advancing Semi-Supervised Child Speech Recognition with Omni-Temporal Classification under Label Noise
15761 ADVANCING SPEAKER BASED VOCAL EFFORT CLASSIFICATION WITH WAVLM AND DATA AUGMENTATION IN NATURALISTIC NON-CALIBRATED SPEECH RECORDINGS
10976 ADVANCING SPEECH SUMMARIZATION IN MULTI-MODAL LLMS WITH REINFORCEMENT LEARNING
9404 ADVANCING SPEECH UNDERSTANDING IN SPEECH-AWARE LANGUAGE MODELS WITH GRPO
16916 ADVANTAGE-WEIGHTED POLICY LEARNING WITH ADAPTIVE REGULARIZATION FOR OFFLINE REINFORCEMENT LEARNING
10660 ADVERSARIAL CONTRASTIVE RETRIEVAL-AUGMENTED GENERATION
5559 Adversarial Defense via Generative Speech Enhancement Module
10907 Adversarial Detection via Multi-Layer Contrastive Learning and Cross-Layer Stability Analysis
17114 ADVERSARIAL FINE-TUNING ON SPEECH FOUNDATION MODEL WITH VULNERABLE ATTENTION CONSISTENCY REGULARIZATION FOR ROBUST SPEECH RECOGNITION
10884 Adversarial label recovery with Multi-Modal Fusion and Dual-Task Contrastive Learning
14161 Adversarial Learning with a Uniformly Distributed Cost Bound
12308 ADVERSARIAL PROMPT DISTILLATION FOR VISION-LANGUAGE MODELS
3414 ADVERSARIAL RIVALRY LEARNING FOR MUSIC CLASSIFICATION
13708 ADVERSARIAL UPDATE-BASED FEDERATED UNLEARNING FOR POISONED MODEL RECOVERY
14994 ADVERSE EFFECT REMOVAL NETWORK VIA UNSUPERVISED WEATHER TYPE TRANSFER
6015 AEGIS: ENHANCING PROVENANCE-BASED INTRUSION DETECTION SYSTEM WITH LLM-POWERED DEEP SEMANTIC REPRESENTATION
13756 AERIAL VIDEO ACTION RECOGNITION WITH PRETRAINED VISION-LANGUAGE MODEL
10880 AERIS-RTDETR: ULTRASOUND-AWARE REAL-TIME DETECTION WITH ORTHOGONAL ANISO-SCALE BLOCKS AND ECHOGENICITY-GUIDED FUSION
1878 AEROGSPNET: GRAPH SIGNAL PROCESSING FOR MULTI-TASK AERODYNAMIC PREDICTION
2322 AFD-SLU: ADAPTIVE FEATURE DISTILLATION FOR SPOKEN LANGUAGE UNDERSTANDING
9667 AFER: ADAPTIVE FACT SELECTION VIA ENTROPY REDUCTION FOR FACTUAL LONG-FORM GENERATION
12923 AFFECT-JIGSAW: INTEGRATING CORE AND PERIPHERAL EMOTIONS FOR HARMONIOUS FINE-GRAINED MULTIMODAL EMOTION RECOGNITION
1383 Affordance Benchmark for MLLMs
1154 AFFORDANCE OBJECT SWAPPING FOR HAND-OBJECT INTERACTION IMAGES
6569 AFT: AN EXEMPLAR-FREE CLASS INCREMENTAL LEARNING METHOD FOR ENVIRONMENTAL SOUND CLASSIFICATION
14216 AGENT-GSPO: COMMUNICATION-EFFICIENT MULTI-AGENT SYSTEMS VIA GROUP SEQUENCE POLICY OPTIMIZATION
8601 AGFORMER: ADAPTIVE GRAPH TRANSFORMER FOR MULTISPECTRAL AND HYPERSPECTRAL IMAGE FUSION
6933 AG-FUSION: ADAPTIVE GATED MULTIMODAL FUSION FOR 3D OBJECT DETECTION IN COMPLEX SCENES
16793 AGI-CLIP: MULTI-MODAL LLM KNOWLEDGE TRANSFER FOR AI-GENERATED IMAGE QUALITY ASSESSMENT
15872 AGRIDOCTOR: A MULTIMODAL INTELLIGENT ASSISTANT FOR AGRICULTURE
13073 AGRI-MIX: MUTUAL INFORMATION-GUIDED HIERARCHICAL FUSION FOR AGRICULTURAL DISEASE MULTIMODAL RELATION EXTRACTION
12581 AHAI: ADAPTIVE HYBRID-ATTENTION INFERENCE FOR DIFFUSION-BASED ARBITRARY STYLE TRANSFER
13189 AHM-NET: AN ASYMMETRIC HIERARCHICAL MULTI-MODAL FUSION NETWORK FOR ROBUST UAV DETECTION USING RGB AND EVENT DATA
11557 AI-AIDED CONSENSUS KALMAN TRACKING IN PARTIALLY-KNOWN STATE-SPACE MODELS
9681 AIBA-YOLO: Adaptive Information Balance Augmentation YOLO
16218 AI-GENERATED MUSIC DETECTION IN BROADCAST MONITORING
1841 AIMREC: ALIGNING BOTH INDIVIDUALS AND MODALITIES FOR MULTIMODAL RECOMMENDATION
1820 AirGlove: Exploring Egocentric 3D Hand Tracking and Appearance Generalization for Sensing Gloves
8862 AISHELL6-WHISPER: A CHINESE MANDARIN AUDIO-VISUAL WHISPER SPEECH DATASET WITH SPEECH RECOGNITION BASELINES
13247 AITG: AUTOMATING INTENT-ORIENTED TASK GENERATION FOR MOBILE GUI-AGENT
14595 AL-COLE: AUGMENTED LAGRANGIAN FOR CONSTRAINED LEARNING
3419 ALFM: ADAPTIVE LOCAL FEATURE MINING OF VISION-LANGUAGE MODELS FOR OUT-OF-DISTRIBUTION DETECTION
3831 Algebraic Covariance Matrix Reconstruction for Sparse Arrays Using Newton's Identities
5900 ALIGN TO THE PIVOT: DUAL ALIGNMENT WITH SELF-FEEDBACK FOR MULTILINGUAL MATH REASONING
9984 Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
13542 ALIGN3D: PROGRESSIVE DIFFUSION ADAPTATION WITH GEOMETRY-AWARE PROPAGATION FOR CONSISTENT 3D SCENE EDITING
3903 ALIGNCLIP: MINING AND ALIGNING MULTI-SCALE VISION-LANGUAGE FEATURES FOR ZERO-SHOT SEMANTIC SEGMENTATION
6610 ALIGNING GENERATIVE SPEECH ENHANCEMENT WITH PERCEPTUAL FEEDBACK
17252 ALIGNING GEOMETRY REPRESENTATION AND INITIALIZATION IN CLASS-INCREMENTAL SEMANTIC SEGMENTATION
10994 ALIGNING LANGUAGE MODELS FOR LYRIC-TO-MELODY GENERATION WITH RULE-BASED MUSICAL CONSTRAINTS
14678 ALLEVIATING FORGETTING IN CLASS-INCREMENTAL LEARNING VIA IMPLICIT SEMANTIC AUGMENTATION
10687 ALLEVIATING OVERTHINKING IN LARGE REASONING MODELS VIA SELF-ITERATIVE PREFERENCE OPTIMIZATION
16058 ALMA-CHOR: LEVERAGING AUDIO-LYRIC ALIGNMENT WITH MAMBA FOR CHORUS DETECTION
4585 Alternating Balancing Sums for Accurate Low-Power Dot Products
15318 ALTPROJ-MIN: A PROJECTION-BASED ALTERNATING MINIMIZATION ALGORITHM FOR LOW-RANK MATRIX RECOVERY
14514 AMBER²: DUAL AMBIGUITY-AWARE EMOTION RECOGNITION APPLIED TO SPEECH AND TEXT
2567 AMBIDROP: ARRAY-AGNOSTIC SPEECH ENHANCEMENT USING AMBISONICS ENCODING AND DROPOUT-BASED LEARNING
17431 AMBISONIC-DML: A Benchmark Dataset for Dynamic Higher-Order Ambisonics Music with Motion-Aligned Stems
5035 AMFN: Adaptive Multi-view Fusion Network Framework
10418 AMGHI-CR: ADAPTIVE MASK-GUIDED HIGH-ORDER INTERACTION NETWORK FOR CLOUD REMOVAL
15887 AMODAL INSTANCE SEGMENTATION BY EXPANDING FROM ACTIVE BOUNDARY WITH COMPATIBLE PRIOR
17433 AMPLITUDE OPTIMIZATION DRIVEN MULTI-OFDM WAVEFORM DESIGN WITH GOOD PMEPR AND ISL PERFORMANCES FOR JOINT RADAR AND COMMUNICATIONS
5384 A-MSA: ADVERSARIAL FEATURE DISENTANGLEMENT FOR MULTIMODAL SENTIMENT ANALYSIS
5673 AN ADAPTIVE SAMPLING METHOD BASED ON REINFORCEMENT LEARNING FOR WIND POWER FORECASTING UNDER EXTREME WEATHER
9544 AN AMP-BASED ASYMPTOTIC ANALYSIS FOR NONLINEAR ONE-BIT PRECODING
9635 AN AUDIO-VISUAL SPEECH SEPARATION NETWORK WITH JOINT CROSS-ATTENTION AND ITERATIVE MODELING
10082 AN EFFECTIVE DATA AUGMENTATION METHOD BY ASKING QUESTIONS ABOUT SCENE TEXT IMAGES
13351 AN EFFICIENT NEURAL NETWORK FOR MODELING HUMAN AUDITORY NEUROGRAMS FOR SPEECH
4378 An End-to-End Multimodal System for Subtitle Recognition and Chinese-Japanese Translation in Short Dramas
9719 AN ENHANCED GRAVITATIONAL-WAVE DETECTION AND INTELLIGENT ANALYSIS FRAMEWORK BASED ON MULTIMODAL LARGE LANGUAGE MODELS
16637 AN ENHANCED MEMORY ATTENTION AND CONTENT-GUIDED MODEL FOR MINI LED ANOMALY DETECTION
11473 AN ENSEMBLE DEFENSE METHOD AGAINST FALSE DATA IN STRUCTURED PREFERENCE LEARNING
13481 AN ENVELOPE SEPARATION AIDED MULTI-TASK LEARNING MODEL FOR BLIND SOURCE COUNTING AND LOCALIZATION
11517 AN EVENT-BASED SEQUENCE MODELING APPROACH TO RECOGNIZING NON-TRIAD CHORDS WITH OVERSEGMENTATION MINIMIZATION
17130 AN EXACT PENALTY METHOD FOR SPARSITY-CONSTRAINED OPTIMIZATION
11981 AN IMPROVED CONVERGENCE ANALYSIS OF GOSSIP METHODS FOR LARGE RANDOM GRAPHS
17002 An Information Geometric Approach to Fairness With Equalized Odds Constraint
1941 An Information-Theoretic Approach to Optimal Universal Quantum Encoding for Statistical Inference
13920 AN ITERATIVE FIXED-POINT KERNEL MINIMUM ERROR ENTROPY ALGORITHM
7995 An Unsupervised Alignment Feature Fusion System for Spoken Language-based Dementia Detection
6677 ANALYTIC INCREMENTAL LEARNING FOR SOUND SOURCE LOCALIZATION WITH IMBALANCE RECTIFICATION
13915 ANALYTICAL FRAMEWORK FOR WIRELESS LOCALISATION USING TERAHERTZ BACKSCATTERING TAGS
6890 ANCHOR FIELD CONSISTENCY FOR IMPERCEPTIBLE ADVERSARIAL ATTACKS ON 3D POINT CLOUDS
13072 ANCHORED SPECTRAL ESTIMATOR FOR RIGID MOTION SYNCHRONIZATION
6312 ANGULARDINO: SEMI-SUPERVISED ANOMALY DETECTION VIA SELF-DISTILLATION WITH HYBRID ANGULAR MARGIN
6146 AnimateScene: Camera-controllable Animation in Any Scene
18083 ANIPU: Geometry-Aware Point Cloud Upsampling via Anisotropic Differential Operators
9487 Anisotropic Tensor Deconvolution of Hyperspectral Images
3999 ANNEALED GUIDED DIFFUSION WITH OPTIONAL MANIFOLD PROJECTION REMOVAL
10353 ANOMALY DRIVING BEHAVIOR IDENTIFICATION IN TRAFFIC PERCEPTION WITH TRANSFORMER-BASED MULTI-MODAL SIGNAL FUSION
4834 ANOMALY-AWARE ASSOCIATION DISCREPANCY FOR TEMPORAL ANOMALY DETECTION
3696 ANTI-EXCEPTION ACTION GENERATION FOR AUTOMATIC PLANNING
5746 ANYACCOMP: GENERALIZABLE ACCOMPANIMENT GENERATION VIA QUANTIZED MELODIC BOTTLENECK
4877 ANYRIR: ROBUST NON-INTRUSIVE ROOM IMPULSE RESPONSE ESTIMATION IN THE WILD
2580 APKD: ALIGNED AND PACED KNOWLEDGE DISTILLATION TOWARDS LIGHTWEIGHT HETEROGENEOUS MULTIMODAL EMOTION RECOGNITION
3396 APMDET: DEFENDING AGAINST OBJECT-BASED ATTACKS FOR LIDAR DETECTION IN AUTONOMOUS DRIVING
7117 APPLE: ATTENTION-PROMOTED PROTOTYPE LEARNING FOR FEDERATED CROSS-MODAL HASHING
2203 APPROXIMATE MESSAGE PASSING FOR MULTI-PREAMBLE DETECTION IN OTFS RANDOM ACCESS
13164 Approximating Products of Distributions via Variable Duplication and Belief Propagation
17712 APPROXIMATING THE LIKELIHOOD OF A WHITE, NON-GAUSSIAN, NON-IID, SKEWED STATIONARY PROCESS, WITH APPLICATIONS IN SIGNAL DETECTION
16984 APSDA: ADVERSARIALLY PRUNED SPARSE DYNAMIC ATTENTION FOR ROBUST HANDWRITTEN TEXT RECOGNITION
16912 APSFORMER: ENHANCING TRANSFORMER IN TIME SERIES FORECASTING WITH ADAPTIVE MULTI-SCALE PATCH AND SPARSE ATTENTION
4569 AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering
6256 AR&D: A Framework for Retrieving and Describing Concepts for Interpreting AudioLLMs
17379 ARA-BEST-RQ: MULTI DIALECTAL ARABIC SSL
6982 ARABKT: A COMPREHENSIVE ARAB KNOWLEDGE EVALUATION SUITE FOR LARGE LANGUAGE MODELS
3212 ARAP-GS: DRAG-DRIVEN AS-RIGID-AS-POSSIBLE 3D GAUSSIAN SPLATTING EDITING WITH DIFFUSION PRIOR
12577 Arbitrarily Settable Frame Rate Neural Speech Codec with Content Adaptive Variable Length Segmentation
15852 AR-BSNET: TOWARDS ULTRA-LOW COMPLEXITY AUTOREGRESSIVE TARGET SPEAKER EXTRACTION WITH BAND-SPLIT MODELING
10238 ARCHAGENT: SCALABLE LEGACY SOFTWARE ARCHITECTURE RECOVERY WITH LLMS
13504 ARCHI-TTS: A FLOW-MATCHING-BASED TEXT-TO-SPEECH MODEL WITH SELF-SUPERVISED SEMANTIC ALIGNER AND ACCELERATED INFERENCE
9649 ARCTIMESDE: ALIGNING COMPUTE WITH INFORMATION VIA ARC LENGTH TIME IN NEURAL SDES
17977 ARE MODERN SPEECH ENHANCEMENT SYSTEMS VULNERABLE TO ADVERSARIAL ATTACKS?
17668 ARE THESE EVEN WORDS? QUANTIFYING THE GIBBERISHNESS OF GENERATIVE SPEECH MODELS
4636 Are VLMs Ready for Lane Topology Awareness in Autonomous Driving?
16234 ARGI: ANCHOR-GUIDED RIGID GEOMETRY ASSISTS POINT CLOUD INTERPOLATION
1935 AR-LIF: ADAPTIVE RESET LEAKY INTEGRATE-AND-FIRE NEURON FOR SPIKING NEURAL NETWORKS
9548 AROMMA: UNIFYING OLFACTORY EMBEDDINGS FOR SINGLE MOLECULES AND MIXTURES
6235 ARRAYDPS-REFINE: GENERATIVE REFINEMENT OF DISCRIMINATIVE MULTI-CHANNEL SPEECH ENHANCEMENT
14242 ARROW-GS: DIRECTED GROWTH FOR EFFICIENT 3D GAUSSIAN SPLATTING
3305 ARTI-6: TOWARDS SIX-DIMENSIONAL ARTICULATORY SPEECH ENCODING
10473 ARTIFACT-AWARE EVALUATION FOR HIGH-QUALITY VIDEO GENERATION
10083 ARTIFREE: DETECTING AND REDUCING GENERATIVE ARTIFACTS IN DIFFUSION-BASED SPEECH ENHANCEMENT
18286 ARTPOSE: STYLE-ADAPTIVE MIXTURE-OF-EXPERTS FOR HUMAN POSE ESTIMATION IN ARTISTIC IMAGES
11131ASRC-SNN: ADAPTIVE SKIP RECURRENT CONNECTION SPIKING NEURAL NETWORK
13066ASSESSING IDENTITY LEAKAGE IN TALKING FACE GENERATION: METRICS AND EVALUATION FRAMEWORK
11863Assessing speech quality metrics for evaluation of neural audio codecs under clean speech conditions
7641ASSESSING THE IMPACT OF SPEAKER IDENTITY IN SPEECH SPOOFING DETECTION
13263Assessing the Perceptual Impact of Low-Altitude Aircraft Noise in Cities: An Auralization Framework Using Gaussian Beam Tracing
7969ASTCA: A NOVEL MOTION PREDICTION FRAMEWORK FOR ROBUST TARGET TRACKING ADDRESSING HIGH MANEUVERABILITY AND FALSE ALARMS
16475ASTMNET: ADAPTIVE SPECTRAL TOKEN MIXER WITH SELECTIVE FEATURE ENHANCEMENT FOR PANSHARPENING
18078ASWE: Adaptive Small-World Encoder for Efficient Channel Coding
3531Asymmetric Region Denoising and Rotation Equivariant for Image Reflection Symmetry Detection
16184Asymmetric StarFus: Learning Incoherent Measurements for Semantics-Aware Spatial-Spectral Fusion
19109ASYMPTOTIC ANALYSIS OF SYNCHRONOUS SIGNAL PROCESSING
18873Asymptotic Classification Error for Heavy-Tailed Renewal Processes
18872Asymptotic Error Rates for Point Process Classification
10924ASYMPTOTICALLY OPTIMAL BANDIT ONLINE CLUSTERING FOR SINGLE PARAMETER EXPONENTIAL FAMILY OF DISTRIBUTIONS
9828ASYNCHRONOUS HIGH-SPEED TRACKING OF ASTRONOMICAL OBJECTS USING NEUROMORPHIC CAMERA FOR EDGE COMPUTING
17200Asynchrony-Aware Decoupled Multimodal Control for Cued Speech Video Generation
12843ATNPLOC: PHASE-ENHANCED ASYNCHRONOUS TDOA FOR ACCURATE UWB LOCALIZATION
5662ATO: ADAPTIVE TARGET OPTIMIZATION FOR SEMI-SUPERVISED DOMAIN ADAPTATION VIA DEEP REINFORCEMENT LEARNING
10805ATOM: Adaptive Token-level Optimal Transport Mixup for Speech Translation
3670ATOMIC NORM MINIMIZATION REVISITED: PROGRESSIVE ATOM IDENTIFICATION AND REFINEMENT
14672ATOMU: ARTIFICIAL TEMPLATE OF MARKER UNIT FOR 1/100-PIXEL ACCURACY DISPLACEMENT MEASUREMENT
11271ATTENTION OUTPUT PROJECTION IMPORTANCE SCORE FOR KEY-VALUE EVICTION
10744ATTENTION TO DETAILS, LOGITS TO TRUTH: VISUAL-AWARE ATTENTION AND LOGITS ENHANCEMENT TO MITIGATE HALLUCINATIONS IN LVLMS
6790ATTENTION2PROBABILITY: ATTENTION-DRIVEN TERMINOLOGY PROBABILITY ESTIMATION FOR ROBUST SPEECH-TO-TEXT SYSTEM
17998ATTENTION-BASED ENCODER-DECODER TARGET-SPEAKER VOICE ACTIVITY DETECTION FOR ROBUST SPEAKER DIARIZATION
3709ATTENTION-ENHANCED LEARNING FOR SENSING-ASSISTED LONG-TERM BEAM TRACKING IN MMWAVE COMMUNICATIONS
11669ATTENTION-GUIDED DYNAMIC COMPENSATION SAMPLING FOR ROBUST INVERSION-BASED DIFFUSION WATERMARKING
15305Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition
6893Attentive AV-FusionNet: Audio-Visual Quality Prediction with Hybrid Attention
3058ATTENTIVE MASKED SELF-DISTILLATION FOR RESPIRATORY SOUND CLASSIFICATION
13156Attn-Defense: Attention-Guided Detection, Location and Removal for Indirect Prompt Injection
17999ATTRIBUTE DRIVEN W SPACE FOR QUERY LIMITED FACE TEMPLATE INVERSION
10001AUDEN-VOICE: GENERAL-PURPOSE VOICE ENCODER FOR SPEECH AND LANGUAGE UNDERSTANDING
16568AUDIENCE-AWARE CO-SPEECH GESTURE GENERATION IN PUBLIC SPEAKING VIA ANTICIPATION TOKENS
13821AUDIO DEEPFAKE DETECTION AT THE FIRST GREETING: “HI!”
16937AUDIO EFFECT ESTIMATION WITH DNN-BASED PREDICTION AND SEARCH ALGORITHM
12710AUDIOCARDS: STRUCTURED METADATA IMPROVES AUDIO LANGUAGE MODELS FOR SOUND DESIGN
12856AUDIO-CONDITIONED DIFFUSION LLMS FOR ASR AND DELIBERATION PROCESSING
14439AUDIOFUSE: UNIFIED SPECTRAL-TEMPORAL LEARNING VIA A HYBRID VIT-1D CNN ARCHITECTURE FOR PHONOCARDIOGRAM CLASSIFICATION
8244AUDIOGENIE-REASONER: A TRAINING-FREE MULTI-AGENT FRAMEWORK FOR COARSE-TO-FINE AUDIO DEEP REASONING
6464AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation
15380Audio-Guided Multimodal Approach for Fine-Grained Alignment and Boundary Modeling in Active Speaker Detection
18882AUDIOSETCAPS: AN ENRICHED AUDIO-CAPTION DATASET USING AUTOMATED GENERATION PIPELINE WITH LARGE AUDIO AND LANGUAGE MODELS
12298AUDIO-TEXT JAILBREAK ATTACK ON LARGE AUDIO-LANGUAGE MODELS: TOWARDS GENERALITY AND STEALTHINESS
18116Audio-to-Score Jazz Solo Transcription with the Rhythm Perceiver
13509AUDIO-VISUAL DEEPFAKE GENERATION AND DETECTION: AN EXPLORATORY SURVEY
17415AUDIO-VISUAL FEATURE FUSION FOR CALIBRATING RELEVANCE SCORES OF VIDEO MOMENT RETRIEVAL
14898Audiovisual Speech Enhancement and Voice Activity Detection Using Generative and Speech Recognition Features
12230AUDITGPT: A MULTI-AGENT FRAMEWORK FOR ENHANCING STATIC ANALYSIS
16286Auditory Illusion Benchmark for Large Audio Language Models
2217AuditoryBench++: Can Language Models Understand Auditory Knowledge without Hearing?
13705AUDITORY-INSPIRED TRANSFORMER FOR BINAURAL SPEECH ENHANCEMENT AND SPATIAL CUE PRESERVATION
16881AUGMENT-AND-REGULARIZE: TOWARD RELIABLE SEMI-SUPERVISED DOMAIN GENERALIZATION
13856AUGMENTED LAGRANGIAN CONTROLLER DESIGN IN MODEL-BASED REINFORCEMENT LEARNING FOR ISAC RESOURCE ALLOCATION
17340AUGMENTING IMAGE LLMS FOR DIVERSE VIDEO GROUNDING TASKS WITHOUT TRAINING
17380AURA: A STEGAFORMER-BASED SCALABLE DEEP AUDIO WATERMARK WITH EXTREME ROBUSTNESS
4179AURA: YCBCR-BASED UNIVERSAL RAW-RECONSTRUCTION FOR INVERSE ISP
11385Aurora: Precise and Lightweight Multimodal Fusion for Efficient Referring Remote Sensing Image Segmentation
3567AUTO-MATCHCUT: AN AUDIO-VISUAL RETRIEVAL FRAMEWORK FOR SEAMLESS MATCH CUTTING
6024AUTOMATIC ESTIMATION OF SPEAKER DIARIZATION ERROR RATE BASED ON FEATURES OF AUDIO QUALITY AND SPEAKER DISCRIMINABILITY
16357Automatic Music Mixing using a Generative Model of Effect Embeddings
14887Automatic Music Sample Identification with Multi-Track Contrastive Learning
2666AUTOP2C: AN LLM-BASED AGENT FRAMEWORK FOR CODE REPOSITORY GENERATION FROM MULTIMODAL CONTENT IN MACHINE LEARNING PAPERS
2455AUTOREGRESSIVE-GAUSSIAN MIXTURE MODELS: EFFICIENT GENERATIVE MODELING OF WSS SIGNALS
10705AUTOVQA-G: SELF-IMPROVING AGENTIC FRAMEWORK FOR AUTOMATED VISUAL QUESTION ANSWERING AND GROUNDING ANNOTATION
17878AUV: TEACHING AUDIO UNIVERSAL VECTOR QUANTIZATION WITH SINGLE NESTED CODEBOOK
10465AUXILIARY MULTI-LABEL TRAINING FOR IMPROVING THE ROBUSTNESS OF AUDIO DEEPFAKE DETECTION ON AI-PROCESSED DATA
15583AVATAR: AUDIO-VISUAL ADAPTIVE FUSION VIA TRAINED AGENT REINFORCEMENT FOR MULTIMODAL DEEPFAKE DETECTION
2607Averaging is Not Enough: Preserving Client-Specific Knowledge in Federated PEFT with One-Round Aggregation
1931AVO-65: A LARGE-SCALE HIERARCHICAL AUDIO-VISUAL OBJECT DATASET
3052AWARENESS OR GUIDANCE? A MODALITY-ENHANCED FUSION MODEL FOR MULTIMODAL KNOWLEDGE GRAPH COMPLETION
2233AWGFORMER: ADAPTIVE WAVELET-GUIDED TRANSFORMER FOR MULTI-RESOLUTION TIME SERIES FORECASTING
16570B2CAMO: LEVERAGING BACKGROUND CUES FOR PARAMETER-EFFICIENT FINE-TUNING IN OPEN-VOCABULARY CAMOUFLAGED OBJECT SEGMENTATION
6242BABI: BLACKLISTED ACCRETION FOR BACKDOOR INVERSION IN INSTRUCTION FINE-TUNED LLMS
17335BACHI: BOUNDARY-AWARE SYMBOLIC CHORD RECOGNITION THROUGH MASKED ITERATIVE DECODING ON POP AND CLASSICAL MUSIC
13973BACKGROUND DISAMBIGUATION CONTRASTIVE LOSS FOR ROBUST BINARY SEGMENTATION
16811BACKWARD DESIGN STFT LABORATORY FOR STEM EDUCATION: ACCESSIBLE, RESOURCE EFFICIENT, AND IMPROVED LEARNING OUTCOMES
3392BADLLM_TG: A BACKDOOR DEFENDER POWERED BY LLM TRIGGER GENERATOR
10807BADREASONER: PLANTING TUNABLE OVERTHINKING BACKDOORS INTO LARGE REASONING MODELS FOR FUN OR PROFIT
18348BadTail: Exploiting Rationale Tails for Stealthy Multimodal Backdoor Attacks
12552BadViM: Backdoor Attack against Vision Mamba
16795BALANCING ACCURACY AND DIVERSITY: EVOLVING ANCHOR MATCHING FOR VIDEO TEMPORAL GROUNDING
12971BALANCING EFFICIENCY AND FIDELITY IN IMAGE SUPER-RESOLUTION VIA ATTENTION-ENHANCED DISTILLATION
12384BALANCING REWARDS IN TEXT SUMMARIZATION: MULTI-OBJECTIVE REINFORCEMENT LEARNING VIA HYPERVOLUME OPTIMIZATION
11515BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
14607BAMoE: Bi-Attention Synergy with Expert Routing for Time Series Anomaly Detection
10980Bayesian Channel Estimation with Diffusion Probabilistic Priors
13312BAYESIAN LOW-RANK FACTORIZATION FOR ROBUST MODEL ADAPTATION
12196BAYESIAN MATRIX COMPLETION UNDER GEOMETRIC CONSTRAINTS
5991Bayesian Multi-Modal LSTM with Dynamic Uncertainty Modeling for Net Fluid Removal Prediction
4466BAYESIAN SIGNAL SEPARATION VIA PLUG-AND-PLAY DIFFUSION-WITHIN-GIBBS SAMPLING
17885Bayesian Uncertainty-Aware MRI Reconstruction
10149BBPE16: UTF-16-BASED BYTE-LEVEL BYTE-PAIR ENCODING FOR IMPROVED MULTILINGUAL SPEECH RECOGNITION
7637BDBR: TOWARDS RESISTANT BACKDOOR DEFENSE VIA BOUNDARY RECONSTRUCTION
12219BDIO: BIAS-AWARE DENOISING INERTIAL ODOMETRY FOR ACCURATE DRONE TRAJECTORY ESTIMATION
12938BDRNET: BIDIRECTIONAL DECOMPOSED AND RECALIBRATED LIGHTWEIGHT NETWORK FOR HUMAN POSE ESTIMATION
17265BEAM-CLIP: MULTIMODAL ALIGNMENT AND MMWAVE BEAM PATTERN REPRESENTATION LEARNING
18906BEAMFOCUSING CAPABILITIES OF A UNIFORM LINEAR ANTENNA ARRAY IN THE HOLOGRAPHIC REGIME
14310BEAMFORMER DESIGNS FOR SWARM OF REPEATER-AIDED MASSIVE MIMO ISAC
5719Beamforming using Virtual Microphones for Hearing Aid Applications
11973BEAMSPACE MODEL AND BEAM ALIGNMENT METHOD FOR RECONFIGURABLE HOLOGRAPHIC SURFACE ANTENNA SYSTEMS
2837BEAP-AGENT: BACKTRACKABLE EXECUTION AND ADAPTIVE PLANNING FOR GUI AGENTS
12040Beat and Downbeat Detection: A Reformulated Approach
6008BEATMAMBA: BIDIRECTIONAL SELECTIVE STATE-SPACE MODELING FOR EFFICIENT BEAT TRACKING
11368BeepBeep: Leveraging Structural Attenuation for Robust Device-to-Device Authentication
3406Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition
13265BELIEF PROPAGATION VIA STOCHASTIC TRANSPORT MAPPING
17932BE-MVSNET: BOUNDARY- AND EDGE-AWARE CONSTRAINED MULTI-VIEW STEREO
18213Benchmarking Emotional Accuracy and Identity Consistency in Facial Image-to-Video Generation
17802BENCHMARKING GASLIGHTING ATTACKS AGAINST SPEECH LARGE LANGUAGE MODELS
17945BENCHMARKING HUMANS AND MACHINES ON COMPLEX MULTILINGUAL SPEECH UNDERSTANDING TASKS
9856Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
18177BENCHMARKING MULTIMODAL LARGE LANGUAGE MODELS FOR FACE RECOGNITION
2951Benchmarking Music Autotagging with MGPHot Expert Annotations vs. Generic Tag Datasets
18917BERP: A Blind Estimator of Room Parameters for Single-Channel Noisy Speech Signals
9816BEST-RQ-BASED SELF-SUPERVISED LEARNING FOR WHISPER DOMAIN ADAPTATION
13869BEST-STD 2.0: BALANCED AND EFFICIENT SPEECH TOKENIZER FOR SPOKEN TERM DETECTION
14666Better Together: Uncalibrated Photometric Stereo with Shading and Specularities
16763BEV-ID: A Depth-Guided BEV Perception Method combining Feature Indexing with Depth Estimation
9419BEYOND AMPLITUDE: CHANNEL STATE INFORMATION PHASE-AWARE DEEP FUSION FOR ROBOTIC ACTIVITY RECOGNITION
15329BEYOND ANSWERS: TRAJECTORY SEMANTIC ENTROPY FOR RELIABLE UNCERTAINTY QUANTIFICATION IN LLMS
5271BEYOND ATTENTION: ADAPTING SEGMENT ANYTHING WITH FREQUENCY AND STRUCTURAL PRIORS
2335BEYOND BLURRINESS AND ARTIFACTS: A SYNERGISTIC DETERMINISTIC-PROBABILISTIC APPROACH FOR RADAR RECONSTRUCTION
9439BEYOND CLEAN DATA: NOISE ROBUST DATASET PRUNING WITH FLIP-SENSITIVITY FILTERING
6420BEYOND FACE SWAPPING: A DIFFUSION-BASED DIGITAL HUMAN BENCHMARK FOR MULTIMODAL DEEPFAKE DETECTION
17159Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation
16420Beyond History: Active Prompting with Dynamic Contemporaneous Facts for Temporal Knowledge Graph Forecasting
5985BEYOND HUMAN SKELETONS: PROMPT-GUIDED GRAPH MATCHING FOR MULTI-LIMBED POSE ESTIMATION IN ARTISTIC IMAGERY
2966BEYOND LIPS: INTEGRATING GESTURE AND LIP CUES FOR ROBUST AUDIO-VISUAL SPEAKER EXTRACTION
16083BEYOND MAPPING: DOMAIN-INVARIANT REPRESENTATIONS VIA SPECTRAL EMBEDDING OF OPTIMAL TRANSPORT PLANS
11296BEYOND OMNIDIRECTIONAL: NEURAL AMBISONICS ENCODING FOR ARBITRARY MICROPHONE DIRECTIVITY PATTERNS USING CROSS-ATTENTION
16188BEYOND PIXEL PROPHECY: HIERARCHICAL KNOWLEDGE STRUCTURES FOR TRAINING-FREE VIDEO ANOMALY PREDICTION
16504BEYOND PIXELS: A VECTOR-TO-GRAPH FRAMEWORK FOR RELIABLE SCHEMATIC AUDITING
10171Beyond Sampling: Classwise Loss Optimization for Imbalanced Deep Learning Recommendation Systems
9340Beyond Shadows: A Large-Scale Benchmark and Multi-Stage Framework for High-Fidelity Facial Shadow Removal
3410Beyond Single Video Boundaries: Unified Unsupervised Video Object Segmentation with Historical, Future, and Cross-Video Reasoning
13550BEYOND SPECTRAL PEAKS: INTERPRETING THE CUES BEHIND SYNTHETIC IMAGE DETECTION
11356BEYOND THE WINDOW: REGION-BASED ANOMALY LOCALIZATION NETWORK FOR TIME SERIES ANOMALY DETECTION
2373BEYOND VIDEO-TO-SFX: VIDEO TO AUDIO SYNTHESIS WITH ENVIRONMENTALLY AWARE SPEECH
16097BEYOND VISUAL REALISM: TOWARD RELIABLE FINANCIAL TIME SERIES GENERATION
2241B-GRPO: UNSUPERVISED SPEECH EMOTION RECOGNITION BASED ON BATCHED-GROUP RELATIVE POLICY OPTIMIZATION
6391BHSFLOW: LOW-LATENCY FLOW ESTIMATION WITH BLOCK-WISE HUBER LOSS AND SIMPLIFIED STRUCTURE
14049BI-DIRECTIONAL ATTENTION FOR DUAL-BRANCH GENERATOR FOR CHANNEL EXTRAPOLATION AND HIGH-RESOLUTION SENSING
12906BIDIRECTIONAL CLASS-TEXT AND VISION INTERACTION NETWORK FOR CAMOUFLAGED OBJECT DETECTION
5171BIDIRECTIONAL CONTINUOUS-TIME VIDEO SUPER-RESOLUTION VIA NEURAL ORDINARY DIFFERENTIAL EQUATIONS AND MULTI-ORDER SPATIAL INTERACTIONS
5970BIDIRECTIONAL SEMANTIC ENHANCEMENT NETWORK FOR VIDEO MOMENT RETRIEVAL
5727Bifrost: An Adaptive Decision Framework for Regulating Depth of Thought in LLM Agents
11811BIG: A BIDIRECTIONAL GENERATIVE VERIFICATION FRAMEWORK FOR MULTIMODAL RUMOR DETECTION
11795Bilateral Graph Filtering Framework with Alternating Optimization for Robust Multi-View Outlier Detection
11449Bimodal Fusion Framework for Dynamic Facial Expression Recognition in-the-wild
5581Bi-Modal Textual Prompt Learning for Vision-Language Models in Remote Sensing
9446BINARY MODULATION ON CONJUGATE-RECIPROCAL ZEROS (MOCZ) WITH LIST DECODING FOR UNKNOWN CHANNEL LENGTH
3431BiNR: Live Video Broadcasting Quality Assessment
15944BIOMED-R²: JOINT DIVERSITY RETRIEVAL AND EVIDENCE REASONING FOR BIOMEDICAL QUESTION ANSWERING
1988Biorthogonal Z-Transform: A Unified Framework for Generalized Signals
3163BioSEN: A Bio-acoustic Signal Enhancement Network for Animal Vocalizations
14005BIPOLAR RELATIONAL NETWORK FOR IRREGULAR TIME SERIES ANOMALY DETECTION
3480BI-RECNET: BIDIRECTIONAL RECONCILIATION NETWORK FOR HIERARCHICAL TIME SERIES FORECASTING
10020BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition
7838BISAM: BI-DOMAIN SEGMENT ANYTHING MODEL FOR CAMOUFLAGED OBJECT DETECTION
10560BLACK-BOX ONLINE DATA POISONING AGAINST TRIMMING DEFENSES: AN MAB-BASED APPROACH
14069BLEED NO MORE: GENERATIVE INTERFERENCE REDUCTION FOR MUSICAL RECORDINGS
15445BLIND IMAGE DEBLURRING WITH DECOUPLED DIFFUSION REVERSION
4390Blind Lunar Soil Image Enhancement
17172Blind Online Neural Recovery of RADAR Waveforms from Linear Projections of Interleaved ADC measurements
3045Blind Quality Assessment of Stereoscopic Videos Using Scene-Based Attributes
6761BLINDDET: TOWARDS ROBUST PHYSICAL-WORLD BACKDOOR ATTACK IN LOW-LIGHT SCENARIOS AGAINST OBJECT DETECTION
16092Blink-based Biometric Identification Using Wearable EEG under Fatigue and Effort Variations
11346Block-wise 3D Gaussian Splatting for Efficient and High-Fidelity Cross-Device Rendering
9230BLOODROOT: WHEN WATERMARKING TURNS POISONOUS FOR STEALTHY BACKDOOR
11482BOATT: UNIFIED BAYESIAN ONLINE TRACKING AND ADAPTATION FOR DYNAMIC TENSOR STREAMS
11931BONE-CONDUCTION GUIDED MULTIMODAL SPEECH ENHANCEMENT WITH CONDITIONAL DIFFUSION MODELS
3681BOOSTING ANOMALY DETECTION IN INDUSTRIAL 3D DATA: AN ENTROPY-GUIDED DENOISING AUTOENCODER
9818BOOSTING CONTEXTUAL ADAPTIVE POLICY LEARNING WITH FOUNDATION MODEL GUIDANCE
8234BOOSTING KNOWLEDGE DISTILLATION VIA LOCAL CATEGORIES SIMILARITY SCALING
13872BOOSTING KNOWLEDGE SHARING AMONG AGENTS VIA GRAPH DECOUPLING
12099BOOSTING PRIOR GENERATION VIA MULTIMODAL GRADIENT ATTENTION FOR FEW-SHOT SEGMENTATION
14596BORA: BLOCKWISE ORTHOGONAL RANK-1 ADAPTIVE OPTIMIZATION
5761BOUNDARY-ENHANCED VISION MAMBA U-NET FOR MEDICAL IMAGE SEGMENTATION
14272BOUNDING MEMORIZATION WITH LOSS CURVATURE AND CONNECTIONS TO COMPRESSION
2754Box-Chain VLA: Explicit Reasoning-to-Action Interfaces for Generalizable Robotic Manipulation
15982BPMF: BIDIRECTIONAL PREDICTION BY MULTI-SCALE FEATURES FOR MULTIMODAL INDUSTRIAL ANOMALY DETECTION
12975BRAIN-GRASP: GRAPH-BASED SALIENCY PRIORS FOR IMPROVED FMRI-BASED VISUAL BRAIN DECODING
5859BRAIN-INFORMED SPEECH SEPARATION FOR COCHLEAR IMPLANTS
18998BRAIN-INSPIRED VIDEO QUALITY ASSESSMENT VIA VISUAL-EEG FEATURE ALIGNMENT
1549BRAIN-SCORE MEETS REPRESENTATIONAL SIMILARITY ANALYSIS: A METHODOLOGICAL CONVERGENCE IN MODEL-BRAIN ALIGNMENT
16377Breaking Codebook Redundancy for Faster Autoregressive Image Generation with Retrieval-Augmented Speculative Decoding
11617Breaking Cognitive Fixation in Multi-Turn Dialogue with Self-Distancing and Incubation
14665Breaking Data Efficiency Dilemma: A Federated and Augmented Learning Framework For Alzheimer's Disease Detection via Speech
10440BREAKING THE CURSE OF DIMENSIONALITY IN GAUSSIAN PROCESS TRAINING WITH ZEROTH-ORDER ADAPTIVE PERTURBATION
13380BREAKING THE FORGETTING-MEMORIZATION TRADE-OFF: A MEMORY-ADAPTIVE OPTIMIZER FOR EFFECTIVE LARGE LANGUAGE MODELS UNLEARNING
17036BREAK-THE-BEAT! CONTROLLABLE MIDI-TO-DRUM AUDIO SYNTHESIS
7419BRIDGECODE: A DUAL SPEECH REPRESENTATION PARADIGM FOR AUTOREGRESSIVE ZERO-SHOT TEXT-TO-SPEECH SYNTHESIS
2381Bridging Academia and Industry: Large-Scale NIR Signal Foundation for Robust Multi-Task and Real-world Modeling
4314Bridging Legal Expertise and LLMs: A Cooperative Logical Reasoning Framework for Sentencing Recommendation
18091BRIDGING MULTI-SCALE CONTEXTS: PRIOR-GUIDED DYNAMIC FUSION FOR DEGRADATION-ROBUST IMAGE RESTORATION
10169BRIDGING MULTI-VIEW STEREO AND GAUSSIAN SPLATTING: ENHANCING HIGH-FIDELITY RENDERING WITH GEOMETRIC PRIORS
12728BRIDGING PHYSICAL MODELS AND GENERATIVE PRIORS FOR DEHAZING: FROM COARSE ESTIMATION TO RESIDUAL REFINEMENT
5996BRIDGING SAR AND OPTICAL DOMAINS: SYNERGIZING BROWNIAN BRIDGE DIFFUSION AND LOCAL CONTRASTIVE LEARNING FOR IMAGE TRANSLATION
13909BRIDGING THE FRONT-END AND BACK-END FOR ROBUST ASR VIA CROSS-ATTENTION-BASED U-NET
10939BRIDGING THE GAP: A COMPARATIVE EXPLORATION OF SPEECH-LLM AND END-TO-END ARCHITECTURE FOR MULTILINGUAL CONVERSATIONAL ASR
11313BRIDGING THE GAP: TRANSFORMING NATURAL LANGUAGE QUESTIONS INTO SQL QUERIES VIA ABSTRACT QUERY PATTERN AND CONTEXTUAL SCHEMA MARKUP
16077Bridging the Knowledge Gap: LLM-Driven Contrastive Memory-of-Thought Prompting for Task-Oriented Dialogue
13219Bridging the Measurement-Simulation Gap in Room Acoustics with Real2Sim Diffusion
11642BRIDGING THE SEMANTIC GAP: CROSS-ATTENTIVE FUSION FOR JOINT ACOUSTIC-SEMANTIC SPEECH QUALITY ASSESSMENT
3605BRIDGING VISION AND LANGUAGE WITH QUANTUM STATE FOR VIDEO-TEXT RETRIEVAL
5946BRINGING MULTIMODAL FOUNDATION MODELS TO HEARING AIDS
18942BSM-iMagLS: ILD Informed Binaural Signal Matching for Reproduction With Head-Mounted Microphone Arrays
1825BSMP-SENET: BAND-SPLIT MAGNITUDE-PHASE NETWORK FOR SPEECH ENHANCEMENT
2535BTCCHAT: ADVANCING REMOTE SENSING BI-TEMPORAL CHANGE CAPTIONING WITH MULTIMODAL LARGE LANGUAGE MODEL
12512BTDA: A ROBUST FRAMEWORK FOR ENCRYPTED TRAFFIC CLASSIFICATION WITH BYTE-LEVEL TLS DATA AUGMENTATION
14208BUILD WITH PRECISION: BOTTOM-UP INFERENCE OF LINEAR DAGS
12076Bundling-aware Masked Graph AutoEncoder for Bundle Recommendation
15578BUTTERFLY TRANSFORMER FOR LIGHTWEIGHT IMAGE RESTORATION
12918C&F-WSVAD: TOWARDS HIGH-PERFORMANCE COARSE AND FINE-GRAINED WEAKLY-SUPERVISED VIDEO ANOMALY DETECTION
15148C2BNVAE: Dual-Conditional Deep Generation of Network Traffic Data for Network Intrusion Detection System Balancing
15708CADD: Condition-Anchor Dataset Distillation
15945CAD-Judge: Toward Efficient Morphological Grading and Verification for Text-to-CAD Generation
11838CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering
2495CADMamba: Clustering ADaptive Mamba for Multivariate Time Series Forecasting
13214CAF-MAMBA: MAMBA-BASED CROSS-MODAL ADAPTIVE ATTENTION FUSION FOR MULTIMODAL DEPRESSION DETECTION
15870CALM: JOINT CONTEXTUAL ACOUSTIC-LINGUISTIC MODELING FOR PERSONALIZATION OF MULTI-SPEAKER ASR
18066CAMA: CHARACTER-AWARE MASKING AND ALIGNMENT FOR SELF-SUPERVISED STR
11527CAMEO: Collection of Multilingual Emotional Speech Corpora
3939CAMOD: CAUSAL-AWARE MODALITY DENOISING FOR MULTIMODAL DIALOGUE INTENT RECOGNITION
18946CAN AUDIO REVEAL MUSIC PERFORMANCE DIFFICULTY? INSIGHTS FROM THE PIANO SYLLABUS DATASET
1174CAN DATA AUGMENTATION BECOME A PRIVACY SHIELD FOR MODEL INVERSION ATTACKS?
16934CAN HIERARCHICAL CROSS-MODAL FUSION PREDICT HUMAN PERCEPTION OF AI DUBBED CONTENT?
9239Can Large Audio Language Models Understand Audio Well? Speech, Scene and Events Understanding Benchmark for LALMs
15132Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate
11161CAN META-LEARNING ADDRESS THE CHALLENGES OF BIOSIGNAL PERSONALIZATION?
5935CAN SYNTHETIC IMAGES SERVE AS EFFECTIVE AND EFFICIENT CLASS PROTOTYPES?
15884CAN UNLEARNING OF MODELS LEAD TO ADVERSARIAL ROBUSTNESS?
15547CAN VISION LANGUAGE MODELS PERCEIVE GRAPHS ACCURATELY? A VISUAL GRAPH PERCEPTION EVALUATION BENCHMARK
13849CAPACITY ANALYSIS OF OFDM SYSTEMS WITH A SWARM OF NETWORK-CONTROLLED REPEATERS
6354CAPT: A Lightweight Continual Adversarial Pre-training Framework for Traffic Analysis Model
4634CAPTION AND AUDIO-GUIDED VIDEO REPRESENTATION LEARNING WITH GATED ATTENTION FOR PARTIALLY RELEVANT VIDEO RETRIEVAL
3162CARBON-ECS: A BENCHMARK AND A PHYSICS-DECOUPLED MODEL FOR THE EAST CHINA SEA CARBON FLUX
17604CARDINALITY-CONSTRAINED COVARIANCE ESTIMATION IN ARRAY BEAMFORMING
13250CARDIOCOT: MULTIMODAL PREDICTION OF MACE RECURRENCE RISK WITH HIERARCHICAL CHAIN-OF-THOUGHT REASONING
14986CARE: COGNITIVE-REASONING AUGMENTED REINFORCEMENT FOR EMOTIONAL SUPPORT CONVERSATION
15170CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control
9429CARE-AGENT: MULTI-AGENT COLLABORATION WITH CONFLICT-AWARE ROUTING MECHANISM FOR DIAGNOSIS PREDICTION
10536CAREFUL-MM: CAUSAL AND UNCERTAINTY-AWARE, SARCASM-ROBUST MULTIMODAL DEPRESSION DETECTION
13997CAS-J: Cross-Modal Attention Synergy for Jailbreaking Large Vision-Language Models
11103CaSNet: Compress-and-Send Network Based Multi-Device Speech Enhancement Model for Distributed Microphone Arrays
6474CAS-ODE: JOINTLY LEARNING ADAPTIVE STRUCTURES AND CONTINUOUS DYNAMICS FOR EMOTION RECOGNITION IN CONVERSATION
17803CASP: CONFIDENCE-AWARE STRUCTURAL PRESERVATION FOR CONTINUAL TEST-TIME ADAPTATION OBJECT DETECTION
4591CAST-ACF: ROBUST GENERATION AND EVALUATION FOR MULTI-GRANULARITY TIMELINE SUMMARIZATION
10700CASTELLA: LONG AUDIO DATASET WITH CAPTIONS AND TEMPORAL BOUNDARIES
10135Category-Adaptive Feature Compression for Multi-Device Collaborative Computing
3919CAUSAL DEBIASING AND FEATURE FUSION FOR OPEN SET DOMAIN ADAPTATION
8007CAUSAL EFFECT ESTIMATION UNDER NETWORK INTERFERENCE WITH STATE SPACE MODELS
17422Causal Fingerprints of AI Generative Models
6038CAUSAL INTERVENTED DISENTANGLEMENT FOR MULTI-SOURCE CROSS-DOMAIN RECOMMENDATION
13285CAUSAL-BOOTSTRAPPED MULTI-AGENT REINFORCEMENT LEARNING FOR MITIGATING THE COLD-START PROBLEM
4511CAUSALRAP: CAUSAL GRAPH-DRIVEN RETRIEVAL AUGMENTED LONG-HORIZON TASK PLANNING FOR LARGE LANGUAGE MODELS
16297CBR-DETR: AN ENHANCED RT-DETR WITH MULTI-LEVEL CONTEXT FUSION AND BIDIRECTIONAL ROUTING
11124CC-G2PNP: STREAMING GRAPHEME-TO-PHONEME AND PROSODY WITH CONFORMER-CTC FOR UNSEGMENTED LANGUAGES
12778CCMA: CONSISTENCY-AWARE CROSS-MODAL ALIGNMENT FOR TEXT-BASED PERSON RETRIEVAL
4377C-Conformer: Channel-augmented Conformer for Sound Event Localization and Detection
10974CCST: Cross-Modal and Consistency-Aware Self-Training for Source-Free Unsupervised Domain Adaptation in Speech Recognition
9694C-DGPA: Class-Centric Dual-Alignment Generative Prompt Adaptation
12029CDIFF: CONTEXT-DISENTANGLED IMAGE SYNTHESIS FOR ANIMAL INDIVIDUAL IDENTIFICATION
16849CD-MTO: A Joint Optimization Framework for Multi-Hop Task Offloading and Energy-Aware Computing in Mobile Edge Networks
16750CDT: ROBUST DETECTION OF DOH TUNNELS VIA BACKGROUND ASSOCIATION
10436CEAL: CROSS-EXPERT ATTENTION WITH CURATED PEER SELECTION FOR SCALABLE AND PRIVACY-PRESERVING EXEMPLAR-FREE CONTINUAL LEARNING
13315CEFL-Ranking: Re-evaluation of Communication-Efficient FL Methods
11204CE-GOCD: Central Entity-Guided Graph Optimization for Community Detection to Augment LLM Scientific Question Answering
7614CENTRALIZED SPECTRAL INITIALIZATION FOR SPARSE PHASE RETRIEVAL
13252CERF: A COMMUNICATION-EFFICIENT AND RETRAINING-FREE FRAMEWORK FOR MULTI-UAV COLLABORATIVE PERCEPTION
6371CFA: Lightweight Defense against Membership Inference Attacks through Class-wise Feature Aggregation
17875CFGAN: COMPLEX FREQUENCY-DOMAIN GRAPH ATTENTION NETWORK FOR TIME SERIES FORECASTING
13147CFIRE: Cross-View Feature Interaction for Fine-Grained Regression-based UAV Localization
3240CFRNet: Accelerating Lane Line Detection with Asymmetric Weighted Attention Distillation and Cascaded Feature Refinement
9407CGNN+: A GRAPH NEURAL INSTRUMENTAL VARIABLE FRAMEWORK FOR ROBUST CAUSAL INFERENCE IN NETWORKED DATA
1037CHAIN OF CORRECTION FOR FULL-TEXT SPEECH RECOGNITION WITH LARGE LANGUAGE MODELS
13787CHAIN-OF-CAPTION: TRAINING-FREE IMPROVEMENT OF MULTIMODAL LARGE LANGUAGE MODEL ON REFERRING EXPRESSION COMPREHENSION
12030Change Detection Methods for Non-stationary Stochastic Linear Bandits
17917CHANGE-AWARE TEMPORAL ALIGNMENT ON HETEROGENEOUS GRAPH SNAPSHOTS FOR INSIDER THREAT DETECTION
19148Channel Estimation and Data Detection in DS-Spread Channels: A Unified Framework, Novel Algorithms, and Waveform Comparison
9664CHANNEL MODELING IN THE DELAY-DOPPLER DOMAIN FOR COMMUNICATIONS WITH A MOBILE RECEIVER
6645Channel Prediction under Network Distribution Shift Using Continual Learning-based Loss Regularization
3675Channel, Trend and Periodic-Wise Representation Learning for Multivariate Long-term Time Series Forecasting
14658Channel-Adaptive Robust Aggregation for Over-the-Air Federated Learning in Heterogeneous Networks
17666CHANNEL-WISE RETRIEVAL FOR MULTIVARIATE TIME SERIES FORECASTING
14383CHAOS: Chart Analysis with Outlier Samples
15671Chemical Sight Net: Incorporating Crystallographic Priors for Accurate Space Group Determination from PXRD
15407CHNERV: CONDITION ENHANCED HYBRID NEURAL REPRESENTATION FOR VIDEOS
15495CHROMOUVQA: BENCHMARKING VISION-LANGUAGE MODELS UNDER CHROMATIC CAMOUFLAGED IMAGES
2783CHUNKWISE ALIGNERS FOR STREAMING SPEECH RECOGNITION
10329CHUNK-WISE ATTENTION TRANSDUCER FOR FAST AND ACCURATE STREAMING SPEECH-TO-TEXT
2688CIDER: A Causal Cure for Brand-Obsessed Text-to-Image Models
11264CIF: A TWO-STAGE COGNITIVELY-INSPIRED FRAMEWORK FOR CHINESE SPELLING CORRECTION
13299CIFC-MFD: END-TO-END MULTI-FACE FORGERY DETECTION USING CROSS-IMAGE FACE CONTRAST
9910CIMRAG: CIM-AWARE DOMAIN-ADAPTIVE AND NOISE-RESILIENT RETRIEVAL-AUGMENTED GENERATION FOR EDGE-BASED LLMS
6987CINE-SHOT DIRECTOR: NATIVE CINEMA-GRADE MULTI-SHOT VIDEO GENERATION FRAMEWORK
6005CIP-DOA: Cross-Instance Prompted DoA Estimation via Semantic-Spatial Matching
4928CITRAG: CONTRADICTION IDENTIFICATION AND TRACING FOR RETRIEVAL-AUGMENTED GENERATION
16953CKANST: CKAN-BASED ARBITRARY STYLE TRANSFER FOR HOMOGENEOUS IMAGES
18194CLASS DIFFICULTY-AWARE REAL-TIME INSTANCE SEGMENTATION
14499CLASS-AWARE PERMUTATION-INVARIANT SIGNAL-TO-DISTORTION RATIO FOR SEMANTIC SEGMENTATION OF SOUND SCENE WITH SAME-CLASS SOURCES
17897Classifier-Centric Adaptive Framework for Open-Vocabulary Camouflaged Object Segmentation
11301CLASS-IMBALANCED MULTI-VIEW CLUSTERING VIA SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE
12349CLASS-INVARIANT TEST-TIME AUGMENTATION FOR DOMAIN GENERALIZATION
16636CLEAN: COMPLIANT LOOPS WITH ENHANCED ADJUSTMENT FOR TRAINING-FREE UNLEARNING
5518ClearGCD: Mitigating Shortcut Learning For Robust Generalized Category Discovery
3259CLIP-driven Zero-shot Learning with Ambiguous Labels
13632CLIP-Guided Unsupervised Semantic-Aware Exposure Correction
3888CLOSED-FORM 3D TDOA-AOA SOURCE LOCALIZATION WITH QUATERNIONS
6960Closed-form Ziv-Zakai Bound for Compressive Time Delay Estimation
13416Clue2Emo: A Brain-Inspired Framework for Open-Vocabulary Multimodal Emotion Recognition
6066CLUEUP: RESOLVING INTENT AMBIGUITY IN PERSONALIZED WEB AGENTS WITH PROFILE-DRIVEN CLARIFICATION
5247CLUSTERING OF MULTISOURCE REMOTE SENSING DATA VIA LOW-RANK TENSOR LEARNING WITH SPATIAL CONSTRAINTS
11754CLUSTERING-DRIVEN MEMORY COMPRESSION FOR ON-DEVICE LARGE LANGUAGE MODELS
13490CMCFAE: CLOUD MODEL CHARACTERISTIC FUNCTION AUTO-ENCODER FOR STRUCTURE-AWARE GENERATIVE MODELING
3877COARSE ADVERSARIAL TRAINING WITH LABEL GROUPING FOR ROBUST CLASSIFICATION
4632COARSE-TO-FINE TRAJECTORY PREDICTION VIA TIME-AWARE INTERACTION PREDICTOR AND CONDITIONAL DIFFUSION-BASED REFINER
5107CODECSLIME: TEMPORAL REDUNDANCY COMPRESSION OF NEURAL SPEECH CODEC VIA DYNAMIC FRAME RATE
18932CODED ROBUST AGGREGATION FOR DISTRIBUTED LEARNING UNDER BYZANTINE ATTACKS
12597CODEOE: A BENCHMARK FOR JOINTLY EXTRACTING CROSS-DOCUMENT EVENTS AND OPINIONS FROM SOCIAL MEDIA
11098CODEPMP: SCALABLE PREFERENCE MODEL PRETRAINING FOR LARGE LANGUAGE MODEL REASONING
5980CODESEP: LOW-BITRATE CODEC-DRIVEN SPEECH SEPARATION WITH BASE-TOKEN DISENTANGLEMENT AND AUXILIARY-TOKEN SERIAL PREDICTION
3474Codesign of FDA-MIMO Radar-Communication System in the Presence of Mainlobe Deceptive Jammers
13430CODE-VISION: EVALUATING MULTIMODAL LLMS LOGIC UNDERSTANDING AND REASONING CAPABILITIES THROUGH CODE GENERATION
9277CO-DRS: DATA-FREE ROBUSTNESS STEALING VIA DUAL-MODEL COLLABORATION
9565CoETR2: Complementary Packet-based Modeling for Encrypted Traffic
9434COFE: A FRAMEWORK GENERATING COUNTERFACTUAL ECG FOR EXPLAINABLE CARDIAC AI-DIAGNOSTICS
19010Co-forecasting of Time-varying Spatial-frequency Map for Selective Fixed-Filter Multichannel ANC based on Dynamic Factor Graph
11938Cognition-enhanced One-step Diffusion Model for Degradation-aware Super-Resolution in the Dark
3616Cognitive Attention and Dual Residual Networks for Offline Regularized Multi-Agent Reinforcement Learning
11240COHERENT-GS: HIGH-FIDELITY 3DGS STYLIZATION WITH A GLOBALLY COHERENT COLOR MANIFOLD
14601CO-INITIALIZATION OF CONTROL FILTER AND SECONDARY PATH VIA META-LEARNING FOR ACTIVE NOISE CONTROL
14298COLLABORATIVE COMPRESSION FOR LARGE-SCALE MOE DEPLOYMENT ON EDGE
2897COLLABORATIVE GRAPH CONTRASTIVE NETWORK FOR SEMI-SUPERVISED GRAPH NODE CLASSIFICATION
16085Collaborative learning for Enhanced Cross Domain Adaptation
8132COLLABORATIVE OPTIMIZATION OF LEARNABLE PROBE TOKENS AND ATTRIBUTE TEXT PROMPTS FOR LOW-RESOLUTION FINE-GRAINED VISUAL CLASSIFICATION
13234COLLABORATIVE STANDARDIZATION OF MULTI-CENTER CLINICAL DATA USING DISTRIBUTION-AWARE LLM FUSION
16362Collective Experts against Noise: Enhancing Social Media Popularity Prediction via Retrieval-Augmented Multimodal Experts
2726Collusion-Resistant and Trusted Authority-Free Verifiable Federated Learning via a Two-Server Architecture
5613COMBINING MULTI-ORDER ATTENTION AND MULTI-RESOLUTION DISCRIMINATOR FOR HIGH-FIDELITY NEURAL VOCODER
14215COMBINING SSL SPEECH FEATURES, CONTEXTUAL TRANSFORMERS AND MAMBA MODELS FOR REALISTIC AUDIO SPOOFING DETECTION
18878COMBINING X-VECTORS AND BAYESIAN BATCH ACTIVE LEARNING: TWO-STAGE ACTIVE LEARNING PIPELINE FOR SPEECH RECOGNITION
1566COME: Towards Superior Embeddings for Multimodal RAG With Heterogeneous Input
2522COMET: continuous-time trajectory-guided temporal modeling for spacecraft pose estimation
7514Communication-Efficient Federated Learning with Pre-Executed Gradient Descent
16328COMNET: A COMPLEMENTARY PROTOTYPES-GUIDED RECONSTRUCTION FRAMEWORK FOR MULTI-CLASS ANOMALY DETECTION
3784COMPACT REPRESENTATION LEARNING FOR MULTIMODAL DRUG-DRUG INTERACTION EVENT PREDICTION
1357Complementary Subspace Low-Rank Adaptation of Vision-Language Models for Few-Shot Classification
16411Complex-Aware Semi-Supervised Modulation Recognition via Latent Adversarial Training
9052COMPOSED GRIDS, ONE-WARP: TEST-TIME RECTIFICATION
13329COMPOSED VISUAL GROUNDING IN REMOTE SENSING IMAGES
2890Composite Memory Transformer for Online 3D Human Motion Prediction
3904Compositional Image Synthesis with Inference-Time Scaling
10056Compound-QA: A Benchmark for Evaluating LLMs on Compound Questions
9558COMPRESSED BC-LISTA VIA LOW-RANK CONVOLUTIONAL DECOMPOSITION
14394Compressed Spectrum Cartography: Estimating Wide-Band Channel Gain Maps from Sub-Nyquist Delay Correlations
16004COMPRESSING KV CACHE FOR LONG-CONTEXT LLM INFERENCE WITH INTER-LAYER ATTENTION SIMILARITY
16073Compression meets Sampling: LZ78-SPA for Efficient Symbolic Music Generation
4813COMPRESSIVE RECOVERY OF SIGNALS DEFINED ON PERTURBED GRAPHS
9961COMPRESSIVE SPATIAL CHANNEL ESTIMATION UNDER IQ IMBALANCE
13956CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-spoofing Countermeasures
11148CONCEPT ACTIVATION VECTORS: A UNIFYING VIEW AND ADVERSARIAL ATTACKS
17515CONCEPTDEBIAS: INTERPRETABLE BIAS MITIGATION VIA CONCEPT DECOMPOSITION IN DEEP NEURAL NETWORKS
16365CONDITIONAL DIFFUSION INVERSION LEARNING FOR MULTI-VIEW STEREO
14189CONDITIONAL DIFFUSION MODELS FOR MENTAL HEALTH-PRESERVING VOICE CONVERSION
15755Conditional Prior-based Non-stationary Channel Estimation Using Accelerated Diffusion Models
3677CONDITIONAL VARIATIONAL AUTOENCODER FOR GLOSS-FREE SIGN LANGUAGE TRANSLATION
18155CONFCLIP: CONFIDENCE-WEIGHTED AND CLIPPED REWARD FOR REINFORCEMENT LEARNING IN LLMS
11302CONFIDENCE-BASED FILTERING FOR SPEECH DATASET CURATION WITH GENERATIVE SPEECH ENHANCEMENT USING DISCRETE TOKENS
14218Confidence-Guided Error Correction for Disordered Speech Recognition
3140CONFIDENT MOTION MAGNIFICATION CURRICULUM FOR SELF-SUPERVISED OPTICAL FLOW
11282Conflict-Aware Client Selection for Multi-Server Federated Learning
13901CONFORMAL INFERENCE FOR TIME SERIES OVER GRAPHS
10073CONFORMAL PREDICTION AIDED KALMAN FILTERS WITH CONFIDENCE INTERVALS
17459CONFORMAL SIGNAL TEMPORAL LOGIC FOR ROBUST REINFORCEMENT LEARNING CONTROL: A CASE STUDY
15093Conformalized Gaussian processes for online uncertainty quantification over graphs
11762Conjugate Relation Modeling for Few-Shot Knowledge Graph Completion
4560Connecting Layer-Wise Representation of WavLM with Spectro-Temporal Modulation on Speaker Verification
13574CONQUER: CONTEXT-AWARE REPRESENTATION WITH QUERY ENHANCEMENT FOR TEXT-BASED PERSON SEARCH
3192CONSENSUS, CONFLICT, AND COORDINATION: THE C^3-MKD FRAMEWORK FOR RELIABLE MULTI-TEACHER KNOWLEDGE DISTILLATION
8559Consensus-Awarded Multi-Agent Debate via Adversarial Interaction
12190CONSISTENCY-AWARE LEARNING FOR UNBIASED VISUAL QUESTION ANSWERING
10137CONSTANT-MODULUS LINEAR TRANSFORM FOR RIS BEAMFORMING IN UPLINK MULTIUSER MIMO SYSTEMS
18940CONSTRAINED CONDITIONAL DENOISING DIFFUSION FOR HYPERSPECTRAL-MULTISPECTRAL FUSION
4131CONSTRAINED LOCAL POINT CLOUD PERTURBATIONS USING ADAPTIVE CURVATURE FOR 3D ADVERSARIAL ATTACKS
3092Constrained Paraphrase Consistency for LLM Hallucination Detection
1010CONSTRAINT OPTIMIZED MULTICHANNEL MIXER-LIMITER DESIGN
12679CONSTRUCTING COMPOSITE FEATURES FOR INTERPRETABLE MUSIC-TAGGING
4622CONSTRUCTION OF BINARY SEQUENCE PAIRS WITH EQUAL PERIODIC AUTOCORRELATION
9477CONTENT ADAPTIVE SWITCHABLE HYPERPRIOR NETWORKS FOR LEARNED IMAGE COMPRESSION
16132CONTENT ANONYMIZATION FOR PRIVACY IN LONG-FORM AUDIO
11020CONTENT LEAKAGE IN LIBRISPEECH AND ITS IMPACT ON THE PRIVACY EVALUATION OF SPEAKER ANONYMIZATION
1482Content-Aware Model Slimming for Image Super-Resolution with Large Input
12605CONTENT-PRESERVING SPEECH REPRESENTATION LEARNING VIA ADAPTIVE SEGMENT-LEVEL ALIGNMENT
13330CONTEXT-AWARE DEEP HASHING FOR CROSS-DOMAIN IMAGE RETRIEVAL
6772CONTEXT-AWARE DYNAMIC GRAPH LEARNING FOR MULTIMODAL EMOTION RECOGNITION WITH MISSING MODALITIES
12583CONTEXTUAL BIASING FOR ASR IN SPEECH LLM WITH COMMON WORD CUES AND BIAS WORD POSITION PREDICTION
12821CONTEXTUAL CLUE MINING AND CLASS CALIBRATION FOR WEAKLY SUPERVISED VIDEO ANOMALY DETECTION
8773CONTEXTUAL RELATIONSHIP FEATURE-ENHANCED STEGANALYSIS FOR SOCIAL TEXTS
11348Continual Learning with CLIP Text-Prototype and an Orthogonal Pre-Expanded Classification Head
11102CONTINUAL NEURAL NETWORK RETRIEVAL FOR EVER-EXPANDING MODEL ZOO
15307CONTINUAL TIME SERIES FORECASTING WITH DIFFUSION MODELS UNDER FUNCTIONAL REGULARIZATION
14566CONTINUATION METHOD FOR FEEDBACK DELAY NETWORK MODAL DECOMPOSITION
19062Continuous Relaxation of Discontinuous Shrinkage Operator: Proximal Inclusion and Conversion
16158Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
11216CONTRASTIVE DISTILLATION OF EMOTION KNOWLEDGE FROM LLMS FOR ZERO-SHOT EMOTION RECOGNITION
14867CONTRASTIVE HYPERSPHERE FOR ONE-CLASS LINGUISTIC STEGANALYSIS
10514Contrastive Learning-Based Deep Neural Network for Robust DOA Estimation
12661CONTRASTIVE PERTURBATION WITH FREQUENCY-DOMAIN FEATURE FUSION FOR FACE PRIVACY PROTECTION
9781CONTRASTIVE TIMBRE REPRESENTATIONS FOR MUSICAL INSTRUMENT AND SYNTHESIZER RETRIEVAL
15605Controllable Embedding Transformation for Mood-Guided Music Retrieval
11759CONTROLLABLE LOCALIZED FACE ANONYMIZATION VIA DIFFUSION INPAINTING
6897CONTROLLING LANGUAGE DIFFICULTY IN DIALOGUES WITH LINGUISTIC FEATURES
18901Convergence Analysis of the Factorial Kalman Filter
18886CONVOLUTIONAL FILTERING WITH RKHS ALGEBRAS
14825CONVOLUTIONAL GRAPH FILTER DESIGN FOR SIGNED GRAPHS
11944COOPERATIVE DETECTION OF CYCLOSTATIONARY TARGET ECHOES FOR PASSIVE RADAR NETWORKS
3514COOPERATIVE MULTI-AGENT REINFORCEMENT LEARNING FOR ADAPTIVE AGGREGATION IN SEMI-SUPERVISED FEDERATED LEARNING WITH NON-IID DATA
6777COREANCHOR-QA: CENTER-ANCHORED AND SELF-IMPROVING FOR QUESTION-ANSWER GENERATION
15089CORM: COARSE-TO-FINE-GRAINED OFFLOADING FOR SMOE LLM INFERENCE ON CONSUMER-GRADE GPU
7936CORRECTING THE BIAS: AVOIDING FALSE TRIPLET INJECTION IN MULTILINGUAL KNOWLEDGE GRAPH COMPLETION WITH LLM-AUGMENTED REASONING
9645CorrEctor: An Execute-to-Correct Paradigm for Efficient LLM Secure Inference
7879COSAGE: FEDERATED LEARNING WITH GRADIENT SUMMARIES FOR CENTRALIZED CLIENT SELECTION
10009COST–EFFICIENT DYNAMIC FEATURE ACQUISITION UNDER LIMITED SUPERVISION
15663CosyAccent: Duration-Controllable Accent Normalization Using Source-Synthesis Training Data
8164COUNTERFACTUAL PROBABILITY DISTILLATION FOR REMOTE SENSING
3056COUNTING DISTINCT MULTIVARIATE SELF-SIMILARITY PARAMETERS USING A BOOTSTRAP-DRIVEN GRAPH CLUSTERING APPROACH
10909Coupling Acoustic Geometry and Visual Semantics for Robust Depth Estimation
9901COVA: TEXT-GUIDED COMPOSED RETRIEVAL FOR AUDIO-VISUAL CONTENT
9557CoVariance Filters and Neural Networks over Hilbert Spaces
18243Covariance-Agnostic Model-Based Deep Learning Filter for Jump Systems
12715CP LOSS: CHANNEL-WISE PERCEPTUAL LOSS FOR TIME SERIES FORECASTING
2490CP-GUARD: CONTINUAL PREFERENCE ALIGNMENT FOR COPYRIGHT PROTECTION
17870CPJ: Explainable Agricultural Pest Diagnosis via Caption–Prompt–Judge with LLM-Judged Refinement
5794CPMark: Robust Latent watermarking against composite perturbations
14250CPMIL: COMPOUND PROTOTYPE-BASED MULTIPLE INSTANCE LEARNING FOR WHOLE SLIDE IMAGE CLASSIFICATION
10103CPT: CONSISTENT PROXY TUNING FOR BLACK-BOX MODELS
11488CPTFORMER: SELF-SUPERVISED CHANGE-POINT-AWARE TRANSFORMER FRAMEWORK FOR NON-STATIONARY TIME SERIES FORECASTING
19039Cramér-Rao Bounds for Laplacian Matrix Estimation
15373Cramér-Rao Bounds on Sparse-Diffuse Channel Estimation
19034CRB Optimization for Intelligent Reflecting Surface-Assisted NLOS Wireless Sensing
14939CRDSNet: Scene Text Recognition Based on Cross-modal and Recurrent Decomposed Self-Attention
8760CREDID: CREDIBLE MULTI-BIT WATERMARK FOR LARGE LANGUAGE MODELS IDENTIFICATION
11274Critical Noise: An Efficient Label-Flipping Attack Against Malicious Traffic Detection Systems
6636CRLB-Guided Orientation Design of Photodiode Arrays for Wide-FOV Optical Wireless Reception
12671Crop Classification in Satellite Images via First Eigenvector of Learned Signed Graph Laplacian
11776Cross Paraphrastic Invariance Learning for Hallucination Detection
9746CROSS PSEUDO LABELING FOR WEAKLY SUPERVISED VIDEO ANOMALY DETECTION
17257CROSS TASK KNOWLEDGE TRANSFER FOR REHEARSAL-FREE CONTINUAL LEARNING
10754Cross-Architecture Knowledge Distillation of WavLM for Lightweight Speaker Verification
16141CROSS-ATTENTION BASED DUAL-STREAM FRAMEWORK FOR BLIND UNDERWATER IMAGE QUALITY ASSESSMENT
14891CROSS-ATTENTIVE ADAPTER WITH REGULARIZED DOMAIN ADAPTATION FOR SPEAKER VERIFICATION
4470CROSS-CULTURAL BIAS IN MEL-SCALE REPRESENTATIONS: EVIDENCE AND ALTERNATIVES FROM SPEECH AND MUSIC
10148CROSS-DOMAIN CONTRASTIVE LEARNING WITH DYNAMIC THRESHOLD CALIBRATION FOR SOURCE SPEAKER TRACING
15707CROSS-DOMAIN LORA FINGERPRINT LOCALIZATION VIA SPATIAL REPRESENTATION FEW-SHOT KNOWLEDGE DISTILLATION
15014CROSS-EXAMINER: EVALUATING CONSISTENCY OF LARGE LANGUAGE MODEL-GENERATED EXPLANATIONS
6170CROSS-LINGUAL ALZHEIMER’S DISEASE DETECTION WITH MULTIMODAL LLMS VIA SPEECH CUE-AUGMENTED PROMPTING AND INSTRUCTION TUNING
11173Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis
14279CROSS-LINGUAL INTERLEAVING FOR SPEECH LANGUAGE MODELS
6702Cross-Modal Bottleneck Fusion for Noise Robust Audio-Visual Speech Recognition
16682CROSS-MODAL GUIDANCE FOR FAST DIFFUSION-BASED COMPUTED TOMOGRAPHY
14318CROSS-MODAL KNOWLEDGE DISTILLATION FOR SPEECH LARGE LANGUAGE MODELS
11056CROSS-MODAL KNOWLEDGE DISTILLATION FROM VIDEO TO WIFI CSI FOR MULTI-USER HUMAN ACTIVITY RECOGNITION
8178CROSS-MODAL POINT CLOUD COMPLETION VIA STRUCTURALLY-AWARE PROXY GUIDANCE
5524Cross-scene Hyperspectral Image Classification via Topology-Aware Learning without Source Data
14032Cross-View Change Detection via Self-Supervised Contrastive Representation Learning
14701CROWDSOURCED DIGITAL TWINS FOR BEAMFORMING OPTIMIZATION IN XR COMMUNICATIONS
6841CS3-BENCH: EVALUATING AND ENHANCING SPEECH-TO-SPEECH LLMS FOR MANDARIN-ENGLISH CODE-SWITCHING
17443CSFUSION: FLEXIBLE MULTI-MODAL IMAGE FUSION VIA CONTENT-STYLE CROSS MODULATION
10118CSGAN-VLP: Swin-Transformer Enhanced GAN and Contrastive Alignment for Robust Cross-Scene Passive Visible Light Positioning
15788CSGONET: COLLABORATIVE SELF-SUPERVISED LEARNING WITH GRAPH AND OCCUPANCY RECONSTRUCTION FOR TRAJECTORY PREDICTION
7925CSPC: A High-Quality Dataset and Comprehensive Evaluation Metric for Chinese Sentence Paraphrasing
17782CTC-DID: CTC-BASED ARABIC DIALECT IDENTIFICATION FOR STREAMING APPLICATIONS
13579CTGFILTER: USABILITY-PRESERVING CONTROLLABLE TEXT GENERATION VIA NULL-SPACE PROJECTION
10675CTR-LORA: CURVATURE-AWARE AND TRUST-REGION GUIDED LOW-RANK ADAPTATION FOR LARGE LANGUAGE MODELS
11656CUE-TS: SOFT COVARIATE PROMPTS AS A UNIVERSAL ENHANCER FOR TIME SERIES FOUNDATION MODELS
5021CURRICULUM LEARNING WITH CONTRASTIVE LOSS FOR LIGHTWEIGHT SPEAKER VERIFICATION
14082CURVATURE-DRIVEN SYNCHROSQUEEZING TRANSFORM: A FINE-SCALE BIDIRECTIONAL METHOD FOR TIME-FREQUENCY REPRESENTATION
11880CURVILINEAR SPECTRAL U-NET: A FRAMEWORK FOR STRUCTURE-AWARE ROAD EXTRACTION FROM VERY HIGH RESOLUTION IMAGERY
16499CVAR-AWARE NETWORK SLICING FOR TAIL LATENCY UNDER TIERED DEADLINES
18705CVSTIM: MITIGATING OBJECT HALLUCINATION IN MLLMS VIA CO-OCCURRENCE GUIDED VISUAL STIMULATION
12517CYLINDERFUSION: SELF-ADAPTIVE CYLINDRICAL 3+1D RADAR-CAMERA FUSION FOR WATERWAY POINT CLOUD SEGMENTATION
9556CZSRSSC: CONTINUAL ZERO-SHOT REMOTE SENSING SCENE CLASSIFICATION
13063D2AFM: DUAL-DOMAIN ADAPTIVE FUSION MODULE FOR UNDERWATER IMAGE ENHANCEMENT
12753D²-DETR: Dual-Sourced Augmentation with Duration-Aware Differential Decoder for Video Temporal Grounding
4764D2M: Decoupling to Modulate via Emotion Trajectories for Dynamic Facial Expression Recognition
4363D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation from Lead sheet
10130DAAGNET: DEPTH-ADAPTIVE ANCHOR GRAPH FOR WEAKLY-SUPERVISED CROWD COUNTING
9361DAC: DIFFERENTIABLE ARCHITECTURE CAUSALITY
17911DACAS: ADVERSARIAL SPARSE ATTENTION NETWORKS FOR CROSS-MODAL ANOMALY DETECTION IN DISTRIBUTED SYSTEMS
10415DADAGAN: AN IMAGE SUPER-RESOLUTION NETWORK WITH PIXEL-WISE ESTIMATION OF DEGRADATION DEGREES
16671DAFMR: DUAL-SIDE ATTRIBUTE-AWARE FUSION WITH MIXTURE-OF-EXPERTS AND REGULARIZATION FOR RECOMMENDATION SYSTEMS
4602DAIEN-TTS: DISENTANGLED AUDIO INFILLING FOR ENVIRONMENT-AWARE TEXT-TO-SPEECH SYNTHESIS
6185DAM: Dual Active Learning with Multimodal Foundation Model for Source-Free Domain Adaptation
5816DAME: DURATION-AWARE MATRYOSHKA EMBEDDING FOR DURATION-ROBUST SPEAKER VERIFICATION
14873DAMO: A DATA-EFFICIENT MULTIMODAL ORCHESTRATOR FOR TEMPORAL REASONING WITH VIDEO LLMS
2389DANCINGMATE: COORDINATED AND SYNCHRONIZED DANCE ACCOMPANIMENT GENERATION
15894DAPT: A DUAL-PATH FRAMEWORK FOR MULTILINGUAL MULTI-HOP QUESTION ANSWERING
18874DARAS: DYNAMIC AUDIO-ROOM ACOUSTIC SYNTHESIS FOR BLIND ROOM IMPULSE RESPONSE ESTIMATION
10671DARC-CLIP: DYNAMIC ADAPTIVE REFINEMENT WITH CROSS-ATTENTION FOR MEME UNDERSTANDING
1415DARE: Dual-Aspect Reflective Evolution for Prompt Optimization
3750DarkCite: Unveiling Authority Bias as Implicit RAG Jailbreak Attacks
10025DARKVRAI: CAPTURE-CONDITION CONDITIONING AND BURST-ORDER SELECTIVE SCAN FOR LOW-LIGHT RAW VIDEO DENOISING
8705DARL-CLIP: DENSITY-ADAPTIVE AND REINFORCEMENT FINE-TUNING CLIP FOR CROSS-SCENARIO UAV OBJECT DETECTION
2817DART: a Dual-modality Adaptive Representation with divergence Training framework for ZS-CIR
3834DART: DIFFERENTIAL ACOUSTIC RANGING FOR CALIBRATION-FREE HEAD TRACKING WITH ULTRASONIC SENSORS
9435DASE: MAXIMUM ENTROPY DATA SELECTION FOR BALANCED PRETRAINING CORPORA OF LARGE LANGUAGE MODELS
1029Data-Adaptive Proximal Operator: Demonstration on Low-Rank Sparse Subspace Clustering
17047DATA-BRIDGE: A MULTI-AGENT SYSTEM FOR CODE-BASED MULTIMODAL SCHEMA ALIGNMENT
14424DATA-DRIVEN ALGORITHMS FOR ROBUST OR SELECTIVE CFAR DETECTION IN COLORED GAUSSIAN NOISE
7798DATA-DRIVEN CLUSTERING AND MERGING OF ADAPTERS FOR ON-DEVICE LARGE LANGUAGE MODELS
17578DATA-DRIVEN GRAPH FILTERS VIA ADAPTIVE SPECTRAL SHAPING
3209DATA-DRIVEN REGULARIZATION USING IDLE-STATE MEASUREMENTS FOR IMPROVED VEHICLE NOISE PREDICTION
15169Data-Driven Two-Stage IRS-Aided Sumrate Maximization with Inexact Precoding
2231Dataset-Driven Channel Masks in Transformers for Multivariate Time Series
11710DATKD: DECOUPLED ATTENTION TRANSFER KNOWLEDGE DISTILLATION FOR VISION TRANSFORMERS
8573DA-VLM: Data Factory with Minimal Effort Using VLMs
5939DBFT-SD: Weakly Supervised Multimodal Detection of Sensitive Audio-Visual Content
5000DCFL: DUAL END CONSTRAINT FEDERATED LEARNING WITH AN ADAPTIVE ANALYTIC ANCHOR
16216DCINJECT: PERSISTENT BACKDOOR ATTACKS VIA FREQUENCY MANIPULATION IN PERSONAL FEDERATED LEARNING
18059DC-MAMBER: A DUAL CHANNEL PREDICTION MODEL BASED ON MAMBA AND LINEAR TRANSFORMER FOR MULTIVARIATE TIME SERIES FORECASTING
2968DCR-MUCL: Dual Granularity Consistency Routing Multimodal Unified Contrastive Learning for Rumor Detection Network
12690DCSF: ENHANCING CERTIFIED ROBUSTNESS VIA DYNAMIC COST-SENSITIVE AND SELF-SUPERVISION FRAMEWORK
6723DDCM: Dual-Domain Collaborative Modeling with Boundary-to-Structure Refinement for Camouflaged Object Detection
17461DDPT: DISTILLATION AND DYNAMIC TOWARD BETTER PROMPT TUNING FOR IMPROVING COMPLEX REASONING IN LARGE LANGUAGE MODELS
9362DDSA: Dual-Domain Strategic Attack for Spatial-Temporal Efficiency in Adversarial Robustness Testing
13270DDSC: DYNAMIC DUAL-SIGNAL CURRICULUM FOR DATA-EFFICIENT ACOUSTIC SCENE CLASSIFICATION UNDER DOMAIN SHIFT
5801DDSR-Net: Robust Multimodal Sentiment Analysis via Dynamic Modality Reliability Assessment
2514DEAR-SR: A Degradation-Aware Adversarially Robust Framework for Infrared Super-Resolution
4898DebateCTI: Enhancing ATT&CK Technique Identification in CTI Reports via a Role-Specialized Multi-Agent Debate
11382DEBATING FOR COREFERENCE: A MULTI-AGENT FRAMEWORK FOR CROSS-DOCUMENT EVENT COREFERENCE RESOLUTION
16202DEBIASED ADAPTIVE DUAL-VIEW GRAPH LEARNING FOR NEXT POI RECOMMENDATION
10743DebiasHSD: Failure-guided Debiasing for Cross-Domain Hate Speech Detection
8153DE-BIASING FACIAL AGE ESTIMATION: A DUAL-STAGE DISENTANGLEMENT FRAMEWORK FOR CROSS-RACIAL GENERALIZATION
11523DECENTRALIZED ACCELERATED MINIMAX OPTIMIZATION VIA EXACT DIFFUSION
2067Decentralized Detection with Many Sensors: Optimality of Exchangeable and Identical Encoding Policies
11407DECENTRALIZED LEARNING OF DECISION MODELS FOR CLASSIFICATION WITH DEPENDENT AGENTS
14156Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks
12688DECENTRALIZED LEARNING WITH DYNAMICALLY REFINED EDGE WEIGHTS: A DATA-DEPENDENT FRAMEWORK
11653DECISION FUSEDCONV: EFFICIENT OFFLINE REINFORCEMENT LEARNING VIA FUSED STATE-REWARD ENCODING AND HYBRID TEMPORAL CONVOLUTION
3801Deco3D: Decoupled Semantic and Geometric Learning for Sparse Supervision in 3D Object Detection
12141DECODER-ONLY CONFORMER WITH MODALITY-AWARE SPARSE MIXTURES OF EXPERTS FOR ASR
4028DECOMPOSED SEASONAL-TREND NETWORK WITH ROTARY ATTENTION FOR TIME SERIES FORECASTING
2737Decomposing Multilingual Representations: How Scale, Architecture, and Data Shape Functional Specialization
5045DECONFUSION CLIP TOWARDS ROBUST OUT-OF-DISTRIBUTION DETECTION
12675DECORRELATION-ENHANCED MULTIBAND SUBBAND ADAPTIVE FILTERING FOR RIR TRACKING IN SOUND FIELD CONTROL
10507Decouple and Match: Frequency-Decoupled Local Matching for Signature Verification
15473Decoupled Reconstruction for Low-Dose CT: From SR-Perceptual Recovering to Frequency-Contrastive Fidelity Rebuilding
3134DecoupleGS: Physical-aware Gaussian Decoupling for High-Quality 3D Scene Lighting Enhancement
3982DECOUPLING MOTION AND TEXTURE: A HYBRID RECURRENT NETWORK FOR VIDEO QUALITY ENHANCEMENT
4273DECOUPLING ORTHOGONAL LIP FEATURES AGAINST GENERATIVE IMPOSTERS
8059DECOUPLING RAW-BASED LOW-LIGHT ENHANCEMENT VIA A FREQUENCY-AWARE TRANSFORMER
11549Deep Co-occurrence Matrix Network for Classification of Plant Fiber SEM Images
12025DEEP DUBBING: END-TO-END AUTO-AUDIOBOOK SYSTEM WITH TEXT-TO-TIMBRE AND CONTEXT-AWARE INSTRUCT-TTS
6561DEEP FUZZY CLUSTERING WITH ANCHOR GRAPH PRESERVATION AND MEMBERSHIP ALIGNMENT
1303DEEP IMAGE PRIOR WITH L0 GRADIENT REGULARIZER FOR IMAGE SMOOTHING
5591DEEP LARGE-MARGIN LP-SVDD WITH CNN FEATURE LEARNING FOR NOVELTY DETECTION
14072DEEP LEARNING BASED ZERO LATENCY AUTOMATIC MUSIC MIXING FOR LIVE PERFORMANCES
14663DEEP LEARNING-BASED JOINT OPTIMIZATION OF ADAPTIVE FEEDBACK CANCELLATION AND RESIDUAL FEEDBACK SUPPRESSION FOR HEARING AIDS
3927DEEP LOCAL FIELD CONSISTENCY FOR NON-RIGID POINT CLOUD REGISTRATION
3189Deep Lossless Point Cloud Attribute Compression via EED Prediction
18902DEEP PHYSICALLY PARAMETERIZED ALL-IN-ONE NETWORK FOR LENS-FREE MICROSCOPY IMAGING
16543Deep Reinforcement Learning for Dynamic Sensing and Communications
3656Deep Spatial Clue Informed Ambisonic Encoding for irregular microphone arrays
8009Deep Tensor Completion for Fast Direct Position Determination
13852DEEP TPC: TEMPORAL-PRIOR CONDITIONING FOR TIME SERIES FORECASTING
11130DEEP UNFOLDED SUBSPACE-BASED DOA RECOVERY FROM SPARSE ARRAYS
11637DEEP UNFOLDED SUPERIORIZED POCS FOR ROBUST JOINT TRANSMISSION UNDER PHASE MISALIGNMENT
10852DEEP VIDEO FRAME INTERPOLATION DETECTION VIA EVENT-GUIDED TEMPORAL ANALYSIS AND HIGH-FREQUENCY ARTIFACTS
13344DEEPAQ: A PERCEPTUAL AUDIO QUALITY METRIC BASED ON FOUNDATIONAL MODELS AND WEAKLY SUPERVISED LEARNING
5781Deepfake Detection via Data-Level Multi-Stream Assessment
18328Deepfake-HMDE: Hierarchical Mixture of Deepfake Experts for Deepfake Detection
16359Deep-Shallow Mixed Gaussian Processes for Efficient and Robust Training
14121DEEPTRAVERSE: AN ALGORITHM-INSPIRED DESIGN PARADIGM FOR STRUCTURED AND INTERPRETABLE VISION BACKBONES
13410Defending 3D Point Clouds with Frequency-Guided Diffusion model
11632DEFENSEMEL: ENHANCING ADVERSARIAL ROBUSTNESS OF MULTIMODAL ENTITY LINKING WITH MULTIMODAL LARGE LANGUAGE MODELS
3582DEFINE: A FINE-GRAINED ANNOTATED AND HIERARCHICALLY STRUCTURED DATASET FOR LONG-FORM ARTICLE GENERATION
12302DEGRADATION DESCRIPTION PROMPTING FOR UNDERWATER IMAGE RESTORATION
13387DELAY AND RANDOM SCATTERING ESTIMATION WITH A BAND-LIMITED SIGNAL: UNCONDITIONAL CRB AND MLE
13335Delay Embedding For Differential Graph Learning From Dependent Data
3411DELNET: CONTINUOUS ALL-IN-ONE WEATHER REMOVAL VIA DYNAMIC EXPERT LIBRARY
5084DeMoFL: Efficient and Effective Decentralized Model-Heterogeneous Federated Learning
16225DEMONET: DEGRADATION-AWARE MODALITY INTERACTION FOR MULTI-MODAL OBJECT DETECTION IN CAR CABIN
1368DEMO-POSE: DEPTH-MONOCULAR MODALITY FUSION FOR OBJECT POSE ESTIMATION
13647DemoReranker: Enhancing the In-context Learning Capability of Multi-modal Large Models via Demonstration Reranking
3169Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning
11508DENOISING DIFFUSION MODEL FOR DOA ESTIMATION
12146DENOISING OF STOCHASTIC RAY TRACING ROOM IMPULSE RESPONSES
18914DENOISING PIECEWISE CONSTANT NANOPORE SIGNALS
4694DEOPT: SYNERGIZING LARGE LANGUAGE MODELS AND DIFFERENTIAL EVOLUTION FOR JOIN ORDER OPTIMIZATION
9408DEPMPK: MULTI-PERSPECTIVE KNOWLEDGE FUSION FOR MULTIMODAL DEPRESSION DETECTION
13211Depth3DLane: Monocular 3D Lane Detection via Depth Prior Distillation
1156DepthCLIP3D: A UNIFIED APPROACH FOR 3D VISUAL UNDERSTANDING WITH DEPTH
15419DEPTHFUSION: DEPTH-GUIDED INFRARED AND VISIBLE IMAGE FUSION FOR ENHANCED DOWNSTREAM TASKS
16206Depth-Guided Metric-Aware Temporal Consistency for Monocular Video Human Mesh Recovery
11418DEPTH-GUIDED RELIGHTING: RESOLVING LIGHTING INCONSISTENCY FOR SEAMLESS BACKGROUND REPLACEMENT
5105DEPTHSHIELD: ROBUST DEPTH ESTIMATION ON TRANSPARENT OR MIRROR SURFACES
14699DepthTalk: Few-Shot Talking Head Generation with Depth-Aware 3D Gaussian Field Motion
10646DERIVING MOMENTS IN THE AGE OF GOSSIP PROCESS FROM PERCOLATION
9788DES: A MULTI-STAGE FRAMEWORK FOR ACCURATE FABRIC PRINTED PATTERN SEGMENTATION
3807DESIGN OF DIFFERENTIAL MICROPHONE ARRAYS VIA A 3D SPATIAL DIFFERENCE OPERATOR
12208DetailCLIP: Injecting Image Details into CLIP's Feature Space
6231DETECTING AND ATTRIBUTING SYNTHETIC SPANISH SPEECH: THE HISPASPOOF DATASET
2938DETECTING OSCILLATING SINGULARITIES WITH THE WEAK SCALING EXPONENT
2677Detecting Trojaned Inputs at Runtime: Activation-Distribution Defenses for Untrusted CNNs
18644Detection and Angle Estimation in Colocated MIMO Radar in the Presence of Grating Lobes
4478DETECTWILD: IN-THE-WILD AI-GENERATED TEXT DETECTION BENCHMARK
10404DFA-SNN: Dual-Frequency Attention Module for Spiking Neural Networks
18463DFATran: Beyond Static Features for Dynamic Transferability Estimation
11138DFF-CGT: Frequency-Domain Feature Fusion with Class-Guided Thresholding for UniSSDA
2274DFFNET: COMBINING SIMILAR AND DIFFERENT DUAL FEATURE FLOWS TO ACHIEVE MULTIPLE WEATHER REMOVAL
4567DFGA-Net: A Dual-Frequency Guided Attention Network for Multivariate Time Series Prediction
1535DFL-ALLC: ADAPTIVE LOCAL LEARNING CONTROL FOR DECENTRALIZED FEDERATED LEARNING IN HETEROGENEOUS VEHICULAR NETWORKS
11367DFLF: A SCALABLE DECENTRALIZED FEDERATED LEARNING FRAMEWORK BASED ON PYTORCH
15728DFMAD: DATA-FREE BACKDOOR DEFENSE FOR FEDERATED LEARNING VIA MULTI-TEACHER ADVERSARIAL DISTILLATION
10914DGCS: Depth-Guided Continual Self-learning for Infrared and Visible Image Fusion
4732DGER: DIFFUSION-GUIDED EFFICIENT RESTORATION FOR UNDERWATER IMAGES
10932DGF-Net: Underwater Image Enhancement via Depth Priors and Frequency-Domain Modeling
16401DGSDNET: DUAL-GRAPH SPECTRAL DIFFUSION NETWORK FOR INCOMPLETE MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS
17031DHEval: A Dynamic Hallucination Evaluation Protocol Robust to Data Contamination
15011DIACDM: COGNITIVE DIAGNOSIS IN TEACHER-STUDENT DIALOGUES USING THE INITIATION-RESPONSE-EVALUATION FRAMEWORK
13553DIAGNOSE-REFLECTIVE PLANNING: FAITHFUL KG REASONING VIA LLM-GUIDED MCTS WITH STRATEGIC SELF-CORRECTION
10595DIAL: DATABASE-INFORMED INTERACTIVE MULTI-AGENT SYSTEM LOOP FOR PERSONALIZED IMAGE GENERATION
11343DIFF: DIFFUSION MODEL AIDED FEATURE FUSION NETWORK FOR E2E LONG-TAILED VISUAL RECOGNITION
16087DIFFACLFSMD: DIFFUSION-AUGMENTED CONTRASTIVE LEARNING FOR FEW-SHOT MALWARE DETECTION
6287DiffAntiSeq: A Controllable Diffusion Model for Efficient Antibody Library Design
9491DIFFEMOTALK: AUDIO-DRIVEN FACIAL ANIMATION WITH FINE-GRAINED EMOTION CONTROL VIA DIFFUSION MODELS
6240DIFFERENCE COARRAY OF MULTI-FREQUENCY SPARSE RATIONAL ARRAYS
17434Differentiable Grouped Feedback Delay Networks for Learning Direction and Position-Dependent Late Reverberation
14649DIFFERENTIABLE META-OPTIMIZATION FOR FEDERATED NEURAL ARCHITECTURE SEARCH
13774DIFFERENTIABLE PULSETABLE SYNTHESIS FOR WIND INSTRUMENT MODELING
6628Differentiable Resizing: Resolution Layers
14119Differential Privacy of Network Parameters from a System Identification Perspective
17416DIFFERENTIALLY PRIVATE CLUSTERED FEDERATED LEARNING WITH PRIVACY-PRESERVING INITIALIZATION AND NORMALITY-DRIVEN AGGREGATION
13873DIFFERENTIALLY PRIVATE DECENTRALIZED CONSTRAINED LEARNING WITH DUAL AVERAGING
6184DIFFERENTIALLY PRIVATE WEIGHTED K-SELECTION AT SCALE
3867Diff-EvINR: event-to-video reconstruction using diffusion models and implicit neural representations
11253DIFFFACE-EDIT: A DIFFUSION-BASED FACIAL DATASET FOR FORGERY-SEMANTIC DRIVEN DEEPFAKE DETECTION ANALYSIS
14636DIFF-IML: TOWARDS THE DIFFUSION-BASED REAL-WORLD IMAGE MANIPULATION LOCALIZATION
10555DIFFNATOR: GENERATING STRUCTURED EXPLANATIONS OF TIME-SERIES DIFFERENCES
9958DiffQ: UNIFIED PARAMETER INITIALIZATION FOR VARIATIONAL QUANTUM ALGORITHMS VIA DIFFUSION MODELS
10077DIFFRIM: A DIFFUSION-DRIVEN MODEL FOR HIGH EFFICIENCY RADAR INTERFERENCE MITIGATION
17156Diffusion Algorithm for Metalens Optical Aberration Correction
9629Diffusion Contrastive Learning for Robust Image Classification
14192Diffusion Denoiser Achievable Analysis for Finite Blocklength Unsourced Random Access
16050DIFFUSION POSTERIOR SAMPLING FOR SLITLESS SPECTRAL IMAGING
2547DIFFUSION RESIDUAL MODELING FOR LONG-TERM TIME SERIES FORECASTING
10037Diffusion Stochastic Learning over Multi-Team Network Games
14117DIFFUSION TIMBRE TRANSFER VIA MUTUAL INFORMATION GUIDED INPAINTING
12389Diffusion-aided Extreme Video Compression with Lightweight Semantics Guidance
11597DIFFUSION-BASED NATURAL ADVERSARIAL PERTURBATIONS TOWARDS SEGMENT ANYTHING MODEL
11797Diffusion-Based Scene Text Image Super-Resolution with Visual Style and Semantic Guidance
5026DIFFUSION-BASED UNSUPERVISED AUDIO-VISUAL SPEECH SEPARATION IN NOISY ENVIRONMENTS WITH NOISE PRIOR
16394DIFFUSIONCOM: STRUCTURE-AWARE MULTIMODAL DIFFUSION MODEL FOR MULTIMODAL KNOWLEDGE GRAPH COMPLETION
18186DIFFUSION-DRIVEN PROXIMAL POSTERIOR SAMPLING FOR SYNTHETIC APERTURE RADAR IMAGING
13367DIFFUSION-LINK: DIFFUSION PROBABILISTIC MODEL FOR BRIDGING THE AUDIO-TEXT MODALITY GAP
14081DiffVAGS: Visual Alignment for High-Fidelity 3D Gaussian Splatting Generation
6181Diff-VS: Efficient Audio-Aware Diffusion U-Net for Vocals Separation
5363DIFTrack: Vision-Language Tracking with Deep Interaction Fusion
2652DiGA-Fuse: Depth-Guided Geometry-Aware Infrared and Visible Image Fusion
6482DIGITAL HUMAN-ASSISTED SMART CONTRACT VULNERABILITY DETECTION UNDER LIMITED SAMPLE CONSTRAINTS
7037DIGRAPH SIGNAL PROCESSING VIA POLAR DECOMPOSITION
2599Dilated Array Scheme Based on asymmetrically defined cumulants with Moving Platform
9093Dimensionality Reduction for Beamforming by Change of Variable
18012DIMO: DUAL-STRATEGY LEARNING FOR AMBIGUOUS SAMPLES IN CLASS-IMBALANCED FACIAL EXPRESSION RECOGNITION
7082DINN: KEY-ACTIVATED MODEL STEGANOGRAPHY WITH DYNAMIC SPARSE INVERTIBLE NEURAL NETWORKS
4475DiPaS-Bridge: Towards Paired-Guided Diverse Generation for Urban Layouts
12670DIRA: Deep High-Rank Adaptation of Pre-trained Language Models
14388DIRAVIG: DIFFERENTIABLE REGION ASSIGNMENT VISION GRAPH NETWORKS
2809DIRCR: Dual-Inference Rule-Contrastive Reasoning for Solving RAVENs
6386DIRECT POSITION DETERMINATION METHOD BASED ON SPARSE SPECTRUM DATA
18039Direct Preference Optimization for Speech Autoregressive Diffusion Models
10883DIRECT RICIAN-DOMAIN PROCESSING FOR NOISE-AWARE MRI DENOISING AND MICROSTRUCTURE PRESERVATION
9461Direct Simultaneous Translation Activation for Large Audio-Language Models
13866DIRECT TRANSFER OF PROSODY IN SPEECH-TO-SPEECH TRANSLATION USING DISENTANGLED SPEECH TOKENS
11003Directed Hypergraph Framelet Neural Network
12793DIRECTION-AWARE CROSS-MODAL FUSION NETWORK WITH HIERARCHICAL FEATURE RECONSTRUCTION FOR RGB-T SALIENT OBJECT DETECTION
17536Direction-PointNet: A Spatiotemporal Anisotropy Network for Human Action Recognition
17775Directly Trained Spiking Neural Networks with Adaptive Phase Coding
17886Disabling Reasoning: Backdoor Construction in Large Reasoning Models via Knowledge Editing
10782DISASTER-AFFECTED AREA EXTRACTION METHOD THROUGH PIXEL DIFFERENCE CONVOLUTION AND FREQUENCY-DOMAIN ENHANCEMENT
5833DISCONTSE: SINGLE-STEP DIFFUSION SPEECH ENHANCEMENT BASED ON JOINT DISCRETE AND CONTINUOUS EMBEDDINGS
17550DISCREPANCY-AWARE DISENTANGLED CONTRASTIVE LEARNING FOR MULTIMODAL RUMOR DETECTION
13946DISCRETE DIFFUSION FOR GENERATIVE MODELING OF TEXT-ALIGNED SPEECH TOKENS
11620DISCRETE VISION TOKENIZATION FOR VISION-LANGUAGE ALIGNMENT IN AUTONOMOUS DRIVING
1143DISCRETE-CONTINUOUS FUSION WITH ADAPTIVE HIERARCHICAL FEATURES FOR AUDIO DEEPFAKE DETECTION
5590Discrete-Periodic Ambiguity Function of Random Communication Signals
3771DISCRIMINANT LEARNING-BASED COLORSPACE FOR BLADE SEGMENTATION
15561Disentangled Authenticity Representation for Partially Deepfake Audio Localization
15866Disentangled Signals, Dynamic Prompts: A Meta-Network Framework for Robust Task-Oriented Dialogue
2387DISENTANGLED STRUCTURE PRIOR PROPAGATION FOR GUIDED DEPTH SUPER-RESOLUTION
3583Disentangling Contextual and Background Signals for Social Diffusion Prediction
13125DISPATCH: DISTILLING SELECTIVE PATCHES FOR SPEECH ENHANCEMENT
12183DISSECTING PERFORMANCE DEGRADATION IN AUDIO SOURCE SEPARATION UNDER SAMPLING FREQUENCY MISMATCH
1967DISSR: DISENTANGLING SPEECH REPRESENTATION FOR DEGRADATION-PRIOR GUIDED CROSS-DOMAIN SPEECH RESTORATION
9766Distillation based Layer Dropping (DLD): Effective end-to-end framework for dynamic speech networks
16212DISTILLED FEW-STEP SAMPLERS FOR BAYESIAN FLOW NETWORKS
4633DISTILLING ATTENTION KNOWLEDGE FOR SPEAKER VERIFICATION
13401DISTILLING SYNERGISTIC KNOWLEDGE FROM A FUSION TEACHER FOR SAR OBJECT DETECTION
9809Distilling Time-series Foundation Models for Efficient Forecasting
6159DISTILMOS: LAYER-WISE SELF-DISTILLATION FOR SELF-SUPERVISED LEARNING MODEL-BASED MOS PREDICTION
11158DISTRACTION-FREE OUTDOOR 3D GAUSSIAN SPLATTING WITH ENHANCED DEPTH PROPAGATION
4217DISTRIBUTED ASSOCIATIVE MEMORY VIA ONLINE CONVEX OPTIMIZATION
11683DISTRIBUTED MULTICHANNEL ACTIVE NOISE CONTROL WITH ASYNCHRONOUS COMMUNICATION
5689DISTRIBUTED OPTIMISATION VIA THE GENERALISED PRIMAL-DUAL METHOD OF MULTIPLIERS UNDER UNRELIABLE AND QUANTISED COMMUNICATION
13770DISTRIBUTIONAL PPO FOR STABLE POLICY GRADIENT OPTIMIZATION
9029DISTRIBUTION-AWARE DATA CURATION FOR SEMANTIC SEGMENTATION VIA MIXTURE OF VMFS
10045DISTRIBUTION-AWARE MOBILITY-ASSISTED DECENTRALIZED FEDERATED LEARNING
4668Distribution-Aware Neural Additive Models: Robust Interpretable Deep Learning with Feature Selection
16647DISTRICACHE: DISTRIBUTED PARALLELISM FOR ACCELERATING DIFFUSION MODELS
10276DITHERED 1-BIT QUANTIZATION AND SPARSE RECONSTRUCTION FOR NEAR-FIELD 3D MILLIMETER-WAVE IMAGING
14863DITSE: HIGH-FIDELITY GENERATIVE SPEECH ENHANCEMENT VIA LATENT DIFFUSION TRANSFORMERS
12024DITSINGER: SCALING SINGING VOICE SYNTHESIS WITH DIFFUSION TRANSFORMER AND IMPLICIT ALIGNMENT
18031DIVERSE AND FEW-STEP AUDIO CAPTIONING VIA FLOW MATCHING
6425DIVERSITY IS ALL YOU NEED: SELF-SUPERVISED HYPERGRAPH LEARNING FOR MITIGATING POPULARITY BIAS IN CONVERSATIONAL RECOMMENDER SYSTEM
4039DJ-NORM: A DECOMPOSITION-BASED JOINT NORMALIZATION FRAMEWORK FOR NON-STATIONARY TIME SERIES FORECASTING
4676DKFMA: A MULTI-AGENT FRAMEWORK FOR DUAL-SOURCE KNOWLEDGE FUSION IN IT INFRASTRUCTURE OPERATIONS AND MAINTENANCE
13599DLCRR: DIFFERENTIAL LEARNING AND CAUSAL REPRESENTATION RESTORATION MODEL FOR EVENT CAUSALITY IDENTIFICATION
4335DLMDC: A Method for Controllable Text Generation
12795D-LoRA: A Dual Low-Rank Adaptation Framework for Cost-Efficient Personalized Federated Learning
5715DMM-JA: A DYNAMIC MULTIMODAL FUSION AND MULTI-SCALE MODELING FRAMEWORK WITH JUMP-AWARENESS FOR INDUSTRIAL EQUIPMENT RUL PREDICTION
1725DMP-TTS: DISENTANGLED MULTIMODAL PROMPTING FOR CONTROLLABLE TEXT-TO-SPEECH WITH CHAINED GUIDANCE
5989DMS-GFViT: DYNAMIC MULTI-SCALE VISION TRANSFORMER WITH INFUSED GATED FUSION FOR HANDWRITTEN TEXT RECOGNITION
13653DMTC: A COLLABORATIVE DUAL MMWAVE RADAR SYSTEM FOR SMART SPACES
13155DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection
6665DNLMark: Dual Noise Layer-based Robust Watermarking Against Image Editing
14411DNN-BASED ONLINE SOURCE COUNTING BASED ON SPATIAL GENERALIZED MAGNITUDE SQUARED COHERENCE
15990DNS: DATA-DRIVEN NONLINEAR SMOOTHER FOR COMPLEX MODEL-FREE PROCESS
14007Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs
10100DO FOUNDATIONAL AUDIO ENCODERS UNDERSTAND MUSIC STRUCTURE?
1609Do Multi-modal LLMs possess Compositional Zero-Shot Recognition Capabilities?
11921DO SPEECH LLMS LEARN CROSSMODAL EMBEDDING SPACES?
13501Do We Need EMA for Diffusion-Based Speech Enhancement? Toward a Magnitude-Preserving Network Architecture
2215DO WE REALLY NEED SELF-ATTENTION FOR STREAMING AUTOMATIC SPEECH RECOGNITION?
1003Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
4282DOA ESTIMATION FOR MM-WAVE FMCW RADAR WITH TWO RECEIVING ANTENNAS
2202DOCKPOSE: UNDERWATER DOCK POSE ESTIMATION USING ADAPTIVE N-GRAM CONTEXT AND RECONSTRUCTION-DRIVEN LEARNING
4103DocLayout: Elevating the Role of Complex Layout Understanding in Document Visual Question Answering
6520DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue
11930DOES HIGH-FREQUENCY MATTER? A REMOTE SENSING CHANGE DETECTION NETWORK WITH HIGH-FREQUENCY SELECTION AND HIGH-ORDER RECURSION
16886DOES THE PRE-TRAINING OF AN EMBEDDING INFLUENCE ITS ENCODING OF AGE?
13882DOMA: LEVERAGING DIFFUSION LANGUAGE MODELS WITH ADAPTIVE PRIOR FOR INTENT CLASSIFICATION AND SLOT FILLING
19128DOMAIN ADAPTATION OF FEW-SHOT BIOACOUSTIC EVENT DETECTION IN DIFFERENT ENVIRONMENTS
9502DOMAIN DISTILLATION WITH TRANSFORMER FOR UNSUPERVISED DOMAIN ADAPTATION
14671Domain Generalization via Distilling from Domain-Deconfused CLIP Features
10402DOMAIN PARTITIONING MEETS PARAMETER-EFFICIENT FINE-TUNING: A NOVEL METHOD FOR IMPROVED LANGUAGE-QUERIED AUDIO SOURCE SEPARATION
6545DOMAIN-ADAPTIVE MODEL MERGING ACROSS DISCONNECTED MODES
17153DOMAIN-AWARE SCHEDULING FOR ASR FINE-TUNING
15555DOMAIN-GENERALIZABLE RELATION-AWARE KNOWLEDGE TRACING FOR COLD-START EDUCATION SYSTEM
9785Domain-Invariant Representation Learning of Bird Sounds
14808Domination Strategies for Free-Riding in Cross-Silo FL-based Caching
7821DOMINO: DOMINANT PATH-BASED COMPENSATION FOR HARDWARE IMPAIRMENTS IN MODERN WIFI SENSING
9898DOPPLER RADIANCE FIELD-GUIDED ANTENNA SELECTION FOR IMPROVED GENERALIZATION IN MULTI-ANTENNA WI-FI-BASED HUMAN ACTIVITY RECOGNITION
3626Doppler-Based Pseudo-Reciprocity in FDD for LEO MU-MIMO
10832DPANet: Dual Pyramid Attention Network for Multivariate Time Series Forecasting
8348DP-DEGAUSS: DYNAMIC PROBABILISTIC GAUSSIAN DECOMPOSITION FOR EGOCENTRIC 4D SCENE RECONSTRUCTION
5846DPFAN: DUAL-PATH FEATURE-ADAPTIVE NETWORK FOR KPI ANOMALY DETECTION
17931DPI: EXPLOITING PARAMETER HETEROGENEITY FOR INTERFERENCE-FREE FINE-TUNING
14093DP-LAC: LIGHTWEIGHT ADAPTIVE CLIPPING FOR DIFFERENTIALLY PRIVATE FEDERATED FINE-TUNING OF LANGUAGE MODELS
15542DPMM-CFL: CLUSTERED FEDERATED LEARNING VIA DIRICHLET PROCESS MIXTURE MODEL NONPARAMETRIC CLUSTERING
10106DPO-REGULARIZED REGRESSION FOR AGE PREDICTION
11821DQUDF: DEFLATING QUADRATIC BEHAVIOR IN UNSIGNED DISTANCE FUNCTIONS FOR HIGH-FIDELITY SURFACE RECONSTRUCTION
17733DR.Roleplay: Role-play LLM with Direct Preference Optimization and Retrieval-Augmented Generation
9488DRAG WITHIN PRIOR DISTRIBUTION: TEXT-CONDITIONED POINT-BASED IMAGE EDITING WITHIN DISTRIBUTION CONSTRAINTS
2721DRAWMARK: DEFEATING REGENERATION ATTACKS BY EMBEDDING WATERMARK INTO PREDICTED NOISE OF DIFFUSION MODELS
15903DREAM: DUAL-PERSPECTIVE REASONING AND ATTRIBUTION-BASED REFINEMENT FOR CONVERSATIONAL QUERY REWRITING
12271DreamFragment: Instance-Aware Text-to-3D Generation for Compositional Multi-Object Scenes with Complex Interactions
3133DREAMVAR: TAMING REINFORCED VISUAL AUTOREGRESSIVE MODEL FOR HIGH-FIDELITY SUBJECT-DRIVEN IMAGE GENERATION
9562DRIVINGSCENE: A MULTI-TASK ONLINE FEED-FORWARD 3D GAUSSIAN SPLATTING METHOD FOR DYNAMIC DRIVING SCENES
15689DR-Mark: Enhancing Printed-Camera Watermarking Robustness via Noise Decomposition and Dichromatic Reflection Model
8210DRMTST: Dual Retention-Enhanced Transformer with Multiscale and Multivariate Mixing for Time Series Forecasting
13776DSA: DIRECTION AND SIGN ALIGNMENT FOR CONTRIBUTION EVALUATION IN FEDERATED LEARNING
3059DSFR-NET: DISTRIBUTION GUIDED NIGHTTIME IMAGE SCATTERING FLARE REMOVAL
14930DSG: DUAL-SEMANTIC GUIDANCE FROM LLM TO TOKEN DISTILLATION FOR FEW-SHOT INCREMENTAL LEARNING
3942DSGBENCH: A DIVERSE STRATEGIC GAME BENCHMARK FOR EVALUATING LLM-BASED AGENTS IN COMPLEX DECISION-MAKING ENVIRONMENTS
3455DSNET: DUAL-STREAM HARMONIZATION NETWORK FOR IMAGE ENHANCEMENT
8782DSPAST: DISENTANGLED REPRESENTATIONS FOR SPATIAL AUDIO REASONING WITH LARGE LANGUAGE MODELS
1662DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning
11197DSPFusion: DEPTH AND SEMANTIC PRIOR GUIDED MULTI-FOCUS IMAGE FUSION WITH VISION FOUNDATION MODELS
4562DSRMS-TransUNet: A Decentralized Non-Shifted TransUNet for Shallow Water Acoustic Source Range Estimation
9444DSR-REC: ENHANCING GENERATIVE RECOMMENDATION THROUGH DYNAMIC EXPERT SELECTION AND SEMANTIC ID REDIRECTION
17701DSSR: DECOUPLING SALIENT AND SUBTLE REPRESENTATIONS UNDER MISSING MODALITIES FOR MULTIMODAL EMOTION RECOGNITION
16545DSV-CTGS: Dynamic Sparse-view CT Reconstruction based on Gaussian Splatting and Prior Transfer
10605DSVM-UNet: Enhancing VM-UNet with Dual Self-distillation for Medical Image Segmentation
2609DSWP: A DUAL-STAGE WATERMARKING PARTITIONING FRAMEWORK FOR EFFICIENT AND ROBUST MULTI-BIT WATERMARKING IN LARGE LANGUAGE MODELS
1648DTA-PDVC: DYNAMIC TEMPORAL ANCHOR BOXES FOR PARALLEL DENSE VIDEO CAPTIONING
5072DTOPAGENT: A MULTI-AGENT FRAMEWORK FOR DYNAMIC TOP-K CHUNK RETRIEVAL IN RAG PIPELINE
4641DTPE: DOCUMENT TREE PARSING FOR EFFICIENT DOCUMENT-LEVEL RELATION EXTRACTION WITH LLM-BASED DATA REFINEMENT
3078DTR4CAT: Dual-Threshold Retrieval with Ability Gap Upper Bound for Computerized Adaptive Testing
8271DTST: Dual-Transformer for Multivariate Time Series Forecasting
12754DUAL CONTRASTIVE DOCUMENT CLUSTERING WITH MULTI-REPRESENTATION
6974Dual Contrastive Learning for Semi-supervised Domain Adaptation in Bi-modal Depression Recognition
3467DUAL CORRELATION ADAPTIVE HIERARCHICAL SPATIO-TEMPORAL TRANSFORMER FOR STOCK PRICE FORECASTING
12167Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting
6487DUALAST: AST-GUIDED EXEMPLAR RETRIEVAL FOR IN-CONTEXT LEARNING IN MULTI-STEP REASONING
14138DUAL-BRANCH FEATURE-FUSED AND MULTI-SEMANTIC ALIGNED HASHING FOR SUPERVISED CROSS-MODAL RETRIEVAL
1270DUAL-BRANCH SPATIAL-LIGHTING NETWORK FOR PHOTOMETRIC STEREO
4704DUAL-BRANCH SPIRAL INTERSECT NETWORK FOR MULTIMODAL SENTIMENT ANALYSIS
6049DUAL-CHANNEL PERSONALIZED FEDERATED BUNDLE RECOMMENDATION
17667Dual-Criterion Sample Selection for Noisy Labels: Integrating Neighborhood Prediction Divergence and Loss Values
11786DUAL-DOMAIN 3D MESH WATERMARKING WITH ADAPTIVE VERTEX GROUPING
11306DUAL-DOMAIN FEATURE MODULATION FOR LIGHTWEIGHT IMAGE SUPER-RESOLUTION
5898DUAL-DRIVE: A HIERARCHICAL FUSION FRAMEWORK FOR DUAL-MODEL SAFETY-ENHANCED AUTONOMOUS DRIVING
12960DUALEXPERTNET: DISPARITY-AWARE SEMANTIC-DETAIL COMPLEMENTARITY FOR CAMOUFLAGED OBJECT DETECTION
9532Dual-Geometry Prior Frequency Nonlinear Graph Convolutional Network For Human Action Recognition
3804DUAL-GRAINED ROUTING GUIDED MULTI-LORA EXPERTS FOR MULTILINGUAL LOW-RESOURCE SPEECH RECOGNITION
14129Dual-Graph: Protocol Interaction-aware Flow Representation for Accurate Unidirectional Encrypted Traffic Classification
16872DUALGUARD: TWO-STAGE ALIGNMENT PRESERVATION FOR SAFE PEFT
11825DUAL-GUIDED GENERATIVE FRAME INTERPOLATION
14854DUAL-MODEL INFORMATION-BASED CSI RECONSTRUCTION IN HYBRID BEAMFORMING MIMO-OFDM SYSTEMS
9628DUAL-PATH COMPRESSION FOR REAL-TIME MULTIMODAL CLICKBAIT DETECTION: QUANTIZATION AND DISTILLATION
5908DUAL-PATH JND: A NEW FRAMEWORK FOR ROBUST AND IMPERCEPTIBLE IMAGE WATERMARKING
7214DUAL-PERSPECTIVE MULTIMODAL SENTIMENT ANALYSIS WITH MOE FUSION: REPRESENTATION LEARNING VIA SEMANTIC RESONANCE AND DIVERGENCE
10649DUAL-REGULARIZED ITERATIVE ADAPTIVE APPROACH FOR DOA SPECTRUM RECONSTRUCTION IN LIMITED ANGLE SECTOR
14415DUAL-SPACE KNOWLEDGE DISTILLATION WITH KEY-QUERY MATCHING FOR LARGE LANGUAGE MODELS WITH VOCABULARY MISMATCH
16065DualSteg: High-Capacity Provably Secure Text Steganography in Asymmetric Resource Scenario via Dual-Scale LLMs
17579Dual-Strategy-Enhanced ConBiMamba for Neural Speaker Diarization
3976Dual-Stream Feature Fusion for Spoofing Detection under Aliased Interference in UAV Communications
7922DUOTRACKER: CONFIDENCE-ROUTED EYE TRACKING FOR DIGITAL BIOMARKERS IN CLINICAL SCREENING
15256DUST STORM ANOMALY DETECTION ON MARS WITH EVENT CAMERA
14153DVT-AD: DISCRIMINATIVE VISION TRANSFORMERS FOR SCALABLE UNSUPERVISED ANOMALY DETECTION VIA SIMPLE SELF-DISTILLATION
17411DWC-PO: Dynamic Weight Constraints for Model-Based Policy Optimization via Wasserstein Policy Improvement Bounds
18974dYIN AND dSWIPE: DIFFERENTIABLE VARIANTS OF CLASSICAL FUNDAMENTAL FREQUENCY ESTIMATORS
12333DyLUT-UIE: A Dynamic Lookup Table Paradigm for Efficient Underwater Image Enhancement
2123Dynabits: Token Aware Weight-Activation Quantization for Large Vision–Language Models
5318DYNAMIC ADAPTIVE WAVELET STATE SPACE MODEL FOR EFFICIENT LOW-LIGHT IMAGE ENHANCEMENT
12587DYNAMIC ATTENTION-AWARE SHAPING FOR OUT-OF-DISTRIBUTION DETECTION
5174Dynamic Automaton Refinement and Planning for Non-Markovian RL
4066Dynamic Balanced Cross-modal Attention with Gated Sequence Restoration: Towards Robust Multimodal Sentiment Analysis
12898DYNAMIC BASIS GENERATION AND MULTI-SCALE GAUSSIAN RESPONSE FUSION FOR ROBUST POINT CLOUD REGISTRATION
18896DYNAMIC BIT-PLANE ARITHMETIC CODING METHOD FOR QUANTIZED SPECTRAL COEFFICIENTS IN USAC
5770DYNAMIC ESTIMATION LOSS CONTROL IN VARIATIONAL QUANTUM SENSING VIA ONLINE CONFORMAL INFERENCE
12316DYNAMIC EXPLAINABLE RECOMMENDATION WITH MULTI-FEATURE AND PERSONALIZED TEST-TIME INFERENCE
14888Dynamic Feature Selection on Variable Feature Sets Using Features of Features
2216Dynamic Frequency Domain Curriculum Learning: A Novel Framework for Adaptive Image Forgery Detection
17250Dynamic Fusion for Large Language Models Compression
7470DYNAMIC GATING FUSION AND MULTIMODAL CONTRASTIVE LEARNING FOR GRAPH-BASED DISEASE DIAGNOSIS
12429DYNAMIC INTRA-INTER PARTITION LEARNING FOR BUILDING RECONSTRUCTION FROM POINT CLOUDS
1185DYNAMIC KALMAN FUSION FOR ROBUST CONTINUOUS SIGN LANGUAGE RECOGNITION
6510DYNAMIC LANGUAGE ADAPTATION AND COLLABORATIVE MEMORY MODELING FOR VISION-LANGUAGE TRACKING
17297DYNAMIC MULTI-EXPERT PROJECTORS WITH STABILIZED ROUTING FOR MULTILINGUAL SPEECH RECOGNITION
15685DYNAMIC MULTI-PATH LEARNING FOR OUT-OF-DISTRIBUTION NODE CLASSIFICATION ON HETEROPHILIC GRAPH
9787DYNAMIC MULTI-REWARD OPTIMIZATION FOR MULTI-ROUND PREFERENCE-ALIGNED DIFFUSION
11567DYNAMIC NOISE-AWARE MULTI LORA FRAMEWORK TOWARDS REAL-WORLD AUDIO DEEPFAKE DETECTION
8194DYNAMIC PROTOTYPE REFINEMENT FOR OUT-OF-DISTRIBUTION DETECTION: BALANCING COMPACTNESS AND DIVERSITY
10625Dynamic Self-Distillation Former for Weakly Supervised Semantic Segmentation
5410Dynamic Semantic Path Routing with Learnable Priors for Image Captioning
5813DYNAMIC SEQUENCING AND GNN-BASED POSTED-PRICE DESIGN FOR COMBINATORIAL AUCTIONS
10444DYNAMIC SPECTROGRAM ANALYSIS WITH LOCAL-AWARE GRAPH NETWORKS FOR AUDIO ANTI-SPOOFING
14291Dynamic Spike-and-Slab Particle Filtering for Topology Tracking
7055DYNAMIC STATE SPACE MODELS FOR CROSS-MODALITY FUSION
1961Dynamic Summary Generation for Interpretable Multimodal Depression Detection
11134DYNAMICAL ISOMETRY BASED RIGOROUS FAIR NEURAL ARCHITECTURE SEARCH
7788Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training
14954Dynamic-M With Dual-Stage Sparsity and Cross-Scale Structural Coherence for Generalized Industrial Anomaly Detection
16239DYNAPREDICT: ALTERNATING PREDICTIVE AND REAL ITERATION FOR EFFICIENT DEEP REINFORCEMENT LEARNING TRAINING
2796DYNDETECT: DYNAMIC ROUTING FOR ROBUST MULTI-MODAL MEDIA MANIPULATION DETECTION AND GROUNDING
17339DyPANet: Efficient Event-driven Eye Tracking via Dynamic Path Adaptation and ROI Filtering
6255DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers
5855E2E-AEC: IMPLEMENTING AN END-TO-END NEURAL NETWORK LEARNING APPROACH FOR ACOUSTIC ECHO CANCELLATION
6161Early Prediction Method for Learners at Risk Based on Multi-source Feature Fusion
17818EASY TURN: INTEGRATING ACOUSTIC AND LINGUISTIC MODALITIES FOR ROBUST TURN-TAKING IN FULL-DUPLEX SPOKEN DIALOGUE SYSTEMS
11270EATS2: Enabling Efficient and Accurate Trajectory Similarity Computation via Self-Training
16951EBAD-GS: Deblurring Gaussian Splatting with Event-driven Bundle Adjustment
14558EBCF: Strict Error-Bounded Compression of Numerical Climate Data with Discrete Normalizing Flows
16928EBEVTRACK: ESTIMATED BIRD’S-EYE VIEW FOR MULTI-OBJECT TRACKING
5955ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
8921ECHOFAKE: A REPLAY-AWARE DATASET FOR PRACTICAL SPEECH DEEPFAKE DETECTION
4872ECHORAG: A TWO-STAGE FRAMEWORK FOR AUDIO-TEXT RETRIEVAL AND TEMPORAL GROUNDING
11283ECHO-TRAFFIC: CROSS-MODAL FEATURE AUGMENTATION FOR TRAFFIC TRANSFORMER PRE-TRAINING
16530ECM: Enhancing Compressibility of Quantized Vision Encoder and LLM for Large Vision-Language Models
7910ECONOMICALLY CONSTRAINED CYCLE-CONSISTENT GENERATIVE NETWORKS FOR RISK-NEUTRAL DENSITY ESTIMATION: CRISIS-ROBUST PRICING AND HEDGING
12590ECSA: DUAL-BRANCH EMOTION COMPENSATION FOR EMOTION-CONSISTENT SPEAKER ANONYMIZATION
15058EDB-NET: ENTROPY DUAL-BRANCH NETWORK FOR FEW-SHOT TEXT CLASSIFICATION
5149EDGE COLLABORATIVE GAUSSIAN SPLATTING WITH INTEGRATED RENDERING AND COMMUNICATION
5485EDGE-AWARE SCALE PREDICTION FOR 3D GAUSSIAN SPLATTING
10834EDGEPOSE: SELECTIVE AND ADAPTIVE DIFFUSION FILTERING FOR REAL-TIME HUMAN POSE ESTIMATION ON EDGE DEVICES
13998EDGESPOT: EFFICIENT AND HIGH-PERFORMANCE FEW-SHOT MODEL FOR KEYWORD SPOTTING
1954EDITMEM: ENHANCING MULTI-HOP FACT VERIFICATION VIA EDITABLE MEMORY
3198EDITS: ENHANCING DATASET DISTILLATION WITH IMPLICIT TEXTUAL SEMANTICS
4850EDN-Gaussian: Edge-Directed Densification with Covariance Narrowing for Blur-Robust 3D Gaussian Splatting
17122EDPOTRANS: ENHANCED DIRECT PREFERENCE OPTIMIZATION FOR MACHINE TRANSLATION BETWEEN LOW-RESOURCE LANGUAGE AND CHINESE WITH LIMITED MONOLINGUAL DATA
9658EduGesture: A Dataset of Teachers' Hand Gestures toward Pedagogical Intentions
5676EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection using Self-Attention Attractors
8563EFFECT OF PROPAGATION DELAYS ON CELL-FREE MASSIVE MIMO SYSTEMS
12295Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models
1994EFFICIENT AND GLOBAL INTERACTION-AWARE RETRAINING-FREE TOKEN PRUNING FOR VISION TRANSFORMERS
6298EFFICIENT AND SCALABLE TOBIT GAUSSIAN PROCESS REGRESSION FOR MODELING AIR QUALITY DATA
2066Efficient Audio-Visual Inference via Token Clustering and Modality Fusion
11977EFFICIENT CATEGORY-LEVEL 6D POSE ESTIMATION VIA POSE-AWARE FEATURE LEARNING
18952Efficient CNNs via Passive Filter Pruning
11666Efficient Depression Detection from Speech via Language-Independent Prompt-Driven Reprogramming
15079Efficient Distillation of Large Language Models using Group Relative Policy Distillation
15710EFFICIENT EXPOSURE FUSION VIA FINE-TUNING A LOW-LIGHT ENHANCEMENT MODEL
13544EFFICIENT FEW-SHOT LEARNING FOR EDGE AI VIA KNOWLEDGE DISTILLATION ON MOBILEVIT
14084Efficient Gaussian Process Learning via Subspace Projections
2214EFFICIENT MOIRÉ ARTIFACT REMOVAL IN RAW AND SRGB DOMAINS VIA SPIKING NEURAL NETWORKS
14796Efficient Multi-LoRA Deployment via Shared KV-Cache with Task-Adaptive Tokens
1375EFFICIENT OFFLINE REINFORCEMENT LEARNING WITH PROGRESSIVE HEURISTIC BLENDING IN COMPLEX ENVIRONMENTS
4409EFFICIENT ONLINE PEER ADAPTATION IN MULTI-AGENT COMPETITION AND COOPERATION VIA VISION LANGUAGE MODEL
7918Efficient Plug-and-Play Method for Dynamic Imaging via Kalman Smoothing
17670EFFICIENT PROGRESSIVE TRAINING FRAMEWORK FOR IDENTITY-CONSISTENT FACE SWAPPING
9498Efficient Quantization-Aware Neural Receivers: Beyond Post-Training Quantization
12719EFFICIENT RECONSTRUCTION OF TEXTURELESS OBJECTS VIA QUALITY-AWARE AND DEPTH-ENHANCED GAUSSIAN SPLATTING
13799Efficient Segment Anything with Depth-Aware Fusion and Limited Training Data
7719EFFICIENT SELF-SUPERVISED LEARNING FOR REMOTE SENSING VIA SPARSE CONVOLUTIONAL MIXTURE-OF-EXPERTS
8137Efficient Subset Selection-based Algorithms for Factorizing Low-Rank Matrices with Application to Robust PCA
11055Efficient Synthetic Data Selection via Pontryagin's Maximum Principle
10099EFFICIENT TRANSFORMER AND INTERLEAVED CONTEXT CLUSTER FOR FAST POINT CLOUD REGISTRATION
7680Efficient Uncertainty Quantification for Full Waveform Inversion via Shot-Encoded Hessian
4977EFFICIENT VISUO-TACTILE LEARNING VIA FINE-GRAINED ALIGNMENT AND IMPORTANCE-AWARE TOKEN RETENTION
9908EFFICIENT WIDEBAND SPARSE ARRAYS FOR HIGH-RESOLUTION DOA ESTIMATION
15760EFFICIENT3D-AD: TOKEN-EFFICIENT AND VIEW-AWARE ZERO-SHOT 3D MULTIMODAL ANOMALY DETECTION
12285EG-GCN: Enthalpy-Guided graph convolutional networks
17826EGGCodec: A Robust Neural Encodec Framework for EGG Reconstruction and F0 Extraction
2601EGMR-YOLO: ADVANCED TUBERCULOSIS DIAGNOSIS VIA A REFINED YOLO MODEL
4109EGOGEN: EGOCENTRIC INTERACTION VIDEO GENERATION WITH 3D HAND STRUCTURE CONSTRAINTS
3345EGOPRESSDIFF: MULTIMODAL VIDEO DIFFUSION FOR EGOCENTRIC UV-DOMAIN HAND-PRESSURE ESTIMATION
2840EHDN: AN ENHANCED HOMOGRAPHY DECOMPOSITION NETWORK FOR ROBUST PLANAR OBJECT TRACKING
10946EICA: An Emotional Inertia-Contagion-Aware Alignment for Emotion Recognition in Conversations
17794EIVF: EFFICIENT IVFPQ SEARCH FOR ON-DEVICE ARM PROCESSORS
19144Embracing Cacophony: Explaining and Improving Random Mixing in Music Source Separation
15275Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling
10992EMISSIVE-GS: RELIGHTABLE RECONSTRUCTION AND EMISSION EDITING VIA GAUSSIAN SPLATTING
5420EM-MAMP: LOW-COMPLEXITY SIGNAL RECOVERY WITH PARAMETER LEARNING
12889EMODIFFUSION: MODELING EMOTION EVOLUTION WITH DIFFUSION FOR DIVERSE AND COHERENT DIALOGUE GENERATION
13421EMODRIVE: AN EMOTION-AWARE VISION-LANGUAGE MODEL FOR HUMAN-CENTRIC AUTONOMOUS DRIVING
10280EMOE: EIGENBASIS-GUIDED ROUTING FOR MIXTURE-OF-EXPERTS
17812EMORL-TTS: REINFORCEMENT LEARNING FOR FINE-GRAINED EMOTION CONTROL IN LLM-BASED TTS
10023EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis
6751EMOTION AND ACOUSTICS SHOULD AGREE: CROSS-LEVEL INCONSISTENCY ANALYSIS FOR AUDIO DEEPFAKE DETECTION
18894Emotion Classification with Visibility Graphs
11060EMOTIONAL DAMAGE: INVESTIGATING SAFETY VULNERABILITIES OF LARGE AUDIO-LANGUAGE MODELS UNDER SPEAKER EMOTIONAL VARIATIONS
5265EMOTIONAL DIMENSION CONTROL IN LANGUAGE MODEL-BASED TEXT-TO-SPEECH: SPANNING A BROAD SPECTRUM OF HUMAN EMOTIONS
6251EMOTION-ALIGNED GENERATION IN DIFFUSION TEXT TO SPEECH MODELS VIA PREFERENCE-GUIDED OPTIMIZATION
11085Emotion-Aware Learning with Class-Balanced Optimization for Dynamic Facial Expression Recognition
9678Emotri-RL: Emotion- and Cause-Aware Reinforcement Learning for Multi-Modal Empathetic Dialogue
9759EMO-TTA: IMPROVING TEST-TIME ADAPTATION OF AUDIO-LANGUAGE MODELS FOR SPEECH EMOTION RECOGNITION
10101EMPEROR: EFFICIENT MOMENT-PRESERVING REPRESENTATION OF DISTRIBUTIONS
5692EMPIRICAL ANALYSIS OF APPROXIMATE MESSAGE PASSING UNDER NON-I.I.D. MEASUREMENTS WITH COMPARISON TO STATE EVOLUTION
1860EMPOWERING ECONOMIC SIMULATION THROUGH SITUATION-AWARE LLM-DRIVEN GENERATIVE SYSTEM
4546EMPOWERING MULTIMODAL RESPIRATORY SOUND CLASSIFICATION WITH COUNTERFACTUAL ADVERSARIAL DEBIASING FOR OUT-OF-DISTRIBUTION ROBUSTNESS
12468EMPOWERING THE TAIL: ADAPTIVE SEMANTIC NEIGHBORHOOD ENHANCEMENT FOR LONG-TAIL REASONING IN TEMPORAL KNOWLEDGE GRAPHS
1432EMPOWERING TRANSFORMERS SPECTRALLY: TOWARDS COMPREHENSIVE PATTERN LEARNING FOR IMAGE DEMOIRÉING
10152EMS-Mixer: Extreme Multi-Scale Mixing for Time Series Forecasting
11048EMU: EMOTION UNDERSTANDING IN THE WILD - A NATURALISTIC MULTIMODAL DATASET AND BENCHMARK
11552ENABLING EFFICIENT AND ACCURATE PRIVACY-PRESERVING IMAGE-TEXT RETRIEVAL IN PUBLIC CLOUD
12958Enabling Multi-Species Bird Classification on Low-Power Bioacoustic Loggers
3755ENABLING ON-DEVICE LIFE-THREATENING ARRHYTHMIA DETECTION VIA PERSONALIZED ADAPTIVE INFERENCE FOR IMPLANTABLE DEVICES
5530Encoder-Decoder Symmetric Nonnegative Matrix Tri-Factorization for Graph Clustering
10742ENCODING EMOTION THROUGH SELF-SUPERVISED EYE MOVEMENT RECONSTRUCTION
14732ENCORE: ENTROPY-GUIDED CROPPING AND ATTENTION REGULARIZATION FOR ROBUST VISION–LANGUAGE UNDERSTANDING
17447END-END-EDGE COLLABORATIVE FRAMEWORK FOR ADAPTIVE CONTENT-AWARE VIDEO ANALYTICS
9625End-fire Target Bearing Estimation in Passive SONAR Employing End-to-End Deep Neural Networks with Focal Angular Loss
5195END-TO-END EFFICIENT DENOISING FOR RADAR MICRO-DOPPLER SPECTROGRAMS USING FOURIER KOLMOGOROV-ARNOLD NETWORK
13580END-TO-END INDOOR LOCALIZATION FOR BLUETOOTH 5 BASED ON A DUAL-BRANCH NETWORK
8506END-TO-END SPEAKER VERIFICATION WITH UNCERTAINTY-AWARE EVIDENTIAL SCORING
16017END-TO-END STORY VISUALIZATION FRAMEWORK WITH PENALTY-BASED EVALUATION USING VISION-LANGUAGE MODELS
14493ENERGY PROFILING OF VIDEO PLAYBACK
13914ENERGY-AWARE IMAGES VIA PIXEL VALUE REDUCTION: THE IMPACT OF COMPRESSION ON ATTENUATION MAPS.
11246ENHANCE BALANCE BETWEEN GENERALIZATION AND PERSONALIZATION FOR VISION-LANGUAGE MODELS IN FEDERATED LEARNING
10794Enhance Deformation-Tolerant Unsupervised Infrared and Visible Image Fusion via Hybrid Feature Representation Learning
15777ENHANCE MESSAGE PASSING WITH CLUSTER-AWARE VIRTUAL NODES FOR SEMI-SUPERVISED NODE CLASSIFICATION
10645ENHANCED CROSS-MEDIUM COMMUNICATION USING MULTI-SENSOR FUSION AND KALMAN FILTERING
5048ENHANCED GENERATIVE MACHINE LISTENER
6394ENHANCED GRAPH NEURAL NETWORKS USING K-HOP GAUSSIAN DIFFUSION
16745ENHANCED GRAPH TRANSFORMER WITH SERIALIZED GRAPH TOKENS
11716Enhanced Time-Frequency Representation of Nonstationary Signals via Cubic Polynomial Phase Synchroextracting Transform
17809Enhanced Video Compression with Context-Aware Dynamic Neural Adapter
3602ENHANCED VOLUMETRIC VIDEO STREAMING THROUGH ANCHOR-BASED VIEWPORT PREDICTION
10982Enhancing Action and Ingredient Modeling for Semantically Grounded Recipe Generation
15740ENHANCING ADVERSARIAL TRANSFERABILITY WITH INTEGRATED TIME-FREQUENCY MOMENTUM ITERATIVE ATTACK
16354ENHANCING AUDIO QUESTION-ANSWERING PERFORMANCE THROUGH LOG-LIKELIHOOD GUIDED REWARD FUNCTIONS
3202ENHANCING AUTOMATIC DRUM TRANSCRIPTION WITH ONLINE DYNAMIC FEW-SHOT LEARNING
17210ENHANCING CLIP-BASED WEAKLY-SUPERVISED VIDEO ANOMALY DETECTION VIA OPTIMAL TRANSPORT
1633ENHANCING CROSS-VIEW GEO-LOCALIZATION GENERALIZATION VIA GLOBAL-LOCAL CONSISTENCY AND GEOMETRIC EQUIVARIANCE
11627Enhancing Debate Dialogue Generation via Dual-Dimensional Reflection and Refinement
14880ENHANCING DIALOGUE-RELATED SPEECH TASKS WITH GENERATED SPOKEN DIALOGUES
17034ENHANCING DOCUMENT-LEVEL MACHINE TRANSLATION VIA FILTERED SYNTHETIC CORPORA AND TWO-STAGE LLM ADAPTATION
3737Enhancing domain generation through pluggable Style Randomization
14095ENHANCING DOPPLER AND FMCW RADARS VIA UNLIMITED SENSING
14971ENHANCING FAKE NEWS DETECTION WITH LLM-GENERATED MULTI-DIMENSIONAL EXPLANATIONS AND MULTI-CHANNEL FUSION
1606ENHANCING GRAPH-BASED RETRIEVAL-AUGMENTED GENERATION VIA QUERY-AWARE PATH REASONING
17520Enhancing Guidance for Missing Data in Diffusion-Based Sequential Recommendation
16322ENHANCING INTER-LEAD CORRELATIONS: A NOVEL DIFFUSION GAN FRAMEWORK FOR 12-LEAD ECG GENERATION
4057Enhancing Knowledge Base Question Answering with Reinforced Hop-wise Logical Form Generation
9736Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals
17869Enhancing Low-Resource Document-Level Relation Extraction with Coarse-to-Fine Prediction
18890ENHANCING LOW-RESOURCE SPEECH RECOGNITION WITH NON-LINEAR CROSS-LINGUAL MAPPINGS
13032ENHANCING MULTILINGUAL LLM-BASED ASR WITH MIXTURE OF EXPERTS AND DYNAMIC DOWNSAMPLING
11893Enhancing Multivariate Time Series Forecasting from a Temporal Decoupling Perspective
3146ENHANCING NOISE ROBUSTNESS FOR NEURAL SPEECH CODECS THROUGH RESOURCE-EFFICIENT PROGRESSIVE QUANTIZATION PERTURBATION SIMULATION
2818ENHANCING ONLINE RL FINE-TUNING VIA ADAPTIVE Q-FUNCTION SELECTION
16332ENHANCING PERSONALIZED FEDERATED CONTINUAL LEARNING WITH CLIENT-SPECIFIC SEMANTIC KNOWLEDGE
14611ENHANCING POST-TRAINING QUANTIZATION VIA FUTURE ACTIVATION AWARENESS
16444Enhancing QAOA Ansatz via Multi-Parameterized Layer and Blockwise Optimization
1960Enhancing Quantization for Visual AutoRegressive Generation via Uncertainty Identification
6908ENHANCING REFERRING EXPRESSION COMPREHENSION WITH PIXEL-WORD CORRELATION AND CROSS-LAYER REGULARIZATION
8274ENHANCING RISK AWARENESS IN LLM AGENTS VIA PROBING SAFETY BOUNDARIES
11574Enhancing Social Emotion Prediction with Persona-driven Comment Generation and Graph-based Information Fusion
10598ENHANCING SPATIAL RELATIONSHIPS IN TEXT-TO-IMAGE GENERATION WITH STRUCTURED INFORMATION
18231Enhancing Spatio-Temporal Forecasting with Spatial Neighbourhood Fusion: A Case Study on Mobility in Peru
10808ENHANCING SPEAKER VERIFICATION WITH LAYER-WISE MIXTURE-OF-EXPERTS ON PRE-TRAINED MODELS
7982ENHANCING SPEAKER VERIFICATION WITH W2V-BERT 2.0 AND KNOWLEDGE DISTILLATION GUIDED STRUCTURED PRUNING
16386ENHANCING SPEECH INTELLIGIBILITY PREDICTION FOR HEARING AIDS WITH COMPLEMENTARY SPEECH FOUNDATION MODEL REPRESENTATIONS
12539ENHANCING STABILITY AND REPRODUCIBILITY OF GRAPH INFORMATION BOTTLENECK FOR MENTAL DISORDER DIAGNOSIS
13165ENHANCING UAV CLASSIFICATION VIA SPHERICAL HARMONIC TRANSFORM AND VIRTUAL MULTI-CHANNEL DENOISING
6168ENHANCING VALUE ALIGNMENT OF LLMS WITH MULTI-AGENT SYSTEM AND COMBINATORIAL FUSION
4881ENHF-YOLO: ENHANCED HIGH-FREQUENCY DOMAIN FEATURE EXTRACTION OF SMALL TARGETS IN REMOTE SENSING
1577ENRICH VISUAL FEATURES BY HOLISTIC SAMPLING AND HIERARCHICAL CONDENSING IN MULTIMODAL LARGE LANGUAGE MODELS
17547Enriching Tail Manifolds via Feature Synthesis and Margin Optimization for Long-Tailed Remote Sensing Recognition
14247ENSEMBLE FOR REDUCING TARGET SPEECH EXTRACTION ERRORS
15440Ensuring Reliable Participation in Subjective Video Quality Tests Across Platforms
12358ENTITY ALIGNMENT AND STRUCTURAL PERTURBATION FOR COMMONSENSE KNOWLEDGE GRAPH REASONING
15128ENTROCUT: ENTROPY-GUIDED ADAPTIVE TRUNCATION FOR EFFICIENT CHAIN-OF-THOUGHT REASONING IN SMALL-SCALE LARGE REASONING MODELS
17576ENTROLLM: ENTROPY ENCODED WEIGHT COMPRESSION FOR EFFICIENT LARGE LANGUAGE MODEL INFERENCE ON EDGE DEVICES
10806EntroLog: An Adaptive and Self-Improving Framework for Efficient Log Analysis
4970Entropy-Aware Multimodal Preference Optimization for Factuality Alignment in Medical Visual Question Answering
9424ENTROPYGS: AN EFFICIENT ENTROPY CODING ON 3D GAUSSIAN SPLATTING
6633ENTROPY-GUIDED DATA-EFFICIENT TRAINING FOR MULTIMODAL REASONING REWARD MODELS
15484Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec
13690Environment-Aware MIMO Channel Estimation in Pilot-Constrained Upper Mid-Band Systems
14205EOSIGN: EDGE-EFFICIENT ONE-SHOT ISL VIDEO SYNTHESIS FROM CODE-MIXED SPEECH WITH SIGNER CONSISTENCY AND TEMPORAL STABILITY
17811EPED: A NOVEL REINFORCEMENT LEARNING-DRIVEN FRAMEWORK FOR EARLY PHISHING SCAMS DETECTION IN ETHEREUM
15333EPO: Enhanced Preference Optimization with Multi-Response Data for LLMs via Stochastic Softmax
14946Equipping Large Language Model with Directional Speech Understanding Capabilities
18892EQUIRIPPLE MIMO BEAMPATTERN SYNTHESIS USING CHEBYSHEV APPROXIMATION
3559Equivariant Deep Equilibrium Models for Imaging Inverse Problems
16609Equivariant Hamiltonian Graph Neural Networks for Generalizing Dynamics of Magnetic Pendulum System
18207ERASING YOUR VOICE BEFORE IT’S HEARD: TRAINING-FREE SPEAKER UNLEARNING FOR ZERO-SHOT TEXT-TO-SPEECH
9037ERE-LLM: Entity-Relation Extraction With Large Language Model in Professional Domains
6094ERFORMER: EVENT-RGB FUSION TRANSFORMER WITH ADAPTIVE BRIGHTNESS CONTROL FOR LOW-LIGHT IMAGE ENHANCEMENT
12998Erosion Attack for Adversarial Training to Enhance Semantic Segmentation Robustness
5274E-RRC: ENHANCED RANGE RESTRICTION CLIPPING FOR ROBUST VISION TRANSFORMERS ON EDGE DEVICES
15429Error Bound Based Exact Penalization for Cardinality-Constrained Clustering
2621ES4D-Net: Foreground-aided 3D Object Detection Based on Extremely Sparse 4D Radar Point Cloud
6064ESINET: ENHANCING STRUCTURAL INTEGRITY IN SCRIBBLE-SUPERVISED CAMOUFLAGED OBJECT DETECTION
4132E-SocialNav: Efficient Socially Compliant Navigation with Language Models
13899ESTIMATING HAND-RELATED FEATURES FROM SPEECH USING MACHINE LEARNING
17187ESTIMATING RESPIRATORY EFFORT FROM NOCTURNAL BREATHING SOUNDS FOR OBSTRUCTIVE SLEEP APNOEA SCREENING
5244ESTIMATION OF THE HURST EXPONENT OF NOISY OR BLURRED FRACTAL TEXTURES. APPLICATION TO COMPUTER-AIDED MAMMOGRAM ANALYSIS.
4017Etude: Piano Cover Generation with a Three-Stage Approach --- Extract, strucTUralize, and DEcode
15869EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding
16823EVA: ENHANCING ANIME VIDEO GENERATION VIA REINFORCEMENT LEARNING
7517EVALUATING BIAS IN SPOKEN DIALOGUE LLMS FOR REAL-WORLD DECISIONS AND RECOMMENDATIONS
14235EVALUATING COMPOSITIONAL STRUCTURE IN AUDIO REPRESENTATIONS
14573EVALUATING DISENTANGLED REPRESENTATIONS FOR CONTROLLABLE MUSIC GENERATION
14157EVALUATING EMOTION RECOGNITION IN SPOKEN LANGUAGE MODELS ON EMOTIONALLY INCONGRUENT SPEECH
10161EVALUATING HIGH-RESOLUTION PIANO SUSTAIN PEDAL DEPTH ESTIMATION WITH MUSICALLY INFORMED METRICS
11896Evaluating pretrained speech embedding systems for dysarthria detection across heterogenous datasets
10233EVALUATING TEST-TIME ADAPTATION FOR FACIAL EXPRESSION RECOGNITION UNDER NATURAL CROSS-DATASET DISTRIBUTION SHIFTS
14057EVA-Score: Evaluating Abstractive Long-form Summarization on Informativeness through Extraction and Validation
6696EVENT CAMERA DEPTH ESTIMATION FROM EPIPOLAR PLANE IMAGES
14427Event classification by physics-informed inpainting for distributed multichannel acoustic sensor with partially degraded channels
13538EVENT-AIDED SEMANTIC SCENE COMPLETION
17722Event-driven Neuromorphic Near-Field Radar Imaging
16108EVOALPHA: AN LLM-ENHANCED EVOLUTIONARY FRAMEWORK FOR FORMULAIC ALPHA MINING
13609EVOLVING AASIST: TOWARDS SCALABLE AND GENERALIZABLE ANTI-SPOOFING MODELS
4192EXPECTATION PROPAGATION DETECTOR EXPLOITING OVERLAPPING BLOCK STRUCTURES FOR HIGHLY CORRELATED MIMO SYSTEMS
13791EXPERIENCE-DRIVEN DYNAMIC EXITS FOR LLMS WITH REINFORCEMENT LEARNING
19040EXPLAINABLE DEEP LEARNING ANALYSIS FOR RAGA IDENTIFICATION IN INDIAN ART MUSIC
17660EXPLAINABLE DEEPFAKE DETECTION WITH RL ENHANCED SELF-BLENDED IMAGES
19028EXPLAINABLE DNN-BASED BEAMFORMER WITH POSTFILTER
10038Explaining Face Verification Decisions with Pairwise Facial Feature Explanation
15530EXPLICIT TIME-FREQUENCY DYNAMICS FOR SKELETON-BASED GAIT RECOGNITION
4533EXPLOITING BACKDOOR TRIGGER TOWARDS UNLEARNABLE EXAMPLES
16026Exploiting Latent and Implicit chain of thought for Efficient multi-hop question answering
2084EXPLOITING SCATTERS FOR SENSING SECURITY IN ISAC SYSTEMS
16929EXPLOITING SPARSE-TEMPORAL DYNAMICS VIA RELEVANCE NETWORKS FOR UAV TRACKING
13425EXPLOITING THE PROPERTIES OF AN ADAPTIVE BLACK-BOX ATTACK AGAINST FEDERATED LEARNING
16930EXPLORATION BEYOND BUDGET: TRAINING LARGE LANGUAGE MODELS TO EXPLORE UNDER TRUNCATION CONSTRAINTS
3894Exploring Audio Hallucination in Egocentric Video Understanding
9210Exploring Confidence as a Reward to Advance LLMs Reasoning
16536EXPLORING FINE-TUNING OF LARGE AUDIO LANGUAGE MODELS FOR SPOKEN LANGUAGE UNDERSTANDING UNDER LIMITED SPEECH DATA
11467Exploring How Audio Effects Alter Emotion with Foundation Models
5883EXPLORING RESOLUTION-WISE SHARED ATTENTION IN HYBRID MAMBA-U-NETS FOR IMPROVED CROSS-CORPUS SPEECH ENHANCEMENT
13572EXPLORING SSL DISCRETE TOKENS FOR MULTILINGUAL AUTOMATIC SPEECH RECOGNITION
17232Exploring the Existence of Over-Squashing in Directed Networks
10828Exploring Unlabeled Data for Vision-Language Models Beyond Greedy Hard Pseudo-labels
10546EXPRESSIVE VOICE CONVERSION WITH CONTROLLABLE EMOTIONAL INTENSITY
3346Exterior sound field estimation based on physics-constrained kernel
19120EXTERNAL DIVISION OF TWO PROXIMITY OPERATORS—PART I: DEBIASED FEATURE GROUPING
19121EXTERNAL DIVISION OF TWO PROXIMITY OPERATORS—PART II: GENERALIZATION AND PROPERTIES
19097EXTRACTING FORMULAE IN MANY-VALUED LOGIC FROM DEEP NEURAL NETWORKS
16292EXTREMOPROMPT: ADVANCING MIXTURE OF SOFT PROMPTS TO THE LIMIT
13318F2G-AMD: Feature-to-Graph Affinity with Large-Kernel Attention for AMD Grading using Fundus Images
3741F5E-TTS: ENHANCING SPEECH SYNTHESIS BY ALIGNING TEXT WITH RICH SEMANTIC REPRESENTATIONS
5595FABEM: FREQUENCY-AWARE BOUNDARY ENHANCEMENT MODULE FOR SMALL OBJECT DETECTION
17954FACESLEUTH-R: ADAPTIVE ORIENTATION-AWARE ATTENTION FOR ROBUST MICRO-EXPRESSION RECOGNITION
11767Face-Voice Association with Inductive Bias for Maximum Class Separation
14604FAC-FACODEC: CONTROLLABLE ZERO-SHOT FOREIGN ACCENT CONVERSION WITH FACTORIZED SPEECH CODEC
10494FACLIP: LEARNABLE FINE-GRAINED PROMPTS AND MULTI-SCALE FUSION FOR ZERO-SHOT ANOMALY DETECTION
12764FADEMEM: BIOLOGICALLY-INSPIRED FORGETTING FOR EFFICIENT AGENT MEMORY
4325FAIRCG: MITIGATE COUNTERFACTUAL AND GROUP BIAS IN MACHINE LEARNING
5116FAIRMOO: ACHIEVING FAIRNESS IN DISTRIBUTED LEARNING VIA CONSTRAINED MULTI-OBJECTIVE OPTIMIZATION
10708FAIRNESS-AWARE GRAPH REPRESENTATION LEARNING THROUGH LOW-FREQUENCY BIAS SEPARATION
17173Fairness-oriented decoupled user association and resource allocation in fully-decoupled RAN: A two-layer MAB approach
13735FAITH: ENHANCING TIME SERIES FORECASTING WITH FREQUENCY-BASED ADAPTIVE INPUT HORIZON
11019FAKE IMAGE DETECTION ON NOISE RESIDUAL SPECTRA VIA RANDOM-FEATURE SINGLE-LAYER NEURAL NETWORKS
18930Fake Path Co-Construction Source Location Privacy Protection Scheme Design For UWSNs
2021FAKE SPEECH WILD: DETECTING DEEPFAKE SPEECH ON SOCIAL MEDIA PLATFORM
10908Fake-HR1: Rethinking Reasoning of vision language model for Synthetic Image Detection
9677Fall Detection with Sound Diffusion Field: Integrating Audible Sound Event and Acoustic Speed Estimation
7992FAN-RFID: EXFILTRATING DATA FROM AIR-GAPPED SYSTEMS VIA FAN-INDUCED RFID MODULATION
15074FANSR: Frequency Adaptive Network for Efficient Image Super-Resolution
10575FAO-FORMER: LEARNING DISENTANGLED SEMANTIC REPRESENTATIONS WITH FREQUENCY-AWARE ORTHOGONAL TRANSFORMER
10146FAST AND ACCURATE TEMPORAL SUPER-RESOLUTION VIA RESIDUAL-AWARE COUPLED TENSOR FACTORIZATION
2767Fast and Accurate Text-to-Motion Generation through Discrete-Guided Continuous Modeling
10819Fast and Robust Triple Tensor Decomposition With Data Corruption
10142FAST INTER- AND INTRA-MODE DECISION FOR VIDEO-BASED DYNAMIC MESH CODING
2616Fast Low-light Enhancement and Deblurring for 3D Dark Scenes
10229FAST SINGLE-SNAPSHOT HARMONIC RECOVERY WITH 2D SPARSE ARRAYS USING BCCB MATRICES
16452FAST SPARSE NONNEGATIVE MATRIX FACTORIZATION WITH MANIFOLD ACCELERATION
4767FAST: FUSION-BASED ANOMALY SEARCH ON A TREE IN HIERARCHICAL HETEROGENEOUS SYSTEMS
13298FAST_QR: FAST, ACCURATE AND STABLE QUANTILE REGRESSION FOR TIME-SERIES ANALYSIS VIA ADAPTIVE HUBER SMOOTHING
5730FASTAV: EFFICIENT TOKEN PRUNING FOR AUDIO-VISUAL LARGE LANGUAGE MODEL INFERENCE
11174FASTEAGLE: CASCADED DRAFTING FOR ACCELERATING SPECULATIVE DECODING
3646FastEnhancer: Speed-Optimized Streaming Neural Speech Enhancement
5451FAST-GS: Frequency Aware Space-time Gaussian Splatting for Photorealistic Dynamic Novel View Synthesis
17946FAST-SLOW LORA: ACHIEVING EFFICIENT CONTINUAL LEARNING VIA FAST-SLOW LEARNING AND REDUNDANCY PRUNING
1778FAST-ULCNET: A FAST AND ULTRA LOW COMPLEXITY NETWORK FOR SINGLE-CHANNEL SPEECH ENHANCEMENT
9366FC-FORMER: EFFICIENT FEATURE CODING FOR MACHINES VIA A HYBRID CNN-TRANSFORMER ARCHITECTURE
2936FC-MOE: FLIP CONSISTENT MIXTURE OF EXPERTS ARE GOOD LEARNERS FOR UNIFIED FACE ATTACK DETECTION
12321FCR: MULTI-VIEW COMPOSITIONAL RETRIEVAL FOR TIME SERIES FORECASTING WITH LARGE LANGUAGE MODELS
4924FCSTG-InceptionNet: Temporal Lag and Mesoscale Spatio-Temporal Features Modeling for EEG-Based Diagnostics
13469FC-VFI: FAITHFUL AND CONSISTENT VIDEO FRAME INTERPOLATION FOR HIGH-FPS SLOW MOTION VIDEO GENERATION
11318FDCA-CLIP: FREQUENCY-ENHANCED DUAL-SEMANTIC CROSS-MODAL ALIGNMENT FOR ZERO-SHOT SPATIO-TEMPORAL ACTION LOCALIZATION
17152FDCNET: FREQUENCY DOMAIN CHANNEL ATTENTION AND CONVOLUTION FOR LIPREADING
14951FDCP-MATCH: A NEW MODEL WITH FREQUENCY DOMAIN AND CLASS PROMPT FOR GENERALIZED FEW-SHOT SEMANTIC SEGMENTATION
15410FDS-MANET: A HYPERSPECTRAL CLASSIFICATION NETWORK DRIVEN BY BIDIRECTIONAL MAMBA WITH FREQUENCY DOMAIN ENHANCEMENT AND GRAPH MODULATION
1684FEATURE IDENTIFICATION FOR HIERARCHICAL CONTRASTIVE LEARNING
10420FEATURE PROJECTION LEARNING FOR BETTER VISION-LANGUAGE REASONING
11078FEATURE-GUIDED UNSIGNED DISTANCE FUNCTIONS ESTIMATION FOR SURFACE RECONSTRUCTION
9448FED: A FINE-GRAINED ENHANCED DUAL-ROUTING NETWORK FOR MULTIMODAL SARCASM DETECTION
12893FedALP: Lightweight Personalized Federated Learning with Adaptive Low-Rank Adapters
12849FedAVOT: Exact Distribution Alignment in Federated Learning via Masked Optimal Transport
13040FEDCADS: ROBUST FEDERATED LEARNING VIA DUAL DISTILLATION AND PARTICIPATION-AWARE OPTIMIZATION UNDER NON-IID DATA
15394FEDCOMPASS: FEDERATED CLUSTERED AND PERIODIC AGGREGATION FRAMEWORK FOR HYBRID CLASSICAL-QUANTUM MODELS
3148FedDBP: Enhancing Federated Prototype Learning with Dual-Branch Features and Personalized Global Fusion
6956FEDD-NET: A FREQUENCY DIAGONAL FEATURE ENHANCED DUAL-BRANCH DIFFUSION NETWORK FOR LOW-LIGHT IMAGE ENHANCEMENT
4030Federated Camouflaged Poisoning Attack in Federated Unlearning
10458Federated Clustering without k: Adaptive Prototype Aggregation on Heterogeneous Data
6782FEDERATED HETEROGENEOUS LANGUAGE MODEL OPTIMIZATION FOR HYBRID AUTOMATIC SPEECH RECOGNITION
10297FEDERATED IMAGE CLUSTERING WITH KNOWLEDGE INTERACTION
16836FEDERATED JOINT LEARNING FOR DOMAIN AND CLASS GENERALIZATION
14083FEDERATED SMOOTHING ADMM FOR ROBUST LOCALIZATION
8072FED-GAME: PERSONALIZED FEDERATED LEARNING WITH GRAPH ATTENTION MIXTURE-OF-EXPERTS FOR TIME-SERIES FORECASTING
10684FEDGPAI: PERSONALIZED FEDERATED LEARNING BASED ON PARAMETER SENSITIVITY ADAPTIVE INTERPOLATION
13954FEDLA: FILTER-WISE LEARNABLE AGGREGATION FOR FEDERATED LEARNING UNDER NON-IID DATA
2780FED-MET: MEMORY-EFFICIENT ELASTIC TRAINING IN FEDERATED LEARNING
4803FEDON: BLACK-BOX UNTARGETED MODEL POISONING VIA MULTI-OBJECTIVE REINFORCEMENT LEARNING
13041FEDPAK: SERVER-CENTRIC PROTOTYPE REFINEMENT WITH ADAPTIVE MARGINS AND GENERATIVE KNOWLEDGE TRANSFER FOR HETEROGENEOUS FEDERATED LEARNING
1493FedPGP: Adaptive Feature Alignment for Personalized Global Prototypes in Federated Learning
4183Fed-PISA: Federated Voice Cloning via Personalized Identity-Style Adaptation
2513FEDPLA: PROTOTYPE-ALIGNED LOW-RANK ADAPTATION FOR MULTIMODAL FEDERATED LEARNING
15998FEDPROLN: CLASS PROTOTYPE-ENHANCED FEDERATED LEARNING FOR LONG-TAILED NOISY LABELS
5250FEDPROTOALIGN: FEDERATED PROTOTYPE ALIGNMENT UNDER IDENTITY INCONSISTENCY FOR GAIT RECOGNITION
16010FedRD: Reducing Divergences for Generalized Federated Learning via Heterogeneity-aware Parameter Guidance
3187FEDRL-SATOPT: FEDERATED REINFORCEMENT LEARNING FOR JOINT ROUTING AND COMPUTING IN DYNAMIC LEO SATELLITE NETWORKS
8056FedSKU: Defending Backdoors in Federated Learning Through Selective Knowledge Unlearning
15221FEDZKD: ZEROTH-ORDER DUAL-ADAPTER DISTILLATION FOR FEDERATED FINE-TUNING
12212Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
5630FEFusionMap: A LiDAR-Camera Fused Semantic Map Generation Frame Via Multi-modal Feature Enhancement
14296FEMTOMODELS FOR EEG ARTIFACT REMOVAL: A PARAMETER LOWER-BOUND FOR GENERALISABLE EOG DENOISING
13598FEW-SHOT AND PSEUDO-LABEL GUIDED SPEECH QUALITY EVALUATION WITH LARGE LANGUAGE MODELS
17798FEW-SHOT BEARING FAULT DIAGNOSIS USING MULTI-SCALE FEATURE EXTRACTION AND ATTENTION-BASED PROTOTYPE MATCHING
18325Few-shot Learning via Multi-modal Representation Integration
1906FEW-SHOT OBJECT DETECTION VIA CONDITIONAL VARIATIONAL ADAPTIVE MEMORY ENHANCEMENT
3869FEW-SHOT RECOGNITION OF AUDIO DEEPFAKE GENERATORS USING GRAPH-BASED PROTOTYPE ADAPTATION
14892FGAL-DM: PRIVACY-PRESERVING SEMANTIC COMMUNICATION VIA FEDERATED GENERATIVE ADVERSARIAL LATENT DIFFUSION MODELS
2713FGAPA: FEATURE-GUIDED ADVERSARIAL PROTOTYPE ALIGNMENT FOR HYPERSPECTRAL CROSS-DOMAIN FEW-SHOT CLASSIFICATION
3279FGFF-NET: FREQUENCY-GUIDED FEATURE FUSION NETWORK FOR VISIBLE-INFRARED OBJECT DETECTION
15897FGGM: FISHER-GUIDED GRADIENT MASKING FOR CONTINUAL LEARNING
16779FGSANet: A Frequency-Guided and Structure-Aware Framework for Robust Sheep Breed Recognition
14211FIBKD: A FIBER BUNDLE-BASED FRAMEWORK FOR EFFECTIVE KNOWLEDGE DISTILLATION
8645FIDIC: FINE-GRAINED CONVERSATIONAL EMOTION RECOGNITION VIA INDIVIDUAL DIFFERENCES IN INERTIA AND CONTAGION
18891FieldFormer: Self-supervised Reconstruction of Physical Fields via Tensor Attention Prior
10403FIG: FREQUENCY-BASED INTEGRATED GRADIENTS FOR ROBUST FEATURE ATTRIBUTION
15448FILTER THEN ATTEND: IMPROVING ATTENTION-BASED TIME SERIES FORECASTING WITH SPECTRAL FILTERING
2808FILTER-GROUP MIXTURE-OF-EXPERTS MODEL FOR REMOTE SENSING FORGED TARGET PERCEPTION
9874FINBED: A UNIFIED MULTIMODAL EMBEDDING FRAMEWORK FOR FINANCIAL REPRESENTATION LEARNING
15264FINE-GRAINED FRAME MODELING IN MULTI-HEAD SELF-ATTENTION FOR SPEECH DEEPFAKE DETECTION
12454FINE-GRAINED GESTURE RECOGNITION VIA NARROW-KERNEL CNN AND ATTENTION-BASED SEMG-ACC FUSION
6983Fine-Grained Hashing via Center Similarity Guided Quantization
1928Fine-grained Text-to-Image Synthesis with Semantic Refinement
15721FineLongCLIP: Advancing Fine-Grained Image-Text Matching via a Dual-Branch Visual Encoder Capturing Global and Detailed Features
3521FINE-TUNED DEEP SUBSPACE CLUSTERING NETWORKS
7009FINE-TUNING BIGVGAN-V2 FOR ROBUST MUSICAL TUNING PRESERVATION
15139FINE-TUNING LARGE MULTIMODAL MODELS FOR AUTOMATIC PRONUNCIATION ASSESSMENT
10500FINE-TUNING MODEL WATERMARKS AGAINST EXTRACTION ATTACKS BY REHEARSAL
13974FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition
13596FINLUMEN: A GAME-THEORETIC MULTI-AGENT FRAMEWORK FOR RATIONAL PORTFOLIO MANAGEMENT
17786FINMCP-BENCH: BENCHMARKING LLM AGENTS FOR REAL-WORLD FINANCIAL TOOL USE UNDER THE MODEL CONTEXT PROTOCOL
4643FINSENTLLM: MULTI-LLM AND STRUCTURED SEMANTIC SIGNALS FOR ENHANCED FINANCIAL SENTIMENT FORECASTING
10763FINUA: GENERATING DIVERSE USER INTERACTIONS FOR FINANCIAL DIALOGUE SYSTEMS THROUGH USER SIMULATION
13629FIPNET: SYNERGISTIC FEATURE ENHANCEMENT AND IDENTITY PURIFICATION FOR CLOTHES-CHANGING PERSON RE-IDENTIFICATION
14515First Results on RIS-Enabled Multi-Layer Localization: A Joint Terrestrial and Non-Terrestrial Method
7147First-order and second-order detectors for matched subspace detection on graphs
8131Fisher Scoring algorithm for Time-delay and Doppler estimation
14266FIXED-POINT EQUALIZATION IN DIAGONAL EXPECTATION PROPAGATION: SCALAR DECOUPLING AND BAYES-MMSE OPTIMALITY
9399FLAME: EMPOWERING FROZEN LLMS FOR KNOWLEDGE GRAPH COMPLETION
14368FLASHFOLEY: FAST INTERACTIVE SKETCH2AUDIO GENERATION
4566FLASH-UNLEARN: ON-THE-FLY, TRAINING-FREE LARGE LANGUAGE MODELS UNLEARNING THROUGH SUBSPACE DISTRIBUTION FILTERING
15978F-LBQ: FINE-GRAINED LOW BIT QUANTIZATION FOR EFFICIENT AND ACCURATE OBJECT DETECTION
13338FLEXIBLE FILTER DESIGN USING DEEP OSCILLATORY NEURAL NETWORKS
13151FLEXI-LORA: EFFICIENT LORA FINETUNING WITH INPUT-ADAPTIVE DYNAMIC RANKS
9955FLEXIO: FLEXIBLE SINGLE- AND MULTI-CHANNEL SPEECH SEPARATION AND ENHANCEMENT
10634FLIPCON: FLIPPED CONTRASTIVE LEARNING FOR FINE-GRAINED DOA REPRESENTATION
15940FLOW INTELLIGENCE: ROBUST FEATURE MATCHING VIA TEMPORAL SIGNATURE CORRELATION
6050FLOW MATCHING-BASED ACTIVE LEARNING FOR RADIO MAP CONSTRUCTION WITH LOW-ALTITUDE UAVS
3587FLOWGPT: A GPT-CONDITIONED VISION-MAMBA FRAMEWORK FOR FINE-GRAINED URBAN FLOW INFERENCES
17341FLOWIID: SINGLE-STEP INTRINSIC IMAGE DECOMPOSITION VIA LATENT FLOW MATCHING
1742FlowMemRep: Automated workflow with Memory-Aware for Smart Contract Vulnerability Repair Using LLMs
6992FLOWSE-GRPO: TRAINING FLOW MATCHING SPEECH ENHANCEMENT VIA ONLINE REINFORCEMENT LEARNING
16704FlowSGG: A Single-Stage Framework for Dynamic Scene Graph Generation via Temporal Propagation
10081Fluid Antenna Assisted Anti-Jamming Communication in Low-Altitude Wireless Networks
18864FMAPLS: BAYESIAN LABEL SHIFT ESTIMATION BASED ON DYNAMIC DIRICHLET PARAMETER ADAPTATION
5616FM-Fusion: A Flow Matching Approach for Multi-Modal Image Fusion
3660FMSP-IR: Frequency Modulation and Structure Priors for All-in-One Image Restoration
5436FMTFUSE: EDGE FOURIER-ENHANCED MULTI-SCALE TRANSFORMER FOR MULTI-MODAL IMAGE FUSION
16217FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model
15483FOCA: MULTIMODAL MALWARE CLASSIFICATION VIA HYPERBOLIC CROSS-ATTENTION
5013FOCALCODEC-STREAM: STREAMING LOW-BITRATE SPEECH CODING VIA CAUSAL DISTILLATION
13109FOCALLINK: DENSELY MODULATED CONTRASTIVE LEARNING FOR TRAJECTORY ASSOCIATION IN MULTI-OBJECT TRACKING
7735FOCUS BEFORE REASONING: A BIDIRECTIONAL SELECTION FRAMEWORK FOR NOISE-MITIGATION IN KNOWLEDGE-BASED VISION QUESTION ANSWERING
10999FOCUSFUZZ: TOWARDS EFFICIENT AND DIRECTED FUZZING FOR RTL PROCESSOR DESIGNS
7377FODGE: HIGH-FIDELITY DANCE GENERATION VIA FULL-BODY OPTIMIZATION
17851FOLEYBENCH: A BENCHMARK FOR VIDEO-TO-AUDIO MODELS
18289FOLLOWING THE TRACE: A STRUCTURED PATH TO EMPATHETIC RESPONSE GENERATION WITH MULTI-AGENT MODELS
12551FONTMIMICKER: ENHANCING STYLIZED FONT GENERATION VIA FREQUENCY-AWARE DIFFUSION AND DEFORMABLE ALIGNMENT
17249FoodCLIP: Advancing Food Analysis via Large-scale Pre-training
1982Foreground-Enhanced Coarse-to-Fine Detection for UAV Small Objects
1337FORGERYFUSION: DIFFUSION-DRIVEN REAL FACE MODELING FOR GENERALIZABLE FACE FORGERY DETECTION
13297FORGETMARK: STEALTHY FINGERPRINT EMBEDDING VIA TARGETED UNLEARNING IN LANGUAGE MODELS
14070ForkNet: Direction-Aware and Wavelet-Guided Dual-Encoder Network for Image Fusion
15085FORSE: A RETRIEVAL-AUGMENTED FRAMEWORK FOR TIME SERIES FORECASTING
6672FORWARD CONVOLUTIVE PREDICTION FOR FRAME ONLINE MONAURAL SPEECH DEREVERBERATION BASED ON KRONECKER PRODUCT DECOMPOSITION
5408FORWARD-BACKWARD PRIORS: INTEGRATING PLUG-AND-PLAY AND REGULARIZATION BY DENOISING VIA MONOTONE OPERATOR THEORY
5053FOSK: FAST OPEN-VOCABULARY 3D INSTANCE SEGMENTATION VIA CONSENSUS-FILTERED KNOWLEDGE DISTILLATION
9627Fostering Accuracy and Generalization Ability in Gaze Estimation by Gaze-Relevant Feature Normalization
1232FOUNDATION MODELS-GUIDED MULTI-LEVEL MOTION DECOUPLING VIA GAUSSIAN SPLATTING FOR MONOCULAR VIDEO RECONSTRUCTION
18412Fourier Pruning for Large Language Models Compression
16620FOURIER REGULARIZATION IN UNROLLED ALGORITHM FOR UNIVERSAL DEMOSAICKING
17858FP-ANet: A fixed-point Attention Network for Hybrid-field THz Ultra-Massive MIMO Channel Estimation
10989FPGA IMPLEMENTATION OF ACCURATE AND LOW-COST KEYWORD SPOTTING
4265FPI-DET: A FACE–PHONE INTERACTION DATASET FOR PHONE-USE DETECTION AND UNDERSTANDING
6638Fractal Generative Distillation
18137FragLDM: Fragment-Guided Latent Diffusion Model for 3D Molecular Generation
15007FRAME-STACKED LOCAL TRANSFORMERS FOR EFFICIENT MULTI-CODEBOOK SPEECH GENERATION
16775FREDNET: A FREQUENCY AND DECOMPOSED-SPATIAL NETWORK FOR INDUSTRIAL DEFECT SIGNAL DETECTION
5329Free2Frame: A Training-Free Framework for Video Understanding with Memory Boosting
4031FREEANIMATE: TRAINING-FREE HUMAN IMAGE ANIMATION WITH PREVIEW-GUIDED DENOISING
18299FREQ-DP NET: A DUAL-BRANCH NETWORK FOR FENCE REMOVAL USING DUAL-PIXEL AND FOURIER PRIORS
15544FREQKAN: FREQUENCY-DOMAIN KOLMOGOROV-ARNOLD NETWORK FOR ADAPTIVE MODULATION RECOGNITION
13103FreqMTA: Multi-Token Attention for Stable Frequency-Domain Long-Term Time Series Forecasting
3852FREQUENCY-AWARE CONTRASTIVE LEARNING AND SPECTRAL DISENTANGLEMENT FOR UNSUPERVISED IMAGE DERAINING
7259Frequency-Aware Dynamic Graph Learning via Pseudo-Spectral Decomposition for Metro Flow Forecasting
6957Frequency-Aware Mamba: Exploiting Frequency-Domain Priors to Alleviate Class Imbalance in Medical Image Segmentation
5142FREQUENCY-AWARE Y-SHAPE DECLOUDFORMER FOR SAR-ASSISTED CLOUD REMOVAL
3462Frequency-Decoupled Learning for Joint Thin-Cloud Removal and Pansharpening
18958FREQUENCY-DIRECTION AWARE MULTICHANNEL SELECTIVE FIXED-FILTER ACTIVE NOISE CONTROL BASED ON MULTI-TASK LEARNING
2790Frequency-Domain Driven Recurrent Attention with Linear Complexity for Time Series Forecasting
15794FREQUENCY-ENHANCED AND CONFLICT-ADAPTIVE ODE FRAMEWORK FOR TRAINING-FREE CONSISTENT VIDEO EDITING
8188FREQUENCY-GUIDED MULTI-LEVEL REASONING FOR SCENE GRAPH GENERATION IN VIDEO
14036FREQUENCY-INDEPENDENT AMBISONICS UPSCALING USING DEEP LEARNING
1611Frequency-Modulated Differential Transformer for Semantic Segmentation of Remote Sensing Images
16973From Base to Novel: Semantic-Guided Visual Concept Transfer in Few-Shot Image Classification
11287FROM COLD-START TO STABILIZATION: A DUAL-PROTOTYPE FRAMEWORK FOR ONLINE ANY-SHOT CONTINUAL LEARNING
13732FROM CONTRAST TO COMMONALITY: AUDIO COMMONALITY CAPTIONING FOR ENHANCED AUDIO-TEXT CROSS-MODAL UNDERSTANDING IN MULTIMODAL LLMS
16090From Decomposition to Fusion: Anomaly Detection with Temporal Correlation-Data Dependency Discrepancy Analysis
8718FROM DESIGN TO INDUCTION: A NEW PARADIGM FOR RESPONDENT-CENTRIC PSYCHOLOGICAL SCALE GENERATION
3428From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks
10283FROM DISTORTION TO EXPRESSION: PARALLEL MULTI-HOP GRAPH SIGNAL PROCESSING UNDER HETEROPHILY
6435FROM ECG SIGNALS TO DIAGNOSTIC REPORTS: A UNIFIED FRAMEWORK WITH MULTI-MODAL ENCODER AND FINE-TUNED LLM FOR AUTOMATED REPORT GENERATION
7676FROM FIXED POSITIONS TO FREE-FORM SIGNALS: VIRTUAL MICROPHONE SIGNAL ESTIMATION FOR GENERAL-PURPOSE SPATIAL AUDIO PROCESSING
16391FROM HALLUCINATION TO ARTICULATION: LANGUAGE MODEL-DRIVEN LOSSES FOR ULTRA LOW-BITRATE NEURAL SPEECH CODING
12187From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition
12093FROM HYPE TO INSIGHT: RETHINKING LARGE LANGUAGE MODEL INTEGRATION IN VISUAL SPEECH RECOGNITION
13753FROM INDEPENDENCE TO INTERACTION: SPEAKER-AWARE SIMULATION OF MULTI-SPEAKER CONVERSATIONAL TIMING
4615From Intent to Invocation: A Reasoning-First Framework for Natural Language to Penetration Testing Commands
15882FROM KNOWING TO DOING PRECISELY: A GENERAL SELF-CORRECTION AND TERMINATION FRAMEWORK FOR VLA MODELS
17821From Lightweight Client Models to a Foundation Model in One Shot with Generative Distillation for Medical Image Segmentation
4713From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs
12986FROM PAST TO FUTURE: LEVERAGING EVENT CAUSALITY FOR EXPLAINABLE PREDICTION WITH LARGE LANGUAGE MODELS
11268FROM PER-TIMESTEP DECIDERS TO HOLISTIC STRATEGY GENERATORS: EVOLVING STRATEGIC COMPLEXITY IN LLMS
12673FROM PHASED ARRAYS TO MIMO: RIS-ENABLED WAVEFORM DIVERSITY IN RADAR
6804FROM POWERSGD TO POWERSGD+: LOW-RANK GRADIENT COMPRESSION FOR DISTRIBUTED OPTIMIZATION WITH CONVERGENCE GUARANTEES
17938FROM PRETRAINING TO ROBUSTNESS: BENCHMARKING SSL MODELS FOR NOISE-ROBUST SPEECH EMOTION RECOGNITION
13556FROM SEMANTIC SHIFTS TO CAUSAL CUES: COUNTERFACTUAL LEARNING FOR HATEFUL MEME DETECTION
15565From Silent Flows to Speaking Guardians: LLM-Enhanced Framework for IoT Anomaly Detection
2718From Synthetic to Wild: Dual Alignment for Unsupervised Domain Adaptation RGBT Crowd Counting
9204FROM TOKEN TO LINE: ENHANCING CODE GENERATION WITH A LONG-TERM PERSPECTIVE
15674From WMMSE to XMMSE Algorithm: An Old Tune in a Fast New Key
6068FRONTEND TOKEN ENHANCEMENT FOR TOKEN-BASED SPEECH RECOGNITION
3857F-SEGMAN: FEW-SHOT DOMAIN ADAPTATION CRACK SEGMENTATION
16385FS-LoRA: Fast and Slow Low-Rank Adaptation for Class Incremental Learning
2035FTIN: FREQUENCY-TIME INTEGRATION NETWORK FOR INERTIAL ODOMETRY
11182FULL BAND DENOISING OF ROOM IMPULSE RESPONSE IN THE WAVELET DOMAIN WITH DICTIONARY LEARNING
10707FULL-DUPLEX-BENCH V1.5: EVALUATING OVERLAP HANDLING FOR FULL-DUPLEX SPEECH MODELS
5682FULL-TO-MISSING MODALITY KNOWLEDGE DISTILLATION FOR MULTIMODAL 3D SEMANTIC SEGMENTATION
16821FUN-SSL: FULL-BAND LAYER FOLLOWED BY U-NET WITH NARROW-BAND LAYERS FOR MULTIPLE MOVING SOUND SOURCE LOCALIZATION
2681FUSEMOS: PERCEPTUAL EVALUATION OF TEXT-TO-MUSIC GENERATION WITH DUAL-ENCODER FUSION AND RANKING-AWARE COMPOSITE LOSS
9716Fusing Image and Saliency Modalities for Robust Label Restoration with Transformers
10202FUSION OF TRANSFORMER AND CNN ATTENTION NETWORKS FOR LEARNED IMAGE COMPRESSION
7535FUSIONEDIT: SEMANTIC FUSION AND ATTENTION MODULATION FOR TRAINING-FREE IMAGE EDITING
15527FUZZY MEMBERSHIP-ENHANCED UNCERTAINTY-AWARE FUSION FOR MULTI-VIEW CLASSIFICATION
10802FWF-NET: A LEARNABLE FOURIER-WAVELET FUSION NETWORK FOR PDE OPERATOR LEARNING
15150FW-VTON: FLATTENING-AND-WARPING FOR PERSON-TO-PERSON VIRTUAL TRY-ON
2903FXSEARCHER: GRADIENT-FREE TEXT-DRIVEN AUDIO TRANSFORMATION
4110G2LST: Global to Local Stackelberg Decision Model for computation offloading under Mobile Internet of Things
5400G2P-Rec: Graph-to-Prompt Synergistic Reasoning for Knowledge-Enhanced Recommendation
13841G4CDR: A 4D GeoSOT Grid-Graph for Real-Time UAV Conflict Detection and Resolution
12633G-AFS: GRAPH-GUIDED ADAPTIVE KEYFRAME SAMPLING FOR VIDEO SUMMARIZATION
10308GALA: DUAL ALIGNMENTS FOR UNSUPERVISED DOMAIN ADAPTATION WITH LIMITED SOURCE LABELS
18088GalaxyEdit: Large Scale Image Editing Dataset with Enhanced Diffusion Adapter
14104GAME-THEORETIC INSIGHTS INTO MULTI-AGENT LLM DEBATE FOR ENHANCED CLINICAL QUESTION ANSWERING
9827GAME-TIME: EVALUATING TEMPORAL DYNAMICS IN SPOKEN LANGUAGE MODELS
17323GAMMA: GENERALIZABLE AI-GENERATED IMAGE DETECTION VIA MULTI-TASK AND MANIPULATION-AUGMENTED SUPERVISION
13761GaRA: Gated Low-rank Adaptation for Fine-tuning Time-series Foundation Models
9482GAUSSIAN CLOUD MODEL BAYESIAN NEURAL NETWORKS: A VARIATIONAL INFERENCE FRAMEWORK FOR RELIABLE PREDICTION
16645GAUSSIAN LOCALITY PRIOR FOR CONTRAST–RECONSTRUCTION LEARNING: STATE–SPACE MODEL-BASED TIME–SERIES ANOMALY DETECTION
17921GAUSSIAN MESH RENDERER FOR LIGHTWEIGHT DIFFERENTIABLE RENDERING
11362Gaussian Process State-Space Models for Irregularly Sampled Sequential Data
16572Gaussian Processes for Sensor Repositioning in PDE-Driven Systems
11307GAUSSIAN SPATIAL INTERACTION WITH LONG-RANGE CONTEXT FUSION FOR RADAR-CAMERA 3D OBJECT DETECTION
1231GAUSSIAN SPLATTING WITH HYBRID DEFORMATION AND MULTI-SCALE DEPTH REGULARIZATION FOR DYNAMIC SINGLE-VIEW VIDEO RECONSTRUCTION
1356GAUSSIAN2SCENE: 3D SCENE REPRESENTATION LEARNING VIA SELF-SUPERVISED LEARNING WITH 3D GAUSSIAN SPLATTING
13404Gaussian-grounded Contextual Hierarchical Inference for Weakly Supervised Video Anomaly Detection
7381GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer
17420GCE-UQ: QUANTIFYING AND DECOMPOSING UNCERTAINTY IN GRAPH COUNTERFACTUAL EXPLANATIONS
3495GCFNET: GLOBAL FEATURE ENHANCEMENT AND CLUSTERING-BASED RECONSTRUCTION FOR INDUSTRIAL IMAGE CAPTIONING
16769GDCNET: GENERATIVE DISCREPANCY COMPARISON NETWORK FOR MULTIMODAL SARCASM DETECTION
13310GD-COLLAB: GENERATOR-DISCRIMINATOR MULTI-AGENT COLLABORATION FOR AUTOMATED GRAPH ANOMALY DETECTION
3235GDIFFUSE: DIFFUSION-BASED SPEECH ENHANCEMENT WITH NOISE MODEL GUIDANCE
2463GEIA: GENERATIVE ENHANCEMENT INVERSION ATTACK TARGETING MACHINE UNLEARNING
9157GELINA: UNIFIED SPEECH AND GESTURE SYNTHESIS VIA INTERLEAVED TOKEN PREDICTION
5446GEN3D: GENERATING DOMAIN-FREE 3D SCENES FROM A SINGLE IMAGE
18117GENCHO: ROOM IMPULSE RESPONSE GENERATION FROM REVERBERANT SPEECH AND TEXT VIA DIFFUSION TRANSFORMERS
11345GENDEN-GS: GENERATIVE-PRIOR-DRIVEN DENSIFICATION FOR SPARSE-VIEW 3D GAUSSIAN SPLATTING
16306GENERALIZABILITY OF PREDICTIVE AND GENERATIVE SPEECH ENHANCEMENT MODELS TO PATHOLOGICAL SPEAKERS
1050GENERALIZABLE DETECTION OF AUDIO DEEPFAKES
3584GENERALIZABLE SPECULAR SCENE RECONSTRUCTION VIA ANISOTROPIC FILTERING AND ASG-ENHANCED GAUSSIAN SPLATTING
12240GENERALIZABLE SPEECH DEEPFAKE DETECTION VIA INFORMATION BOTTLENECK ENHANCED ADVERSARIAL ALIGNMENT
12375GENERALIZABLE SPEECH DEEPFAKE DETECTION VIA META-LEARNED LORA
15249Generalization In One-Step Contextual Bandit Based DoA Estimation With Passive Backscatter Tags
1573GENERALIZED MULTIDIMENSIONAL CHINESE REMAINDER THEOREM (MD-CRT) FOR MULTIPLE INTEGER VECTORS
15313GENERATING LOCALIZED AUDIBLE ZONES USING A SINGLE-CHANNEL PARAMETRIC LOUDSPEAKER
1582Generating Moving 3D Soundscapes with Latent Diffusion Models
15476GENERATING TRAINING TARGETS FOR REAL-WORLD SPEECH ENHANCEMENT VIA CLOSE-TO-DISTANT MICROPHONE PROJECTION
14533GENERATIVE AUDIO EXTENSION AND MORPHING
5194GENERATIVE MODEL-BASED COMPRESSED SENSING FOR MMWAVE CHANNEL ESTIMATION THROUGH SEQUENTIAL PATH RECONSTRUCTION
8883GENERATIVE MULTI-MODAL EXPLAINABLE RECOMMENDATION
17810Generative Spatiotemporal Modeling for Uncertainty Quantification in High-Dimensional Physical Systems
2450GENFACTS-GENERATIVE COUNTERFACTUAL EXPLANATIONS FOR MULTI-VARIATE TIME SERIES
6258GenFRC: Generative Feature Replay and Calibration for Non-Exemplar Class-Incremental Learning
16419GenLie: A Global-Enhanced Lie Detection Network under Sparsity and Semantic Interference
8212GEN-SER: WHEN THE GENERATIVE MODEL MEETS SPEECH EMOTION RECOGNITION
1180GEODESIC PROTOTYPE MATCHING VIA DIFFUSION MAPS FOR INTERPRETABLE FINE-GRAINED RECOGNITION
13864Geo-Human: Geometrically-Guided 3D Gaussian Splatting for High-Fidelity Human Reconstruction under Sparse Views
5556GEOMETRIC CONSTRAINT-ENHANCED DATA ASSOCIATION FOR MULTI-TARGET LOCALIZATION IN DISTRIBUTED MIMO RADAR SYSTEMS
5986GEOMETRIC IMAGE SYNCHRONIZATION WITH DEEP WATERMARKING
13498GEOMETRY-AWARE RECONSTRUCTION OF LARGE VISION-LANGUAGE MODELS FROM DENSE INTO MIXTURE-OF-EXPERTS
14142GHIN: GATED HIERARCHICAL INTERACTION NETWORK FOR MULTIMODAL SARCASM DETECTION
4628GIFT: A Generative Imagined Fine-Tuning Framework for Visual Place Recognition
7901GIREG: GEOMETRIC-IMAGE COLLABORATIVE POINT CLOUD REGISTRATION
14246GLA-GRAD++: AN IMPROVED GRIFFIN-LIM GUIDED DIFFUSION MODEL FOR SPEECH SYNTHESIS
6259GLAP: General contrastive audio-text pretraining across domains and languages
10637GLASS-SAM: TRANSPARENT OBJECT SEGMENTATION USING FRACTAL-ENHANCED SAM WITH SHAPE CONTEXT-BASED REWARD
2622GLDPC-Net: Global-Local Dual-Scale Fusion and Geometry-aware Synchronization for Denoising Point Cloud Completion
5744Global Context-Aware Multi-Instance Learning for Whole Slide Image Classification
14370GLORIA: GATED LOW-RANK INTERPRETABLE ADAPTATION FOR DIALECTAL ASR
11688GLUCOAPRL: AHEAD-PLANNING REINFORCEMENT LEARNING MECHANISM FOR SAFE BLOOD GLUCOSE REGULATION
11571GLUCOMIXER: AN EFFICIENT GLUCOSE MONITORING MODEL WITH MIXERS
16124GLUE: Gradient-free Learning to Unify Experts
14501GMAMBAFLOW: GLOBAL-AWARE MAMBA BASED COST VOLUME AGGREGATION FOR OPTICAL FLOW
15601GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining
13900Goal-Oriented Joint Source–Channel Coding: Distortion–Classification–Power Trade-off
11614GOFN: GRADIENT ORTHOGONAL FUSION NETWORK FOR SINGLE-IMAGE TRANSPARENT WATERMARK DETECTION AND REMOVAL
17685GO-MLVTON: GARMENT OCCLUSION-AWARE MULTI-LAYER VIRTUAL TRY-ON WITH DIFFUSION MODELS
12290GPS-GS: GEOMETRY-AWARE PROGRESSIVE OPTIMIZATION WITH SYNERGISTIC PSEUDO-VIEWS FOR SPARSE-VIEW GAUSSIAN SPLATTING
11677GRADERAG: BLACK-BOX SEMANTIC PATH INJECTION ATTACKS ON GRAPH RAG SYSTEMS
2913GradFusion: Recognition-Compatible Face Anonymization via Semantic Gradient Editing and Latent Fusion
11842GRADIENT BOOSTING FOR ONLINE TWO-STAGE ADAPTIVE GROUP TESTING
17841GRADIENT-GUIDED LEARNABLE WINDOW ATTENTION FOR EDGE-ENHANCED SUPER-RESOLUTION
15376GRAM-SCHMIDT FEATURE SELECTION FOR CLASS ACTIVATION MAPS
5709GRANULAR-BALL BASED MULTI-VIEW OUTLIER DETECTION
4639GRAPH DISTRIBUTION-VALUED SIGNALS: A WASSERSTEIN SPACE PERSPECTIVE
6169GRAPH FOURIER TRANSFORMER WITH STRUCTURE-FREQUENCY INFORMATION
13174Graph Hodge-Laplacian Particle Filtering for Communication-Efficient Distributed Tracking
19147GRAPH LAPLACIAN LEARNING WITH EXPONENTIAL FAMILY NOISE
18899Graph Neural Network-Based GrUNet and Attention Transformer Adjacency Matrix for Video Denoising
4971GRAPH NEURAL NETWORK-BASED REINFORCEMENT LEARNING FOR COOPERATIVE NETWORK LOCALIZATION
15387GRAPH NEURAL NETWORKS IN LARGE SCALE WIRELESS COMMUNICATION NETWORKS: SCALABILITY ACROSS RANDOM GEOMETRIC GRAPHS
6316GRAPH NEURAL NETWORKS WITH DIVERSITY-AWARE NEIGHBOR SELECTION AND DYNAMIC MULTI-SCALE FUSION FOR MULTIVARIATE TIME SERIES FORECASTING
10274Graph of Thoughts Signal Modeling for Sequential Recommendation
17456Graph Signal Generative Diffusion Models
14676Graph Topological Rectification with Guaranteed Reduction of Class Ambiguous Regions
17179GRAPH TRANSFORMERS FOR AUTOMOTIVE RADAR CLUTTER AND TARGET CLASSIFICATION AT THE EDGE
14297Graph-Aware Diffusion for Signal Generation
16889GRAPH-AWARE LEARNING RATES FOR DECENTRALIZED OPTIMIZATION
11615Graph-based 3D Human Pose Estimation using WiFi Signals
18248GRAPH-BASED EMOTION CONSENSUS PERCEPTION LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATION
11658Graph-Based Image Selection for High-Quality 3D Gaussian Splatting
2129Graph-Based Learning of Spectro-Topographical EEG Representations with Gradient Alignment for Brain-Computer Interfaces
5523GRAPH-BASED MODELING OF HETEROGENEOUS DATA FUSION WITH ENTERPRISE ASSOCIATION RELATIONSHIPS: ENHANCING CORPORATE CREDIT RATING
14304GRAPHEME: GRAPH NEURAL NETWORKS WITH MULTI-EXPERT FUSION FOR EMOTION-CAUSE PAIR EXTRACTION
15766GRAPH-ENHANCED PROTOTYPE ADAPTATION FOR CROSS-DOMAIN FEW-SHOT OBJECT DETECTION
11661GRAPH-GUIDED CONTRASTIVE LEARNING FOR INCOMPLETE MULTI-VIEW CLUSTERING WITH CONSISTENT GLOBAL GRAPH
13049GRAPH-MAMBA COLLABORATIVE LEARNING NETWORK FOR CAMOUFLAGED OBJECT DETECTION
5920GRAPHMD: A TWO-MODULE DIFFUSION FRAMEWORK FOR SMOOTH AND CONSISTENT MOLECULAR DYNAMICS
5964GRAPHPL: LEVERAGING GNN FOR EFFICIENT AND ROBUST MODALITIES IMPUTATION IN PATCHWORK LEARNING
4944GRASP: GRoup-shApley feature Selection for Patients
4370GratingNet: A Novel 1D-CNN-BiLSTM Architecture with Attention for Optical Grating Parameter Measurement from Diffraction Spectra
16992Grey-Box Prompt Tuning with Graph Alignment for Speech-Language Models
9426GRIDLESS DOA ESTIMATION FOR LARGE-SCALE WIDEBAND MODELS: A NONCONVEX FACTORED $\ell_0$ ATOMIC NORM APPROACH
10562Gridless Non-coherent DOA Estimation for Uniform Linear Arrays Aided by a Reference Signal with Periodic Phase Variation
2518GRIT: Grounding Through Reasoning and Iterative Thinking in Adverse Weather
5234GRNet: Graph Reconstruction Network for Robust Multimodal Sentiment Analysis
14331Gromov-Wasserstein Graph Coarsening
13406GROUP RELATIVE POLICY OPTIMIZATION FOR TEXT-TO-SPEECH WITH LARGE LANGUAGE MODELS
3596Group-Sparse Gaussian Process Regression for Inhomogeneous Sound Field Estimation
1791GS-3I: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images
4875GSDFUSE: CAPTURING COGNITIVE INCONSISTENCIES FROM MULTI-DIMENSIONAL WEAK SIGNALS IN SOCIAL MEDIA STEGANALYSIS
9001G-SFADA: GRADIENT-INSPIRED SOURCE-FREE ACTIVE DOMAIN ADAPTATION FOR SEMANTIC SEGMENTATION
3591GS-MARK: DEEP ROBUST WATERMARKING FOR GRAPH SIGNALS
1965GSPrivacy: Attribute-Preserving Face Anonymous Framework via Fully Controllable Gaussian Head Avatar
6061GSTA: EFFICIENT TRAINING SCHEME WITH SIESTAED GAUSSIANS FOR MONOCULAR 3D SCENE RECONSTRUCTION
18409GSTNET: A GEOSPATIAL-TEMPORAL GRAPH NETWORK FOR GROUP PERSON RE-IDENTIFICATION
15050GTCL: Graph-Text Contrastive Learning meets Log Anomaly Detection
7143GTFMN: Guided Texture and Feature Modulation Network for Low-Light Image Enhancement and Super-Resolution
12387GTLITEPOSE: A LIGHTWEIGHT ARCHITECTURE MODEL INTEGRATING GRAPH CONVOLUTION AND TRANSFORMER
16321GTMA: Dynamic Representation Optimization for OOD VLMs
10255GUI-ARP: ENHANCING GROUNDING WITH ADAPTIVE REGION PERCEPTION FOR GUI AGENTS
4469GUIDED BAYESIAN CONSOLIDATION FOR CLASS-INCREMENTAL CONTINUAL LEARNING THROUGH VARIATIONAL CONSTRAINTS AND NOISE PERTURBATIONS
13913GUIDING EFFICIENT LLM INSTRUCTION-TUNING VIA GRADIENT FLOW MATCHING
15812GVNP-GS: GEOMETRY-ANCHORED AND VIEW-AWARE NEURAL PROXIES FOR SPARSE-VIEW GAUSSIAN SPLATTING
10075H²DFD: SELF-SUPERVISED FAKE NEWS DETECTION VIA A NOVEL HYPERBOLIC HYPERGRAPH DIFFUSION MODEL
10599H3GM: HISTORY-GUIDED GLOBAL GEOMETRIC METRIC FOR SINGLE IMAGE TO 3D SCENE GENERATION
16036HACG: Contribution-Based Dynamic Grouping with Hierarchical Graph Attention for Multi-Agent Cooperation
13184HAD: HYBRID ADVERSARIAL DISTILLATION AGAINST ADVERSARIAL ATTACKS
18168Hadamard Tensor Ring for Efficient Low-Rank Fine-Tuning
3358HADEN: Hierarchical Attentive Alignment and Dual-Contrastive Enhancement Network for Multimodal Few-Shot Relation Extraction
4260HAIR NOISE ANALYSIS AND MITIGATION FOR SMART GLASSES AUDIO CAPTURES
2218HALLUCINATION DETECTION VIA INTERNAL STATES AND STRUCTURED REASONING CONSISTENCY IN LARGE LANGUAGE MODELS
18109HAM-SAM2: ENHANCING SAM2 FOR VISUAL OBJECT TRACKING WITH ADAPTIVE MOTION MODELING AND HIERARCHICAL MEMORY BANK
5279HandFusion: Efficient Cross-modal Fusion Network for RGB-D based 3D Hand Mesh Reconstruction
3387Handling Heterogeneous Features: Modeling Continuous-Discrete Feature Interaction for Time Series Anomaly Detection via Conditional Diffusion
15400Hanui: Harnessing Distributional Discrepancies for Singing Voice Deepfake Detection
11503HAO-QCB: Towards Robust Quantization-Conditioned Backdoor Attack with Hidden Activation Offset
13519Hardware-Efficient Cognitive Radar: Multi-Target Detection with RL-Driven Transmissive RIS
3037HARMONET: MUSIC GROUNDING BY SHORT VIDEO VIA HARMONIC RESAMPLE AND DYNAMIC SPARSE ALIGNMENT
14735HARMONIC PARAMETER DESIGN IN THE APPROXIMATED ONE-BIT HERMITE LAW
10279HarmoniFuse: A Component-Selective and Prompt-Adaptive Framework for Multi-Task Speech Language Modeling
10042Harmonized Evolutionary Reinforcement Learning
3306HARNESSING MASKED GENERATIVE TRANSFORMERS FOR EFFECTIVE KNOWLEDGE DISTILLATION
12443Harnessing the Gradient: Enhanced Cross-Prompt Attacks on Large Vision-Language Models
19099HARNESSING WAVEFRONT CURVATURE AND SPATIAL CORRELATION IN NONCOHERENT MIMO COMMUNICATIONS
18197HASAP: HIERARCHICAL ACOUSTIC-SEMANTIC ANNOTATION PIPELINE FOR SCRIPTED SPEECH DATA
14268Hashing-Baseline: Rethinking hashing in the age of pretrained models
8158HA-VITNET: DUAL-DOMAIN COLLABORATIVE LEARNING FOR SEMANTIC SEGMENTATION OF HIGH-RESOLUTION REMOTE SENSING IMAGES
7046HAVT-IVD: HETEROGENEITY-AWARE CROSS-MODAL NETWORK FOR AUDIO-VISUAL SURVEILLANCE: IDLING VEHICLES DETECTION WITH MULTICHANNEL AUDIO AND MULTISCALE VISUAL CUES
4368HCGAN: HARMONIC-COUPLED GENERATIVE ADVERSARIAL NETWORK FOR SPEECH SUPER-RESOLUTION IN LOW-BANDWIDTH SCENARIOS
7596HCL-CSC: HIERARCHICAL CONTRASTIVE LEARNING WITH IDS-AWARE CHARACTER SIMILARITY FOR CHINESE SPELLING CORRECTION
7590HC-MONET: HIERARCHICAL CONTINUOUS MASKED OPERATOR NETWORK WITH A SHARED SPECTRAL MIXTURE TIME KERNEL FOR IRREGULAR TIME SERIES
5415HCPT: Hierarchical Cross-modal Prompt Tuning
15278HCTPOSE: HYBRID CNN-TRANSFORMER NETWORK FOR SELF-SUPERVISED MULTI-VIEW 3D HUMAN POSE ESTIMATION
17659HD-NEXUS: A HIERARCHICAL DECOUPLING FRAMEWORK FOR MULTI-MODAL, MULTI-TASK ASSISTIVE DRIVING PERCEPTION
6188HD-PPT: HIERARCHICAL DECODING OF CONTENT- AND PROMPT-PREFERENCE TOKENS FOR INSTRUCTION-BASED TTS
18875HDRSL Net for Accurate High Dynamic Range Imaging-Based Structured Light 3D Reconstruction
15725HEAD-AWARE VISUAL CROPPING: ENHANCING FINE-GRAINED VQA WITH ATTENTION-GUIDED SUBIMAGE
14303Heatmap-to-SMPL Multi-View Radar Transformer for Multi-Person 3D Pose Estimation
9208HEBBIAN LEARNING WITH GLOBAL DIRECTION
9403HELA: HYPER-EFFICIENT LIGHTWEIGHT ARCHITECTURE FOR IMAGE FINE-TUNING
15973HEMD-SEGNET: A HIERARCHICAL ENCODER-MIXER-DECODER SEGMENTATION NETWORK FOR EXTRACTING LAKES FROM REMOTE SENSING IMAGES
13683HERGNET: A FAST NEURAL SURROGATE MODEL FOR SOUND FIELD PREDICTIONS VIA SUPERPOSITION OF PLANE WAVES
4981HETEROGENEOUS ADVERSARIAL FEDERATED LEARNING
15881Heterogeneous Feature Mutual-Calibration Assisted Online Distillation for Efficient Face Anti-Spoofing
7665Heterogeneous Parallel Framework with Spatio-temporal Conditional Random Field for 3D Human Pose Estimation
2944HETEROGENEOUS SELF-SUPERVISED ACOUSTIC PRE-TRAINING WITH LOCAL CONSTRAINTS
10918HETEROGENEOUS SPATIAL TEMPORAL GRAPH NEURAL NETWORK FOR MULTIVARIATE TIME SERIES FORECASTING
13571HEURISTIC SYNTHESIS FROM BELIEF STATES: ROBUST PLANNING UNDER AMBIGUOUS NATURAL LANGUAGE INSTRUCTIONS
1858HFDFORMER: MONOCULAR 3D HUMAN RECONSTRUCTION VIA LAYER-WISE HIERARCHICAL FEATURE DECOUPLING TRANSFORMER
13715HFGNET: MITIGATING BOUNDARY DISTORTION FOR SONAR IMAGE SEGMENTATION WITH HIGH FREQUENCY GUIDANCE STRATEGY
11543HFSQVAE: HIERARCHICAL VECTOR QUANTIZATION WITH RESIDUALS FOR FREQUENCY-SPECIFIC EMBEDDING
12743HGAN-SDEs: Learning Neural Stochastic Differential Equations with Hermite-Guided Adversarial Training
15665HIBAR: A HIDDEN BACKDOOR ATTACK ON LLM RECOMMENDATION SERVICES VIA MULTI-TURN DIALOGUE MANIPULATION
10316HIDIFF-ENERGY: A HIERARCHICAL DIFFUSION MODEL FOR MULTI-SCALE LONG-TERM ENERGY DATA GENERATION
12782HIERARCHICAL ACTIVITY RECOGNITION AND CAPTIONING FROM LONG-FORM AUDIO
4528Hierarchical Channel Aggregation with Entropy-Driven Distillation for Federated Segmentation
13476Hierarchical Contrastive Learning of Point Clouds Based on P-Norm Pooling
8278HIERARCHICAL CONTRASTIVE LEARNING WITH SPEECH LANGUAGE MODEL FOR SEPARATING SIMILAR SPEAKERS
10424HIERARCHICAL CORRELATION COST VOLUME FOR STEREO MATCHING
14397Hierarchical Discrete Flow Matching for Multi-Codebook Codec-based Text-to-Speech
2812Hierarchical Graph Convolutional Network with Depression-oriented Priors
17016HIERARCHICAL MARL FOR TASK ALLOCATION: DISTRIBUTED SUBTASK SELECTION WITH MUTUAL INFORMATION
5867HIERARCHICAL ORTHOGONAL RESIDUAL SPREAD FOR PRECISE MASSIVE EDITING IN LARGE LANGUAGE MODELS
17643Hierarchical Patch Collaboration with DINOv3 for Efficient Dichotomous Image Segmentation
3016Hierarchical Solver for Reassembling Mixed Puzzles of Eroded Gaps
12082Hierarchical Sparse Vector Transmission for Ultra Reliable and Low Latency Communications
10026HIERARCHICAL TOKENIZATION OF MULTIMODAL MUSIC DATA FOR GENERATIVE MUSIC RETRIEVAL
17319Hierarchical Voting Decoder for Resolving Knowledge Conflicts
10024Hierarchy-aware Dynamic Contrastive Learning and Structural Relation Constraints for Hierarchical Text Classification
18159HierSG: Hierarchical Semantic Gaussian Representation for 3D Occupancy
16428HiFi-HARP: A High-Fidelity 7th-Order Ambisonic Room Impulse Response Dataset
17208Hi-Former: A Hierarchical Transformer Pedestrian-Vehicle Detector
10307HIGH QUALITY UNDERWATER IMAGE COMPRESSION WITH ADAPTIVE COLOR CORRECTION
16442Higher-Order Feature Attribution: Bridging Statistics, Explainable AI, and Topological Signal Processing
18081HIGH-FIDELITY SPEECH ENHANCEMENT VIA DISCRETE AUDIO TOKENS
2899HIGH-FREQUENCY DETAIL COMPENSATION AND MULTI-SCALE FEATURE FUSION NET FOR UAV REMOTE SENSING OBJECT DETECTION
5259High-Frequency-Aware Omni-Aggregation Transformer for Image Super-Resolution
8199HIGH-LOW FREQUENCY NETWORK FOR SPACE-TIME VIDEO SUPER-RESOLUTION
17001HIGH-QUALITY TRANSMISSION OF HYPERSPECTRAL IMAGE BASED ON SEMANTIC COMMUNICATION
17386High-resolution Contrastive Framework for Generalizable AI-generated Image Detection
18921HILBERT TRANSFORM ON GRAPHS: LET THERE BE PHASE
16435HILO: HIERARCHICAL FEATURE FUSION VIA LOCAL-GLOBAL ATTENTION FOR MULTIMODAL EMBEDDINGS
13843HIMNN: A HIERARCHY-AWARE MULTIMODAL NEURAL NETWORK FOR ELECTROLYTE FORMULATIONS PROPERTY PREDICTION
10381HINT: COMPOSED IMAGE RETRIEVAL WITH DUAL-PATH COMPOSITIONAL CONTEXTUALIZED NETWORK
9546HINT: HIERARCHICAL INTER-FRAME CORRELATION FOR ONE-SHOT POINT CLOUD SEQUENCE COMPRESSION
15363HIPPOCAMPAL-INSPIRED ASSOCIATE MEMORY FRAMEWORK FOR FEW-SHOT CLASSIFICATION
13980HI-READER: A HIERARCHICAL COGNITIVE FRAMEWORK FOR MULTI-PAGE DOCUMENT VISUAL QUESTION ANSWERING
16011HISEM-RL: HIERARCHICAL SEMANTIC-DRIVEN REINFORCEMENT LEARNING FOR ADAPTIVE VR VIDEO TRANSMISSION
2508HISTORICAL INTERACTION RETROSPECTIVE NETWORK FOR TEMPORAL KNOWLEDGE GRAPH REASONING
12371HIUFORMER: A HIERARCHICAL U-SHAPED TRANSFORMER WITH FREQUENCY-DIVIDED DUAL-PATH ATTENTION FOR MULTIVARIATE TIME SERIES FORECASTING
11925HLF: A HIERARCHICAL LOCALIZATION FRAMEWORK FOR JOINT MOMENT RETRIEVAL AND HIGHLIGHT DETECTION
15465HM-AVATAR: TOWARDS REALISTIC LOOSE GARMENT MODELING WITH HIERARCHICAL MLPS
1212HMD: Enhancing Vision Transformer Distillation via Mask Reconstruction
16942HMVLA: HYPERBOLIC MULTIMODAL FUSION FOR VISION-LANGUAGE-ACTION MODELS
12832H-NNPBFDAF: HIERARCHICAL NEURAL NETWORK PARTITIONED BLOCK FREQUENCY DOMAIN ADAPTIVE FILTER WITH NOVEL BLOCK ACTIVATION PROBABILITY
2366Holographic Transformers for Complex-Valued Signal Processing: Integrating Phase Interference into Self-Attention
9726Homomorphic Convolution Reimagined: Eliminating Rotation Bottlenecks for Practical Privacy-Preserving CNN Inference
17330Homomorphic-Controlled Augmentation for Time Series Forecasting
5282HO-MUCI: Hierarchical Optimization-Driven Path Planning for Multi-UAV Regional Collaborative Inspection
5851HORIZON: A UNIFIED FRAMEWORK FOR PHASE-WISE RETRIEVAL-GENERATION OPTIMIZATION IN TASK-ORIENTED DIALOGUE SYSTEM
17639HOTGAD: HIGH-ORDER AND TEMPORAL PATTERN RECONSTRUCTION FOR DYNAMIC GRAPH ANOMALY DETECTION
11910HOT-P: HIERARCHICAL OPTIMAL TRANSPORT PROTOTYPING FOR SELF-SUPERVISED LEARNING
10760HOW CAN QUANTUM DEEP LEARNING IMPROVE LARGE LANGUAGE MODELS?
9786HOW DOES CRAMER-RAO BOUND ANALYSIS BENEFIT OPPORTUNISTIC RAIN FIELD RECONSTRUCTION
6635How Does Instrumental Music Help SingFake Detection?
9582HOW FAR DO SSL SPEECH MODELS LISTEN FOR TONE? TEMPORAL FOCUS OF TONE REPRESENTATION UNDER LOW-RESOURCE TRANSFER
6792HOW MANY IRSs ARE REQUIRED TO REALIZE A FULL-RANK MIMO CHANNEL?
14694HOW TO LABEL RESYNTHESIZED AUDIO? THE DUAL ROLE OF NEURAL AUDIO CODECS IN AUDIO DEEPFAKE DETECTION
12703HPC-NERF: INTEGRATING HIGH-FIDELITY POINT CLOUDS WITH NEURAL RADIANCE FIELDS FOR ENHANCED 3D RECONSTRUCTION
15823HPTune: Hierarchical Proactive Tuning for Collision-Free Model Predictive Control
13480HREI: HYBRID LONG-SHORT RETRIEVAL AND EFFICIENT INFERENCE FOR KNOWLEDGE BASE QUESTION ANSWERING
7900HSI-DM: TRAINING-FREE HIERARCHICAL STYLE INJECTION IN DIFFUSION MODELS FOR NATURAL CONTENT-STYLE FUSION
16997HSRI: High-fidelity Shape Representation with Image Guidance
10790HSSDCT: Factorized Spatial-Spectral Correlation for Hyperspectral Image Fusion
10814HUMAN MESH RECOVERY FROM PARTIAL POINT CLOUD WITHOUT HUMAN ANNOTATIONS
10496HUMAN-CENTRIC IMAGE EDITING VIA MOE-UNET DENOISING
16425HUNT: DETECTING HALLUCINATIONS VIA MULTI-LAYER DISCRIMINATIVE REPRESENTATIONS IN LARGE LANGUAGE MODELS
4863HUNTING THE STREAM: AN EFFICIENT AND LIGHTWEIGHT APPROACH FOR ENCRYPTED HLS LIVE STREAMING TRAFFIC IDENTIFICATION
18100HuntingLLM: Risk-Driven Automated Red Teaming with Adaptive Attack Agents
14466HVAC-EAR: EAVESDROPPING HUMAN SPEECH USING HVAC SYSTEMS
14615HVD: HUMAN VISION-DRIVEN VIDEO REPRESENTATION LEARNING FOR TEXT-VIDEO RETRIEVAL
6057HYBRID CHANNEL ESTIMATION WITH QUANTIZED PHASE FEEDBACK FOR OVER-THE-AIR COMPUTATION
11881HYBRID PROGRESSIVE FUSION NETWORK FOR MULTIMODAL SENTIMENT ANALYSIS
1139HYBRID PRUNING: IN-SITU COMPRESSION OF SELF-SUPERVISED SPEECH MODELS FOR SPEAKER VERIFICATION AND ANTI-SPOOFING
1694HYBRID QUANTUM–CLASSICAL GROUP SPARSE RECOVERY
4626Hybrid Ranking with Collaborative Signals for LLM-Based Recommendation
17255Hybrid Semantic-Complementary Transmission for High-Fidelity Image Reconstruction
18110HYBRID ZEROTH-ORDER FINE-TUNING FOR LANGUAGE MODEL WITH CPU MEMORY ASSISTANCE
7628HybridMask: Facial-Guided Cross-Modal Fusion for Multimodal Deepfake Detection
17075HyFlowSE: Hybrid End-to-End Flow-Matching Speech Enhancement via Generative-Discriminative Learning
2010Hyperbolic Additive Margin Softmax with Hierarchical Information for Speaker Verification
9473HyperCool: Reducing Encoding Cost in Overfitted Codecs with Hypernetworks
12309HYPERDEFORM: A CROSS-LEVEL SEMANTIC AND SPATIAL ADAPTIVE MODULE FOR ROBUST SCENE TEXT DETECTION
14139HYPERDOA: ROBUST AND EFFICIENT DOA ESTIMATION USING HYPERDIMENSIONAL COMPUTING
18240HyperFedFS: Heterogeneous Federated Few-Shot Learning with Hypergraph-driven Collaborative Aggregation
16766HYPERGRAPH-BASED ASYMMETRIC EMBEDDING FRAMEWORK FOR ATTRIBUTE-MISSING GRAPH CLUSTERING
14855HYPERSPARSE: FINDING COMPETITIVE HIGH-SPARSITY MODELS VIA HYPERNETWORKS
19073Hyperspectral Information Extraction With Full Resolution From Arbitrary Photographs
5153HYPERSPECTRAL OBJECT TRACKING METHOD BASED ON GENERAL EXPERT ADAPTER
9588HYPERSTG: A SPATIAL-TEMPORAL SURVIVAL HYPERGRAPH NETWORK FOR TEMPORAL KNOWLEDGE GRAPH REASONING
8544HYPERTEST: LOW RANK TEST-TIME ADAPTATION FOR CROSS-SCENE HYPERSPECTRAL IMAGE CLASSIFICATION
5249I²CAR: INTRA- AND INTER-VARIATE CONSISTENCY CONTRASTIVE ADVERSARIAL REPRESENTATION LEARNING FOR MULTIVARIATE TIME SERIES ANOMALY DETECTION
10284IADP-SNN: Integer Activation Dropping Spiking Neural Network for Underwater Acoustic Communication Signal Recognition
7953IBMCT: Breaking the Cost Barrier in Industrial Internet of Things via High-Fidelity Virtual Sensing
11779IBPCODEC: A LOW-BITRATE LIGHTWEIGHT SPEECH CODEC WITH INTER-BAND PREDICTION
6284ICNET: INPUT-GUIDED CALIBRATION NETWORK FOR HIGH-FIDELITY POINT CLOUD COMPLETION
12234ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation
13074ICRE-COT: A RETRIEVAL-REVISED TWO-STAGE RANKING FRAMEWORK FOR LLM-BASED KNOWLEDGE GRAPH COMPLETION
17083ICSMFT5: MULTI-FEATURE FUSION LARGE MODEL APPROACH FOR INDUSTRIAL CONTROL PROTOCOL REVERSE ENGINEERING
12973I-DCCRN-VAE: AN IMPROVED DEEP REPRESENTATION LEARNING FRAMEWORK FOR COMPLEX VAE-BASED SINGLE-CHANNEL SPEECH ENHANCEMENT
17212IDEAvatar: Identity-Preserving Avatar Generation With Controllable Emotions
16302IDENTIFIABILITY OF ROTATING STELLAR SURFACES FROM ASTROMETRIC JITTER
12618Identifying birdsong syllables without labelled data
15206Identifying common backbones of interactions underlying food webs via non-deterministic alignments
15780Identifying the Minimal and Maximal Phonetic Subspace of Speech Representations
16493IDENTITY LEAKAGE THROUGH ACCENT CUES IN VOICE ANONYMISATION
4055IdentityGuard: Context-Aware Restriction and Provenance for Personalized Synthesis
17042IESGN-OCC: An Instance-Enhanced Sparse Guidance Network for Vision-based Occupancy Prediction
9995IEUOD: IMPROVING UNDERWATER OBJECT DETECTION VIA SHALLOW FEATURE GUIDANCE FROM UNDERWATER IMAGE ENHANCEMENT MODELS
15577IGCNet: Dual-Branch Implicit Feature and Global Context Network for Agricultural Parcel Delineation
11128IG-CODIFF: CONTRASTIVE DIFFUSION MODELS WITH CROSS-INSTANCE GRAPH CONSTRUCTION FOR TABULAR DATA SYNTHESIS
2673IG-DETR: INSTANCE-GUIDED DYNAMIC QUERIES FOR SMALL OBJECT DETECTION
1414IGRS-YOLO: Illumination-Guided Iterative Residual Decoupling Reflection Enhancement for Low-Light Small Object Detection
11591IGSA: An Information-Guided Synchronized Attack Framework for High-Transferability Multimodal Attack
12432I-LORA: AN ADAPTIVE RANK ALLOCATION APPROACH USING INTEGRATED GRADIENTS
14640ILSA: Information Loss-guided Sparsity Allocation for Pruning Large Language Models
7921Image Ordinal Regression Based on Hierarchy Coherent Transformation with Normalized Binary Classifiers
18146IMAGE-PIXEL REALIGNMENT FOR OPEN-VOCABULARY SEMANTIC SEGMENTATION VIA SELF-TRAINING
9167iMathBench: Is Your Multi-modal Large Language Model Ready to Solve Mathematical Problems Embedded in Images?
18204I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search
11335IMITATOR: A HIGHLY TRANSFERABLE ADVERSARIAL PROPERTY-DRIVEN STRATEGY FOR TARGETED ATTACKS
11891IMPACT OF PCR CYCLES AND MUTATION RATE ON LINEAR DNA BARCODE DETECTION
10422Impact of Phonetics on Speaker Identity in Adversarial Voice Attack
12255IMPACT OF QUANTIZATION IN NEAR-FIELD CHANNEL MODELING
10127IMPERCEPTIBLE ADVERSARIAL EXAMPLE GENERATION CONTROLLED BY HIGH-FREQUENCY SIGNAL
15631Implicit Degradation Representation and Adaptive Dictionary Learning for Underwater Image Compression
11184IMPORTANCE OF BALANCE: LIGHTWEIGHT TRANSFORMER VIA SIGNED GRAPH ALGORITHM UNROLLING FOR EEG SIGNAL DENOISING
6128Improve MLLM Benchmark Efficiency through Interview
16652IMPROVED CONVEX RELAXATION FOR 4-PAM SIGNAL RECOVERY
15960Improving Active Learning for Melody Estimation by Disentangling Uncertainties
2816Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training
4211IMPROVING AUDIO EVENT RECOGNITION WITH CONSISTENCY REGULARIZATION
5542IMPROVING AUDIO QUESTION ANSWERING WITH VARIATIONAL INFERENCE
6722IMPROVING AUTOMATIC SPEECH RECOGNITION BY MITIGATING DISTORTIONS INTRODUCED BY SPEECH ENHANCEMENT UNDER DRONE NOISE
2339IMPROVING BINAURAL DISTANCE ESTIMATION IN REVERBERANT ROOMS THROUGH CONTRASTIVE AND MULTI-TASK LEARNING
11459IMPROVING CONTEXTUAL ASR VIA MULTI-GRAINED FUSION WITH LARGE LANGUAGE MODELS
4495IMPROVING CONTINUOUS SIGN LANGUAGE RECOGNITION VIA LIGHTWEIGHT ADAPTIVE TEMPORAL MIXING
14204IMPROVING CROSS-DOMAIN GENERALIZATION OF LIGHTWEIGHT TRANSFORMERS ON NAMED ENTITY RECOGNITION USING SELF-TRAINING
13822IMPROVING DIFFUSION INVERSE PROBLEM SOLVING WITH STRUCTURE CONSISTENCY REGULARIZATION
1595IMPROVING FEW-STEP GENERATION OF RECTIFIED FLOW MODELS WITH CONSISTENT GRADIENTS
6186IMPROVING INTERPRETABILITY IN GENERATIVE MULTITIMBRAL DDSP FRAMEWORKS VIA SEMANTICALLY-DISENTANGLED MUSICAL ATTRIBUTES
14774Improving Maximum Margin Backdoor Detection by Class Subspace Decorrelation
18945IMPROVING NUMERICAL STABILITY OF NORMALIZED MUTUAL INFORMATION ESTIMATOR ON HIGH DIMENSIONS
13563IMPROVING QUANTIZED GLOSS-FREE SIGN LANGUAGE TRANSLATION MODEL VIA DISENTANGLED ARITHMETIC-PROMPTING
11992Improving Representation Learning for Long-tailed Visual Recognition
15747Improving Sign Language Translation via Gloss Guided Temporal and Representation Alignment
11731IMPROVING TEXT-INSTANCE ALIGNMENT OF FOREGROUND CONDITIONED OUT-PAINTING VIA CUSTOMIZED CONCEPT EMBEDDING
5806Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning
1281IMPROVING THE SPEAKER ANONYMIZATION EVALUATION’S ROBUSTNESS TO TARGET SPEAKERS WITH ADVERSARIAL LEARNING
16602IMPROVING WEAKLY SUPERVISED SCENE GRAPH GENERATION VIA NOISE-AUGMENTED TEXT EMBEDDINGS AND CLASS REWEIGHTING
10893IM-RACG: INFORMATION DENSITY-BASED ADAPTIVE MASKING STRATEGY FOR RETRIEVAL-AUGMENTED CODE GENERATION
14600INCOMPLETE MULTI-VIEW CLUSTERING VIA RECONSTRUCTING VIEWS AND STRUCTURE UNIFICATION
13166Incomplete vocabulary learning for fine-grained visual recognition
6713INCONVAD: A TWO-STAGE DUAL-TOWER FRAMEWORK FOR MULTIMODAL EMOTION INCONSISTENCY DETECTION
5963INCORPORATING PRIORS IN LEARNING: A RANDOM MATRIX STUDY UNDER A TEACHER STUDENT FRAMEWORK
16697INCORPORATING SLIDING WINDOW ATTENTION INTO MAMBA FOR LIGHT FIELD IMAGE SUPER-RESOLUTION
12404INCREMENTAL LEARNING FOR AUDIO CLASSIFICATION WITH HEBBIAN DEEP NEURAL NETWORKS
13751INCREMENTAL ORIGIN TRACING OF LLM-GENERATED TEXT WITH IDIOSYNCRASY ENHANCEMENT
17453INDIVIDUAL RISKY ACTION WARNING OF WEARABLE SENSORS ON PERSONALIZED HEALTH PROFILES VIA LLM
5257INDIVIDUALIZE THE HRTF NEURAL FIELD USING ANTHROPOMETRIC PARAMETERS WEIGHTED BY DIRECTION-ATTENTION
15174Inference Scaling in Knowledge Graph Construction for Enhanced Graph-RAG
18900Infinite Factorial Linear Dynamical Systems for Transient Signal Detection
16840INFLUENCE OF CLEAN SPEECH CHARACTERISTICS ON SPEECH ENHANCEMENT PERFORMANCE
13523INFLUENCE-AWARE CURATION AND ACTIVE SELECTION FOR INDUSTRIAL AND SURVEILLANCE SOUND EVENTS
6630INFORMATION-PRESERVING DOWNSAMPLING AND BIDIRECTIONAL FUSION FOR MULTI-SCALE TIME SERIES FORECASTING
10925INFORMATION-SEEKING TRANSMIT BEAMFORMING FOR COGNITIVE ULTRASOUND
3655INFUSING ARBITRARY IDENTITIES: GENERATING VISUALLY HIDDEN FACES VIA DIFFUSION MODELS
12198INPUT-ADAPTIVE DIFFERENTIABLE FILTERBANKS VIA HYPERNETWORKS FOR ROBUST SPEECH PROCESSING
12273INPUT-FAITHFUL SPARSE-VIEW 3D GAUSSIAN SPLATTING WITH DIFFUSION PRIORS
12079InsightRec: Enhancing Sequential Recommendation through Reasoning-Aware Preference Optimization
12943INSS: INVISIBLE SAMPLE-SPECIFIC BACKDOOR ATTACK VIA INVERTIBLE HIDDEN NEURAL NETWORKS
6750INSTANCERSR: REAL-WORLD SUPER-RESOLUTION VIA INSTANCE-AWARE REPRESENTATION ALIGNMENT
17898InstantPhoto: Instance-level Mask Generation via Attention-based Anchor Guidance for Realistic Photo Customization
1722InstructAudio: Unified speech and music generation with natural language instruction
5150INSTRUCTION GUIDED MULTI OBJECT IMAGE EDITING WITH QUANTITY AND LAYOUT CONSISTENCY
14437Instrument Generation Through Distributional Flow Matching and Test-Time Search
14705IN-SYNC: ADAPTATION OF SPEECH AWARE LARGE LANGUAGE MODELS FOR ASR WITH WORD LEVEL TIMESTAMP PREDICTIONS
10563INTACT: INDUCING NOISE TOLERANCE THROUGH ADVERSARIAL CURRICULUM TRAINING FOR LIDAR-BASED SAFETY-CRITICAL PERCEPTION AND AUTONOMY
19078Integrated DNN-based Parameter Estimation for Multichannel Speech Enhancement
12858Integrating Segment-level Context into Frame Representations for Speaker Diarization
2885Integrating Speaker Embeddings and LLM-Derived Semantic Representations for Streaming Speaker Diarization
17879INTEGRATING STACKED INTELLIGENT METASURFACES AND POWER CONTROL FOR DYNAMIC EDGE INFERENCE VIA OVER-THE-AIR NEURAL NETWORKS
2528Intelligent Character Segmentation Method for Ancient Tibetan Based on Character Structure and Attention-BiLSTM
10323Interactive Consistency And Mutual Independence In Causality For Semi-Supervised Medical Image Segmentation
17729INTER-DIALOG CONTRASTIVE LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS
10920INTERMITTENT SEMI-WORKING MASK: A NEW MASKING PARADIGM FOR LLMS
13639Interpolation-Aware Bitrate Ladder Optimization for Variable Framerate Video Streaming
16387Interpretable Alzheimer's disease Detection via Multi-Scale Fusion of Disentangled Speech Features
5586Interpretable CNN-based Enhancement for Nighttime Driving Image Perception
14447INTERPRETABLE MODELING OF ARTICULATORY TEMPORAL DYNAMICS FROM REAL-TIME MRI FOR PHONEME RECOGNITION
14181INTERPRETABLE MULTIMODAL CLASSIFICATION VIA CAUCHY-SCHWARZ DIVERGENCE-INDUCED GACS-KORNER COMMON INFORMATION
14030INTERPRETABLE MUSIC HARMONIC ANALYSIS THROUGH MULTILINEAR MIXTURE OF EXPERTS
2685INT-MEANFLOW: FEW-STEP SPEECH GENERATION WITH INTEGRAL VELOCITY DISTILLATION
15848Intrinsic Neuronal Adaptation Supports Robust Spatio-Temporal Processing in Spiking Neural Networks
5304Intrinsic Semantic Consistency Enhancement for Robust Hierarchical Understanding in VLMs
11522INTRINSICGRID: GRID-BASED INTRINSIC DECOMPOSITION FOR FAST 3D SCENE RECONSTRUCTION
1753Intrinsic-Preserving Cross-Modal Fusion for Small-Target Recognition in Intelligent Transportation
7911Invariant Representation Guided Multimodal Sentiment Decoding with Sequential Variation Regularization
1321INVERSE HALFTONING VIA WEIGHTED SOBEL CONDITIONED DIFFUSION MODEL
2045Inverse Rendering for High-Genus 3D Surface Meshes from Multi-view Images with Persistent Homology Priors
8090INVERSE-HESSIAN REGULARIZATION FOR CONTINUAL LEARNING IN ASR
13246Investigating Batch Inference in a Sequential Monte Carlo Framework for Neural Networks
9733INVESTIGATING MODALITY CONTRIBUTION IN AUDIO LLMS FOR MUSIC
5617INVESTIGATING THE EFFECT OF SENTENCE-LEVEL SYNTACTIC STRUCTURE ON INFORMATION LOSS IN THE HUMAN AUDITORY SYSTEM
15114INVISIBLE BACKDOOR ATTACKS ON SELF-SUPERVISED LEARNING VIA MULTI-CHANNEL ADAPTIVE STEGANOGRAPHY
1780IoDResearch: Deep Research on Private Heterogeneous Data via the Internet of Data
17261IPACue-TTS: Integrating Prosody and Articulatory Cues in Conditional Flow Matching for Multilingual Zero-Shot TTS
11633IPI²: Mitigating Indirect Prompt Injections on Unmanned Aerial Vehicle Agents Using Physical Invariants
9423IQ-LUT: INTERPOLATED AND QUANTIZED LUT FOR EFFICIENT IMAGE SUPER-RESOLUTION
16182IR-HUNTER: AUTOMATED ANALYSIS OF INTENT REDIRECTION VULNERABILITIES IN ANDROID APPLICATIONS BASED ON HYBRID DYNAMIC AND STATIC APPROACHES
9690IRPFUZZ: FUZZING INDUSTRIAL ROBOT PROTOCOL VIA LLM-DRIVEN TRAFFIC SEMANTIC ANALYSIS
15724IRREGULAR MULTIVARIATE TIME SERIES MODELING VIA LATENT GRAPH-GUIDED GAUSSIAN PROCESS PRIORS
13530IS PHASE REALLY NEEDED FOR WEAKLY-SUPERVISED DEREVERBERATION?
11858IS REPEATER-ASSISTED MASSIVE MIMO COMPATIBLE WITH DYNAMIC TDD?
15151ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
7288ISOMETRIC IMMERSION LEARNING WITH RIEMANNIAN GEOMETRY FOR DISTORTION-FREE REPRESENTATION
6763ISSE: AN INSTRUCTION-GUIDED SPEECH STYLE EDITING DATASET AND BENCHMARK
11248ISTER: LINEAR TRANSFORMER FOR EFFICIENT MULTIVARIATE TIME SERIES FORECASTING
14243It Is Personal: The Importance of Personalization for Recognizing Self-Reported Emotion
14962ITD-AWARE BINAURAL SPIKING NETWORKS FOR SOUND SOURCE LOCALIZATION
12634ITDS-SQL: ENHANCING TEXT-TO-SQL PARSING BY IN-CONTEXT LEARNING WITH INFERENCE TIME DATA SYNTHESIS
16808ITERATIVE AMORTIZED HIERARCHICAL VAE
17724ITERATIVE REDUNDANCY-BASED HEAD PRUNING FOR EFFICIENT SELF-SUPERVISED SPEECH RECOGNITION MODELS
18080JAILFUZZ: A ZERO-KNOWLEDGE AND STATE FEEDBACK BASED GRAY-BOX FUZZING FRAMEWORK FOR LARGE LANGUAGE MODELS
18877Jamming and Impulsive Noise Uncertainty Aided Covert Communication in PLC Networks
17471J-MoGen: Joint Differential Learning and Semantic Enhancing for Motion Generating
4134JND-GS: JUST NOTICEABLE DIFFERENCE BASED 3D GAUSSIAN SPLATTING COMPRESSION
16919JOINT ACTIVE RIS CONFIGURATION AND USER POWER CONTROL FOR LOCALIZATION: A NEUROEVOLUTION-BASED APPROACH
10506Joint antenna selection and robust precoding design for multi-target DFRC
6449Joint Antenna Selection and Subarray Structure Design via CNN for Hybrid Beamforming ISAC
10156JOINT AUTOREGRESSIVE MODELING OF MULTI-TALKER OVERLAPPED SPEECH RECOGNITION AND TRANSLATION
16461JOINT CALIBRATION AND DIRECTION-OF-ARRIVAL ESTIMATION FOR SPARSE LINEAR ARRAYS: IDENTIFIABILITY AND ARRAY DESIGN
11735JOINT CLOUD AND HAZE REMOVAL BASED ON SPECTRAL HARMONIZER AND ATMOSPHERIC DISENTANGLER FOR REMOTE SENSING IMAGES
4857JOINT COMPRESSION AND DIRECTION-OF-ARRIVAL ESTIMATION IN DISTRIBUTED SENSOR NETWORKS
10963JOINT DEEP SECONDARY PATH ESTIMATION AND ADAPTIVE CONTROL FOR ACTIVE NOISE CANCELLATION
19081Joint Enhancement and Bandwidth Extension for Radar Through-Barrier Speech Acquisition
5910JOINT ESTIMATION OF LASER-ULTRASONICS RESONANCES IN THIN METAL PLATES
5161JOINT ESTIMATION OF PIANO DYNAMICS AND METRICAL STRUCTURE WITH A MULTI-TASK MULTI-SCALE NETWORK
10243Joint Estimation of Primary and Secondary Paths for Personalized Hearable Applications
16548JOINT GRAPH-BASED MODALITY ALIGNMENT FOR ROBUSTNESS IN CONVERSATIONAL EMOTION RECOGNITION
6742JOINT MODELING OF TYPICALITY AND UNCERTAINTY FOR SOT-BASED FEW-SHOT LLM REASONING
6046JOINT MULTICHANNEL ACOUSTIC FEEDBACK CANCELLATION AND SPEAKER EXTRACTION VIA KALMAN FILTER AND DEEP NON-LINEAR SPATIAL FILTER
1007JOINT MULTI-DIMENSIONAL FEATURES AND ACADEMIC NETWORK EMBEDDING FOR AUTHOR NAME DISAMBIGUATION
5498Joint Optimization of Physical Layer Security in XL-IRS-Assisted ISAC Systems under Hybrid-Field Propagation
12646Joint reconstruction and pansharpening for high-resolution hyperspectral single-pixel imaging
1985JOINT REPRODUCTION NUMBER AND SPATIAL CONNECTIVITY STRUCTURE ESTIMATION VIA GRAPH SPARSITY-PROMOTING PENALIZED FUNCTIONAL
2692Joint single-shot ToA and DoA estimation for VAA-based BLE ranging with phase ambiguity: A deep learning-based approach
15088JOINT SUPERPIXEL AND SELF-REPRESENTATION LEARNING FOR SCALABLE HYPERSPECTRAL IMAGE CLUSTERING
18148Joint Transmit Beamforming and Reflection Optimization for Beyond Diagonal RIS Aided Multi-Cell MIMO Communication
13686Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis
16329JPAD: Joint Prediction-Planning with Temporal Consistency for End-to-end Autonomous Driving
10528JRAS: JOINTED REBALANCED ADJUSTMENT STRATEGY FOR LONG-TAILED VISUAL RECOGNITIONS
18119JUDGE BEFORE ANSWER: CAN MLLM DISCERN THE FALSE PREMISE IN QUESTION?
15572JUND-F0: A Novel Deep Learning Framework for Joint Unvoiced/Voiced Detection and F0 Estimation
15753K Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function
7708Kalman Filter Based Linear Deformable for Retinal Vessel Segmentation
11317KAME: TANDEM ARCHITECTURE FOR ENHANCING KNOWLEDGE IN REAL-TIME SPEECH-TO-SPEECH CONVERSATIONAL AI
6091KAN we make models simpler for Audio Deepfake Detection with Kolmogorov-Arnold Networks?
5313KAN-ENHANCED TRANSFORMER WITH MULTISCALE PARALLEL POOLING FOR CLOUD REMOVAL
17646KAN-LLM: Kolmogorov-Arnold Networks-enhanced Large Language Models For Time Series Forecasting
13606KBNET: A KNOWLEDGE BRIDGING NETWORK FOR GENERALIZABLE DEEPFAKE DETECTION
7018KD-CVG: A KNOWLEDGE-DRIVEN APPROACH FOR CREATIVE VIDEO GENERATION
8389KDFNet: Kalman dynamic filtering network for multivariate time series forecasting
11751KEEPING MODELS LISTENING: SEGMENT- AND TIME-AWARE ATTENTION RESCALING AT DECODING TIME
9368KERNEL REGRESSION OF MULTI-WAY DATA VIA TENSOR TRAINS WITH HADAMARD OVERPARAMETRIZATION: THE DYNAMIC GRAPH FLOW CASE
6055KG2QA: KNOWLEDGE GRAPH-ENHANCED RETRIEVAL-AUGMENTED GENERATION FOR COMMUNICATION STANDARDS QUESTION ANSWERING
12050KGER: Knowledge Graph Error Detection and Refinement with Reinforcement Learning
17847KG-TOOLPLAN: KNOWLEDGE GRAPH-GUIDED REASONING FOR EFFICIENT LLM TOOL SELECTION
12696KINEMATIC PRIORS BENEFIT SKELETON-BASED ACTION RECOGNITION
13216KINGUARD: HIERARCHICAL KINSHIP-AWARE FINGERPRINTING TO DEFEND AGAINST LARGE LANGUAGE MODEL STEALING
1715KLGATE: LEVERAGING LLM EXPLANATIONS VIA KL-GUIDED GATING FOR MULTIMODAL SARCASM DETECTION
10845Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels
16987KNOWLEDGE DISTILLATION VIA GENERATIVE RECONSTRUCTION PATHWAYS FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
13566KNOWLEDGE EDITING WITH DEMONSTRATION SELECTION FOR MULTI-HOP QUESTION ANSWERING
14960KNOWLEDGE-AWARE REFINEMENT FOR DETECTING AND ADDRESSING ANOMALIES IN KNOWLEDGE TRACING
11386KNOWLEDGE-ENHANCED CONTRASTIVE LEARNING FOR GAIT EMOTION RECOGNITION
8376KNOWLEDGE-MISMATCHED SEMANTIC COMMUNICATION FOR WIRELESS IMAGE TRANSMISSION: A UNIFIED INFORMATION BOTTLENECK APPROACH
3996KPMG: A GRAPHICAL KOOPMAN-MAMBA APPROACH FOR FINANCIAL MARKETS
9800KSDIFF: KEYFRAME-AUGMENTED SPEECH-AWARE DUAL-PATH DIFFUSION FOR FACIAL ANIMATION
5065LABEL-CORRECTED WEIGHTED MULTI-SIMILARITY LOSS FOR NOISY CROSS-MODAL RETRIEVAL
5723LACEREC: CONTROLLABLE SEQUENTIAL RECOMMENDATION WITH WEAK-SIGNAL ENHANCEMENT
12766LAFUFU: LATENT ACOUSTIC FEATURES FOR ULTRA-FAST UTTERANCE RESTORATION
17830Lagrangian Deep Learning for Private RIS-aided Localization: An Active Sensing Approach
14722Lagrangian-Based Motion-Capture Model for Continuous Weather Forecasting
15963LAKALMANTRACKER: ROBUST LEARNING-AIDED KALMAN FILTERING FOR MULTI-OBJECT TRACKING
1891LAKAN: LANDMARK-ASSISTED ADAPTIVE KOLMOGOROV-ARNOLD NETWORK FOR FACE FORGERY DETECTION
13659LAMB: LLM-BASED AUDIO CAPTIONING WITH MODALITY GAP BRIDGING VIA CAUCHY-SCHWARZ DIVERGENCE
6048LAMER-SSL: LAYER-AWARE MIXTURE OF LORA EXPERTS FOR CONTINUAL MULTILINGUAL EXPANSION OF SELF-SUPERVISED MODELS WITHOUT FORGETTING
10890LAMIGAUSS: PITCHING RADIATIVE GAUSSIAN FOR SPARSE-VIEW X-RAY LAMINOGRAPHY RECONSTRUCTION
1903LAMUT: LIGHTING-AWARE MULTI-MATERIAL APPEARANCE TRANSFER FROM A SINGLE IMAGE
9855LANDSCAPE ANALYSIS OF SIMULTANEOUS BLIND DECONVOLUTION AND PHASE RETRIEVAL
16369LANGUAGE-INFUSED RETRIEVAL-AUGMENTED CTC WITH ADAPTIVE SOFT-HARD GATING FOR ROBUST CODE-SWITCHING ASR
16759LANTERN: LANGUAGE MODEL ASSESSMENT ON NOISY AND TRANSFORMED TASKS FOR UNDERSTANDING ERROR AND ROBUSTNESS NUANCES
3934LaPrune: Layout-Aware Pruning for Efficient Multimodal Large Language Models
17282Large System Analysis of SURE based Hyper-parameter Optimizing in Sparse Bayesian Learning
12592LARGE VISION MODELS CAN SOLVE MENTAL ROTATION PROBLEMS
6669LARGE-SCALE EEG MODELS FOR MEDITATION STATE RECOGNITION
9867LARGE-SYSTEM FIXED-POINT LAW AND DETERMINISTIC CLOSURE FOR SPARSE BAYESIAN LEARNING
4053LATENT DOMAIN PROMPT LEARNING FOR VISION-LANGUAGE MODELS
3341Latent DPO for Concept Erasure in Text-to-Video Diffusion Models
14270LATENT SPACE ORTHONORMALIZATION FOR HYPER-FINETUNING OF LANGUAGE MODELS
6874LATENT TEMPORAL DISCREPANCY AS MOTION PRIOR: A LOSS-WEIGHTING STRATEGY FOR DYNAMIC FIDELITY IN T2V
14549LATENT VARIABLE ESTIMATION VIA KERNEL AND GRAPH FOR GAUSSIAN PROCESS REGRESSION
10483LATENTCOLORNET: A LATENT DIFFUSION-BASED FRAMEWORK FOR INFRARED IMAGE COLORIZATION
14554LatentGuard: Robust Latent Watermarking for Deepfake Tracing and Forgery Localization
11452LATENT-SPACE METRICS FOR COMPLEX-VALUED VAE OUT-OF-DISTRIBUTION DETECTION UNDER RADAR CLUTTER
5298LATTICE-GUIDED CONSISTENCY REGULARIZATION OF DUAL-MODE TRANSDUCERS FOR AUTOMATIC SPEECH RECOGNITION
16991Layer-Aware Early Fusion of Acoustic and Linguistic Embeddings for Cognitive Status Classification
4731LAYER-WISE CONTRIBUTION EVALUATION FOR INCENTIVIZING PERSONALIZATION IN FEDERATED LEARNING
18099Layout Robust Zero Shot Learning for Human Activity Recognition Using Wi-Fi Sensing in Unseen Environments
1192LC-Sketch: A Layered-Carry Sketch for IoT Network Measurement
13283LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning
15873LDG-PCGC: LOSSLESS DYNAMICALLY GROUPED POINT CLOUD GEOMETRY COMPRESSION
6753LDINet: A Lightweight Dual Domain Interaction Network for Human Pose Estimation
15819LEARN TO UNLEARN IN LARGE LANGUAGE MODELS
10606LEARNABLE INSTANCE ATTENTION FILTERING FOR ADAPTIVE DETECTOR DISTILLATION
5665Learnable Mel-frontend for Robust Underwater Acoustic Target Detection under Non-Target Interference
14350Learning Affine-Equivariant Proximal Operators
18173Learning Beyond the Gaussian Data: Learning Dynamics of Neural Networks on an Expressive and Cumulant-Controllable Data Model
5825LEARNING CLASS SIMILARITIES FOR ENHANCED IMAGE OUT-OF-DISTRIBUTION DETECTION
13719LEARNING CLASS-CONDITIONAL TEMPERATURE WITH ENTROPY ALIGNMENT FOR MEDICAL IMAGE CLASSIFICATION
14292LEARNING CONSISTENT CAUSAL ABSTRACTION NETWORKS
11611LEARNING CONTROLLABLE BLIND DENOISING VIA NOISE LEVEL MAP ESTIMATION AND MISMATCH TRAINING
3661LEARNING CROSS-DOMAIN DISCREPANCY FOR IMAGE MANIPULATION LOCALIZATION
3205LEARNING DEPTH GUIDANCE FOR CAMOUFLAGED OBJECT DETECTION WITHOUT ANNOTATIONS
13688LEARNING DIRECTED ACYCLIC GRAPHS FROM MAX-TIMES STRUCTURAL EQUATION MODELS WITH SPARSE INPUT
9757LEARNING DOMAIN-ROBUST BIOACOUSTIC REPRESENTATIONS FOR MOSQUITO SPECIES CLASSIFICATION WITH CONTRASTIVE LEARNING AND DISTRIBUTION ALIGNMENT
6018LEARNING DUAL MIXTURE-OF-EXPERTS MODELS FOR UNIFIED IMAGE DERAINING
4151LEARNING EXPLICITLY CONDITIONED SPARSIFYING TRANSFORMS
2970LEARNING FAIR DOMAIN ADAPTATION WITH VIRTUAL LABEL DISTRIBUTION
14353LEARNING FALSE DISCOVERY RATE CONTROL VIA MODEL-BASED NEURAL NETWORKS
12652Learning Fill-in Reduction Ordering via Graph Policy Optimization for Sparse Matrices
9560LEARNING FROM LABEL PROPORTIONS WITH SHRINKING BAG
10382LEARNING FROM MULTIPLE EXPERTS: ALTERNATE ENSEMBLE DISTILLATION HASHING FOR LIGHTWEIGHT CROSS-MODAL RETRIEVAL
16030LEARNING FROM NOISY LABELS: A CONFORMAL PREDICTION PERSPECTIVE
17116LEARNING GRAPH FROM SMOOTH SIGNALS UNDER PARTIAL OBSERVATION: A ROBUSTNESS ANALYSIS
1926LEARNING GRAPHICAL MODELS UNDER LOW-RANK FACTOR ANALYSIS STRUCTURE
17564Learning Image-Text Matching with Optimal Partial Transport
6103LEARNING LATENT SPACE FOR MULTI-ORDER / RESOLUTION GRAPH-REGULARIZED IMAGE DENOISER
16785LEARNING LIGHT FIELD IMPLICIT NEURAL REPRESENTATIONS FOR ARBITRARY-SCALE SPATIAL-ANGULAR SUPER-RESOLUTION
13744LEARNING LINEARITY IN AUDIO CONSISTENCY AUTOENCODERS VIA IMPLICIT REGULARIZATION
14643LEARNING MIXTURE OF SPATIO-TEMPORAL EXPERTS FOR 3D HUMAN POSE ESTIMATION
1484LEARNING MOTION TRENDS IN GAUSSIAN SPLATTING FOR MONOCULAR DYNAMIC RECONSTRUCTION
1643LEARNING MULTI-COLOR SPACE IMPLICIT NEURAL REPRESENTATIONS FOR JOINT IMAGE DERAINING AND LOW-LIGHT ENHANCEMENT
14896Learning Nonlinear Systems In-Context: From Synthetic Data to Real-World Motor Control
9745Learning Non-Local Spatial-Spectral Correlation for Hyperspectral Image Super-Resolution
19070Learning Optimal Graph Filters for Clustering of Attributed Graphs
7832LEARNING PHYSICS-AWARE REPRESENTATION FOR DYNAMIC FLUID SCENES
12941LEARNING PIEZOELECTRIC HYSTERESIS IN IN-EAR MEMS LOUDSPEAKERS FROM ACOUSTIC MEASUREMENTS
16496LEARNING PRODUCT GRAPHS FROM TWO-DIMENSIONAL STATIONARY SIGNALS
9864Learning Reference-Guided Exposure Correction with Hybrid Illumination Characteristics
11095Learning Spatio-Temporal Variability for Cattle Re-Identification
11946LEARNING THE STRUCTURE OF CONNECTION GRAPHS
15187Learning Time-Varying Turn-Taking Behavior in Group Conversations
7250Learning to Align with Unbalanced Optimal Transport in Linguistic Knowledge Transfer for ASR
14800LEARNING TO CASCADE: A POMDP APPROACH TO SEQUENTIAL MODEL SELECTION
2058LEARNING TO COARSE-TO-FINE REFINEMENT FOR CAMOUFLAGED OBJECT DETECTION
16207Learning to Decrypt: A Cipher-guided Dynamic Expert Framework for Document Deblurring
16222Learning to Intervene: Optimized Soft Intervention Selection for Causal Discovery
19068LEARNING TO QUANTIZE AND PRECODE IN MASSIVE MIMO SYSTEMS FOR ENERGY REDUCTION: A GRAPH NEURAL NETWORK APPROACH
10734Learning to Rotate Frames for Hyperbolic Graph Feature Extraction
17631LEARNING TO SEE THROUGH DARKNESS: SELF-SUPERVISED EVENT-BASED VIDEO RECONSTRUCTION UNDER LENS FLARE
14777Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model
17059LEARNING WHAT TO HEAR: BOOSTING SOUND-SOURCE ASSOCIATION FOR ROBUST AUDIOVISUAL INSTANCE SEGMENTATION
13001LEARNING-ENHANCED DISTRIBUTIONALLY ROBUST ADAPTIVE BEAMFORMING
11950LEGAL∆: ENHANCING LEGAL REASONING IN LLMS VIA REINFORCEMENT LEARNING WITH CHAIN-OF-THOUGHT GUIDED INFORMATION GAIN
10393LEND A HAND: SEMI TRAINING-FREE CUED SPEECH RECOGNITION VIA MLLM-DRIVEN HAND MODELING FOR BARRIER-FREE COMMUNICATION
3998Length-Aware Rotary Position Embedding for Text-Speech Alignment
14650LENSLESSMIC: AUDIO ENCRYPTION AND AUTHENTICATION VIA LENSLESS COMPUTATIONAL IMAGING
10330LePER: Label‑Free Edge Polarity Reweighting for Heterophily
10524Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants
1433LESS: LARGE LANGUAGE MODEL ENHANCED SEMI-SUPERVISED LEARNING FOR SPEECH FOUNDATIONAL MODELS USING IN-THE-WILD DATA
9804LET MORE EXPERTS SPEAK: BALANCING EXPLORATION AND EXPLOITATION IN PEFT FOR MIXTURE-OF-EXPERTS MODELS
12113LETP: COUPLING ATTENTION LOCALIZATION AND COGNITIVE REASONING FOR EGO-CENTRIC MULTI-TASK DRIVING SCENE PERCEPTION
3490LETPAV: LEXICON-ENHANCED TEXT WITH PROGRESSIVE AUDIO-VISUAL FUSION FOR MULTIMODAL SENTIMENT ANALYSIS
5612LEVERAGING AUDIO-VISUAL DATA TO REDUCE THE MULTILINGUAL GAP IN SELF-SUPERVISED SPEECH MODELS
19149Leveraging Content and Acoustic Representations for Speech Emotion Recognition
15030LEVERAGING DIFFUSION U-NET FEATURES FOR PREDOMINANT INSTRUMENT RECOGNITION
16295LEVERAGING LABEL PROPORTION PRIOR FOR CLASS-IMBALANCED SEMI-SUPERVISED LEARNING
15806LEVERAGING LARGE LANGUAGE MODELS FOR TEXT NORMALIZATION OF NON-STANDARD WORDS IN TEXT-TO-SPEECH SYNTHESIS
17112LEVERAGING LARGE MULTIMODAL MODELS FOR AUDIO-VIDEO DEEPFAKE DETECTION: A PILOT STUDY
14001LEVERAGING LARGE SPEECH LANGUAGE MODELS AS EVALUATORS FOR EXPRESSIVE SPEECH
16384LEVERAGING MULTIPLE SPEECH ENHANCERS FOR NON-INTRUSIVE INTELLIGIBILITY PREDICTION FOR HEARING-IMPAIRED LISTENERS
6067LEVERAGING MULTI-SOURCE RETRIEVAL AND EXPERT FILTERING FOR LLM-BASED KNOWLEDGE GRAPH COMPLETION
16574LEVERAGING OVERFITTING FOR LOW-COMPLEXITY AND MODALITY-AGNOSTIC JOINT SOURCE-CHANNEL CODING
6137LEVERAGING POINT TRANSFORMER FOR 3D HUMAN MESH RECONSTRUCTION WITH INCOMPLETE POINT CLOUD
14365LEVERAGING PREDICTION ENTROPY FOR AUTOMATIC PROMPT WEIGHTING IN ZERO-SHOT AUDIO-LANGUAGE CLASSIFICATION
14690LEVERAGING SEGMENT-LEVEL SPEECH REPRESENTATIONS FOR LLM-BASED SPEECH RECOGNITION
4574LEVERAGING SEMANTIC-AWARE COLLABORATION BETWEEN PLM AND LLM IN DATA AUGMENTATION FOR ENTITY-RELATIONSHIP EXTRACTION
17256LEVERAGING SPEAKER AND LISTENER PERSONALITIES AND THEIR INTERACTIONS FOR SPEECH EMOTION RECOGNITION
13868LEVERAGING WHISPER EMBEDDINGS FOR AUDIO-BASED LYRICS MATCHING
11538LExTra: Folded Prompt and Split-Role Attention for Target Speaker Extraction
12423LFMIM: Low-Frequency Enhanced Reconstruction for Few-Shot SAR-ATR
15317LGF-Net: A Local-Global Feature Learning Framework for Intrusion Detection
7042LGFNet: Local Correlation and Global Context Fusion for Multivariate Time Series Forecasting
16123LG-STAFNET: EMOTION RECOGNITION IN AI-GENERATED MUSIC VIA LOCAL-GLOBAL SPATIO-TEMPORAL EEG FEATURE FUSION
6754LGTNET: A DUAL-BRANCH MICRO-EXPRESSION RECOGNITION NETWORK WITH GROUPED CHANNEL ATTENTION AND DEFORMABLE WINDOWS
4465LIBEMER: A NOVEL BENCHMARK AND ALGORITHMS LIBRARY FOR EEG-BASED MULTIMODAL EMOTION RECOGNITION
13100LiDAR-based Human Activity Recognition through Laplacian Spectral Analysis
6775Lie Bracket Geometry of Feature Learning in Neural Networks
6093LIFT: A QUALITY-AWARE DATA SELECTION FRAMEWORK FOR LOW-RESOURCE MACHINE TRANSLATION
15773Light Field Image Super-Resolution with Multi-Scale Context Aggregation Mamba
17287LIGHTCSEG: LIGHTWEIGHT CRACK SEGMENTATION NETWORK WITH ADAPTIVE SOBEL AND LOCAL ENHANCEMENT
3213LIGHTOL: A LIGHTWEIGHT ONTOLOGY LEARNING FRAMEWORK WITH LARGE LANGUAGE MODELS
11660LIGHTPONZI: EFFICIENT MULTIMODAL DETECTION OF PONZI SCHEMES IN ETHEREUM SMART CONTRACTS
14053Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation
11244LIGHTWEIGHT AND PERCEPTUALLY-GUIDED VOICE CONVERSION FOR ELECTRO-LARYNGEAL SPEECH
16819LIGHTWEIGHT CHROMATIC-AWARE WHITE BALANCE VIA STATE-SPACE MODELING
17356LIGHTWEIGHT IMAGE SUPER-RESOLUTION VIA EFFICIENT SHIFT CONVOLUTION AND EDGE-ENHANCED ATTENTION
11639LIGHTWEIGHT IMPLICIT NEURAL NETWORK FOR BINAURAL AUDIO SYNTHESIS
14719Lightweight Multitask-Oriented Semantic Communication via Foundational Knowledge Distillation
16664Lightweight Phoneme-Conditioned Bandwidth Extension for Body-Conducted Speech
16943LIGHTWEIGHT RGB-T TRACKING WITH MOBILE VISION TRANSFORMERS
11576LIKQA: Lightweight Image Data Quality Assessment via Iterative Optimization and KAN-Based Models
18924LIMITATIONS OF DATA-DRIVEN SPECTRAL RECONSTRUCTION: AN OPTICS-AWARE ANALYSIS
18957Linear Convergence of Plug-and-Play Algorithms With Kernel Denoisers
3639Linear Cross-Attention Guided Feature Pyramid Networks for Crowd Counting
14067LINGOMETER: ON-DEVICE PERSONAL SPEECH WORD COUNTING SYSTEM
13170LINGUARD: AUTHENTICATING SPEECH RECORDINGS USING SPEECH RECOGNITION AND WATERMARK
16323LIPSAM: LIPSCHITZ-CONTINUOUS AMPLITUDE MODIFIER FOR AUDIO SIGNAL PROCESSING AND ITS APPLICATION TO PLUG-AND-PLAY DEREVERBERATION
11484LipSody: Lip-to-Speech Synthesis with Enhanced Prosody Consistency
7339Lisa: Lightweight Yet Superb Neural Speech Coding
11912LISTEN, BUT DON'T LEAK: SENSITIVE DATA PROTECTION FOR PRIVACY AWARE AUTOMATIC SPEECH RECOGNITION WITH ACOUSTIC TRIGGERS
16979Listening to UAV: 3D Trajectory Estimation via Acoustic Transformer
15022LiteEngine: Lightweight Low-precision Inference Engine for Efficient DNN Inference
4472LIVE4D: DECOUPLED OBJECT AND SCENE MODELING VIA DOUBLE-BRANCH 4D DIFFUSION
7129LLAC: LEARNED LOSSLESS AUDIO CODEC
13723LLM BASED EDGE-ASSISTED UAV INFERENCE AGAINST JAMMING
2042LLM4RIM: Leveraging Large Language Model for Radar-based In-vehicle Monitoring
9671LLM-ADAPTIVE REASONING CLUSTERING FOR AUTOMATED CHAIN-OF-THOUGHT PROMPTING
5369LLMBA: ADAPTING LARGE LANGUAGE MODELS FOR BEHAVIOR ANALYTICS IN ZERO TRUST NETWORKS
15975LLM-BASED POST-ASR ERROR CORRECTION FOR DISORDERED SPEECH
3915LLM-DRIVEN KNOWLEDGE GRAPH ENCODING FOR FINANCIAL RISK
16436LLM-Driven Scenario-Aware Planning for Autonomous Driving
6244LLMEKEREC: EXPLAINABLE RECOMMENDATION VIA KNOWLEDGE GRAPH PATH REASONING WITH LLMS
10181LLM-Guided Hierarchical Reinforcement Learning for Black-Box Adversarial Attacks Against Malware Detectors
13232LLM-GUIDED SAM FOR LEFT VENTRICLE SEGMENTATION AND FUNCTIONAL ANALYSIS IN ECHOCARDIOGRAPHY
1418LLMPopcorn: Exploring LLMs as Assistants for Popular Micro-video Generation
13813LLMs Cannot Reliably Generate Architectural Design Images (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks
12523LLMS DO NOT ALWAYS SAY NO: A PARALLEL MULTI-AGENT JAILBREAK FRAMEWORK VIA PROBABILISTIC VULNERABILITY
17522LMM-ENHANCED MULTIMODAL SEQUENTIAL RECOMMENDATION USING CONSISTENCY-GUIDED HYBRID ATTENTION
4681LMOSA-STEREO: LIGHTWEIGHT STEREO MATCHING WITH MIXTURE-OF-SCENE AND GEOMETRIC SPATIAL ATTENTION
16575LMS-WHISPER: EFFICIENT LIGHTWEIGHT WHISPER FOR MULTI-STUTTER SPEECH CLASSIFICATION
6325Local Rate Analysis of Scaled Gradient Descent for Matrix Completion
8425LOCALEDIT: COMPLEX VIDEO LOCAL EDITING WITH MASK-GUIDED DIFFUSION INPAINTING
11575LOCALIZATION OF A CONSTANT VELOCITY MOVING RIGID BODY IN 2-D BY SUCCESSIVE TOA MEASUREMENTS
12246LOCALIZING SPEECH DEEPFAKES BEYOND TRANSITIONS VIA SEGMENT-AWARE LEARNING
13140LOFEMECHO: RESOURCE-EFFICIENT AND SCALABLE ECHOCARDIOGRAPHIC CARDIAC FUNCTION ASSESSMENT
12343LOG ANOMALY DETECTION VIA HYBRID STATE-SPACE RECURRENT ENCODING AND META-CONTRASTIVE LEARNING
15705Logic-ORiented Retriever Enhancement via Contrastive Learning
1030LOGIX: LOCAL-GLOBAL MIXERS FOR TIME SERIES REPRESENTATION LEARNING
2188LOGPTR: VARIABLE-AWARE LOG PARSING WITH POINTER NETWORK
15585LONG CHAIN-OF-THOUGHT COMPRESSION VIA FINE-GRAINED GROUP POLICY OPTIMIZATION
15526LONGSPEECH: A SCALABLE BENCHMARK FOR TRANSCRIPTION, TRANSLATION AND UNDERSTANDING IN LONG SPEECH
3041LONG-TAILED TIME SERIES CLASSIFICATION WITH NOISY LABELS
4744Look, Listen and Segment: Towards Weakly Supervised Audio-visual Semantic Segmentation
19133LOOKING AROUND FLATLAND: END-TO-END 2D REAL-TIME NLOS IMAGING
12842LOOSE COUPLING OF SPECTRAL AND SPATIAL MODELS FOR MULTI-CHANNEL DIARIZATION AND ENHANCEMENT OF MEETINGS IN DYNAMIC ENVIRONMENTS
15492LORA-ENHANCED DYNAMICS: A STRONG BASELINE FOR TRANSFERABLE PERSON RE-IDENTIFICATION ADVERSARIAL ATTACK
3261LOSS-ONLY KNOWLEDGE TRANSFER: SIMILARITY-GUIDED LOSS WITH LLM PRIORS
9718Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching
13079LOTUSDIS: A THAI FAR-FIELD MEETING CORPUS FOR ROBUST CONVERSATIONAL ASR
5372LOW RANK QUANTIZATION ADAPTATION FOR LARGE LANGUAGE MODEL
17245Low-Bandwidth High-Fidelity Speech Transmission With Generative Latent Joint Source-Channel Coding
2852LOW-COMPUTATION DETECTION METHOD FOR UNKNOWN LFM SIGNALS BELOW NOISE FLOOR
12317LOW-FREQUENCY HARMONIC CONTROL FOR SPEECH INTELLIGIBILITY IN OPEN-EAR HEADPHONES
4942LOW-LATENCY AUDIO FRONT-END REGION-OF-INTEREST BEAMFORMING FOR SMART GLASSES
18064LOW-LEVEL CONTINUAL TEST-TIME ADAPTATION FOR IMAGE RESTORATION
14467LOW-POWER END-TO-END COCHLEAR IMPLANT SPEECH DENOISING WITH SPIKING NEURAL NETWORKS
13013LOW-RANK AND SPARSE MODEL MERGING FOR MULTI-LINGUAL SPEECH RECOGNITION AND TRANSLATION
18904Low-Rank Covariance Matrix Recovery From Rank-One Measurements: An Analytical Solution
7655LOW-RANK SYMMETRIC INFORMATION BOTTLENECKS IN MULTI-VIEW SUBSPACE CLUSTERING
6423LOW-RANK WEIGHTED AMPLITUDE AND PHASE FUSION FOR CSI-FINGERPRINT LOCALIZATION
13346LOW-RESOURCE GUIDANCE FOR CONTROLLABLE LATENT AUDIO DIFFUSION
11195LOW-RESOURCE IN-CAR INFANT DETECTION USING IR-UWB RADAR
2320LOW-RESOURCE SPEECH-BASED EARLY ALZHEIMER’S DETECTION VIA CROSS-LINGUAL AND FEW-SHOT TRANSFER LEARNING
12734LP-CFM: PERCEPTUAL INVARIANCE-AWARE CONDITIONAL FLOW MATCHING FOR SPEECH MODELING
12969LPCVAE: A CONDITIONAL VAE WITH LONG-TERM DEPENDENCY AND PROBABILISTIC TIME-FREQUENCY FUSION FOR TIME SERIES ANOMALY DETECTION
3019LSAFE: Edge-Guided Lightweight Network for Remote Sensing Salient Object Detection via Dynamic Multi-Scale Fusion
4297LSP Framework: A Compensatory Model for Defeating Trigger Reverse Engineering via Label Smoothing Poisoning
14241LTS-GS: LOCAL TEMPORAL SLICES FOR ADAPTIVE DYNAMIC 3D GAUSSIAN SPLATTING
14949LUMIDIFF: LUMINANCE-PRIOR GUIDED DIFFUSION FOR PERCEPTUALLY BALANCED LOW-LIGHT IMAGE SIGNAL RECOVERY
7554LUSEEL: LANGUAGE-QUERIED BINAURAL UNIVERSAL SOUND EVENT EXTRACTION AND LOCALIZATION
7793LVC: A LIGHTWEIGHT COMPRESSION FRAMEWORK FOR ENHANCING VLMS IN LONG VIDEO UNDERSTANDING
5132LVD-GS: GAUSSIAN SPLATTING SLAM FOR DYNAMIC SCENES VIA HIERARCHICAL EXPLICIT-IMPLICIT REPRESENTATION COLLABORATION RENDERING
10158LYAPUNOV-CONSTRAINED INTEGRAL REINFORCEMENT LEARNING FOR STABLE ADMITTANCE CONTROL IN NON-RIGID ENVIRONMENTS
16470LYTIMET: TOWARDS ROBUST AND INTERPRETABLE STATE-VARIABLE DISCOVERY
13462M2DP: A MULTI-SCALE ASSOCIATION LEARNING FRAMEWORK FOR MULTI-CATEGORY DEMAND PREDICTION UNDER PUBLIC HEALTH EMERGENCIES
11390M2EA: MULTI-VAE MANIFOLD ENVELOPE ALIGNMENT FOR CHALLENGING AI-GENERATED IMAGE DETECTION
11685M2FNET: MULTI-LEVEL MODALITY-FUSED NETWORK FOR ROBUST FINGERPRINT AND FINGER VEIN RECOGNITION
5624M²-PROTOLLM: AN LLM-DRIVEN FRAMEWORK FOR WHITELIST RULE EXTRACTION IN ICS PROTOCOLS
9597M2REG: UNSUPERVISED MULTI-SCALE REGISTRATION FOR MULTIMODAL MICROSCOPY
10313M2TRACKFORMER: TRANSFORMER-BASED MMWAVE TRACKING WITH LOST TARGET RE-ACQUISITION CAPABILITY
5345M3FEND: MULTI-MODAL MIXTURE OF EXPERTS WITH ADVERSARIAL GATING FOR MULTI-MODAL FAKE NEWS DETECTION
11000M3GQA: A MULTIMODAL MULTI-HOP AND KNOWLEDGE GRAPH-BASED FRAMEWORK FOR QUESTION ANSWERING
3924M3NET: MULTIVARIATE TIME SERIES CLASSIFICATION VIA MULTISCALE PATCHING AND CHANNEL MIXING
18989M4SER: MULTIMODAL, MULTIREPRESENTATION, MULTITASK, AND MULTISTRATEGY LEARNING FOR SPEECH EMOTION RECOGNITION
11374MAC-SAM: Mask-Aware Category-Guided Segment Anything Model for Interactive Image Segmentation
15346MAG: Multi-Modal Aligned Autoregressive Co-Speech Gesture Generation without Vector Quantization
13458MAGE: A COARSE-TO-FINE SPEECH ENHANCER WITH MASKED GENERATIVE MODEL
12258MAGE-KT: Multi-Agent Graph-Enhanced Knowledge Tracing with Subgraph Retrieval and Asymmetric Fusion
14843MAGF-UIENET: A MULTISCALE ATTENTION GUIDED FUSION NETWORK FOR UNDERWATER IMAGE ENHANCEMENT
4596MAGICITY4D: CONTROLLABLE AND EDITABLE 4D CITY SCENE GENERATION USING MLLM-ENHANCED PROCEDURAL CONTENT GENERATION
3004Magnet Tracking by a Magnetic Sensor Array with Interactive Multiple Model Estimation for Small-Scale Applications
15803MAGNITUDE DIFFERENCE CONDITIONED ALL-IN-ONE IMAGE RESTORATION
17164MaHaWave-Net: A Lightweight Multi-Scale Model for Fine-Grained Medical Image Segmentation
4367MAIA: A MULTIDIMENSIONAL BENCHMARK FOR ASSESSING MEDICAL AI AGENTS
17278MAKE A GAME: A NOVEL PARADIGM FOR INTERACTIVE GAME RENDERING
3956Make Your MoVe: Make Your 3D Contents by Adapting Multi-View Diffusion Models to External Editing
14568MAKING DIALOGUE GROUNDING DATA RICH: A THREE-TIER DATA SYNTHESIS FRAMEWORK FOR GENERALIZED REFERRING EXPRESSION COMPREHENSION
12910MALEFA: MULTI-GRANULARITY LEARNING AND EFFECTIVE FALSE ALARM SUPPRESSION FOR ZERO-SHOT KEYWORD SPOTTING
14779MA-MAE3D: MEMORY AUGMENTED-BASED MAE3D NETWORK FOR POINT CLOUD COMPLETION
13965Mamba-Based Encoder-Decoder for Multi-Scale Feature Fusion in Remote Sensing Object Detection
10752MambaFormer: State-Space Augmented Self-Attention with Down–Up Sampling for Monaural Speech Enhancement
1516MambaHDR: Ghost-free High Dynamic Range Imaging with State-Space Model
7335MAMBA-VP: MULTIMODAL VIEWPORT PREDICTION VIA TRAJECTORY-FILTERED TEMPORAL MULTI-SCALE AND VISUAL SPATIOTEMPORAL SCANNING
13612MANGAVOX: DATASET OF ACTED VOICES ALIGNED WITH MANGA IMAGES TOWARDS COMPUTER UNDERSTANDING OF AUDIO COMICS
16116Manifold-Optimization-Based 3D Sound Source Mapping with Unknown Camera-Microphone Array Relative Pose
17881ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance
16139MAPD-MAMBA: MODALITY-ADAPTIVE PERCEPTION-DRIVEN MAMBA FUSION NETWORK
6419MAPEX: A MULTI-AGENT PIPELINE FOR KEYPHRASE EXTRACTION
10086MAPROUTE-BENCH: EVALUATING SPATIAL REASONING ON TOP-VIEW MAPS IN VISION-LANGUAGE MODELS
16497MAR: EFFICIENT LARGE LANGUAGE MODELS VIA MODULE-AWARE ARCHITECTURE REFINEMENT
4789MARCO-VOICE: A UNIFIED FRAMEWORK FOR EXPRESSIVE SPEECH SYNTHESIS WITH VOICE CLONING
16252MARE: Multi-Agent Role Embedding for Role-Consistent Generation in Multi-Agent Systems
2588MARITIME INFRARED SMALL-TARGET DETECTION VIA LOCAL CONTRAST MEASUREMENT WITH A NOVEL WINDOW AND CENTRAL DIRECTIONAL CONSISTENCY
13282Marking the Margin: Robust DNN watermarking against Removal Attacks via sculpting decision boundaries
10616MARKSWEEP: A NO-BOX REMOVAL ATTACK ON AI-GENERATED IMAGE WATERMARKING VIA NOISE INTENSIFICATION AND FREQUENCY-AWARE DENOISING
4780MASA: Query-Free Black-Box Adversarial Attack on Text-to-Image Generation via Multi-modal Adaptive Semantic Optimization
13929MASDROID: A Multi-Agent System for Enhancing the Analysis of Android Malware
14850MaskDiff-Traj: A UNIFIED TRAJECTORY IMPUTATION AND GENERATION FRAMEWORK VIA PATTERN-GUIDED MASKED DIFFUSION
13935MASKED PROJECTION MODELLING FOR SPARSE-VIEW CRYO-EM RECONSTRUCTION
4556MASK-FREE THANGKA RESTORATION VIA RETRIEVAL-GUIDED DIFFUSION WITH SEMANTIC AND STRUCTURAL ALIGNMENT
15905Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?
11353MASK-GUIDED BACKTRACK DECODING: ENABLING SELF-CORRECTION IN LLM REASONING
2952MASKVCT: MASKED VOICE CODEC TRANSFORMER FOR ZERO-SHOT VOICE CONVERSION WITH INCREASED CONTROLLABILITY VIA MULTIPLE GUIDANCES
16095MASSIVE MIMO WITH FEWER RF CHAINS USING SIGMA-DELTA RIS
11496MASTER-ASSISTED DISTRIBUTED UPLINK OPERATION FOR CELL-FREE MASSIVE MIMO NETWORKS
2440MATCHGAUSSIAN: ADAPTIVE DENSIFICATION WITH MATCHING PRIORS FOR GENERALIZABLE GAUSSIAN SPLATTING
16068MATCHING REVERBERANT SPEECH THROUGH LEARNED ACOUSTIC EMBEDDINGS AND FEEDBACK DELAY NETWORKS
3974MATCHMIX: PHASE-MATCHED FRAME MIXING FOR TEMPORALLY CONSISTENT MOTION AUGMENTATION
4381MATE: MATRYOSHKA AUDIO-TEXT EMBEDDINGS FOR OPEN-VOCABULARY KEYWORD SPOTTING
12558MATHHALU: A BENCHMARK FOR MATHEMATICAL REASONING PROCESS HALLUCINATION DETECTION IN LARGE REASONING MODELS
1959MATHPHYS-GUIDED COARSE-TO-FINE ANOMALY SYNTHESIS WITH SQE-DRIVEN BI-LEVEL OPTIMIZATION FOR ANOMALY DETECTION
12121MATRIX-STRUCTURED HIERARCHICAL CONVOLUTIONAL MODELING FOR PRONUNCIATION ASSESSMENT AND MISPRONUNCIATION DETECTION
11774MATTER: Multiscale Attention for Registration Error Regression
16791MAVNET: DEEP LEARNING-BASED FMCW RADAR FRAMEWORK FOR MOTION-RESILIENT VITAL SIGN MONITORING ON PYNQ SOC
8322MAXIMIZING SECURE ENERGY EFFICIENCY IN UAV-ASSISTED BACKSCATTERING NETWORKS USING DEEP REINFORCEMENT LEARNING
17038MAXIMUM ENTROPY-BASED EFFICIENT FUZZY GRAPH CLUSTERING
17247Maximum Likelihood Measurement Noise Estimation for Block-Time Domain Kalman Filters
12944MCF: Text LLMs for Multimodal Emotional Causality
3481MCF-Net: A Mamba-based Efficient Network for Radar Jamming Recognition
4315M-CGL: MAMBA-ENHANCED CONCEPT GUIDED LEARNING FOR FINE-GRAINED IMAGE CLASSIFICATION
13051MCI-OTFusion: A multimodal model for MCI detection and cognitive score prediction
12114MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism
12610MCMC method with integrated adiabatic modes model for range-dependent matched-field geoacoustic inversion
13260MC-MRX: REFERENCE- AND MIDI-GUIDED MUSIC SOURCE EXTRACTION WITH CONTRASTIVE LEARNING
16434mCoT-VLA: Towards Robust Vision–Language–Action Models via Multimodal Chain-of-Thought
10791MCPO: DYNAMIC MASKING AND MULTI-COMPARISON POLICY OPTIMIZATION ALGORITHM FOR LLM REINFORCEMENT LEARNING
11377MDBoost: A Multi-Dimensional Reweighting Framework for Robust Gradient Boosting
15500MDFDet: A Multi-modal Dynamic Fusion Algorithm for RGB-Infrared Object Detection
3300MDPO: MULTI-DIMENSIONAL LABEL ENHANCED DIRECT PREFERENCE OPTIMIZATION FOR EFFICIENT MULTIMODAL LLM FINE-TUNING
2598MDSF-DET: MODALITY DECOUPLING AND SYNERGISTIC FUSION DETECTOR
5752MEAN-FIELD-ENABLED ROBUST ANTI-JAMMING TRANSMISSION FOR LARGE-SCALE AERIAL RIS NETWORKS
4592MEANFLOW-ACCELERATED MULTIMODAL VIDEO-TO-AUDIO SYNTHESIS VIA ONE-STEP GENERATION
2705MEANFLOWSE: ONE-STEP GENERATIVE SPEECH ENHANCEMENT VIA CONDITIONAL MEAN FLOW
4902MEANSE: EFFICIENT GENERATIVE SPEECH ENHANCEMENT WITH MEAN FLOWS
1704MEANVC: LIGHTWEIGHT AND STREAMING ZERO-SHOT VOICE CONVERSION VIA MEAN FLOWS
11590MEANVOICEFLOW: ONE-STEP NONPARALLEL VOICE CONVERSION WITH MEAN FLOWS
6236MEASURE-TRANSFORMED PRINCIPAL COMPONENT ANALYSIS
12453MEASURING AND REDUCING INTRINSIC BOUNDARY NOISE IN TEMPORAL ACTION SEGMENTATION
2081MEASURING PROSODY DIVERSITY IN ZERO-SHOT TTS: A NEW METRIC, BENCHMARKING, AND EXPLORATION
15472MEBM: Exploring the Synergy of Mixture of Experts in Background Matting
3638MECAP-R1: EMOTION-AWARE POLICY WITH REINFORCEMENT LEARNING FOR MULTIMODAL EMOTION CAPTIONING
4543Medical Federated Learning under Long-Tailed and Non-IID Distributions
10355MedSpeak: A Knowledge Graph-Aided ASR Error Correction Framework for Spoken Medical QA
16062MEGATEMPQA: A MILLION-SCALE TEMPORAL QUESTION-ANSWER DATASET FOR REDUCING LLM HALLUCINATIONS
12826MEIE: A PROMPT-DRIVEN FRAMEWORK FOR MIXED-EXPOSURE IMAGE ENHANCEMENT WITH ADAPTIVE 3D LUTS
11483MELA-TTS: JOINT TRANSFORMER-DIFFUSION MODEL WITH REPRESENTATION ALIGNMENT FOR SPEECH SYNTHESIS
10888MELT: IMPROVE COMPOSED IMAGE RETRIEVAL VIA THE MODIFICATION FREQUENTATION-RARITY BALANCE NETWORK
10881MEM4TEETH: Memory-Guided Point Cloud Completion for Dental Reconstruction
17815Membership Inference Attack Against Music Diffusion Models via Generative Manifold Perturbation
2382MemFormer: Memory-enhanced Transformer with Multi-task Learning for Video Anomaly Detection
10711MEMORY FOOTPRINT IMAGES: A U-NET APPROACH FOR ADVANCED CACHE PREFETCHING
9761MEMORY-EVOLUTION AND REFLECTION-AUGMENTED AGENTS
17147MEMORYPROMPT: MEMORY-AUGMENTED MULTI-LAYER PROMPTING FOR VISION-LANGUAGE MODELS
15361Meow: End-to-End Outline Writing for Automatic Academic Survey
10858MERLINet: Multi-Exposure Reflection Elimination Network for Real-World Scenes
4164MESHRF: RESIDUAL FUSION OF VERTICES, EDGES, AND FACES FOR MESH UNDERSTANDING
1513MESSAGE PASSING-BASED PARALLEL MULTI-TARGET JOINT DETECTION AND ESTIMATION IN DISTRIBUTED PASSIVE MIMO RADAR
9693Meta-Offline and Distributional Multi-Agent RL for Risk-Aware Decision-Making
12811META-REINFORCEMENT LEARNING WITH CONTEXTUAL BIAS REDUCTION
14624MetaToolAgent: Towards Generalizable Tool Usage in LLMs through Meta-Learning
5312METRA: Robust Encrypted Traffic Detection Against Adversarial Attacks via Multi-Task Learning and Label Denoising
16749MEVAR: MOBILITY-ENHANCED VEHICLE TRAJECTORY RECONSTRUCTION FROM CAMERA SENSING NETWORKS
9025MFA-Align: Aligning by Disagreeing for Efficient and Low-Cost Personalized Alignment of Aesthetic VLLMs
5778MFF-Net: Image Manipulation Localization Method Based on Multi-scale Feature Fusion Network
5597MFF-RVRDI: MULTIMODAL FUSION FRAMEWORK FOR ROBUST VIDEO RECORDING DEVICE IDENTIFICATION
5062MFM-DETR: MISALIGNMENT-FREE MULTIMODAL DETR FOR MARITIME OBJECT DETECTION
11393MGHFED: ENHANCING HETEROGENEOUS SUBGRAPH FEDERATED LEARNING THROUGH ADVERSARIAL META-PATH GENERATION
10780MH-A3: Metropolis-Hastings Anomaly-Aware Augmentation for Contrastive Graph Anomaly Detection
11106MHPTRACK: EFFICIENT MULTIMODAL HISTORICAL PROMPTING FOR AERIAL TARGET TRACKING
18888Micro-Image Domain View Synthesizer for Free Navigation With Focused Plenoptic Cameras
7932MICROPHONE-LESS MEASUREMENT OF THREE-DIMENSIONAL RADIATING IMPULSE RESPONSE OF SOUND SOURCE USING SPHERICAL HARMONIC-DOMAIN ACOUSTO-OPTIC TOMOGRAPHY
15035MIDAS: A Dynamic Cross-GPU KV Cache Offloading Framework For LLM On GPU Cluster Systems
4159MIDI-LLaMA: An Instruction-Following Multimodal LLM for Symbolic Music Understanding
1004MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large Audio-Language Model
17732MIHT-NET: A DEEP-UNROLLED FRAMEWORK FOR SPARSE SIGNAL RECOVERY
6033MILORE-SSL: SCALING MULTILINGUAL CAPABILITIES IN SELF-SUPERVISED MODELS WITHOUT FORGETTING
16770MIMO Array Calibration in Non-stationary Channels with Residual Surfaces and Slepian Spherical Harmonics
10219MIND THE GAP: DATA REWRITING FOR STABLE OFF-POLICY SUPERVISED FINE-TUNING
10644MIND THE NOISE, ALIGN THE FINE: CONFIDENCE-AWARE MASKED IMAGE MODELING FOR TEXT-BASED PERSON RE-IDENTIFICATION
13993MIND THE SHIFT: USING DELTA SSL EMBEDDINGS TO ENHANCE CHILD ASR
14275MIND YOUR [m]S, CROSS YOUR [t]S: A LARGE-SCALE PHONEMIC ANALYSIS OF SPEECH REPRODUCTION IN MODERN SPEECH GENERATORS
1402MINIMIZATION OF NONSMOOTH WEAKLY CONVEX FUNCTION OVER PROX-REGULAR SET FOR ROBUST LOW-RANK MATRIX RECOVERY
14124MINIMIZING ADC PRECISION FOR ANALOG IN-MEMORY COMPUTING
14826MI-PRUN: OPTIMIZE LARGE LANGUAGE MODEL PRUNING VIA MUTUAL INFORMATION
9648MIRAGE: NOISE-AWARE BAYESIAN CALIBRATION WITH MUTUAL INFORMATION FOR RELIABLE RAG
12414MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning
15237MIRRORTALK: FORGING PERSONALIZED AVATARS VIA DISENTANGLED STYLE AND HIERARCHICAL MOTION CONTROL
12465MISA: MULTI-STAGE INTERACTIVE SELF-ATTENTION FOR CONSISTENT SUBJECT-DRIVEN TEXT-TO-IMAGE GENERATION
17688Misclassification Rates and Privacy-Utility Trade-offs in Graph Convolutional Networks via Subsampling Stability
10201MISPRONUNCIATION DETECTION AND DIAGNOSIS WITHOUT MODEL TRAINING: A RETRIEVAL-BASED APPROACH
6261MISSPECIFIED CRAMÉR-RAO BOUNDS ON SNR ESTIMATION
14366MIST: Micro-Image Shuffling Tool for Codec-Agnostic Plenoptic Video Compression
12267MISTA: Compact Multi-Identity Structure-aware Tensorized Avatars
16462MITA: A HIERARCHICAL MULTI-AGENT COLLABORATION FRAMEWORK WITH MEMORY-INTEGRATED AND TASK ALLOCATION
8026MITIGATING ATTENTION SINKS AND MASSIVE ACTIVATIONS IN AUDIO-VISUAL SPEECH RECOGNITION WITH LLMS
14219MITIGATING DATA REPLICATION IN TEXT-TO-AUDIO GENERATIVE DIFFUSION MODELS THROUGH ANTI-MEMORIZATION GUIDANCE
13657MITIGATING DECEPTIVE KNOWLEDGE EDITING IN LLMS VIA DIFFUSION SYNTHESIS
10249MITIGATING DOMAIN SHIFT IN ULTRASONIC WAVEFIELD PATTERN ANALYSIS THROUGH TEST-TIME TRAINING
17709Mitigating entity bias in Relation Extraction with Pair-Training
3893MITIGATING FALSE ALARMS IN OPEN-SET SPEAKER IDENTIFICATION WITH A DECOUPLED FRAMEWORK
2402MITIGATING HALLUCINATION IN FINANCIAL RETRIEVAL-AUGMENTED GENERATION VIA FINE-GRAINED KNOWLEDGE VERIFICATION
17609Mitigating Hallucinations in Large Language Models Via Decoder Layer Skipping
15511MITIGATING INTRA-SPEAKER VARIABILITY IN DIARIZATION WITH STYLE-CONTROLLABLE SPEECH AUGMENTATION
6133MITIGATING LANGUAGE PRIOR-INDUCED HALLUCINATIONS VIA BI-LEVEL CONTRASTIVE DECODING
16454MITIGATING OBJECT AND RELATIONSHIP HALLUCINATION IN LARGE VISION LANGUAGE MODEL WITH MULTI-AGENT GUIDANCE
13976MIX2MORPH: LEARNING SOUND MORPHING FROM NOISY MIXES
4206MIX-CLAP: ADAPTIVE FUSION OF KNOWLEDGE-DISTILLED AUDIO EMBEDDINGS FOR NOISE-AWARE AUDIO-LANGUAGE MODELS
19124Mixed-gradients Distributed Filtered Reference Least Mean Square Algorithm -- A Robust Distributed Multichannel Active Noise Control Algorithm
1837MixGAN-based Non-blind Bandwidth Extension for Audio Codec
14889Mix-Persona Comment Generation for LLM Fine-Tuning in Multimodal Crisis Post Classification
2435MixStyle-Augmented Meta-Learning for Cross-Domain Infrared-Visible Image Fusion
6097MIXTURE OF EXPERTS FOR RECOGNIZING DEPRESSION FROM INTERVIEW AND READING TASKS
9545Mixture to Beamformed Mixture: Leveraging Beamformed Mixture as Weak-Supervision for Speech Enhancement and Noise-Robust ASR
5140Mixture-of-Experts Based Soft-Label Learning for Multi-Label Speech Emotion Recognition
9973Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of moving talkers
5686MIXTURES OF LIGHTWEIGHT ARTICULATORY EXPERTS FOR MULTILINGUAL ASR
14710MLLM-EMPOWERED ACTIVE LEARNING WITH GENERATED ATTRIBUTES FOR MICROSCOPIC ALGAE IMAGE CLASSIFICATION
12520MMAUDIOSEP: TAMING VIDEO-TO-AUDIO GENERATIVE MODEL TOWARDS VIDEO/TEXT-QUERIED SOUND SEPARATION
11596MMC: Min-Max Calibration for Test-Time Prompt Tuning in Vision-Language Models
6208MMFast: Rethinking Vision-Language Interaction in Efficient MLLMs
1202MMIndoor3D: Multi-View Multimodal 3D Indoor Scene Generation with Material Information
16236MMNAD: A GENERALIZED MULTI-SCENARIO ATTACK DETECTION METHOD FOR SOFTWARE DEFINED NETWORKING
13871MM-NO: Learning Physical Operators from Heterogeneous Data via Cross-Modal Attention Fusion
3278mmSRFormer: Efficient Transformer for Sparse mmWave Radar Point Cloud Super-Resolution
11402mmWave-Diffusion: A Novel Framework for Respiration Sensing Using Observation-Anchored Conditional Diffusion Model
14985MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
4244MÖBIUS FOURIER BASIS FOR DAGS WITH NONNEGATIVE EDGE WEIGHTS
5353MOC: Mamba-based Multi-Scale One-Class Time-Series Anomaly Detection
10523Modality-Aware Token Filtering and Common Feature Enhancement Network for Multi-modal Vehicle Re-Identification
1885Modality-Decoupled RGB-Thermal Object Detector via Query Fusion
7008MODEL EQUALITY TESTING OF BLACK-BOX LLM APIS VIA PREFIX TREE STATISTICS
11201MODEL SHALL KNOW IT: BACKDOOR ATTACKS ON IMAGE CAPTIONING MODELS BY TEXTURAL REPRESENTATIONS
15632MODELING AND INTEGRATION OF DYNAMIC METASURFACE ANTENNAS WITH CLUSTERED CHANNEL MODELS
17348MODELING BOTH INTRA- AND INTER-UTTERANCE VARIABILITY FOR CONVERSATIONAL EMOTION RECOGNITION
14659Modeling Inter-Segment Relationships in Speech for Dementia Detection with Audio Spectrogram Transformers and Graph Attention Networks
12777MODELING STRATEGIES FOR SPEECH ENHANCEMENT IN THE LATENT SPACE OF A NEURAL AUDIO CODEC
5207Modelling of a Marked Hawkes Process
12813MODERN STRUCTURE-AWARE SIMPLICIAL SPATIOTEMPORAL NEURAL NETWORK
10183MODULARITY-FREE CONFLICT-AVERSE TRAINING FOR GENERALIZED PINNS
3314MODWKAN: HARNESSING MAXIMAL OVERLAP DISCRETE WAVELET TRANSFORM AND KAN FOR TIME SERIES FORECASTING
5789MoE-AMC: Enhancing Automatic Modulation Classification Using Mixture-of-Experts
11885MO-GRPO-MED: A MULTI-OBJECTIVE FRAMEWORK FOR GENERATING SAFE AND HIGH-QUALITY DISCHARGE INSTRUCTIONS
14410Moment-based Posterior Sampling for Multi-reference Alignment
13762MOMENTS MATTER: POSTERIOR RECOVERY IN POISSON DENOISING VIA LOG-NETWORKS
18561Mongoose: Do We Need Scanner for Vision Mamba?
9413MONOCULAR 3D FACE RECONSTRUCTION VIA COARSE-TO-FINE LANDMARK REPRESENTATION
19061Monotone Lipschitz-Gradient Denoiser: Explainability of Operator Regularization Approaches Free From Lipschitz Constant Control
13314Moral5D: A Five-Dimensional Human-centered Method for Evaluating and Enhancing LLM Moral Reasoning
14481MORE THAN A SHORTCUT: A HYPERBOLIC APPROACH TO EARLY-EXIT NETWORKS
6110MORE: MULTIMODAL RELATIONSHIP ENHANCEMENT FOR UNBIASED SCENE GRAPH GENERATION
13583MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
14529MOSA: MOTION-GUIDED SEMANTIC ALIGNMENT FOR DYNAMIC SCENE GRAPH GENERATION
12624MotionBeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding
7972MOTIONFLOW: TEXT-DRIVEN EMOTION-CONTROLLABLE HUMAN MOTION GENERATION VIA CONDITIONAL FLOW MATCHING
16880MotionFusion: Fusing Motion and Saliency for Fast Video Large Language Model Inference
17163MOTION-GUIDED SEMANTIC ALIGNMENT WITH NEGATIVE PROMPTS FOR ZERO-SHOT VIDEO ACTION RECOGNITION
1468MotionPLLM: LLM-Based Generator for Part-Level Controllable Human Motion in Quantized Latent Space
10017Mouthing-Enhanced Multimodal Hierarchical Contrastive Learning for Gloss-Free Sign Language Translation
14265MOVi: Training-free Text-conditioned Multi-Object Video Generation
10264MPDA: A MULTI-GRANULARITY PERTURBATION AND DUAL-FEATURE ANALYSIS FRAMEWORK FOR AI-GENERATED TEXT DETECTION
18870MPHR: A Robust Algorithm for Estimating Nodal Head in Water Networks
7667MPL-MOE: MULTI-MODAL PROMPT LEARNING WITH MIXTURE OF EXPERTS FOR MULTIVARIATE TIME SERIES FORECASTING
4883MP-MVSNET: MULTI-VIEW STEREO NETWORK GUIDED BY BOTH MONOCULAR FEATURE AND GEOMETRIC PRIORS
12939MPR: Memory Perturbation Regularization for Controllable Stability-Plasticity Balancing in Continual Trajectory Prediction
1205MRFHAR: WAVELET-BASED CONTRASTIVE LEARNING FOR HUMAN ACTIVITY RECOGNITION BY FUSING RFID AND WIFI SIGNALS
14146MR-FLOWDPO: MULTI-REWARD DIRECT PREFERENCE OPTIMIZATION FOR FLOW-MATCHING TEXT-TO-MUSIC GENERATION
8189MSAT: MULTI-SCALE SEMANTIC-ALIGNED TRANSFORMER FOR MULTI-LABEL IMAGE CLASSIFICATION
11803MSBENCH: CAN SPEECH LANGUAGE MODELS GENERATE MULTI-SPEAKER DIALOGUES IN ONE PASS?
5330MSCT: DIFFERENTIAL CROSS-MODAL ATTENTION FOR DEEPFAKE DETECTION
4539MSF-Mamba: Multi-Scale Frequency Mamba for Long-Term Time Series Forecasting
17622MSF-SER: ENRICHING ACOUSTIC MODELING WITH MULTI-GRANULARITY SEMANTICS FOR SPEECH EMOTION RECOGNITION
12084MSGCoOp: Multiple Semantic-Guided Context Optimization for Few-Shot Learning
5577MSGFF: TRANSPARENT WATERMARK DETECTION AND REMOVAL VIA MULTI-SCALE GRADIENT FEATURE FUSION
12668MSNAV: ZERO-SHOT VISION-AND-LANGUAGE NAVIGATION WITH DYNAMIC MEMORY AND LLM SPATIAL REASONING
6281MSP-REID: HAIRSTYLE-ROBUST CLOTH-CHANGING PERSON RE-IDENTIFICATION
13792MSTAR: CROSS-MODAL FUSION VIA MULTI-SOURCE REWARD MECHANISM FOR SPATIO-TEMPORAL AWARE REASONING
15200MSVS: MULTI-SHELL VIEWPOINT SAMPLING FOR COMPREHENSIVE EVALUATION OF 3D WATERMARKING
1672MTAD: A Three-Stage Framework for Machine Translation Agents Distillation
15282MTEDS: MEMORY AND TIME EFFICIENT SPECULATIVE DECODING WITH DYNAMIC SPARSITY AND BYPASS SCHEDULING
12280MT-HPDE: MULTIMODAL VISION TRANSFORMER FOR HAND POINT DIRECTION ESTIMATION USING ZERO-SHOT DIFFUSION SEGMENTATION
13859MT-HUBERT: SELF-SUPERVISED MIX-TRAINING FOR FEW-SHOT KEYWORD SPOTTING IN MIXED SPEECH
10618MTP-S2UT: ENHANCING SPEECH-TO-SPEECH TRANSLATION QUALITY WITH MULTI-TOKEN PREDICTION
4897MTRP: Diversely Enhancing Multi-Turn Dialogue and Role-Playing Abilities of Large Language Models
12595MTS-CR: Contrastive Representation Learning for Real-Time QoS Degradation Detection in Media Cloud Shared Instances
9796MTSearch-R1: Reinforcement Learning for Flexible Multi-Tool Search with Large Language Models
17119MUCO: MULTI-VIEW PATTERN REPRESENTATION LEARNING FOR SUBGRAPH COUNTING
5341MUGSQA: NOVEL MULTI-UNCERTAINTY-BASED GAUSSIAN SPLATTING QUALITY ASSESSMENT METHOD, DATASET, AND BENCHMARKS
18038MULTI LEVEL PATCH-WISE CONTRASTIVE SELF-SUPERVISED LEARNING WITH DYNAMIC SCALE-AWARE ATTENTION FOR AIRPORT OBJECT DETECTION
13823Multi Stage Training With Dynamic Data Balancing For Multilingual Speech Recognition and Translation
5574MULTI-AGENT BRAINSTORMING FOR INTERPRETING AND MITIGATING HALLUCINATION IN MULTIMODAL-LLM
10061Multi-Agent Deep Reinforcement Learning-Based IoV Secure Data Transmission
2477MULTI-AGENT DIAGNOSTIC COLLABORATION AND SEGMENTATION-AWARE RESIDUAL DECODING FOR HALLUCINATION-RESISTANT MEDICAL VQA
4309Multi-Agent Honeypot-Based Request-Response Context Dataset for Improved SQL Injection Detection Performance
3424MULTI-ANGLE VISUAL INFORMATION REPRESENTATION AND PROGRESSIVE ALIGNMENT NETWORK FOR JOINT MULTIMODAL ENTITY-RELATION EXTRACTION
15157Multiantenna Channel Map Prediction With Missing Location Information Using Contrastive Learning and Graph Neural Networks
18992MULTI-ATTRIBUTE GRAPH LEARNING FOR GEOSCIENCE APPLICATIONS
14697MULTI-BAND FREQUENCY PROMPT TUNING FOR SOURCE-FREE CROSS-DOMAIN FEW-SHOT LEARNING
14086Multibeam analog beamformer design for monostatic ISAC under Self-Interference
14484MULTI-BLOCK ALTERNATING GRADIENT DESCENT AND MINIMIZATION FOR L+S COLUMN-WISE COMPRESSED SENSING
12000MULTI-BRANCH COLLABORATIVE FEATURE PYRAMID NETWORK FOR SHORT-SPEECH SPEAKER VERIFICATION
3702MULTI-CHANNEL SPEECH ENHANCEMENT FOR COCKTAIL PARTY SPEECH EMOTION RECOGNITION
19102Multi-Channel Speech Enhancement Guided by Learning-based A Posteriori Speech Presence Probability Estimation
5666MULTI-COURSE INTEGRATION FRAMEWORK BASED ON SUBJECT KNOWLEDGE GRAPHS
13878Multi-Dictionary Learning for Low Rank Sparse Coding
17980MULTI-DIFFERENTIAL FEATURE INTERACTION NETWORK FOR IMAGE CHANGE CAPTIONING TOWARDS LOW-LIGHT REMOTE SENSING SCENARIOS
18887Multidimensional Polynomial Phase Estimation
6694Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning
13384MULTI-DOMAIN SHORT VIDEO ANOMALY NEWS DETECTION
19119MULTIFACETED PRONUNCIATION FEEDBACK MODEL WITH INTERACTIVE HIERARCHICAL NEURAL MODELING
1678MULTI-GATE CONVOLUTIONAL NEURAL NETWORK FOR EFFICIENT SINGLE IMAGE SUPER-RESOLUTION
16725MULTI-GRANULARITY ATTRIBUTE PROMPT LEARNING FOR CLOTH-CHANGING PERSON RE-IDENTIFICATION
3225MULTI-GRANULARITY SCORE-BASED GENERATIVE FRAMEWORK ENABLES EFFICIENT INVERSE DESIGN OF COMPLEX ORGANICS
16540MULTI-HOP DEEP JOINT SOURCE-CHANNEL CODING WITH DEEP HASH DISTILLATION FOR SEMANTICALLY ALIGNED IMAGE RETRIEVAL
16356Multi-layer attentive probing improves transfer of audio representations for bioacoustics
17272MULTILINGUAL SUPERVISED PRETRAINING WITH LM-ASSISTED DECODING FOR VISUAL SPEECH RECOGNITION
7884MULTI-MODAL BASED POINT CLOUD GEOMETRY COMPRESSION
15226MULTIMODAL CO-TRAINING WITH SUBTRACTIVE UNLABELED-BENEFIT BOUNDS
6654MULTIMODAL DEEP LEARNING METHOD FOR REAL-TIME SPATIAL ROOM IMPULSE RESPONSE COMPUTING
11351MULTI-MODAL FAKE NEWS DETECTION VIA INTRA-CALIBRATED CROSS-MODAL FUSION AND MODALITY-WISE ATTENTION AGGREGATION
11980MULTIMODAL FUSION-BASED IPCLIP NETWORK FOR MIXED REALITY SURGICAL ASSISTANCE
7072MULTI-MODAL HIERARCHICAL FUSION WITH CROSS-AGENT FOR RGB-D SALIENT OBJECT DETECTION
11991Multimodal LLMs as Expert Speech Annotators: Acoustic Macro-Descriptors for Parkinson's Detection
17198MULTIMODAL MULTI-AGENT EMPOWERED LEGAL JUDGMENT PREDICTION
3880Multimodal Privacy-Preserving Entity Resolution with Fully Homomorphic Encryption
14425MULTIMODAL ROOM IMPULSE RESPONSE GENERATION THROUGH LATENT RECTIFIED FLOW MATCHING
1950MULTIMODAL SELF-ATTENTION NETWORK WITH TEMPORAL ALIGNMENT FOR AUDIO-VISUAL EMOTION RECOGNITION
10228Multimodal Sensing-Aided Beamforming Optimization for OFDM Systems
9687MULTIMODAL SPEAKER-LISTENER COUPLING DYNAMICS OF SPEECH, PHYSIOLOGY, AND EMOTIONS USING HRV AND ENTROPY ANALYSIS
13004MULTIMODAL TRANSFORMER WITH MULTIPERSPECTIVE TRAINING FOR PREDICTING SELF-EXPRESSION SKILLS FROM VIDEO INTERVIEW
16688MULTIMODAL VARIATIONAL GRAPH NETWORK FOR MULTIMODAL SENTIMENT ANALYSIS
10688Multimodal-Prior-Guided Importance Sampling for Hierarchical Gaussian Splatting in Sparse-View Novel View Synthesis
12829MULTI-OS: MULTIMODAL OOD SYNTHESIS ENHANCES OUT-OF-DISTRIBUTION DETECTION FOR VISION-LANGUAGE MODELS
5320MULTI-PATCH HIERARCHICAL ADAPTIVE STATE SPACE MODEL FOR REMOTE SENSING IMAGE DEHAZING
4655MULTI-PHYSICS: A COMPREHENSIVE BENCHMARK FOR MULTIMODAL LLMS REASONING ON CHINESE MULTI-SUBJECT PHYSICS PROBLEMS
5802Multi-Polynomial Phase Signal Parameter Estimation using Time-Frequency Decomposition and Time-Series Representations
9980Multi-Resolution Spectrograms Detection of LPI RADAR with Time-Frequency Attention augmented YOLO
8713MULTI-SCALE ADAPTIVE NEIGHBORHOOD AWARENESS TRANSFORMER FOR GRAPH FRAUD DETECTION
17630MULTI-SCALE AND MULTI-MODAL SELECTIVE FUSION FOR RGB-D VIDEO SALIENT OBJECT DETECTION
2167MULTI-SCALE FREQUENCY PERCEPTION ENABLED SELECTIVE STATE-SPACE AND FEATURE PYRAMID COLLABORATIVE METHOD FOR SMALL OBJECT DETECTION
16616Multi-scale Generative Modeling for Fast Sampling
2254MULTI-SCALE POSITIVITY GRAPH TRANSFORMER FOR FINE-GRAINED IMAGE RECOGNITION
1813MULTI-SCALE STATE SPACE MODELING FOR CROSS-MODAL INFRARED AND VISIBLE IMAGE FUSION
4233MULTI-SCALE TASK-AWARE EEG REPRESENTATION LEARNING FOR COGNITIVE STATE RECOGNITION
18153Multi-Source Domain Generalized Person Re-Identification with DualStyle Augmentation and Dynamic Memory classifier
16094MULTISOURCE LOCALIZATION USING MULTIMARGINAL OPTIMAL TRANSPORT
5106Multi-Source Transfer Learning and Field Extraction for Cross-Domain Protocol Reverse Engineering
14354MULTI-SPEAKER DOA ESTIMATION IN BINAURAL HEARING AIDS USING DEEP LEARNING AND SPEAKER COUNT FUSION
13620Multi-Stage Spatial Imagination and Fusion for Immersive Visual Text-to-Speech
1655MULTISYNERGY ATTACK: MULTIMODAL SYNERGISTIC ADVERSARIAL ATTACK FOR DEPTH ESTIMATION
12721MULTI-TASK LEARNING FOR SPEECH QUALITY ASSESSMENT USING ASR-DERIVED ENTROPY FEATURES
15167MULTITASK LEARNING WITH LEARNED TASK RELATIONSHIPS
11888MULTI-TASK TRANSFORMER FOR EXPLAINABLE SPEECH DEEPFAKE DETECTION VIA FORMANT MODELING
13651MULTI-TURN PHYSICS-INFORMED VISION-LANGUAGE MODEL FOR PHYSICS-GROUNDED ANOMALY DETECTION
11203Multi-User Channel Estimation with One-Bit ADCs: A Semi-Blind Approach
14123Multiverse Kernel Adaptive Filtering
4975MULTI-VIEW CROWD COUNTING WITH SELF-SUPERVISED LEARNING
10900MULTI-VIEW FREQUENCY ALIGNMENT AND STATE SPACE PARAMETER FUSION FOR LIGHTWEIGHT CAMOUFLAGED OBJECT DETECTION
19145MULTIVIEW GRAPH LEARNING WITH CONSENSUS GRAPH
13904MULTI-VIEW HIERARCHICAL HYPERGRAPH NEURAL NETWORK FOR AUTOMATIC STUTTERING DETECTION
3902MULTI-VIEW HYPERGRAPH-BASED CONTRASTIVE LEARNING FOR KNOWLEDGE TRACING
10054Multiview Progress Prediction of Robot Activities
17628MULTI-VIEW SPECTRAL CLUSTERING WITH ADAPTIVE REGRESSION
11997MUSETOK: SYMBOLIC MUSIC TOKENIZATION FOR GENERATION AND SEMANTIC UNDERSTANDING
14249MUSHRA–1S: A SCALABLE AND SENSITIVE TEST APPROACH FOR EVALUATING TOP-TIER SPEECH PROCESSING SYSTEMS
5471MusicDETR: A Position-aware Spectral Note Detection Model for Singing Transcription
16106MUSIC-GUIDED POINT-SCATTERER ATTENTION FOR SAR SUPER-RESOLUTION
15156MUSICRS: BENCHMARKING AUDIO-CENTRIC CONVERSATIONAL RECOMMENDATION
7809Mutual Information Regularized Weight Ensembles with Moving Average for Generalizable Re-identification
10296Mutual Information-Based Joint Phase and Rate Optimization for RIS-Aided Communication
17901MVGCD: MULTI-VIEW GRAPH FUSION NETWORK FOR GROUP COGNITIVE DIAGNOSIS
3383MVI: HIGH-RESOLUTION ROADSIDE VEHICLE IMAGING BY MMWAVE
17777MVIR: MULTI-VIEW VISUAL-SEMANTIC REPRESENTATION FOR FAKE NEWS DETECTION
6778MVP: MODELING VARIANTS OF PROMPTS FOR VISION-LANGUAGE MODELS
13229MVP-DIFF: MULTI-VIEW PRIORS LEARNING FOR DIFFUSION-BASED SINGLE-VIEW 3D POINT CLOUD RECONSTRUCTION
5328MWNET: MULTI-BRANCH WAVELET NETWORK FOR PHOTOVOLTAIC SEGMENTATION IN REMOTE SENSING IMAGES
17274N2CDrive: Negotiate to Cooperate for Multi-Agent Autonomous Driving via Large Vision-Language Model
11907NATIVETOK: NATIVE VISUAL TOKENIZATION FOR IMPROVED IMAGE GENERATION
17900NATURAL LANGUAGE TO SPATIAL AUDIO PARAMETERS: LIGHTWEIGHT DETERMINISTIC RENDERING FOR CREATIVE AUTHORING
11696Navigating Modality Uncertainty: Modality-Interaction Enhanced Mixture-of-Experts for Multi-Modal Knowledge Graph Completion
9542NCF-TTS: ENHANCING FLOW MATCHING BASED TEXT-TO-SPEECH WITH NEIGHBORHOOD CONSISTENCY FLOW
14379NEAR-FIELD CHANNEL ESTIMATION AND ENVIRONMENT MAPPING: LOCALIZATION OF REFLECTORS AND SCATTERERS
8125NEAR-FIELD CHANNEL ESTIMATION AND LOCALIZATION WITH COPRIME ARRAYS
16578NEAR-FIELD SWIPT USING MASSIVE PHASED MULTISINE ANTENNA ARRAY
13944NEAR-FIELD WIDEBAND BEAMFORMING FOR ISAC VIA ALGORITHM UNROLLING
2811NEAR-LIGHT COLOR PHOTOMETRIC STEREO FOR MONO-CHROMATICITY NON-LAMBERTIAN SURFACE
5033NEAR-OPTIMAL ONLINE GAIN CONTROL FOR MODULO ADCS
18322NEDGCN: HIGH-QUALITY SAMPLE SELECTION AND NOISE-TOLERANT GRAPH NEURAL NETWORK VIA DIFFERENTIATED EDGE WEIGHTING
11906Negative-Aware Routing Network with Adversarial Knowledge Injection for Efficient LLM Adaptation
9708NEGEV: NEGATIVE SAMPLE-AWARE FINE-TUNING FOR OPEN-VOCABULARY OBJECT DETECTION
17516Neighborhood-Aware Self-Paced Graph Clustering for Robust Data Partitioning
17077NEON: One-Shot Text-to-Video Tuning via Noise Latent Dynamics
2114Nethira: A Heterogeneity-aware Hierarchical Pre-trained Model for Network Traffic Classification
16276NETWORK-CONTROLLED REPEATERS UNDER POWER AMPLIFIER NON-LINEARITIES
13050NEURAL ACOUSTIC MULTIPOLE SPLATTING FOR ROOM IMPULSE RESPONSE SYNTHESIS
19031Neural Audio Synthesis for Sound Effects: A Scope Review
12233Neural Forward Filtering for Speaker-Image Separation
5525NEURAL NETWORK-BASED TIME-FREQUENCY-BIN-WISE LINEAR COMBINATION OF BEAMFORMERS FOR UNDERDETERMINED TARGET SOURCE EXTRACTION
19011Neural Optimisation of Fixed Beamformers With Flexible Geometric Constraints
6150Neural personal sound zones with flexible bright zone control
11616NEURAL VARIABLE SPAN FILTERS FOR INTERPRETABLE MULTI-CHANNEL SPEECH ENHANCEMENT
9591NEURERASE: SELECTIVE DEACTIVATION OF NEURONS FOR ERASING CONCEPTS IN DIFFUSION MODELS
3137NEUROCAPSNET: A NEURO-INSPIRED CAPSULE NETWORK FOR MULTI-DIRECTION AUDITORY SPATIAL ATTENTION DETECTION
2953NEUROHASH: A HYPERDIMENSIONAL NEURO-SYMBOLIC FRAMEWORK FOR SPATIALLY-AWARE IMAGE HASHING AND RETRIEVAL
2388NeuroSIFT: A Biologically-Inspired Framework with Explicit Signal-Noise Separation for Robust Multimodal Emotion Recognition
10005NEURO-SYMBOLIC REACHABILITY REASONING FOR PHYSICALLY GROUNDED EMBODIED QUESTION ANSWERING
14080nGPT as a Scalable Architecture for Speech Recognition and Translation
13511NIFTY: A NON-LOCAL IMAGE FLOW MATCHING FOR TEXTURE SYNTHESIS
14999NLDSI-BWE: NON LINEAR DYNAMICAL SYSTEMS-INSPIRED MULTI RESOLUTION DISCRIMINATORS FOR SPEECH BANDWIDTH EXTENSION
10449NMGE: Nested Multi-Granularity Expert Groups for Complexity-Aware Routing in Multilingual Translation
16310NN-BASED IN-LOOP FILTERING FOR ENHANCED COMPRESSION BEYOND VVC
14234No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation
6658NO VERIFIABLE REWARD FOR PROSODY: TOWARD PREFERENCE-GUIDED PROSODY LEARNING IN TTS
14981No Word Left Behind: Mitigating Prefix Bias in Open-Vocabulary Keyword Spotting
4240NOISE-ROBUST AV-ASR USING VISUAL FEATURES BOTH IN THE WHISPER ENCODER AND DECODER
15590Noise-Robust Cross-Modal Hashing with Contrastive Weighting
19084Noise-Robust Speaker Verification with Attenuated Speech Restoration and Consistency Training
15714Noise-Robust Video Salient Object Detection in Spike Streams
10215NOISE-TO-NOTES: DIFFUSION-BASED GENERATION AND REFINEMENT FOR AUTOMATIC DRUM TRANSCRIPTION
9382NON-ASYMPTOTIC PERFORMANCE ANALYSIS OF DOA ESTIMATION BASED ON REAL-VALUED ROOT-MUSIC
9516NON-BAYESIAN SOCIAL LEARNING FOR MODELING INTERACTING LARGE LANGUAGE MODEL AGENTS
9653Non-Coherent Multi-Antenna Reception of Ambient Backscatter with Canonical Correlation Analysis
10116NONCONVEX REGULARIZATION FOR FEATURE SELECTION IN REINFORCEMENT LEARNING
17697NON-HOMOGENEOUS HAZE REMOVAL BASED ON DEEP UNFOLDING NETWORK FOR REMOTE SENSING IMAGES
16511NON-LINE-OF-SIGHT VEHICLE DETECTION VIA AUDIO-VISUAL FUSION
17181NON-UNIFORM HAZE REMOVAL FOR REMOTE SENSING IMAGE BASED ON WAVELET-DOMAIN HAZE-AWARE COMPLEMENTARY LEARNING
3544NORD-PARL-TTS: FINNISH AND SWEDISH TTS DATASET FROM PARLIAMENT SPEECH
4852NO-REFERENCE NIGHT-TIME IMAGE QUALITY ASSESSMENT VIA SELF-SUPERVISED AND META-LEARNING
2860NORMALIGN: FEATURE NORM REGULARIZATION FOR CONFIDENCE CALIBRATION IN GRAPH NEURAL NETWORKS
17548NOT ALL WEIGHT VECTORS ARE NEEDED: COVARIANCE-BASED VECTOR SELECTION TUNING FOR LARGE LANGUAGE MODELS
16592NOT JUST DETECTION: ALIGNED-DRIVEN PURIFICATION OF INDIRECT PROMPT INJECTION FOR RELIABLE AGENT INTERACTION
2445NRRN: NEWS REPRESENTATION RESTORATION NETWORK FOR MULTIMODAL FAKE NEWS DETECTION WITH MULTIMODAL COMPRESSION AND CAPSULE FUSION
4809NSC-SL: A Bandwidth-Aware Neural Subspace Compression for Communication-Efficient Split Learning
4446Nuclear Diffusion Models for Low-Rank Background Suppression in Videos
13195NUMERICAL SPECTRUM LINKING: IDENTIFICATION OF GOVERNING PDE VIA KOOPMAN-CHEBYSHEV APPROXIMATION
3688Object-aware Restoration Diffusion: Progressive and Interactive Framework for Blind Compressed Image Restoration
15371OBSTRUCTIVE SLEEP APNEA ENDOTYPE PREDICTION DURING WAKEFULNESS USING VOICE BIOMARKERS
5489OCCLUSION AWARE GRAPH TRANSFORMER FOR 3D MULTI-OBJECT TRACKING
3518Occlusion-Aware Triplet Learning for Robust Pedestrian ReID: Beyond Single-ID Labels and Data Augmentation
4242OCCLUSION-ROBUST HUMAN RENDERING BASED ON TRI-PLANE RESTORATION
18257OCR-Enhanced Multimodal ASR Can Read While Listening
10757OCTIP: COMPACT GEOGRAPHY-AWARE IP EMBEDDINGS FOR NEAREST-NEIGHBOR IP SIGNAL RETRIEVAL
7740OCTOPUS: ENHANCING DISTRIBUTIONAL REINFORCEMENT LEARNING THROUGH REGULARIZATION
2813OctreeSplatting: Region-Aware Gaussian Densification via Dynamically Managed Octree
9838ODSA: ONLINE DIFFERENTIABLE STRUCTURE ADAPTATION FOR TINY TCN ON IOT TIME SERIES
9145OFF-THE-GRID MULTI-PITCH ESTIMATION USING OPTIMAL TRANSPORT
6370OFHIE: OVERVIEW-THEN-FOCUS HIERARCHICAL INTERACTION ENCODING FOR VOXEL-BASED 3D OBJECT DETECTION
9374OF-SemWat: HIGH-PAYLOAD TEXT EMBEDDING FOR SEMANTIC WATERMARKING OF AI-GENERATED IMAGES WITH ARBITRARY SIZE
16481OG-PCL: EFFICIENT SPARSE RADAR POINT CLOUD PROCESSING FOR HUMAN ACTIVITY RECOGNITION
14654OGRA-YOLOv8: Overlapping Gridded and Rhombus Attention for Underwater Object Detection
14302OILSAM2: MEMORY-AUGMENTED SAM2 FOR SCALABLE SAR OIL SPILL DETECTION
17366OKAN: ORTHOGONAL KOLMOGOROV-ARNOLD NETWORKS FOR ACCURATE AND INTERPRETABLE CAMERA POSE REGRESSION
4139OMNI-AVSR: TOWARDS UNIFIED MULTIMODAL SPEECH RECOGNITION WITH LARGE LANGUAGE MODELS
9247ON DEEPFAKE VOICE DETECTION - IT'S ALL IN THE PRESENTATION
9603On Multiangle Discrete Fractional Periodic Transforms
15602ON OPTIMIZATION OF POLES FOR ADAPTIVE FOURIER DECOMPOSITION-INSPIRED NEURAL LAYERS
16363ON RANDOM POOLING OF LARGE-SCALE SCREENING WITH EXTREMELY SPARSE INFECTIONS
11778ON THE DESIGN OF EFFICIENT NEURAL METHODS FOR GEOMETRY-AGNOSTIC MULTICHANNEL SPEECH ENHANCEMENT
12983ON THE DESIGN OF HIGHER-ORDER TIME-INTENSITY MICROPHONE ARRAYS FOR PANORAMIC AUDIO RECORDING AND REPRODUCTION
10663ON THE DOPPLER EFFECT AND COHERENCE TIME OF NEAR-FIELD SCATTERING-FREE CHANNELS
5191On the Foundational Condition for Non-contact Vibration Measurement using Phase-based Microwave Interferometry
8875On the Importance of a Multi-Scale Calibration for Quantization
9776On the Optimality of Rate Balancing for Max-Min Fair Multicasting
8167ON THE ROLE OF EXTRINSIC VALUE EXCHANGE IN EXPECTATION PROPAGATION FOR CODED MIMO SYSTEMS
14444ON THE ROLE OF TRAINABLE PARAMETERS IN DIFFERENTIABLE FEEDBACK DELAY NETWORKS
2069On the Security of RIS-Aided Wireless Communication Systems: RIS Codebook Attack and Camouflage Solution
15348On the Sensitivity of Firing Rate-Based Federated Spiking Neural Networks to Differential Privacy
1955ON THE SHOULDERS OF GIANTS: KNOWLEDGE-DRIVEN SELF-ADAPTIVE NETWORK FOR DISTILLATION
15581ONCORAG: A Knowledge Graph-Augmented RAG Framework For Mechanism-Aware Oncology Recommendations
14148ONE MODEL--THREE TASKS: DISCOVERING A SHARED WINNING TICKET FOR LOW-COMPLEXITY AUDIO INTELLIGENCE
12733One Timestep Spiking Actor Network with Adaptive Global-connected Encoding and Threshold Learning
13664ONE-BIT QUANTIZED PRECODER CHARACTERIZATION AND PARAMETER OPTIMIZATION IN MASSIVE MIMO SYSTEMS
4505ONE-SHOT SEQUENTIAL FEDERATED LEARNING WITH DUAL-DISTILLATION
3898ONE-STAGE SEMI-SUPERVISED SEMANTIC SEGMENTATION FOR ANOMALY DETECTION VIA CONSISTENCY REGULARIZATION AND STUDENT-TEACHER MODELS
17865One-step Generative Distillation
6536ONLINE CONTINUAL CATEGORY LEARNING WITH INVARIANT PROTOTYPES
13860ONLINE CURSIVE HANDWRITING GENERATION USING TRACE TRANSFORMATION AND SYMBOL-INDEPENDENT POINT CLASSIFICATION MODEL
6472ONLINE NEURAL FUSION OF DISTORTIONLESS DIFFERENTIAL BEAMFORMERS FOR ROBUST SPEECH ENHANCEMENT
17202ONLINE REGISTER FOR DUAL-MODE SELF-SUPERVISED SPEECH MODELS: MITIGATING THE LACK OF FUTURE CONTEXT
7831Online Sensor Selection for Object Detection via Bayesian Risk Minimization
18893ONLINE SIMPLEX-STRUCTURED MATRIX FACTORIZATION
17296ONLINE TEST-TIME ADAPTATION FOR SHADOW SEGMENTATION
16186OPENHIER: AN OPEN-VOCABULARY HIERARCHICAL IMAGE CLASSIFICATION FRAMEWORK
10002OPINION CONSENSUS FORMATION AMONG NETWORKED LARGE LANGUAGE MODELS
3423OPINION-TREE-AWARE PROMPT TUNING FOR ASPECT SENTIMENT QUADRUPLE PREDICTION
18884OPTIMAL DETECTION FOR A PROSPECT THEORETIC VARIANT OF THE NEYMAN-PEARSON PROBLEM
14088OPTIMAL PLACEMENT OF MOVABLE ANTENNAS FOR ANGLE-OF-DEPARTURE ESTIMATION UNDER USER LOCATION UNCERTAINTY
13789Optimal QAM Constellation for Over-the-Air Computation in the Presence of Heavy-Tailed Channel Noise
9503OPTIMAL QUASI-CLIQUE DETECTION VIA LASRY-LIONS DOUBLE ENVELOPES
12593OPTIMAL SENSOR PLACEMENT UNDER CONSTRAINTS FOR TARGET LOCALIZATION USING DRSS MEASUREMENTS
9455OPTIMAL TRANSPORT BASED UNSUPERVISED RESTORATION LEARNING EXPLOITING DEGRADATION SPARSITY
17009OPTIMIZED END-TO-END CODING WORKFLOW FOR IMAGE STORAGE AND RETRIEVAL USING JPEG DNA
18895OPTIMIZED MULTISTAGE DECIMATION BASED ON OPTIMAL FACTORIZATION OF DECIMATION RATIO
2534OPTIMIZED PARTITIONING ACCELERATION FOR VVC INTER CODING
16839OPTIMIZING AUTOMATED JAILBREAK ATTACKS ON LARGE LANGUAGE MODELS VIA EXPERIENCE ACCUMULATION
14995OPTIMIZING DOMAIN-ADAPTIVE SELF-SUPERVISED LEARNING FOR CLINICAL VOICE-BASED DISEASE CLASSIFICATION
14239OPTIMIZING SPEECH LANGUAGE MODELS FOR ACOUSTIC CONSISTENCY
1650OPTIMIZING THE SIGNAL-TO-NOISE RATIO OF CONTEXTUAL INFORMATION FOR IMPROVED ATTRIBUTED TEXT GENERATION
6233OptimUS: Optimization-based Unlimited Sampling Algorithm
4158OR-DETR: Exploring Explicit Occlusion Relation Prior for Crowded Pedestrian Detection
17575ORSC: OBJECT-AWARE REINFORCEMENT WITH SEMANTIC CONSISTENCY FOR HALLUCINATION MITIGATION IN MLLMS
8876ORTHOGONAL APPROXIMATE MESSAGE PASSING ALGORITHMS FOR RECTANGULAR SPIKED MATRIX MODELS WITH ROTATIONALLY INVARIANT NOISE
6285ORTHOGONAL APPROXIMATE MESSAGE-PASSING FOR SUBLINEAR SPARSITY
3968Orthogonal Weight Modification Enhances Learning Scalability and Convergence Efficiency without Gradient Backpropagation
18093ORTHOVAD: WEAKLY SUPERVISED VIDEO ANOMALY DETECTION VIA PROTOTYPE ORTHOGONALITY LEARNING
12617OSG: TRAINING-FREE OBJECTNESS, SEMANTICS, AND GEOMETRY FUSION FOR ZERO-SHOT REFERRING EXPRESSION COMPREHENSION
3466OTD: DIFFUSION ON OT-STRUCTURED POINT CLOUDS FOR 3D SHAPE GENERATION
17204OUT-OF-DISTRIBUTION DETECTION BASED ON TOTAL VARIATION ESTIMATION
9078OVERCOMING BINNING DILEMMA: CUMULATIVE CALIBRATION FOR DOUBLY ROBUST LEARNING IN DEBIASED RECOMMENDATION
4957Overconfidence in Investment Decisions: A Filtering and Control Framework
9536OVID: Text-Guided Open-Vocabulary Dense Object Counting and Localization
2296OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech
3245P2CL: Prototype-Constrained Consistent Learning - Toward Controllable and Consistent Transfer
5399PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition
4172PADAM: Perceptual Audio Defect Assessment Model
5063PADUM: PATCH-BASED DUAL-STREAM NETWORK WITH CNN AND MAMBA FOR TIME SERIES FORECASTING
3951PAGE: A PHYSICS-AWARE GENERATIVE NETWORK FOR PRESSURE MAP SYNTHESIS
5486PAGM: A PYRAMID ALIGNMENT AND ID GRAPH MATCHING MODEL FOR VIDEO OBJECT RE-IDENTIFICATION
14591PAGS: PRIORITY-ADAPTIVE GAUSSIAN SPLATTING FOR DYNAMIC DRIVING SCENES
2802PaintFlow: Stage-Aware Temporal Modeling for Text-to-Video Synthesis of Painting Processes
16800Pairing Denoising Enhanced Hash-aware Distillation for Unsupervised Cross-modal Retrieval
14979PAIRWISE DISTORTION DISTRIBUTION FOR COMPRESSION AND QUANTIZATION
15506PALETTE: A BACKGROUND-ROBUST FINGERPRINTING ATTACK ON AIR INTERFACE DESIGNED FOR CROSS-ENVIRONMENT CHALLENGES
5049PAM-COAT: PHYSICS-AWARE MULTIMODAL COATNET FOR IMBALANCED PULSAR CANDIDATE CLASSIFICATION
6909PAMNet: Patch-Adaptive Mixing Network for Multivariate Time Series Forecasting
8442PANKRAG: ENHANCING GRAPH RETRIEVAL VIA GLOBALLY AWARE QUERY RESOLUTION AND DEPENDENCY-AWARE RERANKING MECHANISM
11461PanoIndoor and PanoOutdoor: Towards Comprehensive Datasets for Panoramic Instance Segmentation
11691PAPER SUMMARY ATTACK: JAILBREAKING LLMS THROUGH LLM SAFETY PAPERS
14223PAPR ANALYSIS OF RPDMA AND ORPDMA WITH PRIME POWER SUBCARRIERS
3936PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models
5351ParaAegis: Parallel Protection for Flexible Privacy-preserved Federated Learning
16594PARAEDIT: UNIFYING PARALLEL TRANSPORT AND GENERATIVE FLOWS FOR HIGH-FIDELITY IMAGE EDITING
5412PARAGSE: PARALLEL GENERATIVE SPEECH ENHANCEMENT WITH GROUP-VECTOR-QUANTIZATION-BASED NEURAL SPEECH CODEC
14158PARALINGUISTIC EMOTION-AWARE VALIDATION TIMING DETECTION IN JAPANESE EMPATHETIC SPOKEN DIALOGUE
5572Parallax-Aware Spatial Transformer: Fusing Physics and Learning for Terahertz Near-Field Localization
6834PARALLEL DELAY-DOPPLER ESTIMATION VIA ORDER-REVERSED TWO-STAGE PRONY METHOD
6332Parallel Randomized Coordinate Descent for Matrix Completion with Convergence Guarantees
16119PARAMETER ADAPTATION IN HIDDEN MARKOV MODELS WITH EQUAL EXIT PROBABILITIES
13646Parameter Localization and Relearning for Safety Disalignment in Large Language Models
18876Parameter optimisation for a physical model of the vocal system
11164PARAMETER-FREE MIXTURE OF EXPERTS FOR BLACK-BOX PROMPT TUNING
7671Parametric Channel Estimation as an Enabler for RIS-Assisted Sensing
9899PARAMETRIC MODELING AND LOCALIZATION OF SPATIALLY DISTRIBUTED TARGETS IN OFDM-MIMO RADAR SYSTEMS
4365PARAMETRIC NEURAL AMP MODELING WITH ACTIVE LEARNING
12993PARSIMONY, ORDER AND BALANCE: PRINCIPLES FOR COMPRESSING MIXTURE-OF-EXPERTS MODELS
14347PART-CENTRIC DIFFUSION POLICY WITH VISION LANGUAGE MODEL FOR GENERALIZABLE ARTICULATED OBJECT MANIPULATION
13448PAS-SE: PERSONALIZED AUXILIARY-SENSOR SPEECH ENHANCEMENT FOR VOICE PICKUP IN HEARABLES
11726PASSEG: A MULTI-SCALE SEMANTIC SEGMENTATION FRAMEWORK FOR COMPLEX UAV IMAGERY IN PLATEAU SCIENTIFIC EXPEDITIONS
15723PassMoE-P: Enhancing Password Guessing Using Large Language Models with Pattern-Specialized Mixture-of-Experts
8396PAST AS PRIOR: REWEIGHTED PROXY GUIDANCE FOR STABLE ADVERSARIAL TRAINING
7065PASTA-YOLO: AN ENHANCED DETECTOR FOR SMALL OBJECT DETECTION IN UAV IMAGERY
13461PATCH FIRST, GENERATE THEN: A DEBIASED DIFFUSION MODEL FOR MULTIVARIATE TIME SERIES GENERATION
15324PATCH-AWARE DECOMPOSITION AND DYNAMIC FUSION NETWORK FOR MULTIVARIATE TIME SERIES FORECASTING
16079PATCH-AWARE-BASED NO-REFERENCE IMAGE QUALITY ASSESSMENT VIA MULTI-FACTOR CONTRASTIVE LEARNING
1660Patch-based Active Source-Free Domain Adaptation for Annotation-Efficient Medical Image Segmentation
14904PATHFINDER: MCTS AND LLM FEEDBACK-BASED PATH SELECTION FOR MULTI-HOP QUESTION ANSWERING
8843PC2-MTO: A PRINCIPAL COMPONENT CLUSTERING APPROACH FOR MULTI-TASK OFFLOADING OPTIMIZATION IN IOV
15308PC-SSL: A PREDICTIVE CODING-BASED SELF-SUPERVISED LEARNING FRAMEWORK FOR EEG EMOTION RECOGNITION
4838PDConv: Priori Perceptual Dilated Convolution
17061PD-Reweight for UAV Aerial Burrow Detection: A Plug-in Point-Distance Module for Rebalancing Sparse Tiny Objects
13709PEEKING INTO THE FUTURE FOR CONTEXTUAL BIASING
19027PEERRTF: ROBUST MVDR BEAMFORMING USING GRAPH CONVOLUTIONAL NETWORK
2060PE-LORA: PARAMETER-EFFICIENT BAYESIAN LOW-RANK ADAPTATION FOR LARGE LANGUAGE MODELS
6155PENPLAN-PDDL: A MULTI-AGENT FRAMEWORK FOR AUTOMATED PENETRATION TESTING PLANNING WITH PDDL-BASED VERIFICATION
14349PERCEPTION-GUIDED DIFFUSION FUSION WITH GRADIENT DESCENT POSTERIOR MAXIMIZATION
6927PERCEPTUAL LOSS OPTIMIZED HRTF PERSONALIZATION IN SPHERICAL HARMONIC DOMAIN
1379Perceptual Quality Assessment for Stylized Talking Heads
12134PERCEPTUAL QUALITY OPTIMIZATION OF IMAGE SUPER-RESOLUTION
18869PERFORMANCE ANALYSIS OF LINEAR DETECTION UNDER NOISE-DEPENDENT FAST-FADING CHANNELS
7600Performance Analysis of Near-Field RIS-Assisted Networks with Minimum Additive Path-Loss Association
6164Performance Bounds On Parameter Estimation for Relative Phases in Multi-Agent Wireless Systems
15936PERFORMANCE OF JOINT TDOA AND FDOA ESTIMATION IN THE LARGE SATELLITE CONSTELLATION LIMIT
5528PERFORMANCE OF REPEATER-ASSISTED MASSIVE MIMO SYSTEMS: TDD VS FDD
6071PERFORMANCE-GUIDED REINFORCED ACTIVE LEARNING FOR OBJECT DETECTION
9310PERFORMSINGER: MULTIMODAL SINGING VOICE SYNTHESIS LEVERAGING SYNCHRONIZED LIP CUES FROM SINGING PERFORMANCE VIDEOS
17399Persona Drift Detection in Role-Playing Agents: A Multi-Dimensional Consistency Framework
4252PERSONAAGENT WITH GRAPHRAG: COMMUNITY-AWARE KNOWLEDGE GRAPHS FOR PERSONALIZED LLM
9555PERSONALIZED FEDERATED LEARNING BASED ON CLUSTERING KNOWLEDGE PROTOTYPE ALIGNMENT AND DISTRIBUTION-AWARE CONSISTENCY
4659PERSONALIZED FEDERATED LEARNING VIA DECOUPLED VISUAL PROMPTS AND ADAPTIVE CLASSIFIER FUSION
6214PERSONAPLEX: VOICE AND ROLE CONTROL FOR FULL DUPLEX CONVERSATIONAL SPEECH MODELS
13576PERSUASION SHOULD BE DOUBLE-BLIND: A MULTI-DOMAIN DIALOGUE DATASET WITH FAITHFULNESS BASED ON CAUSAL THEORY OF MIND
6524PERTURB TO PROTECT: LEVERAGING TEST-TIME DEFENSIVE PERTURBATIONS AGAINST ADVERSARIAL ATTACKS
3842PERTURBATION-RESISTANT TRANSMIT BEAMFORMING
15177PE-SLEUTH: PROGRAM-LEVEL SEMANTICS AND STATIC FEATURE FUSION FOR INTERPRETABLE RANSOMWARE DETECTION WITH LLMS
11725PFE-NET: PROXY-GUIDED FREQUENCY ENHANCEMENT FOR CAMOUFLAGED OBJECT DETECTION
15942PFLUXTTS: HYBRID FLOW-MATCHING TTS WITH ROBUST CROSS-LINGUAL VOICE CLONING AND INFERENCE-TIME MODEL FUSION
4266PGDiff: Prior-Consistency Guided Diffusion for Unsupervised Image Restoration under Adverse Weather Conditions
13500PGFed: Prompt-Guided Distillation for Personalized Federated Learning with Model Heterogeneity
16508PG-SE: PREDICTIVE ACCELERATION AND CORRECTION FOR GENERATIVE SPEECH ENHANCEMENT
12175PG-SELECT: PRIOR-GUIDED FEATURE SELECTION FOR UNSUPERVISED OBJECT DISCOVERY IN DUNHUANG MURALS
3072PGSENET: PRIOR-GUIDED SPECTRUM ENHANCEMENT NETWORK
9129Phase Consistency Enhanced Complex-Valued Neural Network for Radio Frequency Fingerprint Identification
18188PHASE OPTIMIZATION DRIVEN WAVEFORM DESIGN WITH GOOD CORRELATION AND INFORMATION EMBEDDING PERFORMANCES FOR JOINT RADAR-COMMUNICATIONS
14012PHASE-AWARE STATE SPACE MODELING WITH FINE-GRAINED IDENTITY DISENTANGLEMENT FOR DYNAMIC FACIAL EXPRESSION RECOGNITION
10725PhaseFormer: Capturing Cross-Channel Phase and Trend Dynamics for Time Series Forecasting
5208PHASEMARK: A POST-HOC, OPTIMIZATION-FREE WATERMARKING OF AI-GENERATED IMAGES IN THE LATENT FREQUENCY DOMAIN
14471PHASE-ONLY POSITIONING IN DISTRIBUTED MIMO UNDER PHASE IMPAIRMENTS: AP SELECTION USING DEEP LEARNING
12533PHASE-RETRIEVAL-BASED PHYSICS-INFORMED NEURAL NETWORKS FOR ACOUSTIC MAGNITUDE FIELD RECONSTRUCTION
9998Phase-Space Signal Processing of Acoustic Data for Advanced Manufacturing in-situ Monitoring
14061PHOENIXDSR: PHONEME-GUIDED AND LLM-ENHANCED DYSARTHRIC SPEECH RECOGNITION
4343PHOMO: PATCH HOMOGENEITY FOR NO-REFERENCE INPAINTING ASSESSMENT AND VERIFICATION
17346PHONEME-LEVEL VISUAL SPEECH RECOGNITION VIA POINT-VISUAL FUSION AND LANGUAGE MODEL RECONSTRUCTION
16033PHONOLOGICAL TOKENIZER: PROSODY-AWARE PHONETIC TOKEN VIA MULTI-OBJECTIVE FINE-TUNING WITH DIFFERENTIABLE K-MEANS
12151PHOTO SHIELDING: ROBUST PROTECTION AGAINST AI MANIPULATION OF IMAGES
13113PHOTOMETRIC STEREO USING GAUSSIAN SPLATTING AND INVERSE RENDERING
10482PHPTRIGGER: MULTI-ENGINE ASSISTED VULNERABILITY AUDITING AND VERIFICATION IN PHP APPLICATIONS
5169PHRASED: Phrase Dictionary Biasing for Speech Translation
14897PHYS-DIFF: A PHYSICS-INSPIRED LATENT DIFFUSION MODEL FOR TROPICAL CYCLONE FORECASTING
3959PHYSHDR: WHEN LIGHTING MEETS MATERIALS AND SCENE GEOMETRY IN HDR RECONSTRUCTION
8246PHYSICALLY DEPLOYABLE 3D OMNIDIRECTIONAL INFRARED ADVERSARIAL PATCHES
3586PHYSICS AND DATA DRIVEN TRANSFORMER-MAMBA FRAMEWORK FOR FLOW FIELD
10740Physics Informed Generative Models for Magnetic Field Images
12281PHYSICS-AWARE NOVEL-VIEW ACOUSTIC SYNTHESIS WITH VISION-LANGUAGE PRIORS AND 3D ACOUSTIC ENVIRONMENT MODELING
15269PHYSICS-BASED CHANNEL TRANSFORMATION FOR WIRELESS CONFIGURATIONS
9769Physics-Encoded Learned Maximum Likelihood Estimation for Unknown Measurement Distribution
5609PHYSICS-GUIDED DIFFUSION MODELS FOR ANCIENT BAMBOO SCRIPT RESTORATION
12250Physics-Guided Learning with Hard-Soft Constraints for Urban Wind Assessment
16956PHYSICS-INFORMED ANOMALY DETECTION OF TERRAIN MATERIAL CHANGE IN RADAR IMAGERY
10809PHYSICS-INFORMED DIFFUSION GENERATION FOR GEOMAGNETIC MAP INTERPOLATION
9968PHYSICS-INFORMED GNN FOR MEDIUM-HIGH VOLTAGE AC POWER FLOW WITH EDGE-AWARE ATTENTION AND LINE SEARCH CORRECTION OPERATOR
7920PHYSICS-INFORMED HIERARCHICAL BAYESIAN MODELING FOR ANGLE-OF-ARRIVAL ESTIMATION WITH COMMERCIAL OFF-THE-SHELF RFID
6815PHYSICS-INFORMED LEARNING OF NEURAL SCATTERING FIELDS TOWARDS MEASUREMENT-FREE MESH-TO-HRTF ESTIMATION
18975Physics-Informed Neural Network-Driven Sparse Field Discretization Method for Near-Field Acoustic Holography
9877PHYSICS-INFORMED NEURAL NETWORKS FOR OCEAN ACOUSTIC FIELD RECONSTRUCTION AND SOURCE LOCALIZATION
8496PHYSICS-INFORMED VIDEO DIFFUSION FOR SHALLOW WATER EQUATIONS
15266PhysiGen: Integrating Collision-Aware Physical Constraints for High-Fidelity Human-Human Interaction Generation
15193PIANOROLL-EVENT: A NOVEL SCORE REPRESENTATION FOR SYMBOLIC MUSIC
11818PICFORMER: PERCEPTION-INFERENCE-CONSISTENCY LOOP FOR OCCLUDED 3D POSE ESTIMATION
1815PICOAUDIO2: TEMPORAL CONTROLLABLE TEXT-TO-AUDIO GENERATION WITH NATURAL LANGUAGE DESCRIPTION
1534PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone Characters
15704PI-GNN: A Physics-Informed Graph Neural Network for Spatio-Temporal Diffusion Prediction
3975PILED: PHYSICS-INFORMED LOW-LIGHT ENHANCEMENT AND DEBLURRING
17310PINA: PROMPT INJECTION ATTACK AGAINST NAVIGATION AGENTS
5551PINDEFECTNET: A TRANSFORMER FRAMEWORK FOR DETECTING DEFECTS IN MILLIMETER-SCALE POWER LINE LOCKING PINS
4882PITH-Former: A Hierarchical Motion Prediction Framework Guided by Latent Goals and Driving Habits
3838PI-TPDNET: A PHYSICS-INFORMED TREND-PERIOD DECOMPOSITION NEURAL NETWORK FOR AIR QUALITY PREDICTION
9668PIXEL-PATCH GRAPH REGULARIZED GROUP SPARSE REPRESENTATION FOR SINGLE-IMAGE DENOISING
9373PKW: PUBLIC KEY WATERMARKING FOR DEEP NEURAL NETWORK WITH FISHER-GUIDED EMBEDDING
12526PLACE ANYWHERE: LEARNING SPATIAL REASONING FOR OCCLUSION-AWARE IMAGE COMPOSITION
16834PLA-LOSS: POTENTIAL LABEL-AWARE TRAINING FOR TOP-K CLASSIFICATION
9725Planning-oriented Adversarial Attack against End-to-End Autonomous Driving Systems
10235PLANPERCEIVER: A UNIFIED FRAMEWORK FOR MULTI-LEVEL SCENE INFORMATION FUSION IN AUTONOMOUS DRIVING PLANNING
3856PLNET: AN EFFICIENT PARAMETER AGGREGATION NETWORK FOR MULTIMODAL WHOLE HEART SEGMENTATION
10674PLPP: PROMPT LEARNING WITH PERPLEXITY IS SELF-DISTILLATION FOR VISION-LANGUAGE MODELS
14244PLUG-AND-PLAY DIFFUSION PRIORS FOR MULTILOOK COHERENT IMAGING WITH PROVABLE GUARANTEES
6260PLUG-AND-PLAY EMOTION GRAPHS FOR COMPOSITIONAL PROMPTING IN ZERO-SHOT SPEECH EMOTION RECOGNITION
14319PLUG-AND-PLAY FORWARD BACKWARD ALGORITHM TO RESTORE LANDSAT IMAGES: A PRELIMINARY STEP TO UNCOVER THE HISTORY OF SURFACE WATERS
4075PLUG-AND-PLAY ROBUST VISION ENCODERS FOR MULTI-MODAL LARGE LANGUAGE MODELS VIA FULLY MULTI-MODAL ADVERSARIAL FINETUNING
1969PLUG-AND-PLAY TEMPORAL FOURIER EMBEDDING FOR ROBUST LONG-HORIZON TRAFFIC FLOW FORECASTING
12853PMMD: A POSE-GUIDED MULTI-VIEW MULTI-MODAL DIFFUSION FOR PERSON GENERATION
10901P-MOE: PROXY-GUIDED MIXTURE-OF-EXPERTS NETWORK FOR FACE FORGERY DETECTION
12598PMTNET-MTS: Control-Aware Multi-Step Forecasting For Rotary Kiln Tail Temperature
12917PMW-DEHAZE: MULTI-SCALE WAVELET FUSION FOR IMAGE DEHAZING VIA MAMBA FRAMEWORK
17455PocketDVDNet: Realtime Video Denoising for Real Camera Noise
11319POEMCRAFT: MULTIMODAL POETRY GENERATION WITH PROSODY-GUIDED REFINEMENT AND BIASED ATTENTION
10411Point-Pillar Feature Representation via Fine-Grained Fusion Network for 3D Object Detection
17391POISONCRAFT: PRACTICAL POISONING OF RETRIEVAL-AUGMENTED GENERATION FOR LARGE LANGUAGE MODELS
4290Polaris: Detecting Advanced Persistent Threat on Provenance Graphs via Siamese Masked Graph Representation
5043POLARIZATION FINGERPRINT IDENTIFICATION METHOD BASED ON ARECA-NET
12927POLARIZATION FINGERPRINT IDENTIFICATION VIA CLUSTER-DRIVEN SUB-CLASSIFIERS ROUTING
14051POLYNOMIAL MIXING FOR EFFICIENT SELF-SUPERVISED SPEECH ENCODERS
3195Poly-SVC: Polyphonic-Aware Singing Voice Conversion with Harmonic Modeling
13203POSE-FREE INFANT GENERAL MOVEMENT ASSESSMENT USING BODY CONTOURS
13376Position-Aware Self-supervised Representation Learning for Cross-mode Radar Signal Recognition
4167POSITION-INVARIANT FINE-TUNING OF SPEECH ENHANCEMENT MODELS WITH SELF-SUPERVISED SPEECH REPRESENTATIONS
11428POSITIVE–AND–MULTI-NEGATIVE LEARNING WITH ADAPTIVE REWEIGHTING FOR NOISY LABELS
17374POST-HOC FAIRNESS ADJUSTMENT VIA COUNTERFACTUAL SENSITIVE ATTRIBUTES LEARNING
14164POWER CONSUMPTION OF MODULO SAR ADCS: A SEMI-ANALYTICAL CASE STUDY
14134POWER CONSUMPTION REDUCTION IN ELAA-ASSISTED ISAC SYSTEMS
14041PPDD: A UNIFIED PUSH–PULL ADVERSARIAL OBJECTIVE IN FEATURE AND LOGIT SPACES FOR DATASET DISTILLATION
10311PPFC: A REINFORCEMENT LEARNING-BASED FEEDBACK FRAMEWORK FOR HIGH-FIDELITY CHINESE POETRY-TO-IMAGE GENERATION
14334PRECISION NEURAL NETWORKS: JOINT GRAPH AND RELATIONAL LEARNING
13889PRECODER DESIGN IN MULTI-USER FDD SYSTEMS WITH VQ-VAE AND GNN
13877Predict the Retrieval! Test Time Adaptation for Retrieval Augmented Generation
16573PREDICTING EMOTIONS IN DIALOGUE RESPONSES BY MODELING IMPLICIT FACTORS
10882Predictor-guided Robust Federated Learning against Backdoor Attacks
4316PREIG: Physics-informed and Reinforcement-driven Interpretable GRU for Commodity Demand Forecasting
1772PREMAB: A MULTI-MODULE SHORT VIDEO RECOMMENDATION SYSTEM WITH FOUNDATION MODELS AND MAB TO SAVE COLD-START
18862PRESENT: ZERO-SHOT TEXT-TO-PROSODY CONTROL
12322Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression
14997PRETRAIN-DPFL: MITIGATING NOISE DETRIMENT IN DIFFERENTIALLY PRIVATE FEDERATED LEARNING WITH MODEL PRE-TRAINING
18979PRETRAINING AND FINE-TUNING TECHNIQUES FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT BASED ON SEQUENCE-TO-SEQUENCE VOICE CONVERSION
3258Pre-training Tensor-Train Networks Facilitates Machine Learning with Variational Quantum Circuits
10466PREVAD: PREVENTING UPSTREAM BIAS IN WEAKLY SUPERVISED VIDEO ANOMALY DETECTION
14154Preventing Modality Collapse via Category-Guided Transition Regularization
2978PRG: Prompt-Based Distillation Without Annotation via Proxy Relational Graph
18037PRIMAL VARIABLE DECOUPLING AND DIAGONAL PRECONDITIONING FOR PRIMAL-DUAL SPLITTING BEYOND LIPSCHITZ CONSTANT RESTRICTIONS
12933PRINCIPLED COARSE-GRAINED ACCEPTANCE FOR SPECULATIVE DECODING IN SPEECH
8082PRINCIPLE-GUIDED MULTIMODAL REASONING WITH MINIMAL HUMAN DEMONSTRATIONS
11250PRINT2VOLUME: SYNTHETIC OCT-BASED 3D FINGERPRINT VOLUME GENERATOR
3061PRIOR KNOWLEDGE DRIVEN MULTI-VIEW CLUSTERING
4078PRIOR-CALIBRATED LONG-TAILED RECOGNITION VIA PERTURBED DISENTANGLED LOGIT ADJUSTMENT AND ADAPTIVE MIXUP
11076Prism: Few-shot Synthesis of Socratic Questioning Dialogues in Chinese Counseling
5675PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion
6271PRISM: PROBABILISTIC AND ROBUST INVERSE SOLVER WITH MEASUREMENT-CONDITIONED DIFFUSION PRIOR FOR BLIND INVERSE PROBLEMS
2662PRISM: PROPAGATING-BASED REFINED SEMANTIC FEATURES WITH BIPARTITE MATCHING FOR VISIBLE-INFRARED GROUP RE-IDENTIFICATION
19051Privacy Disclosure of Similarity Rank in Speech and Language Processing
13098PRIVACY-AWARE DESIGN OF DISTRIBUTED MIMO ISAC SYSTEMS
9358PRIVACY-PRESERVATION OVER DIRECTED GRAPHS: A CASE STUDY OF AVERAGE CONSENSUS
18238PRIVACY-PRESERVING EDGE-ASSISTED AUTHENTICATION AND KEY AGREEMENT PROTOCOL FOR RESOURCE-ASYMMETRIC IOT
6583PrivacyShadow: Revealing Fine-Tuning Leakage in Vision-Language Models via Dual-Level Black-Box Attacks
14226PROACTIVE SAFETY DELIBERATION: GUIDING LARGE REASONING MODELS WITH DISTILLED PRINCIPLES
9279PROADS: PROVABLY SECURE AND ROBUST AUDIO DIFFUSION STEGANOGRAPHY WITH LATENT OPTIMIZATION AND BACKWARD EULER INVERSION
3775PROBABILISTIC DEEP DISCRIMINANT ANALYSIS FOR WIND BLADE SEGMENTATION
17707Probabilistic Device Discovery for Communication via UAVs
15168Probabilistic Graphical Modeling for Biomedical Signal Completion with Non-Random Missingness on Patient Networks
14288PROBING CONTENT AND CHANNEL IN SPEAKER VERIFICATION MODELS
15695PROBING THE HIDDEN TALENT OF ASR FOUNDATION MODELS FOR L2 ENGLISH ORAL ASSESSMENT
11232PROBING WHISPER FOR DYSARTHRIC SPEECH IN DETECTION AND ASSESSMENT
4272PRODISTILL: A PROGRESSIVE PROMPTING FRAMEWORK FOR FINE-GRAINED VLM DISTILLATION
1439PRODUCTION-SCALE DYNAMIC VOCABULARY ASR BIASING WITH WORD-LEVEL FST AND ROBUST TRAINING
14811PROFICIENCY-AWARE ADAPTATION AND DATA AUGMENTATION FOR ROBUST L2 ASR
11595PROGRESSIQA: PROGRESSIVE CURRICULUM AND ENSEMBLE SELF-TRAINING FOR FILTER-ALTERED IMAGE QUALITY ASSESSMENT
14656Progressive Feature Distillation for Model-Heterogeneous Personalized Federated Learning
10697Progressive Motion Interpolation for Humanoid Trajectory Tracking
2332Progressive Thinking for Lane Detection: Holistic Priors to Focused Refinement
15189PROGRESSIVELY INJECTING STRUCTURAL SEMANTICS FROM THE FREQUENCY DOMAIN INTO MAMBA FOR ACCURATE CURVILINEAR STRUCTURE SEGMENTATION
18453ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody
2757ProMist-5K: A Comprehensive Dataset for Digital Emulation of Cinematic Pro-Mist Filter Effects
9771PROMPT-GUIDED MIXTURE-OF-EXPERTS FOR ROBUST MULTIMODAL SENTIMENT ANALYSIS WITH MISSING MODALITIES
15895Prompt-Guided Multi-Scale Feature Pyramid Aggregation with Unified Channel-Spatial Transformer for Single Image Deraining
4491PROMPTHASH: ROBUST INSTRUCTION WATERMARKS AGAINST PARAPHRASE AND SPLICING IN LLM FORENSICS
9883PROMPTMAD: CROSS-MODAL PROMPTING FOR MULTI-CLASS VISUAL ANOMALY LOCALIZATION
17961PromptPatch: Towards Precise and Stable Behavioral Patching in Large Language Models via Feedback-driven Prompt Optimization
11422PROMPTSEP: GENERATIVE AUDIO SEPARATION VIA MULTIMODAL PROMPTING
12370PROMPTSID: A SELF-ITERATIVE DISTILLATION FRAMEWORK FOR UNSUPERVISED ADAPTATION OF VISION-LANGUAGE MODELS
10622PROPAGATING SIMILARITY, MITIGATING UNCERTAINTY: SIMILARITY PROPAGATION-ENHANCED UNCERTAINTY FOR MULTIMODAL RECOMMENDATION
11269ProRank: Progressive Context Refinement for Reliable Retrieval-Augmented Generation
4190PROSE: Probabilistic Reinforcement Learning Optimized by Success Estimation for Stage-Aware Cotton Irrigation Scheduling
16799PROSODY-GUIDED HARMONIC ATTENTION FOR PHASE-COHERENT NEURAL VOCODING IN THE COMPLEX SPECTRUM
6016PROST-LLM: PROGRESSIVELY ENHANCING THE SPEECH-TO-SPEECH TRANSLATION CAPABILITY IN LLMS
9414PROTOLENS: A FINE-GRAINED AND ADAPTIVE INTERPRETATION FRAMEWORK FOR TIME SERIES DATA CLASSIFICATION WITH PROTOTYPES
17695PROTOSAM: PROTOTYPE-AUGMENTED PROMPT LEARNING FOR SCRIBBLE-SUPERVISED SEMANTIC SEGMENTATION WITH SAM
16615PROTOTYPE-BASED INFORMATION BOTTLENECK FOR EXPLAINABLE HETEROGENEOUS TEMPORAL GRAPH NEURAL NETWORKS
2772Prototype-Based Pseudo-Label Denoising for Source-Free Domain Adaptation in Remote Sensing Semantic Segmentation
15991PROTOTYPE-GUIDED CROSS-MODAL CONTRASTIVE LEARNING FOR CONTINUAL AUDIO-VISUAL SOUND SEPARATION
12143PROTOTYPICAL SELF-TRAINING WITH PROGRESS-AWARE UPDATE FOR SOURCE-FREE DOMAIN ADAPTATION IN SEMANTIC SEGMENTATION
4396Provable Unregistered Hyperspectral-Multispectral Image Fusion via Spectral Unmixing and Adversarial Learning
6211PROXICBO: A CONSENSUS-BASED METHOD FOR COMPOSITE OPTIMIZATION
6030PRSA: PREVENTING MALICIOUS SPEAKER RECOGNITION AND SPEECH SYNTHESIS SIMULTANEOUSLY WITH ADVERSARIAL EXAMPLES
17424P-SAM: Parallel Semantic Decoding of SAM for Domain-Driven Prompt Generation in Pore Segmentation
13077PSCC NET: A SIAMESE NETWORK FRAMEWORK FOR PSEUDO-VIDEO TEMPORAL MODELING AND SPATIOTEMPORAL FUSION IN REMOTE SENSING CHANGE DETECTION
19098PSELDNETS: PRE-TRAINED NEURAL NETWORKS ON A LARGE-SCALE SYNTHETIC DATASET FOR SOUND EVENT LOCALIZATION AND DETECTION
12476PSEUDO-SIAMESE NETWORK FOR PLANNING IN TARGET-ORIENTED PROACTIVE DIALOGUES
5381PSGait: Gait Recognition using Parsing Skeleton
12127PSGS: TEXT-DRIVEN PANORAMA SLIDING SCENE GENERATION VIA GAUSSIAN SPLATTING
12170PSQ-PMC: A Hardware-Friendly Quantization Scheme for Spike-Based Neural Radiance
2579PSTalker: Realistic 3D Talking Head Synthesis via a Semantic-aware Audio-Driven Point-based Shape
8445PTSE-T: PRESENTATION TARGET SPEAKER EXTRACTION USING UNALIGNED TEXT CUES
13750PULL-PUSHING CANNY EDGE EXTRACTION
14926PURIFICATION BEFORE FUSION: TOWARD MASK-FREE SPEECH ENHANCEMENT FOR ROBUST AUDIO-VISUAL SPEECH RECOGNITION
9813PV-ARCNET: AN ADAPTIVE DENOISE END-TO-END DEEP LEARNING MODEL FOR RAPID DC ARC DETECTION IN PHOTOVOLTAIC SYSTEMS
14022PWA: PROCESS-LEVEL WEB AGENT REINFORCEMENT LEARNING
4586PYRAMATCH: MULTI-HEAD PYRAMID SCAN FOR MAMBA-BASED IMAGE MATCHING
5914Q4Q: Quantum for Quantization in Large Language Models
14727QA-ReID: Quality-Aware Query-Adaptive Convolution Leveraging Fused Global and Structural Cues for Clothes-Changing ReID
14357QASTANET: A DNN-BASED QUALITY METRIC FOR SPATIAL AUDIO
7344QCA-RAG: EFFICIENT RETRIEVAL FOR LLMS VIA QUERY COMPLEXITY AWARENESS
12019QE-XVC: ZERO-SHOT CROSS-LINGUAL VOICE CONVERSION VIA QUERY-ENHANCEMENT AND CONDITIONAL FLOW MATCHING
12347QFOCUS: CONTROLLABLE SYNTHESIS FOR AUTOMATED SPEECH STRESS EDITING TO DELIVER HUMAN-LIKE EMPHATIC INTENT
18978QHARMA-GAN: QUASI-HARMONIC NEURAL VOCODER BASED ON AUTOREGRESSIVE MOVING AVERAGE MODEL
6899QPNET: QUATERNION PHYSICS-DRIVEN NEURAL NETWORK FOR UNDERWATER POLARIZED IMAGE RECOVERY
17557QP-SAM: Query-based Prompt Generation for Segment Anything Model in Urban Village Identification
2449QUADRATIC FLOW: CONSTANT ACCELERATION AS A PRIOR FOR LEARNING BETTER VELOCITY FIELD
12238QUADRATURE OVER-THE-AIR-COMPUTING FOR MULTIMODAL DUAL-STREAM SIGNAL PROCESSING
4985QUALITY ASSESSMENT OF NOISY AND ENHANCED SPEECH WITH LIMITED DATA: UWB-NTIS SYSTEM FOR VOICEMOS 2024
17188Quality enhancement for anomaly detection via injective linear attention
10051Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
15494Quantile Randomized Kaczmarz Algorithm with Whitelist Trust Mechanism
8504QUANTIZATION-BASED SCORE CALIBRATION FOR FEW-SHOT KEYWORD SPOTTING WITH DYNAMIC TIME WARPING IN NOISY ENVIRONMENTS
14673Quantum Adaptive Self-Attention for Financial Rebalancing: An Empirical Study on Automated Market Makers in Decentralized Finance
14599QUANTUM GASP CODES FOR PRIVATE DISTRIBUTED MATRIX MULTIPLICATION
13602Quantum Reinforcement Learning-Guided Diffusion Model for Image Synthesis via Hybrid Quantum-Classical Generative Model Architectures
6036QUANTUM-INSPIRED FREQUENCY ATTENUATION FOR ENHANCED TARGETED FABRICATION ATTACKS IN OBJECT DETECTION
6314QUERY-GUIDED PROTOTYPICAL LEARNING FOR FEW-SHOT DOCUMENT-LEVEL RELATION EXTRACTION
15404QUERY-SCALABLE FEW-SHOT SEMANTIC SEGMENTATION VIA IN-CONTEXT VARIATIONAL INFERENCE
11411Query-Specific Context-Enhanced Representation Learning for Temporal Knowledge Graph Reasoning
17057QUSR: QUALITY-AWARE AND UNCERTAINTY-GUIDED IMAGE SUPER-RESOLUTION DIFFUSION MODEL
11093Qwen-Simplify: Exploring Sentence Simplification via Qwen-based Reinforcement Learning Paradigm
10701R3G: A REASONING-RETRIEVAL-RERANKING FRAMEWORK FOR VISION-CENTRIC ANSWER GENERATION
10318R³-REC: REASONING-DRIVEN RECOMMENDATION VIA RETRIEVAL-AUGMENTED LLMS OVER MULTI-GRANULAR INTEREST SIGNALS
16478RADAREYE: ROBUST LIQUID LEVEL TRACKING USING MMWAVE RADAR IN ROBOTIC POURING
11706RADI: A RETRIEVAL-AUGMENTED DYNAMIC IN-CONTEXT LEARNING FRAMEWORK FOR AIGC IMAGE DETECTION
11845RADIANCE FIELD RENDERING WITH ADAPTIVE COMPACT KERNEL FOR NOVEL VIEW SYNTHESIS
18999RADIO MAP ESTIMATION VIA LATENT DOMAIN PLUG-AND-PLAY DENOISING
9743RADIOLUNADIFF: ESTIMATION OF WIRELESS NETWORK SIGNAL STRENGTH IN LUNAR TERRAIN
16753RADIOMETRIC VARIATION-AWARE ROBUST CHANGE DETECTION FOR MULTISPECTRAL SATELLITE IMAGES VIA CONVEX OPTIMIZATION
2177RaFD: Flow-Guided Radar Detection for Robust Autonomous Driving
10110RAFS: RETRIEVAL-AUGMENTED FEW-SHOT CAD SEGMENTATION
9799RAINFALL RETRIEVAL FROM WIRELESS LINKS VIA HYBRID LEARNING WITH DYNAMIC GATING APPROACH
15634RAME: ROLE-AWARE MULTI-VIEW EMBEDDING FOR TRANSFERABLE MULTI-AGENT REINFORCEMENT LEARNING
16203RAMTIME: RETRIEVAL-AUGMENTED MEMORY FOR TIME SERIES FORECASTING
1304RANDOM MATRIX-DRIVEN GRAPH REPRESENTATION LEARNING FOR BIOACOUSTIC RECOGNITION
16855Ranking the Impact of Contextual Specialization in Neural Speech Enhancement
10392RANKING-AWARE REINFORCEMENT LEARNING FOR ORDINAL RANKING
11339RANKNB: RANKING-AWARE DIRECT PREFERENCE OPTIMIZATION FOR ALIGNMENT OF A NANOBODY DIFFUSION MODEL
7685RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer
12704RAPTM: Retrieval-Augmented Prompting for Short-Text Topic Modeling
11108RASD-SR: A ROBUST ANOMALOUS SOUND DETECTION FRAMEWORK WITH SCORE RECALIBRATION
9979RATE-DISTORTION ANALYSIS OF OPTICALLY PASSIVE VISION COMPRESSION
5432Rationale-Augmented Fine-Grained Opinion Mining with Large Language Models
16861RATIONALE-GUIDED LEARNING FOR MULTIMODAL EMOTION RECOGNITION
14462RAVE: RATE ADAPTIVE VISUAL ENCODING FOR 3D GAUSSIAN SPLATTING
11646RAVE: Retrieval and Scoring Aware Verifiable Claim Detection
4948RAWMEF: MULTI-EXPOSURE FUSION FOR RAW HDR RECONSTRUCTION VIA HISTOGRAM ENHANCEMENT AND FREQUENCY ALIGNMENT
16446RBA: TOWARDS ROBUST AND STEALTHY BACKDOOR ATTACK IN FEDERATED LEARNING
9513RBAP AND RBAC: TWO NOVEL TYPES IN NONLINEAR RESIDUAL WEIGHTING FOR PHYSICS-INFORMED NEURAL NETWORKS
1244RBDA: BLACK-BOX DOMAIN ADAPTATION PERSON RE-IDENTIFICATION WITH TEST-TIME ORIENTATION-AWARE RARE ATTRIBUTE-GUIDED RE-RANKING
14839RCAL: Reinforced Cross-modal Alignment for Multimodal Sentiment Analysis with Sparse Visual Frames
16025RCLMATCH: REVISITING CONTRASTIVE LEARNING FOR SEMI-SUPERVISED SEMANTIC SEGMENTATION WITH CONSISTENCY REGULARIZATION
4835RDQ: Learnable Kronecker Rotation Matrix Decomposition for Efficient Large Language Model Quantization
14423RDSNET: EFFICIENT RADIAL-AWARE DEFORMABLE SAMPLING NETWORK FOR TOP-VIEW FISHEYE PEOPLE DETECTION
9990Read Before You Think: Mitigating LLM Comprehension Failures with Step-by-Step Reading
14293READING BETWEEN THE WAVES: ROBUST TOPIC SEGMENTATION USING INTER-SENTENCE AUDIO FEATURES
10592Readout-Side Bypass for Residual Hybrid Quantum-Classical Models
11149REALCOUNT: ROBUST OPEN-WORLD OBJECT COUNTING VIA DUPLEX CONTRASTIVE LEARNING
4512REAL-TIME ANCHOR NODE SELECTION FOR UNDERWATER TDOA LOCALIZATION: A CONVEX-OPTIMIZATION-DRIVEN NEURAL FRAMEWORK
5158REAL-TIME CARFAC COCHLEA MODEL ACCELERATION ON FPGA FOR UNDERWATER ACOUSTIC SENSING SYSTEMS
17121REAL-TIME MARKOV MODELING FOR SINGLE-PHOTON LIDAR: 1000× ACCELERATION AND CONVERGENCE ANALYSIS
16059REAL-TIME STREAMING MEL VOCODING WITH GENERATIVE FLOW MATCHING
6917Real-Time Thermal Anomaly Detection via Commodity WiFi Sensing for Autonomous IoT Systems
11798Real-World Adversarial Attacks on RF-Based Drone Detectors
14853REANISOGS: REFLECTION-AWARE ANISOTROPIC NEURAL GAUSSIANS VIA K-PLANES
13636REASON TO RETRIEVE: STRUCTURED CHAIN-OF-THOUGHT FOR TEXT-VIDEO RETRIEVAL
12405Reason, Construct, Rehearse: A Dynamic Framework for Generating Verifiable Behavior Trees in Open Worlds
11355REASONER-ASSISTED PLANNING: ENHANCE THE ABILITY OF GRAPH-RAG TO HANDLE COMPLEX QUESTIONS
9959Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition
9607REASONING DRIVEN CAPTIONS TO ASSIST NOISE ROBUST SPEECH EMOTION RECOGNITION
16858Rebalancing Sparse Tiny Objects for UAV Detection with a Plug-in Point-Distance Module
18035RECALL-LO: Enhancing Label-Only Membership Inference Against Large Language Models
19041RECOGNIZING ORNAMENTS IN VOCAL INDIAN ART MUSIC WITH ACTIVE ANNOTATION
12017RECOM: REALISTIC CO-SPEECH MOTION GENERATION WITH RECURRENT EMBEDDED TRANSFORMER
3501Reconstructing Topology-Consistent Face Mesh by Volume Rendering from Multi-View Images
3297Reconstruction of Spherical Sound Source Radiation Characteristics with Graph Signal Processing
18026RECOVERING COMPRESSED TENSORS USING DEEP FACTORIZATION MODELS
9566RECOVERING PERFORMANCE IN SPEECH EMOTION RECOGNITION FROM DISCRETE TOKENS VIA MULTI-LAYER FUSION AND PARALINGUISTIC FEATURE INTEGRATION
14461RECOVERING WASSERSTEIN DISTANCE MATRICES FROM FEW MEASUREMENTS
11534RECSUM: RECONSTRUCT COMPLEMENTARY AND CONSISTENT INFORMATION IN MULTIPLEX GRAPH FOR UNSUPERVISED SOCIAL SUMMARIZATION
14403RECURRENT CONFIDENCE CHAIN: TEMPORAL-AWARE UNCERTAINTY QUANTIFICATION IN LARGE LANGUAGE MODELS
19079Recurrent Neural Beamformer for Multichannel Speech Enhancement Under Adverse Noise Condition
9865Recursive state estimation via approximate modal paths
16690REDD-MFP: REGULARIZATION BY DIFFUSION DENOISING WITH MULTI-TIMESTEP FIXED-POINT OPTIMIZATION
4296ReDO: Online Data Selection via Joint Relevance and Diversity Optimization
14010REDUCING PROMPT SENSITIVITY IN LLM-BASED SPEECH RECOGNITION THROUGH LEARNABLE PROJECTION
2201REDUCING THE SIZE EXPANSION OF AN IMAGE ENCRYPTED BY PAILLIER’S CRYPTOSYSTEM
3276REDUNDANCY-AWARE FEATURE REFINEMENT FOR LIGHTWEIGHT IMAGE SUPER-RESOLUTION
14355REFERENCE MICROPHONE SELECTION FOR GUIDED SOURCE SEPARATION BASED ON THE NORMALIZED L-P NORM
10768REFERENCE-AWARE SFM LAYERS FOR INTRUSIVE INTELLIGIBILITY PREDICTION
2157Reference-Aware Two-Stream Detector for Traffic Accident Detection in Road Surveillance Videos
4279REFGEN: REFERENCE-GUIDED SYNTHETIC DATA GENERATION FOR ANOMALOUS SOUND DETECTION
17833REFINEBRIDGE: GENERATIVE BRIDGE MODELS IMPROVE FINANCIAL FORECASTING BY FOUNDATION MODELS
4941REFINING CROSS-MODAL CONTRADICTION VIA ITERATIVE FOCUSING FOR MULTIMODAL SARCASM DETECTION
16972Refining Open-Vocabulary Semantic Segmentation via Regional Semantics and Visual Prototypes
7510REFLECTING ON THE PAST: A MEMORY-AUGMENTED FRAMEWORK FOR SINGLE-POINT TARGET DETECTION
16177REFLECTIVE CONFIDENCE: CORRECTING REASONING FLAWS VIA ONLINE SELF-CORRECTION
14098Reflective Policy Optimization: Enhancing Reasoning in Large Language Models via Error Localization and Test-Time Self-Correction
19122REFRAMING AUDIO DATA ANNOTATION AS DOMAIN ADAPTATION PROCESS: A MULTI-INDICATOR ACTIVE LEARNING FRAMEWORK
14562Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding
12809REGFUSE: REGISTRATION MEETS INFRARED AND VISIBLE IMAGE FUSION
2211Region Energy-Aware Learning with Gaussian-Prior Convolution for Infrared Small Target Detection
17460REGION GROWING PHYSICS-INFORMED NEURAL NETWORK FOR WIND FIELD RECONSTRUCTION FROM SPARSE DATA
10256Region-Aware Brightness-Adaptive Enhancement Paradigm for Heterogeneous Illumination
17079REGULARIZED INVERSE FILTER DESIGN FOR RIGID SPHERICAL MICROPHONE ARRAY PROCESSING: LAPLACE- AND TIME-DOMAIN REPRESENTATIONS
3576Regularized Semi-Supervised Graph Purification Network for Financial Fraud Detection
16307REGULARIZING FUNCTIONAL VECTORS TO MITIGATE FORGETTING IN PROMPT TUNING OF VISION-LANGUAGE MODELS
6892Rehearsing High Confident Samples via Masked Optimal Transportation for Catastrophic Forgetting in Continual Named Entity Recognition
13724Reinforced Active Learning for Change Point Detection
5514REINFORCEMENT LEARNING DRIVEN FUSION: INTEGRATING VISUAL SEGMENTATION AND TEXTUAL SEMANTICS FOR SENTIMENT ANALYSIS
13470REINFORCEMENT LEARNING FOR GNSS SPOOFING DETECTION: A MULTI-CLASS DQN APPROACH WITH TEXBAT
10369REINFORCEMENT LEARNING FOR OPTIMIZED ADAPTIVE SAMPLING
16751RELALIGN: LLM-BASED RELATION-FOCUSED CONTRASTIVE PRE-TRAINING AND ALIGNMENT FOR OPEN RELATION EXTRACTION
10456RELATE: ENHANCE COMPOSED VIDEO RETRIEVAL VIA MINIMAL-REDUNDANCY HIERARCHICAL COLLABORATION
14730RELATIONAL DUAL-GRANULARITY DISTILLATION FOR TEXT-BASED PERSON RETRIEVAL
16534Relative Time Intervals Representation for Word-level Timestamping with Masked Training
18980Relaxation-Free Min-k-Partition for PCI Assignment in 5G Networks
10491RELIABLE DATABASE QUESTION ANSWERING WITH COLLABORATIVE AGENTS
2960RELIC: Residual flow matching for Learned Image Compression
6616RE-LL1: An Effective Regularized $(L,L,1)$-Tensor Decomposition Method For Video Background Modeling and Foreground Separation
8093RELO-IRR: REFLECTION-GUIDED LORA FRAMEWORK FOR IMAGE REFLECTION REMOVAL
18052RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement
18227REMOTE MULTI-PERSON BLOOD PRESSURE MONITORING USING MMWAVE RADAR
5056REMOTEDET-MAMBA: A HYBRID MAMBA-CNN NETWORK FOR MULTI-MODAL OBJECT DETECTION IN REMOTE SENSING IMAGES
11479Repeater Swarms as Enablers of Fluid Antenna Multiple Access
6222Repeater-Assisted Massive MIMO Full-Duplex Communications
12008Representation-Based Data Quality Audits for Audio
13600Representation-Diverse Self-Supervision for Cross-Domain Bioacoustic Learning in Low-Resource Settings
8864RESBIDET: EFFICIENT DUAL-BRANCH SMALL OBJECT DETECTION FOR UAVS UNDER RESOURCE-CONSTRAINED CONDITIONS
16072ResGaussian: 3D Gaussian Splatting with High-frequency Residual
10983RESIDUAL DIFFUSION WITH FUSED ACCELERATED SHARED DISTRIBUTION AND FREQUENCY-ADAPTIVE SELECTION FOR UNIFIED IMAGE RESTORATION
7687Residual Tokens Enhance Masked Autoencoders for Speech Modeling
15053RESIDUAL VECTOR QUANTIZATION FOR COMMUNICATION-EFFICIENT MULTI-AGENT PERCEPTION
6131RESIDUAL-ENHANCED ADAPTIVE KOOPMAN AUTOENCODER: A DEEP LATENT DYNAMICS MODEL FOR STOCK PREDICTION
12088Resolution-Progressive Diffusion Model for Pansharpening
15015RESOLVING LOW-RANK UPDATE LIMITATIONS FOR MEMORY-EFFICIENT VISUAL NEURAL NETWORK TRAINING
3053RESONATE-AND-FIRE NEURONS MEET EMG: ENHANCING GESTURE CLASSIFICATION WITH SPIKING NEURAL NETWORKS
15784Restricted Isometry for Variable-Density Continuous Frequency Sampling for Off-the-Grid Sparse Signals
12340RETHINKING CHANGE DETECTION: BENCHMARKING MULTI-AGENT REMOTE SENSING IMAGE CHANGE UNDERSTANDING
7528RETHINKING DATASET PRUNING: LET THE PIXEL SPEAK FOR ITSELF
15959Rethinking Entity Disambiguation in Complex Modalities
6196Rethinking Fusion: Disentangled Learning of Shared and Modality-Specific Information for Stance Detection
16691RETHINKING LARGE LANGUAGE MODELS FOR IRREGULAR TIME SERIES CLASSIFICATION IN CRITICAL CARE
11241RETHINKING MESSAGE PASSING IN DEEP UNFOLDING NETWORK FOR SNAPSHOT COMPRESSIVE IMAGING
5664RETHINKING MULTI-SCALE PERCEPTION FOR CAMOUFLAGED OBJECT DETECTION
15551RETHINKING MUSIC CAPTIONING WITH MUSIC METADATA LLMS
5407Rethinking Oversaturation in Classifier-Free Guidance via Low Frequency
13594RETHINKING PSEUDO-LABELING: A UNIFIED DUAL-CCL FRAMEWORK FOR ROBUST SEMI-SUPERVISED SEMANTIC SEGMENTATION
12083Rethinking Speech Representation Aggregation in Speech Enhancement: a Phonetic Mutual Information Perspective
15804RETLLM: TRAINING AND DATA-FREE MLLMS FOR MULTIMODAL INFORMATION RETRIEVAL
12192ReTools: Reflection-Enhanced Tool Invocation for Domain-Specific QA
13677RETRIEVAL AUGMENTED PRETRAINED TRANSFORMER FOR COMPETING RISKS SURVIVAL IN STATISTICAL SIGNAL PROCESSING
9709RETRIEVAL-AUGMENTED MULTI-AGENT MULTIMODAL FRAMEWORK FOR FAKE NEWS DETECTION
10189RETRIEVAL-BASED SPECULATIVE DECODING FOR AUTOREGRESSIVE SPEECH SYNTHESIS
15505RETRIEVEALL: A MULTILINGUAL NAMED ENTITY RECOGNITION FRAMEWORK WITH LARGE LANGUAGE MODELS
7557REVIG: A CNN-GNN HYBRID MODEL WITH DYNAMIC REVERSED AXIAL GRAPH CONSTRUCTION FOR VISION TASKS
1420Revisiting Backdoor Threat in Federated Instruction Tuning from a Signal Aggregation Perspective
17930REVISITING DIRECT SPEECH-TO-TEXT TRANSLATION WITH SPEECH LLMS: BETTER SCALING THAN COT PROMPTING?
13148REVISITING PROTOTYPES FOR OPEN-DOMAIN CONTINUAL LEARNING IN VISION-LANGUAGE MODELS
17587REVISITING THE CONNECTION BETWEEN MCCA-GENVAR AND IVA-G: ROLE OF ORTHOGONALITY AND DEFLATION
13016Revisiting the Seasonal Trend Decomposition for Enhanced Time Series Forecasting
12391REWARD-BASED EFFICIENT DEMONSTRATION SELECTION FOR IN-CONTEXT LEARNING
6967REWARD-GUIDED POLICY OPTIMIZATION WITH PHYSICAL PRIORS FOR UNDERWATER COLOR RESTORATION
4481RFGAT: GENERATIVE ADVERSARIAL TEACHER FOR CROSS-DOMAIN RFID ACTIVITY RECOGNITION
13340RFL-NLCP: ROBUST FEDERATED LEARNING WITH NON-IID DATA AND LIMITED CLIENT PARTICIPATION
9945RFM-EDITING: RECTIFIED FLOW MATCHING FOR TEXT-GUIDED AUDIO EDITING
11090RFSSM: A Recursive Frequency-Aware State Space Model for Pansharpening
5230RGSC: Retrieve and then Generate Image-text Pairs from Semantic Concepts for Unsupervised Vision-Language Pre-training
9875RHO-PERFECT: CORRELATION CEILING FOR SUBJECTIVE EVALUATION DATASETS
6349RHOSI: Efficient Anti-Jamming Resource Allocation with Holographic Surfaces in UAV-enabled ISAC
1409Riemannian adversarial attacks on Symmetric Positive Definite matrices
3101Riemannian optimization on the manifold of unitary and symmetric matrices with application to BD-RIS-assisted systems
5839RIR-FORMER: COORDINATE-GUIDED TRANSFORMER FOR CONTINUOUS RECONSTRUCTION OF ROOM IMPULSE RESPONSES
11155RISC-V Microarchitecture Information Leakage Attack via Transient Execution
5181RIS-ENHANCED INFORMATION-DECOUPLED SYMBIOTIC RADIO OVER BROADCASTING SIGNALS
11670RIS-FUSION: RETHINKING TEXT-DRIVEN INFRARED AND VISIBLE IMAGE FUSION FROM THE PERSPECTIVE OF REFERRING IMAGE SEGMENTATION
9701Risk level dependent Minimax Quantile lower bounds for Interactive Statistical Decision Making
4554RISKFUZZ: RISK-GUIDED FUZZING FOR DEEP LEARNING LIBRARIES
12929RITA: Enhancing the Region-Independence for Transferable Targeted Attacks
9931RLBR: REINFORCEMENT LEARNING WITH BIASING REWARDS FOR CONTEXTUAL SPEECH LARGE LANGUAGE MODELS
4267RLCSC: REINFORCEMENT LEARNING ENHANCED CHINESE SPELLING CORRECTION WITH GLYPH-PHONETIC SIMILARITY
3361RLSP-NER: REINFORCEMENT LEARNING OF SOFT PROMPTS FOR NER WITH LARGE LANGUAGE MODELS
3157RLSW: REINFORCEMENT LEARNING-GUIDED SAMPLE WEIGHTING FOR DYNAMIC EARLY-EXITING NETWORKS
3287RMCNet: Reflection and Moiré Removal for Virtual Production
17183RMODGDF: A ROBUST STFT-DERIVED FEATURE FOR MUSICAL INSTRUMENT RECOGNITION
16221RMT-KD: RANDOM MATRIX THEORETIC CAUSAL KNOWLEDGE DISTILLATION
6135RNT2Vec: A Road-Network-Aware Trajectory Representation Model for Robust Similarity Computation
4044RO-BENCH: LARGE-SCALE ROBUSTNESS EVALUATION OF MLLMS WITH TEXT-DRIVEN COUNTERFACTUAL VIDEOS
11369ROBUST ACCENT IDENTIFICATION VIA VOICE CONVERSION AND NON-TIMBRAL EMBEDDINGS
12493Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy
1857ROBUST AND LIGHTWEIGHT F0 ESTIMATION THROUGH MID-LEVEL FUSION OF DSP-INFORMED FEATURES
17402ROBUST BAYESIAN LAST LAYER MODELS WITH HEAVY-TAILED NOISE
14038ROBUST COVARIANCE MATRIX ESTIMATION FOR UNIFORM RECTANGULAR ARRAY
12608ROBUST CPD-BASED DOA ESTIMATION FOR ROTATING DISTRIBUTED ARRAY SYSTEMS UNDER INTER-NODE CALIBRATION ERROR
10604ROBUST DEEP CROSS-MODAL HASHING VIA DUAL CONSENSUS LEARNING
12497ROBUST DEEPFAKE AUDIO DETECTION VIA MULTI-LEVEL INTERMEDIATE FEATURE FUSION
18943Robust Diffusion Recursive Algorithm for Distributed Widely-Linear Exponential Functional Link Network
11956ROBUST DOA ESTIMATION FOR NON-COHERENT SUB-ARRAYS WITH NON-UNIFORM NOISE VARIANCES
17316ROBUST DOA ESTIMATION WITH UNKNOWN SOURCE NUMBER VIA VIRTUAL ULA BEAMFORMING
8352Robust Federated Fine-Tuning over Heterogeneous and Unreliable Communication Networks
10942ROBUST GROUNDING WITH MLLMS AGAINST OCCLUSION AND SMALL OBJECTS VIA LANGUAGE-GUIDED SEMANTIC CUES
17842ROBUST HYPERSPECTRAL ANOMALY DETECTION VIA CONSTRAINED DIFFERENCE-OF-CONVEX OPTIMIZATION UNDER MIXED NOISE CONTAMINATION
5797ROBUST IN-BED HUMAN POSE AND SHAPE ESTIMATION FROM PRESSURE IMAGES WITH CLINICAL AWARENESS
6589ROBUST IN-CONTEXT DEFENSES AGAINST JAILBREAKING OF LLMS VIA ROLE SPECIFICATION
4228ROBUST KALMAN FILTER FOR ADDITIVE GAUSSIAN-STUDENT'S T DISTRIBUTION
6436ROBUST KEYFRAME-CONSTRAINED SIGNAL MODELING FOR HUMAN MOTION SYNTHESIS
16990Robust MAE-Driven NAS: From Mask Reconstruction to Architecture Innovation
12130ROBUST MMSE PRECODING FOR OUT-OF-CLUSTER INTERFERENCE MITIGATION IN CELL-FREE MIMO SYSTEMS
3003ROBUST ONLINE OVERDETERMINED INDEPENDENT VECTOR ANALYSIS BASED ON BILINEAR DECOMPOSITION
16873Robust Open-World Object Detection through Evidential Learning
18871ROBUST PARAMETER ESTIMATION OF NON-LINEAR STATE SPACE MODELS USING A DIVERGENCE-BASED ESTIMATOR
17129Robust Personalized Recommendation under Hidden Confounding in MNA
8069ROBUST PROVABLY SECURE IMAGE STEGANOGRAPHY VIA LATENT ITERATIVE OPTIMIZATION
11606ROBUST RUMOR DETECTION ON SOCIAL MEDIA WITH DYNAMIC CONTRASTIVE LEARNING
17975ROBUST SENTIMENT ANALYSIS VIA IMPORTANCE-GUIDED AUGMENTATION AND CONSISTENCY REGULARIZATION
18881Robust Single-Shot 3D Reconstruction by Sparse-to-Dense Stereo Matching and Spline Function Based Parallax Modeling
13467Robust Tensor Decomposition for Joint multiview Graph Learning and Community Detection
5074Robust Test-time Adaptation by Unifying Principled Priors and Adaptive Feature Regularization
7946ROBUST UNCERTAINTY ESTIMATION UNDER DISTRIBUTION SHIFT VIA DIFFERENCE RECONSTRUCTION
4299ROBUST UNSUPERVISED SET-LEVEL ANOMALY DETECTION FOR SMALL TEST-TIME SETS
15119ROBUST, ONLINE, AND ADAPTIVE DECENTRALIZED GAUSSIAN PROCESSES
17981ROBUSTIFYING GRAPH LAPLACIAN REGULARIZATION AGAINST EDGE WEIGHT UNCERTAINTIES: AN INFIMAL CONVOLUTION APPROACH
12036ROBUSTNESS OF AUDIO CLASSIFICATION MODELS AGAINST FILTER PERTURBATIONS
10739RoCo: Robust Code for Fast and Effective Proactive Defense against Voice Cloning Attack
17485ROLE-RL: ONLINE LONG-CONTEXT PROCESSING WITH ROLE REINFORCEMENT LEARNING FOR MULTIPLE LLMS IN THEIR OPTIMAL ROLES
11321ROLE-SPECIALIST AND CONFIDENCE-SELECTIVE MULTI-TEACHER COLLABORATIVE DISTILLATION FOR MULTI-SOURCE DOMAIN ADAPTATION
13185RoPFL: Robust and Privacy-Preserving Decentralized Federated Learning Framework
12746ROTATIONALLY-INVARIANT AMP FOR COMPRESSED SENSING WITH MULTIPLE MEASUREMENT VECTORS
15064ROTATION-DRIVEN FLEXIBLE SPARSE ARRAYS FOR HIGH-RESOLUTION DOA ESTIMATION
1797Rotation-Invariant Point Cloud Segmentation via Neural Tangent Kernel-based Angle Selection
5780Routing-Guided Multi-Expert LoRA Fine-Tuning for Image Restoration
17108ROUTINGLLM: BOOSTING LLM PERFORMANCE FOR NETWORK ROUTING
14984ROVLM: REGION-AWARE OPTIMAL VISION–LANGUAGE ALIGNMENT FOR ZERO-SHOT RECOGNITION
13987RPFE: A RANGE-VIEW ENHANCED PILLAR FEATURE ENCODING METHOD FOR LIDAR-BASED 3D OBJECT DETECTION
7861RPM-NET: RECIPROCAL POINT MLP NETWORK FOR UNKNOWN NETWORK SECURITY THREAT DETECTION
11256RRPO: ROBUST REWARD POLICY OPTIMIZATION FOR LLM-BASED EMOTIONAL TTS
13936RSC: Robust Self-correcting Watermark Model Based on Channel Control
9495RSCC-Diff: A Novel Generative Paradigm Empowers Differential-Loss-Guided MLLM for Remote Sensing Change Captioning
15045RSC-COT: VISUAL-COT REASONING AND REINFORCED OPTIMIZATION FOR REMOTE SENSING CHANGE CAPTIONING
13912RSCOT: A RICH SEMANTIC CHAIN-OF-THOUGHT FOR REMOTE SENSING VQA BASED ON MODULE ANALYSIS AND MODEL COLLABORATION
4016RSHR: HIERARCHICAL VISUAL REPRESENTATION AND STATE-SPACE REASONING FOR REMOTE SENSING VISUAL QUESTION ANSWERING
5327RSoRA: Spiking-Inspired Low-Rank Adaptation for Noise-Robust Vision Transformers
5127RTNLW: REVERSIBLE AND TAMPER-AWARE NATURAL LANGUAGE WATERMARKING SCHEME FOR MULTI-STAGE TRANSMISSION
12260RTPNET: ROBUST DETECTION OF TRAFFIC PARTICIPANTS IN COMPLEX DRIVING SCENARIOS
13277RugKeeper: A Multi-Agent LLM Framework for Rug Pull Token Detection
5644RUMOR SPOTTER: A NOVEL TEXTUAL RUMOR DETECTION MODEL INTEGRATING RUMOR CLASSIFICATION AND MARKING
3997S$^2$Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion
14758S2S: Sentence-to-section Training with Multi-task Learning for LLM-Driven Song Generation
13905S2TX: CROSS-ATTENTION MULTI-SCALE STATE-SPACE TRANSFORMER FOR TIME SERIES FORECASTING
4802S²VD: A SUBSPACE-AWARE SVD METHOD FOR EFFICIENT LLM COMPRESSION
1930S3-3DGS: STEERING SPHERICAL-HARMONIC SUBSPACES FOR SECURE 3DGS WATERMARKING
4537S3G: STOCK STATE SPACE GRAPH FOR ENHANCED STOCK TREND PREDICTION
8951SAD-SAM: MULTIDIMENSIONAL DISTRIBUTION-ALIGNED SPATIAL-AWARE DISTILLATION FOR SEGMENT ANYTHING MODEL
2670SAEC: Scene-Aware Enhanced Edge-Cloud Collaborative Industrial Vision Inspection with Multimodal LLM
13964SAFEGEN: SCULPTING REPRESENTATION SPACE FOR SAFER AND SMARTER LLMS
10887SAFEGRAD: GRADIENT SURGERY FOR SAFE LLM FINE-TUNING
12311SAFE-IMM: ROBUST AND LIGHTWEIGHT RADAR-BASED OBJECT TRACKING ON MOBILE PLATFORMS
16104SAFETR: VERIFIABLE SEMANTIC TREE-RING WATERMARK FOR DIFFUSION MODEL AGAINST FORGERY ATTACKS
2209Safety Alignment Should Be Made More Than Just A Few Attention Heads
9739SAFETY ANCHOR-GUIDED ADAPTIVE BIAS DECAY FOR JAILBREAK DEFENSE
10710SAGA-SR: SEMANTICALLY AND ACOUSTICALLY GUIDED AUDIO SUPER-RESOLUTION
9135SAGE: SEMANTIC-AWARE SHARED SAMPLING FOR EFFICIENT DIFFUSION
12500SAGETRACK: ADAPTIVE UAV MULTI-OBJECT TRACKING WITH TEMPORAL ALIGNMENT AND SCENE-AWARE POLICIES
3106SAICR: SYMMETRIC ALIGNMENT AND INTRA-CLASS CONTRASTIVE REFINEMENT FOR REFERRING IMAGE SEGMENTATION
9479SAIL: SYNERGISTIC ANOMALY-INFORMED LEARNING FOR DEEPFAKE DETECTION WITH CLIP
2078SAILING BEYOND SCARCITY: TASK-DRIVEN DIFFUSION MARINE DATA AUGMENTATION FOR MARINE OBJECT DETECTION
5204SAIP: A PLUG-AND-PLAY SCALE-ADAPTIVE MODULE IN DIFFUSION-BASED INVERSE PROBLEMS
2563SAKA: SPATIALLY-ADAPTIVE AND KEYFRAME-ANCHORED GRAPH NETWORK FOR CONTINUOUS SIGN LANGUAGE RECOGNITION
14489SALAD-VAE: Semantic Audio Compression with Language-Audio Distillation
12027SALIENCY-GUIDED MULTI-SCALE FEATURE ENHANCEMENT NETWORK FOR INFRARED AND VISIBLE IMAGE FUSION
4477SALM: STOCHASTIC ATTENTION WITH LEARNABLE MEMORY FOR MULTIVARIATE TIME SERIES ANOMALY DETECTION
17605SAM Meets Mask2Former: A SegMoE-Hybrid Model for Semantic Segmentation
6069SAMAS: A SPECTRUM-GUIDED MULTI-AGENT SYSTEM FOR ACHIEVING STYLE FIDELITY IN LITERARY TRANSLATION
2758SAM-DRIVEN MULTI-SCALE GATED NETWORK FOR MULTIMODAL REMOTE SENSING IMAGE SEGMENTATION
3868SAME: SIMILARITY-AWARE MIXTURE OF EXPERTS FOR GENERALIZED FACE ANTI-SPOOFING
8193SAM-GT: SAM AS A GENERAL TEACHER ENHANCES MEDICAL IMAGE SEGMENTATION BY DISTILLING ONLY WHAT MATTERS
6037SAM-GUIDED MULTI-VIEW FUSION FOR WEAKLY SUPERVISED 3D POINT CLOUD SEGMENTATION
14042SAMM: Segment Anything Mamba Model for General Medical Image Segmentation
2539SAMPLE EFFICIENT EXPERIENCE REPLAY IN NON-STATIONARY ENVIRONMENTS
6682SAMPLE WEIGHT AVERAGING FOR STABLE PREDICTION
18885SAMPLING AND UNIQUENESS SETS IN GRAPHON SIGNAL PROCESSING
18611SAMPLING-RATE-AGNOSTIC SPEECH SUPER-RESOLUTION BASED ON GAUSSIAN PROCESS DYNAMICAL SYSTEMS WITH DEEP KERNEL LEARNING
12967SANDWICHED IMAGE COMPRESSION: THE IMPACT OF DIFFERENTIABLE JPEG QUANTIZATION
16963SAR SHIP WAKE DETECTION BASED ON SIAMESE NETWORK WITH MAMBA CROSS-DOMAIN FEATURE FUSION
18162SAR-CAPTION RANKER: OPTIMIZING AUTOMATIC SAR IMAGE DESCRIPTIONS VIA RLAIF
11684SARD: Similarity-Aligned Reminiscence and Distillation for Exemplar-Free Class-Incremental Learning
4625SARNET: A SPIKE-AWARE CONSECUTIVE VALIDATION FRAMEWORK FOR ACCURATE REMAINING USEFUL LIFE PREDICTION
7521SA-SSL-MOS: SELF-SUPERVISED LEARNING MOS PREDICTION WITH SPECTRAL AUGMENTATION FOR GENERALIZED MULTI-RATE SPEECH ASSESSMENT
11359SATBADEDIT: TOWARDS EFFICIENT AND ROBUST MULTI-TRIGGER BACKDOOR INJECTION IN LARGE LANGUAGE MODELS
14460SATURATION-AWARE SNAPSHOT COMPRESSIVE IMAGING: THEORY AND ALGORITHM
12841SAUNA: SONG-LEVEL AUDIO & USER-LISTENING DATA NEURAL ALIGNMENT
12819SAVER: STEGANOGRAPHY-AGNOSTIC VIDEO ERASURE AND RECONSTRUCTION
10607SAVGBENCH: BENCHMARKING SPATIALLY ALIGNED AUDIO-VIDEO GENERATION
17993SCAF: SOFT CLUSTER-AWARE FUSION WITH AFFINITY ALIGNMENT FOR MULTIVARIATE TIME SERIES FORECASTING
15056SCALABLE ALGORITHMS FOR TREE CONNECTIVITY MAXIMIZATION
14551SCALABLE BAYESIAN FINE-TUNING OF LLMS FOR MULTI-OBJECTIVE BAYESIAN OPTIMIZATION
14238SCALABLE BEAMFORMING FOR VERY LARGE ANTENNA ARRAYS WITHOUT CSI
17731SCALABLE EVALUATION FOR AUDIO IDENTIFICATION VIA SYNTHETIC LATENT FINGERPRINT GENERATION
9352SCALABLE HESSIAN-FREE PROXIMAL CONJUGATE GRADIENT METHOD FOR NONCONVEX AND NONSMOOTH OPTIMIZATION
4012SCALABLE INFORMATION LEAKAGE DETECTION IN IOT WEB INTERFACES
14276Scalable LLM-Augmented DRL with Context-Aware Prompt Learning for O-RAN Slicing
13085SCALE: Semantic Chunking And Label-delay Engine for Streaming Speech-LLM
15021SCALE-AWARE SELF-SUPERVISED LEARNING FOR SEGMENTATION OF SMALL AND SPARSE STRUCTURES
14416Scale-covariant spiking wavelets
7504SCALEGS: TOWARDS SCALABLE AND EFFICIENT 3D GAUSSIAN SPLATTING
17959ScaleMamba: Multi-scale Context Fusion for Training-Free Open-Vocabulary Remote Sensing Segmentation
17632Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models
6485SCALING AUDIO-VISUAL QUALITY ASSESSMENT DATASET VIA CROWDSOURCING
16450SCALING MULTI-TALKER ASR WITH SPEAKER-AGNOSTIC ACTIVITY STREAMS
2969SCALING SENTIMENT STRENGTH VIA SENTIMENT MIXING
15325Scaling Spoken Language Models with Syllabic Speech Tokenization
1017SCATTERFUSION: A HIERARCHICAL SCATTERING TRANSFORM FRAMEWORK FOR ENHANCED TIME SERIES FORECASTING
16039SCATTERING MECHANISM-AWARE DEEP LEARNING FRAMEWORK FOR POLARIMETRIC SAR DECOMPOSITION
13102SCATTERING TRANSFORMS FOR HETEROPHILIC GRAPHS USING COMPLEMENTARY BASE FILTERS
17117SCENE: SEMANTIC-AWARE CODEC ENHANCEMENT WITH NEURAL EMBEDDINGS
13225SCENERAG: SCENE-LEVEL RETRIEVAL-AUGMENTED GENERATION FOR VIDEO UNDERSTANDING
13955SCFusion: Semantic and Contextual Fusion for Document-level Event Argument Extraction
11878SCHK-HTC: SIBLING CONTRASTIVE LEARNING WITH HIERARCHICAL KNOWLEDGE-AWARE PROMPT TUNING FOR HIERARCHY TEXT CLASSIFICATION
15541SCHROMIND: MITIGATING HALLUCINATIONS IN MULTIMODAL LARGE LANGUAGE MODELS VIA SOLVING THE SCHRODINGER BRIDGE PROBLEM
15713SCI-GR: SEQUENTIAL CONTROLLABLE INPAINTING-BASED GENERATIVE REPLAY FOR CLASS-INCREMENTAL OBJECT DETECTION
16249SCLVD: SOURCE CODE VULNERABILITY DETECTION VIA SEMANTIC CONTRASTIVE LEARNING
12412SCORE-GUIDED MOTION PLANNING: LEARNING THE GRADIENT FIELD OF PROMISING REGIONS
16285SCORENF: SCORE-BASED NORMALIZING FLOWS FOR SAMPLING UNNORMALIZED DISTRIBUTIONS
10306SCORE-USOD: A GENERATIVE APPROACH TO UNDERWATER SALIENCY DETECTION
15384SD2-MAMBA: SEMANTIC-DENSITY-DRIVEN MAMBA FOR ROBUST DOMAIN GENERALIZATION UNDERWATER OBJECT DETECTION
5292SDFM: Spatial-dominated Flow Matching for Stochastic Human Motion Prediction
1850SDGF: Fusing Static and Multi-Scale Dynamic Correlations for Multivariate Time Series Forecasting
17807SDR-STE: SYNERGISTIC DISENTANGLEMENT AND REFINEMENT FOR PHOTOREALISTIC SCENE TEXT EDITING
10771SDRTRANS-FUSE: IMAGE FUSION METHOD BASED ON DEPTHWISE SEPARABLE CONVOLUTION-ENHANCED TRANSFORMER
13251Se3DGSMark: Securing Frequency-Based Watermarking with Token Chunking for 3DGS
12708Seaf: Semantic-aware Frame Selection for Long-form Video Understanding
6117Sealing Text-to-Image Models with Signet: A Lossless and Effective Watermarking Framework
10515SEAM-Former: Infusing Waveform Semantics into Transformers for Explainable Myocardial Infarction Localization via 12-lead ECG
9780SEARAG: SEMANTIC ENTROPY-GUIDED ADAPTIVE RETRIEVAL FOR MULTI-HOP QUESTION ANSWERING
16055SEARCH-ON-GRAPH: EFFECTIVE AND RETRIEVAL-ENHANCED SEARCH ON KNOWLEDGE GRAPH FOR FAITHFUL LARGE LANGUAGE MODEL REASONING
17694Secondary source placement for sound field control based on Ising model
12878Second-order optimization of variable projection SVM models and road abnormality detection
16261SECURE BACKSCATTERING WITH NON-COLLUDING JAMMER AND EAVESDROPPER
2336SecureHDC-FL: Addressing Data Heterogeneity in Encrypted Federated Hyperdimensional Computing
6025Securing INR-Based Steganography with Quantum Circuit-driven Weight Initialization
17084SED: STRUCTURAL ENTROPY BASED SPEECH DISCRETIZATION FOR DISCRETE TOKEN-BASED ASR
11276SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper
15332SEDPA: SVD-ENHANCED DUAL PATH ATTENTION FOR EFFICIENT INFERENCE
10722SEE NO EVIL: SEMANTIC CONTEXT-AWARE PRIVACY RISK DETECTION FOR AR
5130SEE WHAT YOU NEED: QUERY-AWARE VISUAL INTELLIGENCE THROUGH REASONING-PERCEPTION LOOPS
1968SEEING BEYOND DARKNESS: MULTI-DOMAIN TRANSFORMER FOR LOW-LIGHT IMAGE ENHANCEMENT
16730SEEING IS BELIEVING: COMPREHENSIVE SELF-REFLECTIVE EVALUATION SYSTEM FOR LARGE MULTI-MODAL MODELS
11249SEEING YOU IN THE NOISE: ACHIEVING DEGRADED OBJECT DETECTION WITH POSITIVE TEXT GUIDANCE
16298SEEM: EXPLOITING BLACK-BOX TEXT ATTACKS TO MANIPULATE TOOL SELECTION
13627SEFO: SEMANTIC-ENHANCED FUSION FOR ONLINE 3D INSTANCE SEGMENTATION
13249SEGMENTWISE PRUNING IN AUDIO-LANGUAGE MODELS
11328SELD-MoHA: A Fine-Tuning Method with the Mixture of Heterogeneous Adapters for Sound Event Localization and Detection
15432Selective Hub Fusion with Modality-Heterogeneous Experts for Multimodal Emotion Recognition
2564Selective Poisoning: Enhancing Backdoor Attacks on Graph Neural Networks with Limited Samples
3456Self-Attention Decomposition for Training Free Diffusion Editing
3746SELF-CALIBRATING INTEGRATE-AND-FIRE TIME ENCODING MACHINE
7736SELF-CHILL: A DIVERGENCE-CONVERGENCE FRAMEWORK FOR MULTI-PATH GENERATION IN LLMS
15956SELF-DISTILLATION PROTOTYPE LEARNING FOR WEAKLY SUPERVISED SEMANTIC SEGMENTATION
5607SELF-PACED LEARNING FOR ACTIVE VISUAL GROUNDING IN ROBOTIC SCENARIOS
15092SELF-PROMPTING WITH DEMO AUGMENTATION FOR OPEN-VOCABULARY ARGUMENT ROLE PREDICTION
15099SELF-SUPERVISED DEPTH MAP SUPER-RESOLUTION VIA SPECTRAL-BIAS-AWARE KOLMOGOROV-ARNOLD NETWORK
6749SELF-SUPERVISED DEPTH-CONSISTENCY FOR MESH RECONSTRUCTION IN THE LOOP
5720SELF-SUPERVISED MONOCULAR DEPTH ESTIMATION VIA RGB-TO-THERMAL CROSS-MODAL DISTILLATION WITH CONFIDENCE AWARENESS
15028SELF-SUPERVISED NOTE TRACKING AND MULTI-PITCH ESTIMATION VIA RECONSTRUCTION-BASED LEARNING
3270SEMAMIL: SEMANTIC-AWARE MULTIPLE INSTANCE LEARNING WITH RETRIEVAL-GUIDED STATE SPACE MODELING FOR WHOLE SLIDE IMAGES
17134SEMANTIC ALIGNMENT FRAMEWORK WITH DISTILLED SOFT LABELS FOR IMAGE-TEXT RETRIEVAL
5638SEMANTIC ALIGNMENT INCOMPLETE MULTI-MODAL HASHING
1974SEMANTIC ANCHOR TRANSFER FROM SHORT TO LONG SPEECH IN A DISTILLATION-BASED SUMMARIZATION FRAMEWORK
8175SEMANTIC AND TEMPORAL-AWARE DISTILLATION FOR CLASS-INCREMENTAL LEARNING
3885SEMANTIC COMMUNICATIONS VIA DENOISING DIFFUSION AUTOENCODER MODELS
14938SEMANTIC MINING AND CROSS-CENTER SYNERGY FOR CROSS-MODAL PERSON RE-IDENTIFICATION
1922Semantic Pilot Design for Data-Aided Channel Estimation Using a Large Language Model
1916SEMANTIC REFORMULATION ENTROPY FOR ROBUST HALLUCINATION DETECTION IN QA TASKS
2375Semantic Relation-Enhanced CLIP Adapter for Domain Adaptive Zero-Shot Learning
15799Semantic Token-Guided Generative Latent Coding for Ultra-Low Bitrate Image Compression
11511SEMANTICACHE: EFFICIENT KV CACHE COMPRESSION VIA SEMANTIC CHUNKING AND CLUSTERED MERGING
15718SEMANTIC-AWARE 3D SCENE DECOMPOSITION USING SUPERQUADRICS
3636SEMANTIC-AWARE ADDRESS SANITIZATION WITH METRIC DIFFERENTIAL PRIVACY
16179Semantic-Aware Discrete Online Cross-Modal Hashing
14993SEMANTIC-AWARE UAV COMMAND AND CONTROL FOR EFFICIENT IOT DATA COLLECTION
6323SEMANTIC-AWARE UAV-ASSISTED DATA COLLECTION IN WPT-ENABLED SPACE–AIR–SEA INTEGRATED NETWORKS
3682SEMANTIC-GUIDED MODAL ALIGNMENT FOR MULTIMODAL CARDIOVASCULAR DISEASE DETECTION
9470Semantic-Guided Pseudo-Feature Attention Network For Audio-Visual Zero-Shot Learning
10078SEMANTIC-GUIDED SLOW-FAST PRUNING OF VISUAL TOKENS FOR VISION-LANGUAGE MODELS
1896SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems
14339SEMI-SUPERVISED GNN FOR SOUND SOURCE LOCALIZATION WITH PREDICTION INTERVALS
12930Sensor Array and Camera Fusion via Unbalanced Optimal Transport for 3D Source Localization
9222SENTINEL MODEL AS A TRY: A DUAL-MODEL ARCHITECTURE FOR DEFENDING AGAINST DATA EXTRACTION ATTACKS IN RETRIEVAL-AUGMENTED GENERATION
12416SEPARABLE DELAY AND DOPPLER ESTIMATION IN PASSIVE RADAR
18518SEPARATE THIS, AND ALL OF THESE THINGS AROUND IT: MUSIC SOURCE SEPARATION VIA HYPERELLIPSOIDAL QUERIES
16521SEP-IQA: HARNESSING MLLM SEMANTIC PREFERENCES FOR TRAINING-FREE IMAGE QUALITY ASSESSMENT
14502SEP-ST: Incorporating Speech Entity Prompt into Large Language models for speech translation
14432Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study
10015Sequential and Simultaneous Optimization of Microphone Array Geometry and Region-of-Interest Beamforming
2370SEQUENTIAL MULTIPLE TESTING WITH THREE HYPOTHESES AND KNOWN NUMBER OF STREAMS FOLLOWING EACH HYPOTHESIS
11960SESSION-LEVEL SPOKEN LANGUAGE ASSESSMENT WITH MULTIMODAL FOUNDATION MODEL VIA MULTI-TARGET LEARNING
15101SF-CLIP: CLIP-BASED ARBITRARY STYLE IMAGE RETRIEVAL WITH STYLE AND FINE-GRAINED SEMANTIC ENHANCEMENT
6718SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration
13801SFENET: SPATIAL–FREQUENCY ENTANGLEMENT NETWORK FOR GENERALIZABLE DEEPFAKE DETECTION
6529SFGNET: SEMANTIC AND FREQUENCY GUIDED NETWORK FOR CAMOUFLAGED OBJECT DETECTION
3475SFKR: Semantic-Freezing for Knowledge-aware Recommendation
6388SFL-GS: Semantic-Aware Feature Learning for 3D Gaussian Splatting
8825SFLUT: EFFICIENT STYLE FUSION LOOKUP TABLE FOR IMAGE ENHANCEMENT
5678SFM-TTS: LIGHTWEIGHT AND RAPID SPEECH SYNTHESIS WITH FLEXIBLE SHORTCUT FLOW MATCHING
1937SF-MVD: SENSOR FAILURE-AWARE MULTI-MODAL VEHICLE DETECTION WITH LIDAR-RADAR FUSION IN FOGGY WEATHER
5337SFN-NET: INTEGRATING SPATIAL-FREQUENCY FEATURE FUSION INTO DEEP UNFOLDING NETWORK WITH NESTA FOR COMPRESSIVE SENSING
3367SFQA: A Comprehensive Perceptual Quality Assessment Dataset for Singing Face Generation
8708SGAC: A SCENE GRAPH-GUIDED VISION-LANGUAGE UNDERSTANDING FRAMEWORK FOR ACTION REASONING
15662SGA-GNN: Semantic-Guided Adaptive Graph Neural Network for Cold-Start Multimodal Recommendation
2273SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians
12186SGTE-SNN: Similarity-Guided Temporal Encoding for Radar Emitter Denoising and Recognition
1517ShapeVVE: Variable Evaluator for Multivariate Time Series Shapelets Extraction
11583Shapley Features for Robust Signal Prediction in Tactile Internet
18329SHARED REPRESENTATION LEARNING FOR REFERENCE-GUIDED TARGETED SOUND DETECTION
13215SHARED-WEIGHTS EXTENDER AND GRADIENT VOTING FOR NEURAL NETWORK EXPANSION
5145SHARK: MODELING SEMANTIC HIERARCHY OF MEDICAL CODE VIA RESIDUAL K-MEANS QUANTIZATION
4576SHARPNESS-AWARE MINIMIZATION WITH Z-SCORE GRADIENT FILTERING
14091SHEAF LAPLACIAN LOCALIZATION FOR SUBGRAPH SIGNAL DIFFUSION
10765SHIELDRAG: PRIVACY-PRESERVING APPROXIMATE NEAREST NEIGHBOR SEARCH FOR RETRIEVAL-AUGMENTED GENERATION SYSTEMS
9567Shift- and stretch-invariant non-negative matrix factorization with an application to brain tissue delineation in emission tomography data
5566Shortcut Flow Matching for Speech Enhancement: Step-Invariant flows via single stage training
6329SHORT-SEGMENT SPEAKER VERIFICATION WITH PRE-TRAINED MODELS AND MULTI-RESOLUTION ENCODER
9349SHRINKV: KEY-VALUE CACHE COMPRESSION WITH PROGRESSIVE HIDDEN STATES SHRINKING TO MITIGATE PREFILLING LATENCY
18967Shuffled Linear Regression via Spectral Matching
11640SIB-VMAMBA: SELF-SUPERVISED INFRARED DYNAMIC RANGE COMPRESSION VIA STRUCTURED INFORMATION BOTTLENECK
12720Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-scale Dataset Cleansing
14996SIE3D: SINGLE-IMAGE EXPRESSIVE 3D AVATAR GENERATION VIA SEMANTIC EMBEDDING AND PERCEPTUAL EXPRESSION LOSS
6294Sieve: Computationally Efficient Hierarchical Adversarial Feature Detection in Multi-Agent Perception
15388SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models
18868SIGNAL RECOVERY USING A SPIKED MIXTURE MODEL
15401SIGNAL-DRIVEN JOINT SAFETY–COMFORT OBJECTIVE FOR REAL-TIME TRAJECTORY REPLANNING ON RUTTED ROADS
17435SIGNED GRAPH UNLEARNING
3933SIGN-SALD: A SKELETON-AWARE LATENT DIFFUSION MODEL FOR TEXT-DRIVEN SIGN LANGUAGE PRODUCTION
11320SIMBA: DISENTANGLING GLOBAL-LOCAL AND DYNAMIC DEPENDENCIES FOR TIME SERIES FORECASTING
11509SIMILARITY-AWARE ANISOTROPIC SHARPENING FOR TRAINING-FREE TEST-TIME ADAPTATION WITH DISTRIBUTIONAL DISCRIMINANTS
4765SIM-MSTNET: SIM2REAL BASED MULTI-TASK SPATIOTEMPORAL NETWORK TRAFFIC FORECASTING
1696Simple Aggregation Is Not Enough: Temporal Knowledge Graph Forecasting via Decentralized Multi-Chain Reasoning
13429SIMPLICIAL GAUSSIAN MODELS: REPRESENTATION AND INFERENCE
3791SIMTOKEN: A SIMPLE BASELINE FOR REFERRING AUDIO-VISUAL SEGMENTATION
13795SIMULATORCODER: DNN ACCELERATOR SIMULATOR CODE GENERATION AND OPTIMIZATION VIA LARGE LANGUAGE MODELS
14052SIMULSENSE: SENSE-DRIVEN INTERPRETING FOR EFFICIENT SIMULTANEOUS SPEECH TRANSLATION
15986Sim-Weather: Efficient Similar Weather Retrieval with Physically Aligned Fingerprints
3763SINDIFF: SPOKEN-TO-SIGN LANGUAGE GENERATION WITH TRANSFORMER-BASED DIFFUSION MODEL
12780Sing What You Fit: A Perception-based Dataset and Benchmark for Vocal-Song Suitability Analysis
16576Sing2Song: An Accompaniment Generation System based on Solo Singing
6268Single Image Super-Resolution with Selective Perceptual Refinement and Distribution-Constancy Ranking
14178SINGLE VIEW CAMERA-BASED DYNAMIC AIRFLOW SENSING
1446Single-DMRS based CFO Estimation for Low Latency Cellular Communications
14274SINGLE-MICROPHONE AUDIO POINT SOURCE DISCRIMINATIVE LOCALIZATION FROM REVERBERATION LATE TAIL ESTIMATION
14322SINGLE-STEP CONTROLLABLE MUSIC BANDWIDTH EXTENSION WITH FLOW MATCHING
17889SingMOS-Pro: A Comprehensive Benchmark for Singing Quality Assessment
18179SIREN: SPATIALLY-INFORMED RECONSTRUCTION OF BINAURAL AUDIO WITH VISION
11918SIRUP: A DIFFUSION-BASED VIRTUAL UPMIXER OF STEERING VECTORS FOR HIGHLY-DIRECTIVE SPATIALIZATION WITH FIRST-ORDER-AMBISONICS
8651SKETCH AND VECTOR-GUIDED 3D SHAPE GENERATION VIA CROSS-MODAL DIFFUSION
11429SkyMatte: a High-Quality Dataset for Improving Sky Image Matting
10978SLAM: Sequential Learning Signal Modeling for Multi-Concept Knowledge Tracing
9906SLAP: SCALABLE LANGUAGE-AUDIO PRETRAINING WITH VARIABLE-DURATION AUDIO AND MULTI-OBJECTIVE TRAINING
4616Sliding-Cache VLA: Training-Free Acceleration of Vision Language Action Models via Foreground-Background Decoupling
2755SLM-SS: Speech Language Model for Generative Speech Separation
9842SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models
15426SLOT FILLING AS A REASONING TASK FOR SPEECHLLMS
15516SLTN: Shadow and Lighting Transformation Network for Efficient 3D Shape Recognition
15504SMALL-SCALE CAMOUFLAGED OBJECT DETECTION FOR AGRICULTURAL AUTOMATION
13834Smart Grid Topology Inference via Locational Margin Prices and Graph-based Voltage Interpolation
15810SMEKGE: A SHRINKAGE-GUIDED META-ENSEMBLE OF KNOWLEDGE GRAPH EMBEDDING EXPERTS
11801SMOGVLM: A SMALL, GRAPH-ENHANCED VISION-LANGUAGE MODEL
4769SMOOTHCLAP: SOFT-TARGET ENHANCED CONTRASTIVE LANGUAGE-AUDIO PRETRAINING FOR AFFECTIVE COMPUTING
2972Snore Sound Classification Based on Physiological Features and Adaptive Loss Function
17253SODA: A UNIFIED FRAMEWORK FOR JOINT ESTIMATION OF SPEAKER ORIENTATION AND DIRECTION OF ARRIVAL
10413Soft Graph Transformer for MIMO Detection
15790Soft Super-Pixel Partitioning for Certified Adversarial Robustness
18580SoGRE: Boosting Logical Reasoning of LLMs via Solver-Guided Reasoning Enhancement
15768SOLVING POISSON INVERSE PROBLEMS WITH DIFFUSION MODELS VIA THE PLUG-AND-PLAY SCHEME
5112Solving the Helmholtz Equation via an enhanced Physics-Informed Neural Networks with an Enhanced Adaptive Strategy
10457SONAR: Self-Distilled Continual Pre-training for Domain Adaptive Audio Representation
14647SOUND SOURCE LOCALIZATION USING RELATIVE CIRCULAR HARMONIC COEFFICIENTS
10271SOUNDCOMPASS: NAVIGATING TARGET SOUND EXTRACTION WITH EFFECTIVE DIRECTIONAL CLUE INTEGRATION IN COMPLEX ACOUSTIC SCENES
1979SOUNDING HIGHLIGHTS: DUAL-PATHWAY AUDIO ENCODERS FOR AUDIO-VISUAL VIDEO HIGHLIGHT DETECTION
1132Sounds That Shape: Audio-Driven 3D Mesh Generation with Attribute-Decoupled Score Distillation Sampling
14307SOURCE LOCALIZATION AND ACOUSTIC INVERSION USING BAYESIAN OPTIMIZATION WITH LOCAL GAUSSIAN PROCESSES
17467SOURCE SEPARATION FOR A CAPPELLA MUSIC
12934SOURCE-FREE CONCEPT BOTTLENECK MODEL ADAPTATION WITH CONFIDENCE-ADAPTIVE CONDITIONAL ENSEMBLE
12647SOURCE-FREE DOMAIN ADAPTATION WITH LIGHT-WEIGHT TRANSFORMER AND CONSISTENCY LOSS
17541SPACE-TIME ARC ABSTRACTION FOR UAV NETWORK RECONFIGURATION UNDER ADVERSARIAL ELECTRO-OPTICAL DISRUPTION
12400SPADE: STRUCTURED PRUNING AND ADAPTIVE DISTILLATION FOR EFFICIENT LLM-TTS
12880SPAM: Style Prompt Adherence Metric for Prompt-based TTS
18280SPAN PRUNING AND SYNTACTIC AWARENESS FOR ASPECT SENTIMENT TRIPLET EXTRACTION
10841SPAR-GS: SINGLE-VIEW POSE-FREE AUTOMOBILE RECONSTRUCTION WITH 3D GAUSSIAN SPLATTING
13489SPARKLING TOGETHER: JOINT EDITING FOR MULTI-ACCESSORY VIRTUAL TRY-ON
15798SPARSE AND ADAPTIVE SIMILARITY-BASED GRAPH EMBEDDING FOR UNSUPERVISED FEATURE SELECTION
5758SPARSE AUTOENCODERS MAKE AUDIO FOUNDATION MODELS MORE EXPLAINABLE
3282SPARSE BAYESIAN LEARNING WITH SIMPLE AND INTERPRETABLE DNNS EXPLOITING DATA-DRIVEN PRIORS
6234Sparse Gradient Compression for Fine-Tuning Large Language Models
11790SPARSE PHYSICAL ADVERSARIAL ATTACK ON VIDEO RECOGNITION BASED ON SPATIOTEMPORAL SEMANTIC REDIRECTION
9442Sparse Polyak with optimal thresholding operators for high-dimensional M-estimation
17292SPARSE RECOVERY USING TIGHT FRAMES AND MINIMAX CONCAVE PENALTY
17784SPARSE SIGNAL RECOVERY BASED ON LOWER-SEMICONTINUOUS 1-WEAKLY-CONVEX ENVELOPE OF MARGINAL FUNCTION
5958SPARSE-UP: LEARNABLE SPARSE UPSAMPLING FOR 3D GENERATION WITH HIGH-FIDELITY TEXTURES
4248Sparse-view Visual-acoustic Latent Learning for Novel-view Audio Synthesis
9383Sparsity Induction for Accurate Post-Training Pruning of Large Language Models
9823SPARSITY-AWARE TIME-FREQUENCY-CHIRP RATE REPRESENTATION FOR INCOMPLETE MICRO-DOPPLER SIGNAL
13078Sparsity-Induced Reparametrization for Differentially Private Federated Learning
6292SPARSITY-REGULARIZED LATENT DIFFUSION MODELS FOR RADAR CLUTTER SUPPRESSION
3718SPATIAL COVARIANCE MATRIX RECONSTRUCTION FOR SPEECH ENHANCEMENT IN REVERBERANT MULTI-SOURCE ENVIRONMENTS
15955SPATIAL RELATIONSHIP-ENHANCED SELF-SUPERVISED TRAJECTORY LEARNING FOR TRIP RECOMMENDATION
9755SPATIAL-CLAP: LEARNING SPATIALLY-AWARE AUDIO–TEXT EMBEDDINGS FOR MULTI-SOURCE CONDITIONS
13666SPATIALLY ADAPTIVE GLOBAL-LOCAL MATCHING FOR UNSUPERVISED FACIAL OPTICAL FLOW ESTIMATION
5009SPATIALLY AWARE SELF-SUPERVISED MODELS FOR MULTI-CHANNEL NEURAL SPEAKER DIARIZATION
5836Spatially Filtered Sparse Bayesian Learning for Direction-of-Arrival Estimation with Leaky-Wave Antennas
3074SPATIALLY WEIGHTED FEATURES FOR SMALL OBJECT AERIAL RECOGNITION
11014SPATIALLY-COUPLED OTFS SYSTEMS VIA BLOCK MARKOV SUPERPOSITION TRANSMISSION
3728SPATIALNET-ECHO: REAL-TIME ACOUSTIC ECHO CANCELLATION VIA INTEGRATED NARROW-BAND AND CROSS-BAND PROCESSING
4544Spatiotemporal Alignment for Remote Sensing Image Recovery via Terrain-Aware Diffusion
12844SPATIOTEMPORAL GRADIENT DECOUPLING: ADVANCING ONLINE TRAINING OF RECURRENT SPIKING NEURAL NETWORKS
5011SPATIOTEMPORAL STATE SPACE MODELING OF DYNAMIC BRAIN CONNECTIVITY IN COGNITIVELY NORMAL INDIVIDUALS AT RISK FOR ALZHEIMER’S DISEASE
11901SPC-Seg: A SAM-Guided Progressive Consistency Framework with Anatomical Priors for Scribble-Supervised Segmentation
8122SPDMOT: SPD AND EUCLIDEAN SYNERGIES FOR MULTI-OBJECT TRACKING
15972Speaker Anonymisation for Speech-based Suicide Risk Detection
9422SPEAKER ATTRIBUTED AUTOMATIC SPEECH RECOGNITION USING SPEECH AWARE LLMS
18963Speaker-conditioned phrase break prediction for text-to-speech with phoneme-level pre-trained language model
4664SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion
10391SPEAKING CLEARLY: A SIMPLIFIED WHISPER-BASED CODEC FOR LOW-BITRATE SPEECH CODING
11540Spectral Logit Sculpting: Adaptive Low-Rank Logit Transformation for Controlled Text Generation
1506SPECTRAL OR SPATIAL? LEVERAGING BOTH FOR SPEAKER EXTRACTION IN CHALLENGING DATA CONDITIONS
15420SPECTRAL-ALIGNED INFERENCE GUIDANCE FOR DIFFUSION-BASED IMAGE SUPER-RESOLUTION
17549SPECTRAMORPH: STRUCTURED LATENT LEARNING FOR SELF-SUPERVISED HYPERSPECTRAL SUPER-RESOLUTION
5228SPECTROGRAM EVENT BASED FEATURE REPRESENTATION FOR GENERALIZABLE AUTOMATIC MUSIC TRANSCRIPTION
17634SPECTROGRAM RESTORATION AND CLASSIFICATION FRAMEWORK FOR MULTI-PERSON THROUGH-OBSTACLE HUMAN ACTIVITY RECOGNITION
10906SPEECH EMOTION RECOGNITION BASED ON HIERARCHICAL TRANSFORMER WITH SHIFTED WINDOWS
14449SPEECH QUALITY-BASED LOCALIZATION OF LOW-QUALITY SPEECH AND TEXT-TO-SPEECH SYNTHESIS ARTEFACTS
18969Speech-FT: Merging Pre-Trained and Fine-Tuned Speech Representation Models for Cross-Task Generalization
6180SPEECHMAPPER: SPEECH-TO-TEXT EMBEDDING PROJECTOR FOR LLMS
5068SPGR: SOURCE-PATH GUIDED REPAIR FOR DEEP NEURAL NETWORKS
5819S-PHiNe: PHYSICS-INFORMED MULTICHANNEL SPEECH ENHANCEMENT USING SPECTRO-SPATIAL FUSION FOR LOW-SNR CONDITIONS
9437SPIDER: A SEMANTIC PRIOR-INFORMED DIFFUSION MODEL FOR ENHANCED MULTIMODAL RECOMMENDATION
17519SPIKE-DRIVEN LOW-POWER SPEECH BANDWIDTH EXTENSION
8625Spiking Adapter for Event-Based Action Recognition
10008SPIKING ATTENTION NETWORK: A HYBRID NEUROMORPHIC APPROACH TO UNDERWATER ACOUSTIC LOCALIZATION AND ZERO-SHOT ADAPTATION
18190Spiking Meets Causality: Efficient Granger Causal Discovery with Spiking Neural Networks
12498SPIKING NEURAL NETWORKS FOR ORDINAL REGRESSION
14851Spiking Self-Organizing Maps with Convergence Guarantees for Unsupervised Radar Signal Deinterleaving
10689SPIKING TEMPORAL-ENHANCED NETWORK FOR ZERO-SHOT AUDIO-VISUAL LEARNING
11086SPIKING-NEURO-OPTIMAL-TRANSPORT (S-NOT): A ROBUST SNN FRAMEWORK FOR SPATIO-TEMPORAL PATTERN LEARNING
11680SP-MCQA: EVALUATING INTELLIGIBILITY OF TTS BEYOND THE WORD LEVEL
7852SPOC: Safety-aware planning under partial observability and physical constraints
15080S-PRESSO: ULTRA LOW BITRATE SOUND EFFECT COMPRESSION WITH DIFFUSION AUTOENCODERS AND OFFLINE QUANTIZATION
16553SPRING REVERB EMULATION WITH HYBRID GATED CONVOLUTIONAL NETWORKS AND STATE SPACE MODELS
15104SP-UNet: Robust Single-Snapshot DOA Estimation via Signal Manifold Recovery
17491SPUTTER-AWARE FOCUSED PARTICLE BEAM MICROSCOPY
2429SQP-Based Passive Coherent Localization via Joint DTD and AOA Measurements
10984SRC4SYM: A WEAK SOURCE CODE ENHANCEMENT APPROACH FOR FUNCTION NAME RECOVERY UNDER VERSION MISMATCH SCENARIOS
7670SRGAC: Self-Reference Guided Adaptive Classification for Generalizable Deepfake Detection
1562SR-Gaussian: Depth-Feature Supervised Sparse Gaussian Splatting with Robust Initialization
11378SROGS: Semantic-Regularized Optimization for Pose-Free Gaussian Splatting under Sparse Views
3920SSCM: A SPATIAL SEMANTIC CONSISTENT MODEL FOR MULTI-CONTRAST MRI SUPER-RESOLUTION
6785SSCR: EFFICIENT MULTIMODAL CLOUD REMOVAL FRAMEWORK VIA EXPLOITING STRUCTURAL SEMANTICS IN SAR
15532SSG-DIT: A SPATIAL SIGNAL GUIDED FRAMEWORK FOR CONTROLLABLE VIDEO GENERATION
9767SS-JDSC: SINGLE-SPEAKER JAPANESE DYSARTHRIC SPEECH CORPUS
1851SSMEUN: SPATIAL-SPECTRAL MAMBA ENHANCED UNFOLDING NETWORK FOR PAN-SHARPENING
15084S-SONDO: SELF-SUPERVISED KNOWLEDGE DISTILLATION FOR GENERAL AUDIO FOUNDATION MODELS
16913SSRDWater: A Robust and Secure Watermarking of Large Language Models via Sentence-Level Semantic Relational Dependencies
5252SSRFNet: Stage-wise SV-Mixer and RedimNet Fusion Network for Speaker Verification
15666SSUN: Symmetric Cross-Stage State Interaction Deep Unrolling Network for Hyperspectral and Multispectral Image Fusion
9808SSVD-O: PARAMETER-EFFICIENT FINE-TUNING WITH STRUCTURED SVD FOR SPEECH RECOGNITION
9348Stability and Generalization of Adversarial Diffusion Training
18879STABILIZING RED USING THE KOOPMAN OPERATOR
16056Stable Generative Diffusion: Depth-Modality Aware and Adaptive Fusion for Camouflaged Object Detection
4100STABLE LAYOUT IMAGE DIFFUSION FOR CONTENT-AWARE LAYOUT GENERATION
10282STACODEC: SEMANTIC TOKEN ASSIGNMENT FOR BALANCING ACOUSTIC FIDELITY AND SEMANTIC INFORMATION IN AUDIO CODECS
6060STAGED DIFFUSION WITH HYBRID MIXTURE-OF-EXPERTS (MOE) FOR MULTIMODAL SENTIMENT ANALYSIS
11012STAGE-WISE ROBUST DISTILLATION FOR SPIKING NEURAL NETWORK TRAINING
14616STAGL: A SIGN-TARGET AWARE GRAPH LEARNING FRAMEWORK FOR STANCE DETECTION
14380STAMamba: Spatio-Temporal Adaptive State Space Model for 3D Human Pose Estimation
3666STANCE-DRIVEN CONTROLLABLE STATEMENT GENERATION VIA COMPOSITIONAL ATTRIBUTE GRAPH PROMPTING WITH LLMS
15863STANCEMSA: A MULTIMODAL SELF-ATTENTION FRAMEWORK FOR ACCOUNT-LEVEL IMPLICIT STANCE DETECTION IN SHORT VIDEOS
3179STAR Meets Linear Attention: Linear Complexity-Preserving Enhanced Attention Mechanism for Vision Transformer
7368STAR-RFF: Spatio-Temporal Sensing Assisted Robust Radio Frequency Fingerprint Identification via STGCN
10584STARS: SPATIO-TEMPORAL REDUNDANCY-AWARE SPARSIFICATION FOR SATELLITE VIDEO OBJECT TRACKING
15293STATE SPACE CLUSTERING FOR INTERPRETABLE FETAL HEART RATE CHARACTERIZATION
8659ST-CFNET: A SPATIO-TEMPORAL ENHANCED NETWORK FOR REAL-TIME 4D PANOPTIC SEGMENTATION
17828STCFORMER: ROBUST MALICIOUS TRAFFIC DETECTION VIA SHORT-TERM TRAFFIC PROFILING AND A HYBRID TRANSFORMER
1234STD-GAUSSIANS: SPATIO-TEMPORAL DECOUPLED GAUSSIAN SPLATTING FOR SINGLE-VIEW DYNAMIC SCENE RECONSTRUCTION
4600STDIFFUSION: A SPATIOTEMPORAL INTERPOLATION-ORIENTED DIFFUSION MODEL FOR SIGNAL SERIES LATENT REPRESENTATION GENERATION
9756STDLC: Video Coding for Machines with Spatial-Temporal Decoupled Latent Composition
2228STEMPHONIC: ALL-AT-ONCE FLEXIBLE MULTI-STEM MUSIC GENERATION
4900STEP-STA: STEPWISE TOKEN-LEVEL SPATIO-TEMPORAL ATTENTION FOR ENCRYPTED TRAFFIC CLASSIFICATION
15198StereoFoley: Object-Aware Stereo Audio Generation from Video
11752STEREOPHONIC ACOUSTIC ECHO CANCELLATION USING AN IMPROVED AFFINE PROJECTION ALGORITHM WITH ADAPTIVE MULTIPLE SUB-FILTERS
11361STHC-GS: SPATIO-TEMPORAL HIGH-FREQUENCY CONSISTENCY CONSTRAINTS FOR DYNAMIC URBAN SCENE RECONSTRUCTION
10188ST-HNTM: Joint Speech-Text Neural Topic Modeling on the Hypersphere
16755Still Thinking or Stopped Talking? Dialogue Silence Intention Classification Using Multimodal Large Language Model
15364STIMULI-AWARE EMOTION ADAPTOR FOR ENHANCING LLM IN AFFECTIVE EXPLANATION CAPTIONING
10360STNID: A SPATIOTEMPORAL MAMBA-BASED NEURAL IMPLICIT DYNAMICS MODEL FOR POINT CLOUD FORECASTING
16524STOCHASTIC SHADOW DESCENT: TRAINING PARAMETRIZED QUANTUM CIRCUITS WITH SHADOWS OF GRADIENTS
1503STPHYNET: PHYSICS-INTEGRATED SPATIOTEMPORAL NEURAL NETWORKS FOR EFFICIENT PDE SIMULATION
16315STRATEGIC USER OFFLOADING AND SERVICE PROVIDER PRICING IN MOBILE EDGE COMPUTING
16860STR-DIFFSEP: STREAMABLE DIFFUSION MODEL FOR SPEECH SEPARATION
4115STREAMING SPEECH RECOGNITION WITH DECODER-ONLY LARGE LANGUAGE MODELS AND LATENCY OPTIMIZATION
15697StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
9970STREAMMARK: A DEEP LEARNING-BASED SEMI-FRAGILE AUDIO WATERMARKING FOR PROACTIVE DEEPFAKE DETECTION
14185STREAM-VOICE-ANON: ENHANCING UTILITY OF REAL-TIME SPEAKER ANONYMIZATION VIA NEURAL AUDIO CODEC AND LANGUAGE MODELS
13771STRESS PREDICTION FROM TEMPORAL EMOTION TRAJECTORIES IN CLINICAL PATIENT-PHYSICIAN CONVERSATIONS
18984STRIDE CONVERSION ALGORITHMS FOR CONVOLUTIONAL LAYERS AND ITS APPLICATION TO SAMPLING-FREQUENCY-INDEPENDENT DEEP NEURAL NETWORKS
13324Strong Basin of Attraction for Unmixing Kernels With the Variable Projection Method
14317STRONG CONVEXITY OF (KERNEL) LAPLACIAN REGULARIZATION
11142STRPNET: VIDEO SALIENT OBJECT DETECTION VIA SPATIO-TEMPORAL SCENE RELATION PROPAGATION
12035STRUCTSEMKGC: A STRUCTURE-AWARE KNOWLEDGE GRAPH COMPLETION METHOD WITH ADAPTIVE MULTIMODAL FUSION
5080STRUCTSOUP: A UNIFIED ADAPTIVE FRAMEWORK FOR STRUCTURE-AWARE RETRIEVAL-AUGMENTED GENERATION
13903STRUCTURAL DECOUPLING-DRIVEN LOSS-AWARE FILTER PRUNING
12164STRUCTURE-AWARE ADVERSARIAL PURIFICATION: DYNAMIC MASKING AND ATTRIBUTION REFINEMENT IN DIFFUSION MODELS
2420Structure-Aware Corpus Construction and User-Perception-Aligned Metrics for Large-Language-Model Code Completion
15636STRUCTURE-AWARE DIFFUSION SCHRÖDINGER BRIDGE
12210Structured Persona-Driven Authentic and Controllable Moment Generation
16408STRUCTURED PRUNING VIA MULTI-OBSERVATION ITERATIVE HARD THRESHOLDING
12913STRUCTURE-DRIVEN GRAPH NEURAL NETWORKS FOR SCALABLE MULTI-GROUP MULTICASTING
10339STRUCTURE-GUIDED GRAPH REFINEMENT NETWORK FOR FACIAL AESTHETIC ASSESSMENT
9866STRUFUZZ: ENHANCING STATEFUL PROTOCOL FUZZING WITH LLM-DRIVEN SEED STRUCTURE AWARENESS
1346ST-WaveLLM: Spatio-Temporal Traffic Forecasting via Wavelet-Enhanced Large Language Models
9845StyHarmo: Efficient Style-Specific Video Generation with Music Synchronization
17014STYLE ATTACK DISGUISE: WHEN FONTS BECOME A CAMOUFLAGE FOR ADVERSARIAL INTENT
10667STYLEBENCH: EVALUATING SPEECH LANGUAGE MODELS ON CONVERSATIONAL SPEAKING STYLE CONTROL
6559StyleDecoupler: Generalizable Artistic Style Disentanglement
15948STYLE-DISENTANGLED DIFFUSION FOR CONTROLLABLE AND IDENTITY-GENERALIZED SPEECH-DRIVEN BODY MOTION GENERATION
4254STYLEPITCHER: GENERATING STYLE-FOLLOWING AND EXPRESSIVE PITCH CURVES FOR VERSATILE SINGING TASKS
13817STYLIC: STYLIZED CAPTIONS USING CONTRASTIVE VISION-LANGUAGE MODELS
11871Stylized Text-to-Motion Synthesis via Multi-condition Latent Diffusion
5982STYMAM: A MAMBA-BASED GENERATOR FOR ARTISTIC STYLE TRANSFER
17123SUBARRAY ORTHOGONAL MATCHING PURSUIT FOR BLOCK-SPARSE SIGNALS WITH UNKNOWN BLOCK PARTITIONS
7848SUBGRAPH LOCALIZATION IN THE SUBBANDS FOR PARTIALLY SPOOFED SPEECH DETECTION
2075SUBJECTIVE EVALUATION OF FRAME RATE IN BITRATE-CONSTRAINED LIVE STREAMING
7858Sub-Nyquist Frequency Estimation via Amplitude-Encoding Filters
11533SUBQRAG: SUB-QUESTION DRIVEN DYNAMIC GRAPH RAG
1257Subsequence SDTW: Differentiable Alignment with Flexible Boundary Conditions
16641SUBSPACE HYBRID ADAPTIVE FILTERING FOR PHONOCARDIOGRAM SIGNAL DENOISING
14284SUBTRACTIVE MODULATIVE NETWORK WITH LEARNABLE PERIODIC ACTIVATIONS
18995SUDAFIELD: SUBJECT- AND DATASET-AWARE NEURAL FIELD FOR HRTF MODELING
19030SUFFICIENT CONDITIONS FOR CONVERGENCE OF RHT AND RHTP ALGORITHMS BASED ON RIC OF ORDER 2S
10151SUMMARY ON THE MULTILINGUAL CONVERSATIONAL SPEECH LANGUAGE MODEL CHALLENGE: DATASETS, TASKS, BASELINES, AND METHODS
15651Sum-Rate Maximization for DMA-Based Wideband Near-Field Systems with Lorentzian Responses
10034SUNAC: Source-aware Unified Neural Audio Codec
14162SUPER MONOTONIC ALIGNMENT SEARCH
13803SUPERFICIAL TEXTURE SUPPRESSION NETWORK FOR GENERALIZED DEEPFAKE DETECTION
5635SUPER-MULTIPLICATIVE NMF AND GENERIC ALGORITHMS WITH IMPROVED CONVERGENCE SPEED
3599SUPERPIXEL INTEGRATED GRIDS FOR FAST IMAGE SEGMENTATION
18018Superpixel-informed Continuous Low-Rank Tensor Representation for Multi-Dimensional Data Recovery
17934SUPER-RESOLUTION GUIDED DIFFUSION NETWORK FOR MULTI-RESOLUTION REMOTE SENSING CHANGE DETECTION
5862SUPERVISED MAKEUP TRANSFER WITH A CURATED DATASET: DECOUPLING IDENTITY AND MAKEUP FEATURES FOR ENHANCED TRANSFORMATION
11524SUPPORT VECTOR DATA DESCRIPTION FOR RADAR TARGET DETECTION
2092Support-Conditioned Dynamic Convolution for Few-Shot Object Detection
4540SURE: SYNERGISTIC UNCERTAINTY-AWARE REASONING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS
6762SURE-MED: SYSTEMATIC UNCERTAINTY REDUCTION FOR ENHANCED RELIABILITY IN MEDICAL REPORT GENERATION
2769Surpassing Oneself: Self-Distillation from Past Failures
10586SUSTAINABLE INCENTIVE FOR MODEL TRADING IN DECENTRALIZED AND PERSONALIZED FEDERATED LEARNING VIA DAG-BLOCKCHAIN CONSENSUS
17100SUSTAIN-VLM: COORDINATED DATA AND COMPUTE FOR LOW-CARBON VISION LANGUAGE MODEL FINE-TUNING
17089SVCF: ENABLING ZERO-SHOT CORRECTION OF REASONING STEPS IN MULTI-MODAL LARGE LANGUAGE MODELS
7345SVPO: AN LLM REINFORCEMENT LEARNING METHOD BASED ON STEPWISE VALUE ESTIMATION
3147SWAN: Boosting Image Super-Resolution with Stochastic Wavelet Attention
5733SWIN-DS: A DEEPLY SUPERVISED TRANSFORMER WITH GEOMETRIC GUIDANCE FOR ROBUST LACUNE DETECTION
10795SWITCHCODEC: ADAPTIVE RESIDUAL-EXPERT SPARSE QUANTIZATION FOR HIGH-FIDELITY NEURAL AUDIO CODING
18015SYMBOLIC GOAL-GUIDED INTRINSIC CURRICULA FOR LONG-HORIZON REINFORCEMENT LEARNING
13794Symphony Rendering: MIDI and Composer-Conditioned Auto Orchestration with Flow-Matching Transformers
6409Synaspot: A Lightweight, Streaming Multi-modal Framework for Keyword Spotting with Audio-Text Synergy
5275SYNCHRONOUS SECONDARY PATH MODELING AND KRONECKER-FACTORIZED ADAPTIVE ALGORITHM FOR MULTICHANNEL ACTIVE NOISE CONTROL
6277SYNCSPEECH: EFFICIENT AND LOW-LATENCY TEXT-TO-SPEECH BASED ON TEMPORAL MASKED TRANSFORMER
17106SYNERGISTIC FOURIER-WAVELET NEURAL OPERATOR
8053SYNERGISTIC HYBRID ATTENTION NETWORK: AN ENHANCED MULTI-MODAL INTERACTION ARCHITECTURE FOR EFFICIENT VISUAL QUESTION ANSWERING
7314SYNERGISTIC STRUCTURE-AWARE GUIDED NETWORK FOR BINARY PROTOCOL FORMAT INFERENCE
11391SYNERGY MAP–GUIDED SPECTRAL–DOMAIN ENHANCED NETWORK FOR CAMOUFLAGED OBJECT DETECTION
10773SYNERGYWARPNET: ATTENTION-GUIDED COOPERATIVE WARPING FOR NEURAL PORTRAIT ANIMATION
3704SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding
4974SynthCloner: Synthesizer Preset Conversion via Factorized Codec with ADSR Envelope Control
11183Synthesis-Driven Contrastive Learning for Unpaired Unsupervised Cloth-Changing Person Re-Identification
3229Synthesized Data Selection via Score Distribution Matching for Te Reo Māori Automatic Speech Recognition
3272SYNTHETIC DATA DOMAIN ADAPTATION FOR ASR VIA LLM-BASED TEXT AND PHONETIC RESPELLING AUGMENTATION
4889SYNTHETIC YET STRIKING? ASSESSING VOCAL CHARISMA IN TTS VIA PERCEPTUAL AND ALGORITHMIC MEASURES
18022SYSSEC: SECURING SYSTEM CALLS VIA MULTI-INTENT ALIGNMENT
4448TABULAR SYNTHESIS BASED ON BI-DIRECTIONAL FEEDBACK CONDITIONAL DIFFUSION MODELS
13123TacExpert: a Pseudo-Temporal Mixture-of-Experts Framework for Open-Set Tactile Object Recognition
5433Tackling Data Heterogeneity in Parameter-Efficient Federated Fine-Tuning of Large Language Models
11260Tackling Sparse Interactions In Multimodal Session-Based Recommendation
16038TAG: TEMPORAL-AWARE AUDIO GENERATION VIA LLM-GUIDED MANUAL CONSTRUCTION AND ATTENTION CONTROL
14620TAGARELA - A PORTUGUESE SPEECH DATASET FROM PODCASTS
5123Tag-U: Improving Social Media Role-Playing via Multimodal Tagging Strategies
12152TAILORED TEXT INTEGRATION AND SEMANTIC DIFFERENTIAL ENHANCEMENT FOR FEW-SHOT CLASS-INCREMENTAL LEARNING
11940TALPS: A FRAMEWORK FOR ADAPTIVE LEARNING OF TACTICS, TECHNIQUES, AND PROCEDURES CLASSIFICATION WITH LARGE LANGUAGE MODELS
14755Taming Audio VAEs via Target-KL Regularization
16346TAMING THE LIGHT: ILLUMINATION-INVARIANT SEMANTIC 3DGS-SLAM
4729TAML: TASK-AWARE METRIC-DRIVEN META LEARNING FOR FEW-SHOT ACTION RECOGNITION
14912TARA: TOKEN-AWARE RECALIBRATION AND ATTENTION FOR EXPLAINABLE PATHOLOGY REPORTS CLASSIFICATION
14657TARGET DETECTION IN TWO-CHANNEL PASSIVE RADARS WITH INTER-RECEIVER COLLABORATION
17739Target speaker anonymization in multi-speaker recordings
8119Targeted Fine-Tuning of DNN-Based Receivers via Influence Functions
12422TARGETED POOLED LATENT-SPACE STEGANALYSIS APPLIED TO GENERATIVE STEGANOGRAPHY, WITH A FIX
16985TARGET-SPEAKER LLM-ASR WITH SPEAKER-AWARE SPEECH ENCODER
9269Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis
17495Task-Aware LLM Council with Adaptive Decision Pathways for Complex Task Support
2380Task-Aware Modality-as-Experts Fusion of NIR and Microscopic Image for Textile Analysis
16678Task-Oriented Sound Privacy Preservation for Sound Event Detection via End-to-End Adversarial Multi-task Learning
10670TASR: TRAINING TASK-ALIGNED SUPER-RESOLUTION WITH A SEMANTIC AUTOENCODER LOSS
14878TASS: TASK ALIGNED SUBSPACE SELECTION FOR KNOWLEDGE PRESERVING FINE-TUNING
11577TASU: TEXT-ONLY ALIGNMENT FOR SPEECH UNDERSTANDING
1196TAU: A BENCHMARK FOR CULTURAL SOUND UNDERSTANDING BEYOND SEMANTICS
17772T-CACHE: FAST INFERENCE FOR MASKED GENERATIVE TRANSFORMER-BASED TTS VIA PROMPT-AWARE FEATURE CACHING
9415T-CAMEL: TEAMMATE-CAUSAL-AWARE MULTI-AGENT LEARNING
7753TCC: USING TOPIC CHAINS TO COMPRESS PROMPTS FOR LONG DOCUMENT QUESTION ANSWERING
12184TCSST: A TEMPORAL KNOWLEDGE GRAPH EMBEDDING MODEL BASED ON SPACE SPIRAL TIMELINE
3215TCT-LOSS: SHAPE-AWARE TIME-SERIES FORECASTING WITH A ZERO-SHOT TIME-COLUMN TRANSFORMER AUTOENCODER
15673TC-zarr: an analysis-ready storage approach for Tropical Cyclone Data
1008TDATTACK: ENHANCING TRANSFERABILITY OF UNRESTRICTED ADVERSARIAL EXAMPLES VIA TEXT-DRIVEN DIFFUSION
5006Teacher-Guided Pseudo Supervision and Cross-Modal Alignment for Audio-Visual Video Parsing
8042TEACHER-STUDENT DIFFUSION MODEL FOR TEXT-DRIVEN 3D HAND MOTION GENERATION
16851Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
7949TEACHING THE TEACHERS: BOOSTING UNSUPERVISED DOMAIN ADAPTATION IN SPEECH RECOGNITION BY ENSEMBLE UPDATE
6195TEAMo: Trait and Emotion Aware Motion Generation in 3D Human
5087Tell Me What to Track: Infusing Robust Language Guidance for Enhanced Referring Multi-Object Tracking
4613TEMPORAL DISTILLATION FOR MUSIC REPRESENTATION LEARNING
17194Temporal Graph Modeling for Speech Emotion Recognition Using LSTM-Aggregated Multigraph Networks
15876TEMPORAL-AWARE HETEROGENEOUS GRAPH REASONING WITH MULTI-VIEW FUSION FOR TEMPORAL QUESTION ANSWERING
16069Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic Event Classification
16962Temporal-Spatial Decouple before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis
10839TennisTV: Do Multimodal Large Language Models Understand Tennis Rallies?
11451TENSLORA: TENSOR ALTERNATIVES FOR LOW-RANK ADAPTATION
2895Tensorformer-Based Multimodal Depression Detection from Concurrent Gait Patterns and Physiological Signals
8170TER: TEST-TIME EMBEDDING REGULARIZATION FOR CALIBRATION-AWARE PROMPT TUNING IN VISION-LANGUAGE MODELS
15783TEST TIME ADAPTATION FOR SPEECH EMOTION RECOGNITION
12252TESTAGENT: AUTOMATIC BENCHMARKING AND EXPLORATORY INTERACTION FOR EVALUATING LLMS IN VERTICAL DOMAINS
4969TESTING THE EFFICIENT CODING HYPOTHESIS BEYOND HUMANS: THE AUDITORY KERNELS OF BAT VOCALIZATIONS
10933TEST-TIME ADAPTATION FOR SPEECH ENHANCEMENT VIA MASK POLARIZATION
12342TEST-TIME SCALING FOR AUDITORY COGNITION IN AUDIO LANGUAGE MODELS
15513TEXT SEMANTICS-GUIDED DUAL-TEACHER KNOWLEDGE DISTILLATION FOR PARTIALLY RELEVANT VIDEO RETRIEVAL
10581Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment
5124Text2Move: Text-to-moving sound generation via trajectory prediction and temporal alignment
14598TEXT2TRY3D: TEXT-GUIDED 3D GARMENT GENERATION ON PARAMETRIC HUMAN MODELS
10295TEXT-GUIDED DOMAIN ADAPTATION VIA DEEP MANIFOLD CONSTRAINTS AND NEIGHBORHOOD PROPAGATION
11456TEXT-GUIDED ROI-AWARE PRUNING METHOD FOR LANGUAGE EMBEDDED 3DGS
2854TEXTLESSRAG: END-TO-END VISUAL DOCUMENT RAG BY SPEECH WITHOUT TEXT
17017TEXT-ONLY ADAPTATION IN LLM-BASED ASR THROUGH TEXT DENOISING
13675TEXT-PRIOR-DRIVEN FEATURE INTERACTION FOR OPEN-VOCABULARY OBJECT DETECTION
15813TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution
19091TEXT-TO-SPEECH WITH LIP SYNCHRONIZATION BASED ON SPEECH-ASSISTED TEXT-TO-VIDEO ALIGNMENT AND MASKED UNIT PREDICTION
3766TFF-ID: A TRAINING-FREE FRAMEWORK FOR INVERTIBLE AND DIVERSIFIED FACE ANONYMIZATION
3736TF-GS: TEMPORAL-FREQUENCY FUSED GAUSSIAN SPLATTING FOR DYNAMIC VIEW SYNTHESIS
8510TF-MAMBANET: A TEMPORAL AND FREQUENCY FUSED BIDIRECTIONAL MAMBA ARCHITECTURE FOR PPG FOUNDATION MODEL
4215T-GEMS: TEXT-GUIDED EXIT MODULES FOR DECREASING CLIP IMAGE ENCODER
4073TGPO: TREE-GUIDED PREFERENCE OPTIMIZATION FOR ROBUST WEB AGENT REINFORCEMENT LEARNING
7952THANGKA: TEXT-HIERARCHICAL ALIGNMENT FOR NARRATIVE-GUIDED KNOWLEDGE-AWARE ASSOCIATION
14028THE 3RD CLARITY PREDICTION CHALLENGE: A MACHINE LEARNING CHALLENGE FOR HEARING AID SPEECH INTELLIGIBILITY PREDICTION
14398The Achilles' Heel of Angular Margins: A Chebyshev Polynomial Fix for Speaker Verification
14281THE CURIOUS CASE OF VISUAL GROUNDING: DIFFERENT EFFECTS FOR SPEECH- AND TEXT-BASED LANGUAGE ENCODERS
1365The Example Saturation Effect: The Hidden Role of Input Difficulty in In-Context Learning
11199The Hidden Cost of Caching: Analyzing the Energy Expenditure of Placement in Cache-aided MISO Networks
10060THE IMPACT OF ABSTRACT AND OBJECT TAGS ON IMAGE PRIVACY CLASSIFICATION
11703The Impact of Antenna Spacing on DOA Estimation Error in Dense Arrays
15778THE IMPACT OF AUDIO WATERMARKING ON AUDIO ANTI-SPOOFING COUNTERMEASURES
19053THE INVERSE DRUM MACHINE: SOURCE SEPARATION THROUGH JOINT TRANSCRIPTION AND ANALYSIS-BY-SYNTHESIS
12767The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs
10567THE RL-R CHAT DATASET: EGOCENTRIC CONVERSATIONS AMONG FAMILIAR INTERLOCUTORS FOR MULTI-MODAL HEARING AUGMENTATION TECHNOLOGY
3841THE ROLE OF PROSODIC AND LEXICAL CUES IN TURN-TAKING WITH SELF-SUPERVISED SPEECH REPRESENTATIONS
10055THE SINGING VOICE CONVERSION CHALLENGE 2025: FROM SINGER IDENTITY CONVERSION TO SINGING STYLE CONVERSION
15186The Stability-Plasticity Dilemma Revisited: A Brain-Inspired Continual Learning Method with Representation-Function Separation
13662THE SYNERGISTIC ROLE OF AUDIO AND LARGE VIDEO-LANGUAGE MODEL IN SOURCE-FREE VIDEO DOMAIN ADAPTATION
18120THEMIS: Bridging Documentation and Code to Uncover Access Control Vulnerabilities in GitLab
3039Theory and application of circular relative harmonic coefficients
5179THINK-AUGMENTED FUNCTION CALLING: IMPROVING LLM PARAMETER ACCURACY THROUGH EMBEDDED REASONING
12801THINK-CLIP-SAMPLE: SLOW-FAST FRAME SELECTION FOR VIDEO UNDERSTANDING
1996Thinking in a Crowd: How Auxiliary Information Shapes LLM Reasoning
13482THINKING WHILE LISTENING: SIMPLE TEST TIME SCALING FOR AUDIO CLASSIFICATION
1908THREATSAGE: A MODULAR BENCHMARK FOR LLM-ORCHESTRATED BLUE TEAM DEFENSE OPERATIONS
11323THREE SECONDS IS SUFFICIENT: A MULTI-PRONGED FRAMEWORK FOR MODEL-BASED SPEAKER ADAPTATION IN ASR UNDER DATA-SCARCE CONDITIONS
6843THREE-STAGE DIFFUSION POLICY OPTIMIZATION FOR OFFLINE REINFORCEMENT LEARNING
9936TICL: TEXT-EMBEDDING KNN FOR SPEECH IN-CONTEXT LEARNING UNLOCKS SPEECH RECOGNITION ABILITIES OF LARGE MULTIMODAL MODELS
16230TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice
5019TIERED TREATMENT EFFECT DECOMPOSITION FOR MULTI-TASK UPLIFT MODELING
2299TIGHT REGRET BOUNDS FOR MEAN-REVERTING LINEAR BANDITS VIA RECURSIVE STATE ESTIMATION
8112TIGHTNESS OF SEMIDEFINITE RELAXATION FOR QUATERNION-BASED ROTATION SYNCHRONIZATION PROBLEMS
11768TIMBRE-AWARE AUDIO DIFFERENCE CAPTIONING FOR ANOMALOUS MACHINE SOUNDS WITHOUT PAIRED TRAINING DATA VIA SYNTHETIC PERTURBATIONS
12236TIMBRE-BASED PRETRAINING WITH PSEUDO-LABELS FOR MULTI-INSTRUMENT AUTOMATIC MUSIC TRANSCRIPTION
16013Time Series Anomaly Detection with Quantum Variational Methods and Set Covering
17720TIME SERIES ATTRIBUTES GUIDED PRETRAINING DATA SELECTION FOR TIME SERIES FOUNDATION MODELS
10293TIME SERIES DECOMPOSITION AND FUSION-BASED GRANGER CAUSALITY NETWORK FOR NONLINEAR CAUSAL INFERENCE
16526TIME VS. LAYER: LOCATING PREDICTIVE CUES FOR DYSARTHRIC SPEECH DESCRIPTORS IN WAV2VEC 2.0
18032TIME-AWARE MULTI-EXPONENTIAL ANALYSIS TO OPTIMIZE ANALYTE IDENTIFICATION USING ZIF-8-90
5135TIMEDIFF: LEVERAGING DIFFERENTIAL DOMAIN REPRESENTATIONS FOR LONG TIME SERIES FORECASTING
1766TIME-DOMAIN SYNTHESIS OF VIRTUAL SOUND SOURCE WITHIN PERSONALIZED SOUND ZONE USING A LINEAR LOUDSPEAKER ARRAY
12635TIME-FREQUENCY ANALYSIS OF NON-UNIFORMLY SAMPLED SIGNALS VIA SAMPLE DENSITY ADAPTATION
14282Time-Shifted Token Scheduling for Symbolic Music Generation
3850TINYDROP: TINY MODEL GUIDED TOKEN DROPPING FOR VISION TRANSFORMERS
9359TINYMU: A COMPACT AUDIO-LANGUAGE MODEL FOR MUSIC UNDERSTANDING
17675TIPS Over Tricks: Simple Prompts for Effective Zero-Shot Anomaly Detection
4721TIWNet: A Template-based Real-time Image Watermarking Method Using Invertible Neural Network
7475TLDIFFGAN: A LATENT DIFFUSION-GAN FRAMEWORK WITH TEMPORAL INFORMATION FUSION FOR ANOMALOUS SOUND DETECTION
10165TLD-PGD: TWO-STAGE LOW FREQUENCY DEGRADATION ADVERSARIAL ATTACK IN HYPERSPECTRAL IMAGE CLASSIFICATION
13923TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
15497T-Mimi: A Transformer-based Mimi Decoder for Real-Time On-Phone TTS
2122TMS: Text-Prompted Multi-channel Speech Separation on Smart Glasses
10519TMT: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation
14576TNET: TERRACE CONVOLUTIONAL DECODER NETWORK FOR REMOTE SENSING IMAGE SEMANTIC SEGMENTATION
3390TOEPLITZ UNLABELED SENSING
15774TOKCOINFER: TOKEN-LEVEL MULTI-MODEL COLLABORATION FOR ENERGY-EFFICIENT LLM INFERENCE
7777TOKENCHAIN: A DISCRETE SPEECH CHAIN VIA SEMANTIC TOKEN MODELING
11916TOP-1 COMPRESSION SUFFICES FOR FEDERATED UNLEARNING WITH THE HELP OF ADAPTIVE ERROR FEEDBACK
11129TOPOBIND: MULTI-MODAL PREDICTION OF ANTIBODY-ANTIGEN BINDING FREE ENERGY VIA SEQUENCE EMBEDDINGS AND STRUCTURAL TOPOLOGY
16142Topological Growth Serialization-based Mamba for 3D Point Clouds
19012TOPOLOGICAL PERSISTENCE OF THE NEURAL EMBEDDING OF THE ARCHETYPAL SUBSPACE
13805Topological Signal Processing for 3D Point Cloud Data
16714TOPT: TASK-ORIENTED PROMPT TUNING FOR URBAN REGION REPRESENTATION LEARNING
11401TOS: A TEAM OF SPECIALISTS ENSEMBLE FRAMEWORK FOR STEREO SOUND EVENT LOCALIZATION AND DETECTION WITH DISTANCE ESTIMATION IN VIDEO
10631TOUR-TUPLE-7: A FINE-GRAINED 7-TUPLE GENERATIVE ASPECT-BASED SENTIMENT ANALYSIS BENCHMARK FOR TOURISM SERVICE QUALITY
1997TOWARD CONVERSATIONAL USER INTERFACE VIA VOICE COMMAND CORRECTION
8087Toward Cross-Dataset Clothes-Changing Re-Identification via Efficient Decoupled Adaptive Matching
16131Toward Faithful Explanations in Acoustic Anomaly Detection
19141Toward Generalized Iris Presentation Attack Detection: A Mask-and-Distill Mixture of Experts Approach
12226TOWARD NON-PARAMETERIZED TIME SERIES EMBEDDING FOR EFFICIENT FORECASTING: A DYNAMICAL SYSTEM PERSPECTIVE
11123TOWARD ROBUST AND EFFICIENT BEAT TRACKING VIA BEAT-AWARE ATTENTION
11150TOWARD ROBUST IMITATION LEARNING VIA SEARCH-BASED INVERSE DYNAMICS WITH LIMITED EXPERT DEMONSTRATIONS
11419TOWARD ROBUST NODE-LEVEL GRAPH OOD GENERALIZATION WITH SEMANTIC AWARENESS
1403TOWARD ROBUST SAR SHIP DETECTION: DOMAIN-INVARIANT LEARNING VIA PHYSICS-DRIVEN AUGMENTATION AND RETROSPECTIVE ALIGNMENT
15272Towards 2D Texture Binding via Personalized Text-to-Image Generation based on Texture-Object Decoupling
15439TOWARDS ACCURATE QUANTIZATION FOR LARGE VISION-LANGUAGE MODELS VIA ZEROTH-ORDER GRADIENT OPTIMIZATION AND SECTIONED LOGARITHMIC QUANTIZER
11809TOWARDS BLIND DATA CLEANING: A CASE STUDY IN MUSIC SOURCE SEPARATION
10853TOWARDS BUILDING SPEECH LARGE LANGUAGE MODELS FOR MULTITASK UNDERSTANDING IN LOW-RESOURCE LANGUAGES
4095TOWARDS DATA DRIFT MONITORING FOR SPEECH DEEPFAKE DETECTION IN THE CONTEXT OF MLOPS
14844TOWARDS DISTANCE-AWARE SYNTHETIC AUDIO MIXTURES FOR UNIVERSAL SOUND SEPARATION
13551Towards Dynamic World Model Generation with Monocular Video
5544TOWARDS EFFECTIVE NEGATION MODELING IN JOINT AUDIO-TEXT MODELS FOR MUSIC
6830Towards Evaluating Generative Audio: Insights from Neural Audio Codec Embedding Distances
14115TOWARDS EVENT-DRIVEN RADARS: SPECTRAL SUPER-RESOLUTION AND HARDWARE
3960Towards Explainable Privacy Preservation in Federated Learning via Shapley Value-Guided Noise Injection
15257TOWARDS FAIR ASR FOR SECOND LANGUAGE SPEAKERS USING FAIRNESS PROMPTED FINETUNING
9792TOWARDS LIGHTWEIGHT ADAPTATION OF SPEECH ENHANCEMENT MODELS IN REAL-WORLD ENVIRONMENTS
1998Towards Memory-based Temporal Coherence in Pose-free 3D Gaussian Splatting
9466TOWARDS MORE ACCURATE CROSS-MODAL VIDEO OBJECT DETECTION WITH LOWER COMPUTATIONAL COST
5711TOWARDS MULTI-VIEW HIERARCHICAL VIDEO-TO-PIANO GENERATION WITH MIDI GUIDANCE
13459TOWARDS NOISE-ROBUST SPEECH INVERSION THROUGH MULTI-TASK LEARNING WITH SPEECH ENHANCEMENT
9663TOWARDS OBJECT-LEVEL MULTIMODAL TASK PLANNING FOR LONG-TERM ROBOTIC MANIPULATION WITH VISION LANGUAGE MODEL AND BEHAVIOR TREE
6578TOWARDS OPEN-WORLD HUMAN-OBJECT INTERACTION REASONING WITH MULTIMODAL LARGE LANGUAGE MODEL
17315TOWARDS ORTHOGRAPHICALLY-INFORMED EVALUATION OF SPEECH RECOGNITION SYSTEMS FOR INDIAN LANGUAGES
10227TOWARDS PRACTICAL DIFFERENTIAL PRIVACY FOR DIFFUSION-BASED DATASET DISTILLATION
13224TOWARDS PRIVACY-PRESERVING FINE-GRAINED VISUAL CLASSIFICATION VIA HIERARCHICAL LEARNING FROM LABEL PROPORTIONS
14623TOWARDS REAL-TIME GENERATIVE SPEECH RESTORATION WITH FLOW-MATCHING
12724TOWARDS RELIABLE TIME SERIES FORECASTING UNDER FUTURE UNCERTAINTY: AMBIGUITY AND NOVELTY REJECTION MECHANISMS
1870TOWARDS ROBUST CROSS-COMPRESSION DEEPFAKE DETECTION
14174TOWARDS ROBUST DYSARTHRIC SPEECH RECOGNITION: LLM-AGENT POST-ASR CORRECTION BEYOND WER
9863Towards Robust Visual Continual Learning with Multi-Prototype Supervision
2164TOWARDS SELF-EVALUATION OF SYCOPHANTIC HALLUCINATIONS IN MATHEMATICAL REASONING
4483TOWARDS SEMANTICALLY FAITHFUL TEXT-TO-TIME SERIES GENERATION VIA AGENTS AND SPECTRAL CONDITIONING
3350TOWARDS TRANSFERABLE CROSS-MODAL ADVERSARIAL ATTACKS VIA SEMANTIC CONSISTENCY DISRUPTION
14112TPEformer: Temporal Patch Embedding Transformer
4117TPFLOW: TOWARDS TOPOLOGICALLY-AWARE MOLECULAR GRAPH GENERATION VIA DISCRETE FLOW MATCHING
5589TPP-LLM: TIME SERIES POPULARITY PREDICTION VIA LLM EMPOWERED BY TEXTUAL PROTOTYPE AND PROMPT
14940TRACE: A TRIPLET-BASED ROBUSTNESS-AUGMENTED CAUSAL ENCODER FOR CAUSALITY GRAPH EVENT PREDICTION
10400TRACE: OPTIMIZING MULTI-HOP QUESTION ANSWERING VIA CONFIDENCE-GUIDED RETRIEVAL ASSIMILATION
15763TRACE: TRACKING AND ADDRESSING CROSS-DOMAIN CONFLICT FOR ENHANCED SEMANTIC SEGMENTATION
16590Tracking Listener Attention: Gaze-Guided Audio-Visual Speech Enhancement Framework
9113TRAFFIC ANOMALY DETECTION VIA DIMENSION-AWARE MULTI-VIEW ALIGNMENT
9862TRAFFICGS: SPARSE-VIEW GAUSSIAN SPLATTING FOR DYNAMIC ROADSIDE TRAFFIC SCENE MODELING AND STREAMING
6304TRAFFICHTG: REVOLUTIONIZING NETWORK TRAFFIC GENERATION WITH HIERARCHICAL TRANSFORMERS
5884TRAFFICMOE: ADAPTIVE MULTI-PERSPECTIVE FEATURE FUSION FOR ENHANCING MALICIOUS TRAFFIC GENERAL DETECTION CAPABILITY
3569Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio
13782TRAIN2EXPLAIN: TRAINING OPTIMIZATION FOR EXPLANATION IMPROVEMENT
15735Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction
11017Training Flow Matching Models with Reliable Labels via Self-Purification
13128TRAINING QUANTIZED SPIKING NEURAL NETWORKS WITH LOW-BIT GRADIENTS
4570TRAINING STUDENTS FOR RESEARCH WITH QUANTUM AI SIMULATION TOOLS
10006Training-Free and Interpretable Hateful Video Detection via Multi-stage Adversarial ReaSoning
14452TRAINING-FREE FRAMEWORK FOR DEFENDING UNSAFE IMAGE SYNTHESIS ATTACK
16133TRAINING-FREE INFERENCE-TIME SCALING FOR AUDIO SOURCE SEPARATION
4364Training-Free Layered Framework for Geometry-Aware Multilingual Text Editing
13443TRAINING-FREE MULTIMODAL GUIDANCE FOR VIDEO TO AUDIO GENERATION
15693Training-Free Prompt Compression via Shallow-Layer Structural-Semantic Fusion
2247TRAINING-FREE SIGNAL RECONSTRUCTION UNDER DRFM JAMMING VIA SKEWNESS-ADAPTIVE GATING AND GEOMETRIC REFINEMENT
10479TRAINING-FREE TEST-TIME ADAPTATION WITH BROWNIAN DISTANCE COVARIANCE IN VISION-LANGUAGE MODELS
16656Trajectory-Enhanced Camera Motion Understanding for Multimodal Large Language Models
14145TrajRS: Towards Certified Robustness in Pedestrian Trajectory Prediction
9641Transfer Learning for Paediatric Sleep Apnoea Detection Using Physiology-Guided Acoustic Models
14141Transfer Learning in Kernel Adaptive Filters with Dynamic Embeddings
16518Transferable Adversarial Attacks against Visual Language Models via Staged Semantic Reframing
3024TRANSFERABLE AUDIO LOTTERY TICKETS: GRADIENT ACCUMULATION FOR EXTREME SPARSITY
10938TransferAnything: Arbitrary Style Transfer via Frequency-Aware Latent Optimization in Diffusion Models
2861TRANSFORMER AND LATENT SCALABLE CONTRASTIVE LEARNING FOR GHOST-FREE HIGH DYNAMIC RANGE IMAGING
2059TRANSFORMER IMAGE QUALITY ASSESSMENT WITH MULTIMODAL FEATURES FUSION
15824TRANSPONDER-ASSISTED DIRECT TRACKING OF TIME-VARYING EMITTERS UNDER EXTREME BLOCKAGE
7016TRANSWNET: DUAL-STREAM HIERARCHICAL FEATURE INTEGRATED NETWORK FOR IMAGE FORGERY LOCALIZATION
12980Tree Reparameterized Belief Propagation for Gaussian Markov Random Fields
6212TriAD: Tri-head with Auxiliary Duplicating Permutation Invariant Training for Multi-Task Sound Event Localization and Detection
5756Triage knowledge distillation for speaker verification
7054TRIAGE: HIERARCHICAL VISUAL BUDGETING FOR EFFICIENT VIDEO REASONING IN VISION-LANGUAGE MODELS
14667TRI-ATTENTION FUSION: JOINT TEMPORAL-SPECTRAL AND BIDIRECTIONAL MODELING FOR SPEECH SPOOFING DETECTION
15538TRICON-FAIR: TRIPLET CONTRASTIVE LEARNING FOR MITIGATING SOCIAL BIAS IN PRE-TRAINED LANGUAGE MODELS
7827TriFusion: A Self-Supervised Learning Enhanced Dual-Level Multimodal Framework for Traffic Classification
13778Tri-Hybrid Beamforming Design for Integrated Sensing and Communications
9967TRIM: A SELF-SUPERVISED VIDEO SUMMARIZATION FRAMEWORK MAXIMIZING TEMPORAL RELATIVE INFORMATION AND REPRESENTATIVENESS
3479TRINET: A NOVEL AND MEMORY-EFFICIENT TENSOR NETWORK FOR HIGHER-ORDER TENSOR DECOMPOSITION
17111TRIVISIONTALK: MANDARIN LIP-TO-SPEECH SYNTHESIS WITH MULTIPLE VISUAL PATTERN INFORMATION AND MULTI-SCALE HYBRID ATTENTION
11705TRJSCC: Text-guided ROI-aware Deep Joint Source-Channel Coding
1645TRM-UNET: AN EFFICIENT EVENT-GUIDED MOTION DEBLURRING NETWORK
12355TRUST YOUR DEMONSTRATIONS: ENHANCING LLM-DRIVEN TEXT STYLE TRANSFER VIA CLUSTER-GUIDED SEMANTIC CONTRASTIVE DECODING
1971TRUSTWORTHY AI VIA UNBIASED VALIDATION: FAIR MODEL SELECTION FOR PARKINSON’S DETECTION FROM VOICE
4698TRUSTWORTHY AND PRIVACY-PRESERVING PERCEPTUAL HASHING WITH ZERO-KNOWLEDGE PROOFS FOR CLIENT-SIDE CONTENT SCANNING
16293TSAD-RAG: Boosting MLLM Time Series Anomaly Detection Via Retrieval-Augmented Generation
4398TS-Agent: Reinforcement Learning Empowered LLM Agents for Financial Time Series Forecasting
14948TSAR: Scalable Time Series Forecasting Meets Next-Scale Autoregressive Modeling
9632TSQLORA: TOWARDS SENSITIVITY AND QUALITY LOW-RANK ADAPTATION FOR EFFICIENT FINE-TUNING
17076TTA: TRANSCRIBE, TRANSLATE AND ALIGNMENT FOR CROSS-LINGUAL SPEECH REPRESENTATION
10899TTCE: TRACING TIME CYCLES FOR TEMPORAL KNOWLEDGE GRAPH EMBEDDINGS
19008TTSOPS: A CLOSED-LOOP CORPUS OPTIMIZATION FRAMEWORK FOR TRAINING MULTI-SPEAKER TTS MODELS FROM DARK DATA
11442TUNING-FREE FIDELITY-CONSTRAINED DECODING FOR FAITHFUL LEGAL REASONING WITH OPEN-DOMAIN LARGE LANGUAGE MODELS
18937Tuning-Free Online Robust Principal Component Analysis through Implicit Regularization
1736TUP: A Transferable Model for Wireless User Positioning with Few-Shot Learning
4686TURN THE BLACK-BOX WHITE: INFERRING AGGREGATION RULES IN FEDERATED LEARNING THROUGH MULTI-TRIGGER GEOMETRY-AWARE BACKDOORS
5928TURNING DATA HETEROGENEITY INTO A BACKDOOR SHIELD FOR PERSONALIZED FEDERATED LEARNING
3787TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles
13529TVP-UNET: THRESHOLD VARIANCE PENALTY U-NET FOR VOICE ACTIVITY DETECTION IN DYSARTHRIC SPEECH
3541TWO-STAGE ATTENTION TRIPLE ENHANCEMENT AND U-KAN DIFFUSION FOR FEW-SHOT KNOWLEDGE GRAPH COMPLETION
8065TWO-STAGE AUDIO-VISUAL TARGET SPEAKER EXTRACTION SYSTEM FOR REAL-TIME PROCESSING ON EDGE DEVICE
12632TWO-STAGE CATEGORY-ANCHORED FACTORIZED DISENTANGLEMENT FOR CROSS-DOMAIN RECOMMENDATION
6807TWO-STAGE GRID OPTIMIZATION FOR GROUP-WISE QUANTIZATION OF LLMS
17801TWO-STAGE LANGUAGE MODEL FRAMEWORK FOR ACOUSTIC ECHO CANCELLATION
5241TWO-TIMESCALE CHANNEL ESTIMATION FOR RIS-ASSISTED NEAR-FIELD COMMUNICATION
11506UAFD: Unified Adaptive Frequency-Domain Detector for Generalizable Deepfake Detection
3547UA-TTRL: Uncertainty-Aware Test-Time Reinforcement Learning
13768UAV PATH PLANNING FOR RADIO FREQUENCY SIGNAL LOCALIZATION VIA CRLB-BASED UNCERTAINTY MINIMIZATION
6693U-DAVI: UNCERTAINTY-AWARE DIFFUSION-PRIOR-BASED AMORTIZED VARIATIONAL INFERENCE FOR IMAGE RECONSTRUCTION
5832UDT: UNSUPERVISED DUAL-PATH TARGET FEATURE REFINEMENT FOR ROBUST SAR AUTOMATIC TARGET RECOGNITION
1746UJCODEC: AN END-TO-END UNET-STYLE CODEC FOR JOINT SPEECH COMPRESSION AND ENHANCEMENT
2451ULTRALIGHT IPM-DAE: AN ULTRA-LIGHTWEIGHT ECG DENOISING AUTOENCODER VIA PARALLEL MAMBA AND MULTI-SCALE FUSION
15822Ultra-Reliable Risk-Aggregated Sum Rate Maximization via Model-Aided Deep Learning
3546ULTRASONIC IN-EAR DETECTION FOR EARBUDS
4930UMA-SPLIT: UNIMODAL AGGREGATION FOR BOTH ENGLISH AND MANDARIN NON-AUTOREGRESSIVE SPEECH RECOGNITION
19015U-MusT: A Unified Framework for Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio
9913UMV: A MIXTURE-OF-EXPERTS VISION TRANSFORMER WITH MULTI-SPECTROGRAM FUSION FOR UNDERWATER SHIP NOISE CLASSIFICATION
15549UNBOUNDED HAT: AN E2E BOUNDARY-INDEPENDENT AUTOMATIC SYLLABLE STRESS DETECTION WITH HIERARCHICAL ATTENTION BASED TIME-COMPRESSION
5129UNCERTAINTY FACTORIZATION WITH LINEAR-TIME SEQUENTIAL MODELING FOR SPEAKER EMBEDDING
10451UNCERTAINTY-AWARE 3D EMOTIONAL TALKING FACE SYNTHESIS WITH EMOTION PRIOR DISTILLATION
15458Uncertainty-Aware Iterative Graph Reasoning for Document Event Causality Identification
15787UNCERTAINTY-AWARE MULTIMODAL ADAPTIVE FUSION WITH MIXTURE-OF-EXPERTS FOR ZERO-SHOT VIDEO OBJECT SEGMENTATION
9623Uncertainty-Aware Multi-Scale Feature Fusion with Transformer for Time Series Prediction
1709UNCERTAINTY-AWARE PROTOTYPE LEARNING WITH VARIATIONAL INFERENCE FOR FEW-SHOT POINT CLOUD SEGMENTATION
14190Uncertainty-Aware Sequence Classification with Probabilistic Selective State-Space Models
16166UNCERTAINTY-AWARE WIRELESS LOCALIZATION WITH DIFFUSION MODELS
5022UNCERTAINTY-GUIDED DOMAIN AUGMENTATION FOR DOMAIN GENERALIZATION IN SPEAKER VERIFICATION AND ANTI-SPOOFING
6009UNCERTAINTY-GUIDED SPLATTING: A DUAL ADAPTIVE OPTIMIZATION FRAMEWORK FOR 3D SCENE RECONSTRUCTION
14341Unconditional flow-based time series generation with equivariance-regularised latent spaces
5052UNCOVERING PRIVACY RISKS IN TIMEGAN: NOVEL AND EFFECTIVE MEMBERSHIP INFERENCE ATTACKS
15795UNDERSTANDING FRECHET SPEECH DISTANCE FOR SYNTHETIC SPEECH QUALITY EVALUATION
2649UNDERSTANDING GENERALIZATION IN DECENTRALIZED LEARNING: A TIME-UNIFORM AND TOPOLOGY-AWARE ANALYSIS
15443UNDERSTANDING PERSONALITY BASES
11314UNDERSTANDING TEXTUAL CAPABILITY DEGRADATION IN SPEECH LLMS VIA PARAMETER IMPORTANCE ANALYSIS
3343Understanding the Improvement in Model Quantization
2830UNDERSTANDING THE STRENGTHS AND WEAKNESSES OF SSL MODELS FOR AUDIO DEEPFAKE MODEL ATTRIBUTION
1608Unfettered Ink: Restoring Legibility and Stylistic Consistency in Immersive Air Handwriting
2284UNICAMO: A UNIVERSAL PHYSICAL CAMOUFLAGE FOR MULTISPECTRAL OBJECT DETECTOR
2328UniDiff-TTS: Aligner-Free Diffusion Speech Synthesis with Duration Guidance
6838UNI-EDIT: CONSISTENT TEXT-DRIVEN EDITING FOR 3D GAUSSIAN SPLATTING
18863Unified Analysis of Decentralized Gradient Descent: A Contraction Mapping Framework
17145Unified Compression via Adaptive Bits Selection and structural reparameterization
12425UNIFIED MODELING OF LAGGED AND SYNCHRONIZED RELATIONS IN MULTIVARIATE TIME SERIES FORECASTING
12770UNIFIED MULTIMODAL AND MULTILINGUAL RETRIEVAL VIA MULTI-TASK LEARNING WITH NLU INTEGRATION
18867UNIFIED NEURAL BACKDOOR REMOVAL WITH ONLY FEW CLEAN SAMPLES THROUGH UNLEARNING AND RELEARNING
5500UNIGEO: A UNIFIED 3D INDOOR OBJECT DETECTION FRAMEWORK INTEGRATING GEOMETRY-AWARE LEARNING AND DYNAMIC CHANNEL GATING
10464UNIKGLM: A UNIFIED LLM-DRIVEN MULTI-TASK REASONING FRAMEWORK FOR KNOWLEDGE GRAPH COMPLETION
5971UNILORA: A UNIFIED FRAMEWORK FOR EFFICIENT AND SECURE LORA MANAGEMENT IN MULTI-TENANT LLM INFERENCE
13585UNIMOCOLA: AN UNCERTAINTY-GUIDED MULTI-MODEL COLLABORATION FRAMEWORK FOR CROSS-LINGUAL NAMED ENTITY RECOGNITION
7808UNIPACT: A MULTIMODAL FRAMEWORK FOR PROGNOSTIC QUESTION ANSWERING ON RAW ECG AND STRUCTURED EHR
1238UniSTFormer: Unified Spatio-Temporal Lightweight Transformer for Efficient Skeleton-Based Action Recognition
2182UNIVERSAL 3D POINT CLOUD ATTACK USING GAUSSIAN DISTRIBUTION MODELING
1775UNIVERSAL DENOISING PATTERNS FOR DIFFUSION IMAGE DETECTION
18968Universal Vessel Segmentation for Multi-Modality Retinal Images
15913UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
16205UNLABELED TARGET-DOMAIN CALIBRATION FOR TABULAR CLASSIFIERS UNDER LABEL SHIFT
12556UnlearnShield: Shielding Forgotten Privacy against Unlearning Inversion
16924Unleashing the power of global-local synergy for multivariate time series forecasting
4724Unleashing Vision Transformer Potential in Image Quality Assessment via Global-Local Adaptive Interaction
12659UNLOCKING HIDDEN POTENTIAL IN POINT CLOUD NETWORKS WITH ATTENTION-GUIDED GROUPING-FEATURE COORDINATION
19137UNLOCKING OFF-THE-GRID SPARSE RECOVERY WITH UNLIMITED SENSING: SIMULTANEOUS SUPER-RESOLUTION IN TIME AND AMPLITUDE
1989UNLOCKING THE POTENTIAL OF SOCIAL MEDIA PREFERENCE FOR ANNOTATION-EFFICIENT LARGE LANGUAGE MODEL ALIGNMENT
11824UNMIXX: UNTANGLING HIGHLY CORRELATED SINGING VOICES MIXTURES
6929UNPAIRED INCREMENTAL HASHING FOR CROSS-MODAL RETRIEVAL IN NON-STATIONARY ENVIRONMENTS
10174UNROLLED GRAPH NEURAL NETWORKS FOR CONSTRAINED OPTIMIZATION
6248UNSEEN BUT NOT UNKNOWN: USING DATASET CONCEALMENT TO ROBUSTLY EVALUATE SPEECH QUALITY ESTIMATION MODELS
13844UNSUPERVISED ADAPTATION OF AI DOA ESTIMATORS VIA DOWNSTREAM TRACKING
14419Unsupervised Discovery and Analysis of the Vocal Repertoires and Patterns of Select Corvid Species
2569UNSUPERVISED DOMAIN ADAPTATION WITH CONTRASTIVE LEARNING FOR CROSS-MODALITY AND CROSS-SITE MEDICAL IMAGE SEGMENTATION
10886Unsupervised Learning To Hash with A Soft Winner-Take-All Mechanism
1258UNSUPERVISED LEXICON LEARNING FROM SPEECH IS LIMITED BY REPRESENTATIONS RATHER THAN CLUSTERING
8182UNSUPERVISED PROJECTION VIA CONVEX-HULL RADIUS MINIMIZATION FOR COMPACT CLUSTER REPRESENTATIONS
11949UNSUPERVISED SENTENCE STRESS DETECTION IN L2 SPOKEN ENGLISH VIA ITERATIVE ADAPTATION OF WHISPER ASR FRAMEWORK
16282Unsupervised TBD-MIG Detectors in Nonhomogeneous Clutter
1600Unsupervised UAV Detection from Sparse LiDAR via Temporal Dispersion Signatures
12859UNWRAPDIFF: A CONDITIONAL DIFFUSION MODEL FOR INSAR PHASE UNWRAPPING
2269UP TO 36X SPEEDUP: MASK-BASED PARALLEL INFERENCE PARADIGM FOR KEY INFORMATION EXTRACTION IN MLLMS
2956UP-AF: URBAN PERCEPTION VIA ACTIVE FINETUNING
12579UPLINK PERFORMANCE OF MULTIPLE RIS-ASSISTED CELL-FREE MASSIVE MIMO SYSTEMS JOINTLY RIS PHASE SHIFT OPTIMIZATION
2454USCTNET: A DEEP UNFOLDING NUCLEAR-NORM OPTIMIZATION SOLVER FOR PHYSICALLY CONSISTENT HSI RECONSTRUCTION
10389USER-LEVEL SAFETY ALIGNMENT
9729USVexplorer: Robust Detection of Ultrasonic Vocalizations with Cross Species Generalization
1472Utilising Gradient-Based Proposals Within Sequential Monte Carlo Samplers for Training of Partial Bayesian Neural Networks
15842Utilizing Information Theoretic Approach to Study Cochlear Neural Degeneration
8855UTI-LLM: A Personalized Articulatory-Speech Therapy Assistance System Based on Multimodal Large Language Model
2337UVT-LM: UNIFYING VISUAL AND TACTILE PERCEPTION WITH LANGUAGE MODEL
16380V2A-DPO: OMNI-PREFERENCE OPTIMIZATION FOR VIDEO-TO-AUDIO GENERATION
14787V2R2: Hierarchical Dual-View Consistency with Dual-Representations for Network Alignment
15286VAE-GENERATED SECOND-ORDER GLOBAL PROTOTYPES FOR HETEROGENEOUS FEDERATED LEARNING
1642VARDet: Visual Autoregressive Multi-Scale Prediction and CLIP-Guided Semantics for UAV Small-Object Detection
17546VARIABLE METRIC STOCHASTIC LINE-SEARCH FOR PRIMAL-DUAL HYBRID GRADIENT
15594variance & greediness: a comparative study of metric-learning losses
14327VARIATIONAL BAYESIAN FILTERING USING GAUSSIAN MIXTURES
17262Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition
4951VARIATIONAL NEAREST NEIGHBOR SIGN LANGUAGE TRANSLATION
17573VBX FOR END-TO-END NEURAL AND CLUSTERING-BASED DIARIZATION
5119VCE: A ZERO-COST HALLUCINATION MITIGATION METHOD OF LVLMS VIA VISUAL CONTRASTIVE EDITING
14024VChangeCodec: An ultra Low-Complexity Neural Speech Codec with Built-in Voice Changer for Customized Real-time Communication
3360VDCKAN: A KOLMOGOROV-ARNOLD DRIVEN MODEL FOR VOLUMETRIC DATA COMPRESSION
14602Vector Quantization-based Watermarking for Autoregressive Generated Images
6088Vector Quantized Intent Contrastive Learning for Sequential Recommendation
9972VELOCITY POTENTIAL NEURAL FIELD FOR EFFICIENT AMBISONICS IMPULSE RESPONSE MODELING
18913Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance
6115VIA SCORE TO PERFORMANCE: EFFICIENT HUMAN-CONTROLLABLE LONG SONG GENERATION WITH BAR-LEVEL SYMBOLIC NOTATION
16350VIB2SOUND: SEPARATION OF MULTIMODAL SOUND SOURCES
10490Video Hashing via Transformer and KAN for Retrieval
15138VIEWLEARNER: GNN-DRIVEN PRE-BUILT VIEWS FOR MULTI-TABLE NL2SQL
14491VioPTT: Violin Technique-Aware Transcription from Synthetic Data Augmentation
13870VIRTUAL CONSISTENCY FOR AUDIO EDITING
12138VISA: Virtual Identity for Secure Face Anonymization
17177VISCORTEX: HIERARCHICAL CORTICAL FUSION FOR FMRI IMAGE DECODING
1107VISION KAN: TOWARDS AN ATTENTION-FREE BACKBONE FOR VISION WITH KOLMOGOROV-ARNOLD NETWORKS
16366VISION MEETS LANGUAGE: ADAPTIVE JOINT PRUNING FOR EFFICIENT MULTIMODAL MODELS
16024VISION-ENHANCED TIME SERIES FORECASTING BY DECOMPOSED FEATURE EXTRACTION AND COMPOSED RECONSTRUCTION
9717Visual Contrastive Guidance for Improving Generalization of Gaze Estimation
15345VISUAL KEYS TO SYMPHONIES: LATENT DIFFUSION FOR MULTI-SCENE VIDEO-TO-MUSIC GENERATION
4083VISUAL SALIENCY STEERING DISTILLATION FOR MULTIMODAL CHAIN-OF-THOUGHT REASONING
9614VISUAL-AIDED AIRCRAFT ILS DEVIATION ESTIMATION USING RAO-BLACKWELLIZED PARTICLE FILTERS ON LIE GROUPS
18976VISUAL-INFORMED SPEECH ENHANCEMENT USING ATTENTION-BASED BEAMFORMING
10179VisualPrism: Disperse-and-Focus Token Compression
18172VITEX: VISUAL TEXTURE CONTROL FOR MULTI-TRACK SYMBOLIC MUSIC GENERATION VIA DISCRETE DIFFUSION MODELS
2620VividTalker: A Modular Framework for Expressive 3D Talking Avatars with Controllable Gaze and Blink
16931VIVIDVOICE: A UNIFIED FRAMEWORK FOR SCENE-AWARE VISUALLY-DRIVEN SPEECH SYNTHESIS
3231VKT+: ENHANCING VISUAL KNOWLEDGE TRACING VIA NEURAL NETWORK ARCHITECTURE SEARCH
3873VKTNet: A Hybrid Visual Kolmogorov-Arnold Transformer Network for Pedestrian Intention and Trajectory Prediction
6421VL-ANODIFF: VISION-LANGUAGE GUIDED DIFFUSION FOR FEW-SHOT INDUSTRIAL ANOMALY SYNTHESIS
3551VMambaMorph: a 3D Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module
7844VMSP: Video-to-Music Generation with Two-Stage Alignment and Synthesis
12139VM-UNSSOR: Unsupervised Neural Speech Separation Enhanced by Higher-SNR Virtual Microphone Arrays
8003VNODE: A PIECEWISE CONTINUOUS VOLTERRA NEURAL NETWORK
4631VOCALNET-M2: ADVANCING LOW-LATENCY SPOKEN LANGUAGE MODELING VIA INTEGRATED MULTI-CODEBOOK TOKENIZATION AND MULTI-TOKEN PREDICTION
17003VOICING-GUIDED DECOMPOSITION AND RECOMPOSITION FOR FEW-SHOT KEYWORD-INCREMENTAL LEARNING
13419VOROGEOMNET: A GRAPH NEURAL NETWORK BASED ON VORONOI TESSELLATION FOR PROPERTY PREDICTION OF POROUS MATERIAL
16796VOTING-BASED PITCH ESTIMATION WITH TEMPORAL AND FREQUENTIAL ALIGNMENT AND CORRELATION AWARE SELECTION
15507VoxGuard: Evaluating user and attribute privacy in speech via Membership Inference Attacks
15057VOXMORPH: SCALABLE ZERO-SHOT VOICE IDENTITY MORPHING VIA DISENTANGLED EMBEDDINGS
4854VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
18058VP-GNN: A UNIFIED GRAPH FRAMEWORK FOR VARIABLE-WISE AND PATCH-WISE MODELING OF IRREGULAR CLINICAL TIME SERIES
9940VQEzy: AN OPEN-SOURCE DATASET FOR PARAMETER INITIALIZATION IN VARIATIONAL QUANTUM EIGENSOLVERS
7833VSE: VARIATIONAL STATE ESTIMATION OF COMPLEX MODEL-FREE PROCESS
3716VSTYLE: A BENCHMARK FOR VOICE STYLE ADAPTATION WITH SPOKEN INSTRUCTIONS
15067VT-Heads: Voice Cloning and Talking Head Generation From Text Based on V-DiT
15238VTONGuard: Automatic Detection and Authentication of AI-Generated Virtual Try-On Content
16028WAM-UNET: A HYBRID U-NET ARCHITECTURE WITH WMAA AND AEMAMBA FOR MEDICAL IMAGE SEGMENTATION
14450WARM: WEIGHT ALIGNMENT AND REMAPPING FOR EFFECTIVE NEURAL NETWORK REINITIALIZATION
16523WARP QUANTIFICATION ANALYSIS: A FRAMEWORK FOR PATH-BASED SIGNAL ALIGNMENT METRICS
12456WATEMP: ACOUSTIC-BASED NON-CONTACT WATER TEMPERATURE MEASUREMENT SYSTEM USING SMARTPHONES
9568WaterFlow: Explicit Physics-Prior Rectified Flow for Underwater Saliency Mask Generation
13839Watermark Self-Repair Model: Robust Multimodal Watermark Generation via Anomaly-Aware Mask Restoration
15391WAV2LEV: PREDICTING LEVENSHTEIN EDIT OPERATION SEQUENCES FOR FINE-GRAINED ESTIMATION OF AUTOMATIC SPEECH RECOGNITION ERROR
13375WaveFormer: Cross-modal Fusion with Robust Multi-view Flow Representation for Encrypted Traffic Classification
16080WaveFormer: Wavelet-Enhanced Transformer for Multi-Scale Representation Learning in Time Series Forecasting
18918WAVEFORMS FOR COMPUTING OVER THE AIR: A GROUNDBREAKING APPROACH THAT REDEFINES DATA AGGREGATION
17027WAVELET-AWARE ANOMALY DETECTION IN MULTI-CHANNEL USER LOGS VIA DEVIATION MODULATION AND RESOLUTION-ADAPTIVE ATTENTION
5233WAVELET-DRIVEN SPATIAL-FREQUENCY MODULATION NETWORK FOR UNDERWATER IMAGE ENHANCEMENT
18264WAVELETGAUSSIAN: WAVELET-DOMAIN DIFFUSION FOR SPARSE-VIEW 3D GAUSSIAN OBJECT RECONSTRUCTION
12070WAVENEXT 2: CONVNEXT-BASED FAST NEURAL VOCODERS WITH RESIDUAL DENOISING AND SUB-MODELING FOR GAN AND DIFFUSION MODELS
5326WAVE-PCU: WAVELET-BASED POINT CLOUD UPSAMPLING WITH HIERARCHICAL TRANSFORMERS
11645WAVESPIKENET: A WAVELET-SPIKING FUSION ARCHITECTURE FOR AUDIO CLASSIFICATION ON EDGE DEVICES
1714WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection
14745WAVE-TRAINER-FIT: NEURAL VOCODER WITH TRAINABLE PRIOR AND FIXED-POINT ITERATION TOWARDS HIGH-QUALITY SPEECH GENERATION FROM SSL FEATURES
19136WavJourney: Compositional Audio Creation With Large Language Models
12540WAVLINK: COMPACT AUDIO–TEXT EMBEDDINGS WITH A GLOBAL WHISPER TOKEN
9095WEATHER-R1: LOGICALLY CONSISTENT REINFORCEMENT FINE-TUNING FOR MULTIMODAL REASONING IN METEOROLOGY
9511Weaving Time into Topics: A Neural-Dynamical Tapestry for Information Diffusion Modeling
3634WEBEXPERT: DOMAIN-AWARE WEB AGENTS WITH CRITIC-GUIDED EXPERT EXPERIENCE FOR HIGH-PRECISION SEARCH
16626WebRouter: Query-specific Router via Variational Information Bottleneck for Cost-sensitive Web Agent
1430WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope
13139WENETSPEECH-CHUAN: A LARGE-SCALE SICHUANESE CORPUS WITH RICH ANNOTATION FOR DIALECTAL SPEECH PROCESSING
11920WGIP: LOW-LIGHT IMAGE ENHANCEMENT WITH 4D LUT BY WAVELET-GUIDED INTENSITY PRIOR
15124WHAT IS THE RISK? EVALUATING THE IMPACT OF KNOWLEDGE DISTILLATION ON LLM VULNERABILITIES
13426WHAT THE STUDENT LEARNS IN KNOWLEDGE DISTILLATION: A SUBSPACE VIEW AND EVIDENCE ON CONVOLUTIONAL RECURRENT NETWORK
6058What You Feel Is Not What They See: On Predicting Self-Reported Emotion from Third-Party Observer Labels
10848WHEN AND HOW LONG DID THERAPY HAPPEN? SOFT-SUPERVISING TEMPORAL LOCALIZATION USING AUDIO-LANGUAGE MODELS
16399WHEN AUDIO MATTERS: A LIGHTWEIGHT, HIERARCHICAL FUSION MODEL FOR SPEECH AND NON-VERBAL EMOTION RECOGNITION
19029WHEN BAYESIAN TENSOR COMPLETION MEETS MULTIOUTPUT GAUSSIAN PROCESSES: FUNCTIONAL UNIVERSALITY AND RANK LEARNING
11360WHEN CHILDREN TALK AND MACHINES LISTEN: TOWARD AN INTERPRETABLE SPEECH-BASED SCREENER FOR DUTCH DEVELOPMENTAL LANGUAGE DISORDER
11869When Differential Privacy Meets Wireless Federated Learning: An Improved Analysis for Privacy and Convergence
2606WHEN LARGE VISION-LANGUAGE MODELS MEET PERSON RE-IDENTIFICATION
5923WHEN MAMBA MEETS KAN: A HYBRID LEARNING NETWORK FOR ELECTRIC VEHICLE CHARGING DEMAND PREDICTION
15731When Noise Lowers the Loss: Rethinking Likelihood-Based Evaluation in Music LLMs
15891WHEN SIGNALS BEND: CURVATURE-GUIDED SELECTIVE GRAPH REWIRING FOR FEW-SHOT BOT DETECTION
7609WHEN SILENCE MATTERS: THE IMPACT OF IRRELEVANT AUDIO ON TEXT REASONING IN LARGE AUDIO-LANGUAGE MODELS
9868WHEN THREE HEADS COLLABORATE: ATTENTION-DRIVEN FUSION FOR LONG-TAILED SEMI-SUPERVISED LEARNING
12131WHEN VOICE MATTERS: A CONTROLLED STUDY OF AUDIO LLM BEHAVIOR IN CLINICAL DECISION-MAKING
14457WHERE, NOT WHAT: COMPELLING VIDEO LLMS TO LEARN GEOMETRIC CAUSALITY FOR 3D-GROUNDING
18131Which private attributes do VLMs agree on and predict well?
10877Whisper with Benefits: A Unified Approach to Speech and Speaker Attribute Recognition
13748WHISPER: COURTSIDE EDITION - ENHANCING ASR PERFORMANCE THROUGH LLM-DRIVEN CONTEXT GENERATION
9917WHISPER-FEST: SINGLE-CHANNEL FAR-FIELD ENHANCED SPEECH-TO-TEXT WITHOUT PARALLEL DATA
6489WHISPER-MLA: REDUCING GPU MEMORY CONSUMPTION OF ASR MODELS BASED ON MHA2MLA CONVERSION
1178WHISPER-QF: LEVERAGING DUAL CROSS-ATTENTION Q-FORMER FOR SPEECH EMOTION RECOGNITION WITH MULTI-TASK LEARNING
12711Whitening Spherical Gaussian Mixtures in the Large-Dimensional Regime
13362WHO'S RELATED? FAST AND ACCURATE FAMILY RELATIONSHIP DETECTION IN CONVERSATIONS
1528WHY DELETE? JUST MAKE IT NATURAL. MAXIMUM ENTROPY DISTRIBUTION DISTILLATION FOR LARGE LANGUAGE MODELS UNLEARNING
4839WHY DO SPEECH LANGUAGE MODELS FAIL TO GENERATE SEMANTICALLY COHERENT OUTPUTS? A MODALITY EVOLVING PERSPECTIVE
3672WHY TEMPORAL MODELING MODULES FALL SHORT IN TEMPORALLY SENSITIVE VIDEO-TEXT RETRIEVAL TASKS
11715WICON: A LIGHTWEIGHT CONTINUAL LEARNING APPROACH FOR WIFI-BASED HUMAN ACTIVITY RECOGNITION VIA MASK-ADAPTIVE CLASSIFIER EXPANSION
14374WIDEBAND DIRECTION-OF-ARRIVAL ESTIMATION THROUGH BLIND SPARSE LEAST SQUARE REGRESSION
18865WIDEBAND DOA ESTIMATION BASED ON STOCHASTIC MAXIMUM LIKELIHOOD ESTIMATION WITH FLAT SPECTRA ASSUMPTION
18889WIDEBAND RELATIVE TRANSFER FUNCTION (RTF) ESTIMATION EXPLOITING FREQUENCY CORRELATIONS
2957WIDTH-ENHANCED FINE-TUNING FOR LONG-TAILED LEARNING
11066WiFi-based Multi-user Activity Recognition via Id-Activity Decoupling
2028WiFi-GEN: High-Resolution Indoor Imaging from WiFi Signals Using Generative AI
17464WIFISIM: SIMULATING WIFI PROBE REQUESTS VIA AOSP ANALYSIS AND DEVICE BEHAVIOR MODELING
15217WINDMOE: MIXTURE-OF-EXPERTS METHOD FOR WIND POWER FORECASTING UNDER EXTREME WEATHER CONDITIONS
12306WINDOWED SUMMARYMIXING: AN EFFICIENT FINE-TUNING OF SELF-SUPERVISED LEARNING MODELS FOR LOW-RESOURCE SPEECH RECOGNITION
10123WIRAG: RETRIEVAL-AUGMENTED GENERATION WITH LARGE LANGUAGE MODELS (LLM) FRAMEWORK FOR WIFI-BASED HUMAN ACTIVITY RECOGNITION
12968WMOE-CLIP: WAVELET-ENHANCED MIXTURE-OF-EXPERTS PROMPT LEARNING FOR ZERO-SHOT ANOMALY DETECTION
5503WPGST: WAVELET POOLING GROUP SWIN TRANSFORMER FOR SUPERPIXEL SEGMENTATION
14385WRAPPER-AWARE RATE DISTORTION OPTIMIZATION IN FEATURE CODING FOR MACHINES
13514WTRSS: UNLEASHING THE POWER OF WAVELET TRANSFORM IN RADAR SEMANTIC SEGMENTATION
7578XAI-PRUNER: EXPLAINABILITY-DRIVEN PRUNING OF CNN AND TRANSFORMER
3241Xi+: Uncertainty Supervision for Robust Speaker Embedding
18903XLSR-MAMBA: A DUAL-COLUMN BIDIRECTIONAL STATE SPACE MODEL FOR SPOOFING ATTACK DETECTION
6099XMix: Combating Extremely Noisy Labels via Local Smoothness in Self-Supervised Feature Space
19043XPPG-PCA: Reference-Free Automatic Speech Severity Evaluation With Principal Components
10047YOIO: Fast and Reliable Optical Flow Estimation Using Accurate and Holistic References
4222ZEN-DARTS: MITIGATING PERFORMANCE COLLAPSE WITH SYNFLOW METRIC REGULARIZATION AND IMPROVED ARCHITECTURE PARAMETER INITIALIZATION
7755Zero-Shot VISUAL GROUNDING in 3D Gaussians via View Retrieval
15082ZIFORMER: TIME EFFICIENT SO(3)-EQUIVARIANT GRAPH NEURAL NETWORK FOR MOLECULAR SYSTEMS
7836Zip Your Data: Length-Adaptive Visual Token Optimization for Efficient Multi-Modal Training
9825ZIV-ZAKAI BOUND FOR DISTRIBUTED-ARRAY-BASED DOA ESTIMATION
10062ZK-VSA: ZERO-KNOWLEDGE VERIFIABLE SPEAKER ANONYMIZATION LEVERAGING PHASE VOCODER WITH TIME-SCALE MODIFICATION
14713Z-SCORES: A METRIC FOR LINGUISTICALLY ASSESSING DISFLUENCY REMOVAL
16932ZSDA-ICM: Zero-Shot Domain Adaptation of Image Compression for Machines in Diverse Scenes
8203ZSV2C-MLLM: Zero-Shot Visual Voice Cloning via Multimodal Large Language Models
6021β-AVSDNET: A NOVEL END-TO-END NEURAL NETWORK ARCHITECTURE FOR AUDIO-VISUAL SPEAKER DIARIZATION