List of Accepted Papers
The following is the list of accepted ICASSP 2026 papers, sorted by paper title. Use your web browser's search feature to find your paper number. Notifications have also been sent to all authors by email. If you have not received your notification by email, please contact us at papers@2026.ieeeicassp.org.
| Paper Number | Paper Title |
|---|---|
| 5047 | $\ell_0$-ESP: EFFICIENT STRUCTURED PRUNING FOR CONVOLUTIONAL NEURAL NETWORK COMPRESSION BASED ON $\ell_0$-NORM OPTIMIZATION |
| 6153 | $S^3$: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models |
| 10540 | (P)rior(D)yna(F)low: A Priori Dynamic Workflow Construction via Multi-Agent Collaboration |
| 7398 | AutoACC: LLM-Driven Irregular Operator Optimization for Inference Acceleration on RISC-V Edge Devices |
| 13205 | CAUSAL BLIND SOURCE SEPARATION: UNMIXING MULTIVARIATE SIGNALS BY DISCOVERING THEIR LATENT GENERATIVE GRAPH |
| 19138 | 1-BIT UNLIMITED SAMPLING BEYOND FOURIER DOMAIN: LOW-RESOLUTION SAMPLING OF QUANTIZATION NOISE |
| 15769 | 2 IN 1: A DUAL-PURPOSE APPROACH FOR EO-SAR SHIP DETECTION WITH SOURCE-FREE DOMAIN ADAPTATION |
| 5997 | 2025 URGENT SPEECH ENHANCEMENT CHALLENGE MULTILINGUAL P.808 LISTENING TESTS: APPROACH AND RESULTS |
| 11610 | 2I-Instruct: Generative Joint Empathy Detection and Empathy Intent Classification via Inter-Task and Inter-Instance Interactions |
| 2596 | 3D MESH GRID ROOM IMPULSE RESPONSES MEASURED WITH A LINEAR MICROPHONE ARRAY AND SUPPRESSION OF FRAME REFLECTIONS |
| 16784 | 3D MESH STEGANOGRAPHY ALGORITHM BASED ON NON-ADDITIVE DISTORTION MINIMIZATION |
| 14330 | 3D MOTION SYNTHESIS FROM SPARSE TRACKING WITH AUTOREGRESSIVE TEMPORAL WINDOWS |
| 1630 | 3D SCENE FLOW RECONSTRUCTION FOR DYNAMIC DEBLURRING WITH BOKEH RENDERING |
| 16854 | 3D-AWARE SEMANTIC ALIGNMENT: JOINT GLOBAL AND LOCAL MODELING FOR 3D FEW-SHOT ANOMALY DETECTION |
| 11322 | 3D-Aware Shadow Generation for Composite Image |
| 16864 | 3DIFFUSIONDET: DIFFUSION MODEL FOR 3D OBJECT DETECTION WITH ROBUST LIDAR-CAMERA FUSION |
| 17458 | 3DME: DUAL-BRANCH ENCODER WITH PROGRESSIVE MASKING FOR 3D MEDICAL FOUNDATION ENCODING MODEL |
| 7748 | 3GeM Pooling: Direction-Aware and Compact Global Descriptors for Visual Place Recognition |
| 18432 | 3-KEY-INPUT: EXPLORING THE THEORETICAL MINIMUM KEYS FOR TEXT ENTRY |
| 12641 | A BAYESIAN APPROACH TO SINGING SKILL EVALUATION USING SEMITONE PITCH HISTOGRAM AND MCMC-BASED GENERATED QUANTITIES |
| 4027 | A BENCHMARK DATASET AND BASELINE FRAMEWORK FOR ACTION RECOGNITION IN POWER CONSTRUCTION SAFETY |
| 13360 | A Benchmark for Joint Dialogue Satisfaction, Emotion Recognition, and Emotion State Transition Prediction |
| 12535 | A BIMODAL APPROACH FOR DETECTING FATIGUE USING SPEECH AND PERSONAL ASSESSMENTS IN COLLEGE STUDENTS |
| 17989 | A Broadband Unit-Circle MVDR Beamformer with Spatial Adaptive Canceller |
| 13692 | A CENTRALIZED PLANNING WITH DECENTRALIZED EXECUTION FRAMEWORK FOR COUNTER-UAV OPERATIONS IN URBAN ENVIRONMENTS |
| 12538 | A Class of Finitely LMI Representable Worst-Case SINR Maximization Problems of Robust Adaptive Beamforming for General-Rank Signal Models |
| 6787 | A COASTAL WIND SPEED RECONSTRUCTION ALGORITHM INTEGRATING LIMITED BUOY OBSERVATIONS AND LAYERED REFINEMENT |
| 5990 | A COCKTAIL-PARTY BENCHMARK: MULTI-MODAL DATASET AND COMPARATIVE EVALUATION RESULTS |
| 6227 | A COMPARATIVE STUDY ON HOW DATA NORMALIZATION AFFECTS ZERO-SHOT GENERALIZATION IN TIME SERIES FOUNDATION MODELS |
| 9885 | A Comprehensive Benchmark for Evaluating Video Colorization and Color Propagation Methods |
| 9587 | A Comprehensive Ecosystem for Open-Domain Customized Video Generation |
| 18898 | A COMPREHENSIVE GUIDE TO MULTISET CANONICAL CORRELATION ANALYSIS AND ITS APPLICATION TO JOINT BLIND SOURCE SEPARATION |
| 16901 | A Conflict-Free SpDMM Accelerator for GCN Inference on FPGA |
| 17419 | A CONSISTENT LEARNING DEPRESSION DETECTION FRAMEWORK INTEGRATING MULTI-VIEW ATTENTION |
| 8036 | A CONVERSATIONAL ENTITY LINKING METHOD BASED ON SENTENCE LEVEL AND TOKEN LEVEL DUAL EVALUATION |
| 12657 | A CONVEX DEMIXING APPROACH FOR HYBRID-FIELD CHANNEL ESTIMATION OF XL-MIMO SYSTEMS VIA ATOMIC NORM MINIMIZATION |
| 6197 | A DATA DRIVEN DESIGN FOR OPTIMAL SAMPLED SYNCHRONIZATION OF CHAOTIC SYSTEMS |
| 14137 | A Data-Centric Framework for Scientific Natural Language Inference via LLM-Driven Information-Theoretic Augmentation |
| 8493 | A Data-Driven Framework for Personal Sound Zone Control Addressing Loudspeaker Nonlinearities |
| 12572 | A Data-Informed Adaptive Convolution Kernel Learning Method for Image Fusion |
| 14313 | A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks |
| 17728 | A DECOMPOSITION-BASED STATE SPACE MODEL FOR MULTIVARIATE TIME-SERIES FORECASTING |
| 19143 | A Deep Generative Model for Five-Class Sleep Staging with Arbitrary Sensor Input |
| 12174 | A DEEP LEARNING-BASED APPROACH TO TRAFFIC ACCIDENT EVIDENCE EXTRACTION |
| 17585 | A DISCRETE WAVELET TRANSFORM-BASED LIGHTWEIGHT TRANSFORMER MODEL FOR INTELLIGENT FAULT DIAGNOSIS |
| 3513 | A DISTRIBUTION MATCHING APPROACH TO NEURAL PIANO TRANSCRIPTION WITH OPTIMAL TRANSPORT |
| 17290 | A DUAL-BRANCH FRAMEWORK FOR SEMANTIC CHANGE DETECTION WITH BOUNDARY AND TEMPORAL AWARENESS |
| 4204 | A DUAL-CHANNEL ASR-LLM ARCHITECTURE WITH A PROGRESSIVE TRAINING STRATEGY FOR LOW-RESOURCE SPEECH RECOGNITION |
| 9650 | A DUAL-CONTEXT FUSION MODEL FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS |
| 14453 | A DUAL-MODULATION FRAMEWORK FOR RGB-T CROWD COUNTING VIA SPATIALLY MODULATED ATTENTION AND ADAPTIVE FUSION |
| 6280 | A DUAL-PATH APPROACH TO OPTIMIZING LLMS: ENTROPY CONSTRAINT FOR EXPLOITATION AND NEURAL PERTURBATION FOR EXPLORATION |
| 2777 | A DUAL-PATH MAMBA WITH FIXED AND VARIABLE PATCHES FOR TIME SERIES FORECASTING |
| 15436 | A Dynamic Dual-Backbone Model for Adaptive Vehicle-Pedestrian Detection |
| 18000 | A Dynamic Gated Cross-Attention Framework for Audio-Text Apparent Personality Analysis |
| 18866 | A FAST ALGORITHM FOR COMPUTATION OF GENERAL INTEGER-ORDER HANKEL TRANSFORMS |
| 9493 | A FEATURE-OPTIMIZED AUDIO WATERMARKING ALGORITHM WITH ADAPTIVE EMBEDDING STRENGTH |
| 9577 | A FINE-GRAINED MASK-GUIDED MULTIMODAL FRAMEWORK FOR WEAKLY SUPERVISED INSTANCE SEGMENTATION OF ROCK MICROGRAPHS |
| 4341 | A FINE-GRAINED MODALITY ALIGNMENT MODEL FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS |
| 16172 | A FRAMEWORK FOR BIPARTITE GRAPH STRUCTURE LEARNING THROUGH EIGENVECTOR PARTITIONING |
| 14631 | A FRAMEWORK FOR CONTROLLED MULTI-SPEAKER AUDIO SYNTHESIS FOR ROBUSTNESS EVALUATION OF SPEAKER DIARISATION SYSTEMS |
| 11990 | A FRAMEWORK FOR TEXT-TO-SEMANTIC SEGMENTATION MAP GENERATION |
| 9009 | A GAME-THEORETIC APPROACH FOR DISTRIBUTED MEC-ENABLED COLLABORATIVE INFERENCE IN AIGC NETWORKS |
| 15117 | A GENERALIZATION STRATEGY FOR SPEECH QUALITY PREDICTION: FROM DOMAIN-SPECIFIC TO UNIFIED DATASETS |
| 14537 | A Generative Model for Controllable Feature Heterophily in Graphs |
| 14495 | A GENERATIVE-FIRST NEURAL AUDIO AUTOENCODER |
| 17060 | A Graph-Based Framework for Detecting Small Noisy Targets: Theory and Analysis |
| 1874 | A High Performance Hardware Accelerator For Fully Homomorphic Encryption and Application to Neural Networks |
| 16402 | A Hybrid Convolution-Mamba Network With Tone-Octave Contrastive Learning For Stratified Semi-supervised Singing Melody Extraction |
| 11874 | A HYBRID GRID-BASED METHOD FOR VIDEO REPRESENTATION |
| 14715 | A JOINT SPATIAL TIME-FREQUENCY ATTENTION FOR LEAKAGE DETECTION IN WATER DISTRIBUTION NETWORKS |
| 12155 | A KEYWORD QUERY SYSTEM FOR NON-PUBLIC DATABASE SCHEMAS BASED ON SEMANTIC-ENHANCED INVERTED INDEXING |
| 12707 | A Latent Drift-Guided Replay Method for Robust Continual Learning in Medical Imaging |
| 5364 | A LEARNING-BASED AUTOMOTIVE SOUND FIELD REPRODUCTION METHOD USING PLANE-WAVE DECOMPOSITION AND MULTI-POSITION CONSTRAINT |
| 11730 | A LIGHTWEIGHT FOURIER-BASED NETWORK FOR BINAURAL SPEECH ENHANCEMENT WITH SPATIAL CUE PRESERVATION |
| 11236 | A LIGHTWEIGHT NETWORK WITH ADAPTIVE CONTEXT AND FREQUENCY-SPATIAL SYNERGY FOR HUMAN POSE ESTIMATION |
| 18683 | A LIGHT-WEIGHT PRNU-BASED CAMERA-DEVICE AUTHENTICATION BASED ON DEVICE-SPECIFIC IMAGE DOWNSAMPLING |
| 16437 | A LIGHTWEIGHT SEMANTIC SEGMENTATION SYSTEM FOR 3D MEDICAL IMAGE |
| 16847 | A LLM-Driven Acoustic Semantic Enriched Framework For Underwater Acoustic Target Recognition |
| 5025 | A long-form single-speaker real-time MRI speech dataset and benchmark |
| 9994 | A LOW-COMPLEXITY EQUALIZER DESIGN FOR OTFS MODULATION IN DOUBLY-DISPERSIVE CHANNELS |
| 6542 | A Low-Rank Angular Domain Sampling SVD Approximation for Massive MIMO Signal Processing |
| 1848 | A Malicious Policy Detection Approach Enhanced by Threat Knowledge in LLM-Based Embodied Robots |
| 15620 | A MARITIME SMALL TARGET DETECTION METHOD USING RANDOM FOREST WITH KMD FEATURE ENHANCEMENT AND COST-SENSITIVE LEARNING |
| 16114 | A Memory-Augmented Dual-Stream Framework to achieve Long-Horizon Generalization in Robotic Manipulation |
| 9876 | A mixed precision FFT with applications in MRI |
| 7079 | A MODEL-HETEROGENEOUS FEDERATED UNLEARNING METHOD VIA NEGATIVE KNOWLEDGE DISTILLATION |
| 12074 | A MODIFIED CONCEFT FRAMEWORK WITH OPTIMAL MULTITAPER WEIGHTS FOR ROBUST SYNCHROSQUEEZING |
| 11796 | A MODIFIED YOLO WITH DUAL-BRANCH ATTENTION FOR HIGH-ACCURACY DETECTION OF CORN SEEDLINGS IN UAV IMAGES |
| 18916 | A MOUSE DYNAMICS AUTHENTICATION SYSTEM WITH A RECURRENCE PLOT IMAGE REPRESENTATION AND A VISION TRANSFORMER FRAMEWORK |
| 10952 | A MULTI-AGENT SYSTEM FOR ZERO-SHOT CONTROLLABLE IMAGE CAPTIONING |
| 6764 | A MULTI-FREQUENCY CONTINUOUS-SHARE TRADING ALGORITHM WITH GARCH AND DEEP REINFORCEMENT LEARNING |
| 13142 | A MULTIMODAL DEPTH-AWARE METHOD FOR EMBODIED REFERENCE UNDERSTANDING |
| 9763 | A multi-prototypes graph-based clustering algorithm with entropy regularization |
| 1163 | A MULTI-ROUND INFERENCE BASED MACHINE READING COMPREHENSION MODEL FOR EMOTION CAUSE PAIR EXTRACTION |
| 6781 | A MULTI-SCALE SPATIALLY COLLABORATIVE FREQUENCY-GUIDED NETWORK FOR IMAGE DERAINING |
| 18230 | A MULTI-TASK APPROACH TOWARDS ROBUST VIETNAMESE AUDIO-BASED TOXIC SPAN DETECTION |
| 10053 | A Multi-View Fusion Framework for Audio-Visual Multi-Speaker Tracking |
| 16598 | A Neural Operator for Spatiotemporal Significant Wave Height Prediction Based on Spectral Residual Region Partitioning |
| 10291 | A NEW ADAPTIVE HYBRID REPRESENTATION METHOD FOR SINGLE-CELL DATA |
| 6629 | A NEW METHOD AND DATASET FOR CLASSROOM TEACHING STAGE SEGMENTATION |
| 11472 | A NEW WEIGHT-TYING ARCHITECTURE OF NONNEGATIVE NEURAL NETWORK: CONVERGENT PLUG-AND-PLAY IMAGE RESTORATION BY MONOTONE LIPSCHITZ-GRADIENT DENOISER |
| 13540 | A NONITERATIVE PHASE RETRIEVAL CONSIDERING THE ZEROS OF STFT MAGNITUDE |
| 6692 | A NON-OVERLAPPING HAWKES MODELING FOR TESTING GRANGER CAUSALITY |
| 18973 | A NONPARAMETRIC VARIABLE FORGETTING FACTOR RECURSIVE LEAST-SQUARES ALGORITHM |
| 13000 | A NO-REFERENCE SCREEN CONTENT IMAGE QUALITY ASSESSMENT METHOD BASED ON REGIONAL DISTORTION PERCEPTION |
| 10347 | A Novel Monte Carlo Gradient Method Based on Meta-learning for Effective Step-size Selection in Active Noise Control |
| 16281 | A NOVEL ARBITRARY RECOVERABLE MULTI-IMAGE HIDING ALGORITHM BASED ON POLARIZATION THEORY |
| 10958 | A NOVEL ARRAY DESIGN WITH INCREASED DEGREES OF FREEDOM BY EMPLOYING THE VERTICAL MOTION OF UNIFORM CIRCULAR ARRAY |
| 3503 | A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech |
| 9932 | A NOVEL BAYESIAN EM-LIKE ALGORITHM FOR FAST COMPTON CAMERA IMAGING |
| 10825 | A novel intrinsic Cramér-Rao bound for exact Gaussian distribution on Lie groups |
| 5261 | A Novel Iterative OTFS Detector based on Local L-MMSE and Global Message Passing |
| 8330 | A NOVEL MULTIBEAM TIME-DIVISION ISAC APPROACH WITH ACCURATE SENSING |
| 11194 | A NOVEL MULTI-SCALE FEATURE FUSION METHOD FOR REAL-TIME DANGEROUS DRIVING BEHAVIOR DETECTION IN REAL-WORLD DRIVING SCENARIOS |
| 1176 | A NOVEL MULTISCALE ORDER-FREQUENCY SPECTRAL CORRELATION ESTIMATOR FOR ANGLE-TIME CYCLOSTATIONARY SIGNALS |
| 2876 | A NOVEL SELF-CORRECTING DIRECT POSITION DETERMINATION IN ASYNCHRONOUS SENSOR NETWORKS |
| 17299 | A Novel Underwater Integrated Communication and Positioning Algorithm Based on OAM-OFDM |
| 12521 | A NUMERICALLY STABLE HOUSEHOLDER-BASED EX-RLS ALGORITHM |
| 14171 | A PARAMETER-EFFICIENT MULTI-SCALE CONVOLUTIONAL ADAPTER FOR SYNTHETIC SPEECH DETECTION |
| 13453 | A PARAMETRIC POWER MODEL OF UPPER MID-BAND (FR3) BASE STATIONS FOR 6G |
| 9018 | A PERSONALIZED FRAMEWORK FOR AUTOMATED AUDIO TUNING ON SHORT-FORM VIDEO PLATFORMS |
| 3523 | A PERSONALIZED REAL-TIME PROACTIVE VOICE MEMORY ASSISTANT |
| 17078 | A PSEUDOINVERSE-BASED MOMENTUM FISTA FOR SPARSE SIGNAL RECOVERY |
| 9499 | A Query-based End-to-End Transformer for Third-person Human Gaze Analysis via Joint Fine-tuning Strategy |
| 1032 | A Queueing Model for Memory Controller Scheduler Subject to DRAM Column Access Timing Constraints |
| 11458 | A RANDOM MATRIX PERSPECTIVE OF ECHO STATE NETWORKS: FROM PRECISE BIAS–VARIANCE CHARACTERIZATION TO OPTIMAL REGULARIZATION |
| 16607 | A ROBUST KNN APPROACH FOR MULTI-CLASS LARYNGEAL DISEASE DETECTION USING MFCC FEATURES |
| 10501 | A ROBUST METHOD FOR GEAR FAILURE DETECTION AND SEVERITY ESTIMATION BASED ON MULTI-SENSOR PHYSICAL FEATURE FUSION AND DOMAIN ADAPTATION |
| 6089 | A SCALED POISSON BAYESIAN MODEL FOR VIRAL EPIDEMIC MONITORING |
| 10207 | A SIMILARITY-GUIDED AGGREGATION NETWORK FOR SOUND EVENT LOCALIZATION AND DETECTION WITH SOURCE DISTANCE ESTIMATION |
| 6190 | A SPECTRAL-GUIDED LATENT PHYSICS SOLVER FOR PDE PROBLEMS |
| 8048 | A SPEECH-DRIVEN PARADIGM FOR PHYSICS-INFORMED MODELING OF COUPLED MICRO-SPEAKERS |
| 10889 | A Stabilized Hybrid Active Noise Control Algorithm of GFANC and FxNLMS with Online Clustering |
| 5854 | A STAGE-WISE LEARNING STRATEGY WITH FIXED ANCHORS FOR ROBUST SPEAKER VERIFICATION |
| 13295 | A State-Dependent Markov Diffusion Process for Generative Speech Enhancement |
| 13120 | A STUDY OF DATA SELECTION STRATEGIES FOR PRE-TRAINING SELF-SUPERVISED SPEECH MODELS |
| 15434 | A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection |
| 2629 | A SUPPORT VECTOR APPROACH IN SEGMENTED REGRESSION FOR MAP-ASSISTED NON-COOPERATIVE SOURCE LOCALIZATION |
| 7202 | A TASK-AWARE DUAL-LEVEL SELF-SUPERVISED LEARNING METHOD FOR EFFECTIVE SOUND EVENT DETECTION |
| 5888 | A TEXT-TO-TEXT ALIGNMENT ALGORITHM FOR BETTER EVALUATION OF MODERN SPEECH RECOGNITION SYSTEMS |
| 11136 | A Training-Free Framework for High-Fidelity Appearance Transfer via Diffusion Transformers |
| 9484 | A TWO-PHASE HYBRID TASK SCHEDULING ALGORITHM WITH ROUTE PLANNING |
| 14911 | A Two-Stage Globally-Diverse Adversarial Attack for Vision-Language Pre-training Models |
| 5558 | A Unified Four-Stage Dynamic Cycle for Robust Federated Fine-Tuning of Large Language Models |
| 12990 | A UNIFIED HARDWARE ACCELERATOR FOR PRIVACY PRESERVING LLMS CLIENT-SIDE BASED ON CKKS HOMOMORPHIC ENCRYPTION |
| 10380 | A Unified Rate Control Method for Spinning and Non-Spinning LiDAR Point Cloud Compression |
| 4067 | A Unified SVD-Modal Solution for Sparse Sound Field Reconstruction with Hybrid Spherical-Linear Microphone Arrays |
| 3844 | A UNITARY QUANTUM PROCESS TOMOGRAPHY METHOD BASED ON A DENSITY MATRIX DIAGONALIZATION |
| 16472 | A UNSUPERVISED DOMAIN ADAPTATION FRAMEWORK FOR SEMI-SUPERVISED MELODY EXTRACTION USING CONFIDENCE MATRIX REPLACE AND NEAREST NEIGHBOUR SUPERVISION |
| 1768 | A User-Item Aware Encoding Framework for Short Video |
| 15722 | A WAVELET-BASED GRAPH DYNAMICAL CONFIDENCE INFORMATION BOTTLENECK NETWORK FOR CLASS-IMBALANCED NODE CLASSIFICATION |
| 13865 | A Wavelet-Based Network with Multi-Scale Feature Complementarity Enhancement for Salient Object Detection in Optical Remote Sensing Images |
| 14895 | A WAVELET–QUATERNION NEURAL MODULE FOR UNIVERSAL VISUAL BACKBONES |
| 5128 | A3D: ADVANCED ADVERSARIAL ATTACK AS DETECTION FRAMEWORK FOR EDGE DEVICES |
| 14511 | ABC-EVAL: BENCHMARKING LARGE LANGUAGE MODELS ON SYMBOLIC MUSIC UNDERSTANDING AND INSTRUCTION FOLLOWING |
| 17318 | ABRACADDBRA: TOUCH-GUIDED OBJECT ADDITION BY DECOUPLING PLACEMENT AND EDITING SUBTASKS |
| 17197 | ABS-HUNET: AN ULTRA-LIGHTWEIGHT SPEECH ENHANCEMENT MODEL WITH ADAPTIVE BAND-SPLIT AND HALF-UNET DESIGN |
| 16546 | ACAVCAPS: ENABLING LARGE-SCALE TRAINING FOR FINE-GRAINED AND DIVERSE AUDIO UNDERSTANDING |
| 17182 | Accelerated Approximate Message Passing |
| 11289 | Accelerated Sinkhorn Algorithms for Partial Optimal Transport |
| 12336 | Accelerated training of Gaussian processes using banded square exponential covariances |
| 13081 | ACCELERATING 3D GAUSSIAN SPLATTING VIA WAVELET-GUIDED SCHEDULING |
| 14708 | ACCELERATING FEDERATED LEARNING THROUGH DROPOUT OF RENEWABLE NEURON PARAMETERS |
| 3280 | ACCELERATING KBQA VIA LOGICAL-QUESTION BIDIRECTIONAL RERANKING |
| 18106 | ACCELERATING VEHICULAR FEDERATED LEARNING VIA CONVERGENCE-AWARE HIERARCHICAL SCHEDULING |
| 3563 | ACCELGS: AN ACCELERATION FRAMEWORK FOR LARGE-SCALE 3D GAUSSIAN SPLATTING TRAINING |
| 16467 | ACCENT-INVARIANT AUTOMATIC SPEECH RECOGNITION VIA SALIENCY-DRIVEN SPECTROGRAM MASKING |
| 15083 | ACCEPTANCE-GUIDED ADAPTIVE SPECULATIVE DECODING FOR EFFICIENT LARGE LANGUAGE MODEL INFERENCE |
| 14448 | ACCLID: ACCENT-AWARE LANGUAGE IDENTIFICATION FOR ROBUST MULTILINGUAL SPEECH RECOGNITION |
| 1226 | ACD-CLIP: DECOUPLING REPRESENTATION AND DYNAMIC FUSION FOR ZERO-SHOT ANOMALY DETECTION |
| 3084 | Achieving Linear Speed-Up for Distributed Inexact-ADMM |
| 6230 | ACHIEVING PARETO OPTIMALITY IN GAMES VIA SINGLE-BIT FEEDBACK |
| 9307 | ACIR-MACL: EFFECTIVE MULTIMODAL SENTIMENT ANALYSIS VIA ATTENTION-BASED CAUSAL INTERVENTION REGULARIZATION AND MULTI-ASPECT CONTRASTIVE LEARNING |
| 16264 | ACM: MULTIPLE ATTRIBUTES CONTRASTIVE MECHANISM FOR VALUE DECOMPOSITION IN MULTI-AGENT REINFORCEMENT LEARNING |
| 15340 | ACOUSTIC AND FACIAL MARKERS OF PERCEIVED CONVERSATIONAL SUCCESS IN SPONTANEOUS SPEECH |
| 17105 | ACOUSTIC FEEDBACK CANCELLATION IN HEARING AIDS EXPLOITING AN INERTIAL SENSOR |
| 1182 | ACOUSTIC NON-STATIONARITY OBJECTIVE ASSESSMENT WITH HARD LABEL CRITERIA FOR SUPERVISED LEARNING MODELS |
| 19118 | Acoustic Prompt Tuning: Empowering Large Language Models With Audition Capabilities |
| 14026 | ACOUSTIC TELEPORTATION VIA DISENTANGLED NEURAL AUDIO CODEC REPRESENTATIONS |
| 3260 | ACTION-AWARE QUERY SELECTION AND AMBIGUOUS SNIPPET DISAMBIGUATION FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION |
| 7518 | ActionHSMR: Sequence-based 3D Human Pose and Mesh Estimation with Temporal Consistency |
| 12678 | ACTIVE INFERENCE FRAMEWORK FOR CLOSED-LOOP SENSING, COMMUNICATION, AND CONTROL IN UAV SYSTEMS |
| 14428 | ACTIVE JAMMER LOCALIZATION VIA ACQUISITION-AWARE PATH PLANNING |
| 12135 | ACTIVE SENSING BASED BEAM ALIGNMENT FOR BACKSCATTER COMMUNICATION |
| 7206 | ACTIVE SEQUENTIAL HYPOTHESIS TESTING WITH NON-HOMOGENEOUS COSTS |
| 4719 | ACTIVE-EDIT: HIGH-FIDELITY 3D EDITING FROM A HANDFUL OF TASK-RELEVANT |
| 2507 | ACTIVEPARAM: SELECTIVE PARAMETERIZATION FOR EFFICIENT AND ROBUST RETRIEVAL-AUGMENTED GENERATION |
| 2815 | ACTIVITY RECOGNITION USING INAUDIBLE ACOUSTIC FMCW |
| 12383 | ADAEVOL: DYNAMIC ADAPTER MERGING FOR EFFECTIVE CONTINUAL LEARNING AND KNOWLEDGE TRANSFER IN LARGE LANGUAGE MODELS |
| 11440 | ADAFLOW: EFFICIENT LONG VIDEO EDITING VIA ADAPTIVE ATTENTION SLIMMING AND KEYFRAME SELECTION |
| 4334 | AdaNODEs: Test Time Adaptation for Time Series Forecasting Using Neural ODEs |
| 6768 | AdaParse: A Structured Lipschitz Regularization Framework for Robust Reinforcement Learning |
| 3172 | AdaPrune: A Two-Stage Filter-Select Methods for Visual Token Pruning in Specialized VLMs |
| 12619 | ADAPTER-STATE SHARING CLIP FOR PARAMETER-EFFICIENT MULTIMODAL SARCASM DETECTION |
| 14046 | ADAPTING DIARIZATION-CONDITIONED WHISPER FOR END-TO-END MULTI-TALKER SPEECH RECOGNITION |
| 14392 | ADAPTING WHISPER FOR PADDING-FREE INFERENCE USING AN ENCODER ATTENTION MASK AND KNOWLEDGE DISTILLATION |
| 11235 | Adaptive and Balanced Re-initialization for Long-timescale Continual Test-time Domain Adaptation |
| 6151 | Adaptive Closed-Form DOA Estimation in the Spherical Harmonics Domain |
| 3105 | ADAPTIVE COMPRESSED INTEGRATE-AND-FIRE TIME ENCODING MACHINE |
| 10761 | Adaptive Defense against Stationary Test-Time Attacks on Classifiers |
| 15571 | ADAPTIVE DETERMINISTIC FLOW MATCHING FOR TARGET SPEAKER EXTRACTION |
| 17454 | ADAPTIVE DISTILLATION FOR LM-GNN ALIGNMENT IN SEMI-SUPERVISED TEXT-ATTRIBUTED GRAPH NODE CLASSIFICATION |
| 3680 | ADAPTIVE EMBEDDING FUSION WITH CONTRASTIVE LEARNING FOR ROBUST FULLY FEW-SHOT CLASS-INCREMENTAL AUDIO CLASSIFICATION |
| 16632 | ADAPTIVE FEW-SHOT CHANNEL STATE INFORMATION PHYSICAL LAYER AUTHENTICATION FOR LEO CONSTELLATIONS |
| 15754 | Adaptive Graph Coarsening for Efficient GNN Training |
| 11568 | ADAPTIVE GUIDANCE SEMANTICALLY ENHANCED VIA MULTIMODAL LLM FOR EDGE-CLOUD OBJECT DETECTION |
| 6013 | ADAPTIVE METAHEURISTIC-OPTIMIZED STOCHASTIC RESONANCE NETWORK FOR DOA ESTIMATION IN LOW-SNR UNDERWATER ENVIRONMENTS |
| 16731 | Adaptive Multi-Scale Correlation Meta-Network for Few-Shot Remote Sensing Image Classification |
| 5818 | ADAPTIVE PER-CHANNEL ENERGY NORMALIZATION FRONT-END FOR ROBUST AUDIO SIGNAL PROCESSING |
| 3175 | ADAPTIVE REPRESENTATION REFINEMENT FOR ROBUST FINE-GRAINED FEW-SHOT IMAGE CLASSIFICATION |
| 11963 | Adaptive Retrieval-Augmented Generation via Contrastive Learning on Implicit Feedback |
| 15076 | ADAPTIVE RFS TRACKING FOR SWARM UAV-BORNE RADARS USING RANGE-DOPPLER MEASUREMENTS |
| 14016 | ADAPTIVE ROTARY STEERING WITH JOINT AUTOREGRESSION FOR ROBUST EXTRACTION OF CLOSELY MOVING SPEAKERS IN DYNAMIC SCENARIOS |
| 6623 | ADAPTIVE RUNGE-KUTTA DYNAMICS FOR SPATIOTEMPORAL PREDICTION |
| 4193 | Adaptive Score Calibration for Content-Based Image Retrieval |
| 8237 | ADAPTIVE SHARED EXPERTS WITH LORA-BASED MIXTURE OF EXPERTS FOR MULTI-TASK LEARNING |
| 14308 | ADAPTIVE SPATIAL GOODNESS ENCODING: SCALING THE FORWARD-FORWARD ALGORITHM FOR CONVOLUTIONAL NEURAL NETWORKS |
| 3040 | ADAPTIVE SPEAKER EMBEDDING SELF-AUGMENTATION FOR PERSONAL VOICE ACTIVITY DETECTION WITH SHORT ENROLLMENT SPEECH |
| 12021 | ADAPTIVE SPECTRAL GRAPH PARTITIONING FOR PORTFOLIO OPTIMISATION |
| 18183 | ADAPTIVE SPECTRAL WEIGHTING IN SAGITTAL-PLANE SOUND LOCALIZATION: A RELIABILITY-DRIVEN APPROACH |
| 15336 | ADAPTIVE TASK-INCREMENTAL LEARNING FOR UNDERWATER ACOUSTIC RECOGNITION BASED ON MIXTURE-OF-EXPERTS ADAPTER |
| 3774 | ADAPTIVE TOPOLOGICAL CONSTRAINT ENHANCED PEDESTRIAN TRAJECTORY PREDICTION |
| 14925 | ADAPTIVE VOLUMETRIC VIDEO STREAMING WITH IMAGE-BASED RENDERING |
| 17829 | Adaptive Waveform Design for Cognitive FDA Radar Using AT-WWB |
| 11823 | ADAPTIVE WORLD MODEL WITH LATENT GENERATION ALGORITHM FOR DEEP REINFORCEMENT LEARNING IN PORTFOLIO OPTIMIZATION |
| 5629 | ADAPTIVEDIFFUSEMOTION: ADAPTIVE MULTI-TASK DIFFUSION MODEL FOR SPEECH-DRIVEN HOLISTIC MOTION GENERATION |
| 2125 | ADAPTIVELY TAMING ESTIMATION BIAS FOR DEEP REINFORCEMENT LEARNING WITH MULTI-OBJECTIVE OPTIMIZATION |
| 13060 | ADAPTIVELY WEIGHTED MULTI-MODAL JOINT ENTROPY WITH DYNAMIC ALLOCATION AND FAULT-TOLERANT FUSION FOR INDUSTRIAL DIAGNOSTICS |
| 13528 | ADAPTIVE-VOCO: COMPLEXITY-AWARE VISUAL TOKEN COMPRESSION FOR VISION-LANGUAGE MODELS |
| 15222 | AD-DINOv3: Enhancing DINOv3 for Zero-Shot Anomaly Detection with Anomaly-Aware Calibration |
| 13152 | ADD-RAG: AGENT-DRIVEN DYNAMIC RAG WITH ADAPTIVE RETRIEVAL STRATEGIES AND MULTI-RETRIEVER COLLABORATION FOR ENHANCED GENERATION |
| 12179 | ADDRESSING GRADIENT MISALIGNMENT IN DATA-AUGMENTED TRAINING FOR ROBUST SPEECH DEEPFAKE DETECTION |
| 4310 | ADEPT: An Entropy-Driven Dual-Strategy Agent for Interactive Video Retrieval |
| 9747 | ADH-VA: ADAPTIVE DIRECTED-HYPERGRAPH CONVOLUTION WITH VA CONTRASTIVE LEARNING FOR MULTIMODAL CONVERSATIONAL EMOTION RECOGNITION |
| 4462 | ADORE: ASYMMETRIC RELATIONAL DISTILLATION WITH RERANKING FOR INSTANCE LEVEL IMAGE RETRIEVAL |
| 11682 | ADP-NET: AN ASYMMETRIC DUAL-BRANCH NETWORK FOR DIBR HOLE FILLING |
| 14642 | ADREC: TRAINING AN AUTONOMOUS DECISION-MAKING RECOMMENDATION AGENT THROUGH BEHAVIOR CLONING |
| 16668 | ADVANCED MODELING OF INTERLANGUAGE SPEECH INTELLIGIBILITY BENEFIT WITH L1-L2 MULTI-TASK LEARNING USING DIFFERENTIABLE K-MEANS FOR ACCENT-ROBUST DISCRETE TOKEN-BASED ASR |
| 15857 | ADVANCING FINE-GRAINED SENTIMENT ANALYSIS IN COMPLEX CONTEXTS: A NEW BENCHMARK AND INTERPRETATION-ENHANCED APPROACH |
| 13115 | ADVANCING LLM-BASED MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION WITH GLOBAL CROSS-CHANNEL ATTENTION AND SENTENCE-ORDERED FIRST-IN FIRST-OUT SERIALIZED OUTPUT TRAINING |
| 16771 | Advancing Semi-Supervised Child Speech Recognition with Omni-Temporal Classification under Label Noise |
| 15761 | ADVANCING SPEAKER BASED VOCAL EFFORT CLASSIFICATION WITH WAVLM AND DATA AUGMENTATION IN NATURALISTIC NON-CALIBRATED SPEECH RECORDINGS |
| 10976 | ADVANCING SPEECH SUMMARIZATION IN MULTI-MODAL LLMS WITH REINFORCEMENT LEARNING |
| 9404 | ADVANCING SPEECH UNDERSTANDING IN SPEECH-AWARE LANGUAGE MODELS WITH GRPO |
| 16916 | ADVANTAGE-WEIGHTED POLICY LEARNING WITH ADAPTIVE REGULARIZATION FOR OFFLINE REINFORCEMENT LEARNING |
| 10660 | ADVERSARIAL CONTRASTIVE RETRIEVAL-AUGMENTED GENERATION |
| 5559 | Adversarial Defense via Generative Speech Enhancement Module |
| 10907 | Adversarial Detection via Multi-Layer Contrastive Learning and Cross-Layer Stability Analysis |
| 17114 | ADVERSARIAL FINE-TUNING ON SPEECH FOUNDATION MODEL WITH VULNERABLE ATTENTION CONSISTENCY REGULARIZATION FOR ROBUST SPEECH RECOGNITION |
| 10884 | Adversarial label recovery with Multi-Modal Fusion and Dual-Task Contrastive Learning |
| 14161 | Adversarial Learning with a Uniformly Distributed Cost Bound |
| 12308 | ADVERSARIAL PROMPT DISTILLATION FOR VISION-LANGUAGE MODELS |
| 3414 | ADVERSARIAL RIVALRY LEARNING FOR MUSIC CLASSIFICATION |
| 13708 | ADVERSARIAL UPDATE-BASED FEDERATED UNLEARNING FOR POISONED MODEL RECOVERY |
| 14994 | ADVERSE EFFECT REMOVAL NETWORK VIA UNSUPERVISED WEATHER TYPE TRANSFER |
| 6015 | AEGIS: ENHANCING PROVENANCE-BASED INTRUSION DETECTION SYSTEM WITH LLM-POWERED DEEP SEMANTIC REPRESENTATION |
| 13756 | AERIAL VIDEO ACTION RECOGNITION WITH PRETRAINED VISION-LANGUAGE MODEL |
| 10880 | AERIS-RTDETR: ULTRASOUND-AWARE REAL-TIME DETECTION WITH ORTHOGONAL ANISO-SCALE BLOCKS AND ECHOGENICITY-GUIDED FUSION |
| 1878 | AEROGSPNET: GRAPH SIGNAL PROCESSING FOR MULTI-TASK AERODYNAMIC PREDICTION |
| 2322 | AFD-SLU: ADAPTIVE FEATURE DISTILLATION FOR SPOKEN LANGUAGE UNDERSTANDING |
| 9667 | AFER: ADAPTIVE FACT SELECTION VIA ENTROPY REDUCTION FOR FACTUAL LONG-FORM GENERATION |
| 12923 | AFFECT-JIGSAW: INTEGRATING CORE AND PERIPHERAL EMOTIONS FOR HARMONIOUS FINE-GRAINED MULTIMODAL EMOTION RECOGNITION |
| 1383 | Affordance Benchmark for MLLMs |
| 1154 | AFFORDANCE OBJECT SWAPPING FOR HAND-OBJECT INTERACTION IMAGES |
| 6569 | AFT: AN EXEMPLAR-FREE CLASS INCREMENTAL LEARNING METHOD FOR ENVIRONMENTAL SOUND CLASSIFICATION |
| 14216 | AGENT-GSPO: COMMUNICATION-EFFICIENT MULTI-AGENT SYSTEMS VIA GROUP SEQUENCE POLICY OPTIMIZATION |
| 8601 | AGFORMER: ADAPTIVE GRAPH TRANSFORMER FOR MULTISPECTRAL AND HYPERSPECTRAL IMAGE FUSION |
| 6933 | AG-FUSION: ADAPTIVE GATED MULTIMODAL FUSION FOR 3D OBJECT DETECTION IN COMPLEX SCENES |
| 16793 | AGI-CLIP: MULTI-MODAL LLM KNOWLEDGE TRANSFER FOR AI-GENERATED IMAGE QUALITY ASSESSMENT |
| 15872 | AGRIDOCTOR: A MULTIMODAL INTELLIGENT ASSISTANT FOR AGRICULTURE |
| 13073 | AGRI-MIX: MUTUAL INFORMATION-GUIDED HIERARCHICAL FUSION FOR AGRICULTURAL DISEASE MULTIMODAL RELATION EXTRACTION |
| 12581 | AHAI: ADAPTIVE HYBRID-ATTENTION INFERENCE FOR DIFFUSION-BASED ARBITRARY STYLE TRANSFER |
| 13189 | AHM-NET: AN ASYMMETRIC HIERARCHICAL MULTI-MODAL FUSION NETWORK FOR ROBUST UAV DETECTION USING RGB AND EVENT DATA |
| 11557 | AI-AIDED CONSENSUS KALMAN TRACKING IN PARTIALLY-KNOWN STATE-SPACE MODELS |
| 9681 | AIBA-YOLO: Adaptive Information Balance Augmentation YOLO |
| 16218 | AI-GENERATED MUSIC DETECTION IN BROADCAST MONITORING |
| 1841 | AIMREC: ALIGNING BOTH INDIVIDUALS AND MODALITIES FOR MULTIMODAL RECOMMENDATION |
| 1820 | AirGlove: Exploring Egocentric 3D Hand Tracking and Appearance Generalization for Sensing Gloves |
| 8862 | AISHELL6-WHISPER: A CHINESE MANDARIN AUDIO-VISUAL WHISPER SPEECH DATASET WITH SPEECH RECOGNITION BASELINES |
| 13247 | AITG: AUTOMATING INTENT-ORIENTED TASK GENERATION FOR MOBILE GUI-AGENT |
| 14595 | AL-COLE: AUGMENTED LAGRANGIAN FOR CONSTRAINED LEARNING |
| 3419 | ALFM: ADAPTIVE LOCAL FEATURE MINING OF VISION-LANGUAGE MODELS FOR OUT-OF-DISTRIBUTION DETECTION |
| 3831 | Algebraic Covariance Matrix Reconstruction for Sparse Arrays Using Newton's Identities |
| 5900 | ALIGN TO THE PIVOT: DUAL ALIGNMENT WITH SELF-FEEDBACK FOR MULTILINGUAL MATH REASONING |
| 9984 | Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization |
| 13542 | ALIGN3D: PROGRESSIVE DIFFUSION ADAPTATION WITH GEOMETRY-AWARE PROPAGATION FOR CONSISTENT 3D SCENE EDITING |
| 3903 | ALIGNCLIP: MINING AND ALIGNING MULTI-SCALE VISION-LANGUAGE FEATURES FOR ZERO-SHOT SEMANTIC SEGMENTATION |
| 6610 | ALIGNING GENERATIVE SPEECH ENHANCEMENT WITH PERCEPTUAL FEEDBACK |
| 17252 | ALIGNING GEOMETRY REPRESENTATION AND INITIALIZATION IN CLASS-INCREMENTAL SEMANTIC SEGMENTATION |
| 10994 | ALIGNING LANGUAGE MODELS FOR LYRIC-TO-MELODY GENERATION WITH RULE-BASED MUSICAL CONSTRAINTS |
| 14678 | ALLEVIATING FORGETTING IN CLASS-INCREMENTAL LEARNING VIA IMPLICIT SEMANTIC AUGMENTATION |
| 10687 | ALLEVIATING OVERTHINKING IN LARGE REASONING MODELS VIA SELF-ITERATIVE PREFERENCE OPTIMIZATION |
| 16058 | ALMA-CHOR: LEVERAGING AUDIO-LYRIC ALIGNMENT WITH MAMBA FOR CHORUS DETECTION |
| 4585 | Alternating Balancing Sums for Accurate Low-Power Dot Products |
| 15318 | ALTPROJ-MIN: A PROJECTION-BASED ALTERNATING MINIMIZATION ALGORITHM FOR LOW-RANK MATRIX RECOVERY |
| 14514 | AMBER²: DUAL AMBIGUITY-AWARE EMOTION RECOGNITION APPLIED TO SPEECH AND TEXT |
| 2567 | AMBIDROP: ARRAY-AGNOSTIC SPEECH ENHANCEMENT USING AMBISONICS ENCODING AND DROPOUT-BASED LEARNING |
| 17431 | AMBISONIC-DML: A Benchmark Dataset for Dynamic Higher-Order Ambisonics Music with Motion-Aligned Stems |
| 5035 | AMFN: Adaptive Multi-view Fusion Network Framework |
| 10418 | AMGHI-CR: ADAPTIVE MASK-GUIDED HIGH-ORDER INTERACTION NETWORK FOR CLOUD REMOVAL |
| 15887 | AMODAL INSTANCE SEGMENTATION BY EXPANDING FROM ACTIVE BOUNDARY WITH COMPATIBLE PRIOR |
| 17433 | AMPLITUDE OPTIMIZATION DRIVEN MULTI-OFDM WAVEFORM DESIGN WITH GOOD PMEPR AND ISL PERFORMANCES FOR JOINT RADAR AND COMMUNICATIONS |
| 5384 | A-MSA: ADVERSARIAL FEATURE DISENTANGLEMENT FOR MULTIMODAL SENTIMENT ANALYSIS |
| 5673 | AN ADAPTIVE SAMPLING METHOD BASED ON REINFORCEMENT LEARNING FOR WIND POWER FORECASTING UNDER EXTREME WEATHER |
| 9544 | AN AMP-BASED ASYMPTOTIC ANALYSIS FOR NONLINEAR ONE-BIT PRECODING |
| 9635 | AN AUDIO-VISUAL SPEECH SEPARATION NETWORK WITH JOINT CROSS-ATTENTION AND ITERATIVE MODELING |
| 10082 | AN EFFECTIVE DATA AUGMENTATION METHOD BY ASKING QUESTIONS ABOUT SCENE TEXT IMAGES |
| 13351 | AN EFFICIENT NEURAL NETWORK FOR MODELING HUMAN AUDITORY NEUROGRAMS FOR SPEECH |
| 4378 | An End-to-End Multimodal System for Subtitle Recognition and Chinese-Japanese Translation in Short Dramas |
| 9719 | AN ENHANCED GRAVITATIONAL-WAVE DETECTION AND INTELLIGENT ANALYSIS FRAMEWORK BASED ON MULTIMODAL LARGE LANGUAGE MODELS |
| 16637 | AN ENHANCED MEMORY ATTENTION AND CONTENT-GUIDED MODEL FOR MINI LED ANOMALY DETECTION |
| 11473 | AN ENSEMBLE DEFENSE METHOD AGAINST FALSE DATA IN STRUCTURED PREFERENCE LEARNING |
| 13481 | AN ENVELOPE SEPARATION AIDED MULTI-TASK LEARNING MODEL FOR BLIND SOURCE COUNTING AND LOCALIZATION |
| 11517 | AN EVENT-BASED SEQUENCE MODELING APPROACH TO RECOGNIZING NON-TRIAD CHORDS WITH OVERSEGMENTATION MINIMIZATION |
| 17130 | AN EXACT PENALTY METHOD FOR SPARSITY-CONSTRAINED OPTIMIZATION |
| 11981 | AN IMPROVED CONVERGENCE ANALYSIS OF GOSSIP METHODS FOR LARGE RANDOM GRAPHS |
| 17002 | An Information Geometric Approach to Fairness With Equalized Odds Constraint |
| 1941 | An Information-Theoretic Approach to Optimal Universal Quantum Encoding for Statistical Inference |
| 13920 | AN ITERATIVE FIXED-POINT KERNEL MINIMUM ERROR ENTROPY ALGORITHM |
| 7995 | An Unsupervised Alignment Feature Fusion System for Spoken Language-based Dementia Detection |
| 6677 | ANALYTIC INCREMENTAL LEARNING FOR SOUND SOURCE LOCALIZATION WITH IMBALANCE RECTIFICATION |
| 13915 | ANALYTICAL FRAMEWORK FOR WIRELESS LOCALISATION USING TERAHERTZ BACKSCATTERING TAGS |
| 6890 | ANCHOR FIELD CONSISTENCY FOR IMPERCEPTIBLE ADVERSARIAL ATTACKS ON 3D POINT CLOUDS |
| 13072 | ANCHORED SPECTRAL ESTIMATOR FOR RIGID MOTION SYNCHRONIZATION |
| 6312 | ANGULARDINO: SEMI-SUPERVISED ANOMALY DETECTION VIA SELF-DISTILLATION WITH HYBRID ANGULAR MARGIN |
| 6146 | AnimateScene: Camera-controllable Animation in Any Scene |
| 18083 | ANIPU: Geometry-Aware Point Cloud Upsampling via Anisotropic Differential Operators |
| 9487 | Anisotropic Tensor Deconvolution of Hyperspectral Images |
| 3999 | ANNEALED GUIDED DIFFUSION WITH OPTIONAL MANIFOLD PROJECTION REMOVAL |
| 10353 | ANOMALY DRIVING BEHAVIOR IDENTIFICATION IN TRAFFIC PERCEPTION WITH TRANSFORMER-BASED MULTI-MODAL SIGNAL FUSION |
| 4834 | ANOMALY-AWARE ASSOCIATION DISCREPANCY FOR TEMPORAL ANOMALY DETECTION |
| 3696 | ANTI-EXCEPTION ACTION GENERATION FOR AUTOMATIC PLANNING |
| 5746 | ANYACCOMP: GENERALIZABLE ACCOMPANIMENT GENERATION VIA QUANTIZED MELODIC BOTTLENECK |
| 4877 | ANYRIR: ROBUST NON-INTRUSIVE ROOM IMPULSE RESPONSE ESTIMATION IN THE WILD |
| 2580 | APKD: ALIGNED AND PACED KNOWLEDGE DISTILLATION TOWARDS LIGHTWEIGHT HETEROGENEOUS MULTIMODAL EMOTION RECOGNITION |
| 3396 | APMDET: DEFENDING AGAINST OBJECT-BASED ATTACKS FOR LIDAR DETECTION IN AUTONOMOUS DRIVING |
| 7117 | APPLE: ATTENTION-PROMOTED PROTOTYPE LEARNING FOR FEDERATED CROSS-MODAL HASHING |
| 2203 | APPROXIMATE MESSAGE PASSING FOR MULTI-PREAMBLE DETECTION IN OTFS RANDOM ACCESS |
| 13164 | Approximating Products of Distributions via Variable Duplication and Belief Propagation |
| 17712 | APPROXIMATING THE LIKELIHOOD OF A WHITE, NON-GAUSSIAN, NON-IID, SKEWED STATIONARY PROCESS, WITH APPLICATIONS IN SIGNAL DETECTION |
| 16984 | APSDA: ADVERSARIALLY PRUNED SPARSE DYNAMIC ATTENTION FOR ROBUST HANDWRITTEN TEXT RECOGNITION |
| 16912 | APSFORMER: ENHANCING TRANSFORMER IN TIME SERIES FORECASTING WITH ADAPTIVE MULTI-SCALE PATCH AND SPARSE ATTENTION |
| 4569 | AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering |
| 6256 | AR&D: A Framework for Retrieving and Describing Concepts for Interpreting AudioLLMs |
| 17379 | ARA-BEST-RQ: MULTI DIALECTAL ARABIC SSL |
| 6982 | ARABKT: A COMPREHENSIVE ARAB KNOWLEDGE EVALUATION SUITE FOR LARGE LANGUAGE MODELS |
| 3212 | ARAP-GS: DRAG-DRIVEN AS-RIGID-AS-POSSIBLE 3D GAUSSIAN SPLATTING EDITING WITH DIFFUSION PRIOR |
| 12577 | Arbitrarily Settable Frame Rate Neural Speech Codec with Content Adaptive Variable Length Segmentation |
| 15852 | AR-BSNET: TOWARDS ULTRA-LOW COMPLEXITY AUTOREGRESSIVE TARGET SPEAKER EXTRACTION WITH BAND-SPLIT MODELING |
| 10238 | ARCHAGENT: SCALABLE LEGACY SOFTWARE ARCHITECTURE RECOVERY WITH LLMS |
| 13504 | ARCHI-TTS: A FLOW-MATCHING-BASED TEXT-TO-SPEECH MODEL WITH SELF-SUPERVISED SEMANTIC ALIGNER AND ACCELERATED INFERENCE |
| 9649 | ARCTIMESDE: ALIGNING COMPUTE WITH INFORMATION VIA ARC LENGTH TIME IN NEURAL SDES |
| 17977 | ARE MODERN SPEECH ENHANCEMENT SYSTEMS VULNERABLE TO ADVERSARIAL ATTACKS? |
| 17668 | ARE THESE EVEN WORDS? QUANTIFYING THE GIBBERISHNESS OF GENERATIVE SPEECH MODELS |
| 4636 | Are VLMs Ready for Lane Topology Awareness in Autonomous Driving? |
| 16234 | ARGI: ANCHOR-GUIDED RIGID GEOMETRY ASSISTS POINT CLOUD INTERPOLATION |
| 1935 | AR-LIF: ADAPTIVE RESET LEAKY INTEGRATE-AND-FIRE NEURON FOR SPIKING NEURAL NETWORKS |
| 9548 | AROMMA: UNIFYING OLFACTORY EMBEDDINGS FOR SINGLE MOLECULES AND MIXTURES |
| 6235 | ARRAYDPS-REFINE: GENERATIVE REFINEMENT OF DISCRIMINATIVE MULTI-CHANNEL SPEECH ENHANCEMENT |
| 14242 | ARROW-GS: DIRECTED GROWTH FOR EFFICIENT 3D GAUSSIAN SPLATTING |
| 3305 | ARTI-6: TOWARDS SIX-DIMENSIONAL ARTICULATORY SPEECH ENCODING |
| 10473 | ARTIFACT-AWARE EVALUATION FOR HIGH-QUALITY VIDEO GENERATION |
| 10083 | ARTIFREE: DETECTING AND REDUCING GENERATIVE ARTIFACTS IN DIFFUSION-BASED SPEECH ENHANCEMENT |
| 18286 | ARTPOSE: STYLE-ADAPTIVE MIXTURE-OF-EXPERTS FOR HUMAN POSE ESTIMATION IN ARTISTIC IMAGES |
| 11131 | ASRC-SNN: ADAPTIVE SKIP RECURRENT CONNECTION SPIKING NEURAL NETWORK |
| 13066 | ASSESSING IDENTITY LEAKAGE IN TALKING FACE GENERATION: METRICS AND EVALUATION FRAMEWORK |
| 11863 | Assessing speech quality metrics for evaluation of neural audio codecs under clean speech conditions |
| 7641 | ASSESSING THE IMPACT OF SPEAKER IDENTITY IN SPEECH SPOOFING DETECTION |
| 13263 | Assessing the Perceptual Impact of Low-Altitude Aircraft Noise in Cities: An Auralization Framework Using Gaussian Beam Tracing |
| 7969 | ASTCA: A NOVEL MOTION PREDICTION FRAMEWORK FOR ROBUST TARGET TRACKING ADDRESSING HIGH MANEUVERABILITY AND FALSE ALARMS |
| 16475 | ASTMNET: ADAPTIVE SPECTRAL TOKEN MIXER WITH SELECTIVE FEATURE ENHANCEMENT FOR PANSHARPENING |
| 18078 | ASWE: Adaptive Small-World Encoder for Efficient Channel Coding |
| 3531 | Asymmetric Region Denoising and Rotation Equivariant for Image Reflection Symmetry Detection |
| 16184 | Asymmetric StarFus: Learning Incoherent Measurements for Semantics-Aware Spatial-Spectral Fusion |
| 19109 | ASYMPTOTIC ANALYSIS OF SYNCHRONOUS SIGNAL PROCESSING |
| 18873 | Asymptotic Classification Error for Heavy-Tailed Renewal Processes |
| 18872 | Asymptotic Error Rates for Point Process Classification |
| 10924 | ASYMPTOTICALLY OPTIMAL BANDIT ONLINE CLUSTERING FOR SINGLE PARAMETER EXPONENTIAL FAMILY OF DISTRIBUTIONS |
| 9828 | ASYNCHRONOUS HIGH-SPEED TRACKING OF ASTRONOMICAL OBJECTS USING NEUROMORPHIC CAMERA FOR EDGE COMPUTING |
| 17200 | Asynchrony-Aware Decoupled Multimodal Control for Cued Speech Video Generation |
| 12843 | ATNPLOC: PHASE-ENHANCED ASYNCHRONOUS TDOA FOR ACCURATE UWB LOCALIZATION |
| 5662 | ATO: ADAPTIVE TARGET OPTIMIZATION FOR SEMI-SUPERVISED DOMAIN ADAPTATION VIA DEEP REINFORCEMENT LEARNING |
| 10805 | ATOM: Adaptive Token-level Optimal Transport Mixup for Speech Translation |
| 3670 | ATOMIC NORM MINIMIZATION REVISITED: PROGRESSIVE ATOM IDENTIFICATION AND REFINEMENT |
| 14672 | ATOMU: ARTIFICIAL TEMPLATE OF MARKER UNIT FOR 1/100-PIXEL ACCURACY DISPLACEMENT MEASUREMENT |
| 11271 | ATTENTION OUTPUT PROJECTION IMPORTANCE SCORE FOR KEY-VALUE EVICTION |
| 10744 | ATTENTION TO DETAILS, LOGITS TO TRUTH: VISUAL-AWARE ATTENTION AND LOGITS ENHANCEMENT TO MITIGATE HALLUCINATIONS IN LVLMS |
| 6790 | ATTENTION2PROBABILITY: ATTENTION-DRIVEN TERMINOLOGY PROBABILITY ESTIMATION FOR ROBUST SPEECH-TO-TEXT SYSTEM |
| 17998 | ATTENTION-BASED ENCODER-DECODER TARGET-SPEAKER VOICE ACTIVITY DETECTION FOR ROBUST SPEAKER DIARIZATION |
| 3709 | ATTENTION-ENHANCED LEARNING FOR SENSING-ASSISTED LONG-TERM BEAM TRACKING IN MMWAVE COMMUNICATIONS |
| 11669 | ATTENTION-GUIDED DYNAMIC COMPENSATION SAMPLING FOR ROBUST INVERSION-BASED DIFFUSION WATERMARKING |
| 15305 | Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition |
| 6893 | Attentive AV-FusionNet: Audio-Visual Quality Prediction with Hybrid Attention |
| 3058 | ATTENTIVE MASKED SELF-DISTILLATION FOR RESPIRATORY SOUND CLASSIFICATION |
| 13156 | Attn-Defense: Attention-Guided Detection, Location and Removal for Indirect Prompt Injection |
| 17999 | ATTRIBUTE DRIVEN W SPACE FOR QUERY LIMITED FACE TEMPLATE INVERSION |
| 10001 | AUDEN-VOICE: GENERAL-PURPOSE VOICE ENCODER FOR SPEECH AND LANGUAGE UNDERSTANDING |
| 16568 | AUDIENCE-AWARE CO-SPEECH GESTURE GENERATION IN PUBLIC SPEAKING VIA ANTICIPATION TOKENS |
| 13821 | AUDIO DEEPFAKE DETECTION AT THE FIRST GREETING: “HI!” |
| 16937 | AUDIO EFFECT ESTIMATION WITH DNN-BASED PREDICTION AND SEARCH ALGORITHM |
| 12710 | AUDIOCARDS: STRUCTURED METADATA IMPROVES AUDIO LANGUAGE MODELS FOR SOUND DESIGN |
| 12856 | AUDIO-CONDITIONED DIFFUSION LLMS FOR ASR AND DELIBERATION PROCESSING |
| 14439 | AUDIOFUSE: UNIFIED SPECTRAL-TEMPORAL LEARNING VIA A HYBRID VIT-1D CNN ARCHITECTURE FOR PHONOCARDIOGRAM CLASSIFICATION |
| 8244 | AUDIOGENIE-REASONER: A TRAINING-FREE MULTI-AGENT FRAMEWORK FOR COARSE-TO-FINE AUDIO DEEP REASONING |
| 6464 | AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation |
| 15380 | Audio-Guided Multimodal Approach for Fine-Grained Alignment and Boundary Modeling in Active Speaker Detection |
| 18882 | AUDIOSETCAPS: AN ENRICHED AUDIO-CAPTION DATASET USING AUTOMATED GENERATION PIPELINE WITH LARGE AUDIO AND LANGUAGE MODELS |
| 12298 | AUDIO-TEXT JAILBREAK ATTACK ON LARGE AUDIO-LANGUAGE MODELS: TOWARDS GENERALITY AND STEALTHINESS |
| 18116 | Audio-to-Score Jazz Solo Transcription with the Rhythm Perceiver |
| 13509 | AUDIO-VISUAL DEEPFAKE GENERATION AND DETECTION: AN EXPLORATORY SURVEY |
| 17415 | AUDIO-VISUAL FEATURE FUSION FOR CALIBRATING RELEVANCE SCORES OF VIDEO MOMENT RETRIEVAL |
| 14898 | Audiovisual Speech Enhancement and Voice Activity Detection Using Generative and Speech Recognition Features |
| 12230 | AUDITGPT: A MULTI-AGENT FRAMEWORK FOR ENHANCING STATIC ANALYSIS |
| 16286 | Auditory Illusion Benchmark for Large Audio Language Models |
| 2217 | AuditoryBench++: Can Language Models Understand Auditory Knowledge without Hearing? |
| 13705 | AUDITORY-INSPIRED TRANSFORMER FOR BINAURAL SPEECH ENHANCEMENT AND SPATIAL CUE PRESERVATION |
| 16881 | AUGMENT-AND-REGULARIZE: TOWARD RELIABLE SEMI-SUPERVISED DOMAIN GENERALIZATION |
| 13856 | AUGMENTED LAGRANGIAN CONTROLLER DESIGN IN MODEL-BASED REINFORCEMENT LEARNING FOR ISAC RESOURCE ALLOCATION |
| 17340 | AUGMENTING IMAGE LLMS FOR DIVERSE VIDEO GROUNDING TASKS WITHOUT TRAINING |
| 17380 | AURA: A STEGAFORMER-BASED SCALABLE DEEP AUDIO WATERMARK WITH EXTREME ROBUSTNESS |
| 4179 | AURA: YCBCR-BASED UNIVERSAL RAW-RECONSTRUCTION FOR INVERSE ISP |
| 11385 | Aurora: Precise and Lightweight Multimodal Fusion for Efficient Referring Remote Sensing Image Segmentation |
| 3567 | AUTO-MATCHCUT: AN AUDIO-VISUAL RETRIEVAL FRAMEWORK FOR SEAMLESS MATCH CUTTING |
| 6024 | AUTOMATIC ESTIMATION OF SPEAKER DIARIZATION ERROR RATE BASED ON FEATURES OF AUDIO QUALITY AND SPEAKER DISCRIMINABILITY |
| 16357 | Automatic Music Mixing using a Generative Model of Effect Embeddings |
| 14887 | Automatic Music Sample Identification with Multi-Track Contrastive Learning |
| 2666 | AUTOP2C: AN LLM-BASED AGENT FRAMEWORK FOR CODE REPOSITORY GENERATION FROM MULTIMODAL CONTENT IN MACHINE LEARNING PAPERS |
| 2455 | AUTOREGRESSIVE-GAUSSIAN MIXTURE MODELS: EFFICIENT GENERATIVE MODELING OF WSS SIGNALS |
| 10705 | AUTOVQA-G: SELF-IMPROVING AGENTIC FRAMEWORK FOR AUTOMATED VISUAL QUESTION ANSWERING AND GROUNDING ANNOTATION |
| 17878 | AUV: TEACHING AUDIO UNIVERSAL VECTOR QUANTIZATION WITH SINGLE NESTED CODEBOOK |
| 10465 | AUXILIARY MULTI-LABEL TRAINING FOR IMPROVING THE ROBUSTNESS OF AUDIO DEEPFAKE DETECTION ON AI-PROCESSED DATA |
| 15583 | AVATAR: AUDIO-VISUAL ADAPTIVE FUSION VIA TRAINED AGENT REINFORCEMENT FOR MULTIMODAL DEEPFAKE DETECTION |
| 2607 | Averaging is Not Enough: Preserving Client-Specific Knowledge in Federated PEFT with One-Round Aggregation |
| 1931 | AVO-65: A LARGE-SCALE HIERARCHICAL AUDIO-VISUAL OBJECT DATASET |
| 3052 | AWARENESS OR GUIDANCE? A MODALITY-ENHANCED FUSION MODEL FOR MULTIMODAL KNOWLEDGE GRAPH COMPLETION |
| 2233 | AWGFORMER: ADAPTIVE WAVELET-GUIDED TRANSFORMER FOR MULTI-RESOLUTION TIME SERIES FORECASTING |
| 16570 | B2CAMO: LEVERAGING BACKGROUND CUES FOR PARAMETER-EFFICIENT FINE-TUNING IN OPEN-VOCABULARY CAMOUFLAGED OBJECT SEGMENTATION |
| 6242 | BABI: BLACKLISTED ACCRETION FOR BACKDOOR INVERSION IN INSTRUCTION FINE-TUNED LLMS |
| 17335 | BACHI: BOUNDARY-AWARE SYMBOLIC CHORD RECOGNITION THROUGH MASKED ITERATIVE DECODING ON POP AND CLASSICAL MUSIC |
| 13973 | BACKGROUND DISAMBIGUATION CONTRASTIVE LOSS FOR ROBUST BINARY SEGMENTATION |
| 16811 | BACKWARD DESIGN STFT LABORATORY FOR STEM EDUCATION: ACCESSIBLE, RESOURCE EFFICIENT, AND IMPROVED LEARNING OUTCOMES |
| 3392 | BADLLM_TG: A BACKDOOR DEFENDER POWERED BY LLM TRIGGER GENERATOR |
| 10807 | BADREASONER: PLANTING TUNABLE OVERTHINKING BACKDOORS INTO LARGE REASONING MODELS FOR FUN OR PROFIT |
| 18348 | BadTail: Exploiting Rationale Tails for Stealthy Multimodal Backdoor Attacks |
| 12552 | BadViM: Backdoor Attack against Vision Mamba |
| 16795 | BALANCING ACCURACY AND DIVERSITY: EVOLVING ANCHOR MATCHING FOR VIDEO TEMPORAL GROUNDING |
| 12971 | BALANCING EFFICIENCY AND FIDELITY IN IMAGE SUPER-RESOLUTION VIA ATTENTION-ENHANCED DISTILLATION |
| 12384 | BALANCING REWARDS IN TEXT SUMMARIZATION: MULTI-OBJECTIVE REINFORCEMENT LEARNING VIA HYPERVOLUME OPTIMIZATION |
| 11515 | BaldWhisper: Faster Whisper with Head Shearing and Layer Merging |
| 14607 | BAMoE: Bi-Attention Synergy with Expert Routing for Time Series Anomaly Detection |
| 10980 | Bayesian Channel Estimation with Diffusion Probabilistic Priors |
| 13312 | BAYESIAN LOW-RANK FACTORIZATION FOR ROBUST MODEL ADAPTATION |
| 12196 | BAYESIAN MATRIX COMPLETION UNDER GEOMETRIC CONSTRAINTS |
| 5991 | Bayesian Multi-Modal LSTM with Dynamic Uncertainty Modeling for Net Fluid Removal Prediction |
| 4466 | BAYESIAN SIGNAL SEPARATION VIA PLUG-AND-PLAY DIFFUSION-WITHIN-GIBBS SAMPLING |
| 17885 | Bayesian Uncertainty-Aware MRI Reconstruction |
| 10149 | BBPE16: UTF-16-BASED BYTE-LEVEL BYTE-PAIR ENCODING FOR IMPROVED MULTILINGUAL SPEECH RECOGNITION |
| 7637 | BDBR: TOWARDS RESISTANT BACKDOOR DEFENSE VIA BOUNDARY RECONSTRUCTION |
| 12219 | BDIO: BIAS-AWARE DENOISING INERTIAL ODOMETRY FOR ACCURATE DRONE TRAJECTORY ESTIMATION |
| 12938 | BDRNET: BIDIRECTIONAL DECOMPOSED AND RECALIBRATED LIGHTWEIGHT NETWORK FOR HUMAN POSE ESTIMATION |
| 17265 | BEAM-CLIP: MULTIMODAL ALIGNMENT AND MMWAVE BEAM PATTERN REPRESENTATION LEARNING |
| 18906 | BEAMFOCUSING CAPABILITIES OF A UNIFORM LINEAR ANTENNA ARRAY IN THE HOLOGRAPHIC REGIME |
| 14310 | BEAMFORMER DESIGNS FOR SWARM OF REPEATER-AIDED MASSIVE MIMO ISAC |
| 5719 | Beamforming using Virtual Microphones for Hearing Aid Applications |
| 11973 | BEAMSPACE MODEL AND BEAM ALIGNMENT METHOD FOR RECONFIGURABLE HOLOGRAPHIC SURFACE ANTENNA SYSTEMS |
| 2837 | BEAP-AGENT: BACKTRACKABLE EXECUTION AND ADAPTIVE PLANNING FOR GUI AGENTS |
| 12040 | Beat and Downbeat Detection: A Reformulated Approach |
| 6008 | BEATMAMBA: BIDIRECTIONAL SELECTIVE STATE-SPACE MODELING FOR EFFICIENT BEAT TRACKING |
| 11368 | BeepBeep: Leveraging Structural Attenuation for Robust Device-to-Device Authentication |
| 3406 | Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition |
| 13265 | BELIEF PROPAGATION VIA STOCHASTIC TRANSPORT MAPPING |
| 17932 | BE-MVSNET: BOUNDARY- AND EDGE-AWARE CONSTRAINED MULTI-VIEW STEREO |
| 18213 | Benchmarking Emotional Accuracy and Identity Consistency in Facial Image-to-Video Generation |
| 17802 | BENCHMARKING GASLIGHTING ATTACKS AGAINST SPEECH LARGE LANGUAGE MODELS |
| 17945 | BENCHMARKING HUMANS AND MACHINES ON COMPLEX MULTILINGUAL SPEECH UNDERSTANDING TASKS |
| 9856 | Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction |
| 18177 | BENCHMARKING MULTIMODAL LARGE LANGUAGE MODELS FOR FACE RECOGNITION |
| 2951 | Benchmarking Music Autotagging with MGPHot Expert Annotations vs. Generic Tag Datasets |
| 18917 | BERP: A Blind Estimator of Room Parameters for Single-Channel Noisy Speech Signals |
| 9816 | BEST-RQ-BASED SELF-SUPERVISED LEARNING FOR WHISPER DOMAIN ADAPTATION |
| 13869 | BEST-STD 2.0: BALANCED AND EFFICIENT SPEECH TOKENIZER FOR SPOKEN TERM DETECTION |
| 14666 | Better Together: Uncalibrated Photometric Stereo with Shading and Specularities |
| 16763 | BEV-ID: A Depth-Guided BEV Perception Method combining Feature Indexing with Depth Estimation |
| 9419 | BEYOND AMPLITUDE: CHANNEL STATE INFORMATION PHASE-AWARE DEEP FUSION FOR ROBOTIC ACTIVITY RECOGNITION |
| 15329 | BEYOND ANSWERS: TRAJECTORY SEMANTIC ENTROPY FOR RELIABLE UNCERTAINTY QUANTIFICATION IN LLMS |
| 5271 | BEYOND ATTENTION: ADAPTING SEGMENT ANYTHING WITH FREQUENCY AND STRUCTURAL PRIORS |
| 2335 | BEYOND BLURRINESS AND ARTIFACTS: A SYNERGISTIC DETERMINISTIC-PROBABILISTIC APPROACH FOR RADAR RECONSTRUCTION |
| 9439 | BEYOND CLEAN DATA: NOISE ROBUST DATASET PRUNING WITH FLIP-SENSITIVITY FILTERING |
| 6420 | BEYOND FACE SWAPPING: A DIFFUSION-BASED DIGITAL HUMAN BENCHMARK FOR MULTIMODAL DEEPFAKE DETECTION |
| 17159 | Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation |
| 16420 | Beyond History: Active Prompting with Dynamic Contemporaneous Facts for Temporal Knowledge Graph Forecasting |
| 5985 | BEYOND HUMAN SKELETONS: PROMPT-GUIDED GRAPH MATCHING FOR MULTI-LIMBED POSE ESTIMATION IN ARTISTIC IMAGERY |
| 2966 | BEYOND LIPS: INTEGRATING GESTURE AND LIP CUES FOR ROBUST AUDIO-VISUAL SPEAKER EXTRACTION |
| 16083 | BEYOND MAPPING: DOMAIN-INVARIANT REPRESENTATIONS VIA SPECTRAL EMBEDDING OF OPTIMAL TRANSPORT PLANS |
| 11296 | BEYOND OMNIDIRECTIONAL: NEURAL AMBISONICS ENCODING FOR ARBITRARY MICROPHONE DIRECTIVITY PATTERNS USING CROSS-ATTENTION |
| 16188 | BEYOND PIXEL PROPHECY: HIERARCHICAL KNOWLEDGE STRUCTURES FOR TRAINING-FREE VIDEO ANOMALY PREDICTION |
| 16504 | BEYOND PIXELS: A VECTOR-TO-GRAPH FRAMEWORK FOR RELIABLE SCHEMATIC AUDITING |
| 10171 | Beyond Sampling: Classwise Loss Optimization for Imbalanced Deep Learning Recommendation Systems |
| 9340 | Beyond Shadows: A Large-Scale Benchmark and Multi-Stage Framework for High-Fidelity Facial Shadow Removal |
| 3410 | Beyond Single Video Boundaries: Unified Unsupervised Video Object Segmentation with Historical, Future, and Cross-Video Reasoning |
| 13550 | BEYOND SPECTRAL PEAKS: INTERPRETING THE CUES BEHIND SYNTHETIC IMAGE DETECTION |
| 11356 | BEYOND THE WINDOW: REGION-BASED ANOMALY LOCALIZATION NETWORK FOR TIME SERIES ANOMALY DETECTION |
| 2373 | BEYOND VIDEO-TO-SFX: VIDEO TO AUDIO SYNTHESIS WITH ENVIRONMENTALLY AWARE SPEECH |
| 16097 | BEYOND VISUAL REALISM: TOWARD RELIABLE FINANCIAL TIME SERIES GENERATION |
| 2241 | B-GRPO: UNSUPERVISED SPEECH EMOTION RECOGNITION BASED ON BATCHED-GROUP RELATIVE POLICY OPTIMIZATION |
| 6391 | BHSFLOW: LOW-LATENCY FLOW ESTIMATION WITH BLOCK-WISE HUBER LOSS AND SIMPLIFIED STRUCTURE |
| 14049 | BI-DIRECTIONAL ATTENTION FOR DUAL-BRANCH GENERATOR FOR CHANNEL EXTRAPOLATION AND HIGH-RESOLUTION SENSING |
| 12906 | BIDIRECTIONAL CLASS-TEXT AND VISION INTERACTION NETWORK FOR CAMOUFLAGED OBJECT DETECTION |
| 5171 | BIDIRECTIONAL CONTINUOUS-TIME VIDEO SUPER-RESOLUTION VIA NEURAL ORDINARY DIFFERENTIAL EQUATIONS AND MULTI-ORDER SPATIAL INTERACTIONS |
| 5970 | BIDIRECTIONAL SEMANTIC ENHANCEMENT NETWORK FOR VIDEO MOMENT RETRIEVAL |
| 5727 | Bifrost: An Adaptive Decision Framework for Regulating Depth of Thought in LLM Agents |
| 11811 | BIG: A BIDIRECTIONAL GENERATIVE VERIFICATION FRAMEWORK FOR MULTIMODAL RUMOR DETECTION |
| 11795 | Bilateral Graph Filtering Framework with Alternating Optimization for Robust Multi-View Outlier Detection |
| 11449 | Bimodal Fusion Framework for Dynamic Facial Expression Recognition in-the-wild |
| 5581 | Bi-Modal Textual Prompt Learning for Vision-Language Models in Remote Sensing |
| 9446 | BINARY MODULATION ON CONJUGATE-RECIPROCAL ZEROS (MOCZ) WITH LIST DECODING FOR UNKNOWN CHANNEL LENGTH |
| 3431 | BiNR: Live Video Broadcasting Quality Assessment |
| 15944 | BIOMED-R²: JOINT DIVERSITY RETRIEVAL AND EVIDENCE REASONING FOR BIOMEDICAL QUESTION ANSWERING |
| 1988 | Biorthogonal Z-Transform: A Unified Framework for Generalized Signals |
| 3163 | BioSEN: A Bio-acoustic Signal Enhancement Network for Animal Vocalizations |
| 14005 | BIPOLAR RELATIONAL NETWORK FOR IRREGULAR TIME SERIES ANOMALY DETECTION |
| 3480 | BI-RECNET: BIDIRECTIONAL RECONCILIATION NETWORK FOR HIERARCHICAL TIME SERIES FORECASTING |
| 10020 | BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition |
| 7838 | BISAM: BI-DOMAIN SEGMENT ANYTHING MODEL FOR CAMOUFLAGED OBJECT DETECTION |
| 10560 | BLACK-BOX ONLINE DATA POISONING AGAINST TRIMMING DEFENSES: AN MAB-BASED APPROACH |
| 14069 | BLEED NO MORE: GENERATIVE INTERFERENCE REDUCTION FOR MUSICAL RECORDINGS |
| 15445 | BLIND IMAGE DEBLURRING WITH DECOUPLED DIFFUSION REVERSION |
| 4390 | Blind Lunar Soil Image Enhancement |
| 17172 | Blind Online Neural Recovery of RADAR Waveforms from Linear Projections of Interleaved ADC measurements |
| 3045 | Blind Quality Assessment of Stereoscopic Videos Using Scene-Based Attributes |
| 6761 | BLINDDET: TOWARDS ROBUST PHYSICAL-WORLD BACKDOOR ATTACK IN LOW-LIGHT SCENARIOS AGAINST OBJECT DETECTION |
| 16092 | Blink-based Biometric Identification Using Wearable EEG under Fatigue and Effort Variations |
| 11346 | Block-wise 3D Gaussian Splatting for Efficient and High-Fidelity Cross-Device Rendering |
| 9230 | BLOODROOT: WHEN WATERMARKING TURNS POISONOUS FOR STEALTHY BACKDOOR |
| 11482 | BOATT: UNIFIED BAYESIAN ONLINE TRACKING AND ADAPTATION FOR DYNAMIC TENSOR STREAMS |
| 11931 | BONE-CONDUCTION GUIDED MULTIMODAL SPEECH ENHANCEMENT WITH CONDITIONAL DIFFUSION MODELS |
| 3681 | BOOSTING ANOMALY DETECTION IN INDUSTRIAL 3D DATA: AN ENTROPY-GUIDED DENOISING AUTOENCODER |
| 9818 | BOOSTING CONTEXTUAL ADAPTIVE POLICY LEARNING WITH FOUNDATION MODEL GUIDANCE |
| 8234 | BOOSTING KNOWLEDGE DISTILLATION VIA LOCAL CATEGORIES SIMILARITY SCALING |
| 13872 | BOOSTING KNOWLEDGE SHARING AMONG AGENTS VIA GRAPH DECOUPLING |
| 12099 | BOOSTING PRIOR GENERATION VIA MULTIMODAL GRADIENT ATTENTION FOR FEW-SHOT SEGMENTATION |
| 14596 | BORA: BLOCKWISE ORTHOGONAL RANK-1 ADAPTIVE OPTIMIZATION |
| 5761 | BOUNDARY-ENHANCED VISION MAMBA U-NET FOR MEDICAL IMAGE SEGMENTATION |
| 14272 | BOUNDING MEMORIZATION WITH LOSS CURVATURE AND CONNECTIONS TO COMPRESSION |
| 2754 | Box-Chain VLA: Explicit Reasoning-to-Action Interfaces for Generalizable Robotic Manipulation |
| 15982 | BPMF: BIDIRECTIONAL PREDICTION BY MULTI-SCALE FEATURES FOR MULTIMODAL INDUSTRIAL ANOMALY DETECTION |
| 12975 | BRAIN-GRASP: GRAPH-BASED SALIENCY PRIORS FOR IMPROVED FMRI-BASED VISUAL BRAIN DECODING |
| 5859 | BRAIN-INFORMED SPEECH SEPARATION FOR COCHLEAR IMPLANTS |
| 18998 | BRAIN-INSPIRED VIDEO QUALITY ASSESSMENT VIA VISUAL-EEG FEATURE ALIGNMENT |
| 1549 | BRAIN-SCORE MEETS REPRESENTATIONAL SIMILARITY ANALYSIS: A METHODOLOGICAL CONVERGENCE IN MODEL-BRAIN ALIGNMENT |
| 16377 | Breaking Codebook Redundancy for Faster Autoregressive Image Generation with Retrieval-Augmented Speculative Decoding |
| 11617 | Breaking Cognitive Fixation in Multi-Turn Dialogue with Self-Distancing and Incubation |
| 14665 | Breaking Data Efficiency Dilemma: A Federated and Augmented Learning Framework For Alzheimer's Disease Detection via Speech |
| 10440 | BREAKING THE CURSE OF DIMENSIONALITY IN GAUSSIAN PROCESS TRAINING WITH ZEROTH-ORDER ADAPTIVE PERTURBATION |
| 13380 | BREAKING THE FORGETTING-MEMORIZATION TRADE-OFF: A MEMORY-ADAPTIVE OPTIMIZER FOR EFFECTIVE LARGE LANGUAGE MODELS UNLEARNING |
| 17036 | BREAK-THE-BEAT! CONTROLLABLE MIDI-TO-DRUM AUDIO SYNTHESIS |
| 7419 | BRIDGECODE: A DUAL SPEECH REPRESENTATION PARADIGM FOR AUTOREGRESSIVE ZERO-SHOT TEXT-TO-SPEECH SYNTHESIS |
| 2381 | Bridging Academia and Industry: Large-Scale NIR Signal Foundation for Robust Multi-Task and Real-world Modeling |
| 4314 | Bridging Legal Expertise and LLMs: A Cooperative Logical Reasoning Framework for Sentencing Recommendation |
| 18091 | BRIDGING MULTI-SCALE CONTEXTS: PRIOR-GUIDED DYNAMIC FUSION FOR DEGRADATION-ROBUST IMAGE RESTORATION |
| 10169 | BRIDGING MULTI-VIEW STEREO AND GAUSSIAN SPLATTING: ENHANCING HIGH-FIDELITY RENDERING WITH GEOMETRIC PRIORS |
| 12728 | BRIDGING PHYSICAL MODELS AND GENERATIVE PRIORS FOR DEHAZING: FROM COARSE ESTIMATION TO RESIDUAL REFINEMENT |
| 5996 | BRIDGING SAR AND OPTICAL DOMAINS: SYNERGIZING BROWNIAN BRIDGE DIFFUSION AND LOCAL CONTRASTIVE LEARNING FOR IMAGE TRANSLATION |
| 13909 | BRIDGING THE FRONT-END AND BACK-END FOR ROBUST ASR VIA CROSS-ATTENTION-BASED U-NET |
| 10939 | BRIDGING THE GAP: A COMPARATIVE EXPLORATION OF SPEECH-LLM AND END-TO-END ARCHITECTURE FOR MULTILINGUAL CONVERSATIONAL ASR |
| 11313 | BRIDGING THE GAP: TRANSFORMING NATURAL LANGUAGE QUESTIONS INTO SQL QUERIES VIA ABSTRACT QUERY PATTERN AND CONTEXTUAL SCHEMA MARKUP |
| 16077 | Bridging the Knowledge Gap: LLM-Driven Contrastive Memory-of-Thought Prompting for Task-Oriented Dialogue |
| 13219 | Bridging the Measurement-Simulation Gap in Room Acoustics with Real2Sim Diffusion |
| 11642 | BRIDGING THE SEMANTIC GAP: CROSS-ATTENTIVE FUSION FOR JOINT ACOUSTIC-SEMANTIC SPEECH QUALITY ASSESSMENT |
| 3605 | BRIDGING VISION AND LANGUAGE WITH QUANTUM STATE FOR VIDEO-TEXT RETRIEVAL |
| 5946 | BRINGING MULTIMODAL FOUNDATION MODELS TO HEARING AIDS |
| 18942 | BSM-iMagLS: ILD Informed Binaural Signal Matching for Reproduction With Head-Mounted Microphone Arrays |
| 1825 | BSMP-SENET: BAND-SPLIT MAGNITUDE-PHASE NETWORK FOR SPEECH ENHANCEMENT |
| 2535 | BTCCHAT: ADVANCING REMOTE SENSING BI-TEMPORAL CHANGE CAPTIONING WITH MULTIMODAL LARGE LANGUAGE MODEL |
| 12512 | BTDA: A ROBUST FRAMEWORK FOR ENCRYPTED TRAFFIC CLASSIFICATION WITH BYTE-LEVEL TLS DATA AUGMENTATION |
| 14208 | BUILD WITH PRECISION: BOTTOM-UP INFERENCE OF LINEAR DAGS |
| 12076 | Bundling-aware Masked Graph AutoEncoder for Bundle Recommendation |
| 15578 | BUTTERFLY TRANSFORMER FOR LIGHTWEIGHT IMAGE RESTORATION |
| 12918 | C&F-WSVAD: TOWARDS HIGH-PERFORMANCE COARSE AND FINE-GRAINED WEAKLY-SUPERVISED VIDEO ANOMALY DETECTION |
| 15148 | C2BNVAE: Dual-Conditional Deep Generation of Network Traffic Data for Network Intrusion Detection System Balancing |
| 15708 | CADD: Condition-Anchor Dataset Distillation |
| 15945 | CAD-Judge: Toward Efficient Morphological Grading and Verification for Text-to-CAD Generation |
| 11838 | CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering |
| 2495 | CADMamba: Clustering ADaptive Mamba for Multivariate Time Series Forecasting |
| 13214 | CAF-MAMBA: MAMBA-BASED CROSS-MODAL ADAPTIVE ATTENTION FUSION FOR MULTIMODAL DEPRESSION DETECTION |
| 15870 | CALM: JOINT CONTEXTUAL ACOUSTIC-LINGUISTIC MODELING FOR PERSONALIZATION OF MULTI-SPEAKER ASR |
| 18066 | CAMA: CHARACTER-AWARE MASKING AND ALIGNMENT FOR SELF-SUPERVISED STR |
| 11527 | CAMEO: Collection of Multilingual Emotional Speech Corpora |
| 3939 | CAMOD: CAUSAL-AWARE MODALITY DENOISING FOR MULTIMODAL DIALOGUE INTENT RECOGNITION |
| 18946 | CAN AUDIO REVEAL MUSIC PERFORMANCE DIFFICULTY? INSIGHTS FROM THE PIANO SYLLABUS DATASET |
| 1174 | CAN DATA AUGMENTATION BECOME A PRIVACY SHIELD FOR MODEL INVERSION ATTACKS? |
| 16934 | CAN HIERARCHICAL CROSS-MODAL FUSION PREDICT HUMAN PERCEPTION OF AI DUBBED CONTENT? |
| 9239 | Can Large Audio Language Models Understand Audio Well? Speech, Scene and Events Understanding Benchmark for LALMs |
| 15132 | Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate |
| 11161 | CAN META-LEARNING ADDRESS THE CHALLENGES OF BIOSIGNAL PERSONALIZATION? |
| 5935 | CAN SYNTHETIC IMAGES SERVE AS EFFECTIVE AND EFFICIENT CLASS PROTOTYPES? |
| 15884 | CAN UNLEARNING OF MODELS LEAD TO ADVERSARIAL ROBUSTNESS? |
| 15547 | CAN VISION LANGUAGE MODELS PERCEIVE GRAPHS ACCURATELY? A VISUAL GRAPH PERCEPTION EVALUATION BENCHMARK |
| 13849 | CAPACITY ANALYSIS OF OFDM SYSTEMS WITH A SWARM OF NETWORK-CONTROLLED REPEATERS |
| 6354 | CAPT: A Lightweight Continual Adversarial Pre-training Framework for Traffic Analysis Model |
| 4634 | CAPTION AND AUDIO-GUIDED VIDEO REPRESENTATION LEARNING WITH GATED ATTENTION FOR PARTIALLY RELEVANT VIDEO RETRIEVAL |
| 3162 | CARBON-ECS: A BENCHMARK AND A PHYSICS-DECOUPLED MODEL FOR THE EAST CHINA SEA CARBON FLUX |
| 17604 | CARDINALITY-CONSTRAINED COVARIANCE ESTIMATION IN ARRAY BEAMFORMING |
| 13250 | CARDIOCOT: MULTIMODAL PREDICTION OF MACE RECURRENCE RISK WITH HIERARCHICAL CHAIN-OF-THOUGHT REASONING |
| 14986 | CARE: COGNITIVE-REASONING AUGMENTED REINFORCEMENT FOR EMOTIONAL SUPPORT CONVERSATION |
| 15170 | CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control |
| 9429 | CARE-AGENT: MULTI-AGENT COLLABORATION WITH CONFLICT-AWARE ROUTING MECHANISM FOR DIAGNOSIS PREDICTION |
| 10536 | CAREFUL-MM: CAUSAL AND UNCERTAINTY-AWARE, SARCASM-ROBUST MULTIMODAL DEPRESSION DETECTION |
| 13997 | CAS-J: Cross-Modal Attention Synergy for Jailbreaking Large Vision-Language Models |
| 11103 | CaSNet: Compress-and-Send Network Based Multi-Device Speech Enhancement Model for Distributed Microphone Arrays |
| 6474 | CAS-ODE: JOINTLY LEARNING ADAPTIVE STRUCTURES AND CONTINUOUS DYNAMICS FOR EMOTION RECOGNITION IN CONVERSATION |
| 17803 | CASP: CONFIDENCE-AWARE STRUCTURAL PRESERVATION FOR CONTINUAL TEST-TIME ADAPTATION OBJECT DETECTION |
| 4591 | CAST-ACF: ROBUST GENERATION AND EVALUATION FOR MULTI-GRANULARITY TIMELINE SUMMARIZATION |
| 10700 | CASTELLA: LONG AUDIO DATASET WITH CAPTIONS AND TEMPORAL BOUNDARIES |
| 10135 | Category-Adaptive Feature Compression for Multi-Device Collaborative Computing |
| 3919 | CAUSAL DEBIASING AND FEATURE FUSION FOR OPEN SET DOMAIN ADAPTATION |
| 8007 | CAUSAL EFFECT ESTIMATION UNDER NETWORK INTERFERENCE WITH STATE SPACE MODELS |
| 17422 | Causal Fingerprints of AI Generative Models |
| 6038 | CAUSAL INTERVENTED DISENTANGLEMENT FOR MULTI-SOURCE CROSS-DOMAIN RECOMMENDATION |
| 13285 | CAUSAL-BOOTSTRAPPED MULTI-AGENT REINFORCEMENT LEARNING FOR MITIGATING THE COLD-START PROBLEM |
| 4511 | CAUSALRAP: CAUSAL GRAPH-DRIVEN RETRIEVAL AUGMENTED LONG-HORIZON TASK PLANNING FOR LARGE LANGUAGE MODELS |
| 16297 | CBR-DETR: AN ENHANCED RT-DETR WITH MULTI-LEVEL CONTEXT FUSION AND BIDIRECTIONAL ROUTING |
| 11124 | CC-G2PNP: STREAMING GRAPHEME-TO-PHONEME AND PROSODY WITH CONFORMER-CTC FOR UNSEGMENTED LANGUAGES |
| 12778 | CCMA: CONSISTENCY-AWARE CROSS-MODAL ALIGNMENT FOR TEXT-BASED PERSON RETRIEVAL |
| 4377 | C-Conformer: Channel-augmented Conformer for Sound Event Localization and Detection |
| 10974 | CCST: Cross-Modal and Consistency-Aware Self-Training for Source-Free Unsupervised Domain Adaptation in Speech Recognition |
| 9694 | C-DGPA: Class-Centric Dual-Alignment Generative Prompt Adaptation |
| 12029 | CDIFF: CONTEXT-DISENTANGLED IMAGE SYNTHESIS FOR ANIMAL INDIVIDUAL IDENTIFICATION |
| 16849 | CD-MTO: A Joint Optimization Framework for Multi-Hop Task Offloading and Energy-Aware Computing in Mobile Edge Networks |
| 16750 | CDT: ROBUST DETECTION OF DOH TUNNELS VIA BACKGROUND ASSOCIATION |
| 10436 | CEAL: CROSS-EXPERT ATTENTION WITH CURATED PEER SELECTION FOR SCALABLE AND PRIVACY-PRESERVING EXEMPLAR-FREE CONTINUAL LEARNING |
| 13315 | CEFL-Ranking: Re-evaluation of Communication-Efficient FL Methods |
| 11204 | CE-GOCD: Central Entity-Guided Graph Optimization for Community Detection to Augment LLM Scientific Question Answering |
| 7614 | CENTRALIZED SPECTRAL INITIALIZATION FOR SPARSE PHASE RETRIEVAL |
| 13252 | CERF: A COMMUNICATION-EFFICIENT AND RETRAINING-FREE FRAMEWORK FOR MULTI-UAV COLLABORATIVE PERCEPTION |
| 6371 | CFA: Lightweight Defense against Membership Inference Attacks through Class-wise Feature Aggregation |
| 17875 | CFGAN: COMPLEX FREQUENCY-DOMAIN GRAPH ATTENTION NETWORK FOR TIME SERIES FORECASTING |
| 13147 | CFIRE: Cross-View Feature Interaction for Fine-Grained Regression-based UAV Localization |
| 3240 | CFRNet: Accelerating Lane Line Detection with Asymmetric Weighted Attention Distillation and Cascaded Feature Refinement |
| 9407 | CGNN+: A GRAPH NEURAL INSTRUMENTAL VARIABLE FRAMEWORK FOR ROBUST CAUSAL INFERENCE IN NETWORKED DATA |
| 1037 | CHAIN OF CORRECTION FOR FULL-TEXT SPEECH RECOGNITION WITH LARGE LANGUAGE MODELS |
| 13787 | CHAIN-OF-CAPTION: TRAINING-FREE IMPROVEMENT OF MULTIMODAL LARGE LANGUAGE MODEL ON REFERRING EXPRESSION COMPREHENSION |
| 12030 | Change Detection Methods for Non-stationary Stochastic Linear Bandits |
| 17917 | CHANGE-AWARE TEMPORAL ALIGNMENT ON HETEROGENEOUS GRAPH SNAPSHOTS FOR INSIDER THREAT DETECTION |
| 19148 | Channel Estimation and Data Detection in DS-Spread Channels: A Unified Framework, Novel Algorithms, and Waveform Comparison |
| 9664 | CHANNEL MODELING IN THE DELAY-DOPPLER DOMAIN FOR COMMUNICATIONS WITH A MOBILE RECEIVER |
| 6645 | Channel Prediction under Network Distribution Shift Using Continual Learning-based Loss Regularization |
| 3675 | Channel, Trend and Periodic-Wise Representation Learning for Multivariate Long-term Time Series Forecasting |
| 14658 | Channel-Adaptive Robust Aggregation for Over-the-Air Federated Learning in Heterogeneous Networks |
| 17666 | CHANNEL-WISE RETRIEVAL FOR MULTIVARIATE TIME SERIES FORECASTING |
| 14383 | CHAOS: Chart Analysis with Outlier Samples |
| 15671 | Chemical Sight Net: Incorporating Crystallographic Priors for Accurate Space Group Determination from PXRD |
| 15407 | CHNERV: CONDITION ENHANCED HYBRID NEURAL REPRESENTATION FOR VIDEOS |
| 15495 | CHROMOUVQA: BENCHMARKING VISION-LANGUAGE MODELS UNDER CHROMATIC CAMOUFLAGED IMAGES |
| 2783 | CHUNKWISE ALIGNERS FOR STREAMING SPEECH RECOGNITION |
| 10329 | CHUNK-WISE ATTENTION TRANSDUCER FOR FAST AND ACCURATE STREAMING SPEECH-TO-TEXT |
| 2688 | CIDER: A Causal Cure for Brand-Obsessed Text-to-Image Models |
| 11264 | CIF: A TWO-STAGE COGNITIVELY-INSPIRED FRAMEWORK FOR CHINESE SPELLING CORRECTION |
| 13299 | CIFC-MFD: END-TO-END MULTI-FACE FORGERY DETECTION USING CROSS-IMAGE FACE CONTRAST |
| 9910 | CIMRAG: CIM-AWARE DOMAIN-ADAPTIVE AND NOISE-RESILIENT RETRIEVAL-AUGMENTED GENERATION FOR EDGE-BASED LLMS |
| 6987 | CINE-SHOT DIRECTOR: NATIVE CINEMA-GRADE MULTI-SHOT VIDEO GENERATION FRAMEWORK |
| 6005 | CIP-DOA: Cross-Instance Prompted DoA Estimation via Semantic-Spatial Matching |
| 4928 | CITRAG: CONTRADICTION IDENTIFICATION AND TRACING FOR RETRIEVAL-AUGMENTED GENERATION |
| 16953 | CKANST: CKAN-BASED ARBITRARY STYLE TRANSFER FOR HOMOGENEOUS IMAGES |
| 18194 | CLASS DIFFICULTY-AWARE REAL-TIME INSTANCE SEGMENTATION |
| 14499 | CLASS-AWARE PERMUTATION-INVARIANT SIGNAL-TO-DISTORTION RATIO FOR SEMANTIC SEGMENTATION OF SOUND SCENE WITH SAME-CLASS SOURCES |
| 17897 | Classifier-Centric Adaptive Framework for Open-Vocabulary Camouflaged Object Segmentation |
| 11301 | CLASS-IMBALANCED MULTI-VIEW CLUSTERING VIA SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE |
| 12349 | CLASS-INVARIANT TEST-TIME AUGMENTATION FOR DOMAIN GENERALIZATION |
| 16636 | CLEAN: COMPLIANT LOOPS WITH ENHANCED ADJUSTMENT FOR TRAINING-FREE UNLEARNING |
| 5518 | ClearGCD: Mitigating Shortcut Learning For Robust Generalized Category Discovery |
| 3259 | CLIP-driven Zero-shot Learning with Ambiguous Labels |
| 13632 | CLIP-Guided Unsupervised Semantic-Aware Exposure Correction |
| 3888 | CLOSED-FORM 3D TDOA-AOA SOURCE LOCALIZATION WITH QUATERNIONS |
| 6960 | Closed-form Ziv-Zakai Bound for Compressive Time Delay Estimation |
| 13416 | Clue2Emo: A Brain-Inspired Framework for Open-Vocabulary Multimodal Emotion Recognition |
| 6066 | CLUEUP: RESOLVING INTENT AMBIGUITY IN PERSONALIZED WEB AGENTS WITH PROFILE-DRIVEN CLARIFICATION |
| 5247 | CLUSTERING OF MULTISOURCE REMOTE SENSING DATA VIA LOW-RANK TENSOR LEARNING WITH SPATIAL CONSTRAINTS |
| 11754 | CLUSTERING-DRIVEN MEMORY COMPRESSION FOR ON-DEVICE LARGE LANGUAGE MODELS |
| 13490 | CMCFAE: CLOUD MODEL CHARACTERISTIC FUNCTION AUTO-ENCODER FOR STRUCTURE-AWARE GENERATIVE MODELING |
| 3877 | COARSE ADVERSARIAL TRAINING WITH LABEL GROUPING FOR ROBUST CLASSIFICATION |
| 4632 | COARSE-TO-FINE TRAJECTORY PREDICTION VIA TIME-AWARE INTERACTION PREDICTOR AND CONDITIONAL DIFFUSION-BASED REFINER |
| 5107 | CODECSLIME: TEMPORAL REDUNDANCY COMPRESSION OF NEURAL SPEECH CODEC VIA DYNAMIC FRAME RATE |
| 18932 | CODED ROBUST AGGREGATION FOR DISTRIBUTED LEARNING UNDER BYZANTINE ATTACKS |
| 12597 | CODEOE: A BENCHMARK FOR JOINTLY EXTRACTING CROSS-DOCUMENT EVENTS AND OPINIONS FROM SOCIAL MEDIA |
| 11098 | CODEPMP: SCALABLE PREFERENCE MODEL PRETRAINING FOR LARGE LANGUAGE MODEL REASONING |
| 5980 | CODESEP: LOW-BITRATE CODEC-DRIVEN SPEECH SEPARATION WITH BASE-TOKEN DISENTANGLEMENT AND AUXILIARY-TOKEN SERIAL PREDICTION |
| 3474 | Codesign of FDA-MIMO Radar-Communication System in the Presence of Mainlobe Deceptive Jammers |
| 13430 | CODE-VISION: EVALUATING MULTIMODAL LLMS LOGIC UNDERSTANDING AND REASONING CAPABILITIES THROUGH CODE GENERATION |
| 9277 | CO-DRS: DATA-FREE ROBUSTNESS STEALING VIA DUAL-MODEL COLLABORATION |
| 9565 | CoETR2: Complementary Packet-based Modeling for Encrypted Traffic |
| 9434 | COFE: A FRAMEWORK GENERATING COUNTERFACTUAL ECG FOR EXPLAINABLE CARDIAC AI-DIAGNOSTICS |
| 19010 | Co-forecasting of Time-varying Spatial-frequency Map for Selective Fixed-Filter Multichannel ANC based on Dynamic Factor Graph |
| 11938 | Cognition-enhanced One-step Diffusion Model for Degradation-aware Super-Resolution in the Dark |
| 3616 | Cognitive Attention and Dual Residual Networks for Offline Regularized Multi-Agent Reinforcement Learning |
| 11240 | COHERENT-GS: HIGH-FIDELITY 3DGS STYLIZATION WITH A GLOBALLY COHERENT COLOR MANIFOLD |
| 14601 | CO-INITIALIZATION OF CONTROL FILTER AND SECONDARY PATH VIA META-LEARNING FOR ACTIVE NOISE CONTROL |
| 14298 | COLLABORATIVE COMPRESSION FOR LARGE-SCALE MOE DEPLOYMENT ON EDGE |
| 2897 | COLLABORATIVE GRAPH CONTRASTIVE NETWORK FOR SEMI-SUPERVISED GRAPH NODE CLASSIFICATION |
| 16085 | Collaborative learning for Enhanced Cross Domain Adaptation |
| 8132 | COLLABORATIVE OPTIMIZATION OF LEARNABLE PROBE TOKENS AND ATTRIBUTE TEXT PROMPTS FOR LOW-RESOLUTION FINE-GRAINED VISUAL CLASSIFICATION |
| 13234 | COLLABORATIVE STANDARDIZATION OF MULTI-CENTER CLINICAL DATA USING DISTRIBUTION-AWARE LLM FUSION |
| 16362 | Collective Experts against Noise: Enhancing Social Media Popularity Prediction via Retrieval-Augmented Multimodal Experts |
| 2726 | Collusion-Resistant and Trusted Authority-Free Verifiable Federated Learning via a Two-Server Architecture |
| 5613 | COMBINING MULTI-ORDER ATTENTION AND MULTI-RESOLUTION DISCRIMINATOR FOR HIGH-FIDELITY NEURAL VOCODER |
| 14215 | COMBINING SSL SPEECH FEATURES, CONTEXTUAL TRANSFORMERS AND MAMBA MODELS FOR REALISTIC AUDIO SPOOFING DETECTION |
| 18878 | COMBINING X-VECTORS AND BAYESIAN BATCH ACTIVE LEARNING: TWO-STAGE ACTIVE LEARNING PIPELINE FOR SPEECH RECOGNITION |
| 1566 | COME: Towards Superior Embeddings for Multimodal RAG With Heterogeneous Input |
| 2522 | COMET: continuous-time trajectory-guided temporal modeling for spacecraft pose estimation |
| 7514 | Communication-Efficient Federated Learning with Pre-Executed Gradient Descent |
| 16328 | COMNET: A COMPLEMENTARY PROTOTYPES-GUIDED RECONSTRUCTION FRAMEWORK FOR MULTI-CLASS ANOMALY DETECTION |
| 3784 | COMPACT REPRESENTATION LEARNING FOR MULTIMODAL DRUG-DRUG INTERACTION EVENT PREDICTION |
| 1357 | Complementary Subspace Low-Rank Adaptation of Vision-Language Models for Few-Shot Classification |
| 16411 | Complex-Aware Semi-Supervised Modulation Recognition via Latent Adversarial Training |
| 9052 | COMPOSED GRIDS, ONE-WARP: TEST-TIME RECTIFICATION |
| 13329 | COMPOSED VISUAL GROUNDING IN REMOTE SENSING IMAGES |
| 2890 | Composite Memory Transformer for Online 3D Human Motion Prediction |
| 3904 | Compositional Image Synthesis with Inference-Time Scaling |
| 10056 | Compound-QA: A Benchmark for Evaluating LLMs on Compound Questions |
| 9558 | COMPRESSED BC-LISTA VIA LOW-RANK CONVOLUTIONAL DECOMPOSITION |
| 14394 | Compressed Spectrum Cartography: Estimating Wide-Band Channel Gain Maps from Sub-Nyquist Delay Correlations |
| 16004 | COMPRESSING KV CACHE FOR LONG-CONTEXT LLM INFERENCE WITH INTER-LAYER ATTENTION SIMILARITY |
| 16073 | Compression meets Sampling: LZ78-SPA for Efficient Symbolic Music Generation |
| 4813 | COMPRESSIVE RECOVERY OF SIGNALS DEFINED ON PERTURBED GRAPHS |
| 9961 | COMPRESSIVE SPATIAL CHANNEL ESTIMATION UNDER IQ IMBALANCE |
| 13956 | CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-spoofing Countermeasures |
| 11148 | CONCEPT ACTIVATION VECTORS: A UNIFYING VIEW AND ADVERSARIAL ATTACKS |
| 17515 | CONCEPTDEBIAS: INTERPRETABLE BIAS MITIGATION VIA CONCEPT DECOMPOSITION IN DEEP NEURAL NETWORKS |
| 16365 | CONDITIONAL DIFFUSION INVERSION LEARNING FOR MULTI-VIEW STEREO |
| 14189 | CONDITIONAL DIFFUSION MODELS FOR MENTAL HEALTH-PRESERVING VOICE CONVERSION |
| 15755 | Conditional Prior-based Non-stationary Channel Estimation Using Accelerated Diffusion Models |
| 3677 | CONDITIONAL VARIATIONAL AUTOENCODER FOR GLOSS-FREE SIGN LANGUAGE TRANSLATION |
| 18155 | CONFCLIP: CONFIDENCE-WEIGHTED AND CLIPPED REWARD FOR REINFORCEMENT LEARNING IN LLMS |
| 11302 | CONFIDENCE-BASED FILTERING FOR SPEECH DATASET CURATION WITH GENERATIVE SPEECH ENHANCEMENT USING DISCRETE TOKENS |
| 14218 | Confidence-Guided Error Correction for Disordered Speech Recognition |
| 3140 | CONFIDENT MOTION MAGNIFICATION CURRICULUM FOR SELF-SUPERVISED OPTICAL FLOW |
| 11282 | Conflict-Aware Client Selection for Multi-Server Federated Learning |
| 13901 | CONFORMAL INFERENCE FOR TIME SERIES OVER GRAPHS |
| 10073 | CONFORMAL PREDICTION AIDED KALMAN FILTERS WITH CONFIDENCE INTERVALS |
| 17459 | CONFORMAL SIGNAL TEMPORAL LOGIC FOR ROBUST REINFORCEMENT LEARNING CONTROL: A CASE STUDY |
| 15093 | Conformalized Gaussian processes for online uncertainty quantification over graphs |
| 11762 | Conjugate Relation Modeling for Few-Shot Knowledge Graph Completion |
| 4560 | Connecting Layer-Wise Representation of WavLM with Spectro-Temporal Modulation on Speaker Verification |
| 13574 | CONQUER: CONTEXT-AWARE REPRESENTATION WITH QUERY ENHANCEMENT FOR TEXT-BASED PERSON SEARCH |
| 3192 | CONSENSUS, CONFLICT, AND COORDINATION: THE $C^3$-MKD FRAMEWORK FOR RELIABLE MULTI-TEACHER KNOWLEDGE DISTILLATION |
| 8559 | Consensus-Awarded Multi-Agent Debate via Adversarial Interaction |
| 12190 | CONSISTENCY-AWARE LEARNING FOR UNBIASED VISUAL QUESTION ANSWERING |
| 10137 | CONSTANT-MODULUS LINEAR TRANSFORM FOR RIS BEAMFORMING IN UPLINK MULTIUSER MIMO SYSTEMS |
| 18940 | CONSTRAINED CONDITIONAL DENOISING DIFFUSION FOR HYPERSPECTRAL-MULTISPECTRAL FUSION |
| 4131 | CONSTRAINED LOCAL POINT CLOUD PERTURBATIONS USING ADAPTIVE CURVATURE FOR 3D ADVERSARIAL ATTACKS |
| 3092 | Constrained Paraphrase Consistency for LLM Hallucination Detection |
| 1010 | CONSTRAINT OPTIMIZED MULTICHANNEL MIXER-LIMITER DESIGN |
| 12679 | CONSTRUCTING COMPOSITE FEATURES FOR INTERPRETABLE MUSIC-TAGGING |
| 4622 | CONSTRUCTION OF BINARY SEQUENCE PAIRS WITH EQUAL PERIODIC AUTOCORRELATION |
| 9477 | CONTENT ADAPTIVE SWITCHABLE HYPERPRIOR NETWORKS FOR LEARNED IMAGE COMPRESSION |
| 16132 | CONTENT ANONYMIZATION FOR PRIVACY IN LONG-FORM AUDIO |
| 11020 | CONTENT LEAKAGE IN LIBRISPEECH AND ITS IMPACT ON THE PRIVACY EVALUATION OF SPEAKER ANONYMIZATION |
| 1482 | Content-Aware Model Slimming for Image Super-Resolution with Large Input |
| 12605 | CONTENT-PRESERVING SPEECH REPRESENTATION LEARNING VIA ADAPTIVE SEGMENT-LEVEL ALIGNMENT |
| 13330 | CONTEXT-AWARE DEEP HASHING FOR CROSS-DOMAIN IMAGE RETRIEVAL |
| 6772 | CONTEXT-AWARE DYNAMIC GRAPH LEARNING FOR MULTIMODAL EMOTION RECOGNITION WITH MISSING MODALITIES |
| 12583 | CONTEXTUAL BIASING FOR ASR IN SPEECH LLM WITH COMMON WORD CUES AND BIAS WORD POSITION PREDICTION |
| 12821 | CONTEXTUAL CLUE MINING AND CLASS CALIBRATION FOR WEAKLY SUPERVISED VIDEO ANOMALY DETECTION |
| 8773 | CONTEXTUAL RELATIONSHIP FEATURE-ENHANCED STEGANALYSIS FOR SOCIAL TEXTS |
| 11348 | Continual Learning with CLIP Text-Prototype and an Orthogonal Pre-Expanded Classification Head |
| 11102 | CONTINUAL NEURAL NETWORK RETRIEVAL FOR EVER-EXPANDING MODEL ZOO |
| 15307 | CONTINUAL TIME SERIES FORECASTING WITH DIFFUSION MODELS UNDER FUNCTIONAL REGULARIZATION |
| 14566 | CONTINUATION METHOD FOR FEEDBACK DELAY NETWORK MODAL DECOMPOSITION |
| 19062 | Continuous Relaxation of Discontinuous Shrinkage Operator: Proximal Inclusion and Conversion |
| 16158 | Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs |
| 11216 | CONTRASTIVE DISTILLATION OF EMOTION KNOWLEDGE FROM LLMS FOR ZERO-SHOT EMOTION RECOGNITION |
| 14867 | CONTRASTIVE HYPERSPHERE FOR ONE-CLASS LINGUISTIC STEGANALYSIS |
| 10514 | Contrastive Learning-Based Deep Neural Network for Robust DOA Estimation |
| 12661 | CONTRASTIVE PERTURBATION WITH FREQUENCY-DOMAIN FEATURE FUSION FOR FACE PRIVACY PROTECTION |
| 9781 | CONTRASTIVE TIMBRE REPRESENTATIONS FOR MUSICAL INSTRUMENT AND SYNTHESIZER RETRIEVAL |
| 15605 | Controllable Embedding Transformation for Mood-Guided Music Retrieval |
| 11759 | CONTROLLABLE LOCALIZED FACE ANONYMIZATION VIA DIFFUSION INPAINTING |
| 6897 | CONTROLLING LANGUAGE DIFFICULTY IN DIALOGUES WITH LINGUISTIC FEATURES |
| 18901 | Convergence Analysis of the Factorial Kalman Filter |
| 18886 | CONVOLUTIONAL FILTERING WITH RKHS ALGEBRAS |
| 14825 | CONVOLUTIONAL GRAPH FILTER DESIGN FOR SIGNED GRAPHS |
| 11944 | COOPERATIVE DETECTION OF CYCLOSTATIONARY TARGET ECHOES FOR PASSIVE RADAR NETWORKS |
| 3514 | COOPERATIVE MULTI-AGENT REINFORCEMENT LEARNING FOR ADAPTIVE AGGREGATION IN SEMI-SUPERVISED FEDERATED LEARNING WITH NON-IID DATA |
| 6777 | COREANCHOR-QA: CENTER-ANCHORED AND SELF-IMPROVING FOR QUESTION-ANSWER GENERATION |
| 15089 | CORM: COARSE-TO-FINE-GRAINED OFFLOADING FOR SMOE LLM INFERENCE ON CONSUMER-GRADE GPU |
| 7936 | CORRECTING THE BIAS: AVOIDING FALSE TRIPLET INJECTION IN MULTILINGUAL KNOWLEDGE GRAPH COMPLETION WITH LLM-AUGMENTED REASONING |
| 9645 | CorrEctor: An Execute-to-Correct Paradigm for Efficient LLM Secure Inference |
| 7879 | COSAGE: FEDERATED LEARNING WITH GRADIENT SUMMARIES FOR CENTRALIZED CLIENT SELECTION |
| 10009 | COST-EFFICIENT DYNAMIC FEATURE ACQUISITION UNDER LIMITED SUPERVISION |
| 15663 | CosyAccent: Duration-Controllable Accent Normalization Using Source-Synthesis Training Data |
| 8164 | COUNTERFACTUAL PROBABILITY DISTILLATION FOR REMOTE SENSING |
| 3056 | COUNTING DISTINCT MULTIVARIATE SELF-SIMILARITY PARAMETERS USING A BOOTSTRAP-DRIVEN GRAPH CLUSTERING APPROACH |
| 10909 | Coupling Acoustic Geometry and Visual Semantics for Robust Depth Estimation |
| 9901 | COVA: TEXT-GUIDED COMPOSED RETRIEVAL FOR AUDIO-VISUAL CONTENT |
| 9557 | CoVariance Filters and Neural Networks over Hilbert Spaces |
| 18243 | Covariance-Agnostic Model-Based Deep Learning Filter for Jump Systems |
| 12715 | CP LOSS: CHANNEL-WISE PERCEPTUAL LOSS FOR TIME SERIES FORECASTING |
| 2490 | CP-GUARD: CONTINUAL PREFERENCE ALIGNMENT FOR COPYRIGHT PROTECTION |
| 17870 | CPJ: Explainable Agricultural Pest Diagnosis via Caption–Prompt–Judge with LLM-Judged Refinement |
| 5794 | CPMark: Robust Latent watermarking against composite perturbations |
| 14250 | CPMIL: COMPOUND PROTOTYPE-BASED MULTIPLE INSTANCE LEARNING FOR WHOLE SLIDE IMAGE CLASSIFICATION |
| 10103 | CPT: CONSISTENT PROXY TUNING FOR BLACK-BOX MODELS |
| 11488 | CPTFORMER: SELF-SUPERVISED CHANGE-POINT-AWARE TRANSFORMER FRAMEWORK FOR NON-STATIONARY TIME SERIES FORECASTING |
| 19039 | Cramér-Rao Bounds for Laplacian Matrix Estimation |
| 15373 | Cramér-Rao Bounds on Sparse-Diffuse Channel Estimation |
| 19034 | CRB Optimization for Intelligent Reflecting Surface-Assisted NLOS Wireless Sensing |
| 14939 | CRDSNet: Scene Text Recognition Based on Cross-modal and Recurrent Decomposed Self-Attention |
| 8760 | CREDID: CREDIBLE MULTI-BIT WATERMARK FOR LARGE LANGUAGE MODELS IDENTIFICATION |
| 11274 | Critical Noise: An Efficient Label-Flipping Attack Against Malicious Traffic Detection Systems |
| 6636 | CRLB-Guided Orientation Design of Photodiode Arrays for Wide-FOV Optical Wireless Reception |
| 12671 | Crop Classification in Satellite Images via First Eigenvector of Learned Signed Graph Laplacian |
| 11776 | Cross Paraphrastic Invariance Learning for Hallucination Detection |
| 9746 | CROSS PSEUDO LABELING FOR WEAKLY SUPERVISED VIDEO ANOMALY DETECTION |
| 17257 | CROSS TASK KNOWLEDGE TRANSFER FOR REHEARSAL-FREE CONTINUAL LEARNING |
| 10754 | Cross-Architecture Knowledge Distillation of WavLM for Lightweight Speaker Verification |
| 16141 | CROSS-ATTENTION BASED DUAL-STREAM FRAMEWORK FOR BLIND UNDERWATER IMAGE QUALITY ASSESSMENT |
| 14891 | CROSS-ATTENTIVE ADAPTER WITH REGULARIZED DOMAIN ADAPTATION FOR SPEAKER VERIFICATION |
| 4470 | CROSS-CULTURAL BIAS IN MEL-SCALE REPRESENTATIONS: EVIDENCE AND ALTERNATIVES FROM SPEECH AND MUSIC |
| 10148 | CROSS-DOMAIN CONTRASTIVE LEARNING WITH DYNAMIC THRESHOLD CALIBRATION FOR SOURCE SPEAKER TRACING |
| 15707 | CROSS-DOMAIN LORA FINGERPRINT LOCALIZATION VIA SPATIAL REPRESENTATION FEW-SHOT KNOWLEDGE DISTILLATION |
| 15014 | CROSS-EXAMINER: EVALUATING CONSISTENCY OF LARGE LANGUAGE MODEL-GENERATED EXPLANATIONS |
| 6170 | CROSS-LINGUAL ALZHEIMER’S DISEASE DETECTION WITH MULTIMODAL LLMS VIA SPEECH CUE-AUGMENTED PROMPTING AND INSTRUCTION TUNING |
| 11173 | Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis |
| 14279 | CROSS-LINGUAL INTERLEAVING FOR SPEECH LANGUAGE MODELS |
| 6702 | Cross-Modal Bottleneck Fusion for Noise Robust Audio-Visual Speech Recognition |
| 16682 | CROSS-MODAL GUIDANCE FOR FAST DIFFUSION-BASED COMPUTED TOMOGRAPHY |
| 14318 | CROSS-MODAL KNOWLEDGE DISTILLATION FOR SPEECH LARGE LANGUAGE MODELS |
| 11056 | CROSS-MODAL KNOWLEDGE DISTILLATION FROM VIDEO TO WIFI CSI FOR MULTI-USER HUMAN ACTIVITY RECOGNITION |
| 8178 | CROSS-MODAL POINT CLOUD COMPLETION VIA STRUCTURALLY-AWARE PROXY GUIDANCE |
| 5524 | Cross-scene Hyperspectral Image Classification via Topology-Aware Learning without Source Data |
| 14032 | Cross-View Change Detection via Self-Supervised Contrastive Representation Learning |
| 14701 | CROWDSOURCED DIGITAL TWINS FOR BEAMFORMING OPTIMIZATION IN XR COMMUNICATIONS |
| 6841 | CS3-BENCH: EVALUATING AND ENHANCING SPEECH-TO-SPEECH LLMS FOR MANDARIN-ENGLISH CODE-SWITCHING |
| 17443 | CSFUSION: FLEXIBLE MULTI-MODAL IMAGE FUSION VIA CONTENT-STYLE CROSS MODULATION |
| 10118 | CSGAN-VLP: Swin-Transformer Enhanced GAN and Contrastive Alignment for Robust Cross-Scene Passive Visible Light Positioning |
| 15788 | CSGONET: COLLABORATIVE SELF-SUPERVISED LEARNING WITH GRAPH AND OCCUPANCY RECONSTRUCTION FOR TRAJECTORY PREDICTION |
| 7925 | CSPC: A High-Quality Dataset and Comprehensive Evaluation Metric for Chinese Sentence Paraphrasing |
| 17782 | CTC-DID: CTC-BASED ARABIC DIALECT IDENTIFICATION FOR STREAMING APPLICATIONS |
| 13579 | CTGFILTER: USABILITY-PRESERVING CONTROLLABLE TEXT GENERATION VIA NULL-SPACE PROJECTION |
| 10675 | CTR-LORA: CURVATURE-AWARE AND TRUST-REGION GUIDED LOW-RANK ADAPTATION FOR LARGE LANGUAGE MODELS |
| 11656 | CUE-TS: SOFT COVARIATE PROMPTS AS A UNIVERSAL ENHANCER FOR TIME SERIES FOUNDATION MODELS |
| 5021 | CURRICULUM LEARNING WITH CONTRASTIVE LOSS FOR LIGHTWEIGHT SPEAKER VERIFICATION |
| 14082 | CURVATURE-DRIVEN SYNCHROSQUEEZING TRANSFORM: A FINE-SCALE BIDIRECTIONAL METHOD FOR TIME-FREQUENCY REPRESENTATION |
| 11880 | CURVILINEAR SPECTRAL U-NET: A FRAMEWORK FOR STRUCTURE-AWARE ROAD EXTRACTION FROM VERY HIGH RESOLUTION IMAGERY |
| 16499 | CVAR-AWARE NETWORK SLICING FOR TAIL LATENCY UNDER TIERED DEADLINES |
| 18705 | CVSTIM: MITIGATING OBJECT HALLUCINATION IN MLLMS VIA CO-OCCURRENCE GUIDED VISUAL STIMULATION |
| 12517 | CYLINDERFUSION: SELF-ADAPTIVE CYLINDRICAL 3+1D RADAR-CAMERA FUSION FOR WATERWAY POINT CLOUD SEGMENTATION |
| 9556 | CZSRSSC: CONTINUAL ZERO-SHOT REMOTE SENSING SCENE CLASSIFICATION |
| 13063 | D2AFM: DUAL-DOMAIN ADAPTIVE FUSION MODULE FOR UNDERWATER IMAGE ENHANCEMENT |
| 12753 | D²-DETR: Dual-Sourced Augmentation with Duration-Aware Differential Decoder for Video Temporal Grounding |
| 4764 | D2M: Decoupling to Modulate via Emotion Trajectories for Dynamic Facial Expression Recognition |
| 4363 | D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation from Lead sheet |
| 10130 | DAAGNET: DEPTH-ADAPTIVE ANCHOR GRAPH FOR WEAKLY-SUPERVISED CROWD COUNTING |
| 9361 | DAC: DIFFERENTIABLE ARCHITECTURE CAUSALITY |
| 17911 | DACAS: ADVERSARIAL SPARSE ATTENTION NETWORKS FOR CROSS-MODAL ANOMALY DETECTION IN DISTRIBUTED SYSTEMS |
| 10415 | DADAGAN: AN IMAGE SUPER-RESOLUTION NETWORK WITH PIXEL-WISE ESTIMATION OF DEGRADATION DEGREES |
| 16671 | DAFMR: DUAL-SIDE ATTRIBUTE-AWARE FUSION WITH MIXTURE-OF-EXPERTS AND REGULARIZATION FOR RECOMMENDATION SYSTEMS |
| 4602 | DAIEN-TTS: DISENTANGLED AUDIO INFILLING FOR ENVIRONMENT-AWARE TEXT-TO-SPEECH SYNTHESIS |
| 6185 | DAM: Dual Active Learning with Multimodal Foundation Model for Source-Free Domain Adaptation |
| 5816 | DAME: DURATION-AWARE MATRYOSHKA EMBEDDING FOR DURATION-ROBUST SPEAKER VERIFICATION |
| 14873 | DAMO: A DATA-EFFICIENT MULTIMODAL ORCHESTRATOR FOR TEMPORAL REASONING WITH VIDEO LLMS |
| 2389 | DANCINGMATE: COORDINATED AND SYNCHRONIZED DANCE ACCOMPANIMENT GENERATION |
| 15894 | DAPT: A DUAL-PATH FRAMEWORK FOR MULTILINGUAL MULTI-HOP QUESTION ANSWERING |
| 18874 | DARAS: DYNAMIC AUDIO-ROOM ACOUSTIC SYNTHESIS FOR BLIND ROOM IMPULSE RESPONSE ESTIMATION |
| 10671 | DARC-CLIP: DYNAMIC ADAPTIVE REFINEMENT WITH CROSS-ATTENTION FOR MEME UNDERSTANDING |
| 1415 | DARE: Dual-Aspect Reflective Evolution for Prompt Optimization |
| 3750 | DarkCite: Unveiling Authority Bias as Implicit RAG Jailbreak Attacks |
| 10025 | DARKVRAI: CAPTURE-CONDITION CONDITIONING AND BURST-ORDER SELECTIVE SCAN FOR LOW-LIGHT RAW VIDEO DENOISING |
| 8705 | DARL-CLIP: DENSITY-ADAPTIVE AND REINFORCEMENT FINE-TUNING CLIP FOR CROSS-SCENARIO UAV OBJECT DETECTION |
| 2817 | DART: a Dual-modality Adaptive Representation with divergence Training framework for ZS-CIR |
| 3834 | DART: DIFFERENTIAL ACOUSTIC RANGING FOR CALIBRATION-FREE HEAD TRACKING WITH ULTRASONIC SENSORS |
| 9435 | DASE: MAXIMUM ENTROPY DATA SELECTION FOR BALANCED PRETRAINING CORPORA OF LARGE LANGUAGE MODELS |
| 1029 | Data-Adaptive Proximal Operator: Demonstration on Low-Rank Sparse Subspace Clustering |
| 17047 | DATA-BRIDGE: A MULTI-AGENT SYSTEM FOR CODE-BASED MULTIMODAL SCHEMA ALIGNMENT |
| 14424 | DATA-DRIVEN ALGORITHMS FOR ROBUST OR SELECTIVE CFAR DETECTION IN COLORED GAUSSIAN NOISE |
| 7798 | DATA-DRIVEN CLUSTERING AND MERGING OF ADAPTERS FOR ON-DEVICE LARGE LANGUAGE MODELS |
| 17578 | DATA-DRIVEN GRAPH FILTERS VIA ADAPTIVE SPECTRAL SHAPING |
| 3209 | DATA-DRIVEN REGULARIZATION USING IDLE-STATE MEASUREMENTS FOR IMPROVED VEHICLE NOISE PREDICTION |
| 15169 | Data-Driven Two-Stage IRS-Aided Sumrate Maximization with Inexact Precoding |
| 2231 | Dataset-Driven Channel Masks in Transformers for Multivariate Time Series |
| 11710 | DATKD: DECOUPLED ATTENTION TRANSFER KNOWLEDGE DISTILLATION FOR VISION TRANSFORMERS |
| 8573 | DA-VLM: Data Factory with Minimal Effort Using VLMs |
| 5939 | DBFT-SD: Weakly Supervised Multimodal Detection of Sensitive Audio-Visual Content |
| 5000 | DCFL: DUAL END CONSTRAINT FEDERATED LEARNING WITH AN ADAPTIVE ANALYTIC ANCHOR |
| 16216 | DCINJECT: PERSISTENT BACKDOOR ATTACKS VIA FREQUENCY MANIPULATION IN PERSONAL FEDERATED LEARNING |
| 18059 | DC-MAMBER: A DUAL CHANNEL PREDICTION MODEL BASED ON MAMBA AND LINEAR TRANSFORMER FOR MULTIVARIATE TIME SERIES FORECASTING |
| 2968 | DCR-MUCL: Dual Granularity Consistency Routing Multimodal Unified Contrastive Learning for Rumor Detection Network |
| 12690 | DCSF: ENHANCING CERTIFIED ROBUSTNESS VIA DYNAMIC COST-SENSITIVE AND SELF-SUPERVISION FRAMEWORK |
| 6723 | DDCM: Dual-Domain Collaborative Modeling with Boundary-to-Structure Refinement for Camouflaged Object Detection |
| 17461 | DDPT: DISTILLATION AND DYNAMIC TOWARD BETTER PROMPT TUNING FOR IMPROVING COMPLEX REASONING IN LARGE LANGUAGE MODELS |
| 9362 | DDSA: Dual-Domain Strategic Attack for Spatial-Temporal Efficiency in Adversarial Robustness Testing |
| 13270 | DDSC: DYNAMIC DUAL-SIGNAL CURRICULUM FOR DATA-EFFICIENT ACOUSTIC SCENE CLASSIFICATION UNDER DOMAIN SHIFT |
| 5801 | DDSR-Net: Robust Multimodal Sentiment Analysis via Dynamic Modality Reliability Assessment |
| 2514 | DEAR-SR: A Degradation-Aware Adversarially Robust Framework for Infrared Super-Resolution |
| 4898 | DebateCTI: Enhancing ATT&CK Technique Identification in CTI Reports via a Role-Specialized Multi-Agent Debate |
| 11382 | DEBATING FOR COREFERENCE: A MULTI-AGENT FRAMEWORK FOR CROSS-DOCUMENT EVENT COREFERENCE RESOLUTION |
| 16202 | DEBIASED ADAPTIVE DUAL-VIEW GRAPH LEARNING FOR NEXT POI RECOMMENDATION |
| 10743 | DebiasHSD: Failure-guided Debiasing for Cross-Domain Hate Speech Detection |
| 8153 | DE-BIASING FACIAL AGE ESTIMATION: A DUAL-STAGE DISENTANGLEMENT FRAMEWORK FOR CROSS-RACIAL GENERALIZATION |
| 11523 | DECENTRALIZED ACCELERATED MINIMAX OPTIMIZATION VIA EXACT DIFFUSION |
| 2067 | Decentralized Detection with Many Sensors: Optimality of Exchangeable and Identical Encoding Policies |
| 11407 | DECENTRALIZED LEARNING OF DECISION MODELS FOR CLASSIFICATION WITH DEPENDENT AGENTS |
| 14156 | Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks |
| 12688 | DECENTRALIZED LEARNING WITH DYNAMICALLY REFINED EDGE WEIGHTS: A DATA-DEPENDENT FRAMEWORK |
| 11653 | DECISION FUSEDCONV: EFFICIENT OFFLINE REINFORCEMENT LEARNING VIA FUSED STATE-REWARD ENCODING AND HYBRID TEMPORAL CONVOLUTION |
| 3801 | Deco3D: Decoupled Semantic and Geometric Learning for Sparse Supervision in 3D Object Detection |
| 12141 | DECODER-ONLY CONFORMER WITH MODALITY-AWARE SPARSE MIXTURES OF EXPERTS FOR ASR |
| 4028 | DECOMPOSED SEASONAL-TREND NETWORK WITH ROTARY ATTENTION FOR TIME SERIES FORECASTING |
| 2737 | Decomposing Multilingual Representations: How Scale, Architecture, and Data Shape Functional Specialization |
| 5045 | DECONFUSION CLIP TOWARDS ROBUST OUT-OF-DISTRIBUTION DETECTION |
| 12675 | DECORRELATION-ENHANCED MULTIBAND SUBBAND ADAPTIVE FILTERING FOR RIR TRACKING IN SOUND FIELD CONTROL |
| 10507 | Decouple and Match: Frequency-Decoupled Local Matching for Signature Verification |
| 15473 | Decoupled Reconstruction for Low-Dose CT: From SR-Perceptual Recovering to Frequency-Contrastive Fidelity Rebuilding |
| 3134 | DecoupleGS: Physical-aware Gaussian Decoupling for High-Quality 3D Scene Lighting Enhancement |
| 3982 | DECOUPLING MOTION AND TEXTURE: A HYBRID RECURRENT NETWORK FOR VIDEO QUALITY ENHANCEMENT |
| 4273 | DECOUPLING ORTHOGONAL LIP FEATURES AGAINST GENERATIVE IMPOSTERS |
| 8059 | DECOUPLING RAW-BASED LOW-LIGHT ENHANCEMENT VIA A FREQUENCY-AWARE TRANSFORMER |
| 11549 | Deep Co-occurrence Matrix Network for Classification of Plant Fiber SEM Images |
| 12025 | DEEP DUBBING: END-TO-END AUTO-AUDIOBOOK SYSTEM WITH TEXT-TO-TIMBRE AND CONTEXT-AWARE INSTRUCT-TTS |
| 6561 | DEEP FUZZY CLUSTERING WITH ANCHOR GRAPH PRESERVATION AND MEMBERSHIP ALIGNMENT |
| 1303 | DEEP IMAGE PRIOR WITH L0 GRADIENT REGULARIZER FOR IMAGE SMOOTHING |
| 5591 | DEEP LARGE-MARGIN LP-SVDD WITH CNN FEATURE LEARNING FOR NOVELTY DETECTION |
| 14072 | DEEP LEARNING BASED ZERO LATENCY AUTOMATIC MUSIC MIXING FOR LIVE PERFORMANCES |
| 14663 | DEEP LEARNING-BASED JOINT OPTIMIZATION OF ADAPTIVE FEEDBACK CANCELLATION AND RESIDUAL FEEDBACK SUPPRESSION FOR HEARING AIDS |
| 3927 | DEEP LOCAL FIELD CONSISTENCY FOR NON-RIGID POINT CLOUD REGISTRATION |
| 3189 | Deep Lossless Point Cloud Attribute Compression via EED Prediction |
| 18902 | DEEP PHYSICALLY PARAMETERIZED ALL-IN-ONE NETWORK FOR LENS-FREE MICROSCOPY IMAGING |
| 16543 | Deep Reinforcement Learning for Dynamic Sensing and Communications |
| 3656 | Deep Spatial Clue Informed Ambisonic Encoding for irregular microphone arrays |
| 8009 | Deep Tensor Completion for Fast Direct Position Determination |
| 13852 | DEEP TPC: TEMPORAL-PRIOR CONDITIONING FOR TIME SERIES FORECASTING |
| 11130 | DEEP UNFOLDED SUBSPACE-BASED DOA RECOVERY FROM SPARSE ARRAYS |
| 11637 | DEEP UNFOLDED SUPERIORIZED POCS FOR ROBUST JOINT TRANSMISSION UNDER PHASE MISALIGNMENT |
| 10852 | DEEP VIDEO FRAME INTERPOLATION DETECTION VIA EVENT-GUIDED TEMPORAL ANALYSIS AND HIGH-FREQUENCY ARTIFACTS |
| 13344 | DEEPAQ: A PERCEPTUAL AUDIO QUALITY METRIC BASED ON FOUNDATIONAL MODELS AND WEAKLY SUPERVISED LEARNING |
| 5781 | Deepfake Detection via Data-Level Multi-Stream Assessment |
| 18328 | Deepfake-HMDE: Hierarchical Mixture of Deepfake Experts for Deepfake Detection |
| 16359 | Deep–Shallow Mixed Gaussian Processes for Efficient and Robust Training |
| 14121 | DEEPTRAVERSE: AN ALGORITHM-INSPIRED DESIGN PARADIGM FOR STRUCTURED AND INTERPRETABLE VISION BACKBONES |
| 13410 | Defending 3D Point Clouds with Frequency-Guided Diffusion model |
| 11632 | DEFENSEMEL: ENHANCING ADVERSARIAL ROBUSTNESS OF MULTIMODAL ENTITY LINKING WITH MULTIMODAL LARGE LANGUAGE MODELS |
| 3582 | DEFINE: A FINE-GRAINED ANNOTATED AND HIERARCHICALLY STRUCTURED DATASET FOR LONG-FORM ARTICLE GENERATION |
| 12302 | DEGRADATION DESCRIPTION PROMPTING FOR UNDERWATER IMAGE RESTORATION |
| 13387 | DELAY AND RANDOM SCATTERING ESTIMATION WITH A BAND-LIMITED SIGNAL: UNCONDITIONAL CRB AND MLE |
| 13335 | Delay Embedding For Differential Graph Learning From Dependent Data |
| 3411 | DELNET: CONTINUOUS ALL-IN-ONE WEATHER REMOVAL VIA DYNAMIC EXPERT LIBRARY |
| 5084 | DeMoFL: Efficient and Effective Decentralized Model-Heterogeneous Federated Learning |
| 16225 | DEMONET: DEGRADATION-AWARE MODALITY INTERACTION FOR MULTI-MODAL OBJECT DETECTION IN CAR CABIN |
| 1368 | DEMO-POSE: DEPTH-MONOCULAR MODALITY FUSION FOR OBJECT POSE ESTIMATION |
| 13647 | DemoReranker: Enhancing the In-context Learning Capability of Multi-modal Large Models via Demonstration Reranking |
| 3169 | Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning |
| 11508 | DENOISING DIFFUSION MODEL FOR DOA ESTIMATION |
| 12146 | DENOISING OF STOCHASTIC RAY TRACING ROOM IMPULSE RESPONSES |
| 18914 | DENOISING PIECEWISE CONSTANT NANOPORE SIGNALS |
| 4694 | DEOPT: SYNERGIZING LARGE LANGUAGE MODELS AND DIFFERENTIAL EVOLUTION FOR JOIN ORDER OPTIMIZATION |
| 9408 | DEPMPK: MULTI-PERSPECTIVE KNOWLEDGE FUSION FOR MULTIMODAL DEPRESSION DETECTION |
| 13211 | Depth3DLane: Monocular 3D Lane Detection via Depth Prior Distillation |
| 1156 | DepthCLIP3D: A UNIFIED APPROACH FOR 3D VISUAL UNDERSTANDING WITH DEPTH |
| 15419 | DEPTHFUSION: DEPTH-GUIDED INFRARED AND VISIBLE IMAGE FUSION FOR ENHANCED DOWNSTREAM TASKS |
| 16206 | Depth-Guided Metric-Aware Temporal Consistency for Monocular Video Human Mesh Recovery |
| 11418 | DEPTH-GUIDED RELIGHTING: RESOLVING LIGHTING INCONSISTENCY FOR SEAMLESS BACKGROUND REPLACEMENT |
| 5105 | DEPTHSHIELD: ROBUST DEPTH ESTIMATION ON TRANSPARENT OR MIRROR SURFACES |
| 14699 | DepthTalk: Few-Shot Talking Head Generation with Depth-Aware 3D Gaussian Field Motion |
| 10646 | DERIVING MOMENTS IN THE AGE OF GOSSIP PROCESS FROM PERCOLATION |
| 9788 | DES: A MULTI-STAGE FRAMEWORK FOR ACCURATE FABRIC PRINTED PATTERN SEGMENTATION |
| 3807 | DESIGN OF DIFFERENTIAL MICROPHONE ARRAYS VIA A 3D SPATIAL DIFFERENCE OPERATOR |
| 12208 | DetailCLIP: Injecting Image Details into CLIP's Feature Space |
| 6231 | DETECTING AND ATTRIBUTING SYNTHETIC SPANISH SPEECH: THE HISPASPOOF DATASET |
| 2938 | DETECTING OSCILLATING SINGULARITIES WITH THE WEAK SCALING EXPONENT |
| 2677 | Detecting Trojaned Inputs at Runtime: Activation-Distribution Defenses for Untrusted CNNs |
| 18644 | Detection and Angle Estimation in Colocated MIMO Radar in the Presence of Grating Lobes |
| 4478 | DETECTWILD: IN-THE-WILD AI-GENERATED TEXT DETECTION BENCHMARK |
| 10404 | DFA-SNN: Dual-Frequency Attention Module for Spiking Neural Networks |
| 18463 | DFATran: Beyond Static Features for Dynamic Transferability Estimation |
| 11138 | DFF-CGT: Frequency-Domain Feature Fusion with Class-Guided Thresholding for UniSSDA |
| 2274 | DFFNET: COMBINING SIMILAR AND DIFFERENT DUAL FEATURE FLOWS TO ACHIEVE MULTIPLE WEATHER REMOVAL |
| 4567 | DFGA-Net: A Dual-Frequency Guided Attention Network for Multivariate Time Series Prediction |
| 1535 | DFL-ALLC: ADAPTIVE LOCAL LEARNING CONTROL FOR DECENTRALIZED FEDERATED LEARNING IN HETEROGENEOUS VEHICULAR NETWORKS |
| 11367 | DFLF: A SCALABLE DECENTRALIZED FEDERATED LEARNING FRAMEWORK BASED ON PYTORCH |
| 15728 | DFMAD: DATA-FREE BACKDOOR DEFENSE FOR FEDERATED LEARNING VIA MULTI-TEACHER ADVERSARIAL DISTILLATION |
| 10914 | DGCS: Depth-Guided Continual Self-learning for Infrared and Visible Image Fusion |
| 4732 | DGER: DIFFUSION-GUIDED EFFICIENT RESTORATION FOR UNDERWATER IMAGES |
| 10932 | DGF-Net: Underwater Image Enhancement via Depth Priors and Frequency-Domain Modeling |
| 16401 | DGSDNET: DUAL-GRAPH SPECTRAL DIFFUSION NETWORK FOR INCOMPLETE MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS |
| 17031 | DHEval: A Dynamic Hallucination Evaluation Protocol Robust to Data Contamination |
| 15011 | DIACDM: COGNITIVE DIAGNOSIS IN TEACHER-STUDENT DIALOGUES USING THE INITIATION-RESPONSE-EVALUATION FRAMEWORK |
| 13553 | DIAGNOSE-REFLECTIVE PLANNING: FAITHFUL KG REASONING VIA LLM-GUIDED MCTS WITH STRATEGIC SELF-CORRECTION |
| 10595 | DIAL: DATABASE-INFORMED INTERACTIVE MULTI-AGENT SYSTEM LOOP FOR PERSONALIZED IMAGE GENERATION |
| 11343 | DIFF: DIFFUSION MODEL AIDED FEATURE FUSION NETWORK FOR E2E LONG-TAILED VISUAL RECOGNITION |
| 16087 | DIFFACLFSMD: DIFFUSION-AUGMENTED CONTRASTIVE LEARNING FOR FEW-SHOT MALWARE DETECTION |
| 6287 | DiffAntiSeq: A Controllable Diffusion Model for Efficient Antibody Library Design |
| 9491 | DIFFEMOTALK: AUDIO-DRIVEN FACIAL ANIMATION WITH FINE-GRAINED EMOTION CONTROL VIA DIFFUSION MODELS |
| 6240 | DIFFERENCE COARRAY OF MULTI-FREQUENCY SPARSE RATIONAL ARRAYS |
| 17434 | Differentiable Grouped Feedback Delay Networks for Learning Direction and Position-Dependent Late Reverberation |
| 14649 | DIFFERENTIABLE META-OPTIMIZATION FOR FEDERATED NEURAL ARCHITECTURE SEARCH |
| 13774 | DIFFERENTIABLE PULSETABLE SYNTHESIS FOR WIND INSTRUMENT MODELING |
| 6628 | Differentiable Resizing: Resolution Layers |
| 14119 | Differential Privacy of Network Parameters from a System Identification Perspective |
| 17416 | DIFFERENTIALLY PRIVATE CLUSTERED FEDERATED LEARNING WITH PRIVACY-PRESERVING INITIALIZATION AND NORMALITY-DRIVEN AGGREGATION |
| 13873 | DIFFERENTIALLY PRIVATE DECENTRALIZED CONSTRAINED LEARNING WITH DUAL AVERAGING |
| 6184 | DIFFERENTIALLY PRIVATE WEIGHTED K-SELECTION AT SCALE |
| 3867 | Diff-EvINR: event-to-video reconstruction using diffusion models and implicit neural representations |
| 11253 | DIFFFACE-EDIT: A DIFFUSION-BASED FACIAL DATASET FOR FORGERY-SEMANTIC DRIVEN DEEPFAKE DETECTION ANALYSIS |
| 14636 | DIFF-IML: TOWARDS THE DIFFUSION-BASED REAL-WORLD IMAGE MANIPULATION LOCALIZATION |
| 10555 | DIFFNATOR: GENERATING STRUCTURED EXPLANATIONS OF TIME-SERIES DIFFERENCES |
| 9958 | DiffQ: UNIFIED PARAMETER INITIALIZATION FOR VARIATIONAL QUANTUM ALGORITHMS VIA DIFFUSION MODELS |
| 10077 | DIFFRIM: A DIFFUSION-DRIVEN MODEL FOR HIGH EFFICIENCY RADAR INTERFERENCE MITIGATION |
| 17156 | Diffusion Algorithm for Metalens Optical Aberration Correction |
| 9629 | Diffusion Contrastive Learning for Robust Image Classification |
| 14192 | Diffusion Denoiser Achievable Analysis for Finite Blocklength Unsourced Random Access |
| 16050 | DIFFUSION POSTERIOR SAMPLING FOR SLITLESS SPECTRAL IMAGING |
| 2547 | DIFFUSION RESIDUAL MODELING FOR LONG-TERM TIME SERIES FORECASTING |
| 10037 | Diffusion Stochastic Learning over Multi-Team Network Games |
| 14117 | DIFFUSION TIMBRE TRANSFER VIA MUTUAL INFORMATION GUIDED INPAINTING |
| 12389 | Diffusion-aided Extreme Video Compression with Lightweight Semantics Guidance |
| 11597 | DIFFUSION-BASED NATURAL ADVERSARIAL PERTURBATIONS TOWARDS SEGMENT ANYTHING MODEL |
| 11797 | Diffusion-Based Scene Text Image Super-Resolution with Visual Style and Semantic Guidance |
| 5026 | DIFFUSION-BASED UNSUPERVISED AUDIO-VISUAL SPEECH SEPARATION IN NOISY ENVIRONMENTS WITH NOISE PRIOR |
| 16394 | DIFFUSIONCOM: STRUCTURE-AWARE MULTIMODAL DIFFUSION MODEL FOR MULTIMODAL KNOWLEDGE GRAPH COMPLETION |
| 18186 | DIFFUSION-DRIVEN PROXIMAL POSTERIOR SAMPLING FOR SYNTHETIC APERTURE RADAR IMAGING |
| 13367 | DIFFUSION-LINK: DIFFUSION PROBABILISTIC MODEL FOR BRIDGING THE AUDIO-TEXT MODALITY GAP |
| 14081 | DiffVAGS: Visual Alignment for High-Fidelity 3D Gaussian Splatting Generation |
| 6181 | Diff-VS: Efficient Audio-Aware Diffusion U-Net for Vocals Separation |
| 5363 | DIFTrack: Vision-Language Tracking with Deep Interaction Fusion |
| 2652 | DiGA-Fuse: Depth-Guided Geometry-Aware Infrared and Visible Image Fusion |
| 6482 | DIGITAL HUMAN-ASSISTED SMART CONTRACT VULNERABILITY DETECTION UNDER LIMITED SAMPLE CONSTRAINTS |
| 7037 | DIGRAPH SIGNAL PROCESSING VIA POLAR DECOMPOSITION |
| 2599 | Dilated Array Scheme Based on asymmetrically defined cumulants with Moving Platform |
| 9093 | Dimensionality Reduction for Beamforming by Change of Variable |
| 18012 | DIMO: DUAL-STRATEGY LEARNING FOR AMBIGUOUS SAMPLES IN CLASS-IMBALANCED FACIAL EXPRESSION RECOGNITION |
| 7082 | DINN: KEY-ACTIVATED MODEL STEGANOGRAPHY WITH DYNAMIC SPARSE INVERTIBLE NEURAL NETWORKS |
| 4475 | DiPaS-Bridge: Towards Paired-Guided Diverse Generation for Urban Layouts |
| 12670 | DIRA: Deep High-Rank Adaptation of Pre-trained Language Models |
| 14388 | DIRAVIG: DIFFERENTIABLE REGION ASSIGNMENT VISION GRAPH NETWORKS |
| 2809 | DIRCR: Dual-Inference Rule-Contrastive Reasoning for Solving RAVENs |
| 6386 | DIRECT POSITION DETERMINATION METHOD BASED ON SPARSE SPECTRUM DATA |
| 18039 | Direct Preference Optimization for Speech Autoregressive Diffusion Models |
| 10883 | DIRECT RICIAN-DOMAIN PROCESSING FOR NOISE-AWARE MRI DENOISING AND MICROSTRUCTURE PRESERVATION |
| 9461 | Direct Simultaneous Translation Activation for Large Audio-Language Models |
| 13866 | DIRECT TRANSFER OF PROSODY IN SPEECH-TO-SPEECH TRANSLATION USING DISENTANGLED SPEECH TOKENS |
| 11003 | Directed Hypergraph Framelet Neural Network |
| 12793 | DIRECTION-AWARE CROSS-MODAL FUSION NETWORK WITH HIERARCHICAL FEATURE RECONSTRUCTION FOR RGB-T SALIENT OBJECT DETECTION |
| 17536 | Direction-PointNet: A Spatiotemporal Anisotropy Network for Human Action Recognition |
| 17775 | Directly Trained Spiking Neural Networks with Adaptive Phase Coding |
| 17886 | Disabling Reasoning: Backdoor Construction in Large Reasoning Models via Knowledge Editing |
| 10782 | DISASTER-AFFECTED AREA EXTRACTION METHOD THROUGH PIXEL DIFFERENCE CONVOLUTION AND FREQUENCY-DOMAIN ENHANCEMENT |
| 5833 | DISCONTSE: SINGLE-STEP DIFFUSION SPEECH ENHANCEMENT BASED ON JOINT DISCRETE AND CONTINUOUS EMBEDDINGS |
| 17550 | DISCREPANCY-AWARE DISENTANGLED CONTRASTIVE LEARNING FOR MULTIMODAL RUMOR DETECTION |
| 13946 | DISCRETE DIFFUSION FOR GENERATIVE MODELING OF TEXT-ALIGNED SPEECH TOKENS |
| 11620 | DISCRETE VISION TOKENIZATION FOR VISION-LANGUAGE ALIGNMENT IN AUTONOMOUS DRIVING |
| 1143 | DISCRETE-CONTINUOUS FUSION WITH ADAPTIVE HIERARCHICAL FEATURES FOR AUDIO DEEPFAKE DETECTION |
| 5590 | Discrete-Periodic Ambiguity Function of Random Communication Signals |
| 3771 | DISCRIMINANT LEARNING-BASED COLORSPACE FOR BLADE SEGMENTATION |
| 15561 | Disentangled Authenticity Representation for Partially Deepfake Audio Localization |
| 15866 | Disentangled Signals, Dynamic Prompts: A Meta-Network Framework for Robust Task-Oriented Dialogue |
| 2387 | DISENTANGLED STRUCTURE PRIOR PROPAGATION FOR GUIDED DEPTH SUPER-RESOLUTION |
| 3583 | Disentangling Contextual and Background Signals for Social Diffusion Prediction |
| 13125 | DISPATCH: DISTILLING SELECTIVE PATCHES FOR SPEECH ENHANCEMENT |
| 12183 | DISSECTING PERFORMANCE DEGRADATION IN AUDIO SOURCE SEPARATION UNDER SAMPLING FREQUENCY MISMATCH |
| 1967 | DISSR: DISENTANGLING SPEECH REPRESENTATION FOR DEGRADATION-PRIOR GUIDED CROSS-DOMAIN SPEECH RESTORATION |
| 9766 | Distillation based Layer Dropping (DLD): Effective end-to-end framework for dynamic speech networks |
| 16212 | DISTILLED FEW-STEP SAMPLERS FOR BAYESIAN FLOW NETWORKS |
| 4633 | DISTILLING ATTENTION KNOWLEDGE FOR SPEAKER VERIFICATION |
| 13401 | DISTILLING SYNERGISTIC KNOWLEDGE FROM A FUSION TEACHER FOR SAR OBJECT DETECTION |
| 9809 | Distilling Time-series Foundation Models for Efficient Forecasting |
| 6159 | DISTILMOS: LAYER-WISE SELF-DISTILLATION FOR SELF-SUPERVISED LEARNING MODEL-BASED MOS PREDICTION |
| 11158 | DISTRACTION-FREE OUTDOOR 3D GAUSSIAN SPLATTING WITH ENHANCED DEPTH PROPAGATION |
| 4217 | DISTRIBUTED ASSOCIATIVE MEMORY VIA ONLINE CONVEX OPTIMIZATION |
| 11683 | DISTRIBUTED MULTICHANNEL ACTIVE NOISE CONTROL WITH ASYNCHRONOUS COMMUNICATION |
| 5689 | DISTRIBUTED OPTIMISATION VIA THE GENERALISED PRIMAL-DUAL METHOD OF MULTIPLIERS UNDER UNRELIABLE AND QUANTISED COMMUNICATION |
| 13770 | DISTRIBUTIONAL PPO FOR STABLE POLICY GRADIENT OPTIMIZATION |
| 9029 | DISTRIBUTION-AWARE DATA CURATION FOR SEMANTIC SEGMENTATION VIA MIXTURE OF VMFS |
| 10045 | DISTRIBUTION-AWARE MOBILITY-ASSISTED DECENTRALIZED FEDERATED LEARNING |
| 4668 | Distribution-Aware Neural Additive Models: Robust Interpretable Deep Learning with Feature Selection |
| 16647 | DISTRICACHE: DISTRIBUTED PARALLELISM FOR ACCELERATING DIFFUSION MODELS |
| 10276 | DITHERED 1-BIT QUANTIZATION AND SPARSE RECONSTRUCTION FOR NEAR-FIELD 3D MILLIMETER-WAVE IMAGING |
| 14863 | DITSE: HIGH-FIDELITY GENERATIVE SPEECH ENHANCEMENT VIA LATENT DIFFUSION TRANSFORMERS |
| 12024 | DITSINGER: SCALING SINGING VOICE SYNTHESIS WITH DIFFUSION TRANSFORMER AND IMPLICIT ALIGNMENT |
| 18031 | DIVERSE AND FEW-STEP AUDIO CAPTIONING VIA FLOW MATCHING |
| 6425 | DIVERSITY IS ALL YOU NEED: SELF-SUPERVISED HYPERGRAPH LEARNING FOR MITIGATING POPULARITY BIAS IN CONVERSATIONAL RECOMMENDER SYSTEM |
| 4039 | DJ-NORM: A DECOMPOSITION-BASED JOINT NORMALIZATION FRAMEWORK FOR NON-STATIONARY TIME SERIES FORECASTING |
| 4676 | DKFMA: A MULTI-AGENT FRAMEWORK FOR DUAL-SOURCE KNOWLEDGE FUSION IN IT INFRASTRUCTURE OPERATIONS AND MAINTENANCE |
| 13599 | DLCRR: DIFFERENTIAL LEARNING AND CAUSAL REPRESENTATION RESTORATION MODEL FOR EVENT CAUSALITY IDENTIFICATION |
| 4335 | DLMDC: A Method for Controllable Text Generation |
| 12795 | D-LoRA: A Dual Low-Rank Adaptation Framework for Cost-Efficient Personalized Federated Learning |
| 5715 | DMM-JA: A DYNAMIC MULTIMODAL FUSION AND MULTI-SCALE MODELING FRAMEWORK WITH JUMP-AWARENESS FOR INDUSTRIAL EQUIPMENT RUL PREDICTION |
| 1725 | DMP-TTS: DISENTANGLED MULTIMODAL PROMPTING FOR CONTROLLABLE TEXT-TO-SPEECH WITH CHAINED GUIDANCE |
| 5989 | DMS-GFViT: DYNAMIC MULTI-SCALE VISION TRANSFORMER WITH INFUSED GATED FUSION FOR HANDWRITTEN TEXT RECOGNITION |
| 13653 | DMTC: A COLLABORATIVE DUAL MMWAVE RADAR SYSTEM FOR SMART SPACES |
| 13155 | DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection |
| 6665 | DNLMark: Dual Noise Layer-based Robust Watermarking Against Image Editing |
| 14411 | DNN-BASED ONLINE SOURCE COUNTING BASED ON SPATIAL GENERALIZED MAGNITUDE SQUARED COHERENCE |
| 15990 | DNS: DATA-DRIVEN NONLINEAR SMOOTHER FOR COMPLEX MODEL-FREE PROCESS |
| 14007 | Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs |
| 10100 | DO FOUNDATIONAL AUDIO ENCODERS UNDERSTAND MUSIC STRUCTURE? |
| 1609 | Do Multi-modal LLMs possess Compositional Zero-Shot Recognition Capabilities? |
| 11921 | DO SPEECH LLMS LEARN CROSSMODAL EMBEDDING SPACES? |
| 13501 | Do We Need EMA for Diffusion-Based Speech Enhancement? Toward a Magnitude-Preserving Network Architecture |
| 2215 | DO WE REALLY NEED SELF-ATTENTION FOR STREAMING AUTOMATIC SPEECH RECOGNITION? |
| 1003 | Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems |
| 4282 | DOA ESTIMATION FOR MM-WAVE FMCW RADAR WITH TWO RECEIVING ANTENNAS |
| 2202 | DOCKPOSE: UNDERWATER DOCK POSE ESTIMATION USING ADAPTIVE N-GRAM CONTEXT AND RECONSTRUCTION-DRIVEN LEARNING |
| 4103 | DocLayout: Elevating the Role of Complex Layout Understanding in Document Visual Question Answering |
| 6520 | DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue |
| 11930 | DOES HIGH-FREQUENCY MATTER? A REMOTE SENSING CHANGE DETECTION NETWORK WITH HIGH-FREQUENCY SELECTION AND HIGH-ORDER RECURSION |
| 16886 | DOES THE PRE-TRAINING OF AN EMBEDDING INFLUENCE ITS ENCODING OF AGE? |
| 13882 | DOMA: LEVERAGING DIFFUSION LANGUAGE MODELS WITH ADAPTIVE PRIOR FOR INTENT CLASSIFICATION AND SLOT FILLING |
| 19128 | DOMAIN ADAPTATION OF FEW-SHOT BIOACOUSTIC EVENT DETECTION IN DIFFERENT ENVIRONMENTS |
| 9502 | DOMAIN DISTILLATION WITH TRANSFORMER FOR UNSUPERVISED DOMAIN ADAPTATION |
| 14671 | Domain Generalization via Distilling from Domain-Deconfused CLIP Features |
| 10402 | DOMAIN PARTITIONING MEETS PARAMETER-EFFICIENT FINE-TUNING: A NOVEL METHOD FOR IMPROVED LANGUAGE-QUERIED AUDIO SOURCE SEPARATION |
| 6545 | DOMAIN-ADAPTIVE MODEL MERGING ACROSS DISCONNECTED MODES |
| 17153 | DOMAIN-AWARE SCHEDULING FOR ASR FINE-TUNING |
| 15555 | DOMAIN-GENERALIZABLE RELATION-AWARE KNOWLEDGE TRACING FOR COLD-START EDUCATION SYSTEM |
| 9785 | Domain-Invariant Representation Learning of Bird Sounds |
| 14808 | Domination Strategies for Free-Riding in Cross-Silo FL-based Caching |
| 7821 | DOMINO: DOMINANT PATH-BASED COMPENSATION FOR HARDWARE IMPAIRMENTS IN MODERN WIFI SENSING |
| 9898 | DOPPLER RADIANCE FIELD-GUIDED ANTENNA SELECTION FOR IMPROVED GENERALIZATION IN MULTI-ANTENNA WI-FI-BASED HUMAN ACTIVITY RECOGNITION |
| 3626 | Doppler-Based Pseudo-Reciprocity in FDD for LEO MU-MIMO |
| 10832 | DPANet: Dual Pyramid Attention Network for Multivariate Time Series Forecasting |
| 8348 | DP-DEGAUSS: DYNAMIC PROBABILISTIC GAUSSIAN DECOMPOSITION FOR EGOCENTRIC 4D SCENE RECONSTRUCTION |
| 5846 | DPFAN: DUAL-PATH FEATURE-ADAPTIVE NETWORK FOR KPI ANOMALY DETECTION |
| 17931 | DPI: EXPLOITING PARAMETER HETEROGENEITY FOR INTERFERENCE-FREE FINE-TUNING |
| 14093 | DP-LAC: LIGHTWEIGHT ADAPTIVE CLIPPING FOR DIFFERENTIALLY PRIVATE FEDERATED FINE-TUNING OF LANGUAGE MODELS |
| 15542 | DPMM-CFL: CLUSTERED FEDERATED LEARNING VIA DIRICHLET PROCESS MIXTURE MODEL NONPARAMETRIC CLUSTERING |
| 10106 | DPO-REGULARIZED REGRESSION FOR AGE PREDICTION |
| 11821 | DQUDF: DEFLATING QUADRATIC BEHAVIOR IN UNSIGNED DISTANCE FUNCTIONS FOR HIGH-FIDELITY SURFACE RECONSTRUCTION |
| 17733 | DR.Roleplay: Role-play LLM with Direct Preference Optimization and Retrieval-Augmented Generation |
| 9488 | DRAG WITHIN PRIOR DISTRIBUTION: TEXT-CONDITIONED POINT-BASED IMAGE EDITING WITHIN DISTRIBUTION CONSTRAINTS |
| 2721 | DRAWMARK: DEFEATING REGENERATION ATTACKS BY EMBEDDING WATERMARK INTO PREDICTED NOISE OF DIFFUSION MODELS |
| 15903 | DREAM: DUAL-PERSPECTIVE REASONING AND ATTRIBUTION-BASED REFINEMENT FOR CONVERSATIONAL QUERY REWRITING |
| 12271 | DreamFragment: Instance-Aware Text-to-3D Generation for Compositional Multi-Object Scenes with Complex Interactions |
| 3133 | DREAMVAR: TAMING REINFORCED VISUAL AUTOREGRESSIVE MODEL FOR HIGH-FIDELITY SUBJECT-DRIVEN IMAGE GENERATION |
| 9562 | DRIVINGSCENE: A MULTI-TASK ONLINE FEED-FORWARD 3D GAUSSIAN SPLATTING METHOD FOR DYNAMIC DRIVING SCENES |
| 15689 | DR-Mark: Enhancing Printed-Camera Watermarking Robustness via Noise Decomposition and Dichromatic Reflection Model |
| 8210 | DRMTST: Dual Retention-Enhanced Transformer with Multiscale and Multivariate Mixing for Time Series Forecasting |
| 13776 | DSA: DIRECTION AND SIGN ALIGNMENT FOR CONTRIBUTION EVALUATION IN FEDERATED LEARNING |
| 3059 | DSFR-NET: DISTRIBUTION GUIDED NIGHTTIME IMAGE SCATTERING FLARE REMOVAL |
| 14930 | DSG: DUAL-SEMANTIC GUIDANCE FROM LLM TO TOKEN DISTILLATION FOR FEW-SHOT INCREMENTAL LEARNING |
| 3942 | DSGBENCH: A DIVERSE STRATEGIC GAME BENCHMARK FOR EVALUATING LLM-BASED AGENTS IN COMPLEX DECISION-MAKING ENVIRONMENTS |
| 3455 | DSNET: DUAL-STREAM HARMONIZATION NETWORK FOR IMAGE ENHANCEMENT |
| 8782 | DSPAST: DISENTANGLED REPRESENTATIONS FOR SPATIAL AUDIO REASONING WITH LARGE LANGUAGE MODELS |
| 1662 | DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning |
| 11197 | DSPFusion: DEPTH AND SEMANTIC PRIOR GUIDED MULTI-FOCUS IMAGE FUSION WITH VISION FOUNDATION MODELS |
| 4562 | DSRMS-TransUNet: A Decentralized Non-Shifted TransUNet for Shallow Water Acoustic Source Range Estimation |
| 9444 | DSR-REC: ENHANCING GENERATIVE RECOMMENDATION THROUGH DYNAMIC EXPERT SELECTION AND SEMANTIC ID REDIRECTION |
| 17701 | DSSR: DECOUPLING SALIENT AND SUBTLE REPRESENTATIONS UNDER MISSING MODALITIES FOR MULTIMODAL EMOTION RECOGNITION |
| 16545 | DSV-CTGS: Dynamic Sparse-view CT Reconstruction based on Gaussian Splatting and Prior Transfer |
| 10605 | DSVM-UNet: Enhancing VM-UNet with Dual Self-distillation for Medical Image Segmentation |
| 2609 | DSWP: A DUAL-STAGE WATERMARKING PARTITIONING FRAMEWORK FOR EFFICIENT AND ROBUST MULTI-BIT WATERMARKING IN LARGE LANGUAGE MODELS |
| 1648 | DTA-PDVC: DYNAMIC TEMPORAL ANCHOR BOXES FOR PARALLEL DENSE VIDEO CAPTIONING |
| 5072 | DTOPAGENT: A MULTI-AGENT FRAMEWORK FOR DYNAMIC TOP-K CHUNK RETRIEVAL IN RAG PIPELINE |
| 4641 | DTPE: DOCUMENT TREE PARSING FOR EFFICIENT DOCUMENT-LEVEL RELATION EXTRACTION WITH LLM-BASED DATA REFINEMENT |
| 3078 | DTR4CAT: Dual-Threshold Retrieval with Ability Gap Upper Bound for Computerized Adaptive Testing |
| 8271 | DTST: Dual-Transformer for Multivariate Time Series Forecasting |
| 12754 | DUAL CONTRASTIVE DOCUMENT CLUSTERING WITH MULTI-REPRESENTATION |
| 6974 | Dual Contrastive Learning for Semi-supervised Domain Adaptation in Bi-modal Depression Recognition |
| 3467 | DUAL CORRELATION ADAPTIVE HIERARCHICAL SPATIO-TEMPORAL TRANSFORMER FOR STOCK PRICE FORECASTING |
| 12167 | Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting |
| 6487 | DUALAST: AST-GUIDED EXEMPLAR RETRIEVAL FOR IN-CONTEXT LEARNING IN MULTI-STEP REASONING |
| 14138 | DUAL-BRANCH FEATURE-FUSED AND MULTI-SEMANTIC ALIGNED HASHING FOR SUPERVISED CROSS-MODAL RETRIEVAL |
| 1270 | DUAL-BRANCH SPATIAL-LIGHTING NETWORK FOR PHOTOMETRIC STEREO |
| 4704 | DUAL-BRANCH SPIRAL INTERSECT NETWORK FOR MULTIMODAL SENTIMENT ANALYSIS |
| 6049 | DUAL-CHANNEL PERSONALIZED FEDERATED BUNDLE RECOMMENDATION |
| 17667 | Dual-Criterion Sample Selection for Noisy Labels: Integrating Neighborhood Prediction Divergence and Loss Values |
| 11786 | DUAL-DOMAIN 3D MESH WATERMARKING WITH ADAPTIVE VERTEX GROUPING |
| 11306 | DUAL-DOMAIN FEATURE MODULATION FOR LIGHTWEIGHT IMAGE SUPER-RESOLUTION |
| 5898 | DUAL-DRIVE: A HIERARCHICAL FUSION FRAMEWORK FOR DUAL-MODEL SAFETY-ENHANCED AUTONOMOUS DRIVING |
| 12960 | DUALEXPERTNET: DISPARITY-AWARE SEMANTIC-DETAIL COMPLEMENTARITY FOR CAMOUFLAGED OBJECT DETECTION |
| 9532 | Dual-Geometry Prior Frequency Nonlinear Graph Convolutional Network For Human Action Recognition |
| 3804 | DUAL-GRAINED ROUTING GUIDED MULTI-LORA EXPERTS FOR MULTILINGUAL LOW-RESOURCE SPEECH RECOGNITION |
| 14129 | Dual-Graph: Protocol Interaction-aware Flow Representation for Accurate Unidirectional Encrypted Traffic Classification |
| 16872 | DUALGUARD: TWO-STAGE ALIGNMENT PRESERVATION FOR SAFE PEFT |
| 11825 | DUAL-GUIDED GENERATIVE FRAME INTERPOLATION |
| 14854 | DUAL-MODEL INFORMATION-BASED CSI RECONSTRUCTION IN HYBRID BEAMFORMING MIMO-OFDM SYSTEMS |
| 9628 | DUAL-PATH COMPRESSION FOR REAL-TIME MULTIMODAL CLICKBAIT DETECTION: QUANTIZATION AND DISTILLATION |
| 5908 | DUAL-PATH JND: A NEW FRAMEWORK FOR ROBUST AND IMPERCEPTIBLE IMAGE WATERMARKING |
| 7214 | DUAL-PERSPECTIVE MULTIMODAL SENTIMENT ANALYSIS WITH MOE FUSION: REPRESENTATION LEARNING VIA SEMANTIC RESONANCE AND DIVERGENCE |
| 10649 | DUAL-REGULARIZED ITERATIVE ADAPTIVE APPROACH FOR DOA SPECTRUM RECONSTRUCTION IN LIMITED ANGLE SECTOR |
| 14415 | DUAL-SPACE KNOWLEDGE DISTILLATION WITH KEY-QUERY MATCHING FOR LARGE LANGUAGE MODELS WITH VOCABULARY MISMATCH |
| 16065 | DualSteg: High-Capacity Provably Secure Text Steganography in Asymmetric Resource Scenario via Dual-Scale LLMs |
| 17579 | Dual-Strategy-Enhanced ConBiMamba for Neural Speaker Diarization |
| 3976 | Dual-Stream Feature Fusion for Spoofing Detection under Aliased Interference in UAV Communications |
| 7922 | DUOTRACKER: CONFIDENCE-ROUTED EYE TRACKING FOR DIGITAL BIOMARKERS IN CLINICAL SCREENING |
| 15256 | DUST STORM ANOMALY DETECTION ON MARS WITH EVENT CAMERA |
| 14153 | DVT-AD: DISCRIMINATIVE VISION TRANSFORMERS FOR SCALABLE UNSUPERVISED ANOMALY DETECTION VIA SIMPLE SELF-DISTILLATION |
| 17411 | DWC-PO: Dynamic Weight Constraints for Model-Based Policy Optimization via Wasserstein Policy Improvement Bounds |
| 18974 | dYIN AND dSWIPE: DIFFERENTIABLE VARIANTS OF CLASSICAL FUNDAMENTAL FREQUENCY ESTIMATORS |
| 12333 | DyLUT-UIE: A Dynamic Lookup Table Paradigm for Efficient Underwater Image Enhancement |
| 2123 | Dynabits: Token Aware Weight-Activation Quantization for Large Vision–Language Models |
| 5318 | DYNAMIC ADAPTIVE WAVELET STATE SPACE MODEL FOR EFFICIENT LOW-LIGHT IMAGE ENHANCEMENT |
| 12587 | DYNAMIC ATTENTION-AWARE SHAPING FOR OUT-OF-DISTRIBUTION DETECTION |
| 5174 | Dynamic Automaton Refinement and Planning for Non-Markovian RL |
| 4066 | Dynamic Balanced Cross-modal Attention with Gated Sequence Restoration: Towards Robust Multimodal Sentiment Analysis |
| 12898 | DYNAMIC BASIS GENERATION AND MULTI-SCALE GAUSSIAN RESPONSE FUSION FOR ROBUST POINT CLOUD REGISTRATION |
| 18896 | DYNAMIC BIT-PLANE ARITHMETIC CODING METHOD FOR QUANTIZED SPECTRAL COEFFICIENTS IN USAC |
| 5770 | DYNAMIC ESTIMATION LOSS CONTROL IN VARIATIONAL QUANTUM SENSING VIA ONLINE CONFORMAL INFERENCE |
| 12316 | DYNAMIC EXPLAINABLE RECOMMENDATION WITH MULTI-FEATURE AND PERSONALIZED TEST-TIME INFERENCE |
| 14888 | Dynamic Feature Selection on Variable Feature Sets Using Features of Features |
| 2216 | Dynamic Frequency Domain Curriculum Learning: A Novel Framework for Adaptive Image Forgery Detection |
| 17250 | Dynamic Fusion for Large Language Models Compression |
| 7470 | DYNAMIC GATING FUSION AND MULTIMODAL CONTRASTIVE LEARNING FOR GRAPH-BASED DISEASE DIAGNOSIS |
| 12429 | DYNAMIC INTRA-INTER PARTITION LEARNING FOR BUILDING RECONSTRUCTION FROM POINT CLOUDS |
| 1185 | DYNAMIC KALMAN FUSION FOR ROBUST CONTINUOUS SIGN LANGUAGE RECOGNITION |
| 6510 | DYNAMIC LANGUAGE ADAPTATION AND COLLABORATIVE MEMORY MODELING FOR VISION-LANGUAGE TRACKING |
| 17297 | DYNAMIC MULTI-EXPERT PROJECTORS WITH STABILIZED ROUTING FOR MULTILINGUAL SPEECH RECOGNITION |
| 15685 | DYNAMIC MULTI-PATH LEARNING FOR OUT-OF-DISTRIBUTION NODE CLASSIFICATION ON HETEROPHILIC GRAPH |
| 9787 | DYNAMIC MULTI-REWARD OPTIMIZATION FOR MULTI-ROUND PREFERENCE-ALIGNED DIFFUSION |
| 11567 | DYNAMIC NOISE-AWARE MULTI LORA FRAMEWORK TOWARDS REAL-WORLD AUDIO DEEPFAKE DETECTION |
| 8194 | DYNAMIC PROTOTYPE REFINEMENT FOR OUT-OF-DISTRIBUTION DETECTION: BALANCING COMPACTNESS AND DIVERSITY |
| 10625 | Dynamic Self-Distillation Former for Weakly Supervised Semantic Segmentation |
| 5410 | Dynamic Semantic Path Routing with Learnable Priors for Image Captioning |
| 5813 | DYNAMIC SEQUENCING AND GNN-BASED POSTED-PRICE DESIGN FOR COMBINATORIAL AUCTIONS |
| 10444 | DYNAMIC SPECTROGRAM ANALYSIS WITH LOCAL-AWARE GRAPH NETWORKS FOR AUDIO ANTI-SPOOFING |
| 14291 | Dynamic Spike-and-Slab Particle Filtering for Topology Tracking |
| 7055 | DYNAMIC STATE SPACE MODELS FOR CROSS-MODALITY FUSION |
| 1961 | Dynamic Summary Generation for Interpretable Multimodal Depression Detection |
| 11134 | DYNAMICAL ISOMETRY BASED RIGOROUS FAIR NEURAL ARCHITECTURE SEARCH |
| 7788 | Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training |
| 14954 | Dynamic-M With Dual-Stage Sparsity and Cross-Scale Structural Coherence for Generalized Industrial Anomaly Detection |
| 16239 | DYNAPREDICT: ALTERNATING PREDICTIVE AND REAL ITERATION FOR EFFICIENT DEEP REINFORCEMENT LEARNING TRAINING |
| 2796 | DYNDETECT: DYNAMIC ROUTING FOR ROBUST MULTI-MODAL MEDIA MANIPULATION DETECTION AND GROUNDING |
| 17339 | DyPANet: Efficient Event-driven Eye Tracking via Dynamic Path Adaptation and ROI Filtering |
| 6255 | DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers |
| 5855 | E2E-AEC: IMPLEMENTING AN END-TO-END NEURAL NETWORK LEARNING APPROACH FOR ACOUSTIC ECHO CANCELLATION |
| 6161 | Early Prediction Method for Learners at Risk Based on Multi-source Feature Fusion |
| 17818 | EASY TURN: INTEGRATING ACOUSTIC AND LINGUISTIC MODALITIES FOR ROBUST TURN-TAKING IN FULL-DUPLEX SPOKEN DIALOGUE SYSTEMS |
| 11270 | EATS2: Enabling Efficient and Accurate Trajectory Similarity Computation via Self-Training |
| 16951 | EBAD-GS: Deblurring Gaussian Splatting with Event-driven Bundle Adjustment |
| 14558 | EBCF: Strict Error-Bounded Compression of Numerical Climate Data with Discrete Normalizing Flows |
| 16928 | EBEVTRACK: ESTIMATED BIRD’S-EYE VIEW FOR MULTI-OBJECT TRACKING |
| 5955 | ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals |
| 8921 | ECHOFAKE: A REPLAY-AWARE DATASET FOR PRACTICAL SPEECH DEEPFAKE DETECTION |
| 4872 | ECHORAG: A TWO-STAGE FRAMEWORK FOR AUDIO-TEXT RETRIEVAL AND TEMPORAL GROUNDING |
| 11283 | ECHO-TRAFFIC: CROSS-MODAL FEATURE AUGMENTATION FOR TRAFFIC TRANSFORMER PRE-TRAINING |
| 16530 | ECM: Enhancing Compressibility of Quantized Vision Encoder and LLM for Large Vision-Language Models |
| 7910 | ECONOMICALLY CONSTRAINED CYCLE-CONSISTENT GENERATIVE NETWORKS FOR RISK-NEUTRAL DENSITY ESTIMATION: CRISIS-ROBUST PRICING AND HEDGING |
| 12590 | ECSA: DUAL-BRANCH EMOTION COMPENSATION FOR EMOTION-CONSISTENT SPEAKER ANONYMIZATION |
| 15058 | EDB-NET: ENTROPY DUAL-BRANCH NETWORK FOR FEW-SHOT TEXT CLASSIFICATION |
| 5149 | EDGE COLLABORATIVE GAUSSIAN SPLATTING WITH INTEGRATED RENDERING AND COMMUNICATION |
| 5485 | EDGE-AWARE SCALE PREDICTION FOR 3D GAUSSIAN SPLATTING |
| 10834 | EDGEPOSE: SELECTIVE AND ADAPTIVE DIFFUSION FILTERING FOR REAL-TIME HUMAN POSE ESTIMATION ON EDGE DEVICES |
| 13998 | EDGESPOT: EFFICIENT AND HIGH-PERFORMANCE FEW-SHOT MODEL FOR KEYWORD SPOTTING |
| 1954 | EDITMEM: ENHANCING MULTI-HOP FACT VERIFICATION VIA EDITABLE MEMORY |
| 3198 | EDITS: ENHANCING DATASET DISTILLATION WITH IMPLICIT TEXTUAL SEMANTICS |
| 4850 | EDN-Gaussian: Edge-Directed Densification with Covariance Narrowing for Blur-Robust 3D Gaussian Splatting |
| 17122 | EDPOTRANS: ENHANCED DIRECT PREFERENCE OPTIMIZATION FOR MACHINE TRANSLATION BETWEEN LOW-RESOURCE LANGUAGE AND CHINESE WITH LIMITED MONOLINGUAL DATA |
| 9658 | EduGesture: A Dataset of Teachers' Hand Gestures toward Pedagogical Intentions |
| 5676 | EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection using Self-Attention Attractors |
| 8563 | EFFECT OF PROPAGATION DELAYS ON CELL-FREE MASSIVE MIMO SYSTEMS |
| 12295 | Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models |
| 1994 | EFFICIENT AND GLOBAL INTERACTION-AWARE RETRAINING-FREE TOKEN PRUNING FOR VISION TRANSFORMERS |
| 6298 | EFFICIENT AND SCALABLE TOBIT GAUSSIAN PROCESS REGRESSION FOR MODELING AIR QUALITY DATA |
| 2066 | Efficient Audio-Visual Inference via Token Clustering and Modality Fusion |
| 11977 | EFFICIENT CATEGORY-LEVEL 6D POSE ESTIMATION VIA POSE-AWARE FEATURE LEARNING |
| 18952 | Efficient CNNs via Passive Filter Pruning |
| 11666 | Efficient Depression Detection from Speech via Language-Independent Prompt-Driven Reprogramming |
| 15079 | Efficient Distillation of Large Language Models using Group Relative Policy Distillation |
| 15710 | EFFICIENT EXPOSURE FUSION VIA FINE-TUNING A LOW-LIGHT ENHANCEMENT MODEL |
| 13544 | EFFICIENT FEW-SHOT LEARNING FOR EDGE AI VIA KNOWLEDGE DISTILLATION ON MOBILEVIT |
| 14084 | Efficient Gaussian Process Learning via Subspace Projections |
| 2214 | EFFICIENT MOIRÉ ARTIFACT REMOVAL IN RAW AND SRGB DOMAINS VIA SPIKING NEURAL NETWORKS |
| 14796 | Efficient Multi-LoRA Deployment via Shared KV-Cache with Task-Adaptive Tokens |
| 1375 | EFFICIENT OFFLINE REINFORCEMENT LEARNING WITH PROGRESSIVE HEURISTIC BLENDING IN COMPLEX ENVIRONMENTS |
| 4409 | EFFICIENT ONLINE PEER ADAPTATION IN MULTI-AGENT COMPETITION AND COOPERATION VIA VISION LANGUAGE MODEL |
| 7918 | Efficient Plug-and-Play Method for Dynamic Imaging via Kalman Smoothing |
| 17670 | EFFICIENT PROGRESSIVE TRAINING FRAMEWORK FOR IDENTITY-CONSISTENT FACE SWAPPING |
| 9498 | Efficient Quantization-Aware Neural Receivers: Beyond Post-Training Quantization |
| 12719 | EFFICIENT RECONSTRUCTION OF TEXTURELESS OBJECTS VIA QUALITY-AWARE AND DEPTH-ENHANCED GAUSSIAN SPLATTING |
| 13799 | Efficient Segment Anything with Depth-Aware Fusion and Limited Training Data |
| 7719 | EFFICIENT SELF-SUPERVISED LEARNING FOR REMOTE SENSING VIA SPARSE CONVOLUTIONAL MIXTURE-OF-EXPERTS |
| 8137 | Efficient Subset Selection-based Algorithms for Factorizing Low-Rank Matrices with Application to Robust PCA |
| 11055 | Efficient Synthetic Data Selection via Pontryagin's Maximum Principle |
| 10099 | EFFICIENT TRANSFORMER AND INTERLEAVED CONTEXT CLUSTER FOR FAST POINT CLOUD REGISTRATION |
| 7680 | Efficient Uncertainty Quantification for Full Waveform Inversion via Shot-Encoded Hessian |
| 4977 | EFFICIENT VISUO-TACTILE LEARNING VIA FINE-GRAINED ALIGNMENT AND IMPORTANCE-AWARE TOKEN RETENTION |
| 9908 | EFFICIENT WIDEBAND SPARSE ARRAYS FOR HIGH-RESOLUTION DOA ESTIMATION |
| 15760 | EFFICIENT3D-AD: TOKEN-EFFICIENT AND VIEW-AWARE ZERO-SHOT 3D MULTIMODAL ANOMALY DETECTION |
| 12285 | EG-GCN: Enthalpy-Guided graph convolutional networks |
| 17826 | EGGCodec: A Robust Neural Encodec Framework for EGG Reconstruction and F0 Extraction |
| 2601 | EGMR-YOLO: ADVANCED TUBERCULOSIS DIAGNOSIS VIA A REFINED YOLO MODEL |
| 4109 | EGOGEN: EGOCENTRIC INTERACTION VIDEO GENERATION WITH 3D HAND STRUCTURE CONSTRAINTS |
| 3345 | EGOPRESSDIFF: MULTIMODAL VIDEO DIFFUSION FOR EGOCENTRIC UV-DOMAIN HAND-PRESSURE ESTIMATION |
| 2840 | EHDN: AN ENHANCED HOMOGRAPHY DECOMPOSITION NETWORK FOR ROBUST PLANAR OBJECT TRACKING |
| 10946 | EICA: An Emotional Inertia-Contagion-Aware Alignment for Emotion Recognition in Conversations |
| 17794 | EIVF: EFFICIENT IVFPQ SEARCH FOR ON-DEVICE ARM PROCESSORS |
| 19144 | Embracing Cacophony: Explaining and Improving Random Mixing in Music Source Separation |
| 15275 | Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling |
| 10992 | EMISSIVE-GS: RELIGHTABLE RECONSTRUCTION AND EMISSION EDITING VIA GAUSSIAN SPLATTING |
| 5420 | EM-MAMP: LOW-COMPLEXITY SIGNAL RECOVERY WITH PARAMETER LEARNING |
| 12889 | EMODIFFUSION: MODELING EMOTION EVOLUTION WITH DIFFUSION FOR DIVERSE AND COHERENT DIALOGUE GENERATION |
| 13421 | EMODRIVE: AN EMOTION-AWARE VISION-LANGUAGE MODEL FOR HUMAN-CENTRIC AUTONOMOUS DRIVING |
| 10280 | EMOE: EIGENBASIS-GUIDED ROUTING FOR MIXTURE-OF-EXPERTS |
| 17812 | EMORL-TTS: REINFORCEMENT LEARNING FOR FINE-GRAINED EMOTION CONTROL IN LLM-BASED TTS |
| 10023 | EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis |
| 6751 | EMOTION AND ACOUSTICS SHOULD AGREE: CROSS-LEVEL INCONSISTENCY ANALYSIS FOR AUDIO DEEPFAKE DETECTION |
| 18894 | Emotion Classification with Visibility Graphs |
| 11060 | EMOTIONAL DAMAGE: INVESTIGATING SAFETY VULNERABILITIES OF LARGE AUDIO-LANGUAGE MODELS UNDER SPEAKER EMOTIONAL VARIATIONS |
| 5265 | EMOTIONAL DIMENSION CONTROL IN LANGUAGE MODEL-BASED TEXT-TO-SPEECH: SPANNING A BROAD SPECTRUM OF HUMAN EMOTIONS |
| 6251 | EMOTION-ALIGNED GENERATION IN DIFFUSION TEXT TO SPEECH MODELS VIA PREFERENCE-GUIDED OPTIMIZATION |
| 11085 | Emotion-Aware Learning with Class-Balanced Optimization for Dynamic Facial Expression Recognition |
| 9678 | Emotri-RL: Emotion- and Cause-Aware Reinforcement Learning for Multi-Modal Empathetic Dialogue |
| 9759 | EMO-TTA: IMPROVING TEST-TIME ADAPTATION OF AUDIO-LANGUAGE MODELS FOR SPEECH EMOTION RECOGNITION |
| 10101 | EMPEROR: EFFICIENT MOMENT-PRESERVING REPRESENTATION OF DISTRIBUTIONS |
| 5692 | EMPIRICAL ANALYSIS OF APPROXIMATE MESSAGE PASSING UNDER NON-I.I.D. MEASUREMENTS WITH COMPARISON TO STATE EVOLUTION |
| 1860 | EMPOWERING ECONOMIC SIMULATION THROUGH SITUATION-AWARE LLM-DRIVEN GENERATIVE SYSTEM |
| 4546 | EMPOWERING MULTIMODAL RESPIRATORY SOUND CLASSIFICATION WITH COUNTERFACTUAL ADVERSARIAL DEBIASING FOR OUT-OF-DISTRIBUTION ROBUSTNESS |
| 12468 | EMPOWERING THE TAIL: ADAPTIVE SEMANTIC NEIGHBORHOOD ENHANCEMENT FOR LONG-TAIL REASONING IN TEMPORAL KNOWLEDGE GRAPHS |
| 1432 | EMPOWERING TRANSFORMERS SPECTRALLY: TOWARDS COMPREHENSIVE PATTERN LEARNING FOR IMAGE DEMOIRÉING |
| 10152 | EMS-Mixer: Extreme Multi-Scale Mixing for Time Series Forecasting |
| 11048 | EMU: EMOTION UNDERSTANDING IN THE WILD - A NATURALISTIC MULTIMODAL DATASET AND BENCHMARK |
| 11552 | ENABLING EFFICIENT AND ACCURATE PRIVACY-PRESERVING IMAGE-TEXT RETRIEVAL IN PUBLIC CLOUD |
| 12958 | Enabling Multi-Species Bird Classification on Low-Power Bioacoustic Loggers |
| 3755 | ENABLING ON-DEVICE LIFE-THREATENING ARRHYTHMIA DETECTION VIA PERSONALIZED ADAPTIVE INFERENCE FOR IMPLANTABLE DEVICES |
| 5530 | Encoder-Decoder Symmetric Nonnegative Matrix Tri-Factorization for Graph Clustering |
| 10742 | ENCODING EMOTION THROUGH SELF-SUPERVISED EYE MOVEMENT RECONSTRUCTION |
| 14732 | ENCORE: ENTROPY-GUIDED CROPPING AND ATTENTION REGULARIZATION FOR ROBUST VISION–LANGUAGE UNDERSTANDING |
| 17447 | END-END-EDGE COLLABORATIVE FRAMEWORK FOR ADAPTIVE CONTENT-AWARE VIDEO ANALYTICS |
| 9625 | End-fire Target Bearing Estimation in Passive SONAR Employing End-to-End Deep Neural Networks with Focal Angular Loss |
| 5195 | END-TO-END EFFICIENT DENOISING FOR RADAR MICRO-DOPPLER SPECTROGRAMS USING FOURIER KOLMOGOROV-ARNOLD NETWORK |
| 13580 | END-TO-END INDOOR LOCALIZATION FOR BLUETOOTH 5 BASED ON A DUAL-BRANCH NETWORK |
| 8506 | END-TO-END SPEAKER VERIFICATION WITH UNCERTAINTY-AWARE EVIDENTIAL SCORING |
| 16017 | END-TO-END STORY VISUALIZATION FRAMEWORK WITH PENALTY-BASED EVALUATION USING VISION-LANGUAGE MODELS |
| 14493 | ENERGY PROFILING OF VIDEO PLAYBACK |
| 13914 | ENERGY-AWARE IMAGES VIA PIXEL VALUE REDUCTION: THE IMPACT OF COMPRESSION ON ATTENUATION MAPS |
| 11246 | ENHANCE BALANCE BETWEEN GENERALIZATION AND PERSONALIZATION FOR VISION-LANGUAGE MODELS IN FEDERATED LEARNING |
| 10794 | Enhance Deformation-Tolerant Unsupervised Infrared and Visible Image Fusion via Hybrid Feature Representation Learning |
| 15777 | ENHANCE MESSAGE PASSING WITH CLUSTER-AWARE VIRTUAL NODES FOR SEMI-SUPERVISED NODE CLASSIFICATION |
| 10645 | ENHANCED CROSS-MEDIUM COMMUNICATION USING MULTI-SENSOR FUSION AND KALMAN FILTERING |
| 5048 | ENHANCED GENERATIVE MACHINE LISTENER |
| 6394 | ENHANCED GRAPH NEURAL NETWORKS USING K-HOP GAUSSIAN DIFFUSION |
| 16745 | ENHANCED GRAPH TRANSFORMER WITH SERIALIZED GRAPH TOKENS |
| 11716 | Enhanced Time-Frequency Representation of Nonstationary Signals via Cubic Polynomial Phase Synchroextracting Transform |
| 17809 | Enhanced Video Compression with Context-Aware Dynamic Neural Adapter |
| 3602 | ENHANCED VOLUMETRIC VIDEO STREAMING THROUGH ANCHOR-BASED VIEWPORT PREDICTION |
| 10982 | Enhancing Action and Ingredient Modeling for Semantically Grounded Recipe Generation |
| 15740 | ENHANCING ADVERSARIAL TRANSFERABILITY WITH INTEGRATED TIME-FREQUENCY MOMENTUM ITERATIVE ATTACK |
| 16354 | ENHANCING AUDIO QUESTION-ANSWERING PERFORMANCE THROUGH LOG-LIKELIHOOD GUIDED REWARD FUNCTIONS |
| 3202 | ENHANCING AUTOMATIC DRUM TRANSCRIPTION WITH ONLINE DYNAMIC FEW-SHOT LEARNING |
| 17210 | ENHANCING CLIP-BASED WEAKLY-SUPERVISED VIDEO ANOMALY DETECTION VIA OPTIMAL TRANSPORT |
| 1633 | ENHANCING CROSS-VIEW GEO-LOCALIZATION GENERALIZATION VIA GLOBAL-LOCAL CONSISTENCY AND GEOMETRIC EQUIVARIANCE |
| 11627 | Enhancing Debate Dialogue Generation via Dual-Dimensional Reflection and Refinement |
| 14880 | ENHANCING DIALOGUE-RELATED SPEECH TASKS WITH GENERATED SPOKEN DIALOGUES |
| 17034 | ENHANCING DOCUMENT-LEVEL MACHINE TRANSLATION VIA FILTERED SYNTHETIC CORPORA AND TWO-STAGE LLM ADAPTATION |
| 3737 | Enhancing domain generation through pluggable Style Randomization |
| 14095 | ENHANCING DOPPLER AND FMCW RADARS VIA UNLIMITED SENSING |
| 14971 | ENHANCING FAKE NEWS DETECTION WITH LLM-GENERATED MULTI-DIMENSIONAL EXPLANATIONS AND MULTI-CHANNEL FUSION |
| 1606 | ENHANCING GRAPH-BASED RETRIEVAL-AUGMENTED GENERATION VIA QUERY-AWARE PATH REASONING |
| 17520 | Enhancing Guidance for Missing Data in Diffusion-Based Sequential Recommendation |
| 16322 | ENHANCING INTER-LEAD CORRELATIONS: A NOVEL DIFFUSION GAN FRAMEWORK FOR 12-LEAD ECG GENERATION |
| 4057 | Enhancing Knowledge Base Question Answering with Reinforced Hop-wise Logical Form Generation |
| 9736 | Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals |
| 17869 | Enhancing Low-Resource Document-Level Relation Extraction with Coarse-to-Fine Prediction |
| 18890 | ENHANCING LOW-RESOURCE SPEECH RECOGNITION WITH NON-LINEAR CROSS-LINGUAL MAPPINGS |
| 13032 | ENHANCING MULTILINGUAL LLM-BASED ASR WITH MIXTURE OF EXPERTS AND DYNAMIC DOWNSAMPLING |
| 11893 | Enhancing Multivariate Time Series Forecasting from a Temporal Decoupling Perspective |
| 3146 | ENHANCING NOISE ROBUSTNESS FOR NEURAL SPEECH CODECS THROUGH RESOURCE-EFFICIENT PROGRESSIVE QUANTIZATION PERTURBATION SIMULATION |
| 2818 | ENHANCING ONLINE RL FINE-TUNING VIA ADAPTIVE Q-FUNCTION SELECTION |
| 16332 | ENHANCING PERSONALIZED FEDERATED CONTINUAL LEARNING WITH CLIENT-SPECIFIC SEMANTIC KNOWLEDGE |
| 14611 | ENHANCING POST-TRAINING QUANTIZATION VIA FUTURE ACTIVATION AWARENESS |
| 16444 | Enhancing QAOA Ansatz via Multi-Parameterized Layer and Blockwise Optimization |
| 1960 | Enhancing Quantization for Visual AutoRegressive Generation via Uncertainty Identification |
| 6908 | ENHANCING REFERRING EXPRESSION COMPREHENSION WITH PIXEL-WORD CORRELATION AND CROSS-LAYER REGULARIZATION |
| 8274 | ENHANCING RISK AWARENESS IN LLM AGENTS VIA PROBING SAFETY BOUNDARIES |
| 11574 | Enhancing Social Emotion Prediction with Persona-driven Comment Generation and Graph-based Information Fusion |
| 10598 | ENHANCING SPATIAL RELATIONSHIPS IN TEXT-TO-IMAGE GENERATION WITH STRUCTURED INFORMATION |
| 18231 | Enhancing Spatio-Temporal Forecasting with Spatial Neighbourhood Fusion: A Case Study on Mobility in Peru |
| 10808 | ENHANCING SPEAKER VERIFICATION WITH LAYER-WISE MIXTURE-OF-EXPERTS ON PRE-TRAINED MODELS |
| 7982 | ENHANCING SPEAKER VERIFICATION WITH W2V-BERT 2.0 AND KNOWLEDGE DISTILLATION GUIDED STRUCTURED PRUNING |
| 16386 | ENHANCING SPEECH INTELLIGIBILITY PREDICTION FOR HEARING AIDS WITH COMPLEMENTARY SPEECH FOUNDATION MODEL REPRESENTATIONS |
| 12539 | ENHANCING STABILITY AND REPRODUCIBILITY OF GRAPH INFORMATION BOTTLENECK FOR MENTAL DISORDER DIAGNOSIS |
| 13165 | ENHANCING UAV CLASSIFICATION VIA SPHERICAL HARMONIC TRANSFORM AND VIRTUAL MULTI-CHANNEL DENOISING |
| 6168 | ENHANCING VALUE ALIGNMENT OF LLMS WITH MULTI-AGENT SYSTEM AND COMBINATORIAL FUSION |
| 4881 | ENHF-YOLO: ENHANCED HIGH-FREQUENCY DOMAIN FEATURE EXTRACTION OF SMALL TARGETS IN REMOTE SENSING |
| 1577 | ENRICH VISUAL FEATURES BY HOLISTIC SAMPLING AND HIERARCHICAL CONDENSING IN MULTIMODAL LARGE LANGUAGE MODELS |
| 17547 | Enriching Tail Manifolds via Feature Synthesis and Margin Optimization for Long-Tailed Remote Sensing Recognition |
| 14247 | ENSEMBLE FOR REDUCING TARGET SPEECH EXTRACTION ERRORS |
| 15440 | Ensuring Reliable Participation in Subjective Video Quality Tests Across Platforms |
| 12358 | ENTITY ALIGNMENT AND STRUCTURAL PERTURBATION FOR COMMONSENSE KNOWLEDGE GRAPH REASONING |
| 15128 | ENTROCUT: ENTROPY-GUIDED ADAPTIVE TRUNCATION FOR EFFICIENT CHAIN-OF-THOUGHT REASONING IN SMALL-SCALE LARGE REASONING MODELS |
| 17576 | ENTROLLM: ENTROPY ENCODED WEIGHT COMPRESSION FOR EFFICIENT LARGE LANGUAGE MODEL INFERENCE ON EDGE DEVICES |
| 10806 | EntroLog: An Adaptive and Self-Improving Framework for Efficient Log Analysis |
| 4970 | Entropy-Aware Multimodal Preference Optimization for Factuality Alignment in Medical Visual Question Answering |
| 9424 | ENTROPYGS: AN EFFICIENT ENTROPY CODING ON 3D GAUSSIAN SPLATTING |
| 6633 | ENTROPY-GUIDED DATA-EFFICIENT TRAINING FOR MULTIMODAL REASONING REWARD MODELS |
| 15484 | Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec |
| 13690 | Environment-Aware MIMO Channel Estimation in Pilot-Constrained Upper Mid-Band Systems |
| 14205 | EOSIGN: EDGE-EFFICIENT ONE-SHOT ISL VIDEO SYNTHESIS FROM CODE-MIXED SPEECH WITH SIGNER CONSISTENCY AND TEMPORAL STABILITY |
| 17811 | EPED: A NOVEL REINFORCEMENT LEARNING-DRIVEN FRAMEWORK FOR EARLY PHISHING SCAMS DETECTION IN ETHEREUM |
| 15333 | EPO: Enhanced Preference Optimization with Multi-Response Data for LLMs via Stochastic Softmax |
| 14946 | Equipping Large Language Model with Directional Speech Understanding Capabilities |
| 18892 | EQUIRIPPLE MIMO BEAMPATTERN SYNTHESIS USING CHEBYSHEV APPROXIMATION |
| 3559 | Equivariant Deep Equilibrium Models for Imaging Inverse Problems |
| 16609 | Equivariant Hamiltonian Graph Neural Networks for Generalizing Dynamics of Magnetic Pendulum System |
| 18207 | ERASING YOUR VOICE BEFORE IT’S HEARD: TRAINING-FREE SPEAKER UNLEARNING FOR ZERO-SHOT TEXT-TO-SPEECH |
| 9037 | ERE-LLM: Entity-Relation Extraction With Large Language Model in Professional Domains |
| 6094 | ERFORMER: EVENT-RGB FUSION TRANSFORMER WITH ADAPTIVE BRIGHTNESS CONTROL FOR LOW-LIGHT IMAGE ENHANCEMENT |
| 12998 | Erosion Attack for Adversarial Training to Enhance Semantic Segmentation Robustness |
| 5274 | E-RRC: ENHANCED RANGE RESTRICTION CLIPPING FOR ROBUST VISION TRANSFORMERS ON EDGE DEVICES |
| 15429 | Error Bound Based Exact Penalization for Cardinality-Constrained Clustering |
| 2621 | ES4D-Net: Foreground-aided 3D Object Detection Based on Extremely Sparse 4D Radar Point Cloud |
| 6064 | ESINET: ENHANCING STRUCTURAL INTEGRITY IN SCRIBBLE-SUPERVISED CAMOUFLAGED OBJECT DETECTION |
| 4132 | E-SocialNav: Efficient Socially Compliant Navigation with Language Models |
| 13899 | ESTIMATING HAND-RELATED FEATURES FROM SPEECH USING MACHINE LEARNING |
| 17187 | ESTIMATING RESPIRATORY EFFORT FROM NOCTURNAL BREATHING SOUNDS FOR OBSTRUCTIVE SLEEP APNOEA SCREENING |
| 5244 | ESTIMATION OF THE HURST EXPONENT OF NOISY OR BLURRED FRACTAL TEXTURES. APPLICATION TO COMPUTER-AIDED MAMMOGRAM ANALYSIS. |
| 4017 | Etude: Piano Cover Generation with a Three-Stage Approach - Extract, strucTUralize, and DEcode |
| 15869 | EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding |
| 16823 | EVA: ENHANCING ANIME VIDEO GENERATION VIA REINFORCEMENT LEARNING |
| 7517 | EVALUATING BIAS IN SPOKEN DIALOGUE LLMS FOR REAL-WORLD DECISIONS AND RECOMMENDATIONS |
| 14235 | EVALUATING COMPOSITIONAL STRUCTURE IN AUDIO REPRESENTATIONS |
| 14573 | EVALUATING DISENTANGLED REPRESENTATIONS FOR CONTROLLABLE MUSIC GENERATION |
| 14157 | EVALUATING EMOTION RECOGNITION IN SPOKEN LANGUAGE MODELS ON EMOTIONALLY INCONGRUENT SPEECH |
| 10161 | EVALUATING HIGH-RESOLUTION PIANO SUSTAIN PEDAL DEPTH ESTIMATION WITH MUSICALLY INFORMED METRICS |
| 11896 | Evaluating pretrained speech embedding systems for dysarthria detection across heterogenous datasets |
| 10233 | EVALUATING TEST-TIME ADAPTATION FOR FACIAL EXPRESSION RECOGNITION UNDER NATURAL CROSS-DATASET DISTRIBUTION SHIFTS |
| 14057 | EVA-Score: Evaluating Abstractive Long-form Summarization on Informativeness through Extraction and Validation |
| 6696 | EVENT CAMERA DEPTH ESTIMATION FROM EPIPOLAR PLANE IMAGES |
| 14427 | Event classification by physics-informed inpainting for distributed multichannel acoustic sensor with partially degraded channels |
| 13538 | EVENT-AIDED SEMANTIC SCENE COMPLETION |
| 17722 | Event-driven Neuromorphic Near-Field Radar Imaging |
| 16108 | EVOALPHA: AN LLM-ENHANCED EVOLUTIONARY FRAMEWORK FOR FORMULAIC ALPHA MINING |
| 13609 | EVOLVING AASIST: TOWARDS SCALABLE AND GENERALIZABLE ANTI-SPOOFING MODELS |
| 4192 | EXPECTATION PROPAGATION DETECTOR EXPLOITING OVERLAPPING BLOCK STRUCTURES FOR HIGHLY CORRELATED MIMO SYSTEMS |
| 13791 | EXPERIENCE-DRIVEN DYNAMIC EXITS FOR LLMS WITH REINFORCEMENT LEARNING |
| 19040 | EXPLAINABLE DEEP LEARNING ANALYSIS FOR RAGA IDENTIFICATION IN INDIAN ART MUSIC |
| 17660 | EXPLAINABLE DEEPFAKE DETECTION WITH RL ENHANCED SELF-BLENDED IMAGES |
| 19028 | EXPLAINABLE DNN-BASED BEAMFORMER WITH POSTFILTER |
| 10038 | Explaining Face Verification Decisions with Pairwise Facial Feature Explanation |
| 15530 | EXPLICIT TIME-FREQUENCY DYNAMICS FOR SKELETON-BASED GAIT RECOGNITION |
| 4533 | EXPLOITING BACKDOOR TRIGGER TOWARDS UNLEARNABLE EXAMPLES |
| 16026 | Exploiting Latent and Implicit chain of thought for Efficient multi-hop question answering |
| 2084 | EXPLOITING SCATTERS FOR SENSING SECURITY IN ISAC SYSTEMS |
| 16929 | EXPLOITING SPARSE-TEMPORAL DYNAMICS VIA RELEVANCE NETWORKS FOR UAV TRACKING |
| 13425 | EXPLOITING THE PROPERTIES OF AN ADAPTIVE BLACK-BOX ATTACK AGAINST FEDERATED LEARNING |
| 16930 | EXPLORATION BEYOND BUDGET: TRAINING LARGE LANGUAGE MODELS TO EXPLORE UNDER TRUNCATION CONSTRAINTS |
| 3894 | Exploring Audio Hallucination in Egocentric Video Understanding |
| 9210 | Exploring Confidence as a Reward to Advance LLMs Reasoning |
| 16536 | EXPLORING FINE-TUNING OF LARGE AUDIO LANGUAGE MODELS FOR SPOKEN LANGUAGE UNDERSTANDING UNDER LIMITED SPEECH DATA |
| 11467 | Exploring How Audio Effects Alter Emotion with Foundation Models |
| 5883 | EXPLORING RESOLUTION-WISE SHARED ATTENTION IN HYBRID MAMBA-U-NETS FOR IMPROVED CROSS-CORPUS SPEECH ENHANCEMENT |
| 13572 | EXPLORING SSL DISCRETE TOKENS FOR MULTILINGUAL AUTOMATIC SPEECH RECOGNITION |
| 17232 | Exploring the Existence of Over-Squashing in Directed Networks |
| 10828 | Exploring Unlabeled Data for Vision-Language Models Beyond Greedy Hard Pseudo-labels |
| 10546 | EXPRESSIVE VOICE CONVERSION WITH CONTROLLABLE EMOTIONAL INTENSITY |
| 3346 | Exterior sound field estimation based on physics-constrained kernel |
| 19120 | EXTERNAL DIVISION OF TWO PROXIMITY OPERATORS—PART I: DEBIASED FEATURE GROUPING |
| 19121 | EXTERNAL DIVISION OF TWO PROXIMITY OPERATORS—PART II: GENERALIZATION AND PROPERTIES |
| 19097 | EXTRACTING FORMULAE IN MANY-VALUED LOGIC FROM DEEP NEURAL NETWORKS |
| 16292 | EXTREMOPROMPT: ADVANCING MIXTURE OF SOFT PROMPTS TO THE LIMIT |
| 13318 | F2G-AMD: Feature-to-Graph Affinity with Large-Kernel Attention for AMD Grading using Fundus Images |
| 3741 | F5E-TTS: ENHANCING SPEECH SYNTHESIS BY ALIGNING TEXT WITH RICH SEMANTIC REPRESENTATIONS |
| 5595 | FABEM: FREQUENCY-AWARE BOUNDARY ENHANCEMENT MODULE FOR SMALL OBJECT DETECTION |
| 17954 | FACESLEUTH-R: ADAPTIVE ORIENTATION-AWARE ATTENTION FOR ROBUST MICRO-EXPRESSION RECOGNITION |
| 11767 | Face-Voice Association with Inductive Bias for Maximum Class Separation |
| 14604 | FAC-FACODEC: CONTROLLABLE ZERO-SHOT FOREIGN ACCENT CONVERSION WITH FACTORIZED SPEECH CODEC |
| 10494 | FACLIP: LEARNABLE FINE-GRAINED PROMPTS AND MULTI-SCALE FUSION FOR ZERO-SHOT ANOMALY DETECTION |
| 12764 | FADEMEM: BIOLOGICALLY-INSPIRED FORGETTING FOR EFFICIENT AGENT MEMORY |
| 4325 | FAIRCG: MITIGATE COUNTERFACTUAL AND GROUP BIAS IN MACHINE LEARNING |
| 5116 | FAIRMOO: ACHIEVING FAIRNESS IN DISTRIBUTED LEARNING VIA CONSTRAINED MULTI-OBJECTIVE OPTIMIZATION |
| 10708 | FAIRNESS-AWARE GRAPH REPRESENTATION LEARNING THROUGH LOW-FREQUENCY BIAS SEPARATION |
| 17173 | Fairness-oriented decoupled user association and resource allocation in fully-decoupled RAN: A two-layer MAB approach |
| 13735 | FAITH: ENHANCING TIME SERIES FORECASTING WITH FREQUENCY-BASED ADAPTIVE INPUT HORIZON |
| 11019 | FAKE IMAGE DETECTION ON NOISE RESIDUAL SPECTRA VIA RANDOM-FEATURE SINGLE-LAYER NEURAL NETWORKS |
| 18930 | Fake Path Co-Construction Source Location Privacy Protection Scheme Design For UWSNs |
| 2021 | FAKE SPEECH WILD: DETECTING DEEPFAKE SPEECH ON SOCIAL MEDIA PLATFORM |
| 10908 | Fake-HR1: Rethinking Reasoning of vision language model for Synthetic Image Detection |
| 9677 | Fall Detection with Sound Diffusion Field: Integrating Audible Sound Event and Acoustic Speed Estimation |
| 7992 | FAN-RFID: EXFILTRATING DATA FROM AIR-GAPPED SYSTEMS VIA FAN-INDUCED RFID MODULATION |
| 15074 | FANSR: Frequency Adaptive Network for Efficient Image Super-Resolution |
| 10575 | FAO-FORMER: LEARNING DISENTANGLED SEMANTIC REPRESENTATIONS WITH FREQUENCY-AWARE ORTHOGONAL TRANSFORMER |
| 10146 | FAST AND ACCURATE TEMPORAL SUPER-RESOLUTION VIA RESIDUAL-AWARE COUPLED TENSOR FACTORIZATION |
| 2767 | Fast and Accurate Text-to-Motion Generation through Discrete-Guided Continuous Modeling |
| 10819 | Fast and Robust Triple Tensor Decomposition With Data Corruption |
| 10142 | FAST INTER- AND INTRA-MODE DECISION FOR VIDEO-BASED DYNAMIC MESH CODING |
| 2616 | Fast Low-light Enhancement and Deblurring for 3D Dark Scenes |
| 10229 | FAST SINGLE-SNAPSHOT HARMONIC RECOVERY WITH 2D SPARSE ARRAYS USING BCCB MATRICES |
| 16452 | FAST SPARSE NONNEGATIVE MATRIX FACTORIZATION WITH MANIFOLD ACCELERATION |
| 4767 | FAST: FUSION-BASED ANOMALY SEARCH ON A TREE IN HIERARCHICAL HETEROGENEOUS SYSTEMS |
| 13298 | FAST_QR: FAST, ACCURATE AND STABLE QUANTILE REGRESSION FOR TIME-SERIES ANALYSIS VIA ADAPTIVE HUBER SMOOTHING |
| 5730 | FASTAV: EFFICIENT TOKEN PRUNING FOR AUDIO-VISUAL LARGE LANGUAGE MODEL INFERENCE |
| 11174 | FASTEAGLE: CASCADED DRAFTING FOR ACCELERATING SPECULATIVE DECODING |
| 3646 | FastEnhancer: Speed-Optimized Streaming Neural Speech Enhancement |
| 5451 | FAST-GS: Frequency Aware Space-time Gaussian Splatting for Photorealistic Dynamic Novel View Synthesis |
| 17946 | FAST-SLOW LORA: ACHIEVING EFFICIENT CONTINUAL LEARNING VIA FAST-SLOW LEARNING AND REDUNDANCY PRUNING |
| 1778 | FAST-ULCNET: A FAST AND ULTRA LOW COMPLEXITY NETWORK FOR SINGLE-CHANNEL SPEECH ENHANCEMENT |
| 9366 | FC-FORMER: EFFICIENT FEATURE CODING FOR MACHINES VIA A HYBRID CNN-TRANSFORMER ARCHITECTURE |
| 2936 | FC-MOE: FLIP CONSISTENT MIXTURE OF EXPERTS ARE GOOD LEARNERS FOR UNIFIED FACE ATTACK DETECTION |
| 12321 | FCR: MULTI-VIEW COMPOSITIONAL RETRIEVAL FOR TIME SERIES FORECASTING WITH LARGE LANGUAGE MODELS |
| 4924 | FCSTG-InceptionNet: Temporal Lag and Mesoscale Spatio-Temporal Features Modeling for EEG-Based Diagnostics |
| 13469 | FC-VFI: FAITHFUL AND CONSISTENT VIDEO FRAME INTERPOLATION FOR HIGH-FPS SLOW MOTION VIDEO GENERATION |
| 11318 | FDCA-CLIP: FREQUENCY-ENHANCED DUAL-SEMANTIC CROSS-MODAL ALIGNMENT FOR ZERO-SHOT SPATIO-TEMPORAL ACTION LOCALIZATION |
| 17152 | FDCNET: FREQUENCY DOMAIN CHANNEL ATTENTION AND CONVOLUTION FOR LIPREADING |
| 14951 | FDCP-MATCH: A NEW MODEL WITH FREQUENCY DOMAIN AND CLASS PROMPT FOR GENERALIZED FEW-SHOT SEMANTIC SEGMENTATION |
| 15410 | FDS-MANET: A HYPERSPECTRAL CLASSIFICATION NETWORK DRIVEN BY BIDIRECTIONAL MAMBA WITH FREQUENCY DOMAIN ENHANCEMENT AND GRAPH MODULATION |
| 1684 | FEATURE IDENTIFICATION FOR HIERARCHICAL CONTRASTIVE LEARNING |
| 10420 | FEATURE PROJECTION LEARNING FOR BETTER VISION-LANGUAGE REASONING |
| 11078 | FEATURE-GUIDED UNSIGNED DISTANCE FUNCTIONS ESTIMATION FOR SURFACE RECONSTRUCTION |
| 9448 | FED: A FINE-GRAINED ENHANCED DUAL-ROUTING NETWORK FOR MULTIMODAL SARCASM DETECTION |
| 12893 | FedALP: Lightweight Personalized Federated Learning with Adaptive Low-Rank Adapters |
| 12849 | FedAVOT: Exact Distribution Alignment in Federated Learning via Masked Optimal Transport |
| 13040 | FEDCADS: ROBUST FEDERATED LEARNING VIA DUAL DISTILLATION AND PARTICIPATION-AWARE OPTIMIZATION UNDER NON-IID DATA |
| 15394 | FEDCOMPASS: FEDERATED CLUSTERED AND PERIODIC AGGREGATION FRAMEWORK FOR HYBRID CLASSICAL-QUANTUM MODELS |
| 3148 | FedDBP: Enhancing Federated Prototype Learning with Dual-Branch Features and Personalized Global Fusion |
| 6956 | FEDD-NET: A FREQUENCY DIAGONAL FEATURE ENHANCED DUAL-BRANCH DIFFUSION NETWORK FOR LOW-LIGHT IMAGE ENHANCEMENT |
| 4030 | Federated Camouflaged Poisoning Attack in Federated Unlearning |
| 10458 | Federated Clustering without k: Adaptive Prototype Aggregation on Heterogeneous Data |
| 6782 | FEDERATED HETEROGENEOUS LANGUAGE MODEL OPTIMIZATION FOR HYBRID AUTOMATIC SPEECH RECOGNITION |
| 10297 | FEDERATED IMAGE CLUSTERING WITH KNOWLEDGE INTERACTION |
| 16836 | FEDERATED JOINT LEARNING FOR DOMAIN AND CLASS GENERALIZATION |
| 14083 | FEDERATED SMOOTHING ADMM FOR ROBUST LOCALIZATION |
| 8072 | FED-GAME: PERSONALIZED FEDERATED LEARNING WITH GRAPH ATTENTION MIXTURE-OF-EXPERTS FOR TIME-SERIES FORECASTING |
| 10684 | FEDGPAI: PERSONALIZED FEDERATED LEARNING BASED ON PARAMETER SENSITIVITY ADAPTIVE INTERPOLATION |
| 13954 | FEDLA: FILTER-WISE LEARNABLE AGGREGATION FOR FEDERATED LEARNING UNDER NON-IID DATA |
| 2780 | FED-MET: MEMORY-EFFICIENT ELASTIC TRAINING IN FEDERATED LEARNING |
| 4803 | FEDON: BLACK-BOX UNTARGETED MODEL POISONING VIA MULTI-OBJECTIVE REINFORCEMENT LEARNING |
| 13041 | FEDPAK: SERVER-CENTRIC PROTOTYPE REFINEMENT WITH ADAPTIVE MARGINS AND GENERATIVE KNOWLEDGE TRANSFER FOR HETEROGENEOUS FEDERATED LEARNING |
| 1493 | FedPGP: Adaptive Feature Alignment for Personalized Global Prototypes in Federated Learning |
| 4183 | Fed-PISA: Federated Voice Cloning via Personalized Identity-Style Adaptation |
| 2513 | FEDPLA: PROTOTYPE-ALIGNED LOW-RANK ADAPTATION FOR MULTIMODAL FEDERATED LEARNING |
| 15998 | FEDPROLN: CLASS PROTOTYPE-ENHANCED FEDERATED LEARNING FOR LONG-TAILED NOISY LABELS |
| 5250 | FEDPROTOALIGN: FEDERATED PROTOTYPE ALIGNMENT UNDER IDENTITY INCONSISTENCY FOR GAIT RECOGNITION |
| 16010 | FedRD: Reducing Divergences for Generalized Federated Learning via Heterogeneity-aware Parameter Guidance |
| 3187 | FEDRL-SATOPT: FEDERATED REINFORCEMENT LEARNING FOR JOINT ROUTING AND COMPUTING IN DYNAMIC LEO SATELLITE NETWORKS |
| 8056 | FedSKU: Defending Backdoors in Federated Learning Through Selective Knowledge Unlearning |
| 15221 | FEDZKD: ZEROTH-ORDER DUAL-ADAPTER DISTILLATION FOR FEDERATED FINE-TUNING |
| 12212 | Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models |
| 5630 | FEFusionMap: A LiDAR-Camera Fused Semantic Map Generation Frame Via Multi-modal Feature Enhancement |
| 14296 | FEMTOMODELS FOR EEG ARTIFACT REMOVAL: A PARAMETER LOWER-BOUND FOR GENERALISABLE EOG DENOISING |
| 13598 | FEW-SHOT AND PSEUDO-LABEL GUIDED SPEECH QUALITY EVALUATION WITH LARGE LANGUAGE MODELS |
| 17798 | FEW-SHOT BEARING FAULT DIAGNOSIS USING MULTI-SCALE FEATURE EXTRACTION AND ATTENTION-BASED PROTOTYPE MATCHING |
| 18325 | Few-shot Learning via Multi-modal Representation Integration |
| 1906 | FEW-SHOT OBJECT DETECTION VIA CONDITIONAL VARIATIONAL ADAPTIVE MEMORY ENHANCEMENT |
| 3869 | FEW-SHOT RECOGNITION OF AUDIO DEEPFAKE GENERATORS USING GRAPH-BASED PROTOTYPE ADAPTATION |
| 14892 | FGAL-DM: PRIVACY-PRESERVING SEMANTIC COMMUNICATION VIA FEDERATED GENERATIVE ADVERSARIAL LATENT DIFFUSION MODELS |
| 2713 | FGAPA: FEATURE-GUIDED ADVERSARIAL PROTOTYPE ALIGNMENT FOR HYPERSPECTRAL CROSS-DOMAIN FEW-SHOT CLASSIFICATION |
| 3279 | FGFF-NET: FREQUENCY-GUIDED FEATURE FUSION NETWORK FOR VISIBLE-INFRARED OBJECT DETECTION |
| 15897 | FGGM: FISHER-GUIDED GRADIENT MASKING FOR CONTINUAL LEARNING |
| 16779 | FGSANet: A Frequency-Guided and Structure-Aware Framework for Robust Sheep Breed Recognition |
| 14211 | FIBKD: A FIBER BUNDLE-BASED FRAMEWORK FOR EFFECTIVE KNOWLEDGE DISTILLATION |
| 8645 | FIDIC: FINE-GRAINED CONVERSATIONAL EMOTION RECOGNITION VIA INDIVIDUAL DIFFERENCES IN INERTIA AND CONTAGION |
| 18891 | FieldFormer: Self-supervised Reconstruction of Physical Fields via Tensor Attention Prior |
| 10403 | FIG: FREQUENCY-BASED INTEGRATED GRADIENTS FOR ROBUST FEATURE ATTRIBUTION |
| 15448 | FILTER THEN ATTEND: IMPROVING ATTENTION-BASED TIME SERIES FORECASTING WITH SPECTRAL FILTERING |
| 2808 | FILTER-GROUP MIXTURE-OF-EXPERTS MODEL FOR REMOTE SENSING FORGED TARGET PERCEPTION |
| 9874 | FINBED: A UNIFIED MULTIMODAL EMBEDDING FRAMEWORK FOR FINANCIAL REPRESENTATION LEARNING |
| 15264 | FINE-GRAINED FRAME MODELING IN MULTI-HEAD SELF-ATTENTION FOR SPEECH DEEPFAKE DETECTION |
| 12454 | FINE-GRAINED GESTURE RECOGNITION VIA NARROW-KERNEL CNN AND ATTENTION-BASED SEMG-ACC FUSION |
| 6983 | Fine-Grained Hashing via Center Similarity Guided Quantization |
| 1928 | Fine-grained Text-to-Image Synthesis with Semantic Refinement |
| 15721 | FineLongCLIP: Advancing Fine-Grained Image-Text Matching via a Dual-Branch Visual Encoder Capturing Global and Detailed Features |
| 3521 | FINE-TUNED DEEP SUBSPACE CLUSTERING NETWORKS |
| 7009 | FINE-TUNING BIGVGAN-V2 FOR ROBUST MUSICAL TUNING PRESERVATION |
| 15139 | FINE-TUNING LARGE MULTIMODAL MODELS FOR AUTOMATIC PRONUNCIATION ASSESSMENT |
| 10500 | FINE-TUNING MODEL WATERMARKS AGAINST EXTRACTION ATTACKS BY REHEARSAL |
| 13974 | FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition |
| 13596 | FINLUMEN: A GAME-THEORETIC MULTI-AGENT FRAMEWORK FOR RATIONAL PORTFOLIO MANAGEMENT |
| 17786 | FINMCP-BENCH: BENCHMARKING LLM AGENTS FOR REAL-WORLD FINANCIAL TOOL USE UNDER THE MODEL CONTEXT PROTOCOL |
| 4643 | FINSENTLLM: MULTI-LLM AND STRUCTURED SEMANTIC SIGNALS FOR ENHANCED FINANCIAL SENTIMENT FORECASTING |
| 10763 | FINUA: GENERATING DIVERSE USER INTERACTIONS FOR FINANCIAL DIALOGUE SYSTEMS THROUGH USER SIMULATION |
| 13629 | FIPNET: SYNERGISTIC FEATURE ENHANCEMENT AND IDENTITY PURIFICATION FOR CLOTHES-CHANGING PERSON RE-IDENTIFICATION |
| 14515 | First Results on RIS-Enabled Multi-Layer Localization: A Joint Terrestrial and Non-Terrestrial Method |
| 7147 | First-order and second-order detectors for matched subspace detection on graphs |
| 8131 | Fisher Scoring algorithm for Time-delay and Doppler estimation |
| 14266 | FIXED-POINT EQUALIZATION IN DIAGONAL EXPECTATION PROPAGATION: SCALAR DECOUPLING AND BAYES-MMSE OPTIMALITY |
| 9399 | FLAME: EMPOWERING FROZEN LLMS FOR KNOWLEDGE GRAPH COMPLETION |
| 14368 | FLASHFOLEY: FAST INTERACTIVE SKETCH2AUDIO GENERATION |
| 4566 | FLASH-UNLEARN: ON-THE-FLY, TRAINING-FREE LARGE LANGUAGE MODELS UNLEARNING THROUGH SUBSPACE DISTRIBUTION FILTERING |
| 15978 | F-LBQ: FINE-GRAINED LOW BIT QUANTIZATION FOR EFFICIENT AND ACCURATE OBJECT DETECTION |
| 13338 | FLEXIBLE FILTER DESIGN USING DEEP OSCILLATORY NEURAL NETWORKS |
| 13151 | FLEXI-LORA: EFFICIENT LORA FINETUNING WITH INPUT-ADAPTIVE DYNAMIC RANKS |
| 9955 | FLEXIO: FLEXIBLE SINGLE- AND MULTI-CHANNEL SPEECH SEPARATION AND ENHANCEMENT |
| 10634 | FLIPCON: FLIPPED CONTRASTIVE LEARNING FOR FINE-GRAINED DOA REPRESENTATION |
| 15940 | FLOW INTELLIGENCE: ROBUST FEATURE MATCHING VIA TEMPORAL SIGNATURE CORRELATION |
| 6050 | FLOW MATCHING-BASED ACTIVE LEARNING FOR RADIO MAP CONSTRUCTION WITH LOW-ALTITUDE UAVS |
| 3587 | FLOWGPT: A GPT-CONDITIONED VISION-MAMBA FRAMEWORK FOR FINE-GRAINED URBAN FLOW INFERENCES |
| 17341 | FLOWIID: SINGLE-STEP INTRINSIC IMAGE DECOMPOSITION VIA LATENT FLOW MATCHING |
| 1742 | FlowMemRep: Automated workflow with Memory-Aware for Smart Contract Vulnerability Repair Using LLMs |
| 6992 | FLOWSE-GRPO: TRAINING FLOW MATCHING SPEECH ENHANCEMENT VIA ONLINE REINFORCEMENT LEARNING |
| 16704 | FlowSGG: A Single-Stage Framework for Dynamic Scene Graph Generation via Temporal Propagation |
| 10081 | Fluid Antenna Assisted Anti-Jamming Communication in Low-Altitude Wireless Networks |
| 18864 | FMAPLS: BAYESIAN LABEL SHIFT ESTIMATION BASED ON DYNAMIC DIRICHLET PARAMETER ADAPTATION |
| 5616 | FM-Fusion: A Flow Matching Approach for Multi-Modal Image Fusion |
| 3660 | FMSP-IR: Frequency Modulation and Structure Priors for All-in-One Image Restoration |
| 5436 | FMTFUSE: EDGE FOURIER-ENHANCED MULTI-SCALE TRANSFORMER FOR MULTI-MODAL IMAGE FUSION |
| 16217 | FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model |
| 15483 | FOCA: MULTIMODAL MALWARE CLASSIFICATION VIA HYPERBOLIC CROSS-ATTENTION |
| 5013 | FOCALCODEC-STREAM: STREAMING LOW-BITRATE SPEECH CODING VIA CAUSAL DISTILLATION |
| 13109 | FOCALLINK: DENSELY MODULATED CONTRASTIVE LEARNING FOR TRAJECTORY ASSOCIATION IN MULTI-OBJECT TRACKING |
| 7735 | FOCUS BEFORE REASONING: A BIDIRECTIONAL SELECTION FRAMEWORK FOR NOISE-MITIGATION IN KNOWLEDGE-BASED VISION QUESTION ANSWERING |
| 10999 | FOCUSFUZZ: TOWARDS EFFICIENT AND DIRECTED FUZZING FOR RTL PROCESSOR DESIGNS |
| 7377 | FODGE: HIGH-FIDELITY DANCE GENERATION VIA FULL-BODY OPTIMIZATION |
| 17851 | FOLEYBENCH: A BENCHMARK FOR VIDEO-TO-AUDIO MODELS |
| 18289 | FOLLOWING THE TRACE: A STRUCTURED PATH TO EMPATHETIC RESPONSE GENERATION WITH MULTI-AGENT MODELS |
| 12551 | FONTMIMICKER: ENHANCING STYLIZED FONT GENERATION VIA FREQUENCY-AWARE DIFFUSION AND DEFORMABLE ALIGNMENT |
| 17249 | FoodCLIP: Advancing Food Analysis via Large-scale Pre-training |
| 1982 | Foreground-Enhanced Coarse-to-Fine Detection for UAV Small Objects |
| 1337 | FORGERYFUSION: DIFFUSION-DRIVEN REAL FACE MODELING FOR GENERALIZABLE FACE FORGERY DETECTION |
| 13297 | FORGETMARK: STEALTHY FINGERPRINT EMBEDDING VIA TARGETED UNLEARNING IN LANGUAGE MODELS |
| 14070 | ForkNet: Direction-Aware and Wavelet-Guided Dual-Encoder Network for Image Fusion |
| 15085 | FORSE: A RETRIEVAL-AUGMENTED FRAMEWORK FOR TIME SERIES FORECASTING |
| 6672 | FORWARD CONVOLUTIVE PREDICTION FOR FRAME ONLINE MONAURAL SPEECH DEREVERBERATION BASED ON KRONECKER PRODUCT DECOMPOSITION |
| 5408 | FORWARD-BACKWARD PRIORS: INTEGRATING PLUG-AND-PLAY AND REGULARIZATION BY DENOISING VIA MONOTONE OPERATOR THEORY |
| 5053 | FOSK: FAST OPEN-VOCABULARY 3D INSTANCE SEGMENTATION VIA CONSENSUS-FILTERED KNOWLEDGE DISTILLATION |
| 9627 | Fostering Accuracy and Generalization Ability in Gaze Estimation by Gaze-Relevant Feature Normalization |
| 1232 | FOUNDATION MODELS-GUIDED MULTI-LEVEL MOTION DECOUPLING VIA GAUSSIAN SPLATTING FOR MONOCULAR VIDEO RECONSTRUCTION |
| 18412 | Fourier Pruning for Large Language Models Compression |
| 16620 | FOURIER REGULARIZATION IN UNROLLED ALGORITHM FOR UNIVERSAL DEMOSAICKING |
| 17858 | FP-ANet: A fixed-point Attention Network for Hybrid-field THz Ultra-Massive MIMO Channel Estimation |
| 10989 | FPGA IMPLEMENTATION OF ACCURATE AND LOW-COST KEYWORD SPOTTING |
| 4265 | FPI-DET: A FACE–PHONE INTERACTION DATASET FOR PHONE-USE DETECTION AND UNDERSTANDING |
| 6638 | Fractal Generative Distillation |
| 18137 | FragLDM: Fragment-Guided Latent Diffusion Model for 3D Molecular Generation |
| 15007 | FRAME-STACKED LOCAL TRANSFORMERS FOR EFFICIENT MULTI-CODEBOOK SPEECH GENERATION |
| 16775 | FREDNET: A FREQUENCY AND DECOMPOSED-SPATIAL NETWORK FOR INDUSTRIAL DEFECT SIGNAL DETECTION |
| 5329 | Free2Frame: A Training-Free Framework for Video Understanding with Memory Boosting |
| 4031 | FREEANIMATE: TRAINING-FREE HUMAN IMAGE ANIMATION WITH PREVIEW-GUIDED DENOISING |
| 18299 | FREQ-DP NET: A DUAL-BRANCH NETWORK FOR FENCE REMOVAL USING DUAL-PIXEL AND FOURIER PRIORS |
| 15544 | FREQKAN: FREQUENCY-DOMAIN KOLMOGOROV-ARNOLD NETWORK FOR ADAPTIVE MODULATION RECOGNITION |
| 13103 | FreqMTA: Multi-Token Attention for Stable Frequency-Domain Long-Term Time Series Forecasting |
| 3852 | FREQUENCY-AWARE CONTRASTIVE LEARNING AND SPECTRAL DISENTANGLEMENT FOR UNSUPERVISED IMAGE DERAINING |
| 7259 | Frequency-Aware Dynamic Graph Learning via Pseudo-Spectral Decomposition for Metro Flow Forecasting |
| 6957 | Frequency-Aware Mamba: Exploiting Frequency-Domain Priors to Alleviate Class Imbalance in Medical Image Segmentation |
| 5142 | FREQUENCY-AWARE Y-SHAPE DECLOUDFORMER FOR SAR-ASSISTED CLOUD REMOVAL |
| 3462 | Frequency-Decoupled Learning for Joint Thin-Cloud Removal and Pansharpening |
| 18958 | FREQUENCY-DIRECTION AWARE MULTICHANNEL SELECTIVE FIXED-FILTER ACTIVE NOISE CONTROL BASED ON MULTI-TASK LEARNING |
| 2790 | Frequency-Domain Driven Recurrent Attention with Linear Complexity for Time Series Forecasting |
| 15794 | FREQUENCY-ENHANCED AND CONFLICT-ADAPTIVE ODE FRAMEWORK FOR TRAINING-FREE CONSISTENT VIDEO EDITING |
| 8188 | FREQUENCY-GUIDED MULTI-LEVEL REASONING FOR SCENE GRAPH GENERATION IN VIDEO |
| 14036 | FREQUENCY-INDEPENDENT AMBISONICS UPSCALING USING DEEP LEARNING |
| 1611 | Frequency-Modulated Differential Transformer for Semantic Segmentation of Remote Sensing Images |
| 16973 | From Base to Novel: Semantic-Guided Visual Concept Transfer in Few-Shot Image Classification |
| 11287 | FROM COLD-START TO STABILIZATION: A DUAL-PROTOTYPE FRAMEWORK FOR ONLINE ANY-SHOT CONTINUAL LEARNING |
| 13732 | FROM CONTRAST TO COMMONALITY: AUDIO COMMONALITY CAPTIONING FOR ENHANCED AUDIO-TEXT CROSS-MODAL UNDERSTANDING IN MULTIMODAL LLMS |
| 16090 | From Decomposition to Fusion: Anomaly Detection with Temporal Correlation-Data Dependency Discrepancy Analysis |
| 8718 | FROM DESIGN TO INDUCTION: A NEW PARADIGM FOR RESPONDENT-CENTRIC PSYCHOLOGICAL SCALE GENERATION |
| 3428 | From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks |
| 10283 | FROM DISTORTION TO EXPRESSION: PARALLEL MULTI-HOP GRAPH SIGNAL PROCESSING UNDER HETEROPHILY |
| 6435 | FROM ECG SIGNALS TO DIAGNOSTIC REPORTS: A UNIFIED FRAMEWORK WITH MULTI-MODAL ENCODER AND FINE-TUNED LLM FOR AUTOMATED REPORT GENERATION |
| 7676 | FROM FIXED POSITIONS TO FREE-FORM SIGNALS: VIRTUAL MICROPHONE SIGNAL ESTIMATION FOR GENERAL-PURPOSE SPATIAL AUDIO PROCESSING |
| 16391 | FROM HALLUCINATION TO ARTICULATION: LANGUAGE MODEL-DRIVEN LOSSES FOR ULTRA LOW-BITRATE NEURAL SPEECH CODING |
| 12187 | From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition |
| 12093 | FROM HYPE TO INSIGHT: RETHINKING LARGE LANGUAGE MODEL INTEGRATION IN VISUAL SPEECH RECOGNITION |
| 13753 | FROM INDEPENDENCE TO INTERACTION: SPEAKER-AWARE SIMULATION OF MULTI-SPEAKER CONVERSATIONAL TIMING |
| 4615 | From Intent to Invocation: A Reasoning-First Framework for Natural Language to Penetration Testing Commands |
| 15882 | FROM KNOWING TO DOING PRECISELY: A GENERAL SELF-CORRECTION AND TERMINATION FRAMEWORK FOR VLA MODELS |
| 17821 | From Lightweight Client Models to a Foundation Model in One Shot with Generative Distillation for Medical Image Segmentation |
| 4713 | From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs |
| 12986 | FROM PAST TO FUTURE: LEVERAGING EVENT CAUSALITY FOR EXPLAINABLE PREDICTION WITH LARGE LANGUAGE MODELS |
| 11268 | FROM PER-TIMESTEP DECIDERS TO HOLISTIC STRATEGY GENERATORS: EVOLVING STRATEGIC COMPLEXITY IN LLMS |
| 12673 | FROM PHASED ARRAYS TO MIMO: RIS-ENABLED WAVEFORM DIVERSITY IN RADAR |
| 6804 | FROM POWERSGD TO POWERSGD+: LOW-RANK GRADIENT COMPRESSION FOR DISTRIBUTED OPTIMIZATION WITH CONVERGENCE GUARANTEES |
| 17938 | FROM PRETRAINING TO ROBUSTNESS: BENCHMARKING SSL MODELS FOR NOISE-ROBUST SPEECH EMOTION RECOGNITION |
| 13556 | FROM SEMANTIC SHIFTS TO CAUSAL CUES: COUNTERFACTUAL LEARNING FOR HATEFUL MEME DETECTION |
| 15565 | From Silent Flows to Speaking Guardians: LLM-Enhanced Framework for IoT Anomaly Detection |
| 2718 | From Synthetic to Wild: Dual Alignment for Unsupervised Domain Adaptation RGBT Crowd Counting |
| 9204 | FROM TOKEN TO LINE: ENHANCING CODE GENERATION WITH A LONG-TERM PERSPECTIVE |
| 15674 | From WMMSE to XMMSE Algorithm: An Old Tune in a Fast New Key |
| 6068 | FRONTEND TOKEN ENHANCEMENT FOR TOKEN-BASED SPEECH RECOGNITION |
| 3857 | F-SEGMAN: FEW-SHOT DOMAIN ADAPTATION CRACK SEGMENTATION |
| 16385 | FS-LoRA: Fast and Slow Low-Rank Adaptation for Class Incremental Learning |
| 2035 | FTIN: FREQUENCY-TIME INTEGRATION NETWORK FOR INERTIAL ODOMETRY |
| 11182 | FULL BAND DENOISING OF ROOM IMPULSE RESPONSE IN THE WAVELET DOMAIN WITH DICTIONARY LEARNING |
| 10707 | FULL-DUPLEX-BENCH V1.5: EVALUATING OVERLAP HANDLING FOR FULL-DUPLEX SPEECH MODELS |
| 5682 | FULL-TO-MISSING MODALITY KNOWLEDGE DISTILLATION FOR MULTIMODAL 3D SEMANTIC SEGMENTATION |
| 16821 | FUN-SSL: FULL-BAND LAYER FOLLOWED BY U-NET WITH NARROW-BAND LAYERS FOR MULTIPLE MOVING SOUND SOURCE LOCALIZATION |
| 2681 | FUSEMOS: PERCEPTUAL EVALUATION OF TEXT-TO-MUSIC GENERATION WITH DUAL-ENCODER FUSION AND RANKING-AWARE COMPOSITE LOSS |
| 9716 | Fusing Image and Saliency Modalities for Robust Label Restoration with Transformers |
| 10202 | FUSION OF TRANSFORMER AND CNN ATTENTION NETWORKS FOR LEARNED IMAGE COMPRESSION |
| 7535 | FUSIONEDIT: SEMANTIC FUSION AND ATTENTION MODULATION FOR TRAINING-FREE IMAGE EDITING |
| 15527 | FUZZY MEMBERSHIP-ENHANCED UNCERTAINTY-AWARE FUSION FOR MULTI-VIEW CLASSIFICATION |
| 10802 | FWF-NET: A LEARNABLE FOURIER-WAVELET FUSION NETWORK FOR PDE OPERATOR LEARNING |
| 15150 | FW-VTON: FLATTENING-AND-WARPING FOR PERSON-TO-PERSON VIRTUAL TRY-ON |
| 2903 | FXSEARCHER: GRADIENT-FREE TEXT-DRIVEN AUDIO TRANSFORMATION |
| 4110 | G2LST: Global to Local Stackelberg Decision Model for computation offloading under Mobile Internet of Things |
| 5400 | G2P-Rec: Graph-to-Prompt Synergistic Reasoning for Knowledge-Enhanced Recommendation |
| 13841 | G4CDR: A 4D GeoSOT Grid-Graph for Real-Time UAV Conflict Detection and Resolution |
| 12633 | G-AFS: GRAPH-GUIDED ADAPTIVE KEYFRAME SAMPLING FOR VIDEO SUMMARIZATION |
| 10308 | GALA: DUAL ALIGNMENTS FOR UNSUPERVISED DOMAIN ADAPTATION WITH LIMITED SOURCE LABELS |
| 18088 | GalaxyEdit: Large Scale Image Editing Dataset with Enhanced Diffusion Adapter |
| 14104 | GAME-THEORETIC INSIGHTS INTO MULTI-AGENT LLM DEBATE FOR ENHANCED CLINICAL QUESTION ANSWERING |
| 9827 | GAME-TIME: EVALUATING TEMPORAL DYNAMICS IN SPOKEN LANGUAGE MODELS |
| 17323 | GAMMA: GENERALIZABLE AI-GENERATED IMAGE DETECTION VIA MULTI-TASK AND MANIPULATION-AUGMENTED SUPERVISION |
| 13761 | GaRA: Gated Low-rank Adaptation for Fine-tuning Time-series Foundation Models |
| 9482 | GAUSSIAN CLOUD MODEL BAYESIAN NEURAL NETWORKS: A VARIATIONAL INFERENCE FRAMEWORK FOR RELIABLE PREDICTION |
| 16645 | GAUSSIAN LOCALITY PRIOR FOR CONTRAST–RECONSTRUCTION LEARNING: STATE–SPACE MODEL-BASED TIME–SERIES ANOMALY DETECTION |
| 17921 | GAUSSIAN MESH RENDERER FOR LIGHTWEIGHT DIFFERENTIABLE RENDERING |
| 11362 | Gaussian Process State-Space Models for Irregularly Sampled Sequential Data |
| 16572 | Gaussian Processes for Sensor Repositioning in PDE-Driven Systems |
| 11307 | GAUSSIAN SPATIAL INTERACTION WITH LONG-RANGE CONTEXT FUSION FOR RADAR-CAMERA 3D OBJECT DETECTION |
| 1231 | GAUSSIAN SPLATTING WITH HYBRID DEFORMATION AND MULTI-SCALE DEPTH REGULARIZATION FOR DYNAMIC SINGLE-VIEW VIDEO RECONSTRUCTION |
| 1356 | GAUSSIAN2SCENE: 3D SCENE REPRESENTATION LEARNING VIA SELF-SUPERVISED LEARNING WITH 3D GAUSSIAN SPLATTING |
| 13404 | Gaussian-grounded Contextual Hierarchical Inference for Weakly Supervised Video Anomaly Detection |
| 7381 | GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer |
| 17420 | GCE-UQ: QUANTIFYING AND DECOMPOSING UNCERTAINTY IN GRAPH COUNTERFACTUAL EXPLANATIONS |
| 3495 | GCFNET: GLOBAL FEATURE ENHANCEMENT AND CLUSTERING-BASED RECONSTRUCTION FOR INDUSTRIAL IMAGE CAPTIONING |
| 16769 | GDCNET: GENERATIVE DISCREPANCY COMPARISON NETWORK FOR MULTIMODAL SARCASM DETECTION |
| 13310 | GD-COLLAB: GENERATOR-DISCRIMINATOR MULTI-AGENT COLLABORATION FOR AUTOMATED GRAPH ANOMALY DETECTION |
| 3235 | GDIFFUSE: DIFFUSION-BASED SPEECH ENHANCEMENT WITH NOISE MODEL GUIDANCE |
| 2463 | GEIA: GENERATIVE ENHANCEMENT INVERSION ATTACK TARGETING MACHINE UNLEARNING |
| 9157 | GELINA: UNIFIED SPEECH AND GESTURE SYNTHESIS VIA INTERLEAVED TOKEN PREDICTION |
| 5446 | GEN3D: GENERATING DOMAIN-FREE 3D SCENES FROM A SINGLE IMAGE |
| 18117 | GENCHO: ROOM IMPULSE RESPONSE GENERATION FROM REVERBERANT SPEECH AND TEXT VIA DIFFUSION TRANSFORMERS |
| 11345 | GENDEN-GS: GENERATIVE-PRIOR-DRIVEN DENSIFICATION FOR SPARSE-VIEW 3D GAUSSIAN SPLATTING |
| 16306 | GENERALIZABILITY OF PREDICTIVE AND GENERATIVE SPEECH ENHANCEMENT MODELS TO PATHOLOGICAL SPEAKERS |
| 1050 | GENERALIZABLE DETECTION OF AUDIO DEEPFAKES |
| 3584 | GENERALIZABLE SPECULAR SCENE RECONSTRUCTION VIA ANISOTROPIC FILTERING AND ASG-ENHANCED GAUSSIAN SPLATTING |
| 12240 | GENERALIZABLE SPEECH DEEPFAKE DETECTION VIA INFORMATION BOTTLENECK ENHANCED ADVERSARIAL ALIGNMENT |
| 12375 | GENERALIZABLE SPEECH DEEPFAKE DETECTION VIA META-LEARNED LORA |
| 15249 | Generalization In One-Step Contextual Bandit Based DoA Estimation With Passive Backscatter Tags |
| 1573 | GENERALIZED MULTIDIMENSIONAL CHINESE REMAINDER THEOREM (MD-CRT) FOR MULTIPLE INTEGER VECTORS |
| 15313 | GENERATING LOCALIZED AUDIBLE ZONES USING A SINGLE-CHANNEL PARAMETRIC LOUDSPEAKER |
| 1582 | Generating Moving 3D Soundscapes with Latent Diffusion Models |
| 15476 | GENERATING TRAINING TARGETS FOR REAL-WORLD SPEECH ENHANCEMENT VIA CLOSE-TO-DISTANT MICROPHONE PROJECTION |
| 14533 | GENERATIVE AUDIO EXTENSION AND MORPHING |
| 5194 | GENERATIVE MODEL-BASED COMPRESSED SENSING FOR MMWAVE CHANNEL ESTIMATION THROUGH SEQUENTIAL PATH RECONSTRUCTION |
| 8883 | GENERATIVE MULTI-MODAL EXPLAINABLE RECOMMENDATION |
| 17810 | Generative Spatiotemporal Modeling for Uncertainty Quantification in High-Dimensional Physical Systems |
| 2450 | GENFACTS-GENERATIVE COUNTERFACTUAL EXPLANATIONS FOR MULTI-VARIATE TIME SERIES |
| 6258 | GenFRC: Generative Feature Replay and Calibration for Non-Exemplar Class-Incremental Learning |
| 16419 | GenLie: A Global-Enhanced Lie Detection Network under Sparsity and Semantic Interference |
| 8212 | GEN-SER: WHEN THE GENERATIVE MODEL MEETS SPEECH EMOTION RECOGNITION |
| 1180 | GEODESIC PROTOTYPE MATCHING VIA DIFFUSION MAPS FOR INTERPRETABLE FINE-GRAINED RECOGNITION |
| 13864 | Geo-Human: Geometrically-Guided 3D Gaussian Splatting for High-Fidelity Human Reconstruction under Sparse Views |
| 5556 | GEOMETRIC CONSTRAINT-ENHANCED DATA ASSOCIATION FOR MULTI-TARGET LOCALIZATION IN DISTRIBUTED MIMO RADAR SYSTEMS |
| 5986 | GEOMETRIC IMAGE SYNCHRONIZATION WITH DEEP WATERMARKING |
| 13498 | GEOMETRY-AWARE RECONSTRUCTION OF LARGE VISION-LANGUAGE MODELS FROM DENSE INTO MIXTURE-OF-EXPERTS |
| 14142 | GHIN: GATED HIERARCHICAL INTERACTION NETWORK FOR MULTIMODAL SARCASM DETECTION |
| 4628 | GIFT: A Generative Imagined Fine-Tuning Framework for Visual Place Recognition |
| 7901 | GIREG: GEOMETRIC-IMAGE COLLABORATIVE POINT CLOUD REGISTRATION |
| 14246 | GLA-GRAD++: AN IMPROVED GRIFFIN-LIM GUIDED DIFFUSION MODEL FOR SPEECH SYNTHESIS |
| 6259 | GLAP: General contrastive audio-text pretraining across domains and languages |
| 10637 | GLASS-SAM: TRANSPARENT OBJECT SEGMENTATION USING FRACTAL-ENHANCED SAM WITH SHAPE CONTEXT-BASED REWARD |
| 2622 | GLDPC-Net: Global-Local Dual-Scale Fusion and Geometry-aware Synchronization for Denoising Point Cloud Completion |
| 5744 | Global Context-Aware Multi-Instance Learning for Whole Slide Image Classification |
| 14370 | GLORIA: GATED LOW-RANK INTERPRETABLE ADAPTATION FOR DIALECTAL ASR |
| 11688 | GLUCOAPRL: AHEAD-PLANNING REINFORCEMENT LEARNING MECHANISM FOR SAFE BLOOD GLUCOSE REGULATION |
| 11571 | GLUCOMIXER: AN EFFICIENT GLUCOSE MONITORING MODEL WITH MIXERS |
| 16124 | GLUE: Gradient-free Learning to Unify Experts |
| 14501 | GMAMBAFLOW: GLOBAL-AWARE MAMBA BASED COST VOLUME AGGREGATION FOR OPTICAL FLOW |
| 15601 | GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining |
| 13900 | Goal-Oriented Joint Source–Channel Coding: Distortion–Classification–Power Trade-off |
| 11614 | GOFN: GRADIENT ORTHOGONAL FUSION NETWORK FOR SINGLE-IMAGE TRANSPARENT WATERMARK DETECTION AND REMOVAL |
| 17685 | GO-MLVTON: GARMENT OCCLUSION-AWARE MULTI-LAYER VIRTUAL TRY-ON WITH DIFFUSION MODELS |
| 12290 | GPS-GS: GEOMETRY-AWARE PROGRESSIVE OPTIMIZATION WITH SYNERGISTIC PSEUDO-VIEWS FOR SPARSE-VIEW GAUSSIAN SPLATTING |
| 11677 | GRADERAG: BLACK-BOX SEMANTIC PATH INJECTION ATTACKS ON GRAPH RAG SYSTEMS |
| 2913 | GradFusion: Recognition-Compatible Face Anonymization via Semantic Gradient Editing and Latent Fusion |
| 11842 | GRADIENT BOOSTING FOR ONLINE TWO-STAGE ADAPTIVE GROUP TESTING |
| 17841 | GRADIENT-GUIDED LEARNABLE WINDOW ATTENTION FOR EDGE-ENHANCED SUPER-RESOLUTION |
| 15376 | GRAM-SCHMIDT FEATURE SELECTION FOR CLASS ACTIVATION MAPS |
| 5709 | GRANULAR-BALL BASED MULTI-VIEW OUTLIER DETECTION |
| 4639 | GRAPH DISTRIBUTION-VALUED SIGNALS: A WASSERSTEIN SPACE PERSPECTIVE |
| 6169 | GRAPH FOURIER TRANSFORMER WITH STRUCTURE-FREQUENCY INFORMATION |
| 13174 | Graph Hodge-Laplacian Particle Filtering for Communication-Efficient Distributed Tracking |
| 19147 | GRAPH LAPLACIAN LEARNING WITH EXPONENTIAL FAMILY NOISE |
| 18899 | Graph Neural Network-Based GrUNet and Attention Transformer Adjacency Matrix for Video Denoising |
| 4971 | GRAPH NEURAL NETWORK-BASED REINFORCEMENT LEARNING FOR COOPERATIVE NETWORK LOCALIZATION |
| 15387 | GRAPH NEURAL NETWORKS IN LARGE SCALE WIRELESS COMMUNICATION NETWORKS: SCALABILITY ACROSS RANDOM GEOMETRIC GRAPHS |
| 6316 | GRAPH NEURAL NETWORKS WITH DIVERSITY-AWARE NEIGHBOR SELECTION AND DYNAMIC MULTI-SCALE FUSION FOR MULTIVARIATE TIME SERIES FORECASTING |
| 10274 | Graph of Thoughts Signal Modeling for Sequential Recommendation |
| 17456 | Graph Signal Generative Diffusion Models |
| 14676 | Graph Topological Rectification with Guaranteed Reduction of Class Ambiguous Regions |
| 17179 | GRAPH TRANSFORMERS FOR AUTOMOTIVE RADAR CLUTTER AND TARGET CLASSIFICATION AT THE EDGE |
| 14297 | Graph-Aware Diffusion for Signal Generation |
| 16889 | GRAPH-AWARE LEARNING RATES FOR DECENTRALIZED OPTIMIZATION |
| 11615 | Graph-based 3D Human Pose Estimation using WiFi Signals |
| 18248 | GRAPH-BASED EMOTION CONSENSUS PERCEPTION LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATION |
| 11658 | Graph-Based Image Selection for High-Quality 3D Gaussian Splatting |
| 2129 | Graph-Based Learning of Spectro-Topographical EEG Representations with Gradient Alignment for Brain-Computer Interfaces |
| 5523 | GRAPH-BASED MODELING OF HETEROGENEOUS DATA FUSION WITH ENTERPRISE ASSOCIATION RELATIONSHIPS: ENHANCING CORPORATE CREDIT RATING |
| 14304 | GRAPHEME: GRAPH NEURAL NETWORKS WITH MULTI-EXPERT FUSION FOR EMOTION-CAUSE PAIR EXTRACTION |
| 15766 | GRAPH-ENHANCED PROTOTYPE ADAPTATION FOR CROSS-DOMAIN FEW-SHOT OBJECT DETECTION |
| 11661 | GRAPH-GUIDED CONTRASTIVE LEARNING FOR INCOMPLETE MULTI-VIEW CLUSTERING WITH CONSISTENT GLOBAL GRAPH |
| 13049 | GRAPH-MAMBA COLLABORATIVE LEARNING NETWORK FOR CAMOUFLAGED OBJECT DETECTION |
| 5920 | GRAPHMD: A TWO-MODULE DIFFUSION FRAMEWORK FOR SMOOTH AND CONSISTENT MOLECULAR DYNAMICS |
| 5964 | GRAPHPL: LEVERAGING GNN FOR EFFICIENT AND ROBUST MODALITIES IMPUTATION IN PATCHWORK LEARNING |
| 4944 | GRASP: GRoup-shApley feature Selection for Patients |
| 4370 | GratingNet: A Novel 1D-CNN-BiLSTM Architecture with Attention for Optical Grating Parameter Measurement from Diffraction Spectra |
| 16992 | Grey-Box Prompt Tuning with Graph Alignment for Speech-Language Models |
| 9426 | GRIDLESS DOA ESTIMATION FOR LARGE-SCALE WIDEBAND MODELS: A NONCONVEX FACTORED l0 ATOMIC NORM APPROACH |
| 10562 | Gridless Non-coherent DOA Estimation for Uniform Linear Arrays Aided by a Reference Signal with Periodic Phase Variation |
| 2518 | GRIT: Grounding Through Reasoning and Iterative Thinking in Adverse Weather |
| 5234 | GRNet: Graph Reconstruction Network for Robust Multimodal Sentiment Analysis |
| 14331 | Gromov-Wasserstein Graph Coarsening |
| 13406 | GROUP RELATIVE POLICY OPTIMIZATION FOR TEXT-TO-SPEECH WITH LARGE LANGUAGE MODELS |
| 3596 | Group-Sparse Gaussian Process Regression for Inhomogeneous Sound Field Estimation |
| 1791 | GS-3I: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images |
| 4875 | GSDFUSE: CAPTURING COGNITIVE INCONSISTENCIES FROM MULTI-DIMENSIONAL WEAK SIGNALS IN SOCIAL MEDIA STEGANALYSIS |
| 9001 | G-SFADA: GRADIENT-INSPIRED SOURCE-FREE ACTIVE DOMAIN ADAPTATION FOR SEMANTIC SEGMENTATION |
| 3591 | GS-MARK: DEEP ROBUST WATERMARKING FOR GRAPH SIGNALS |
| 1965 | GSPrivacy: Attribute-Preserving Face Anonymous Framework via Fully Controllable Gaussian Head Avatar |
| 6061 | GSTA: EFFICIENT TRAINING SCHEME WITH SIESTAED GAUSSIANS FOR MONOCULAR 3D SCENE RECONSTRUCTION |
| 18409 | GSTNET: A GEOSPATIAL-TEMPORAL GRAPH NETWORK FOR GROUP PERSON RE-IDENTIFICATION |
| 15050 | GTCL: Graph-Text Contrastive Learning meets Log Anomaly Detection |
| 7143 | GTFMN: Guided Texture and Feature Modulation Network for Low-Light Image Enhancement and Super-Resolution |
| 12387 | GTLITEPOSE: A LIGHTWEIGHT ARCHITECTURE MODEL INTEGRATING GRAPH CONVOLUTION AND TRANSFORMER |
| 16321 | GTMA: Dynamic Representation Optimization for OOD VLMs |
| 10255 | GUI-ARP: ENHANCING GROUNDING WITH ADAPTIVE REGION PERCEPTION FOR GUI AGENTS |
| 4469 | GUIDED BAYESIAN CONSOLIDATION FOR CLASS-INCREMENTAL CONTINUAL LEARNING THROUGH VARIATIONAL CONSTRAINTS AND NOISE PERTURBATIONS |
| 13913 | GUIDING EFFICIENT LLM INSTRUCTION-TUNING VIA GRADIENT FLOW MATCHING |
| 15812 | GVNP-GS: GEOMETRY-ANCHORED AND VIEW-AWARE NEURAL PROXIES FOR SPARSE-VIEW GAUSSIAN SPLATTING |
| 10075 | H²DFD: SELF-SUPERVISED FAKE NEWS DETECTION VIA A NOVEL HYPERBOLIC HYPERGRAPH DIFFUSION MODEL |
| 10599 | H3GM: HISTORY-GUIDED GLOBAL GEOMETRIC METRIC FOR SINGLE IMAGE TO 3D SCENE GENERATION |
| 16036 | HACG: Contribution-Based Dynamic Grouping with Hierarchical Graph Attention for Multi-Agent Cooperation |
| 13184 | HAD: HYBRID ADVERSARIAL DISTILLATION AGAINST ADVERSARIAL ATTACKS |
| 18168 | Hadamard Tensor Ring for Efficient Low-Rank Fine-Tuning |
| 3358 | HADEN: Hierarchical Attentive Alignment and Dual-Contrastive Enhancement Network for Multimodal Few-Shot Relation Extraction |
| 4260 | HAIR NOISE ANALYSIS AND MITIGATION FOR SMART GLASSES AUDIO CAPTURES |
| 2218 | HALLUCINATION DETECTION VIA INTERNAL STATES AND STRUCTURED REASONING CONSISTENCY IN LARGE LANGUAGE MODELS |
| 18109 | HAM-SAM2: ENHANCING SAM2 FOR VISUAL OBJECT TRACKING WITH ADAPTIVE MOTION MODELING AND HIERARCHICAL MEMORY BANK |
| 5279 | HandFusion: Efficient Cross-modal Fusion Network for RGB-D based 3D Hand Mesh Reconstruction |
| 3387 | Handling Heterogeneous Features: Modeling Continuous-Discrete Feature Interaction for Time Series Anomaly Detection via Conditional Diffusion |
| 15400 | Hanui: Harnessing Distributional Discrepancies for Singing Voice Deepfake Detection |
| 11503 | HAO-QCB: Towards Robust Quantization-Conditioned Backdoor Attack with Hidden Activation Offset |
| 13519 | Hardware-Efficient Cognitive Radar: Multi-Target Detection with RL-Driven Transmissive RIS |
| 3037 | HARMONET: MUSIC GROUNDING BY SHORT VIDEO VIA HARMONIC RESAMPLE AND DYNAMIC SPARSE ALIGNMENT |
| 14735 | HARMONIC PARAMETER DESIGN IN THE APPROXIMATED ONE-BIT HERMITE LAW |
| 10279 | HarmoniFuse: A Component-Selective and Prompt-Adaptive Framework for Multi-Task Speech Language Modeling |
| 10042 | Harmonized Evolutionary Reinforcement Learning |
| 3306 | HARNESSING MASKED GENERATIVE TRANSFORMERS FOR EFFECTIVE KNOWLEDGE DISTILLATION |
| 12443 | Harnessing the Gradient: Enhanced Cross-Prompt Attacks on Large Vision-Language Models |
| 19099 | HARNESSING WAVEFRONT CURVATURE AND SPATIAL CORRELATION IN NONCOHERENT MIMO COMMUNICATIONS |
| 18197 | HASAP: HIERARCHICAL ACOUSTIC-SEMANTIC ANNOTATION PIPELINE FOR SCRIPTED SPEECH DATA |
| 14268 | Hashing-Baseline: Rethinking hashing in the age of pretrained models |
| 8158 | HA-VITNET: DUAL-DOMAIN COLLABORATIVE LEARNING FOR SEMANTIC SEGMENTATION OF HIGH-RESOLUTION REMOTE SENSING IMAGES |
| 7046 | HAVT-IVD: HETEROGENEITY-AWARE CROSS-MODAL NETWORK FOR AUDIO-VISUAL SURVEILLANCE: IDLING VEHICLES DETECTION WITH MULTICHANNEL AUDIO AND MULTISCALE VISUAL CUES |
| 4368 | HCGAN: HARMONIC-COUPLED GENERATIVE ADVERSARIAL NETWORK FOR SPEECH SUPER-RESOLUTION IN LOW-BANDWIDTH SCENARIOS |
| 7596 | HCL-CSC: HIERARCHICAL CONTRASTIVE LEARNING WITH IDS-AWARE CHARACTER SIMILARITY FOR CHINESE SPELLING CORRECTION |
| 7590 | HC-MONET: HIERARCHICAL CONTINUOUS MASKED OPERATOR NETWORK WITH A SHARED SPECTRAL MIXTURE TIME KERNEL FOR IRREGULAR TIME SERIES |
| 5415 | HCPT: Hierarchical Cross-modal Prompt Tuning |
| 15278 | HCTPOSE: HYBRID CNN-TRANSFORMER NETWORK FOR SELF-SUPERVISED MULTI-VIEW 3D HUMAN POSE ESTIMATION |
| 17659 | HD-NEXUS: A HIERARCHICAL DECOUPLING FRAMEWORK FOR MULTI-MODAL, MULTI-TASK ASSISTIVE DRIVING PERCEPTION |
| 6188 | HD-PPT: HIERARCHICAL DECODING OF CONTENT- AND PROMPT-PREFERENCE TOKENS FOR INSTRUCTION-BASED TTS |
| 18875 | HDRSL Net for Accurate High Dynamic Range Imaging-Based Structured Light 3D Reconstruction |
| 15725 | HEAD-AWARE VISUAL CROPPING: ENHANCING FINE-GRAINED VQA WITH ATTENTION-GUIDED SUBIMAGE |
| 14303 | Heatmap-to-SMPL Multi-View Radar Transformer for Multi-Person 3D Pose Estimation |
| 9208 | HEBBIAN LEARNING WITH GLOBAL DIRECTION |
| 9403 | HELA: HYPER-EFFICIENT LIGHTWEIGHT ARCHITECTURE FOR IMAGE FINE-TUNING |
| 15973 | HEMD-SEGNET: A HIERARCHICAL ENCODER-MIXER-DECODER SEGMENTATION NETWORK FOR EXTRACTING LAKES FROM REMOTE SENSING IMAGES |
| 13683 | HERGNET: A FAST NEURAL SURROGATE MODEL FOR SOUND FIELD PREDICTIONS VIA SUPERPOSITION OF PLANE WAVES |
| 4981 | HETEROGENEOUS ADVERSARIAL FEDERATED LEARNING |
| 15881 | Heterogeneous Feature Mutual-Calibration Assisted Online Distillation for Efficient Face Anti-Spoofing |
| 7665 | Heterogeneous Parallel Framework with Spatio-temporal Conditional Random Field for 3D Human Pose Estimation |
| 2944 | HETEROGENEOUS SELF-SUPERVISED ACOUSTIC PRE-TRAINING WITH LOCAL CONSTRAINTS |
| 10918 | HETEROGENEOUS SPATIAL TEMPORAL GRAPH NEURAL NETWORK FOR MULTIVARIATE TIME SERIES FORECASTING |
| 13571 | HEURISTIC SYNTHESIS FROM BELIEF STATES: ROBUST PLANNING UNDER AMBIGUOUS NATURAL LANGUAGE INSTRUCTIONS |
| 1858 | HFDFORMER: MONOCULAR 3D HUMAN RECONSTRUCTION VIA LAYER-WISE HIERARCHICAL FEATURE DECOUPLING TRANSFORMER |
| 13715 | HFGNET: MITIGATING BOUNDARY DISTORTION FOR SONAR IMAGE SEGMENTATION WITH HIGH FREQUENCY GUIDANCE STRATEGY |
| 11543 | HFSQVAE: HIERARCHICAL VECTOR QUANTIZATION WITH RESIDUALS FOR FREQUENCY-SPECIFIC EMBEDDING |
| 12743 | HGAN-SDEs: Learning Neural Stochastic Differential Equations with Hermite-Guided Adversarial Training |
| 15665 | HIBAR: A HIDDEN BACKDOOR ATTACK ON LLM RECOMMENDATION SERVICES VIA MULTI-TURN DIALOGUE MANIPULATION |
| 10316 | HIDIFF-ENERGY: A HIERARCHICAL DIFFUSION MODEL FOR MULTI-SCALE LONG-TERM ENERGY DATA GENERATION |
| 12782 | HIERARCHICAL ACTIVITY RECOGNITION AND CAPTIONING FROM LONG-FORM AUDIO |
| 4528 | Hierarchical Channel Aggregation with Entropy-Driven Distillation for Federated Segmentation |
| 13476 | Hierarchical Contrastive Learning of Point Clouds Based on P-Norm Pooling |
| 8278 | HIERARCHICAL CONTRASTIVE LEARNING WITH SPEECH LANGUAGE MODEL FOR SEPARATING SIMILAR SPEAKERS |
| 10424 | HIERARCHICAL CORRELATION COST VOLUME FOR STEREO MATCHING |
| 14397 | Hierarchical Discrete Flow Matching for Multi-Codebook Codec-based Text-to-Speech |
| 2812 | Hierarchical Graph Convolutional Network with Depression-oriented Priors |
| 17016 | HIERARCHICAL MARL FOR TASK ALLOCATION: DISTRIBUTED SUBTASK SELECTION WITH MUTUAL INFORMATION |
| 5867 | HIERARCHICAL ORTHOGONAL RESIDUAL SPREAD FOR PRECISE MASSIVE EDITING IN LARGE LANGUAGE MODELS |
| 17643 | Hierarchical Patch Collaboration with DINOv3 for Efficient Dichotomous Image Segmentation |
| 3016 | Hierarchical Solver for Reassembling Mixed Puzzles of Eroded Gaps |
| 12082 | Hierarchical Sparse Vector Transmission for Ultra Reliable and Low Latency Communications |
| 10026 | HIERARCHICAL TOKENIZATION OF MULTIMODAL MUSIC DATA FOR GENERATIVE MUSIC RETRIEVAL |
| 17319 | Hierarchical Voting Decoder for Resolving Knowledge Conflicts |
| 10024 | Hierarchy-aware Dynamic Contrastive Learning and Structural Relation Constraints for Hierarchical Text Classification |
| 18159 | HierSG: Hierarchical Semantic Gaussian Representation for 3D Occupancy |
| 16428 | HiFi-HARP: A High-Fidelity 7th-Order Ambisonic Room Impulse Response Dataset |
| 17208 | Hi-Former: A Hierarchical Transformer Pedestrian-Vehicle Detector |
| 10307 | HIGH QUALITY UNDERWATER IMAGE COMPRESSION WITH ADAPTIVE COLOR CORRECTION |
| 16442 | Higher-Order Feature Attribution: Bridging Statistics, Explainable AI, and Topological Signal Processing |
| 18081 | HIGH-FIDELITY SPEECH ENHANCEMENT VIA DISCRETE AUDIO TOKENS |
| 2899 | HIGH-FREQUENCY DETAIL COMPENSATION AND MULTI-SCALE FEATURE FUSION NET FOR UAV REMOTE SENSING OBJECT DETECTION |
| 5259 | High-Frequency-Aware Omni-Aggregation Transformer for Image Super-Resolution |
| 8199 | HIGH-LOW FREQUENCY NETWORK FOR SPACE-TIME VIDEO SUPER-RESOLUTION |
| 17001 | HIGH-QUALITY TRANSMISSION OF HYPERSPECTRAL IMAGE BASED ON SEMANTIC COMMUNICATION |
| 17386 | High-resolution Contrastive Framework for Generalizable AI-generated Image Detection |
| 18921 | HILBERT TRANSFORM ON GRAPHS: LET THERE BE PHASE |
| 16435 | HILO: HIERARCHICAL FEATURE FUSION VIA LOCAL-GLOBAL ATTENTION FOR MULTIMODAL EMBEDDINGS |
| 13843 | HIMNN: A HIERARCHY-AWARE MULTIMODAL NEURAL NETWORK FOR ELECTROLYTE FORMULATIONS PROPERTY PREDICTION |
| 10381 | HINT: COMPOSED IMAGE RETRIEVAL WITH DUAL-PATH COMPOSITIONAL CONTEXTUALIZED NETWORK |
| 9546 | HINT: HIERARCHICAL INTER-FRAME CORRELATION FOR ONE-SHOT POINT CLOUD SEQUENCE COMPRESSION |
| 15363 | HIPPOCAMPAL-INSPIRED ASSOCIATE MEMORY FRAMEWORK FOR FEW-SHOT CLASSIFICATION |
| 13980 | HI-READER: A HIERARCHICAL COGNITIVE FRAMEWORK FOR MULTI-PAGE DOCUMENT VISUAL QUESTION ANSWERING |
| 16011 | HISEM-RL: HIERARCHICAL SEMANTIC-DRIVEN REINFORCEMENT LEARNING FOR ADAPTIVE VR VIDEO TRANSMISSION |
| 2508 | HISTORICAL INTERACTION RETROSPECTIVE NETWORK FOR TEMPORAL KNOWLEDGE GRAPH REASONING |
| 12371 | HIUFORMER: A HIERARCHICAL U-SHAPED TRANSFORMER WITH FREQUENCY-DIVIDED DUAL-PATH ATTENTION FOR MULTIVARIATE TIME SERIES FORECASTING |
| 11925 | HLF: A HIERARCHICAL LOCALIZATION FRAMEWORK FOR JOINT MOMENT RETRIEVAL AND HIGHLIGHT DETECTION |
| 15465 | HM-AVATAR: TOWARDS REALISTIC LOOSE GARMENT MODELING WITH HIERARCHICAL MLPS |
| 1212 | HMD: Enhancing Vision Transformer Distillation via Mask Reconstruction |
| 16942 | HMVLA: HYPERBOLIC MULTIMODAL FUSION FOR VISION-LANGUAGE-ACTION MODELS |
| 12832 | H-NNPBFDAF: HIERARCHICAL NEURAL NETWORK PARTITIONED BLOCK FREQUENCY DOMAIN ADAPTIVE FILTER WITH NOVEL BLOCK ACTIVATION PROBABILITY |
| 2366 | Holographic Transformers for Complex-Valued Signal Processing: Integrating Phase Interference into Self-Attention |
| 9726 | Homomorphic Convolution Reimagined: Eliminating Rotation Bottlenecks for Practical Privacy-Preserving CNN Inference |
| 17330 | Homomorphic-Controlled Augmentation for Time Series Forecasting |
| 5282 | HO-MUCI: Hierarchical Optimization-Driven Path Planning for Multi-UAV Regional Collaborative Inspection |
| 5851 | HORIZON: A UNIFIED FRAMEWORK FOR PHASE-WISE RETRIEVAL-GENERATION OPTIMIZATION IN TASK-ORIENTED DIALOGUE SYSTEM |
| 17639 | HOTGAD: HIGH-ORDER AND TEMPORAL PATTERN RECONSTRUCTION FOR DYNAMIC GRAPH ANOMALY DETECTION |
| 11910 | HOT-P: HIERARCHICAL OPTIMAL TRANSPORT PROTOTYPING FOR SELF-SUPERVISED LEARNING |
| 10760 | HOW CAN QUANTUM DEEP LEARNING IMPROVE LARGE LANGUAGE MODELS? |
| 9786 | HOW DOES CRAMER-RAO BOUND ANALYSIS BENEFIT OPPORTUNISTIC RAIN FIELD RECONSTRUCTION |
| 6635 | How Does Instrumental Music Help SingFake Detection? |
| 9582 | HOW FAR DO SSL SPEECH MODELS LISTEN FOR TONE? TEMPORAL FOCUS OF TONE REPRESENTATION UNDER LOW-RESOURCE TRANSFER |
| 6792 | HOW MANY IRSs ARE REQUIRED TO REALIZE A FULL-RANK MIMO CHANNEL? |
| 14694 | HOW TO LABEL RESYNTHESIZED AUDIO? THE DUAL ROLE OF NEURAL AUDIO CODECS IN AUDIO DEEPFAKE DETECTION |
| 12703 | HPC-NERF: INTEGRATING HIGH-FIDELITY POINT CLOUDS WITH NEURAL RADIANCE FIELDS FOR ENHANCED 3D RECONSTRUCTION |
| 15823 | HPTune: Hierarchical Proactive Tuning for Collision-Free Model Predictive Control |
| 13480 | HREI: HYBRID LONG-SHORT RETRIEVAL AND EFFICIENT INFERENCE FOR KNOWLEDGE BASE QUESTION ANSWERING |
| 7900 | HSI-DM: TRAINING-FREE HIERARCHICAL STYLE INJECTION IN DIFFUSION MODELS FOR NATURAL CONTENT-STYLE FUSION |
| 16997 | HSRI: High-fidelity Shape Representation with Image Guidance |
| 10790 | HSSDCT: Factorized Spatial-Spectral Correlation for Hyperspectral Image Fusion |
| 10814 | HUMAN MESH RECOVERY FROM PARTIAL POINT CLOUD WITHOUT HUMAN ANNOTATIONS |
| 10496 | HUMAN-CENTRIC IMAGE EDITING VIA MOE-UNET DENOISING |
| 16425 | HUNT: DETECTING HALLUCINATIONS VIA MULTI-LAYER DISCRIMINATIVE REPRESENTATIONS IN LARGE LANGUAGE MODELS |
| 4863 | HUNTING THE STREAM: AN EFFICIENT AND LIGHTWEIGHT APPROACH FOR ENCRYPTED HLS LIVE STREAMING TRAFFIC IDENTIFICATION |
| 18100 | HuntingLLM: Risk-Driven Automated Red Teaming with Adaptive Attack Agents |
| 14466 | HVAC-EAR: EAVESDROPPING HUMAN SPEECH USING HVAC SYSTEMS |
| 14615 | HVD: HUMAN VISION-DRIVEN VIDEO REPRESENTATION LEARNING FOR TEXT-VIDEO RETRIEVAL |
| 6057 | HYBRID CHANNEL ESTIMATION WITH QUANTIZED PHASE FEEDBACK FOR OVER-THE-AIR COMPUTATION |
| 11881 | HYBRID PROGRESSIVE FUSION NETWORK FOR MULTIMODAL SENTIMENT ANALYSIS |
| 1139 | HYBRID PRUNING: IN-SITU COMPRESSION OF SELF-SUPERVISED SPEECH MODELS FOR SPEAKER VERIFICATION AND ANTI-SPOOFING |
| 1694 | HYBRID QUANTUM–CLASSICAL GROUP SPARSE RECOVERY |
| 4626 | Hybrid Ranking with Collaborative Signals for LLM-Based Recommendation |
| 17255 | Hybrid Semantic-Complementary Transmission for High-Fidelity Image Reconstruction |
| 18110 | HYBRID ZEROTH-ORDER FINE-TUNING FOR LANGUAGE MODEL WITH CPU MEMORY ASSISTANCE |
| 7628 | HybridMask: Facial-Guided Cross-Modal Fusion for Multimodal Deepfake Detection |
| 17075 | HyFlowSE: Hybrid End-to-End Flow-Matching Speech Enhancement via Generative-Discriminative Learning |
| 2010 | Hyperbolic Additive Margin Softmax with Hierarchical Information for Speaker Verification |
| 9473 | HyperCool: Reducing Encoding Cost in Overfitted Codecs with Hypernetworks |
| 12309 | HYPERDEFORM: A CROSS-LEVEL SEMANTIC AND SPATIAL ADAPTIVE MODULE FOR ROBUST SCENE TEXT DETECTION |
| 14139 | HYPERDOA: ROBUST AND EFFICIENT DOA ESTIMATION USING HYPERDIMENSIONAL COMPUTING |
| 18240 | HyperFedFS: Heterogeneous Federated Few-Shot Learning with Hypergraph-driven Collaborative Aggregation |
| 16766 | HYPERGRAPH-BASED ASYMMETRIC EMBEDDING FRAMEWORK FOR ATTRIBUTE-MISSING GRAPH CLUSTERING |
| 14855 | HYPERSPARSE: FINDING COMPETITIVE HIGH-SPARSITY MODELS VIA HYPERNETWORKS |
| 19073 | Hyperspectral Information Extraction With Full Resolution From Arbitrary Photographs |
| 5153 | HYPERSPECTRAL OBJECT TRACKING METHOD BASED ON GENERAL EXPERT ADAPTER |
| 9588 | HYPERSTG: A SPATIAL-TEMPORAL SURVIVAL HYPERGRAPH NETWORK FOR TEMPORAL KNOWLEDGE GRAPH REASONING |
| 8544 | HYPERTEST: LOW RANK TEST-TIME ADAPTATION FOR CROSS-SCENE HYPERSPECTRAL IMAGE CLASSIFICATION |
| 5249 | I²CAR: INTRA- AND INTER-VARIATE CONSISTENCY CONTRASTIVE ADVERSARIAL REPRESENTATION LEARNING FOR MULTIVARIATE TIME SERIES ANOMALY DETECTION |
| 10284 | IADP-SNN: Integer Activation Dropping Spiking Neural Network for Underwater Acoustic Communication Signal Recognition |
| 7953 | IBMCT: Breaking the Cost Barrier in Industrial Internet of Things via High-Fidelity Virtual Sensing |
| 11779 | IBPCODEC: A LOW-BITRATE LIGHTWEIGHT SPEECH CODEC WITH INTER-BAND PREDICTION |
| 6284 | ICNET: INPUT-GUIDED CALIBRATION NETWORK FOR HIGH-FIDELITY POINT CLOUD COMPLETION |
| 12234 | ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation |
| 13074 | ICRE-COT: A RETRIEVAL-REVISED TWO-STAGE RANKING FRAMEWORK FOR LLM-BASED KNOWLEDGE GRAPH COMPLETION |
| 17083 | ICSMFT5: MULTI-FEATURE FUSION LARGE MODEL APPROACH FOR INDUSTRIAL CONTROL PROTOCOL REVERSE ENGINEERING |
| 12973 | I-DCCRN-VAE: AN IMPROVED DEEP REPRESENTATION LEARNING FRAMEWORK FOR COMPLEX VAE-BASED SINGLE-CHANNEL SPEECH ENHANCEMENT |
| 17212 | IDEAvatar: Identity-Preserving Avatar Generation With Controllable Emotions |
| 16302 | IDENTIFIABILITY OF ROTATING STELLAR SURFACES FROM ASTROMETRIC JITTER |
| 12618 | Identifying birdsong syllables without labelled data |
| 15206 | Identifying common backbones of interactions underlying food webs via non-deterministic alignments |
| 15780 | Identifying the Minimal and Maximal Phonetic Subspace of Speech Representations |
| 16493 | IDENTITY LEAKAGE THROUGH ACCENT CUES IN VOICE ANONYMISATION |
| 4055 | IdentityGuard: Context-Aware Restriction and Provenance for Personalized Synthesis |
| 17042 | IESGN-OCC: An Instance-Enhanced Sparse Guidance Network for Vision-based Occupancy Prediction |
| 9995 | IEUOD: IMPROVING UNDERWATER OBJECT DETECTION VIA SHALLOW FEATURE GUIDANCE FROM UNDERWATER IMAGE ENHANCEMENT MODELS |
| 15577 | IGCNet: Dual-Branch Implicit Feature and Global Context Network for Agricultural Parcel Delineation |
| 11128 | IG-CODIFF: CONTRASTIVE DIFFUSION MODELS WITH CROSS-INSTANCE GRAPH CONSTRUCTION FOR TABULAR DATA SYNTHESIS |
| 2673 | IG-DETR: INSTANCE-GUIDED DYNAMIC QUERIES FOR SMALL OBJECT DETECTION |
| 1414 | IGRS-YOLO: Illumination-Guided Iterative Residual Decoupling Reflection Enhancement for Low-Light Small Object Detection |
| 11591 | IGSA: An Information-Guided Synchronized Attack Framework for High-Transferability Multimodal Attack |
| 12432 | I-LORA: AN ADAPTIVE RANK ALLOCATION APPROACH USING INTEGRATED GRADIENTS |
| 14640 | ILSA: Information Loss-guided Sparsity Allocation for Pruning Large Language Models |
| 7921 | Image Ordinal Regression Based on Hierarchy Coherent Transformation with Normalized Binary Classifiers |
| 18146 | IMAGE-PIXEL REALIGNMENT FOR OPEN-VOCABULARY SEMANTIC SEGMENTATION VIA SELF-TRAINING |
| 9167 | iMathBench: Is Your Multi-modal Large Language Model Ready to Solve Mathematical Problems Embedded in Images? |
| 18204 | I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search |
| 11335 | IMITATOR: A HIGHLY TRANSFERABLE ADVERSARIAL PROPERTY-DRIVEN STRATEGY FOR TARGETED ATTACKS |
| 11891 | IMPACT OF PCR CYCLES AND MUTATION RATE ON LINEAR DNA BARCODE DETECTION |
| 10422 | Impact of Phonetics on Speaker Identity in Adversarial Voice Attack |
| 12255 | IMPACT OF QUANTIZATION IN NEAR-FIELD CHANNEL MODELING |
| 10127 | IMPERCEPTIBLE ADVERSARIAL EXAMPLE GENERATION CONTROLLED BY HIGH-FREQUENCY SIGNAL |
| 15631 | Implicit Degradation Representation and Adaptive Dictionary Learning for Underwater Image Compression |
| 11184 | IMPORTANCE OF BALANCE: LIGHTWEIGHT TRANSFORMER VIA SIGNED GRAPH ALGORITHM UNROLLING FOR EEG SIGNAL DENOISING |
| 6128 | Improve MLLM Benchmark Efficiency through Interview |
| 16652 | IMPROVED CONVEX RELAXATION FOR 4-PAM SIGNAL RECOVERY |
| 15960 | Improving Active Learning for Melody Estimation by Disentangling Uncertainties |
| 2816 | Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training |
| 4211 | IMPROVING AUDIO EVENT RECOGNITION WITH CONSISTENCY REGULARIZATION |
| 5542 | IMPROVING AUDIO QUESTION ANSWERING WITH VARIATIONAL INFERENCE |
| 6722 | IMPROVING AUTOMATIC SPEECH RECOGNITION BY MITIGATING DISTORTIONS INTRODUCED BY SPEECH ENHANCEMENT UNDER DRONE NOISE |
| 2339 | IMPROVING BINAURAL DISTANCE ESTIMATION IN REVERBERANT ROOMS THROUGH CONTRASTIVE AND MULTI-TASK LEARNING |
| 11459 | IMPROVING CONTEXTUAL ASR VIA MULTI-GRAINED FUSION WITH LARGE LANGUAGE MODELS |
| 4495 | IMPROVING CONTINUOUS SIGN LANGUAGE RECOGNITION VIA LIGHTWEIGHT ADAPTIVE TEMPORAL MIXING |
| 14204 | IMPROVING CROSS-DOMAIN GENERALIZATION OF LIGHTWEIGHT TRANSFORMERS ON NAMED ENTITY RECOGNITION USING SELF-TRAINING |
| 13822 | IMPROVING DIFFUSION INVERSE PROBLEM SOLVING WITH STRUCTURE CONSISTENCY REGULARIZATION |
| 1595 | IMPROVING FEW-STEP GENERATION OF RECTIFIED FLOW MODELS WITH CONSISTENT GRADIENTS |
| 6186 | IMPROVING INTERPRETABILITY IN GENERATIVE MULTITIMBRAL DDSP FRAMEWORKS VIA SEMANTICALLY-DISENTANGLED MUSICAL ATTRIBUTES |
| 14774 | Improving Maximum Margin Backdoor Detection by Class Subspace Decorrelation |
| 18945 | IMPROVING NUMERICAL STABILITY OF NORMALIZED MUTUAL INFORMATION ESTIMATOR ON HIGH DIMENSIONS |
| 13563 | IMPROVING QUANTIZED GLOSS-FREE SIGN LANGUAGE TRANSLATION MODEL VIA DISENTANGLED ARITHMETIC-PROMPTING |
| 11992 | Improving Representation Learning for Long-tailed Visual Recognition |
| 15747 | Improving Sign Language Translation via Gloss Guided Temporal and Representation Alignment |
| 11731 | IMPROVING TEXT-INSTANCE ALIGNMENT OF FOREGROUND CONDITIONED OUT-PAINTING VIA CUSTOMIZED CONCEPT EMBEDDING |
| 5806 | Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning |
| 1281 | IMPROVING THE SPEAKER ANONYMIZATION EVALUATION’S ROBUSTNESS TO TARGET SPEAKERS WITH ADVERSARIAL LEARNING |
| 16602 | IMPROVING WEAKLY SUPERVISED SCENE GRAPH GENERATION VIA NOISE-AUGMENTED TEXT EMBEDDINGS AND CLASS REWEIGHTING |
| 10893 | IM-RACG: INFORMATION DENSITY-BASED ADAPTIVE MASKING STRATEGY FOR RETRIEVAL-AUGMENTED CODE GENERATION |
| 14600 | INCOMPLETE MULTI-VIEW CLUSTERING VIA RECONSTRUCTING VIEWS AND STRUCTURE UNIFICATION |
| 13166 | Incomplete vocabulary learning for fine-grained visual recognition |
| 6713 | INCONVAD: A TWO-STAGE DUAL-TOWER FRAMEWORK FOR MULTIMODAL EMOTION INCONSISTENCY DETECTION |
| 5963 | INCORPORATING PRIORS IN LEARNING: A RANDOM MATRIX STUDY UNDER A TEACHER STUDENT FRAMEWORK |
| 16697 | INCORPORATING SLIDING WINDOW ATTENTION INTO MAMBA FOR LIGHT FIELD IMAGE SUPER-RESOLUTION |
| 12404 | INCREMENTAL LEARNING FOR AUDIO CLASSIFICATION WITH HEBBIAN DEEP NEURAL NETWORKS |
| 13751 | INCREMENTAL ORIGIN TRACING OF LLM-GENERATED TEXT WITH IDIOSYNCRASY ENHANCEMENT |
| 17453 | INDIVIDUAL RISKY ACTION WARNING OF WEARABLE SENSORS ON PERSONALIZED HEALTH PROFILES VIA LLM |
| 5257 | INDIVIDUALIZE THE HRTF NEURAL FIELD USING ANTHROPOMETRIC PARAMETERS WEIGHTED BY DIRECTION-ATTENTION |
| 15174 | Inference Scaling in Knowledge Graph Construction for Enhanced Graph-RAG |
| 18900 | Infinite Factorial Linear Dynamical Systems for Transient Signal Detection |
| 16840 | INFLUENCE OF CLEAN SPEECH CHARACTERISTICS ON SPEECH ENHANCEMENT PERFORMANCE |
| 13523 | INFLUENCE-AWARE CURATION AND ACTIVE SELECTION FOR INDUSTRIAL AND SURVEILLANCE SOUND EVENTS |
| 6630 | INFORMATION-PRESERVING DOWNSAMPLING AND BIDIRECTIONAL FUSION FOR MULTI-SCALE TIME SERIES FORECASTING |
| 10925 | INFORMATION-SEEKING TRANSMIT BEAMFORMING FOR COGNITIVE ULTRASOUND |
| 3655 | INFUSING ARBITRARY IDENTITIES: GENERATING VISUALLY HIDDEN FACES VIA DIFFUSION MODELS |
| 12198 | INPUT-ADAPTIVE DIFFERENTIABLE FILTERBANKS VIA HYPERNETWORKS FOR ROBUST SPEECH PROCESSING |
| 12273 | INPUT-FAITHFUL SPARSE-VIEW 3D GAUSSIAN SPLATTING WITH DIFFUSION PRIORS |
| 12079 | InsightRec: Enhancing Sequential Recommendation through Reasoning-Aware Preference Optimization |
| 12943 | INSS: INVISIBLE SAMPLE-SPECIFIC BACKDOOR ATTACK VIA INVERTIBLE HIDDEN NEURAL NETWORKS |
| 6750 | INSTANCERSR: REAL-WORLD SUPER-RESOLUTION VIA INSTANCE-AWARE REPRESENTATION ALIGNMENT |
| 17898 | InstantPhoto: Instance-level Mask Generation via Attention-based Anchor Guidance for Realistic Photo Customization |
| 1722 | InstructAudio: Unified speech and music generation with natural language instruction |
| 5150 | INSTRUCTION GUIDED MULTI OBJECT IMAGE EDITING WITH QUANTITY AND LAYOUT CONSISTENCY |
| 14437 | Instrument Generation Through Distributional Flow Matching and Test-Time Search |
| 14705 | IN-SYNC: ADAPTATION OF SPEECH AWARE LARGE LANGUAGE MODELS FOR ASR WITH WORD LEVEL TIMESTAMP PREDICTIONS |
| 10563 | INTACT: INDUCING NOISE TOLERANCE THROUGH ADVERSARIAL CURRICULUM TRAINING FOR LIDAR-BASED SAFETY-CRITICAL PERCEPTION AND AUTONOMY |
| 19078 | Integrated DNN-based Parameter Estimation for Multichannel Speech Enhancement |
| 12858 | Integrating Segment-level Context into Frame Representations for Speaker Diarization |
| 2885 | Integrating Speaker Embeddings and LLM-Derived Semantic Representations for Streaming Speaker Diarization |
| 17879 | INTEGRATING STACKED INTELLIGENT METASURFACES AND POWER CONTROL FOR DYNAMIC EDGE INFERENCE VIA OVER-THE-AIR NEURAL NETWORKS |
| 2528 | Intelligent Character Segmentation Method for Ancient Tibetan Based on Character Structure and Attention-BiLSTM |
| 10323 | Interactive Consistency And Mutual Independence In Causality For Semi-Supervised Medical Image Segmentation |
| 17729 | INTER-DIALOG CONTRASTIVE LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS |
| 10920 | INTERMITTENT SEMI-WORKING MASK: A NEW MASKING PARADIGM FOR LLMS |
| 13639 | Interpolation-Aware Bitrate Ladder Optimization for Variable Framerate Video Streaming |
| 16387 | Interpretable Alzheimer's disease Detection via Multi-Scale Fusion of Disentangled Speech Features |
| 5586 | Interpretable CNN-based Enhancement for Nighttime Driving Image Perception |
| 14447 | INTERPRETABLE MODELING OF ARTICULATORY TEMPORAL DYNAMICS FROM REAL-TIME MRI FOR PHONEME RECOGNITION |
| 14181 | INTERPRETABLE MULTIMODAL CLASSIFICATION VIA CAUCHY-SCHWARZ DIVERGENCE-INDUCED GACS-KORNER COMMON INFORMATION |
| 14030 | INTERPRETABLE MUSIC HARMONIC ANALYSIS THROUGH MULTILINEAR MIXTURE OF EXPERTS |
| 2685 | INT-MEANFLOW: FEW-STEP SPEECH GENERATION WITH INTEGRAL VELOCITY DISTILLATION |
| 15848 | Intrinsic Neuronal Adaptation Supports Robust Spatio-Temporal Processing in Spiking Neural Networks |
| 5304 | Intrinsic Semantic Consistency Enhancement for Robust Hierarchical Understanding in VLMs |
| 11522 | INTRINSICGRID: GRID-BASED INTRINSIC DECOMPOSITION FOR FAST 3D SCENE RECONSTRUCTION |
| 1753 | Intrinsic-Preserving Cross-Modal Fusion for Small-Target Recognition in Intelligent Transportation |
| 7911 | Invariant Representation Guided Multimodal Sentiment Decoding with Sequential Variation Regularization |
| 1321 | INVERSE HALFTONING VIA WEIGHTED SOBEL CONDITIONED DIFFUSION MODEL |
| 2045 | Inverse Rendering for High-Genus 3D Surface Meshes from Multi-view Images with Persistent Homology Priors |
| 8090 | INVERSE-HESSIAN REGULARIZATION FOR CONTINUAL LEARNING IN ASR |
| 13246 | Investigating Batch Inference in a Sequential Monte Carlo Framework for Neural Networks |
| 9733 | INVESTIGATING MODALITY CONTRIBUTION IN AUDIO LLMS FOR MUSIC |
| 5617 | INVESTIGATING THE EFFECT OF SENTENCE-LEVEL SYNTACTIC STRUCTURE ON INFORMATION LOSS IN THE HUMAN AUDITORY SYSTEM |
| 15114 | INVISIBLE BACKDOOR ATTACKS ON SELF-SUPERVISED LEARNING VIA MULTI-CHANNEL ADAPTIVE STEGANOGRAPHY |
| 1780 | IoDResearch: Deep Research on Private Heterogeneous Data via the Internet of Data |
| 17261 | IPACue-TTS: Integrating Prosody and Articulatory Cues in Conditional Flow Matching for Multilingual Zero-Shot TTS |
| 11633 | IPI²: Mitigating Indirect Prompt Injections on Unmanned Aerial Vehicle Agents Using Physical Invariants |
| 9423 | IQ-LUT: INTERPOLATED AND QUANTIZED LUT FOR EFFICIENT IMAGE SUPER-RESOLUTION |
| 16182 | IR-HUNTER: AUTOMATED ANALYSIS OF INTENT REDIRECTION VULNERABILITIES IN ANDROID APPLICATIONS BASED ON HYBRID DYNAMIC AND STATIC APPROACHES |
| 9690 | IRPFUZZ: FUZZING INDUSTRIAL ROBOT PROTOCOL VIA LLM-DRIVEN TRAFFIC SEMANTIC ANALYSIS |
| 15724 | IRREGULAR MULTIVARIATE TIME SERIES MODELING VIA LATENT GRAPH-GUIDED GAUSSIAN PROCESS PRIORS |
| 13530 | IS PHASE REALLY NEEDED FOR WEAKLY-SUPERVISED DEREVERBERATION? |
| 11858 | IS REPEATER-ASSISTED MASSIVE MIMO COMPATIBLE WITH DYNAMIC TDD? |
| 15151 | ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models |
| 7288 | ISOMETRIC IMMERSION LEARNING WITH RIEMANNIAN GEOMETRY FOR DISTORTION-FREE REPRESENTATION |
| 6763 | ISSE: AN INSTRUCTION-GUIDED SPEECH STYLE EDITING DATASET AND BENCHMARK |
| 11248 | ISTER: LINEAR TRANSFORMER FOR EFFICIENT MULTIVARIATE TIME SERIES FORECASTING |
| 14243 | It Is Personal: The Importance of Personalization for Recognizing Self-Reported Emotion |
| 14962 | ITD-AWARE BINAURAL SPIKING NETWORKS FOR SOUND SOURCE LOCALIZATION |
| 12634 | ITDS-SQL: ENHANCING TEXT-TO-SQL PARSING BY IN-CONTEXT LEARNING WITH INFERENCE TIME DATA SYNTHESIS |
| 16808 | ITERATIVE AMORTIZED HIERARCHICAL VAE |
| 17724 | ITERATIVE REDUNDANCY-BASED HEAD PRUNING FOR EFFICIENT SELF-SUPERVISED SPEECH RECOGNITION MODELS |
| 18080 | JAILFUZZ: A ZERO-KNOWLEDGE AND STATE FEEDBACK BASED GRAY-BOX FUZZING FRAMEWORK FOR LARGE LANGUAGE MODELS |
| 18877 | Jamming and Impulsive Noise Uncertainty Aided Covert Communication in PLC Networks |
| 17471 | J-MoGen: Joint Differential Learning and Semantic Enhancing for Motion Generating |
| 4134 | JND-GS: JUST NOTICEABLE DIFFERENCE BASED 3D GAUSSIAN SPLATTING COMPRESSION |
| 16919 | JOINT ACTIVE RIS CONFIGURATION AND USER POWER CONTROL FOR LOCALIZATION: A NEUROEVOLUTION-BASED APPROACH |
| 10506 | Joint antenna selection and robust precoding design for multi-target DFRC |
| 6449 | Joint Antenna Selection and Subarray Structure Design via CNN for Hybrid Beamforming ISAC |
| 10156 | JOINT AUTOREGRESSIVE MODELING OF MULTI-TALKER OVERLAPPED SPEECH RECOGNITION AND TRANSLATION |
| 16461 | JOINT CALIBRATION AND DIRECTION-OF-ARRIVAL ESTIMATION FOR SPARSE LINEAR ARRAYS: IDENTIFIABILITY AND ARRAY DESIGN |
| 11735 | JOINT CLOUD AND HAZE REMOVAL BASED ON SPECTRAL HARMONIZER AND ATMOSPHERIC DISENTANGLER FOR REMOTE SENSING IMAGES |
| 4857 | JOINT COMPRESSION AND DIRECTION-OF-ARRIVAL ESTIMATION IN DISTRIBUTED SENSOR NETWORKS |
| 10963 | JOINT DEEP SECONDARY PATH ESTIMATION AND ADAPTIVE CONTROL FOR ACTIVE NOISE CANCELLATION |
| 19081 | Joint Enhancement and Bandwidth Extension for Radar Through-Barrier Speech Acquisition |
| 5910 | JOINT ESTIMATION OF LASER-ULTRASONICS RESONANCES IN THIN METAL PLATES |
| 5161 | JOINT ESTIMATION OF PIANO DYNAMICS AND METRICAL STRUCTURE WITH A MULTI-TASK MULTI-SCALE NETWORK |
| 10243 | Joint Estimation of Primary and Secondary Paths for Personalized Hearable Applications |
| 16548 | JOINT GRAPH-BASED MODALITY ALIGNMENT FOR ROBUSTNESS IN CONVERSATIONAL EMOTION RECOGNITION |
| 6742 | JOINT MODELING OF TYPICALITY AND UNCERTAINTY FOR SOT-BASED FEW-SHOT LLM REASONING |
| 6046 | JOINT MULTICHANNEL ACOUSTIC FEEDBACK CANCELLATION AND SPEAKER EXTRACTION VIA KALMAN FILTER AND DEEP NON-LINEAR SPATIAL FILTER |
| 1007 | JOINT MULTI-DIMENSIONAL FEATURES AND ACADEMIC NETWORK EMBEDDING FOR AUTHOR NAME DISAMBIGUATION |
| 5498 | Joint Optimization of Physical Layer Security in XL-IRS-Assisted ISAC Systems under Hybrid-Field Propagation |
| 12646 | Joint reconstruction and pansharpening for high-resolution hyperspectral single-pixel imaging |
| 1985 | JOINT REPRODUCTION NUMBER AND SPATIAL CONNECTIVITY STRUCTURE ESTIMATION VIA GRAPH SPARSITY-PROMOTING PENALIZED FUNCTIONAL |
| 2692 | Joint single-shot ToA and DoA estimation for VAA-based BLE ranging with phase ambiguity: A deep learning-based approach |
| 15088 | JOINT SUPERPIXEL AND SELF-REPRESENTATION LEARNING FOR SCALABLE HYPERSPECTRAL IMAGE CLUSTERING |
| 18148 | Joint Transmit Beamforming and Reflection Optimization for Beyond Diagonal RIS Aided Multi-Cell MIMO Communication |
| 13686 | Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis |
| 16329 | JPAD: Joint Prediction-Planning with Temporal Consistency for End-to-end Autonomous Driving |
| 10528 | JRAS: JOINTED REBALANCED ADJUSTMENT STRATEGY FOR LONG-TAILED VISUAL RECOGNITIONS |
| 18119 | JUDGE BEFORE ANSWER: CAN MLLM DISCERN THE FALSE PREMISE IN QUESTION? |
| 15572 | JUND-F0: A Novel Deep Learning Framework for Joint Unvoiced/Voiced Detection and F0 Estimation |
| 15753 | K Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function |
| 7708 | Kalman Filter Based Linear Deformable for Retinal Vessel Segmentation |
| 11317 | KAME: TANDEM ARCHITECTURE FOR ENHANCING KNOWLEDGE IN REAL-TIME SPEECH-TO-SPEECH CONVERSATIONAL AI |
| 6091 | KAN we make models simpler for Audio Deepfake Detection with Kolmogorov-Arnold Networks? |
| 5313 | KAN-ENHANCED TRANSFORMER WITH MULTISCALE PARALLEL POOLING FOR CLOUD REMOVAL |
| 17646 | KAN-LLM: Kolmogorov-Arnold Networks-enhanced Large Language Models For Time Series Forecasting |
| 13606 | KBNET: A KNOWLEDGE BRIDGING NETWORK FOR GENERALIZABLE DEEPFAKE DETECTION |
| 7018 | KD-CVG: A KNOWLEDGE-DRIVEN APPROACH FOR CREATIVE VIDEO GENERATION |
| 8389 | KDFNet: Kalman dynamic filtering network for multivariate time series forecasting |
| 11751 | KEEPING MODELS LISTENING: SEGMENT- AND TIME-AWARE ATTENTION RESCALING AT DECODING TIME |
| 9368 | KERNEL REGRESSION OF MULTI-WAY DATA VIA TENSOR TRAINS WITH HADAMARD OVERPARAMETRIZATION: THE DYNAMIC GRAPH FLOW CASE |
| 6055 | KG2QA: KNOWLEDGE GRAPH-ENHANCED RETRIEVAL-AUGMENTED GENERATION FOR COMMUNICATION STANDARDS QUESTION ANSWERING |
| 12050 | KGER: Knowledge Graph Error Detection and Refinement with Reinforcement Learning |
| 17847 | KG-TOOLPLAN: KNOWLEDGE GRAPH-GUIDED REASONING FOR EFFICIENT LLM TOOL SELECTION |
| 12696 | KINEMATIC PRIORS BENEFIT SKELETON-BASED ACTION RECOGNITION |
| 13216 | KINGUARD: HIERARCHICAL KINSHIP-AWARE FINGERPRINTING TO DEFEND AGAINST LARGE LANGUAGE MODEL STEALING |
| 1715 | KLGATE: LEVERAGING LLM EXPLANATIONS VIA KL-GUIDED GATING FOR MULTIMODAL SARCASM DETECTION |
| 10845 | Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels |
| 16987 | KNOWLEDGE DISTILLATION VIA GENERATIVE RECONSTRUCTION PATHWAYS FOR END-TO-END AUTOMATIC SPEECH RECOGNITION |
| 13566 | KNOWLEDGE EDITING WITH DEMONSTRATION SELECTION FOR MULTI-HOP QUESTION ANSWERING |
| 14960 | KNOWLEDGE-AWARE REFINEMENT FOR DETECTING AND ADDRESSING ANOMALIES IN KNOWLEDGE TRACING |
| 11386 | KNOWLEDGE-ENHANCED CONTRASTIVE LEARNING FOR GAIT EMOTION RECOGNITION |
| 8376 | KNOWLEDGE-MISMATCHED SEMANTIC COMMUNICATION FOR WIRELESS IMAGE TRANSMISSION: A UNIFIED INFORMATION BOTTLENECK APPROACH |
| 3996 | KPMG: A GRAPHICAL KOOPMAN-MAMBA APPROACH FOR FINANCIAL MARKETS |
| 9800 | KSDIFF: KEYFRAME-AUGMENTED SPEECH-AWARE DUAL-PATH DIFFUSION FOR FACIAL ANIMATION |
| 5065 | LABEL-CORRECTED WEIGHTED MULTI-SIMILARITY LOSS FOR NOISY CROSS-MODAL RETRIEVAL |
| 5723 | LACEREC: CONTROLLABLE SEQUENTIAL RECOMMENDATION WITH WEAK-SIGNAL ENHANCEMENT |
| 12766 | LAFUFU: LATENT ACOUSTIC FEATURES FOR ULTRA-FAST UTTERANCE RESTORATION |
| 17830 | Lagrangian Deep Learning for Private RIS-aided Localization: An Active Sensing Approach |
| 14722 | Lagrangian-Based Motion-Capture Model for Continuous Weather Forecasting |
| 15963 | LAKALMANTRACKER: ROBUST LEARNING-AIDED KALMAN FILTERING FOR MULTI-OBJECT TRACKING |
| 1891 | LAKAN: LANDMARK-ASSISTED ADAPTIVE KOLMOGOROV-ARNOLD NETWORK FOR FACE FORGERY DETECTION |
| 13659 | LAMB: LLM-BASED AUDIO CAPTIONING WITH MODALITY GAP BRIDGING VIA CAUCHY-SCHWARZ DIVERGENCE |
| 6048 | LAMER-SSL: LAYER-AWARE MIXTURE OF LORA EXPERTS FOR CONTINUAL MULTILINGUAL EXPANSION OF SELF-SUPERVISED MODELS WITHOUT FORGETTING |
| 10890 | LAMIGAUSS: PITCHING RADIATIVE GAUSSIAN FOR SPARSE-VIEW X-RAY LAMINOGRAPHY RECONSTRUCTION |
| 1903 | LAMUT: LIGHTING-AWARE MULTI-MATERIAL APPEARANCE TRANSFER FROM A SINGLE IMAGE |
| 9855 | LANDSCAPE ANALYSIS OF SIMULTANEOUS BLIND DECONVOLUTION AND PHASE RETRIEVAL |
| 16369 | LANGUAGE-INFUSED RETRIEVAL-AUGMENTED CTC WITH ADAPTIVE SOFT-HARD GATING FOR ROBUST CODE-SWITCHING ASR |
| 16759 | LANTERN: LANGUAGE MODEL ASSESSMENT ON NOISY AND TRANSFORMED TASKS FOR UNDERSTANDING ERROR AND ROBUSTNESS NUANCES |
| 3934 | LaPrune: Layout-Aware Pruning for Efficient Multimodal Large Language Models |
| 17282 | Large System Analysis of SURE based Hyper-parameter Optimizing in Sparse Bayesian Learning |
| 12592 | LARGE VISION MODELS CAN SOLVE MENTAL ROTATION PROBLEMS |
| 6669 | LARGE-SCALE EEG MODELS FOR MEDITATION STATE RECOGNITION |
| 9867 | LARGE-SYSTEM FIXED-POINT LAW AND DETERMINISTIC CLOSURE FOR SPARSE BAYESIAN LEARNING |
| 4053 | LATENT DOMAIN PROMPT LEARNING FOR VISION-LANGUAGE MODELS |
| 3341 | Latent DPO for Concept Erasure in Text-to-Video Diffusion Models |
| 14270 | LATENT SPACE ORTHONORMALIZATION FOR HYPER-FINETUNING OF LANGUAGE MODELS |
| 6874 | LATENT TEMPORAL DISCREPANCY AS MOTION PRIOR: A LOSS-WEIGHTING STRATEGY FOR DYNAMIC FIDELITY IN T2V |
| 14549 | LATENT VARIABLE ESTIMATION VIA KERNEL AND GRAPH FOR GAUSSIAN PROCESS REGRESSION |
| 10483 | LATENTCOLORNET: A LATENT DIFFUSION-BASED FRAMEWORK FOR INFRARED IMAGE COLORIZATION |
| 14554 | LatentGuard: Robust Latent Watermarking for Deepfake Tracing and Forgery Localization |
| 11452 | LATENT-SPACE METRICS FOR COMPLEX-VALUED VAE OUT-OF-DISTRIBUTION DETECTION UNDER RADAR CLUTTER |
| 5298 | LATTICE-GUIDED CONSISTENCY REGULARIZATION OF DUAL-MODE TRANSDUCERS FOR AUTOMATIC SPEECH RECOGNITION |
| 16991 | Layer-Aware Early Fusion of Acoustic and Linguistic Embeddings for Cognitive Status Classification |
| 4731 | LAYER-WISE CONTRIBUTION EVALUATION FOR INCENTIVIZING PERSONALIZATION IN FEDERATED LEARNING |
| 18099 | Layout Robust Zero Shot Learning for Human Activity Recognition Using Wi-Fi Sensing in Unseen Environments |
| 1192 | LC-Sketch: A Layered-Carry Sketch for IoT Network Measurement |
| 13283 | LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning |
| 15873 | LDG-PCGC: LOSSLESS DYNAMICALLY GROUPED POINT CLOUD GEOMETRY COMPRESSION |
| 6753 | LDINet: A Lightweight Dual Domain Interaction Network for Human Pose Estimation |
| 15819 | LEARN TO UNLEARN IN LARGE LANGUAGE MODELS |
| 10606 | LEARNABLE INSTANCE ATTENTION FILTERING FOR ADAPTIVE DETECTOR DISTILLATION |
| 5665 | Learnable Mel-frontend for Robust Underwater Acoustic Target Detection under Non-Target Interference |
| 14350 | Learning Affine-Equivariant Proximal Operators |
| 18173 | Learning Beyond the Gaussian Data: Learning Dynamics of Neural Networks on an Expressive and Cumulant-Controllable Data Model |
| 5825 | LEARNING CLASS SIMILARITIES FOR ENHANCED IMAGE OUT-OF-DISTRIBUTION DETECTION |
| 13719 | LEARNING CLASS-CONDITIONAL TEMPERATURE WITH ENTROPY ALIGNMENT FOR MEDICAL IMAGE CLASSIFICATION |
| 14292 | LEARNING CONSISTENT CAUSAL ABSTRACTION NETWORKS |
| 11611 | LEARNING CONTROLLABLE BLIND DENOISING VIA NOISE LEVEL MAP ESTIMATION AND MISMATCH TRAINING |
| 3661 | LEARNING CROSS-DOMAIN DISCREPANCY FOR IMAGE MANIPULATION LOCALIZATION |
| 3205 | LEARNING DEPTH GUIDANCE FOR CAMOUFLAGED OBJECT DETECTION WITHOUT ANNOTATIONS |
| 13688 | LEARNING DIRECTED ACYCLIC GRAPHS FROM MAX-TIMES STRUCTURAL EQUATION MODELS WITH SPARSE INPUT |
| 9757 | LEARNING DOMAIN-ROBUST BIOACOUSTIC REPRESENTATIONS FOR MOSQUITO SPECIES CLASSIFICATION WITH CONTRASTIVE LEARNING AND DISTRIBUTION ALIGNMENT |
| 6018 | LEARNING DUAL MIXTURE-OF-EXPERTS MODELS FOR UNIFIED IMAGE DERAINING |
| 4151 | LEARNING EXPLICITLY CONDITIONED SPARSIFYING TRANSFORMS |
| 2970 | LEARNING FAIR DOMAIN ADAPTATION WITH VIRTUAL LABEL DISTRIBUTION |
| 14353 | LEARNING FALSE DISCOVERY RATE CONTROL VIA MODEL-BASED NEURAL NETWORKS |
| 12652 | Learning Fill-in Reduction Ordering via Graph Policy Optimization for Sparse Matrices |
| 9560 | LEARNING FROM LABEL PROPORTIONS WITH SHRINKING BAG |
| 10382 | LEARNING FROM MULTIPLE EXPERTS: ALTERNATE ENSEMBLE DISTILLATION HASHING FOR LIGHTWEIGHT CROSS-MODAL RETRIEVAL |
| 16030 | LEARNING FROM NOISY LABELS: A CONFORMAL PREDICTION PERSPECTIVE |
| 17116 | LEARNING GRAPH FROM SMOOTH SIGNALS UNDER PARTIAL OBSERVATION: A ROBUSTNESS ANALYSIS |
| 1926 | LEARNING GRAPHICAL MODELS UNDER LOW-RANK FACTOR ANALYSIS STRUCTURE |
| 17564 | Learning Image-Text Matching with Optimal Partial Transport |
| 6103 | LEARNING LATENT SPACE FOR MULTI-ORDER / RESOLUTION GRAPH-REGULARIZED IMAGE DENOISER |
| 16785 | LEARNING LIGHT FIELD IMPLICIT NEURAL REPRESENTATIONS FOR ARBITRARY-SCALE SPATIAL-ANGULAR SUPER-RESOLUTION |
| 13744 | LEARNING LINEARITY IN AUDIO CONSISTENCY AUTOENCODERS VIA IMPLICIT REGULARIZATION |
| 14643 | LEARNING MIXTURE OF SPATIO-TEMPORAL EXPERTS FOR 3D HUMAN POSE ESTIMATION |
| 1484 | LEARNING MOTION TRENDS IN GAUSSIAN SPLATTING FOR MONOCULAR DYNAMIC RECONSTRUCTION |
| 1643 | LEARNING MULTI-COLOR SPACE IMPLICIT NEURAL REPRESENTATIONS FOR JOINT IMAGE DERAINING AND LOW-LIGHT ENHANCEMENT |
| 14896 | Learning Nonlinear Systems In-Context: From Synthetic Data to Real-World Motor Control |
| 9745 | Learning Non-Local Spatial-Spectral Correlation for Hyperspectral Image Super-Resolution |
| 19070 | Learning Optimal Graph Filters for Clustering of Attributed Graphs |
| 7832 | LEARNING PHYSICS-AWARE REPRESENTATION FOR DYNAMIC FLUID SCENES |
| 12941 | LEARNING PIEZOELECTRIC HYSTERESIS IN IN-EAR MEMS LOUDSPEAKERS FROM ACOUSTIC MEASUREMENTS |
| 16496 | LEARNING PRODUCT GRAPHS FROM TWO-DIMENSIONAL STATIONARY SIGNALS |
| 9864 | Learning Reference-Guided Exposure Correction with Hybrid Illumination Characteristics |
| 11095 | Learning Spatio-Temporal Variability for Cattle Re-Identification |
| 11946 | LEARNING THE STRUCTURE OF CONNECTION GRAPHS |
| 15187 | Learning Time-Varying Turn-Taking Behavior in Group Conversations |
| 7250 | Learning to Align with Unbalanced Optimal Transport in Linguistic Knowledge Transfer for ASR |
| 14800 | LEARNING TO CASCADE: A POMDP APPROACH TO SEQUENTIAL MODEL SELECTION |
| 2058 | LEARNING TO COARSE-TO-FINE REFINEMENT FOR CAMOUFLAGED OBJECT DETECTION |
| 16207 | Learning to Decrypt: A Cipher-guided Dynamic Expert Framework for Document Deblurring |
| 16222 | Learning to Intervene: Optimized Soft Intervention Selection for Causal Discovery |
| 19068 | LEARNING TO QUANTIZE AND PRECODE IN MASSIVE MIMO SYSTEMS FOR ENERGY REDUCTION: A GRAPH NEURAL NETWORK APPROACH |
| 10734 | Learning to Rotate Frames for Hyperbolic Graph Feature Extraction |
| 17631 | LEARNING TO SEE THROUGH DARKNESS: SELF-SUPERVISED EVENT-BASED VIDEO RECONSTRUCTION UNDER LENS FLARE |
| 14777 | Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model |
| 17059 | LEARNING WHAT TO HEAR: BOOSTING SOUND-SOURCE ASSOCIATION FOR ROBUST AUDIOVISUAL INSTANCE SEGMENTATION |
| 13001 | LEARNING-ENHANCED DISTRIBUTIONALLY ROBUST ADAPTIVE BEAMFORMING |
| 11950 | LEGAL∆: ENHANCING LEGAL REASONING IN LLMS VIA REINFORCEMENT LEARNING WITH CHAIN-OF-THOUGHT GUIDED INFORMATION GAIN |
| 10393 | LEND A HAND: SEMI TRAINING-FREE CUED SPEECH RECOGNITION VIA MLLM-DRIVEN HAND MODELING FOR BARRIER-FREE COMMUNICATION |
| 3998 | Length-Aware Rotary Position Embedding for Text-Speech Alignment |
| 14650 | LENSLESSMIC: AUDIO ENCRYPTION AND AUTHENTICATION VIA LENSLESS COMPUTATIONAL IMAGING |
| 10330 | LePER: Label-Free Edge Polarity Reweighting for Heterophily |
| 10524 | Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants |
| 1433 | LESS: LARGE LANGUAGE MODEL ENHANCED SEMI-SUPERVISED LEARNING FOR SPEECH FOUNDATIONAL MODELS USING IN-THE-WILD DATA |
| 9804 | LET MORE EXPERTS SPEAK: BALANCING EXPLORATION AND EXPLOITATION IN PEFT FOR MIXTURE-OF-EXPERTS MODELS |
| 12113 | LETP: COUPLING ATTENTION LOCALIZATION AND COGNITIVE REASONING FOR EGO-CENTRIC MULTI-TASK DRIVING SCENE PERCEPTION |
| 3490 | LETPAV: LEXICON-ENHANCED TEXT WITH PROGRESSIVE AUDIO-VISUAL FUSION FOR MULTIMODAL SENTIMENT ANALYSIS |
| 5612 | LEVERAGING AUDIO-VISUAL DATA TO REDUCE THE MULTILINGUAL GAP IN SELF-SUPERVISED SPEECH MODELS |
| 19149 | Leveraging Content and Acoustic Representations for Speech Emotion Recognition |
| 15030 | LEVERAGING DIFFUSION U-NET FEATURES FOR PREDOMINANT INSTRUMENT RECOGNITION |
| 16295 | LEVERAGING LABEL PROPORTION PRIOR FOR CLASS-IMBALANCED SEMI-SUPERVISED LEARNING |
| 15806 | LEVERAGING LARGE LANGUAGE MODELS FOR TEXT NORMALIZATION OF NON-STANDARD WORDS IN TEXT-TO-SPEECH SYNTHESIS |
| 17112 | LEVERAGING LARGE MULTIMODAL MODELS FOR AUDIO-VIDEO DEEPFAKE DETECTION: A PILOT STUDY |
| 14001 | LEVERAGING LARGE SPEECH LANGUAGE MODELS AS EVALUATORS FOR EXPRESSIVE SPEECH |
| 16384 | LEVERAGING MULTIPLE SPEECH ENHANCERS FOR NON-INTRUSIVE INTELLIGIBILITY PREDICTION FOR HEARING-IMPAIRED LISTENERS |
| 6067 | LEVERAGING MULTI-SOURCE RETRIEVAL AND EXPERT FILTERING FOR LLM-BASED KNOWLEDGE GRAPH COMPLETION |
| 16574 | LEVERAGING OVERFITTING FOR LOW-COMPLEXITY AND MODALITY-AGNOSTIC JOINT SOURCE-CHANNEL CODING |
| 6137 | LEVERAGING POINT TRANSFORMER FOR 3D HUMAN MESH RECONSTRUCTION WITH INCOMPLETE POINT CLOUD |
| 14365 | LEVERAGING PREDICTION ENTROPY FOR AUTOMATIC PROMPT WEIGHTING IN ZERO-SHOT AUDIO-LANGUAGE CLASSIFICATION |
| 14690 | LEVERAGING SEGMENT-LEVEL SPEECH REPRESENTATIONS FOR LLM-BASED SPEECH RECOGNITION |
| 4574 | LEVERAGING SEMANTIC-AWARE COLLABORATION BETWEEN PLM AND LLM IN DATA AUGMENTATION FOR ENTITY-RELATIONSHIP EXTRACTION |
| 17256 | LEVERAGING SPEAKER AND LISTENER PERSONALITIES AND THEIR INTERACTIONS FOR SPEECH EMOTION RECOGNITION |
| 13868 | LEVERAGING WHISPER EMBEDDINGS FOR AUDIO-BASED LYRICS MATCHING |
| 11538 | LExTra: Folded Prompt and Split-Role Attention for Target Speaker Extraction |
| 12423 | LFMIM: Low-Frequency Enhanced Reconstruction for Few-Shot SAR-ATR |
| 15317 | LGF-Net: A Local-Global Feature Learning Framework for Intrusion Detection |
| 7042 | LGFNet: Local Correlation and Global Context Fusion for Multivariate Time Series Forecasting |
| 16123 | LG-STAFNET: EMOTION RECOGNITION IN AI-GENERATED MUSIC VIA LOCAL-GLOBAL SPATIO-TEMPORAL EEG FEATURE FUSION |
| 6754 | LGTNET: A DUAL-BRANCH MICRO-EXPRESSION RECOGNITION NETWORK WITH GROUPED CHANNEL ATTENTION AND DEFORMABLE WINDOWS |
| 4465 | LIBEMER: A NOVEL BENCHMARK AND ALGORITHMS LIBRARY FOR EEG-BASED MULTIMODAL EMOTION RECOGNITION |
| 13100 | LiDAR-based Human Activity Recognition through Laplacian Spectral Analysis |
| 6775 | Lie Bracket Geometry of Feature Learning in Neural Networks |
| 6093 | LIFT: A QUALITY-AWARE DATA SELECTION FRAMEWORK FOR LOW-RESOURCE MACHINE TRANSLATION |
| 15773 | Light Field Image Super-Resolution with Multi-Scale Context Aggregation Mamba |
| 17287 | LIGHTCSEG: LIGHTWEIGHT CRACK SEGMENTATION NETWORK WITH ADAPTIVE SOBEL AND LOCAL ENHANCEMENT |
| 3213 | LIGHTOL: A LIGHTWEIGHT ONTOLOGY LEARNING FRAMEWORK WITH LARGE LANGUAGE MODELS |
| 11660 | LIGHTPONZI: EFFICIENT MULTIMODAL DETECTION OF PONZI SCHEMES IN ETHEREUM SMART CONTRACTS |
| 14053 | Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation |
| 11244 | LIGHTWEIGHT AND PERCEPTUALLY-GUIDED VOICE CONVERSION FOR ELECTRO-LARYNGEAL SPEECH |
| 16819 | LIGHTWEIGHT CHROMATIC-AWARE WHITE BALANCE VIA STATE-SPACE MODELING |
| 17356 | LIGHTWEIGHT IMAGE SUPER-RESOLUTION VIA EFFICIENT SHIFT CONVOLUTION AND EDGE-ENHANCED ATTENTION |
| 11639 | LIGHTWEIGHT IMPLICIT NEURAL NETWORK FOR BINAURAL AUDIO SYNTHESIS |
| 14719 | Lightweight Multitask-Oriented Semantic Communication via Foundational Knowledge Distillation |
| 16664 | Lightweight Phoneme-Conditioned Bandwidth Extension for Body-Conducted Speech |
| 16943 | LIGHTWEIGHT RGB-T TRACKING WITH MOBILE VISION TRANSFORMERS |
| 11576 | LIKQA: Lightweight Image Data Quality Assessment via Iterative Optimization and KAN-Based Models |
| 18924 | LIMITATIONS OF DATA-DRIVEN SPECTRAL RECONSTRUCTION: AN OPTICS-AWARE ANALYSIS |
| 18957 | Linear Convergence of Plug-and-Play Algorithms With Kernel Denoisers |
| 3639 | Linear Cross-Attention Guided Feature Pyramid Networks for Crowd Counting |
| 14067 | LINGOMETER: ON-DEVICE PERSONAL SPEECH WORD COUNTING SYSTEM |
| 13170 | LINGUARD: AUTHENTICATING SPEECH RECORDINGS USING SPEECH RECOGNITION AND WATERMARK |
| 16323 | LIPSAM: LIPSCHITZ-CONTINUOUS AMPLITUDE MODIFIER FOR AUDIO SIGNAL PROCESSING AND ITS APPLICATION TO PLUG-AND-PLAY DEREVERBERATION |
| 11484 | LipSody: Lip-to-Speech Synthesis with Enhanced Prosody Consistency |
| 7339 | Lisa: Lightweight Yet Superb Neural Speech Coding |
| 11912 | LISTEN, BUT DON'T LEAK: SENSITIVE DATA PROTECTION FOR PRIVACY AWARE AUTOMATIC SPEECH RECOGNITION WITH ACOUSTIC TRIGGERS |
| 16979 | Listening to UAV: 3D Trajectory Estimation via Acoustic Transformer |
| 15022 | LiteEngine: Lightweight Low-precision Inference Engine for Efficient DNN Inference |
| 4472 | LIVE4D: DECOUPLED OBJECT AND SCENE MODELING VIA DOUBLE-BRANCH 4D DIFFUSION |
| 7129 | LLAC: LEARNED LOSSLESS AUDIO CODEC |
| 13723 | LLM BASED EDGE-ASSISTED UAV INFERENCE AGAINST JAMMING |
| 2042 | LLM4RIM: Leveraging Large Language Model for Radar-based In-vehicle Monitoring |
| 9671 | LLM-ADAPTIVE REASONING CLUSTERING FOR AUTOMATED CHAIN-OF-THOUGHT PROMPTING |
| 5369 | LLMBA: ADAPTING LARGE LANGUAGE MODELS FOR BEHAVIOR ANALYTICS IN ZERO TRUST NETWORKS |
| 15975 | LLM-BASED POST-ASR ERROR CORRECTION FOR DISORDERED SPEECH |
| 3915 | LLM-DRIVEN KNOWLEDGE GRAPH ENCODING FOR FINANCIAL RISK |
| 16436 | LLM-Driven Scenario-Aware Planning for Autonomous Driving |
| 6244 | LLMEKEREC: EXPLAINABLE RECOMMENDATION VIA KNOWLEDGE GRAPH PATH REASONING WITH LLMS |
| 10181 | LLM-Guided Hierarchical Reinforcement Learning for Black-Box Adversarial Attacks Against Malware Detectors |
| 13232 | LLM-GUIDED SAM FOR LEFT VENTRICLE SEGMENTATION AND FUNCTIONAL ANALYSIS IN ECHOCARDIOGRAPHY |
| 1418 | LLMPopcorn: Exploring LLMs as Assistants for Popular Micro-video Generation |
| 13813 | LLMs Cannot Reliably Generate Architectural Design Images (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks |
| 12523 | LLMS DO NOT ALWAYS SAY NO: A PARALLEL MULTI-AGENT JAILBREAK FRAMEWORK VIA PROBABILISTIC VULNERABILITY |
| 17522 | LMM-ENHANCED MULTIMODAL SEQUENTIAL RECOMMENDATION USING CONSISTENCY-GUIDED HYBRID ATTENTION |
| 4681 | LMOSA-STEREO: LIGHTWEIGHT STEREO MATCHING WITH MIXTURE-OF-SCENE AND GEOMETRIC SPATIAL ATTENTION |
| 16575 | LMS-WHISPER: EFFICIENT LIGHTWEIGHT WHISPER FOR MULTI-STUTTER SPEECH CLASSIFICATION |
| 6325 | Local Rate Analysis of Scaled Gradient Descent for Matrix Completion |
| 8425 | LOCALEDIT: COMPLEX VIDEO LOCAL EDITING WITH MASK-GUIDED DIFFUSION INPAINTING |
| 11575 | LOCALIZATION OF A CONSTANT VELOCITY MOVING RIGID BODY IN 2-D BY SUCCESSIVE TOA MEASUREMENTS |
| 12246 | LOCALIZING SPEECH DEEPFAKES BEYOND TRANSITIONS VIA SEGMENT-AWARE LEARNING |
| 13140 | LOFEMECHO: RESOURCE-EFFICIENT AND SCALABLE ECHOCARDIOGRAPHIC CARDIAC FUNCTION ASSESSMENT |
| 12343 | LOG ANOMALY DETECTION VIA HYBRID STATE-SPACE RECURRENT ENCODING AND META-CONTRASTIVE LEARNING |
| 15705 | Logic-ORiented Retriever Enhancement via Contrastive Learning |
| 1030 | LOGIX: LOCAL-GLOBAL MIXERS FOR TIME SERIES REPRESENTATION LEARNING |
| 2188 | LOGPTR: VARIABLE-AWARE LOG PARSING WITH POINTER NETWORK |
| 15585 | LONG CHAIN-OF-THOUGHT COMPRESSION VIA FINE-GRAINED GROUP POLICY OPTIMIZATION |
| 15526 | LONGSPEECH: A SCALABLE BENCHMARK FOR TRANSCRIPTION, TRANSLATION AND UNDERSTANDING IN LONG SPEECH |
| 3041 | LONG-TAILED TIME SERIES CLASSIFICATION WITH NOISY LABELS |
| 4744 | Look, Listen and Segment: Towards Weakly Supervised Audio-visual Semantic Segmentation |
| 19133 | LOOKING AROUND FLATLAND: END-TO-END 2D REAL-TIME NLOS IMAGING |
| 12842 | LOOSE COUPLING OF SPECTRAL AND SPATIAL MODELS FOR MULTI-CHANNEL DIARIZATION AND ENHANCEMENT OF MEETINGS IN DYNAMIC ENVIRONMENTS |
| 15492 | LORA-ENHANCED DYNAMICS: A STRONG BASELINE FOR TRANSFERABLE PERSON RE-IDENTIFICATION ADVERSARIAL ATTACK |
| 3261 | LOSS-ONLY KNOWLEDGE TRANSFER: SIMILARITY-GUIDED LOSS WITH LLM PRIORS |
| 9718 | Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching |
| 13079 | LOTUSDIS: A THAI FAR-FIELD MEETING CORPUS FOR ROBUST CONVERSATIONAL ASR |
| 5372 | LOW RANK QUANTIZATION ADAPTATION FOR LARGE LANGUAGE MODEL |
| 17245 | Low-Bandwidth High-Fidelity Speech Transmission With Generative Latent Joint Source-Channel Coding |
| 2852 | LOW-COMPUTATION DETECTION METHOD FOR UNKNOWN LFM SIGNALS BELOW NOISE FLOOR |
| 12317 | LOW-FREQUENCY HARMONIC CONTROL FOR SPEECH INTELLIGIBILITY IN OPEN-EAR HEADPHONES |
| 4942 | LOW-LATENCY AUDIO FRONT-END REGION-OF-INTEREST BEAMFORMING FOR SMART GLASSES |
| 18064 | LOW-LEVEL CONTINUAL TEST-TIME ADAPTATION FOR IMAGE RESTORATION |
| 14467 | LOW-POWER END-TO-END COCHLEAR IMPLANT SPEECH DENOISING WITH SPIKING NEURAL NETWORKS |
| 13013 | LOW-RANK AND SPARSE MODEL MERGING FOR MULTI-LINGUAL SPEECH RECOGNITION AND TRANSLATION |
| 18904 | Low-Rank Covariance Matrix Recovery From Rank-One Measurements: An Analytical Solution |
| 7655 | LOW-RANK SYMMETRIC INFORMATION BOTTLENECKS IN MULTI-VIEW SUBSPACE CLUSTERING |
| 6423 | LOW-RANK WEIGHTED AMPLITUDE AND PHASE FUSION FOR CSI-FINGERPRINT LOCALIZATION |
| 13346 | LOW-RESOURCE GUIDANCE FOR CONTROLLABLE LATENT AUDIO DIFFUSION |
| 11195 | LOW-RESOURCE IN-CAR INFANT DETECTION USING IR-UWB RADAR |
| 2320 | LOW-RESOURCE SPEECH-BASED EARLY ALZHEIMER’S DETECTION VIA CROSS-LINGUAL AND FEW-SHOT TRANSFER LEARNING |
| 12734 | LP-CFM: PERCEPTUAL INVARIANCE-AWARE CONDITIONAL FLOW MATCHING FOR SPEECH MODELING |
| 12969 | LPCVAE: A CONDITIONAL VAE WITH LONG-TERM DEPENDENCY AND PROBABILISTIC TIME-FREQUENCY FUSION FOR TIME SERIES ANOMALY DETECTION |
| 3019 | LSAFE: Edge-Guided Lightweight Network for Remote Sensing Salient Object Detection via Dynamic Multi-Scale Fusion |
| 4297 | LSP Framework: A Compensatory Model for Defeating Trigger Reverse Engineering via Label Smoothing Poisoning |
| 14241 | LTS-GS: LOCAL TEMPORAL SLICES FOR ADAPTIVE DYNAMIC 3D GAUSSIAN SPLATTING |
| 14949 | LUMIDIFF: LUMINANCE-PRIOR GUIDED DIFFUSION FOR PERCEPTUALLY BALANCED LOW-LIGHT IMAGE SIGNAL RECOVERY |
| 7554 | LUSEEL: LANGUAGE-QUERIED BINAURAL UNIVERSAL SOUND EVENT EXTRACTION AND LOCALIZATION |
| 7793 | LVC: A LIGHTWEIGHT COMPRESSION FRAMEWORK FOR ENHANCING VLMS IN LONG VIDEO UNDERSTANDING |
| 5132 | LVD-GS: GAUSSIAN SPLATTING SLAM FOR DYNAMIC SCENES VIA HIERARCHICAL EXPLICIT-IMPLICIT REPRESENTATION COLLABORATION RENDERING |
| 10158 | LYAPUNOV-CONSTRAINED INTEGRAL REINFORCEMENT LEARNING FOR STABLE ADMITTANCE CONTROL IN NON-RIGID ENVIRONMENTS |
| 16470 | LYTIMET: TOWARDS ROBUST AND INTERPRETABLE STATE-VARIABLE DISCOVERY |
| 13462 | M2DP: A MULTI-SCALE ASSOCIATION LEARNING FRAMEWORK FOR MULTI-CATEGORY DEMAND PREDICTION UNDER PUBLIC HEALTH EMERGENCIES |
| 11390 | M2EA: MULTI-VAE MANIFOLD ENVELOPE ALIGNMENT FOR CHALLENGING AI-GENERATED IMAGE DETECTION |
| 11685 | M2FNET: MULTI-LEVEL MODALITY-FUSED NETWORK FOR ROBUST FINGERPRINT AND FINGER VEIN RECOGNITION |
| 5624 | M²-PROTOLLM: AN LLM-DRIVEN FRAMEWORK FOR WHITELIST RULE EXTRACTION IN ICS PROTOCOLS |
| 9597 | M2REG: UNSUPERVISED MULTI-SCALE REGISTRATION FOR MULTIMODAL MICROSCOPY |
| 10313 | M2TRACKFORMER: TRANSFORMER-BASED MMWAVE TRACKING WITH LOST TARGET RE-ACQUISITION CAPABILITY |
| 5345 | M3FEND: MULTI-MODAL MIXTURE OF EXPERTS WITH ADVERSARIAL GATING FOR MULTI-MODAL FAKE NEWS DETECTION |
| 11000 | M3GQA: A MULTIMODAL MULTI-HOP AND KNOWLEDGE GRAPH-BASED FRAMEWORK FOR QUESTION ANSWERING |
| 3924 | M3NET: MULTIVARIATE TIME SERIES CLASSIFICATION VIA MULTISCALE PATCHING AND CHANNEL MIXING |
| 18989 | M4SER: MULTIMODAL, MULTIREPRESENTATION, MULTITASK, AND MULTISTRATEGY LEARNING FOR SPEECH EMOTION RECOGNITION |
| 11374 | MAC-SAM: Mask-Aware Category-Guided Segment Anything Model for Interactive Image Segmentation |
| 15346 | MAG: Multi-Modal Aligned Autoregressive Co-Speech Gesture Generation without Vector Quantization |
| 13458 | MAGE: A COARSE-TO-FINE SPEECH ENHANCER WITH MASKED GENERATIVE MODEL |
| 12258 | MAGE-KT: Multi-Agent Graph-Enhanced Knowledge Tracing with Subgraph Retrieval and Asymmetric Fusion |
| 14843 | MAGF-UIENET: A MULTISCALE ATTENTION GUIDED FUSION NETWORK FOR UNDERWATER IMAGE ENHANCEMENT |
| 4596 | MAGICITY4D: CONTROLLABLE AND EDITABLE 4D CITY SCENE GENERATION USING MLLM-ENHANCED PROCEDURAL CONTENT GENERATION |
| 3004 | Magnet Tracking by a Magnetic Sensor Array with Interactive Multiple Model Estimation for Small-Scale Applications |
| 15803 | MAGNITUDE DIFFERENCE CONDITIONED ALL-IN-ONE IMAGE RESTORATION |
| 17164 | MaHaWave-Net: A Lightweight Multi-Scale Model for Fine-Grained Medical Image Segmentation |
| 4367 | MAIA: A MULTIDIMENSIONAL BENCHMARK FOR ASSESSING MEDICAL AI AGENTS |
| 17278 | MAKE A GAME: A NOVEL PARADIGM FOR INTERACTIVE GAME RENDERING |
| 3956 | Make Your MoVe: Make Your 3D Contents by Adapting Multi-View Diffusion Models to External Editing |
| 14568 | MAKING DIALOGUE GROUNDING DATA RICH: A THREE-TIER DATA SYNTHESIS FRAMEWORK FOR GENERALIZED REFERRING EXPRESSION COMPREHENSION |
| 12910 | MALEFA: MULTI-GRANULARITY LEARNING AND EFFECTIVE FALSE ALARM SUPPRESSION FOR ZERO-SHOT KEYWORD SPOTTING |
| 14779 | MA-MAE3D: MEMORY AUGMENTED-BASED MAE3D NETWORK FOR POINT CLOUD COMPLETION |
| 13965 | Mamba-Based Encoder-Decoder for Multi-Scale Feature Fusion in Remote Sensing Object Detection |
| 10752 | MambaFormer: State-Space Augmented Self-Attention with Down–Up Sampling for Monaural Speech Enhancement |
| 1516 | MambaHDR: Ghost-free High Dynamic Range Imaging with State-Space Model |
| 7335 | MAMBA-VP: MULTIMODAL VIEWPORT PREDICTION VIA TRAJECTORY-FILTERED TEMPORAL MULTI-SCALE AND VISUAL SPATIOTEMPORAL SCANNING |
| 13612 | MANGAVOX: DATASET OF ACTED VOICES ALIGNED WITH MANGA IMAGES TOWARDS COMPUTER UNDERSTANDING OF AUDIO COMICS |
| 16116 | Manifold-Optimization-Based 3D Sound Source Mapping with Unknown Camera-Microphone Array Relative Pose |
| 17881 | ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance |
| 16139 | MAPD-MAMBA: MODALITY-ADAPTIVE PERCEPTION-DRIVEN MAMBA FUSION NETWORK |
| 6419 | MAPEX: A MULTI-AGENT PIPELINE FOR KEYPHRASE EXTRACTION |
| 10086 | MAPROUTE-BENCH: EVALUATING SPATIAL REASONING ON TOP-VIEW MAPS IN VISION-LANGUAGE MODELS |
| 16497 | MAR: EFFICIENT LARGE LANGUAGE MODELS VIA MODULE-AWARE ARCHITECTURE REFINEMENT |
| 4789 | MARCO-VOICE: A UNIFIED FRAMEWORK FOR EXPRESSIVE SPEECH SYNTHESIS WITH VOICE CLONING |
| 16252 | MARE: Multi-Agent Role Embedding for Role-Consistent Generation in Multi-Agent Systems |
| 2588 | MARITIME INFRARED SMALL-TARGET DETECTION VIA LOCAL CONTRAST MEASUREMENT WITH A NOVEL WINDOW AND CENTRAL DIRECTIONAL CONSISTENCY |
| 13282 | Marking the Margin: Robust DNN watermarking against Removal Attacks via sculpting decision boundaries |
| 10616 | MARKSWEEP: A NO-BOX REMOVAL ATTACK ON AI-GENERATED IMAGE WATERMARKING VIA NOISE INTENSIFICATION AND FREQUENCY-AWARE DENOISING |
| 4780 | MASA: Query-Free Black-Box Adversarial Attack on Text-to-Image Generation via Multi-modal Adaptive Semantic Optimization |
| 13929 | MASDROID: A Multi-Agent System for Enhancing the Analysis of Android Malware |
| 14850 | MaskDiff-Traj: A UNIFIED TRAJECTORY IMPUTATION AND GENERATION FRAMEWORK VIA PATTERN-GUIDED MASKED DIFFUSION |
| 13935 | MASKED PROJECTION MODELLING FOR SPARSE-VIEW CRYO-EM RECONSTRUCTION |
| 4556 | MASK-FREE THANGKA RESTORATION VIA RETRIEVAL-GUIDED DIFFUSION WITH SEMANTIC AND STRUCTURAL ALIGNMENT |
| 15905 | Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks? |
| 11353 | MASK-GUIDED BACKTRACK DECODING: ENABLING SELF-CORRECTION IN LLM REASONING |
| 2952 | MASKVCT: MASKED VOICE CODEC TRANSFORMER FOR ZERO-SHOT VOICE CONVERSION WITH INCREASED CONTROLLABILITY VIA MULTIPLE GUIDANCES |
| 16095 | MASSIVE MIMO WITH FEWER RF CHAINS USING SIGMA-DELTA RIS |
| 11496 | MASTER-ASSISTED DISTRIBUTED UPLINK OPERATION FOR CELL-FREE MASSIVE MIMO NETWORKS |
| 2440 | MATCHGAUSSIAN: ADAPTIVE DENSIFICATION WITH MATCHING PRIORS FOR GENERALIZABLE GAUSSIAN SPLATTING |
| 16068 | MATCHING REVERBERANT SPEECH THROUGH LEARNED ACOUSTIC EMBEDDINGS AND FEEDBACK DELAY NETWORKS |
| 3974 | MATCHMIX: PHASE-MATCHED FRAME MIXING FOR TEMPORALLY CONSISTENT MOTION AUGMENTATION |
| 4381 | MATE: MATRYOSHKA AUDIO-TEXT EMBEDDINGS FOR OPEN-VOCABULARY KEYWORD SPOTTING |
| 12558 | MATHHALU: A BENCHMARK FOR MATHEMATICAL REASONING PROCESS HALLUCINATION DETECTION IN LARGE REASONING MODELS |
| 1959 | MATHPHYS-GUIDED COARSE-TO-FINE ANOMALY SYNTHESIS WITH SQE-DRIVEN BI-LEVEL OPTIMIZATION FOR ANOMALY DETECTION |
| 12121 | MATRIX-STRUCTURED HIERARCHICAL CONVOLUTIONAL MODELING FOR PRONUNCIATION ASSESSMENT AND MISPRONUNCIATION DETECTION |
| 11774 | MATTER: Multiscale Attention for Registration Error Regression |
| 16791 | MAVNET: DEEP LEARNING-BASED FMCW RADAR FRAMEWORK FOR MOTION-RESILIENT VITAL SIGN MONITORING ON PYNQ SOC |
| 8322 | MAXIMIZING SECURE ENERGY EFFICIENCY IN UAV-ASSISTED BACKSCATTERING NETWORKS USING DEEP REINFORCEMENT LEARNING |
| 17038 | MAXIMUM ENTROPY-BASED EFFICIENT FUZZY GRAPH CLUSTERING |
| 17247 | Maximum Likelihood Measurement Noise Estimation for Block-Time Domain Kalman Filters |
| 12944 | MCF: Text LLMs for Multimodal Emotional Causality |
| 3481 | MCF-Net: A Mamba-based Efficient Network for Radar Jamming Recognition |
| 4315 | M-CGL: MAMBA-ENHANCED CONCEPT GUIDED LEARNING FOR FINE-GRAINED IMAGE CLASSIFICATION |
| 13051 | MCI-OTFusion: A multimodal model for MCI detection and cognitive score prediction |
| 12114 | MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism |
| 12610 | MCMC method with integrated adiabatic modes model for range-dependent matched-field geoacoustic inversion |
| 13260 | MC-MRX: REFERENCE- AND MIDI-GUIDED MUSIC SOURCE EXTRACTION WITH CONTRASTIVE LEARNING |
| 16434 | mCoT-VLA: Towards Robust Vision–Language–Action Models via Multimodal Chain-of-Thought |
| 10791 | MCPO: DYNAMIC MASKING AND MULTI-COMPARISON POLICY OPTIMIZATION ALGORITHM FOR LLM REINFORCEMENT LEARNING |
| 11377 | MDBoost: A Multi-Dimensional Reweighting Framework for Robust Gradient Boosting |
| 15500 | MDFDet: A Multi-modal Dynamic Fusion Algorithm for RGB-Infrared Object Detection |
| 3300 | MDPO: MULTI-DIMENSIONAL LABEL ENHANCED DIRECT PREFERENCE OPTIMIZATION FOR EFFICIENT MULTIMODAL LLM FINE-TUNING |
| 2598 | MDSF-DET: MODALITY DECOUPLING AND SYNERGISTIC FUSION DETECTOR |
| 5752 | MEAN-FIELD-ENABLED ROBUST ANTI-JAMMING TRANSMISSION FOR LARGE-SCALE AERIAL RIS NETWORKS |
| 4592 | MEANFLOW-ACCELERATED MULTIMODAL VIDEO-TO-AUDIO SYNTHESIS VIA ONE-STEP GENERATION |
| 2705 | MEANFLOWSE: ONE-STEP GENERATIVE SPEECH ENHANCEMENT VIA CONDITIONAL MEAN FLOW |
| 4902 | MEANSE: EFFICIENT GENERATIVE SPEECH ENHANCEMENT WITH MEAN FLOWS |
| 1704 | MEANVC: LIGHTWEIGHT AND STREAMING ZERO-SHOT VOICE CONVERSION VIA MEAN FLOWS |
| 11590 | MEANVOICEFLOW: ONE-STEP NONPARALLEL VOICE CONVERSION WITH MEAN FLOWS |
| 6236 | MEASURE-TRANSFORMED PRINCIPAL COMPONENT ANALYSIS |
| 12453 | MEASURING AND REDUCING INTRINSIC BOUNDARY NOISE IN TEMPORAL ACTION SEGMENTATION |
| 2081 | MEASURING PROSODY DIVERSITY IN ZERO-SHOT TTS: A NEW METRIC, BENCHMARKING, AND EXPLORATION |
| 15472 | MEBM: Exploring the Synergy of Mixture of Experts in Background Matting |
| 3638 | MECAP-R1: EMOTION-AWARE POLICY WITH REINFORCEMENT LEARNING FOR MULTIMODAL EMOTION CAPTIONING |
| 4543 | Medical Federated Learning under Long-Tailed and Non-IID Distributions |
| 10355 | MedSpeak: A Knowledge Graph-Aided ASR Error Correction Framework for Spoken Medical QA |
| 16062 | MEGATEMPQA: A MILLION-SCALE TEMPORAL QUESTION-ANSWER DATASET FOR REDUCING LLM HALLUCINATIONS |
| 12826 | MEIE: A PROMPT-DRIVEN FRAMEWORK FOR MIXED-EXPOSURE IMAGE ENHANCEMENT WITH ADAPTIVE 3D LUTS |
| 11483 | MELA-TTS: JOINT TRANSFORMER-DIFFUSION MODEL WITH REPRESENTATION ALIGNMENT FOR SPEECH SYNTHESIS |
| 10888 | MELT: IMPROVE COMPOSED IMAGE RETRIEVAL VIA THE MODIFICATION FREQUENTATION-RARITY BALANCE NETWORK |
| 10881 | MEM4TEETH: Memory-Guided Point Cloud Completion for Dental Reconstruction |
| 17815 | Membership Inference Attack Against Music Diffusion Models via Generative Manifold Perturbation |
| 2382 | MemFormer: Memory-enhanced Transformer with Multi-task Learning for Video Anomaly Detection |
| 10711 | MEMORY FOOTPRINT IMAGES: A U-NET APPROACH FOR ADVANCED CACHE PREFETCHING |
| 9761 | MEMORY-EVOLUTION AND REFLECTION-AUGMENTED AGENTS |
| 17147 | MEMORYPROMPT: MEMORY-AUGMENTED MULTI-LAYER PROMPTING FOR VISION-LANGUAGE MODELS |
| 15361 | Meow: End-to-End Outline Writing for Automatic Academic Survey |
| 10858 | MERLINet: Multi-Exposure Reflection Elimination Network for Real-World Scenes |
| 4164 | MESHRF: RESIDUAL FUSION OF VERTICES, EDGES, AND FACES FOR MESH UNDERSTANDING |
| 1513 | MESSAGE PASSING-BASED PARALLEL MULTI-TARGET JOINT DETECTION AND ESTIMATION IN DISTRIBUTED PASSIVE MIMO RADAR |
| 9693 | Meta-Offline and Distributional Multi-Agent RL for Risk-Aware Decision-Making |
| 12811 | META-REINFORCEMENT LEARNING WITH CONTEXTUAL BIAS REDUCTION |
| 14624 | MetaToolAgent: Towards Generalizable Tool Usage in LLMs through Meta-Learning |
| 5312 | METRA: Robust Encrypted Traffic Detection Against Adversarial Attacks via Multi-Task Learning and Label Denoising |
| 16749 | MEVAR: MOBILITY-ENHANCED VEHICLE TRAJECTORY RECONSTRUCTION FROM CAMERA SENSING NETWORKS |
| 9025 | MFA-Align: Aligning by Disagreeing for Efficient and Low-Cost Personalized Alignment of Aesthetic VLLMs |
| 5778 | MFF-Net: Image Manipulation Localization Method Based on Multi-scale Feature Fusion Network |
| 5597 | MFF-RVRDI: MULTIMODAL FUSION FRAMEWORK FOR ROBUST VIDEO RECORDING DEVICE IDENTIFICATION |
| 5062 | MFM-DETR: MISALIGNMENT-FREE MULTIMODAL DETR FOR MARITIME OBJECT DETECTION |
| 11393 | MGHFED: ENHANCING HETEROGENEOUS SUBGRAPH FEDERATED LEARNING THROUGH ADVERSARIAL META-PATH GENERATION |
| 10780 | MH-A3: Metropolis-Hastings Anomaly-Aware Augmentation for Contrastive Graph Anomaly Detection |
| 11106 | MHPTRACK: EFFICIENT MULTIMODAL HISTORICAL PROMPTING FOR AERIAL TARGET TRACKING |
| 18888 | Micro-Image Domain View Synthesizer for Free Navigation With Focused Plenoptic Cameras |
| 7932 | MICROPHONE-LESS MEASUREMENT OF THREE-DIMENSIONAL RADIATING IMPULSE RESPONSE OF SOUND SOURCE USING SPHERICAL HARMONIC-DOMAIN ACOUSTO-OPTIC TOMOGRAPHY |
| 15035 | MIDAS: A Dynamic Cross-GPU KV Cache Offloading Framework For LLM On GPU Cluster Systems |
| 4159 | MIDI-LLaMA: An Instruction-Following Multimodal LLM for Symbolic Music Understanding |
| 1004 | MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large Audio-Language Model |
| 17732 | MIHT-NET: A DEEP-UNROLLED FRAMEWORK FOR SPARSE SIGNAL RECOVERY |
| 6033 | MILORE-SSL: SCALING MULTILINGUAL CAPABILITIES IN SELF-SUPERVISED MODELS WITHOUT FORGETTING |
| 16770 | MIMO Array Calibration in Non-stationary Channels with Residual Surfaces and Slepian Spherical Harmonics |
| 10219 | MIND THE GAP: DATA REWRITING FOR STABLE OFF-POLICY SUPERVISED FINE-TUNING |
| 10644 | MIND THE NOISE, ALIGN THE FINE: CONFIDENCE-AWARE MASKED IMAGE MODELING FOR TEXT-BASED PERSON RE-IDENTIFICATION |
| 13993 | MIND THE SHIFT: USING DELTA SSL EMBEDDINGS TO ENHANCE CHILD ASR |
| 14275 | MIND YOUR [m]S, CROSS YOUR [t]S: A LARGE-SCALE PHONEMIC ANALYSIS OF SPEECH REPRODUCTION IN MODERN SPEECH GENERATORS |
| 1402 | MINIMIZATION OF NONSMOOTH WEAKLY CONVEX FUNCTION OVER PROX-REGULAR SET FOR ROBUST LOW-RANK MATRIX RECOVERY |
| 14124 | MINIMIZING ADC PRECISION FOR ANALOG IN-MEMORY COMPUTING |
| 14826 | MI-PRUN: OPTIMIZE LARGE LANGUAGE MODEL PRUNING VIA MUTUAL INFORMATION |
| 9648 | MIRAGE: NOISE-AWARE BAYESIAN CALIBRATION WITH MUTUAL INFORMATION FOR RELIABLE RAG |
| 12414 | MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning |
| 15237 | MIRRORTALK: FORGING PERSONALIZED AVATARS VIA DISENTANGLED STYLE AND HIERARCHICAL MOTION CONTROL |
| 12465 | MISA: MULTI-STAGE INTERACTIVE SELF-ATTENTION FOR CONSISTENT SUBJECT-DRIVEN TEXT-TO-IMAGE GENERATION |
| 17688 | Misclassification Rates and Privacy-Utility Trade-offs in Graph Convolutional Networks via Subsampling Stability |
| 10201 | MISPRONUNCIATION DETECTION AND DIAGNOSIS WITHOUT MODEL TRAINING: A RETRIEVAL-BASED APPROACH |
| 6261 | MISSPECIFIED CRAMÉR-RAO BOUNDS ON SNR ESTIMATION |
| 14366 | MIST: Micro-Image Shuffling Tool for Codec-Agnostic Plenoptic Video Compression |
| 12267 | MISTA: Compact Multi-Identity Structure-aware Tensorized Avatars |
| 16462 | MITA: A HIERARCHICAL MULTI-AGENT COLLABORATION FRAMEWORK WITH MEMORY-INTEGRATED AND TASK ALLOCATION |
| 8026 | MITIGATING ATTENTION SINKS AND MASSIVE ACTIVATIONS IN AUDIO-VISUAL SPEECH RECOGNITION WITH LLMS |
| 14219 | MITIGATING DATA REPLICATION IN TEXT-TO-AUDIO GENERATIVE DIFFUSION MODELS THROUGH ANTI-MEMORIZATION GUIDANCE |
| 13657 | MITIGATING DECEPTIVE KNOWLEDGE EDITING IN LLMS VIA DIFFUSION SYNTHESIS |
| 10249 | MITIGATING DOMAIN SHIFT IN ULTRASONIC WAVEFIELD PATTERN ANALYSIS THROUGH TEST-TIME TRAINING |
| 17709 | Mitigating entity bias in Relation Extraction with Pair-Training |
| 3893 | MITIGATING FALSE ALARMS IN OPEN-SET SPEAKER IDENTIFICATION WITH A DECOUPLED FRAMEWORK |
| 2402 | MITIGATING HALLUCINATION IN FINANCIAL RETRIEVAL-AUGMENTED GENERATION VIA FINE-GRAINED KNOWLEDGE VERIFICATION |
| 17609 | Mitigating Hallucinations in Large Language Models Via Decoder Layer Skipping |
| 15511 | MITIGATING INTRA-SPEAKER VARIABILITY IN DIARIZATION WITH STYLE-CONTROLLABLE SPEECH AUGMENTATION |
| 6133 | MITIGATING LANGUAGE PRIOR-INDUCED HALLUCINATIONS VIA BI-LEVEL CONTRASTIVE DECODING |
| 16454 | MITIGATING OBJECT AND RELATIONSHIP HALLUCINATION IN LARGE VISION LANGUAGE MODEL WITH MULTI-AGENT GUIDANCE |
| 13976 | MIX2MORPH: LEARNING SOUND MORPHING FROM NOISY MIXES |
| 4206 | MIX-CLAP: ADAPTIVE FUSION OF KNOWLEDGE-DISTILLED AUDIO EMBEDDINGS FOR NOISE-AWARE AUDIO-LANGUAGE MODELS |
| 19124 | Mixed-gradients Distributed Filtered Reference Least Mean Square Algorithm -- A Robust Distributed Multichannel Active Noise Control Algorithm |
| 1837 | MixGAN-based Non-blind Bandwidth Extension for Audio Codec |
| 14889 | Mix-Persona Comment Generation for LLM Fine-Tuning in Multimodal Crisis Post Classification |
| 2435 | MixStyle-Augmented Meta-Learning for Cross-Domain Infrared-Visible Image Fusion |
| 6097 | MIXTURE OF EXPERTS FOR RECOGNIZING DEPRESSION FROM INTERVIEW AND READING TASKS |
| 9545 | Mixture to Beamformed Mixture: Leveraging Beamformed Mixture as Weak-Supervision for Speech Enhancement and Noise-Robust ASR |
| 5140 | Mixture-of-Experts Based Soft-Label Learning for Multi-Label Speech Emotion Recognition |
| 9973 | Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of moving talkers |
| 5686 | MIXTURES OF LIGHTWEIGHT ARTICULATORY EXPERTS FOR MULTILINGUAL ASR |
| 14710 | MLLM-EMPOWERED ACTIVE LEARNING WITH GENERATED ATTRIBUTES FOR MICROSCOPIC ALGAE IMAGE CLASSIFICATION |
| 12520 | MMAUDIOSEP: TAMING VIDEO-TO-AUDIO GENERATIVE MODEL TOWARDS VIDEO/TEXT-QUERIED SOUND SEPARATION |
| 11596 | MMC: Min-Max Calibration for Test-Time Prompt Tuning in Vision-Language Models |
| 6208 | MMFast: Rethinking Vision-Language Interaction in Efficient MLLMs |
| 1202 | MMIndoor3D: Multi-View Multimodal 3D Indoor Scene Generation with Material Information |
| 16236 | MMNAD: A GENERALIZED MULTI-SCENARIO ATTACK DETECTION METHOD FOR SOFTWARE DEFINED NETWORKING |
| 13871 | MM-NO: Learning Physical Operators from Heterogeneous Data via Cross-Modal Attention Fusion |
| 3278 | mmSRFormer: Efficient Transformer for Sparse mmWave Radar Point Cloud Super-Resolution |
| 11402 | mmWave-Diffusion: A Novel Framework for Respiration Sensing Using Observation-Anchored Conditional Diffusion Model |
| 14985 | MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech |
| 4244 | MÖBIUS FOURIER BASIS FOR DAGS WITH NONNEGATIVE EDGE WEIGHTS |
| 5353 | MOC: Mamba-based Multi-Scale One-Class Time-Series Anomaly Detection |
| 10523 | Modality-Aware Token Filtering and Common Feature Enhancement Network for Multi-modal Vehicle Re-Identification |
| 1885 | Modality-Decoupled RGB-Thermal Object Detector via Query Fusion |
| 7008 | MODEL EQUALITY TESTING OF BLACK-BOX LLM APIS VIA PREFIX TREE STATISTICS |
| 11201 | MODEL SHALL KNOW IT: BACKDOOR ATTACKS ON IMAGE CAPTIONING MODELS BY TEXTURAL REPRESENTATIONS |
| 15632 | MODELING AND INTEGRATION OF DYNAMIC METASURFACE ANTENNAS WITH CLUSTERED CHANNEL MODELS |
| 17348 | MODELING BOTH INTRA- AND INTER-UTTERANCE VARIABILITY FOR CONVERSATIONAL EMOTION RECOGNITION |
| 14659 | Modeling Inter-Segment Relationships in Speech for Dementia Detection with Audio Spectrogram Transformers and Graph Attention Networks |
| 12777 | MODELING STRATEGIES FOR SPEECH ENHANCEMENT IN THE LATENT SPACE OF A NEURAL AUDIO CODEC |
| 5207 | Modelling of a Marked Hawkes Process |
| 12813 | MODERN STRUCTURE-AWARE SIMPLICIAL SPATIOTEMPORAL NEURAL NETWORK |
| 10183 | MODULARITY-FREE CONFLICT-AVERSE TRAINING FOR GENERALIZED PINNS |
| 3314 | MODWKAN: HARNESSING MAXIMAL OVERLAP DISCRETE WAVELET TRANSFORM AND KAN FOR TIME SERIES FORECASTING |
| 5789 | MoE-AMC: Enhancing Automatic Modulation Classification Using Mixture-of-Experts |
| 11885 | MO-GRPO-MED: A MULTI-OBJECTIVE FRAMEWORK FOR GENERATING SAFE AND HIGH-QUALITY DISCHARGE INSTRUCTIONS |
| 14410 | Moment-based Posterior Sampling for Multi-reference Alignment |
| 13762 | MOMENTS MATTER: POSTERIOR RECOVERY IN POISSON DENOISING VIA LOG-NETWORKS |
| 18561 | Mongoose: Do We Need Scanner for Vision Mamba? |
| 9413 | MONOCULAR 3D FACE RECONSTRUCTION VIA COARSE-TO-FINE LANDMARK REPRESENTATION |
| 19061 | Monotone Lipschitz-Gradient Denoiser: Explainability of Operator Regularization Approaches Free From Lipschitz Constant Control |
| 13314 | Moral5D: A Five-Dimensional Human-centered Method for Evaluating and Enhancing LLM Moral Reasoning |
| 14481 | MORE THAN A SHORTCUT: A HYPERBOLIC APPROACH TO EARLY-EXIT NETWORKS |
| 6110 | MORE: MULTIMODAL RELATIONSHIP ENHANCEMENT FOR UNBIASED SCENE GRAPH GENERATION |
| 13583 | MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR |
| 14529 | MOSA: MOTION-GUIDED SEMANTIC ALIGNMENT FOR DYNAMIC SCENE GRAPH GENERATION |
| 12624 | MotionBeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding |
| 7972 | MOTIONFLOW: TEXT-DRIVEN EMOTION-CONTROLLABLE HUMAN MOTION GENERATION VIA CONDITIONAL FLOW MATCHING |
| 16880 | MotionFusion: Fusing Motion and Saliency for Fast Video Large Language Model Inference |
| 17163 | MOTION-GUIDED SEMANTIC ALIGNMENT WITH NEGATIVE PROMPTS FOR ZERO-SHOT VIDEO ACTION RECOGNITION |
| 1468 | MotionPLLM: LLM-Based Generator for Part-Level Controllable Human Motion in Quantized Latent Space |
| 10017 | Mouthing-Enhanced Multimodal Hierarchical Contrastive Learning for Gloss-Free Sign Language Translation |
| 14265 | MOVi: Training-free Text-conditioned Multi-Object Video Generation |
| 10264 | MPDA: A MULTI-GRANULARITY PERTURBATION AND DUAL-FEATURE ANALYSIS FRAMEWORK FOR AI-GENERATED TEXT DETECTION |
| 18870 | MPHR: A Robust Algorithm for Estimating Nodal Head in Water Networks |
| 7667 | MPL-MOE: MULTI-MODAL PROMPT LEARNING WITH MIXTURE OF EXPERTS FOR MULTIVARIATE TIME SERIES FORECASTING |
| 4883 | MP-MVSNET: MULTI-VIEW STEREO NETWORK GUIDED BY BOTH MONOCULAR FEATURE AND GEOMETRIC PRIORS |
| 12939 | MPR: Memory Perturbation Regularization for Controllable Stability-Plasticity Balancing in Continual Trajectory Prediction |
| 1205 | MRFHAR: WAVELET-BASED CONTRASTIVE LEARNING FOR HUMAN ACTIVITY RECOGNITION BY FUSING RFID AND WIFI SIGNALS |
| 14146 | MR-FLOWDPO: MULTI-REWARD DIRECT PREFERENCE OPTIMIZATION FOR FLOW-MATCHING TEXT-TO-MUSIC GENERATION |
| 8189 | MSAT: MULTI-SCALE SEMANTIC-ALIGNED TRANSFORMER FOR MULTI-LABEL IMAGE CLASSIFICATION |
| 11803 | MSBENCH: CAN SPEECH LANGUAGE MODELS GENERATE MULTI-SPEAKER DIALOGUES IN ONE PASS? |
| 5330 | MSCT: DIFFERENTIAL CROSS-MODAL ATTENTION FOR DEEPFAKE DETECTION |
| 4539 | MSF-Mamba: Multi-Scale Frequency Mamba for Long-Term Time Series Forecasting |
| 17622 | MSF-SER: ENRICHING ACOUSTIC MODELING WITH MULTI-GRANULARITY SEMANTICS FOR SPEECH EMOTION RECOGNITION |
| 12084 | MSGCoOp: Multiple Semantic-Guided Context Optimization for Few-Shot Learning |
| 5577 | MSGFF: TRANSPARENT WATERMARK DETECTION AND REMOVAL VIA MULTI-SCALE GRADIENT FEATURE FUSION |
| 12668 | MSNAV: ZERO-SHOT VISION-AND-LANGUAGE NAVIGATION WITH DYNAMIC MEMORY AND LLM SPATIAL REASONING |
| 6281 | MSP-REID: HAIRSTYLE-ROBUST CLOTH-CHANGING PERSON RE-IDENTIFICATION |
| 13792 | MSTAR: CROSS-MODAL FUSION VIA MULTI-SOURCE REWARD MECHANISM FOR SPATIO-TEMPORAL AWARE REASONING |
| 15200 | MSVS: MULTI-SHELL VIEWPOINT SAMPLING FOR COMPREHENSIVE EVALUATION OF 3D WATERMARKING |
| 1672 | MTAD: A Three-Stage Framework for Machine Translation Agents Distillation |
| 15282 | MTEDS: MEMORY AND TIME EFFICIENT SPECULATIVE DECODING WITH DYNAMIC SPARSITY AND BYPASS SCHEDULING |
| 12280 | MT-HPDE: MULTIMODAL VISION TRANSFORMER FOR HAND POINT DIRECTION ESTIMATION USING ZERO-SHOT DIFFUSION SEGMENTATION |
| 13859 | MT-HUBERT: SELF-SUPERVISED MIX-TRAINING FOR FEW-SHOT KEYWORD SPOTTING IN MIXED SPEECH |
| 10618 | MTP-S2UT: ENHANCING SPEECH-TO-SPEECH TRANSLATION QUALITY WITH MULTI-TOKEN PREDICTION |
| 4897 | MTRP: Diversely Enhancing Multi-Turn Dialogue and Role-Playing Abilities of Large Language Models |
| 12595 | MTS-CR: Contrastive Representation Learning for Real-Time QoS Degradation Detection in Media Cloud Shared Instances |
| 9796 | MTSearch-R1: Reinforcement Learning for Flexible Multi-Tool Search with Large Language Models |
| 17119 | MUCO: MULTI-VIEW PATTERN REPRESENTATION LEARNING FOR SUBGRAPH COUNTING |
| 5341 | MUGSQA: NOVEL MULTI-UNCERTAINTY-BASED GAUSSIAN SPLATTING QUALITY ASSESSMENT METHOD, DATASET, AND BENCHMARKS |
| 18038 | MULTI LEVEL PATCH-WISE CONTRASTIVE SELF-SUPERVISED LEARNING WITH DYNAMIC SCALE-AWARE ATTENTION FOR AIRPORT OBJECT DETECTION |
| 13823 | Multi Stage Training With Dynamic Data Balancing For Multilingual Speech Recognition and Translation |
| 5574 | MULTI-AGENT BRAINSTORMING FOR INTERPRETING AND MITIGATING HALLUCINATION IN MULTIMODAL-LLM |
| 10061 | Multi-Agent Deep Reinforcement Learning-Based IoV Secure Data Transmission |
| 2477 | MULTI-AGENT DIAGNOSTIC COLLABORATION AND SEGMENTATION-AWARE RESIDUAL DECODING FOR HALLUCINATION-RESISTANT MEDICAL VQA |
| 4309 | Multi-Agent Honeypot-Based Request-Response Context Dataset for Improved SQL Injection Detection Performance |
| 3424 | MULTI-ANGLE VISUAL INFORMATION REPRESENTATION AND PROGRESSIVE ALIGNMENT NETWORK FOR JOINT MULTIMODAL ENTITY-RELATION EXTRACTION |
| 15157 | Multiantenna Channel Map Prediction With Missing Location Information Using Contrastive Learning and Graph Neural Networks |
| 18992 | MULTI-ATTRIBUTE GRAPH LEARNING FOR GEOSCIENCE APPLICATIONS |
| 14697 | MULTI-BAND FREQUENCY PROMPT TUNING FOR SOURCE-FREE CROSS-DOMAIN FEW-SHOT LEARNING |
| 14086 | Multibeam analog beamformer design for monostatic ISAC under Self-Interference |
| 14484 | MULTI-BLOCK ALTERNATING GRADIENT DESCENT AND MINIMIZATION FOR L+S COLUMN-WISE COMPRESSED SENSING |
| 12000 | MULTI-BRANCH COLLABORATIVE FEATURE PYRAMID NETWORK FOR SHORT-SPEECH SPEAKER VERIFICATION |
| 3702 | MULTI-CHANNEL SPEECH ENHANCEMENT FOR COCKTAIL PARTY SPEECH EMOTION RECOGNITION |
| 19102 | Multi-Channel Speech Enhancement Guided by Learning-based A Posteriori Speech Presence Probability Estimation |
| 5666 | MULTI-COURSE INTEGRATION FRAMEWORK BASED ON SUBJECT KNOWLEDGE GRAPHS |
| 13878 | Multi-Dictionary Learning for Low Rank Sparse Coding |
| 17980 | MULTI-DIFFERENTIAL FEATURE INTERACTION NETWORK FOR IMAGE CHANGE CAPTIONING TOWARDS LOW-LIGHT REMOTE SENSING SCENARIOS |
| 18887 | Multidimensional Polynomial Phase Estimation |
| 6694 | Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning |
| 13384 | MULTI-DOMAIN SHORT VIDEO ANOMALY NEWS DETECTION |
| 19119 | MULTIFACETED PRONUNCIATION FEEDBACK MODEL WITH INTERACTIVE HIERARCHICAL NEURAL MODELING |
| 1678 | MULTI-GATE CONVOLUTIONAL NEURAL NETWORK FOR EFFICIENT SINGLE IMAGE SUPER-RESOLUTION |
| 16725 | MULTI-GRANULARITY ATTRIBUTE PROMPT LEARNING FOR CLOTH-CHANGING PERSON RE-IDENTIFICATION |
| 3225 | MULTI-GRANULARITY SCORE-BASED GENERATIVE FRAMEWORK ENABLES EFFICIENT INVERSE DESIGN OF COMPLEX ORGANICS |
| 16540 | MULTI-HOP DEEP JOINT SOURCE-CHANNEL CODING WITH DEEP HASH DISTILLATION FOR SEMANTICALLY ALIGNED IMAGE RETRIEVAL |
| 16356 | Multi-layer attentive probing improves transfer of audio representations for bioacoustics |
| 17272 | MULTILINGUAL SUPERVISED PRETRAINING WITH LM-ASSISTED DECODING FOR VISUAL SPEECH RECOGNITION |
| 7884 | MULTI-MODAL BASED POINT CLOUD GEOMETRY COMPRESSION |
| 15226 | MULTIMODAL CO-TRAINING WITH SUBTRACTIVE UNLABELED-BENEFIT BOUNDS |
| 6654 | MULTIMODAL DEEP LEARNING METHOD FOR REAL-TIME SPATIAL ROOM IMPULSE RESPONSE COMPUTING |
| 11351 | MULTI-MODAL FAKE NEWS DETECTION VIA INTRA-CALIBRATED CROSS-MODAL FUSION AND MODALITY-WISE ATTENTION AGGREGATION |
| 11980 | MULTIMODAL FUSION-BASED IPCLIP NETWORK FOR MIXED REALITY SURGICAL ASSISTANCE |
| 7072 | MULTI-MODAL HIERARCHICAL FUSION WITH CROSS-AGENT FOR RGB-D SALIENT OBJECT DETECTION |
| 11991 | Multimodal LLMs as Expert Speech Annotators: Acoustic Macro-Descriptors for Parkinson's Detection |
| 17198 | MULTIMODAL MULTI-AGENT EMPOWERED LEGAL JUDGMENT PREDICTION |
| 3880 | Multimodal Privacy-Preserving Entity Resolution with Fully Homomorphic Encryption |
| 14425 | MULTIMODAL ROOM IMPULSE RESPONSE GENERATION THROUGH LATENT RECTIFIED FLOW MATCHING |
| 1950 | MULTIMODAL SELF-ATTENTION NETWORK WITH TEMPORAL ALIGNMENT FOR AUDIO-VISUAL EMOTION RECOGNITION |
| 10228 | Multimodal Sensing-Aided Beamforming Optimization for OFDM Systems |
| 9687 | MULTIMODAL SPEAKER-LISTENER COUPLING DYNAMICS OF SPEECH, PHYSIOLOGY, AND EMOTIONS USING HRV AND ENTROPY ANALYSIS |
| 13004 | MULTIMODAL TRANSFORMER WITH MULTIPERSPECTIVE TRAINING FOR PREDICTING SELF-EXPRESSION SKILLS FROM VIDEO INTERVIEW |
| 16688 | MULTIMODAL VARIATIONAL GRAPH NETWORK FOR MULTIMODAL SENTIMENT ANALYSIS |
| 10688 | Multimodal-Prior-Guided Importance Sampling for Hierarchical Gaussian Splatting in Sparse-View Novel View Synthesis |
| 12829 | MULTI-OS: MULTIMODAL OOD SYNTHESIS ENHANCES OUT-OF-DISTRIBUTION DETECTION FOR VISION-LANGUAGE MODELS |
| 5320 | MULTI-PATCH HIERARCHICAL ADAPTIVE STATE SPACE MODEL FOR REMOTE SENSING IMAGE DEHAZING |
| 4655 | MULTI-PHYSICS: A COMPREHENSIVE BENCHMARK FOR MULTIMODAL LLMS REASONING ON CHINESE MULTI-SUBJECT PHYSICS PROBLEMS |
| 5802 | Multi-Polynomial Phase Signal Parameter Estimation using Time-Frequency Decomposition and Time-Series Representations |
| 9980 | Multi-Resolution Spectrograms Detection of LPI RADAR with Time-Frequency Attention augmented YOLO |
| 8713 | MULTI-SCALE ADAPTIVE NEIGHBORHOOD AWARENESS TRANSFORMER FOR GRAPH FRAUD DETECTION |
| 17630 | MULTI-SCALE AND MULTI-MODAL SELECTIVE FUSION FOR RGB-D VIDEO SALIENT OBJECT DETECTION |
| 2167 | MULTI-SCALE FREQUENCY PERCEPTION ENABLED SELECTIVE STATE-SPACE AND FEATURE PYRAMID COLLABORATIVE METHOD FOR SMALL OBJECT DETECTION |
| 16616 | Multi-scale Generative Modeling for Fast Sampling |
| 2254 | MULTI-SCALE POSITIVITY GRAPH TRANSFORMER FOR FINE-GRAINED IMAGE RECOGNITION |
| 1813 | MULTI-SCALE STATE SPACE MODELING FOR CROSS-MODAL INFRARED AND VISIBLE IMAGE FUSION |
| 4233 | MULTI-SCALE TASK-AWARE EEG REPRESENTATION LEARNING FOR COGNITIVE STATE RECOGNITION |
| 18153 | Multi-Source Domain Generalized Person Re-Identification with DualStyle Augmentation and Dynamic Memory classifier |
| 16094 | MULTISOURCE LOCALIZATION USING MULTIMARGINAL OPTIMAL TRANSPORT |
| 5106 | Multi-Source Transfer Learning and Field Extraction for Cross-Domain Protocol Reverse Engineering |
| 14354 | MULTI-SPEAKER DOA ESTIMATION IN BINAURAL HEARING AIDS USING DEEP LEARNING AND SPEAKER COUNT FUSION |
| 13620 | Multi-Stage Spatial Imagination and Fusion for Immersive Visual Text-to-Speech |
| 1655 | MULTISYNERGY ATTACK: MULTIMODAL SYNERGISTIC ADVERSARIAL ATTACK FOR DEPTH ESTIMATION |
| 12721 | MULTI-TASK LEARNING FOR SPEECH QUALITY ASSESSMENT USING ASR-DERIVED ENTROPY FEATURES |
| 15167 | MULTITASK LEARNING WITH LEARNED TASK RELATIONSHIPS |
| 11888 | MULTI-TASK TRANSFORMER FOR EXPLAINABLE SPEECH DEEPFAKE DETECTION VIA FORMANT MODELING |
| 13651 | MULTI-TURN PHYSICS-INFORMED VISION-LANGUAGE MODEL FOR PHYSICS-GROUNDED ANOMALY DETECTION |
| 11203 | Multi-User Channel Estimation with One-Bit ADCs: A Semi-Blind Approach |
| 14123 | Multiverse Kernel Adaptive Filtering |
| 4975 | MULTI-VIEW CROWD COUNTING WITH SELF-SUPERVISED LEARNING |
| 10900 | MULTI-VIEW FREQUENCY ALIGNMENT AND STATE SPACE PARAMETER FUSION FOR LIGHTWEIGHT CAMOUFLAGED OBJECT DETECTION |
| 19145 | MULTIVIEW GRAPH LEARNING WITH CONSENSUS GRAPH |
| 13904 | MULTI-VIEW HIERARCHICAL HYPERGRAPH NEURAL NETWORK FOR AUTOMATIC STUTTERING DETECTION |
| 3902 | MULTI-VIEW HYPERGRAPH-BASED CONTRASTIVE LEARNING FOR KNOWLEDGE TRACING |
| 10054 | Multiview Progress Prediction of Robot Activities |
| 17628 | MULTI-VIEW SPECTRAL CLUSTERING WITH ADAPTIVE REGRESSION |
| 11997 | MUSETOK: SYMBOLIC MUSIC TOKENIZATION FOR GENERATION AND SEMANTIC UNDERSTANDING |
| 14249 | MUSHRA–1S: A SCALABLE AND SENSITIVE TEST APPROACH FOR EVALUATING TOP-TIER SPEECH PROCESSING SYSTEMS |
| 5471 | MusicDETR: A Position-aware Spectral Note Detection Model for Singing Transcription |
| 16106 | MUSIC-GUIDED POINT-SCATTERER ATTENTION FOR SAR SUPER-RESOLUTION |
| 15156 | MUSICRS: BENCHMARKING AUDIO-CENTRIC CONVERSATIONAL RECOMMENDATION |
| 7809 | Mutual Information Regularized Weight Ensembles with Moving Average for Generalizable Re-identification |
| 10296 | Mutual Information-Based Joint Phase and Rate Optimization for RIS-Aided Communication |
| 17901 | MVGCD: MULTI-VIEW GRAPH FUSION NETWORK FOR GROUP COGNITIVE DIAGNOSIS |
| 3383 | MVI: HIGH-RESOLUTION ROADSIDE VEHICLE IMAGING BY MMWAVE |
| 17777 | MVIR: MULTI-VIEW VISUAL-SEMANTIC REPRESENTATION FOR FAKE NEWS DETECTION |
| 6778 | MVP: MODELING VARIANTS OF PROMPTS FOR VISION-LANGUAGE MODELS |
| 13229 | MVP-DIFF: MULTI-VIEW PRIORS LEARNING FOR DIFFUSION-BASED SINGLE-VIEW 3D POINT CLOUD RECONSTRUCTION |
| 5328 | MWNET: MULTI-BRANCH WAVELET NETWORK FOR PHOTOVOLTAIC SEGMENTATION IN REMOTE SENSING IMAGES |
| 17274 | N2CDrive: Negotiate to Cooperate for Multi-Agent Autonomous Driving via Large Vision-Language Model |
| 11907 | NATIVETOK: NATIVE VISUAL TOKENIZATION FOR IMPROVED IMAGE GENERATION |
| 17900 | NATURAL LANGUAGE TO SPATIAL AUDIO PARAMETERS: LIGHTWEIGHT DETERMINISTIC RENDERING FOR CREATIVE AUTHORING |
| 11696 | Navigating Modality Uncertainty: Modality-Interaction Enhanced Mixture-of-Experts for Multi-Modal Knowledge Graph Completion |
| 9542 | NCF-TTS: ENHANCING FLOW MATCHING BASED TEXT-TO-SPEECH WITH NEIGHBORHOOD CONSISTENCY FLOW |
| 14379 | NEAR-FIELD CHANNEL ESTIMATION AND ENVIRONMENT MAPPING: LOCALIZATION OF REFLECTORS AND SCATTERERS |
| 8125 | NEAR-FIELD CHANNEL ESTIMATION AND LOCALIZATION WITH COPRIME ARRAYS |
| 16578 | NEAR-FIELD SWIPT USING MASSIVE PHASED MULTISINE ANTENNA ARRAY |
| 13944 | NEAR-FIELD WIDEBAND BEAMFORMING FOR ISAC VIA ALGORITHM UNROLLING |
| 2811 | NEAR-LIGHT COLOR PHOTOMETRIC STEREO FOR MONO-CHROMATICITY NON-LAMBERTIAN SURFACE |
| 5033 | NEAR-OPTIMAL ONLINE GAIN CONTROL FOR MODULO ADCS |
| 18322 | NEDGCN: HIGH-QUALITY SAMPLE SELECTION AND NOISE-TOLERANT GRAPH NEURAL NETWORK VIA DIFFERENTIATED EDGE WEIGHTING |
| 11906 | Negative-Aware Routing Network with Adversarial Knowledge Injection for Efficient LLM Adaptation |
| 9708 | NEGEV: NEGATIVE SAMPLE-AWARE FINE-TUNING FOR OPEN-VOCABULARY OBJECT DETECTION |
| 17516 | Neighborhood-Aware Self-Paced Graph Clustering for Robust Data Partitioning |
| 17077 | NEON: One-Shot Text-to-Video Tuning via Noise Latent Dynamics |
| 2114 | Nethira: A Heterogeneity-aware Hierarchical Pre-trained Model for Network Traffic Classification |
| 16276 | NETWORK-CONTROLLED REPEATERS UNDER POWER AMPLIFIER NON-LINEARITIES |
| 13050 | NEURAL ACOUSTIC MULTIPOLE SPLATTING FOR ROOM IMPULSE RESPONSE SYNTHESIS |
| 19031 | Neural Audio Synthesis for Sound Effects: A Scope Review |
| 12233 | Neural Forward Filtering for Speaker-Image Separation |
| 5525 | NEURAL NETWORK-BASED TIME-FREQUENCY-BIN-WISE LINEAR COMBINATION OF BEAMFORMERS FOR UNDERDETERMINED TARGET SOURCE EXTRACTION |
| 19011 | Neural Optimisation of Fixed Beamformers With Flexible Geometric Constraints |
| 6150 | Neural personal sound zones with flexible bright zone control |
| 11616 | NEURAL VARIABLE SPAN FILTERS FOR INTERPRETABLE MULTI-CHANNEL SPEECH ENHANCEMENT |
| 9591 | NEURERASE: SELECTIVE DEACTIVATION OF NEURONS FOR ERASING CONCEPTS IN DIFFUSION MODELS |
| 3137 | NEUROCAPSNET: A NEURO-INSPIRED CAPSULE NETWORK FOR MULTI-DIRECTION AUDITORY SPATIAL ATTENTION DETECTION |
| 2953 | NEUROHASH: A HYPERDIMENSIONAL NEURO-SYMBOLIC FRAMEWORK FOR SPATIALLY-AWARE IMAGE HASHING AND RETRIEVAL |
| 2388 | NeuroSIFT: A Biologically-Inspired Framework with Explicit Signal-Noise Separation for Robust Multimodal Emotion Recognition |
| 10005 | NEURO-SYMBOLIC REACHABILITY REASONING FOR PHYSICALLY GROUNDED EMBODIED QUESTION ANSWERING |
| 14080 | nGPT as a Scalable Architecture for Speech Recognition and Translation |
| 13511 | NIFTY: A NON-LOCAL IMAGE FLOW MATCHING FOR TEXTURE SYNTHESIS |
| 14999 | NLDSI-BWE: NON LINEAR DYNAMICAL SYSTEMS-INSPIRED MULTI RESOLUTION DISCRIMINATORS FOR SPEECH BANDWIDTH EXTENSION |
| 10449 | NMGE: Nested Multi-Granularity Expert Groups for Complexity-Aware Routing in Multilingual Translation |
| 16310 | NN-BASED IN-LOOP FILTERING FOR ENHANCED COMPRESSION BEYOND VVC |
| 14234 | No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation |
| 6658 | NO VERIFIABLE REWARD FOR PROSODY: TOWARD PREFERENCE-GUIDED PROSODY LEARNING IN TTS |
| 14981 | No Word Left Behind: Mitigating Prefix Bias in Open-Vocabulary Keyword Spotting |
| 4240 | NOISE-ROBUST AV-ASR USING VISUAL FEATURES BOTH IN THE WHISPER ENCODER AND DECODER |
| 15590 | Noise-Robust Cross-Modal Hashing with Contrastive Weighting |
| 19084 | Noise-Robust Speaker Verification with Attenuated Speech Restoration and Consistency Training |
| 15714 | Noise-Robust Video Salient Object Detection in Spike Streams |
| 10215 | NOISE-TO-NOTES: DIFFUSION-BASED GENERATION AND REFINEMENT FOR AUTOMATIC DRUM TRANSCRIPTION |
| 9382 | NON-ASYMPTOTIC PERFORMANCE ANALYSIS OF DOA ESTIMATION BASED ON REAL-VALUED ROOT-MUSIC |
| 9516 | NON-BAYESIAN SOCIAL LEARNING FOR MODELING INTERACTING LARGE LANGUAGE MODEL AGENTS |
| 9653 | Non-Coherent Multi-Antenna Reception of Ambient Backscatter with Canonical Correlation Analysis |
| 10116 | NONCONVEX REGULARIZATION FOR FEATURE SELECTION IN REINFORCEMENT LEARNING |
| 17697 | NON-HOMOGENEOUS HAZE REMOVAL BASED ON DEEP UNFOLDING NETWORK FOR REMOTE SENSING IMAGES |
| 16511 | NON-LINE-OF-SIGHT VEHICLE DETECTION VIA AUDIO-VISUAL FUSION |
| 17181 | NON-UNIFORM HAZE REMOVAL FOR REMOTE SENSING IMAGE BASED ON WAVELET-DOMAIN HAZE-AWARE COMPLEMENTARY LEARNING |
| 3544 | NORD-PARL-TTS: FINNISH AND SWEDISH TTS DATASET FROM PARLIAMENT SPEECH |
| 4852 | NO-REFERENCE NIGHT-TIME IMAGE QUALITY ASSESSMENT VIA SELF-SUPERVISED AND META-LEARNING |
| 2860 | NORMALIGN: FEATURE NORM REGULARIZATION FOR CONFIDENCE CALIBRATION IN GRAPH NEURAL NETWORKS |
| 17548 | NOT ALL WEIGHT VECTORS ARE NEEDED: COVARIANCE-BASED VECTOR SELECTION TUNING FOR LARGE LANGUAGE MODELS |
| 16592 | NOT JUST DETECTION: ALIGNED-DRIVEN PURIFICATION OF INDIRECT PROMPT INJECTION FOR RELIABLE AGENT INTERACTION |
| 2445 | NRRN: NEWS REPRESENTATION RESTORATION NETWORK FOR MULTIMODAL FAKE NEWS DETECTION WITH MULTIMODAL COMPRESSION AND CAPSULE FUSION |
| 4809 | NSC-SL: A Bandwidth-Aware Neural Subspace Compression for Communication-Efficient Split Learning |
| 4446 | Nuclear Diffusion Models for Low-Rank Background Suppression in Videos |
| 13195 | NUMERICAL SPECTRUM LINKING: IDENTIFICATION OF GOVERNING PDE VIA KOOPMAN-CHEBYSHEV APPROXIMATION |
| 3688 | Object-aware Restoration Diffusion: Progressive and Interactive Framework for Blind Compressed Image Restoration |
| 15371 | OBSTRUCTIVE SLEEP APNEA ENDOTYPE PREDICTION DURING WAKEFULNESS USING VOICE BIOMARKERS |
| 5489 | OCCLUSION AWARE GRAPH TRANSFORMER FOR 3D MULTI-OBJECT TRACKING |
| 3518 | Occlusion-Aware Triplet Learning for Robust Pedestrian ReID: Beyond Single-ID Labels and Data Augmentation |
| 4242 | OCCLUSION-ROBUST HUMAN RENDERING BASED ON TRI-PLANE RESTORATION |
| 18257 | OCR-Enhanced Multimodal ASR Can Read While Listening |
| 10757 | OCTIP: COMPACT GEOGRAPHY-AWARE IP EMBEDDINGS FOR NEAREST-NEIGHBOR IP SIGNAL RETRIEVAL |
| 7740 | OCTOPUS: ENHANCING DISTRIBUTIONAL REINFORCEMENT LEARNING THROUGH REGULARIZATION |
| 2813 | OctreeSplatting: Region-Aware Gaussian Densification via Dynamically Managed Octree |
| 9838 | ODSA: ONLINE DIFFERENTIABLE STRUCTURE ADAPTATION FOR TINY TCN ON IOT TIME SERIES |
| 9145 | OFF-THE-GRID MULTI-PITCH ESTIMATION USING OPTIMAL TRANSPORT |
| 6370 | OFHIE: OVERVIEW-THEN-FOCUS HIERARCHICAL INTERACTION ENCODING FOR VOXEL-BASED 3D OBJECT DETECTION |
| 9374 | OF-SemWat: HIGH-PAYLOAD TEXT EMBEDDING FOR SEMANTIC WATERMARKING OF AI-GENERATED IMAGES WITH ARBITRARY SIZE |
| 16481 | OG-PCL: EFFICIENT SPARSE RADAR POINT CLOUD PROCESSING FOR HUMAN ACTIVITY RECOGNITION |
| 14654 | OGRA-YOLOv8: Overlapping Gridded and Rhombus Attention for Underwater Object Detection |
| 14302 | OILSAM2: MEMORY-AUGMENTED SAM2 FOR SCALABLE SAR OIL SPILL DETECTION |
| 17366 | OKAN: ORTHOGONAL KOLMOGOROV-ARNOLD NETWORKS FOR ACCURATE AND INTERPRETABLE CAMERA POSE REGRESSION |
| 4139 | OMNI-AVSR: TOWARDS UNIFIED MULTIMODAL SPEECH RECOGNITION WITH LARGE LANGUAGE MODELS |
| 9247 | ON DEEPFAKE VOICE DETECTION - IT'S ALL IN THE PRESENTATION |
| 9603 | On Multiangle Discrete Fractional Periodic Transforms |
| 15602 | ON OPTIMIZATION OF POLES FOR ADAPTIVE FOURIER DECOMPOSITION-INSPIRED NEURAL LAYERS |
| 16363 | ON RANDOM POOLING OF LARGE-SCALE SCREENING WITH EXTREMELY SPARSE INFECTIONS |
| 11778 | ON THE DESIGN OF EFFICIENT NEURAL METHODS FOR GEOMETRY-AGNOSTIC MULTICHANNEL SPEECH ENHANCEMENT |
| 12983 | ON THE DESIGN OF HIGHER-ORDER TIME-INTENSITY MICROPHONE ARRAYS FOR PANORAMIC AUDIO RECORDING AND REPRODUCTION |
| 10663 | ON THE DOPPLER EFFECT AND COHERENCE TIME OF NEAR-FIELD SCATTERING-FREE CHANNELS |
| 5191 | On the Foundational Condition for Non-contact Vibration Measurement using Phase-based Microwave Interferometry |
| 8875 | On the Importance of a Multi-Scale Calibration for Quantization |
| 9776 | On the Optimality of Rate Balancing for Max-Min Fair Multicasting |
| 8167 | ON THE ROLE OF EXTRINSIC VALUE EXCHANGE IN EXPECTATION PROPAGATION FOR CODED MIMO SYSTEMS |
| 14444 | ON THE ROLE OF TRAINABLE PARAMETERS IN DIFFERENTIABLE FEEDBACK DELAY NETWORKS |
| 2069 | On the Security of RIS-Aided Wireless Communication Systems: RIS Codebook Attack and Camouflage Solution |
| 15348 | On the Sensitivity of Firing Rate-Based Federated Spiking Neural Networks to Differential Privacy |
| 1955 | ON THE SHOULDERS OF GIANTS: KNOWLEDGE-DRIVEN SELF-ADAPTIVE NETWORK FOR DISTILLATION |
| 15581 | ONCORAG: A Knowledge Graph-Augmented RAG Framework For Mechanism-Aware Oncology Recommendations |
| 14148 | ONE MODEL--THREE TASKS: DISCOVERING A SHARED WINNING TICKET FOR LOW-COMPLEXITY AUDIO INTELLIGENCE |
| 12733 | One Timestep Spiking Actor Network with Adaptive Global-connected Encoding and Threshold Learning |
| 13664 | ONE-BIT QUANTIZED PRECODER CHARACTERIZATION AND PARAMETER OPTIMIZATION IN MASSIVE MIMO SYSTEMS |
| 4505 | ONE-SHOT SEQUENTIAL FEDERATED LEARNING WITH DUAL-DISTILLATION |
| 3898 | ONE-STAGE SEMI-SUPERVISED SEMANTIC SEGMENTATION FOR ANOMALY DETECTION VIA CONSISTENCY REGULARIZATION AND STUDENT-TEACHER MODELS |
| 17865 | One-step Generative Distillation |
| 6536 | ONLINE CONTINUAL CATEGORY LEARNING WITH INVARIANT PROTOTYPES |
| 13860 | ONLINE CURSIVE HANDWRITING GENERATION USING TRACE TRANSFORMATION AND SYMBOL-INDEPENDENT POINT CLASSIFICATION MODEL |
| 6472 | ONLINE NEURAL FUSION OF DISTORTIONLESS DIFFERENTIAL BEAMFORMERS FOR ROBUST SPEECH ENHANCEMENT |
| 17202 | ONLINE REGISTER FOR DUAL-MODE SELF-SUPERVISED SPEECH MODELS: MITIGATING THE LACK OF FUTURE CONTEXT |
| 7831 | Online Sensor Selection for Object Detection via Bayesian Risk Minimization |
| 18893 | ONLINE SIMPLEX-STRUCTURED MATRIX FACTORIZATION |
| 17296 | ONLINE TEST-TIME ADAPTATION FOR SHADOW SEGMENTATION |
| 16186 | OPENHIER: AN OPEN-VOCABULARY HIERARCHICAL IMAGE CLASSIFICATION FRAMEWORK |
| 10002 | OPINION CONSENSUS FORMATION AMONG NETWORKED LARGE LANGUAGE MODELS |
| 3423 | OPINION-TREE-AWARE PROMPT TUNING FOR ASPECT SENTIMENT QUADRUPLE PREDICTION |
| 18884 | OPTIMAL DETECTION FOR A PROSPECT THEORETIC VARIANT OF THE NEYMAN-PEARSON PROBLEM |
| 14088 | OPTIMAL PLACEMENT OF MOVABLE ANTENNAS FOR ANGLE-OF-DEPARTURE ESTIMATION UNDER USER LOCATION UNCERTAINTY |
| 13789 | Optimal QAM Constellation for Over-the-Air Computation in the Presence of Heavy-Tailed Channel Noise |
| 9503 | OPTIMAL QUASI-CLIQUE DETECTION VIA LASRY-LIONS DOUBLE ENVELOPES |
| 12593 | OPTIMAL SENSOR PLACEMENT UNDER CONSTRAINTS FOR TARGET LOCALIZATION USING DRSS MEASUREMENTS |
| 9455 | OPTIMAL TRANSPORT BASED UNSUPERVISED RESTORATION LEARNING EXPLOITING DEGRADATION SPARSITY |
| 17009 | OPTIMIZED END-TO-END CODING WORKFLOW FOR IMAGE STORAGE AND RETRIEVAL USING JPEG DNA |
| 18895 | OPTIMIZED MULTISTAGE DECIMATION BASED ON OPTIMAL FACTORIZATION OF DECIMATION RATIO |
| 2534 | OPTIMIZED PARTITIONING ACCELERATION FOR VVC INTER CODING |
| 16839 | OPTIMIZING AUTOMATED JAILBREAK ATTACKS ON LARGE LANGUAGE MODELS VIA EXPERIENCE ACCUMULATION |
| 14995 | OPTIMIZING DOMAIN-ADAPTIVE SELF-SUPERVISED LEARNING FOR CLINICAL VOICE-BASED DISEASE CLASSIFICATION |
| 14239 | OPTIMIZING SPEECH LANGUAGE MODELS FOR ACOUSTIC CONSISTENCY |
| 1650 | OPTIMIZING THE SIGNAL-TO-NOISE RATIO OF CONTEXTUAL INFORMATION FOR IMPROVED ATTRIBUTED TEXT GENERATION |
| 6233 | OptimUS: Optimization-based Unlimited Sampling Algorithm |
| 4158 | OR-DETR: Exploring Explicit Occlusion Relation Prior for Crowded Pedestrian Detection |
| 17575 | ORSC: OBJECT-AWARE REINFORCEMENT WITH SEMANTIC CONSISTENCY FOR HALLUCINATION MITIGATION IN MLLMS |
| 8876 | ORTHOGONAL APPROXIMATE MESSAGE PASSING ALGORITHMS FOR RECTANGULAR SPIKED MATRIX MODELS WITH ROTATIONALLY INVARIANT NOISE |
| 6285 | ORTHOGONAL APPROXIMATE MESSAGE-PASSING FOR SUBLINEAR SPARSITY |
| 3968 | Orthogonal Weight Modification Enhances Learning Scalability and Convergence Efficiency without Gradient Backpropagation |
| 18093 | ORTHOVAD: WEAKLY SUPERVISED VIDEO ANOMALY DETECTION VIA PROTOTYPE ORTHOGONALITY LEARNING |
| 12617 | OSG: TRAINING-FREE OBJECTNESS, SEMANTICS, AND GEOMETRY FUSION FOR ZERO-SHOT REFERRING EXPRESSION COMPREHENSION |
| 3466 | OTD: DIFFUSION ON OT-STRUCTURED POINT CLOUDS FOR 3D SHAPE GENERATION |
| 17204 | OUT-OF-DISTRIBUTION DETECTION BASED ON TOTAL VARIATION ESTIMATION |
| 9078 | OVERCOMING BINNING DILEMMA: CUMULATIVE CALIBRATION FOR DOUBLY ROBUST LEARNING IN DEBIASED RECOMMENDATION |
| 4957 | Overconfidence in Investment Decisions: A Filtering and Control Framework |
| 9536 | OVID: Text-Guided Open-Vocabulary Dense Object Counting and Localization |
| 2296 | OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech |
| 3245 | P2CL: Prototype-Constrained Consistent Learning - Toward Controllable and Consistent Transfer |
| 5399 | PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition |
| 4172 | PADAM: Perceptual Audio Defect Assessment Model |
| 5063 | PADUM: PATCH-BASED DUAL-STREAM NETWORK WITH CNN AND MAMBA FOR TIME SERIES FORECASTING |
| 3951 | PAGE: A PHYSICS-AWARE GENERATIVE NETWORK FOR PRESSURE MAP SYNTHESIS |
| 5486 | PAGM: A PYRAMID ALIGNMENT AND ID GRAPH MATCHING MODEL FOR VIDEO OBJECT RE-IDENTIFICATION |
| 14591 | PAGS: PRIORITY-ADAPTIVE GAUSSIAN SPLATTING FOR DYNAMIC DRIVING SCENES |
| 2802 | PaintFlow: Stage-Aware Temporal Modeling for Text-to-Video Synthesis of Painting Processes |
| 16800 | Pairing Denoising Enhanced Hash-aware Distillation for Unsupervised Cross-modal Retrieval |
| 14979 | PAIRWISE DISTORTION DISTRIBUTION FOR COMPRESSION AND QUANTIZATION |
| 15506 | PALETTE: A BACKGROUND-ROBUST FINGERPRINTING ATTACK ON AIR INTERFACE DESIGNED FOR CROSS-ENVIRONMENT CHALLENGES |
| 5049 | PAM-COAT: PHYSICS-AWARE MULTIMODAL COATNET FOR IMBALANCED PULSAR CANDIDATE CLASSIFICATION |
| 6909 | PAMNet: Patch-Adaptive Mixing Network for Multivariate Time Series Forecasting |
| 8442 | PANKRAG: ENHANCING GRAPH RETRIEVAL VIA GLOBALLY AWARE QUERY RESOLUTION AND DEPENDENCY-AWARE RERANKING MECHANISM |
| 11461 | PanoIndoor and PanoOutdoor: Towards Comprehensive Datasets for Panoramic Instance Segmentation |
| 11691 | PAPER SUMMARY ATTACK: JAILBREAKING LLMS THROUGH LLM SAFETY PAPERS |
| 14223 | PAPR ANALYSIS OF RPDMA AND ORPDMA WITH PRIME POWER SUBCARRIERS |
| 3936 | PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models |
| 5351 | ParaAegis: Parallel Protection for Flexible Privacy-preserved Federated Learning |
| 16594 | PARAEDIT: UNIFYING PARALLEL TRANSPORT AND GENERATIVE FLOWS FOR HIGH-FIDELITY IMAGE EDITING |
| 5412 | PARAGSE: PARALLEL GENERATIVE SPEECH ENHANCEMENT WITH GROUP-VECTOR-QUANTIZATION-BASED NEURAL SPEECH CODEC |
| 14158 | PARALINGUISTIC EMOTION-AWARE VALIDATION TIMING DETECTION IN JAPANESE EMPATHETIC SPOKEN DIALOGUE |
| 5572 | Parallax-Aware Spatial Transformer: Fusing Physics and Learning for Terahertz Near-Field Localization |
| 6834 | PARALLEL DELAY-DOPPLER ESTIMATION VIA ORDER-REVERSED TWO-STAGE PRONY METHOD |
| 6332 | Parallel Randomized Coordinate Descent for Matrix Completion with Convergence Guarantees |
| 16119 | PARAMETER ADAPTATION IN HIDDEN MARKOV MODELS WITH EQUAL EXIT PROBABILITIES |
| 13646 | Parameter Localization and Relearning for Safety Disalignment in Large Language Models |
| 18876 | Parameter optimisation for a physical model of the vocal system |
| 11164 | PARAMETER-FREE MIXTURE OF EXPERTS FOR BLACK-BOX PROMPT TUNING |
| 7671 | Parametric Channel Estimation as an Enabler for RIS-Assisted Sensing |
| 9899 | PARAMETRIC MODELING AND LOCALIZATION OF SPATIALLY DISTRIBUTED TARGETS IN OFDM-MIMO RADAR SYSTEMS |
| 4365 | PARAMETRIC NEURAL AMP MODELING WITH ACTIVE LEARNING |
| 12993 | PARSIMONY, ORDER AND BALANCE: PRINCIPLES FOR COMPRESSING MIXTURE-OF-EXPERTS MODELS |
| 14347 | PART-CENTRIC DIFFUSION POLICY WITH VISION LANGUAGE MODEL FOR GENERALIZABLE ARTICULATED OBJECT MANIPULATION |
| 13448 | PAS-SE: PERSONALIZED AUXILIARY-SENSOR SPEECH ENHANCEMENT FOR VOICE PICKUP IN HEARABLES |
| 11726 | PASSEG: A MULTI-SCALE SEMANTIC SEGMENTATION FRAMEWORK FOR COMPLEX UAV IMAGERY IN PLATEAU SCIENTIFIC EXPEDITIONS |
| 15723 | PassMoE-P: Enhancing Password Guessing Using Large Language Models with Pattern-Specialized Mixture-of-Experts |
| 8396 | PAST AS PRIOR: REWEIGHTED PROXY GUIDANCE FOR STABLE ADVERSARIAL TRAINING |
| 7065 | PASTA-YOLO: AN ENHANCED DETECTOR FOR SMALL OBJECT DETECTION IN UAV IMAGERY |
| 13461 | PATCH FIRST, GENERATE THEN: A DEBIASED DIFFUSION MODEL FOR MULTIVARIATE TIME SERIES GENERATION |
| 15324 | PATCH-AWARE DECOMPOSITION AND DYNAMIC FUSION NETWORK FOR MULTIVARIATE TIME SERIES FORECASTING |
| 16079 | PATCH-AWARE-BASED NO-REFERENCE IMAGE QUALITY ASSESSMENT VIA MULTI-FACTOR CONTRASTIVE LEARNING |
| 1660 | Patch-based Active Source-Free Domain Adaptation for Annotation-Efficient Medical Image Segmentation |
| 14904 | PATHFINDER: MCTS AND LLM FEEDBACK-BASED PATH SELECTION FOR MULTI-HOP QUESTION ANSWERING |
| 8843 | PC2-MTO: A PRINCIPAL COMPONENT CLUSTERING APPROACH FOR MULTI-TASK OFFLOADING OPTIMIZATION IN IOV |
| 15308 | PC-SSL: A PREDICTIVE CODING-BASED SELF-SUPERVISED LEARNING FRAMEWORK FOR EEG EMOTION RECOGNITION |
| 4838 | PDConv: Priori Perceptual Dilated Convolution |
| 17061 | PD-Reweight for UAV Aerial Burrow Detection: A Plug-in Point-Distance Module for Rebalancing Sparse Tiny Objects |
| 13709 | PEEKING INTO THE FUTURE FOR CONTEXTUAL BIASING |
| 19027 | PEERRTF: ROBUST MVDR BEAMFORMING USING GRAPH CONVOLUTIONAL NETWORK |
| 2060 | PE-LORA: PARAMETER-EFFICIENT BAYESIAN LOW-RANK ADAPTATION FOR LARGE LANGUAGE MODELS |
| 6155 | PENPLAN-PDDL: A MULTI-AGENT FRAMEWORK FOR AUTOMATED PENETRATION TESTING PLANNING WITH PDDL-BASED VERIFICATION |
| 14349 | PERCEPTION-GUIDED DIFFUSION FUSION WITH GRADIENT DESCENT POSTERIOR MAXIMIZATION |
| 6927 | PERCEPTUAL LOSS OPTIMIZED HRTF PERSONALIZATION IN SPHERICAL HARMONIC DOMAIN |
| 1379 | Perceptual Quality Assessment for Stylized Talking Heads |
| 12134 | PERCEPTUAL QUALITY OPTIMIZATION OF IMAGE SUPER-RESOLUTION |
| 18869 | PERFORMANCE ANALYSIS OF LINEAR DETECTION UNDER NOISE-DEPENDENT FAST-FADING CHANNELS |
| 7600 | Performance Analysis of Near-Field RIS-Assisted Networks with Minimum Additive Path-Loss Association |
| 6164 | Performance Bounds On Parameter Estimation for Relative Phases in Multi-Agent Wireless Systems |
| 15936 | PERFORMANCE OF JOINT TDOA AND FDOA ESTIMATION IN THE LARGE SATELLITE CONSTELLATION LIMIT |
| 5528 | PERFORMANCE OF REPEATER-ASSISTED MASSIVE MIMO SYSTEMS: TDD VS FDD |
| 6071 | PERFORMANCE-GUIDED REINFORCED ACTIVE LEARNING FOR OBJECT DETECTION |
| 9310 | PERFORMSINGER: MULTIMODAL SINGING VOICE SYNTHESIS LEVERAGING SYNCHRONIZED LIP CUES FROM SINGING PERFORMANCE VIDEOS |
| 17399 | Persona Drift Detection in Role-Playing Agents: A Multi-Dimensional Consistency Framework |
| 4252 | PERSONAAGENT WITH GRAPHRAG: COMMUNITY-AWARE KNOWLEDGE GRAPHS FOR PERSONALIZED LLM |
| 9555 | PERSONALIZED FEDERATED LEARNING BASED ON CLUSTERING KNOWLEDGE PROTOTYPE ALIGNMENT AND DISTRIBUTION-AWARE CONSISTENCY |
| 4659 | PERSONALIZED FEDERATED LEARNING VIA DECOUPLED VISUAL PROMPTS AND ADAPTIVE CLASSIFIER FUSION |
| 6214 | PERSONAPLEX: VOICE AND ROLE CONTROL FOR FULL DUPLEX CONVERSATIONAL SPEECH MODELS |
| 13576 | PERSUASION SHOULD BE DOUBLE-BLIND: A MULTI-DOMAIN DIALOGUE DATASET WITH FAITHFULNESS BASED ON CAUSAL THEORY OF MIND |
| 6524 | PERTURB TO PROTECT: LEVERAGING TEST-TIME DEFENSIVE PERTURBATIONS AGAINST ADVERSARIAL ATTACKS |
| 3842 | PERTURBATION-RESISTANT TRANSMIT BEAMFORMING |
| 15177 | PE-SLEUTH: PROGRAM-LEVEL SEMANTICS AND STATIC FEATURE FUSION FOR INTERPRETABLE RANSOMWARE DETECTION WITH LLMS |
| 11725 | PFE-NET: PROXY-GUIDED FREQUENCY ENHANCEMENT FOR CAMOUFLAGED OBJECT DETECTION |
| 15942 | PFLUXTTS: HYBRID FLOW-MATCHING TTS WITH ROBUST CROSS-LINGUAL VOICE CLONING AND INFERENCE-TIME MODEL FUSION |
| 4266 | PGDiff: Prior-Consistency Guided Diffusion for Unsupervised Image Restoration under Adverse Weather Conditions |
| 13500 | PGFed: Prompt-Guided Distillation for Personalized Federated Learning with Model Heterogeneity |
| 16508 | PG-SE: PREDICTIVE ACCELERATION AND CORRECTION FOR GENERATIVE SPEECH ENHANCEMENT |
| 12175 | PG-SELECT: PRIOR-GUIDED FEATURE SELECTION FOR UNSUPERVISED OBJECT DISCOVERY IN DUNHUANG MURALS |
| 3072 | PGSENET: PRIOR-GUIDED SPECTRUM ENHANCEMENT NETWORK |
| 9129 | Phase Consistency Enhanced Complex-Valued Neural Network for Radio Frequency Fingerprint Identification |
| 18188 | PHASE OPTIMIZATION DRIVEN WAVEFORM DESIGN WITH GOOD CORRELATION AND INFORMATION EMBEDDING PERFORMANCES FOR JOINT RADAR-COMMUNICATIONS |
| 14012 | PHASE-AWARE STATE SPACE MODELING WITH FINE-GRAINED IDENTITY DISENTANGLEMENT FOR DYNAMIC FACIAL EXPRESSION RECOGNITION |
| 10725 | PhaseFormer: Capturing Cross-Channel Phase and Trend Dynamics for Time Series Forecasting |
| 5208 | PHASEMARK: A POST-HOC, OPTIMIZATION-FREE WATERMARKING OF AI-GENERATED IMAGES IN THE LATENT FREQUENCY DOMAIN |
| 14471 | PHASE-ONLY POSITIONING IN DISTRIBUTED MIMO UNDER PHASE IMPAIRMENTS: AP SELECTION USING DEEP LEARNING |
| 12533 | PHASE-RETRIEVAL-BASED PHYSICS-INFORMED NEURAL NETWORKS FOR ACOUSTIC MAGNITUDE FIELD RECONSTRUCTION |
| 9998 | Phase-Space Signal Processing of Acoustic Data for Advanced Manufacturing in-situ Monitoring |
| 14061 | PHOENIXDSR: PHONEME-GUIDED AND LLM-ENHANCED DYSARTHRIC SPEECH RECOGNITION |
| 4343 | PHOMO: PATCH HOMOGENEITY FOR NO-REFERENCE INPAINTING ASSESSMENT AND VERIFICATION |
| 17346 | PHONEME-LEVEL VISUAL SPEECH RECOGNITION VIA POINT-VISUAL FUSION AND LANGUAGE MODEL RECONSTRUCTION |
| 16033 | PHONOLOGICAL TOKENIZER: PROSODY-AWARE PHONETIC TOKEN VIA MULTI-OBJECTIVE FINE-TUNING WITH DIFFERENTIABLE K-MEANS |
| 12151 | PHOTO SHIELDING: ROBUST PROTECTION AGAINST AI MANIPULATION OF IMAGES |
| 13113 | PHOTOMETRIC STEREO USING GAUSSIAN SPLATTING AND INVERSE RENDERING |
| 10482 | PHPTRIGGER: MULTI-ENGINE ASSISTED VULNERABILITY AUDITING AND VERIFICATION IN PHP APPLICATIONS |
| 5169 | PHRASED: Phrase Dictionary Biasing for Speech Translation |
| 14897 | PHYS-DIFF: A PHYSICS-INSPIRED LATENT DIFFUSION MODEL FOR TROPICAL CYCLONE FORECASTING |
| 3959 | PHYSHDR: WHEN LIGHTING MEETS MATERIALS AND SCENE GEOMETRY IN HDR RECONSTRUCTION |
| 8246 | PHYSICALLY DEPLOYABLE 3D OMNIDIRECTIONAL INFRARED ADVERSARIAL PATCHES |
| 3586 | PHYSICS AND DATA DRIVEN TRANSFORMER-MAMBA FRAMEWORK FOR FLOW FIELD |
| 10740 | Physics Informed Generative Models for Magnetic Field Images |
| 12281 | PHYSICS-AWARE NOVEL-VIEW ACOUSTIC SYNTHESIS WITH VISION-LANGUAGE PRIORS AND 3D ACOUSTIC ENVIRONMENT MODELING |
| 15269 | PHYSICS-BASED CHANNEL TRANSFORMATION FOR WIRELESS CONFIGURATIONS |
| 9769 | Physics-Encoded Learned Maximum Likelihood Estimation for Unknown Measurement Distribution |
| 5609 | PHYSICS-GUIDED DIFFUSION MODELS FOR ANCIENT BAMBOO SCRIPT RESTORATION |
| 12250 | Physics-Guided Learning with Hard-Soft Constraints for Urban Wind Assessment |
| 16956 | PHYSICS-INFORMED ANOMALY DETECTION OF TERRAIN MATERIAL CHANGE IN RADAR IMAGERY |
| 10809 | PHYSICS-INFORMED DIFFUSION GENERATION FOR GEOMAGNETIC MAP INTERPOLATION |
| 9968 | PHYSICS-INFORMED GNN FOR MEDIUM-HIGH VOLTAGE AC POWER FLOW WITH EDGE-AWARE ATTENTION AND LINE SEARCH CORRECTION OPERATOR |
| 7920 | PHYSICS-INFORMED HIERARCHICAL BAYESIAN MODELING FOR ANGLE-OF-ARRIVAL ESTIMATION WITH COMMERCIAL OFF-THE-SHELF RFID |
| 6815 | PHYSICS-INFORMED LEARNING OF NEURAL SCATTERING FIELDS TOWARDS MEASUREMENT-FREE MESH-TO-HRTF ESTIMATION |
| 18975 | Physics-Informed Neural Network-Driven Sparse Field Discretization Method for Near-Field Acoustic Holography |
| 9877 | PHYSICS-INFORMED NEURAL NETWORKS FOR OCEAN ACOUSTIC FIELD RECONSTRUCTION AND SOURCE LOCALIZATION |
| 8496 | PHYSICS-INFORMED VIDEO DIFFUSION FOR SHALLOW WATER EQUATIONS |
| 15266 | PhysiGen: Integrating Collision-Aware Physical Constraints for High-Fidelity Human-Human Interaction Generation |
| 15193 | PIANOROLL-EVENT: A NOVEL SCORE REPRESENTATION FOR SYMBOLIC MUSIC |
| 11818 | PICFORMER: PERCEPTION-INFERENCE-CONSISTENCY LOOP FOR OCCLUDED 3D POSE ESTIMATION |
| 1815 | PICOAUDIO2: TEMPORAL CONTROLLABLE TEXT-TO-AUDIO GENERATION WITH NATURAL LANGUAGE DESCRIPTION |
| 1534 | PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone Characters |
| 15704 | PI-GNN: A Physics-Informed Graph Neural Network for Spatio-Temporal Diffusion Prediction |
| 3975 | PILED: PHYSICS-INFORMED LOW-LIGHT ENHANCEMENT AND DEBLURRING |
| 17310 | PINA: PROMPT INJECTION ATTACK AGAINST NAVIGATION AGENTS |
| 5551 | PINDEFECTNET: A TRANSFORMER FRAMEWORK FOR DETECTING DEFECTS IN MILLIMETER-SCALE POWER LINE LOCKING PINS |
| 4882 | PITH-Former: A Hierarchical Motion Prediction Framework Guided by Latent Goals and Driving Habits |
| 3838 | PI-TPDNET: A PHYSICS-INFORMED TREND-PERIOD DECOMPOSITION NEURAL NETWORK FOR AIR QUALITY PREDICTION |
| 9668 | PIXEL-PATCH GRAPH REGULARIZED GROUP SPARSE REPRESENTATION FOR SINGLE-IMAGE DENOISING |
| 9373 | PKW: PUBLIC KEY WATERMARKING FOR DEEP NEURAL NETWORK WITH FISHER-GUIDED EMBEDDING |
| 12526 | PLACE ANYWHERE: LEARNING SPATIAL REASONING FOR OCCLUSION-AWARE IMAGE COMPOSITION |
| 16834 | PLA-LOSS: POTENTIAL LABEL-AWARE TRAINING FOR TOP-K CLASSIFICATION |
| 9725 | Planning-oriented Adversarial Attack against End-to-End Autonomous Driving Systems |
| 10235 | PLANPERCEIVER: A UNIFIED FRAMEWORK FOR MULTI-LEVEL SCENE INFORMATION FUSION IN AUTONOMOUS DRIVING PLANNING |
| 3856 | PLNET: AN EFFICIENT PARAMETER AGGREGATION NETWORK FOR MULTIMODAL WHOLE HEART SEGMENTATION |
| 10674 | PLPP: PROMPT LEARNING WITH PERPLEXITY IS SELF-DISTILLATION FOR VISION-LANGUAGE MODELS |
| 14244 | PLUG-AND-PLAY DIFFUSION PRIORS FOR MULTILOOK COHERENT IMAGING WITH PROVABLE GUARANTEES |
| 6260 | PLUG-AND-PLAY EMOTION GRAPHS FOR COMPOSITIONAL PROMPTING IN ZERO-SHOT SPEECH EMOTION RECOGNITION |
| 14319 | PLUG-AND-PLAY FORWARD BACKWARD ALGORITHM TO RESTORE LANDSAT IMAGES: A PRELIMINARY STEP TO UNCOVER THE HISTORY OF SURFACE WATERS |
| 4075 | PLUG-AND-PLAY ROBUST VISION ENCODERS FOR MULTI-MODAL LARGE LANGUAGE MODELS VIA FULLY MULTI-MODAL ADVERSARIAL FINETUNING |
| 1969 | PLUG-AND-PLAY TEMPORAL FOURIER EMBEDDING FOR ROBUST LONG-HORIZON TRAFFIC FLOW FORECASTING |
| 12853 | PMMD: A POSE-GUIDED MULTI-VIEW MULTI-MODAL DIFFUSION FOR PERSON GENERATION |
| 10901 | P-MOE: PROXY-GUIDED MIXTURE-OF-EXPERTS NETWORK FOR FACE FORGERY DETECTION |
| 12598 | PMTNET-MTS: Control-Aware Multi-Step Forecasting For Rotary Kiln Tail Temperature |
| 12917 | PMW-DEHAZE: MULTI-SCALE WAVELET FUSION FOR IMAGE DEHAZING VIA MAMBA FRAMEWORK |
| 17455 | PocketDVDNet: Realtime Video Denoising for Real Camera Noise |
| 11319 | POEMCRAFT: MULTIMODAL POETRY GENERATION WITH PROSODY-GUIDED REFINEMENT AND BIASED ATTENTION |
| 10411 | Point-Pillar Feature Representation via Fine-Grained Fusion Network for 3D Object Detection |
| 17391 | POISONCRAFT: PRACTICAL POISONING OF RETRIEVAL-AUGMENTED GENERATION FOR LARGE LANGUAGE MODELS |
| 4290 | Polaris: Detecting Advanced Persistent Threat on Provenance Graphs via Siamese Masked Graph Representation |
| 5043 | POLARIZATION FINGERPRINT IDENTIFICATION METHOD BASED ON ARECA-NET |
| 12927 | POLARIZATION FINGERPRINT IDENTIFICATION VIA CLUSTER-DRIVEN SUB-CLASSIFIERS ROUTING |
| 14051 | POLYNOMIAL MIXING FOR EFFICIENT SELF-SUPERVISED SPEECH ENCODERS |
| 3195 | Poly-SVC: Polyphonic-Aware Singing Voice Conversion with Harmonic Modeling |
| 13203 | POSE-FREE INFANT GENERAL MOVEMENT ASSESSMENT USING BODY CONTOURS |
| 13376 | Position-Aware Self-supervised Representation Learning for Cross-mode Radar Signal Recognition |
| 4167 | POSITION-INVARIANT FINE-TUNING OF SPEECH ENHANCEMENT MODELS WITH SELF-SUPERVISED SPEECH REPRESENTATIONS |
| 11428 | POSITIVE–AND–MULTI-NEGATIVE LEARNING WITH ADAPTIVE REWEIGHTING FOR NOISY LABELS |
| 17374 | POST-HOC FAIRNESS ADJUSTMENT VIA COUNTERFACTUAL SENSITIVE ATTRIBUTES LEARNING |
| 14164 | POWER CONSUMPTION OF MODULO SAR ADCS: A SEMI-ANALYTICAL CASE STUDY |
| 14134 | POWER CONSUMPTION REDUCTION IN ELAA-ASSISTED ISAC SYSTEMS |
| 14041 | PPDD: A UNIFIED PUSH–PULL ADVERSARIAL OBJECTIVE IN FEATURE AND LOGIT SPACES FOR DATASET DISTILLATION |
| 10311 | PPFC: A REINFORCEMENT LEARNING-BASED FEEDBACK FRAMEWORK FOR HIGH-FIDELITY CHINESE POETRY-TO-IMAGE GENERATION |
| 14334 | PRECISION NEURAL NETWORKS: JOINT GRAPH AND RELATIONAL LEARNING |
| 13889 | PRECODER DESIGN IN MULTI-USER FDD SYSTEMS WITH VQ-VAE AND GNN |
| 13877 | Predict the Retrieval! Test Time Adaptation for Retrieval Augmented Generation |
| 16573 | PREDICTING EMOTIONS IN DIALOGUE RESPONSES BY MODELING IMPLICIT FACTORS |
| 10882 | Predictor-guided Robust Federated Learning against Backdoor Attacks |
| 4316 | PREIG: Physics-informed and Reinforcement-driven Interpretable GRU for Commodity Demand Forecasting |
| 1772 | PREMAB: A MULTI-MODULE SHORT VIDEO RECOMMENDATION SYSTEM WITH FOUNDATION MODELS AND MAB TO SAVE COLD-START |
| 18862 | PRESENT: ZERO-SHOT TEXT-TO-PROSODY CONTROL |
| 12322 | Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression |
| 14997 | PRETRAIN-DPFL: MITIGATING NOISE DETRIMENT IN DIFFERENTIALLY PRIVATE FEDERATED LEARNING WITH MODEL PRE-TRAINING |
| 18979 | PRETRAINING AND FINE-TUNING TECHNIQUES FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT BASED ON SEQUENCE-TO-SEQUENCE VOICE CONVERSION |
| 3258 | Pre-training Tensor-Train Networks Facilitates Machine Learning with Variational Quantum Circuits |
| 10466 | PREVAD: PREVENTING UPSTREAM BIAS IN WEAKLY SUPERVISED VIDEO ANOMALY DETECTION |
| 14154 | Preventing Modality Collapse via Category-Guided Transition Regularization |
| 2978 | PRG: Prompt-Based Distillation Without Annotation via Proxy Relational Graph |
| 18037 | PRIMAL VARIABLE DECOUPLING AND DIAGONAL PRECONDITIONING FOR PRIMAL-DUAL SPLITTING BEYOND LIPSCHITZ CONSTANT RESTRICTIONS |
| 12933 | PRINCIPLED COARSE-GRAINED ACCEPTANCE FOR SPECULATIVE DECODING IN SPEECH |
| 8082 | PRINCIPLE-GUIDED MULTIMODAL REASONING WITH MINIMAL HUMAN DEMONSTRATIONS |
| 11250 | PRINT2VOLUME: SYNTHETIC OCT-BASED 3D FINGERPRINT VOLUME GENERATOR |
| 3061 | PRIOR KNOWLEDGE DRIVEN MULTI-VIEW CLUSTERING |
| 4078 | PRIOR-CALIBRATED LONG-TAILED RECOGNITION VIA PERTURBED DISENTANGLED LOGIT ADJUSTMENT AND ADAPTIVE MIXUP |
| 11076 | Prism: Few-shot Synthesis of Socratic Questioning Dialogues in Chinese Counseling |
| 5675 | PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion |
| 6271 | PRISM: PROBABILISTIC AND ROBUST INVERSE SOLVER WITH MEASUREMENT-CONDITIONED DIFFUSION PRIOR FOR BLIND INVERSE PROBLEMS |
| 2662 | PRISM: PROPAGATING-BASED REFINED SEMANTIC FEATURES WITH BIPARTITE MATCHING FOR VISIBLE-INFRARED GROUP RE-IDENTIFICATION |
| 19051 | Privacy Disclosure of Similarity Rank in Speech and Language Processing |
| 13098 | PRIVACY-AWARE DESIGN OF DISTRIBUTED MIMO ISAC SYSTEMS |
| 9358 | PRIVACY-PRESERVATION OVER DIRECTED GRAPHS: A CASE STUDY OF AVERAGE CONSENSUS |
| 18238 | PRIVACY-PRESERVING EDGE-ASSISTED AUTHENTICATION AND KEY AGREEMENT PROTOCOL FOR RESOURCE-ASYMMETRIC IOT |
| 6583 | PrivacyShadow: Revealing Fine-Tuning Leakage in Vision-Language Models via Dual-Level Black-Box Attacks |
| 14226 | PROACTIVE SAFETY DELIBERATION: GUIDING LARGE REASONING MODELS WITH DISTILLED PRINCIPLES |
| 9279 | PROADS: PROVABLY SECURE AND ROBUST AUDIO DIFFUSION STEGANOGRAPHY WITH LATENT OPTIMIZATION AND BACKWARD EULER INVERSION |
| 3775 | PROBABILISTIC DEEP DISCRIMINANT ANALYSIS FOR WIND BLADE SEGMENTATION |
| 17707 | Probabilistic Device Discovery for Communication via UAVs |
| 15168 | Probabilistic Graphical Modeling for Biomedical Signal Completion with Non-Random Missingness on Patient Networks |
| 14288 | PROBING CONTENT AND CHANNEL IN SPEAKER VERIFICATION MODELS |
| 15695 | PROBING THE HIDDEN TALENT OF ASR FOUNDATION MODELS FOR L2 ENGLISH ORAL ASSESSMENT |
| 11232 | PROBING WHISPER FOR DYSARTHRIC SPEECH IN DETECTION AND ASSESSMENT |
| 4272 | PRODISTILL: A PROGRESSIVE PROMPTING FRAMEWORK FOR FINE-GRAINED VLM DISTILLATION |
| 1439 | PRODUCTION-SCALE DYNAMIC VOCABULARY ASR BIASING WITH WORD-LEVEL FST AND ROBUST TRAINING |
| 14811 | PROFICIENCY-AWARE ADAPTATION AND DATA AUGMENTATION FOR ROBUST L2 ASR |
| 11595 | PROGRESSIQA: PROGRESSIVE CURRICULUM AND ENSEMBLE SELF-TRAINING FOR FILTER-ALTERED IMAGE QUALITY ASSESSMENT |
| 14656 | Progressive Feature Distillation for Model-Heterogeneous Personalized Federated Learning |
| 10697 | Progressive Motion Interpolation for Humanoid Trajectory Tracking |
| 2332 | Progressive Thinking for Lane Detection: Holistic Priors to Focused Refinement |
| 15189 | PROGRESSIVELY INJECTING STRUCTURAL SEMANTICS FROM THE FREQUENCY DOMAIN INTO MAMBA FOR ACCURATE CURVILINEAR STRUCTURE SEGMENTATION |
| 18453 | ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody |
| 2757 | ProMist-5K: A Comprehensive Dataset for Digital Emulation of Cinematic Pro-Mist Filter Effects |
| 9771 | PROMPT-GUIDED MIXTURE-OF-EXPERTS FOR ROBUST MULTIMODAL SENTIMENT ANALYSIS WITH MISSING MODALITIES |
| 15895 | Prompt-Guided Multi-Scale Feature Pyramid Aggregation with Unified Channel-Spatial Transformer for Single Image Deraining |
| 4491 | PROMPTHASH: ROBUST INSTRUCTION WATERMARKS AGAINST PARAPHRASE AND SPLICING IN LLM FORENSICS |
| 9883 | PROMPTMAD: CROSS-MODAL PROMPTING FOR MULTI-CLASS VISUAL ANOMALY LOCALIZATION |
| 17961 | PromptPatch: Towards Precise and Stable Behavioral Patching in Large Language Models via Feedback-driven Prompt Optimization |
| 11422 | PROMPTSEP: GENERATIVE AUDIO SEPARATION VIA MULTIMODAL PROMPTING |
| 12370 | PROMPTSID: A SELF-ITERATIVE DISTILLATION FRAMEWORK FOR UNSUPERVISED ADAPTATION OF VISION-LANGUAGE MODELS |
| 10622 | PROPAGATING SIMILARITY, MITIGATING UNCERTAINTY: SIMILARITY PROPAGATION-ENHANCED UNCERTAINTY FOR MULTIMODAL RECOMMENDATION |
| 11269 | ProRank: Progressive Context Refinement for Reliable Retrieval-Augmented Generation |
| 4190 | PROSE: Probabilistic Reinforcement Learning Optimized by Success Estimation for Stage-Aware Cotton Irrigation Scheduling |
| 16799 | PROSODY-GUIDED HARMONIC ATTENTION FOR PHASE-COHERENT NEURAL VOCODING IN THE COMPLEX SPECTRUM |
| 6016 | PROST-LLM: PROGRESSIVELY ENHANCING THE SPEECH-TO-SPEECH TRANSLATION CAPABILITY IN LLMS |
| 9414 | PROTOLENS: A FINE-GRAINED AND ADAPTIVE INTERPRETATION FRAMEWORK FOR TIME SERIES DATA CLASSIFICATION WITH PROTOTYPES |
| 17695 | PROTOSAM: PROTOTYPE-AUGMENTED PROMPT LEARNING FOR SCRIBBLE-SUPERVISED SEMANTIC SEGMENTATION WITH SAM |
| 16615 | PROTOTYPE-BASED INFORMATION BOTTLENECK FOR EXPLAINABLE HETEROGENEOUS TEMPORAL GRAPH NEURAL NETWORKS |
| 2772 | Prototype-Based Pseudo-Label Denoising for Source-Free Domain Adaptation in Remote Sensing Semantic Segmentation |
| 15991 | PROTOTYPE-GUIDED CROSS-MODAL CONTRASTIVE LEARNING FOR CONTINUAL AUDIO-VISUAL SOUND SEPARATION |
| 12143 | PROTOTYPICAL SELF-TRAINING WITH PROGRESS-AWARE UPDATE FOR SOURCE-FREE DOMAIN ADAPTATION IN SEMANTIC SEGMENTATION |
| 4396 | Provable Unregistered Hyperspectral-Multispectral Image Fusion via Spectral Unmixing and Adversarial Learning |
| 6211 | PROXICBO: A CONSENSUS-BASED METHOD FOR COMPOSITE OPTIMIZATION |
| 6030 | PRSA: PREVENTING MALICIOUS SPEAKER RECOGNITION AND SPEECH SYNTHESIS SIMULTANEOUSLY WITH ADVERSARIAL EXAMPLES |
| 17424 | P-SAM: Parallel Semantic Decoding of SAM for Domain-Driven Prompt Generation in Pore Segmentation |
| 13077 | PSCC NET: A SIAMESE NETWORK FRAMEWORK FOR PSEUDO-VIDEO TEMPORAL MODELING AND SPATIOTEMPORAL FUSION IN REMOTE SENSING CHANGE DETECTION |
| 19098 | PSELDNETS: PRE-TRAINED NEURAL NETWORKS ON A LARGE-SCALE SYNTHETIC DATASET FOR SOUND EVENT LOCALIZATION AND DETECTION |
| 12476 | PSEUDO-SIAMESE NETWORK FOR PLANNING IN TARGET-ORIENTED PROACTIVE DIALOGUES |
| 5381 | PSGait: Gait Recognition using Parsing Skeleton |
| 12127 | PSGS: TEXT-DRIVEN PANORAMA SLIDING SCENE GENERATION VIA GAUSSIAN SPLATTING |
| 12170 | PSQ-PMC: A Hardware-Friendly Quantization Scheme for Spike-Based Neural Radiance |
| 2579 | PSTalker: Realistic 3D Talking Head Synthesis via a Semantic-aware Audio-Driven Point-based Shape |
| 8445 | PTSE-T: PRESENTATION TARGET SPEAKER EXTRACTION USING UNALIGNED TEXT CUES |
| 13750 | PULL-PUSHING CANNY EDGE EXTRACTION |
| 14926 | PURIFICATION BEFORE FUSION: TOWARD MASK-FREE SPEECH ENHANCEMENT FOR ROBUST AUDIO-VISUAL SPEECH RECOGNITION |
| 9813 | PV-ARCNET: AN ADAPTIVE DENOISE END-TO-END DEEP LEARNING MODEL FOR RAPID DC ARC DETECTION IN PHOTOVOLTAIC SYSTEMS |
| 14022 | PWA: PROCESS-LEVEL WEB AGENT REINFORCEMENT LEARNING |
| 4586 | PYRAMATCH: MULTI-HEAD PYRAMID SCAN FOR MAMBA-BASED IMAGE MATCHING |
| 5914 | Q4Q: Quantum for Quantization in Large Language Models |
| 14727 | QA-ReID: Quality-Aware Query-Adaptive Convolution Leveraging Fused Global and Structural Cues for Clothes-Changing ReID |
| 14357 | QASTANET: A DNN-BASED QUALITY METRIC FOR SPATIAL AUDIO |
| 7344 | QCA-RAG: EFFICIENT RETRIEVAL FOR LLMS VIA QUERY COMPLEXITY AWARENESS |
| 12019 | QE-XVC: ZERO-SHOT CROSS-LINGUAL VOICE CONVERSION VIA QUERY-ENHANCEMENT AND CONDITIONAL FLOW MATCHING |
| 12347 | QFOCUS: CONTROLLABLE SYNTHESIS FOR AUTOMATED SPEECH STRESS EDITING TO DELIVER HUMAN-LIKE EMPHATIC INTENT |
| 18978 | QHARMA-GAN: QUASI-HARMONIC NEURAL VOCODER BASED ON AUTOREGRESSIVE MOVING AVERAGE MODEL |
| 6899 | QPNET: QUATERNION PHYSICS-DRIVEN NEURAL NETWORK FOR UNDERWATER POLARIZED IMAGE RECOVERY |
| 17557 | QP-SAM: Query-based Prompt Generation for Segment Anything Model in Urban Village Identification |
| 2449 | QUADRATIC FLOW: CONSTANT ACCELERATION AS A PRIOR FOR LEARNING BETTER VELOCITY FIELD |
| 12238 | QUADRATURE OVER-THE-AIR-COMPUTING FOR MULTIMODAL DUAL-STREAM SIGNAL PROCESSING |
| 4985 | QUALITY ASSESSMENT OF NOISY AND ENHANCED SPEECH WITH LIMITED DATA: UWB-NTIS SYSTEM FOR VOICEMOS 2024 |
| 17188 | Quality enhancement for anomaly detection via injective linear attention |
| 10051 | Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis |
| 15494 | Quantile Randomized Kaczmarz Algorithm with Whitelist Trust Mechanism |
| 8504 | QUANTIZATION-BASED SCORE CALIBRATION FOR FEW-SHOT KEYWORD SPOTTING WITH DYNAMIC TIME WARPING IN NOISY ENVIRONMENTS |
| 14673 | Quantum Adaptive Self-Attention for Financial Rebalancing: An Empirical Study on Automated Market Makers in Decentralized Finance |
| 14599 | QUANTUM GASP CODES FOR PRIVATE DISTRIBUTED MATRIX MULTIPLICATION |
| 13602 | Quantum Reinforcement Learning-Guided Diffusion Model for Image Synthesis via Hybrid Quantum-Classical Generative Model Architectures |
| 6036 | QUANTUM-INSPIRED FREQUENCY ATTENUATION FOR ENHANCED TARGETED FABRICATION ATTACKS IN OBJECT DETECTION |
| 6314 | QUERY-GUIDED PROTOTYPICAL LEARNING FOR FEW-SHOT DOCUMENT-LEVEL RELATION EXTRACTION |
| 15404 | QUERY-SCALABLE FEW-SHOT SEMANTIC SEGMENTATION VIA IN-CONTEXT VARIATIONAL INFERENCE |
| 11411 | Query-Specific Context-Enhanced Representation Learning for Temporal Knowledge Graph Reasoning |
| 17057 | QUSR: QUALITY-AWARE AND UNCERTAINTY-GUIDED IMAGE SUPER-RESOLUTION DIFFUSION MODEL |
| 11093 | Qwen-Simplify: Exploring Sentence Simplification via Qwen-based Reinforcement Learning Paradigm |
| 10701 | R3G: A REASONING-RETRIEVAL-RERANKING FRAMEWORK FOR VISION-CENTRIC ANSWER GENERATION |
| 10318 | R³-REC: REASONING-DRIVEN RECOMMENDATION VIA RETRIEVAL-AUGMENTED LLMS OVER MULTI-GRANULAR INTEREST SIGNALS |
| 16478 | RADAREYE: ROBUST LIQUID LEVEL TRACKING USING MMWAVE RADAR IN ROBOTIC POURING |
| 11706 | RADI: A RETRIEVAL-AUGMENTED DYNAMIC IN-CONTEXT LEARNING FRAMEWORK FOR AIGC IMAGE DETECTION |
| 11845 | RADIANCE FIELD RENDERING WITH ADAPTIVE COMPACT KERNEL FOR NOVEL VIEW SYNTHESIS |
| 18999 | RADIO MAP ESTIMATION VIA LATENT DOMAIN PLUG-AND-PLAY DENOISING |
| 9743 | RADIOLUNADIFF: ESTIMATION OF WIRELESS NETWORK SIGNAL STRENGTH IN LUNAR TERRAIN |
| 16753 | RADIOMETRIC VARIATION-AWARE ROBUST CHANGE DETECTION FOR MULTISPECTRAL SATELLITE IMAGES VIA CONVEX OPTIMIZATION |
| 2177 | RaFD: Flow-Guided Radar Detection for Robust Autonomous Driving |
| 10110 | RAFS: RETRIEVAL-AUGMENTED FEW-SHOT CAD SEGMENTATION |
| 9799 | RAINFALL RETRIEVAL FROM WIRELESS LINKS VIA HYBRID LEARNING WITH DYNAMIC GATING APPROACH |
| 15634 | RAME: ROLE-AWARE MULTI-VIEW EMBEDDING FOR TRANSFERABLE MULTI-AGENT REINFORCEMENT LEARNING |
| 16203 | RAMTIME: RETRIEVAL-AUGMENTED MEMORY FOR TIME SERIES FORECASTING |
| 1304 | RANDOM MATRIX-DRIVEN GRAPH REPRESENTATION LEARNING FOR BIOACOUSTIC RECOGNITION |
| 16855 | Ranking the Impact of Contextual Specialization in Neural Speech Enhancement |
| 10392 | RANKING-AWARE REINFORCEMENT LEARNING FOR ORDINAL RANKING |
| 11339 | RANKNB: RANKING-AWARE DIRECT PREFERENCE OPTIMIZATION FOR ALIGNMENT OF A NANOBODY DIFFUSION MODEL |
| 7685 | RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer |
| 12704 | RAPTM: Retrieval-Augmented Prompting for Short-Text Topic Modeling |
| 11108 | RASD-SR: A ROBUST ANOMALOUS SOUND DETECTION FRAMEWORK WITH SCORE RECALIBRATION |
| 9979 | RATE-DISTORTION ANALYSIS OF OPTICALLY PASSIVE VISION COMPRESSION |
| 5432 | Rationale-Augmented Fine-Grained Opinion Mining with Large Language Models |
| 16861 | RATIONALE-GUIDED LEARNING FOR MULTIMODAL EMOTION RECOGNITION |
| 14462 | RAVE: RATE ADAPTIVE VISUAL ENCODING FOR 3D GAUSSIAN SPLATTING |
| 11646 | RAVE: Retrieval and Scoring Aware Verifiable Claim Detection |
| 4948 | RAWMEF: MULTI-EXPOSURE FUSION FOR RAW HDR RECONSTRUCTION VIA HISTOGRAM ENHANCEMENT AND FREQUENCY ALIGNMENT |
| 16446 | RBA: TOWARDS ROBUST AND STEALTHY BACKDOOR ATTACK IN FEDERATED LEARNING |
| 9513 | RBAP AND RBAC: TWO NOVEL TYPES IN NONLINEAR RESIDUAL WEIGHTING FOR PHYSICS-INFORMED NEURAL NETWORKS |
| 1244 | RBDA: BLACK-BOX DOMAIN ADAPTATION PERSON RE-IDENTIFICATION WITH TEST-TIME ORIENTATION-AWARE RARE ATTRIBUTE-GUIDED RE-RANKING |
| 14839 | RCAL: Reinforced Cross-modal Alignment for Multimodal Sentiment Analysis with Sparse Visual Frames |
| 16025 | RCLMATCH: REVISITING CONTRASTIVE LEARNING FOR SEMI-SUPERVISED SEMANTIC SEGMENTATION WITH CONSISTENCY REGULARIZATION |
| 4835 | RDQ: Learnable Kronecker Rotation Matrix Decomposition for Efficient Large Language Model Quantization |
| 14423 | RDSNET: EFFICIENT RADIAL-AWARE DEFORMABLE SAMPLING NETWORK FOR TOP-VIEW FISHEYE PEOPLE DETECTION |
| 9990 | Read Before You Think: Mitigating LLM Comprehension Failures with Step-by-Step Reading |
| 14293 | READING BETWEEN THE WAVES: ROBUST TOPIC SEGMENTATION USING INTER-SENTENCE AUDIO FEATURES |
| 10592 | Readout-Side Bypass for Residual Hybrid Quantum-Classical Models |
| 11149 | REALCOUNT: ROBUST OPEN-WORLD OBJECT COUNTING VIA DUPLEX CONTRASTIVE LEARNING |
| 4512 | REAL-TIME ANCHOR NODE SELECTION FOR UNDERWATER TDOA LOCALIZATION: A CONVEX-OPTIMIZATION-DRIVEN NEURAL FRAMEWORK |
| 5158 | REAL-TIME CARFAC COCHLEA MODEL ACCELERATION ON FPGA FOR UNDERWATER ACOUSTIC SENSING SYSTEMS |
| 17121 | REAL-TIME MARKOV MODELING FOR SINGLE-PHOTON LIDAR: 1000× ACCELERATION AND CONVERGENCE ANALYSIS |
| 16059 | REAL-TIME STREAMING MEL VOCODING WITH GENERATIVE FLOW MATCHING |
| 6917 | Real-Time Thermal Anomaly Detection via Commodity WiFi Sensing for Autonomous IoT Systems |
| 11798 | Real-World Adversarial Attacks on RF-Based Drone Detectors |
| 14853 | REANISOGS: REFLECTION-AWARE ANISOTROPIC NEURAL GAUSSIANS VIA K-PLANES |
| 13636 | REASON TO RETRIEVE: STRUCTURED CHAIN-OF-THOUGHT FOR TEXT-VIDEO RETRIEVAL |
| 12405 | Reason, Construct, Rehearse: A Dynamic Framework for Generating Verifiable Behavior Trees in Open Worlds |
| 11355 | REASONER-ASSISTED PLANNING: ENHANCE THE ABILITY OF GRAPH-RAG TO HANDLE COMPLEX QUESTIONS |
| 9959 | Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition |
| 9607 | REASONING DRIVEN CAPTIONS TO ASSIST NOISE ROBUST SPEECH EMOTION RECOGNITION |
| 16858 | Rebalancing Sparse Tiny Objects for UAV Detection with a Plug-in Point-Distance Module |
| 18035 | RECALL-LO: Enhancing Label-Only Membership Inference Against Large Language Models |
| 19041 | RECOGNIZING ORNAMENTS IN VOCAL INDIAN ART MUSIC WITH ACTIVE ANNOTATION |
| 12017 | RECOM: REALISTIC CO-SPEECH MOTION GENERATION WITH RECURRENT EMBEDDED TRANSFORMER |
| 3501 | Reconstructing Topology-Consistent Face Mesh by Volume Rendering from Multi-View Images |
| 3297 | Reconstruction of Spherical Sound Source Radiation Characteristics with Graph Signal Processing |
| 18026 | RECOVERING COMPRESSED TENSORS USING DEEP FACTORIZATION MODELS |
| 9566 | RECOVERING PERFORMANCE IN SPEECH EMOTION RECOGNITION FROM DISCRETE TOKENS VIA MULTI-LAYER FUSION AND PARALINGUISTIC FEATURE INTEGRATION |
| 14461 | RECOVERING WASSERSTEIN DISTANCE MATRICES FROM FEW MEASUREMENTS |
| 11534 | RECSUM: RECONSTRUCT COMPLEMENTARY AND CONSISTENT INFORMATION IN MULTIPLEX GRAPH FOR UNSUPERVISED SOCIAL SUMMARIZATION |
| 14403 | RECURRENT CONFIDENCE CHAIN: TEMPORAL-AWARE UNCERTAINTY QUANTIFICATION IN LARGE LANGUAGE MODELS |
| 19079 | Recurrent Neural Beamformer for Multichannel Speech Enhancement Under Adverse Noise Condition |
| 9865 | Recursive state estimation via approximate modal paths |
| 16690 | REDD-MFP: REGULARIZATION BY DIFFUSION DENOISING WITH MULTI-TIMESTEP FIXED-POINT OPTIMIZATION |
| 4296 | ReDO: Online Data Selection via Joint Relevance and Diversity Optimization |
| 14010 | REDUCING PROMPT SENSITIVITY IN LLM-BASED SPEECH RECOGNITION THROUGH LEARNABLE PROJECTION |
| 2201 | REDUCING THE SIZE EXPANSION OF AN IMAGE ENCRYPTED BY PAILLIER’S CRYPTOSYSTEM |
| 3276 | REDUNDANCY-AWARE FEATURE REFINEMENT FOR LIGHTWEIGHT IMAGE SUPER-RESOLUTION |
| 14355 | REFERENCE MICROPHONE SELECTION FOR GUIDED SOURCE SEPARATION BASED ON THE NORMALIZED L-P NORM |
| 10768 | REFERENCE-AWARE SFM LAYERS FOR INTRUSIVE INTELLIGIBILITY PREDICTION |
| 2157 | Reference-Aware Two-Stream Detector for Traffic Accident Detection in Road Surveillance Videos |
| 4279 | REFGEN: REFERENCE-GUIDED SYNTHETIC DATA GENERATION FOR ANOMALOUS SOUND DETECTION |
| 17833 | REFINEBRIDGE: GENERATIVE BRIDGE MODELS IMPROVE FINANCIAL FORECASTING BY FOUNDATION MODELS |
| 4941 | REFINING CROSS-MODAL CONTRADICTION VIA ITERATIVE FOCUSING FOR MULTIMODAL SARCASM DETECTION |
| 16972 | Refining Open-Vocabulary Semantic Segmentation via Regional Semantics and Visual Prototypes |
| 7510 | REFLECTING ON THE PAST: A MEMORY-AUGMENTED FRAMEWORK FOR SINGLE-POINT TARGET DETECTION |
| 16177 | REFLECTIVE CONFIDENCE: CORRECTING REASONING FLAWS VIA ONLINE SELF-CORRECTION |
| 14098 | Reflective Policy Optimization: Enhancing Reasoning in Large Language Models via Error Localization and Test-Time Self-Correction |
| 19122 | REFRAMING AUDIO DATA ANNOTATION AS DOMAIN ADAPTATION PROCESS: A MULTI-INDICATOR ACTIVE LEARNING FRAMEWORK |
| 14562 | Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding |
| 12809 | REGFUSE: REGISTRATION MEETS INFRARED AND VISIBLE IMAGE FUSION |
| 2211 | Region Energy-Aware Learning with Gaussian-Prior Convolution for Infrared Small Target Detection |
| 17460 | REGION GROWING PHYSICS-INFORMED NEURAL NETWORK FOR WIND FIELD RECONSTRUCTION FROM SPARSE DATA |
| 10256 | Region-Aware Brightness-Adaptive Enhancement Paradigm for Heterogeneous Illumination |
| 17079 | REGULARIZED INVERSE FILTER DESIGN FOR RIGID SPHERICAL MICROPHONE ARRAY PROCESSING: LAPLACE- AND TIME-DOMAIN REPRESENTATIONS |
| 3576 | Regularized Semi-Supervised Graph Purification Network for Financial Fraud Detection |
| 16307 | REGULARIZING FUNCTIONAL VECTORS TO MITIGATE FORGETTING IN PROMPT TUNING OF VISION-LANGUAGE MODELS |
| 6892 | Rehearsing High Confident Samples via Masked Optimal Transportation for Catastrophic Forgetting in Continual Named Entity Recognition |
| 13724 | Reinforced Active Learning for Change Point Detection |
| 5514 | REINFORCEMENT LEARNING DRIVEN FUSION: INTEGRATING VISUAL SEGMENTATION AND TEXTUAL SEMANTICS FOR SENTIMENT ANALYSIS |
| 13470 | REINFORCEMENT LEARNING FOR GNSS SPOOFING DETECTION: A MULTI-CLASS DQN APPROACH WITH TEXBAT |
| 10369 | REINFORCEMENT LEARNING FOR OPTIMIZED ADAPTIVE SAMPLING |
| 16751 | RELALIGN: LLM-BASED RELATION-FOCUSED CONTRASTIVE PRE-TRAINING AND ALIGNMENT FOR OPEN RELATION EXTRACTION |
| 10456 | RELATE: ENHANCE COMPOSED VIDEO RETRIEVAL VIA MINIMAL-REDUNDANCY HIERARCHICAL COLLABORATION |
| 14730 | RELATIONAL DUAL-GRANULARITY DISTILLATION FOR TEXT-BASED PERSON RETRIEVAL |
| 16534 | Relative Time Intervals Representation for Word-level Timestamping with Masked Training |
| 18980 | Relaxation-Free Min-k-Partition for PCI Assignment in 5G Networks |
| 10491 | RELIABLE DATABASE QUESTION ANSWERING WITH COLLABORATIVE AGENTS |
| 2960 | RELIC: Residual flow matching for Learned Image Compression |
| 6616 | RE-LL1: An Effective Regularized $(L,L,1)$-Tensor Decomposition Method For Video Background Modeling and Foreground Separation |
| 8093 | RELO-IRR: REFLECTION-GUIDED LORA FRAMEWORK FOR IMAGE REFLECTION REMOVAL |
| 18052 | RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement |
| 18227 | REMOTE MULTI-PERSON BLOOD PRESSURE MONITORING USING MMWAVE RADAR |
| 5056 | REMOTEDET-MAMBA: A HYBRID MAMBA-CNN NETWORK FOR MULTI-MODAL OBJECT DETECTION IN REMOTE SENSING IMAGES |
| 11479 | Repeater Swarms as Enablers of Fluid Antenna Multiple Access |
| 6222 | Repeater-Assisted Massive MIMO Full-Duplex Communications |
| 12008 | Representation-Based Data Quality Audits for Audio |
| 13600 | Representation-Diverse Self-Supervision for Cross-Domain Bioacoustic Learning in Low-Resource Settings |
| 8864 | RESBIDET: EFFICIENT DUAL-BRANCH SMALL OBJECT DETECTION FOR UAVS UNDER RESOURCE-CONSTRAINED CONDITIONS |
| 16072 | ResGaussian: 3D Gaussian Splatting with High-frequency Residual |
| 10983 | RESIDUAL DIFFUSION WITH FUSED ACCELERATED SHARED DISTRIBUTION AND FREQUENCY-ADAPTIVE SELECTION FOR UNIFIED IMAGE RESTORATION |
| 7687 | Residual Tokens Enhance Masked Autoencoders for Speech Modeling |
| 15053 | RESIDUAL VECTOR QUANTIZATION FOR COMMUNICATION-EFFICIENT MULTI-AGENT PERCEPTION |
| 6131 | RESIDUAL-ENHANCED ADAPTIVE KOOPMAN AUTOENCODER: A DEEP LATENT DYNAMICS MODEL FOR STOCK PREDICTION |
| 12088 | Resolution-Progressive Diffusion Model for Pansharpening |
| 15015 | RESOLVING LOW-RANK UPDATE LIMITATIONS FOR MEMORY-EFFICIENT VISUAL NEURAL NETWORK TRAINING |
| 3053 | RESONATE-AND-FIRE NEURONS MEET EMG: ENHANCING GESTURE CLASSIFICATION WITH SPIKING NEURAL NETWORKS |
| 15784 | Restricted Isometry for Variable-Density Continuous Frequency Sampling for Off-the-Grid Sparse Signals |
| 12340 | RETHINKING CHANGE DETECTION: BENCHMARKING MULTI-AGENT REMOTE SENSING IMAGE CHANGE UNDERSTANDING |
| 7528 | RETHINKING DATASET PRUNING: LET THE PIXEL SPEAK FOR ITSELF |
| 15959 | Rethinking Entity Disambiguation in Complex Modalities |
| 6196 | Rethinking Fusion: Disentangled Learning of Shared and Modality-Specific Information for Stance Detection |
| 16691 | RETHINKING LARGE LANGUAGE MODELS FOR IRREGULAR TIME SERIES CLASSIFICATION IN CRITICAL CARE |
| 11241 | RETHINKING MESSAGE PASSING IN DEEP UNFOLDING NETWORK FOR SNAPSHOT COMPRESSIVE IMAGING |
| 5664 | RETHINKING MULTI-SCALE PERCEPTION FOR CAMOUFLAGED OBJECT DETECTION |
| 15551 | RETHINKING MUSIC CAPTIONING WITH MUSIC METADATA LLMS |
| 5407 | Rethinking Oversaturation in Classifier-Free Guidance via Low Frequency |
| 13594 | RETHINKING PSEUDO-LABELING: A UNIFIED DUAL-CCL FRAMEWORK FOR ROBUST SEMI-SUPERVISED SEMANTIC SEGMENTATION |
| 12083 | Rethinking Speech Representation Aggregation in Speech Enhancement: a Phonetic Mutual Information Perspective |
| 15804 | RETLLM: TRAINING AND DATA-FREE MLLMS FOR MULTIMODAL INFORMATION RETRIEVAL |
| 12192 | ReTools: Reflection-Enhanced Tool Invocation for Domain-Specific QA |
| 13677 | RETRIEVAL AUGMENTED PRETRAINED TRANSFORMER FOR COMPETING RISKS SURVIVAL IN STATISTICAL SIGNAL PROCESSING |
| 9709 | RETRIEVAL-AUGMENTED MULTI-AGENT MULTIMODAL FRAMEWORK FOR FAKE NEWS DETECTION |
| 10189 | RETRIEVAL-BASED SPECULATIVE DECODING FOR AUTOREGRESSIVE SPEECH SYNTHESIS |
| 15505 | RETRIEVEALL: A MULTILINGUAL NAMED ENTITY RECOGNITION FRAMEWORK WITH LARGE LANGUAGE MODELS |
| 7557 | REVIG: A CNN-GNN HYBRID MODEL WITH DYNAMIC REVERSED AXIAL GRAPH CONSTRUCTION FOR VISION TASKS |
| 1420 | Revisiting Backdoor Threat in Federated Instruction Tuning from a Signal Aggregation Perspective |
| 17930 | REVISITING DIRECT SPEECH-TO-TEXT TRANSLATION WITH SPEECH LLMS: BETTER SCALING THAN COT PROMPTING? |
| 13148 | REVISITING PROTOTYPES FOR OPEN-DOMAIN CONTINUAL LEARNING IN VISION-LANGUAGE MODELS |
| 17587 | REVISITING THE CONNECTION BETWEEN MCCA-GENVAR AND IVA-G: ROLE OF ORTHOGONALITY AND DEFLATION |
| 13016 | Revisiting the Seasonal Trend Decomposition for Enhanced Time Series Forecasting |
| 12391 | REWARD-BASED EFFICIENT DEMONSTRATION SELECTION FOR IN-CONTEXT LEARNING |
| 6967 | REWARD-GUIDED POLICY OPTIMIZATION WITH PHYSICAL PRIORS FOR UNDERWATER COLOR RESTORATION |
| 4481 | RFGAT: GENERATIVE ADVERSARIAL TEACHER FOR CROSS-DOMAIN RFID ACTIVITY RECOGNITION |
| 13340 | RFL-NLCP: ROBUST FEDERATED LEARNING WITH NON-IID DATA AND LIMITED CLIENT PARTICIPATION |
| 9945 | RFM-EDITING: RECTIFIED FLOW MATCHING FOR TEXT-GUIDED AUDIO EDITING |
| 11090 | RFSSM: A Recursive Frequency-Aware State Space Model for Pansharpening |
| 5230 | RGSC: Retrieve and then Generate Image-text Pairs from Semantic Concepts for Unsupervised Vision-Language Pre-training |
| 9875 | RHO-PERFECT: CORRELATION CEILING FOR SUBJECTIVE EVALUATION DATASETS |
| 6349 | RHOSI: Efficient Anti-Jamming Resource Allocation with Holographic Surfaces in UAV-enabled ISAC |
| 1409 | Riemannian adversarial attacks on Symmetric Positive Definite matrices |
| 3101 | Riemannian optimization on the manifold of unitary and symmetric matrices with application to BD-RIS-assisted systems |
| 5839 | RIR-FORMER: COORDINATE-GUIDED TRANSFORMER FOR CONTINUOUS RECONSTRUCTION OF ROOM IMPULSE RESPONSES |
| 11155 | RISC-V Microarchitecture Information Leakage Attack via Transient Execution |
| 5181 | RIS-ENHANCED INFORMATION-DECOUPLED SYMBIOTIC RADIO OVER BROADCASTING SIGNALS |
| 11670 | RIS-FUSION: RETHINKING TEXT-DRIVEN INFRARED AND VISIBLE IMAGE FUSION FROM THE PERSPECTIVE OF REFERRING IMAGE SEGMENTATION |
| 9701 | Risk level dependent Minimax Quantile lower bounds for Interactive Statistical Decision Making |
| 4554 | RISKFUZZ: RISK-GUIDED FUZZING FOR DEEP LEARNING LIBRARIES |
| 12929 | RITA: Enhancing the Region-Independence for Transferable Targeted Attacks |
| 9931 | RLBR: REINFORCEMENT LEARNING WITH BIASING REWARDS FOR CONTEXTUAL SPEECH LARGE LANGUAGE MODELS |
| 4267 | RLCSC: REINFORCEMENT LEARNING ENHANCED CHINESE SPELLING CORRECTION WITH GLYPH-PHONETIC SIMILARITY |
| 3361 | RLSP-NER: REINFORCEMENT LEARNING OF SOFT PROMPTS FOR NER WITH LARGE LANGUAGE MODELS |
| 3157 | RLSW: REINFORCEMENT LEARNING-GUIDED SAMPLE WEIGHTING FOR DYNAMIC EARLY-EXITING NETWORKS |
| 3287 | RMCNet: Reflection and Moiré Removal for Virtual Production |
| 17183 | RMODGDF: A ROBUST STFT-DERIVED FEATURE FOR MUSICAL INSTRUMENT RECOGNITION |
| 16221 | RMT-KD: RANDOM MATRIX THEORETIC CAUSAL KNOWLEDGE DISTILLATION |
| 6135 | RNT2Vec: A Road-Network-Aware Trajectory Representation Model for Robust Similarity Computation |
| 4044 | RO-BENCH: LARGE-SCALE ROBUSTNESS EVALUATION OF MLLMS WITH TEXT-DRIVEN COUNTERFACTUAL VIDEOS |
| 11369 | ROBUST ACCENT IDENTIFICATION VIA VOICE CONVERSION AND NON-TIMBRAL EMBEDDINGS |
| 12493 | Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy |
| 1857 | ROBUST AND LIGHTWEIGHT F0 ESTIMATION THROUGH MID-LEVEL FUSION OF DSP-INFORMED FEATURES |
| 17402 | ROBUST BAYESIAN LAST LAYER MODELS WITH HEAVY-TAILED NOISE |
| 14038 | ROBUST COVARIANCE MATRIX ESTIMATION FOR UNIFORM RECTANGULAR ARRAY |
| 12608 | ROBUST CPD-BASED DOA ESTIMATION FOR ROTATING DISTRIBUTED ARRAY SYSTEMS UNDER INTER-NODE CALIBRATION ERROR |
| 10604 | ROBUST DEEP CROSS-MODAL HASHING VIA DUAL CONSENSUS LEARNING |
| 12497 | ROBUST DEEPFAKE AUDIO DETECTION VIA MULTI-LEVEL INTERMEDIATE FEATURE FUSION |
| 18943 | Robust Diffusion Recursive Algorithm for Distributed Widely-Linear Exponential Functional Link Network |
| 11956 | ROBUST DOA ESTIMATION FOR NON-COHERENT SUB-ARRAYS WITH NON-UNIFORM NOISE VARIANCES |
| 17316 | ROBUST DOA ESTIMATION WITH UNKNOWN SOURCE NUMBER VIA VIRTUAL ULA BEAMFORMING |
| 8352 | Robust Federated Fine-Tuning over Heterogeneous and Unreliable Communication Networks |
| 10942 | ROBUST GROUNDING WITH MLLMS AGAINST OCCLUSION AND SMALL OBJECTS VIA LANGUAGE-GUIDED SEMANTIC CUES |
| 17842 | ROBUST HYPERSPECTRAL ANOMALY DETECTION VIA CONSTRAINED DIFFERENCE-OF-CONVEX OPTIMIZATION UNDER MIXED NOISE CONTAMINATION |
| 5797 | ROBUST IN-BED HUMAN POSE AND SHAPE ESTIMATION FROM PRESSURE IMAGES WITH CLINICAL AWARENESS |
| 6589 | ROBUST IN-CONTEXT DEFENSES AGAINST JAILBREAKING OF LLMS VIA ROLE SPECIFICATION |
| 4228 | ROBUST KALMAN FILTER FOR ADDITIVE GAUSSIAN-STUDENT'S T DISTRIBUTION |
| 6436 | ROBUST KEYFRAME-CONSTRAINED SIGNAL MODELING FOR HUMAN MOTION SYNTHESIS |
| 16990 | Robust MAE-Driven NAS: From Mask Reconstruction to Architecture Innovation |
| 12130 | ROBUST MMSE PRECODING FOR OUT-OF-CLUSTER INTERFERENCE MITIGATION IN CELL-FREE MIMO SYSTEMS |
| 3003 | ROBUST ONLINE OVERDETERMINED INDEPENDENT VECTOR ANALYSIS BASED ON BILINEAR DECOMPOSITION |
| 16873 | Robust Open-World Object Detection through Evidential Learning |
| 18871 | ROBUST PARAMETER ESTIMATION OF NON-LINEAR STATE SPACE MODELS USING A DIVERGENCE-BASED ESTIMATOR |
| 17129 | Robust Personalized Recommendation under Hidden Confounding in MNA |
| 8069 | ROBUST PROVABLY SECURE IMAGE STEGANOGRAPHY VIA LATENT ITERATIVE OPTIMIZATION |
| 11606 | ROBUST RUMOR DETECTION ON SOCIAL MEDIA WITH DYNAMIC CONTRASTIVE LEARNING |
| 17975 | ROBUST SENTIMENT ANALYSIS VIA IMPORTANCE-GUIDED AUGMENTATION AND CONSISTENCY REGULARIZATION |
| 18881 | Robust Single-Shot 3D Reconstruction by Sparse-to-Dense Stereo Matching and Spline Function Based Parallax Modeling |
| 13467 | Robust Tensor Decomposition for Joint multiview Graph Learning and Community Detection |
| 5074 | Robust Test-time Adaptation by Unifying Principled Priors and Adaptive Feature Regularization |
| 7946 | ROBUST UNCERTAINTY ESTIMATION UNDER DISTRIBUTION SHIFT VIA DIFFERENCE RECONSTRUCTION |
| 4299 | ROBUST UNSUPERVISED SET-LEVEL ANOMALY DETECTION FOR SMALL TEST-TIME SETS |
| 15119 | ROBUST, ONLINE, AND ADAPTIVE DECENTRALIZED GAUSSIAN PROCESSES |
| 17981 | ROBUSTIFYING GRAPH LAPLACIAN REGULARIZATION AGAINST EDGE WEIGHT UNCERTAINTIES: AN INFIMAL CONVOLUTION APPROACH |
| 12036 | ROBUSTNESS OF AUDIO CLASSIFICATION MODELS AGAINST FILTER PERTURBATIONS |
| 10739 | RoCo: Robust Code for Fast and Effective Proactive Defense against Voice Cloning Attack |
| 17485 | ROLE-RL: ONLINE LONG-CONTEXT PROCESSING WITH ROLE REINFORCEMENT LEARNING FOR MULTIPLE LLMS IN THEIR OPTIMAL ROLES |
| 11321 | ROLE-SPECIALIST AND CONFIDENCE-SELECTIVE MULTI-TEACHER COLLABORATIVE DISTILLATION FOR MULTI-SOURCE DOMAIN ADAPTATION |
| 13185 | RoPFL: Robust and Privacy-Preserving Decentralized Federated Learning Framework |
| 12746 | ROTATIONALLY-INVARIANT AMP FOR COMPRESSED SENSING WITH MULTIPLE MEASUREMENT VECTORS |
| 15064 | ROTATION-DRIVEN FLEXIBLE SPARSE ARRAYS FOR HIGH-RESOLUTION DOA ESTIMATION |
| 1797 | Rotation-Invariant Point Cloud Segmentation via Neural Tangent Kernel-based Angle Selection |
| 5780 | Routing-Guided Multi-Expert LoRA Fine-Tuning for Image Restoration |
| 17108 | ROUTINGLLM: BOOSTING LLM PERFORMANCE FOR NETWORK ROUTING |
| 14984 | ROVLM: REGION-AWARE OPTIMAL VISION–LANGUAGE ALIGNMENT FOR ZERO-SHOT RECOGNITION |
| 13987 | RPFE: A RANGE-VIEW ENHANCED PILLAR FEATURE ENCODING METHOD FOR LIDAR-BASED 3D OBJECT DETECTION |
| 7861 | RPM-NET: RECIPROCAL POINT MLP NETWORK FOR UNKNOWN NETWORK SECURITY THREAT DETECTION |
| 11256 | RRPO: ROBUST REWARD POLICY OPTIMIZATION FOR LLM-BASED EMOTIONAL TTS |
| 13936 | RSC: Robust Self-correcting Watermark Model Based on Channel Control |
| 9495 | RSCC-Diff: A Novel Generative Paradigm Empowers Differential-Loss-Guided MLLM for Remote Sensing Change Captioning |
| 15045 | RSC-COT: VISUAL-COT REASONING AND REINFORCED OPTIMIZATION FOR REMOTE SENSING CHANGE CAPTIONING |
| 13912 | RSCOT: A RICH SEMANTIC CHAIN-OF-THOUGHT FOR REMOTE SENSING VQA BASED ON MODULE ANALYSIS AND MODEL COLLABORATION |
| 4016 | RSHR: HIERARCHICAL VISUAL REPRESENTATION AND STATE-SPACE REASONING FOR REMOTE SENSING VISUAL QUESTION ANSWERING |
| 5327 | RSoRA: Spiking-Inspired Low-Rank Adaptation for Noise-Robust Vision Transformers |
| 5127 | RTNLW: REVERSIBLE AND TAMPER-AWARE NATURAL LANGUAGE WATERMARKING SCHEME FOR MULTI-STAGE TRANSMISSION |
| 12260 | RTPNET: ROBUST DETECTION OF TRAFFIC PARTICIPANTS IN COMPLEX DRIVING SCENARIOS |
| 13277 | RugKeeper: A Multi-Agent LLM Framework for Rug Pull Token Detection |
| 5644 | RUMOR SPOTTER: A NOVEL TEXTUAL RUMOR DETECTION MODEL INTEGRATING RUMOR CLASSIFICATION AND MARKING |
| 3997 | S$^2$Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion |
| 14758 | S2S: Sentence-to-section Training with Multi-task Learning for LLM-Driven Song Generation |
| 13905 | S2TX: CROSS-ATTENTION MULTI-SCALE STATE-SPACE TRANSFORMER FOR TIME SERIES FORECASTING |
| 4802 | S²VD: A SUBSPACE-AWARE SVD METHOD FOR EFFICIENT LLM COMPRESSION |
| 1930 | S3-3DGS: STEERING SPHERICAL-HARMONIC SUBSPACES FOR SECURE 3DGS WATERMARKING |
| 4537 | S3G: STOCK STATE SPACE GRAPH FOR ENHANCED STOCK TREND PREDICTION |
| 8951 | SAD-SAM: MULTIDIMENSIONAL DISTRIBUTION-ALIGNED SPATIAL-AWARE DISTILLATION FOR SEGMENT ANYTHING MODEL |
| 2670 | SAEC: Scene-Aware Enhanced Edge-Cloud Collaborative Industrial Vision Inspection with Multimodal LLM |
| 13964 | SAFEGEN: SCULPTING REPRESENTATION SPACE FOR SAFER AND SMARTER LLMS |
| 10887 | SAFEGRAD: GRADIENT SURGERY FOR SAFE LLM FINE-TUNING |
| 12311 | SAFE-IMM: ROBUST AND LIGHTWEIGHT RADAR-BASED OBJECT TRACKING ON MOBILE PLATFORMS |
| 16104 | SAFETR: VERIFIABLE SEMANTIC TREE-RING WATERMARK FOR DIFFUSION MODEL AGAINST FORGERY ATTACKS |
| 2209 | Safety Alignment Should Be Made More Than Just A Few Attention Heads |
| 9739 | SAFETY ANCHOR-GUIDED ADAPTIVE BIAS DECAY FOR JAILBREAK DEFENSE |
| 10710 | SAGA-SR: SEMANTICALLY AND ACOUSTICALLY GUIDED AUDIO SUPER-RESOLUTION |
| 9135 | SAGE: SEMANTIC-AWARE SHARED SAMPLING FOR EFFICIENT DIFFUSION |
| 12500 | SAGETRACK: ADAPTIVE UAV MULTI-OBJECT TRACKING WITH TEMPORAL ALIGNMENT AND SCENE-AWARE POLICIES |
| 3106 | SAICR: SYMMETRIC ALIGNMENT AND INTRA-CLASS CONTRASTIVE REFINEMENT FOR REFERRING IMAGE SEGMENTATION |
| 9479 | SAIL: SYNERGISTIC ANOMALY-INFORMED LEARNING FOR DEEPFAKE DETECTION WITH CLIP |
| 2078 | SAILING BEYOND SCARCITY: TASK-DRIVEN DIFFUSION MARINE DATA AUGMENTATION FOR MARINE OBJECT DETECTION |
| 5204 | SAIP: A PLUG-AND-PLAY SCALE-ADAPTIVE MODULE IN DIFFUSION-BASED INVERSE PROBLEMS |
| 2563 | SAKA: SPATIALLY-ADAPTIVE AND KEYFRAME-ANCHORED GRAPH NETWORK FOR CONTINUOUS SIGN LANGUAGE RECOGNITION |
| 14489 | SALAD-VAE: Semantic Audio Compression with Language-Audio Distillation |
| 12027 | SALIENCY-GUIDED MULTI-SCALE FEATURE ENHANCEMENT NETWORK FOR INFRARED AND VISIBLE IMAGE FUSION |
| 4477 | SALM: STOCHASTIC ATTENTION WITH LEARNABLE MEMORY FOR MULTIVARIATE TIME SERIES ANOMALY DETECTION |
| 17605 | SAM Meets Mask2Former: A SegMoE-Hybrid Model for Semantic Segmentation |
| 6069 | SAMAS: A SPECTRUM-GUIDED MULTI-AGENT SYSTEM FOR ACHIEVING STYLE FIDELITY IN LITERARY TRANSLATION |
| 2758 | SAM-DRIVEN MULTI-SCALE GATED NETWORK FOR MULTIMODAL REMOTE SENSING IMAGE SEGMENTATION |
| 3868 | SAME: SIMILARITY-AWARE MIXTURE OF EXPERTS FOR GENERALIZED FACE ANTI-SPOOFING |
| 8193 | SAM-GT: SAM AS A GENERAL TEACHER ENHANCES MEDICAL IMAGE SEGMENTATION BY DISTILLING ONLY WHAT MATTERS |
| 6037 | SAM-GUIDED MULTI-VIEW FUSION FOR WEAKLY SUPERVISED 3D POINT CLOUD SEGMENTATION |
| 14042 | SAMM: Segment Anything Mamba Model for General Medical Image Segmentation |
| 2539 | SAMPLE EFFICIENT EXPERIENCE REPLAY IN NON-STATIONARY ENVIRONMENTS |
| 6682 | SAMPLE WEIGHT AVERAGING FOR STABLE PREDICTION |
| 18885 | SAMPLING AND UNIQUENESS SETS IN GRAPHON SIGNAL PROCESSING |
| 18611 | SAMPLING-RATE-AGNOSTIC SPEECH SUPER-RESOLUTION BASED ON GAUSSIAN PROCESS DYNAMICAL SYSTEMS WITH DEEP KERNEL LEARNING |
| 12967 | SANDWICHED IMAGE COMPRESSION: THE IMPACT OF DIFFERENTIABLE JPEG QUANTIZATION |
| 16963 | SAR SHIP WAKE DETECTION BASED ON SIAMESE NETWORK WITH MAMBA CROSS-DOMAIN FEATURE FUSION |
| 18162 | SAR-CAPTION RANKER: OPTIMIZING AUTOMATIC SAR IMAGE DESCRIPTIONS VIA RLAIF |
| 11684 | SARD: Similarity-Aligned Reminiscence and Distillation for Exemplar-Free Class-Incremental Learning |
| 4625 | SARNET: A SPIKE-AWARE CONSECUTIVE VALIDATION FRAMEWORK FOR ACCURATE REMAINING USEFUL LIFE PREDICTION |
| 7521 | SA-SSL-MOS: SELF-SUPERVISED LEARNING MOS PREDICTION WITH SPECTRAL AUGMENTATION FOR GENERALIZED MULTI-RATE SPEECH ASSESSMENT |
| 11359 | SATBADEDIT: TOWARDS EFFICIENT AND ROBUST MULTI-TRIGGER BACKDOOR INJECTION IN LARGE LANGUAGE MODELS |
| 14460 | SATURATION-AWARE SNAPSHOT COMPRESSIVE IMAGING: THEORY AND ALGORITHM |
| 12841 | SAUNA: SONG-LEVEL AUDIO & USER-LISTENING DATA NEURAL ALIGNMENT |
| 12819 | SAVER: STEGANOGRAPHY-AGNOSTIC VIDEO ERASURE AND RECONSTRUCTION |
| 10607 | SAVGBENCH: BENCHMARKING SPATIALLY ALIGNED AUDIO-VIDEO GENERATION |
| 17993 | SCAF: SOFT CLUSTER-AWARE FUSION WITH AFFINITY ALIGNMENT FOR MULTIVARIATE TIME SERIES FORECASTING |
| 15056 | SCALABLE ALGORITHMS FOR TREE CONNECTIVITY MAXIMIZATION |
| 14551 | SCALABLE BAYESIAN FINE-TUNING OF LLMS FOR MULTI-OBJECTIVE BAYESIAN OPTIMIZATION |
| 14238 | SCALABLE BEAMFORMING FOR VERY LARGE ANTENNA ARRAYS WITHOUT CSI |
| 17731 | SCALABLE EVALUATION FOR AUDIO IDENTIFICATION VIA SYNTHETIC LATENT FINGERPRINT GENERATION |
| 9352 | SCALABLE HESSIAN-FREE PROXIMAL CONJUGATE GRADIENT METHOD FOR NONCONVEX AND NONSMOOTH OPTIMIZATION |
| 4012 | SCALABLE INFORMATION LEAKAGE DETECTION IN IOT WEB INTERFACES |
| 14276 | Scalable LLM-Augmented DRL with Context-Aware Prompt Learning for O-RAN Slicing |
| 13085 | SCALE: Semantic Chunking And Label-delay Engine for Streaming Speech-LLM |
| 15021 | SCALE-AWARE SELF-SUPERVISED LEARNING FOR SEGMENTATION OF SMALL AND SPARSE STRUCTURES |
| 14416 | Scale-covariant spiking wavelets |
| 7504 | SCALEGS: TOWARDS SCALABLE AND EFFICIENT 3D GAUSSIAN SPLATTING |
| 17959 | ScaleMamba: Multi-scale Context Fusion for Training-Free Open-Vocabulary Remote Sensing Segmentation |
| 17632 | Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models |
| 6485 | SCALING AUDIO-VISUAL QUALITY ASSESSMENT DATASET VIA CROWDSOURCING |
| 16450 | SCALING MULTI-TALKER ASR WITH SPEAKER-AGNOSTIC ACTIVITY STREAMS |
| 2969 | SCALING SENTIMENT STRENGTH VIA SENTIMENT MIXING |
| 15325 | Scaling Spoken Language Models with Syllabic Speech Tokenization |
| 1017 | SCATTERFUSION: A HIERARCHICAL SCATTERING TRANSFORM FRAMEWORK FOR ENHANCED TIME SERIES FORECASTING |
| 16039 | SCATTERING MECHANISM-AWARE DEEP LEARNING FRAMEWORK FOR POLARIMETRIC SAR DECOMPOSITION |
| 13102 | SCATTERING TRANSFORMS FOR HETEROPHILIC GRAPHS USING COMPLEMENTARY BASE FILTERS |
| 17117 | SCENE: SEMANTIC-AWARE CODEC ENHANCEMENT WITH NEURAL EMBEDDINGS |
| 13225 | SCENERAG: SCENE-LEVEL RETRIEVAL-AUGMENTED GENERATION FOR VIDEO UNDERSTANDING |
| 13955 | SCFusion: Semantic and Contextual Fusion for Document-level Event Argument Extraction |
| 11878 | SCHK-HTC: SIBLING CONTRASTIVE LEARNING WITH HIERARCHICAL KNOWLEDGE-AWARE PROMPT TUNING FOR HIERARCHY TEXT CLASSIFICATION |
| 15541 | SCHROMIND: MITIGATING HALLUCINATIONS IN MULTIMODAL LARGE LANGUAGE MODELS VIA SOLVING THE SCHRODINGER BRIDGE PROBLEM |
| 15713 | SCI-GR: SEQUENTIAL CONTROLLABLE INPAINTING-BASED GENERATIVE REPLAY FOR CLASS-INCREMENTAL OBJECT DETECTION |
| 16249 | SCLVD: SOURCE CODE VULNERABILITY DETECTION VIA SEMANTIC CONTRASTIVE LEARNING |
| 12412 | SCORE-GUIDED MOTION PLANNING: LEARNING THE GRADIENT FIELD OF PROMISING REGIONS |
| 16285 | SCORENF: SCORE-BASED NORMALIZING FLOWS FOR SAMPLING UNNORMALIZED DISTRIBUTIONS |
| 10306 | SCORE-USOD: A GENERATIVE APPROACH TO UNDERWATER SALIENCY DETECTION |
| 15384 | SD2-MAMBA: SEMANTIC-DENSITY-DRIVEN MAMBA FOR ROBUST DOMAIN GENERALIZATION UNDERWATER OBJECT DETECTION |
| 5292 | SDFM: Spatial-dominated Flow Matching for Stochastic Human Motion Prediction |
| 1850 | SDGF: Fusing Static and Multi-Scale Dynamic Correlations for Multivariate Time Series Forecasting |
| 17807 | SDR-STE: SYNERGISTIC DISENTANGLEMENT AND REFINEMENT FOR PHOTOREALISTIC SCENE TEXT EDITING |
| 10771 | SDRTRANS-FUSE: IMAGE FUSION METHOD BASED ON DEPTHWISE SEPARABLE CONVOLUTION-ENHANCED TRANSFORMER |
| 13251 | Se3DGSMark: Securing Frequency-Based Watermarking with Token Chunking for 3DGS |
| 12708 | Seaf: Semantic-aware Frame Selection for Long-form Video Understanding |
| 6117 | Sealing Text-to-Image Models with Signet: A Lossless and Effective Watermarking Framework |
| 10515 | SEAM-Former: Infusing Waveform Semantics into Transformers for Explainable Myocardial Infarction Localization via 12-lead ECG |
| 9780 | SEARAG: SEMANTIC ENTROPY-GUIDED ADAPTIVE RETRIEVAL FOR MULTI-HOP QUESTION ANSWERING |
| 16055 | SEARCH-ON-GRAPH: EFFECTIVE AND RETRIEVAL-ENHANCED SEARCH ON KNOWLEDGE GRAPH FOR FAITHFUL LARGE LANGUAGE MODEL REASONING |
| 17694 | Secondary source placement for sound field control based on Ising model |
| 12878 | Second-order optimization of variable projection SVM models and road abnormality detection |
| 16261 | SECURE BACKSCATTERING WITH NON-COLLUDING JAMMER AND EAVESDROPPER |
| 2336 | SecureHDC-FL: Addressing Data Heterogeneity in Encrypted Federated Hyperdimensional Computing |
| 6025 | Securing INR-Based Steganography with Quantum Circuit-driven Weight Initialization |
| 17084 | SED: STRUCTURAL ENTROPY BASED SPEECH DISCRETIZATION FOR DISCRETE TOKEN-BASED ASR |
| 11276 | SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper |
| 15332 | SEDPA: SVD-ENHANCED DUAL PATH ATTENTION FOR EFFICIENT INFERENCE |
| 10722 | SEE NO EVIL: SEMANTIC CONTEXT-AWARE PRIVACY RISK DETECTION FOR AR |
| 5130 | SEE WHAT YOU NEED: QUERY-AWARE VISUAL INTELLIGENCE THROUGH REASONING-PERCEPTION LOOPS |
| 1968 | SEEING BEYOND DARKNESS: MULTI-DOMAIN TRANSFORMER FOR LOW-LIGHT IMAGE ENHANCEMENT |
| 16730 | SEEING IS BELIEVING: COMPREHENSIVE SELF-REFLECTIVE EVALUATION SYSTEM FOR LARGE MULTI-MODAL MODELS |
| 11249 | SEEING YOU IN THE NOISE: ACHIEVING DEGRADED OBJECT DETECTION WITH POSITIVE TEXT GUIDANCE |
| 16298 | SEEM: EXPLOITING BLACK-BOX TEXT ATTACKS TO MANIPULATE TOOL SELECTION |
| 13627 | SEFO: SEMANTIC-ENHANCED FUSION FOR ONLINE 3D INSTANCE SEGMENTATION |
| 13249 | SEGMENTWISE PRUNING IN AUDIO-LANGUAGE MODELS |
| 11328 | SELD-MoHA: A Fine-Tuning Method with the Mixture of Heterogeneous Adapters for Sound Event Localization and Detection |
| 15432 | Selective Hub Fusion with Modality-Heterogeneous Experts for Multimodal Emotion Recognition |
| 2564 | Selective Poisoning: Enhancing Backdoor Attacks on Graph Neural Networks with Limited Samples |
| 3456 | Self-Attention Decomposition for Training Free Diffusion Editing |
| 3746 | SELF-CALIBRATING INTEGRATE-AND-FIRE TIME ENCODING MACHINE |
| 7736 | SELF-CHILL: A DIVERGENCE-CONVERGENCE FRAMEWORK FOR MULTI-PATH GENERATION IN LLMS |
| 15956 | SELF-DISTILLATION PROTOTYPE LEARNING FOR WEAKLY SUPERVISED SEMANTIC SEGMENTATION |
| 5607 | SELF-PACED LEARNING FOR ACTIVE VISUAL GROUNDING IN ROBOTIC SCENARIOS |
| 15092 | SELF-PROMPTING WITH DEMO AUGMENTATION FOR OPEN-VOCABULARY ARGUMENT ROLE PREDICTION |
| 15099 | SELF-SUPERVISED DEPTH MAP SUPER-RESOLUTION VIA SPECTRAL-BIAS-AWARE KOLMOGOROV-ARNOLD NETWORK |
| 6749 | SELF-SUPERVISED DEPTH-CONSISTENCY FOR MESH RECONSTRUCTION IN THE LOOP |
| 5720 | SELF-SUPERVISED MONOCULAR DEPTH ESTIMATION VIA RGB-TO-THERMAL CROSS-MODAL DISTILLATION WITH CONFIDENCE AWARENESS |
| 15028 | SELF-SUPERVISED NOTE TRACKING AND MULTI-PITCH ESTIMATION VIA RECONSTRUCTION-BASED LEARNING |
| 3270 | SEMAMIL: SEMANTIC-AWARE MULTIPLE INSTANCE LEARNING WITH RETRIEVAL-GUIDED STATE SPACE MODELING FOR WHOLE SLIDE IMAGES |
| 17134 | SEMANTIC ALIGNMENT FRAMEWORK WITH DISTILLED SOFT LABELS FOR IMAGE-TEXT RETRIEVAL |
| 5638 | SEMANTIC ALIGNMENT INCOMPLETE MULTI-MODAL HASHING |
| 1974 | SEMANTIC ANCHOR TRANSFER FROM SHORT TO LONG SPEECH IN A DISTILLATION-BASED SUMMARIZATION FRAMEWORK |
| 8175 | SEMANTIC AND TEMPORAL-AWARE DISTILLATION FOR CLASS-INCREMENTAL LEARNING |
| 3885 | SEMANTIC COMMUNICATIONS VIA DENOISING DIFFUSION AUTOENCODER MODELS |
| 14938 | SEMANTIC MINING AND CROSS-CENTER SYNERGY FOR CROSS-MODAL PERSON RE-IDENTIFICATION |
| 1922 | Semantic Pilot Design for Data-Aided Channel Estimation Using a Large Language Model |
| 1916 | SEMANTIC REFORMULATION ENTROPY FOR ROBUST HALLUCINATION DETECTION IN QA TASKS |
| 2375 | Semantic Relation-Enhanced CLIP Adapter for Domain Adaptive Zero-Shot Learning |
| 15799 | Semantic Token-Guided Generative Latent Coding for Ultra-Low Bitrate Image Compression |
| 11511 | SEMANTICACHE: EFFICIENT KV CACHE COMPRESSION VIA SEMANTIC CHUNKING AND CLUSTERED MERGING |
| 15718 | SEMANTIC-AWARE 3D SCENE DECOMPOSITION USING SUPERQUADRICS |
| 3636 | SEMANTIC-AWARE ADDRESS SANITIZATION WITH METRIC DIFFERENTIAL PRIVACY |
| 16179 | Semantic-Aware Discrete Online Cross-Modal Hashing |
| 14993 | SEMANTIC-AWARE UAV COMMAND AND CONTROL FOR EFFICIENT IOT DATA COLLECTION |
| 6323 | SEMANTIC-AWARE UAV-ASSISTED DATA COLLECTION IN WPT-ENABLED SPACE–AIR–SEA INTEGRATED NETWORKS |
| 3682 | SEMANTIC-GUIDED MODAL ALIGNMENT FOR MULTIMODAL CARDIOVASCULAR DISEASE DETECTION |
| 9470 | Semantic-Guided Pseudo-Feature Attention Network For Audio-Visual Zero-Shot Learning |
| 10078 | SEMANTIC-GUIDED SLOW-FAST PRUNING OF VISUAL TOKENS FOR VISION-LANGUAGE MODELS |
| 1896 | SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems |
| 14339 | SEMI-SUPERVISED GNN FOR SOUND SOURCE LOCALIZATION WITH PREDICTION INTERVALS |
| 12930 | Sensor Array and Camera Fusion via Unbalanced Optimal Transport for 3D Source Localization |
| 9222 | SENTINEL MODEL AS A TRY: A DUAL-MODEL ARCHITECTURE FOR DEFENDING AGAINST DATA EXTRACTION ATTACKS IN RETRIEVAL-AUGMENTED GENERATION |
| 12416 | SEPARABLE DELAY AND DOPPLER ESTIMATION IN PASSIVE RADAR |
| 18518 | SEPARATE THIS, AND ALL OF THESE THINGS AROUND IT: MUSIC SOURCE SEPARATION VIA HYPERELLIPSOIDAL QUERIES |
| 16521 | SEP-IQA: HARNESSING MLLM SEMANTIC PREFERENCES FOR TRAINING-FREE IMAGE QUALITY ASSESSMENT |
| 14502 | SEP-ST: Incorporating Speech Entity Prompt into Large Language Models for Speech Translation |
| 14432 | Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study |
| 10015 | Sequential and Simultaneous Optimization of Microphone Array Geometry and Region-of-Interest Beamforming |
| 2370 | SEQUENTIAL MULTIPLE TESTING WITH THREE HYPOTHESES AND KNOWN NUMBER OF STREAMS FOLLOWING EACH HYPOTHESIS |
| 11960 | SESSION-LEVEL SPOKEN LANGUAGE ASSESSMENT WITH MULTIMODAL FOUNDATION MODEL VIA MULTI-TARGET LEARNING |
| 15101 | SF-CLIP: CLIP-BASED ARBITRARY STYLE IMAGE RETRIEVAL WITH STYLE AND FINE-GRAINED SEMANTIC ENHANCEMENT |
| 6718 | SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration |
| 13801 | SFENET: SPATIAL–FREQUENCY ENTANGLEMENT NETWORK FOR GENERALIZABLE DEEPFAKE DETECTION |
| 6529 | SFGNET: SEMANTIC AND FREQUENCY GUIDED NETWORK FOR CAMOUFLAGED OBJECT DETECTION |
| 3475 | SFKR: Semantic-Freezing for Knowledge-aware Recommendation |
| 6388 | SFL-GS: Semantic-Aware Feature Learning for 3D Gaussian Splatting |
| 8825 | SFLUT: EFFICIENT STYLE FUSION LOOKUP TABLE FOR IMAGE ENHANCEMENT |
| 5678 | SFM-TTS: LIGHTWEIGHT AND RAPID SPEECH SYNTHESIS WITH FLEXIBLE SHORTCUT FLOW MATCHING |
| 1937 | SF-MVD: SENSOR FAILURE-AWARE MULTI-MODAL VEHICLE DETECTION WITH LIDAR-RADAR FUSION IN FOGGY WEATHER |
| 5337 | SFN-NET: INTEGRATING SPATIAL-FREQUENCY FEATURE FUSION INTO DEEP UNFOLDING NETWORK WITH NESTA FOR COMPRESSIVE SENSING |
| 3367 | SFQA: A Comprehensive Perceptual Quality Assessment Dataset for Singing Face Generation |
| 8708 | SGAC: A SCENE GRAPH-GUIDED VISION-LANGUAGE UNDERSTANDING FRAMEWORK FOR ACTION REASONING |
| 15662 | SGA-GNN: Semantic-Guided Adaptive Graph Neural Network for Cold-Start Multimodal Recommendation |
| 2273 | SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians |
| 12186 | SGTE-SNN: Similarity-Guided Temporal Encoding for Radar Emitter Denoising and Recognition |
| 1517 | ShapeVVE: Variable Evaluator for Multivariate Time Series Shapelets Extraction |
| 11583 | Shapley Features for Robust Signal Prediction in Tactile Internet |
| 18329 | SHARED REPRESENTATION LEARNING FOR REFERENCE-GUIDED TARGETED SOUND DETECTION |
| 13215 | SHARED-WEIGHTS EXTENDER AND GRADIENT VOTING FOR NEURAL NETWORK EXPANSION |
| 5145 | SHARK: MODELING SEMANTIC HIERARCHY OF MEDICAL CODE VIA RESIDUAL K-MEANS QUANTIZATION |
| 4576 | SHARPNESS-AWARE MINIMIZATION WITH Z-SCORE GRADIENT FILTERING |
| 14091 | SHEAF LAPLACIAN LOCALIZATION FOR SUBGRAPH SIGNAL DIFFUSION |
| 10765 | SHIELDRAG: PRIVACY-PRESERVING APPROXIMATE NEAREST NEIGHBOR SEARCH FOR RETRIEVAL-AUGMENTED GENERATION SYSTEMS |
| 9567 | Shift- and stretch-invariant non-negative matrix factorization with an application to brain tissue delineation in emission tomography data |
| 5566 | Shortcut Flow Matching for Speech Enhancement: Step-Invariant flows via single stage training |
| 6329 | SHORT-SEGMENT SPEAKER VERIFICATION WITH PRE-TRAINED MODELS AND MULTI-RESOLUTION ENCODER |
| 9349 | SHRINKV: KEY-VALUE CACHE COMPRESSION WITH PROGRESSIVE HIDDEN STATES SHRINKING TO MITIGATE PREFILLING LATENCY |
| 18967 | Shuffled Linear Regression via Spectral Matching |
| 11640 | SIB-VMAMBA: SELF-SUPERVISED INFRARED DYNAMIC RANGE COMPRESSION VIA STRUCTURED INFORMATION BOTTLENECK |
| 12720 | Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-scale Dataset Cleansing |
| 14996 | SIE3D: SINGLE-IMAGE EXPRESSIVE 3D AVATAR GENERATION VIA SEMANTIC EMBEDDING AND PERCEPTUAL EXPRESSION LOSS |
| 6294 | Sieve: Computationally Efficient Hierarchical Adversarial Feature Detection in Multi-Agent Perception |
| 15388 | SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models |
| 18868 | SIGNAL RECOVERY USING A SPIKED MIXTURE MODEL |
| 15401 | SIGNAL-DRIVEN JOINT SAFETY–COMFORT OBJECTIVE FOR REAL-TIME TRAJECTORY REPLANNING ON RUTTED ROADS |
| 17435 | SIGNED GRAPH UNLEARNING |
| 3933 | SIGN-SALD: A SKELETON-AWARE LATENT DIFFUSION MODEL FOR TEXT-DRIVEN SIGN LANGUAGE PRODUCTION |
| 11320 | SIMBA: DISENTANGLING GLOBAL-LOCAL AND DYNAMIC DEPENDENCIES FOR TIME SERIES FORECASTING |
| 11509 | SIMILARITY-AWARE ANISOTROPIC SHARPENING FOR TRAINING-FREE TEST-TIME ADAPTATION WITH DISTRIBUTIONAL DISCRIMINANTS |
| 4765 | SIM-MSTNET: SIM2REAL BASED MULTI-TASK SPATIOTEMPORAL NETWORK TRAFFIC FORECASTING |
| 1696 | Simple Aggregation Is Not Enough: Temporal Knowledge Graph Forecasting via Decentralized Multi-Chain Reasoning |
| 13429 | SIMPLICIAL GAUSSIAN MODELS: REPRESENTATION AND INFERENCE |
| 3791 | SIMTOKEN: A SIMPLE BASELINE FOR REFERRING AUDIO-VISUAL SEGMENTATION |
| 13795 | SIMULATORCODER: DNN ACCELERATOR SIMULATOR CODE GENERATION AND OPTIMIZATION VIA LARGE LANGUAGE MODELS |
| 14052 | SIMULSENSE: SENSE-DRIVEN INTERPRETING FOR EFFICIENT SIMULTANEOUS SPEECH TRANSLATION |
| 15986 | Sim-Weather: Efficient Similar Weather Retrieval with Physically Aligned Fingerprints |
| 3763 | SINDIFF: SPOKEN-TO-SIGN LANGUAGE GENERATION WITH TRANSFORMER-BASED DIFFUSION MODEL |
| 12780 | Sing What You Fit: A Perception-based Dataset and Benchmark for Vocal-Song Suitability Analysis |
| 16576 | Sing2Song: An Accompaniment Generation System based on Solo Singing |
| 6268 | Single Image Super-Resolution with Selective Perceptual Refinement and Distribution-Constancy Ranking |
| 14178 | SINGLE VIEW CAMERA-BASED DYNAMIC AIRFLOW SENSING |
| 1446 | Single-DMRS based CFO Estimation for Low Latency Cellular Communications |
| 14274 | SINGLE-MICROPHONE AUDIO POINT SOURCE DISCRIMINATIVE LOCALIZATION FROM REVERBERATION LATE TAIL ESTIMATION |
| 14322 | SINGLE-STEP CONTROLLABLE MUSIC BANDWIDTH EXTENSION WITH FLOW MATCHING |
| 17889 | SingMOS-Pro: A Comprehensive Benchmark for Singing Quality Assessment |
| 18179 | SIREN: SPATIALLY-INFORMED RECONSTRUCTION OF BINAURAL AUDIO WITH VISION |
| 11918 | SIRUP: A DIFFUSION-BASED VIRTUAL UPMIXER OF STEERING VECTORS FOR HIGHLY-DIRECTIVE SPATIALIZATION WITH FIRST-ORDER-AMBISONICS |
| 8651 | SKETCH AND VECTOR-GUIDED 3D SHAPE GENERATION VIA CROSS-MODAL DIFFUSION |
| 11429 | SkyMatte: a High-Quality Dataset for Improving Sky Image Matting |
| 10978 | SLAM: Sequential Learning Signal Modeling for Multi-Concept Knowledge Tracing |
| 9906 | SLAP: SCALABLE LANGUAGE-AUDIO PRETRAINING WITH VARIABLE-DURATION AUDIO AND MULTI-OBJECTIVE TRAINING |
| 4616 | Sliding-Cache VLA: Training-Free Acceleration of Vision Language Action Models via Foreground-Background Decoupling |
| 2755 | SLM-SS: Speech Language Model for Generative Speech Separation |
| 9842 | SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models |
| 15426 | SLOT FILLING AS A REASONING TASK FOR SPEECHLLMS |
| 15516 | SLTN: Shadow and Lighting Transformation Network for Efficient 3D Shape Recognition |
| 15504 | SMALL-SCALE CAMOUFLAGED OBJECT DETECTION FOR AGRICULTURAL AUTOMATION |
| 13834 | Smart Grid Topology Inference via Locational Margin Prices and Graph-based Voltage Interpolation |
| 15810 | SMEKGE: A SHRINKAGE-GUIDED META-ENSEMBLE OF KNOWLEDGE GRAPH EMBEDDING EXPERTS |
| 11801 | SMOGVLM: A SMALL, GRAPH-ENHANCED VISION-LANGUAGE MODEL |
| 4769 | SMOOTHCLAP: SOFT-TARGET ENHANCED CONTRASTIVE LANGUAGE–AUDIO PRETRAINING FOR AFFECTIVE COMPUTING |
| 2972 | Snore Sound Classification Based on Physiological Features and Adaptive Loss Function |
| 17253 | SODA: A UNIFIED FRAMEWORK FOR JOINT ESTIMATION OF SPEAKER ORIENTATION AND DIRECTION OF ARRIVAL |
| 10413 | Soft Graph Transformer for MIMO Detection |
| 15790 | Soft Super-Pixel Partitioning for Certified Adversarial Robustness |
| 18580 | SoGRE: Boosting Logical Reasoning of LLMs via Solver-Guided Reasoning Enhancement |
| 15768 | SOLVING POISSON INVERSE PROBLEMS WITH DIFFUSION MODELS VIA THE PLUG-AND-PLAY SCHEME |
| 5112 | Solving the Helmholtz Equation via an enhanced Physics-Informed Neural Networks with an Enhanced Adaptive Strategy |
| 10457 | SONAR: Self-Distilled Continual Pre-training for Domain Adaptive Audio Representation |
| 14647 | SOUND SOURCE LOCALIZATION USING RELATIVE CIRCULAR HARMONIC COEFFICIENTS |
| 10271 | SOUNDCOMPASS: NAVIGATING TARGET SOUND EXTRACTION WITH EFFECTIVE DIRECTIONAL CLUE INTEGRATION IN COMPLEX ACOUSTIC SCENES |
| 1979 | SOUNDING HIGHLIGHTS: DUAL-PATHWAY AUDIO ENCODERS FOR AUDIO-VISUAL VIDEO HIGHLIGHT DETECTION |
| 1132 | Sounds That Shape: Audio-Driven 3D Mesh Generation with Attribute-Decoupled Score Distillation Sampling |
| 14307 | SOURCE LOCALIZATION AND ACOUSTIC INVERSION USING BAYESIAN OPTIMIZATION WITH LOCAL GAUSSIAN PROCESSES |
| 17467 | SOURCE SEPARATION FOR A CAPPELLA MUSIC |
| 12934 | SOURCE-FREE CONCEPT BOTTLENECK MODEL ADAPTATION WITH CONFIDENCE-ADAPTIVE CONDITIONAL ENSEMBLE |
| 12647 | SOURCE-FREE DOMAIN ADAPTATION WITH LIGHT-WEIGHT TRANSFORMER AND CONSISTENCY LOSS |
| 17541 | SPACE-TIME ARC ABSTRACTION FOR UAV NETWORK RECONFIGURATION UNDER ADVERSARIAL ELECTRO-OPTICAL DISRUPTION |
| 12400 | SPADE: STRUCTURED PRUNING AND ADAPTIVE DISTILLATION FOR EFFICIENT LLM-TTS |
| 12880 | SPAM: Style Prompt Adherence Metric for Prompt-based TTS |
| 18280 | SPAN PRUNING AND SYNTACTIC AWARENESS FOR ASPECT SENTIMENT TRIPLET EXTRACTION |
| 10841 | SPAR-GS: SINGLE-VIEW POSE-FREE AUTOMOBILE RECONSTRUCTION WITH 3D GAUSSIAN SPLATTING |
| 13489 | SPARKLING TOGETHER: JOINT EDITING FOR MULTI-ACCESSORY VIRTUAL TRY-ON |
| 15798 | SPARSE AND ADAPTIVE SIMILARITY-BASED GRAPH EMBEDDING FOR UNSUPERVISED FEATURE SELECTION |
| 5758 | SPARSE AUTOENCODERS MAKE AUDIO FOUNDATION MODELS MORE EXPLAINABLE |
| 3282 | SPARSE BAYESIAN LEARNING WITH SIMPLE AND INTERPRETABLE DNNS EXPLOITING DATA-DRIVEN PRIORS |
| 6234 | Sparse Gradient Compression for Fine-Tuning Large Language Models |
| 11790 | SPARSE PHYSICAL ADVERSARIAL ATTACK ON VIDEO RECOGNITION BASED ON SPATIOTEMPORAL SEMANTIC REDIRECTION |
| 9442 | Sparse Polyak with optimal thresholding operators for high-dimensional M-estimation |
| 17292 | SPARSE RECOVERY USING TIGHT FRAMES AND MINIMAX CONCAVE PENALTY |
| 17784 | SPARSE SIGNAL RECOVERY BASED ON LOWER-SEMICONTINUOUS 1-WEAKLY-CONVEX ENVELOPE OF MARGINAL FUNCTION |
| 5958 | SPARSE-UP: LEARNABLE SPARSE UPSAMPLING FOR 3D GENERATION WITH HIGH-FIDELITY TEXTURES |
| 4248 | Sparse-view Visual-acoustic Latent Learning for Novel-view Audio Synthesis |
| 9383 | Sparsity Induction for Accurate Post-Training Pruning of Large Language Models |
| 9823 | SPARSITY-AWARE TIME-FREQUENCY-CHIRP RATE REPRESENTATION FOR INCOMPLETE MICRO-DOPPLER SIGNAL |
| 13078 | Sparsity-Induced Reparametrization for Differentially Private Federated Learning |
| 6292 | SPARSITY-REGULARIZED LATENT DIFFUSION MODELS FOR RADAR CLUTTER SUPPRESSION |
| 3718 | SPATIAL COVARIANCE MATRIX RECONSTRUCTION FOR SPEECH ENHANCEMENT IN REVERBERANT MULTI-SOURCE ENVIRONMENTS |
| 15955 | SPATIAL RELATIONSHIP-ENHANCED SELF-SUPERVISED TRAJECTORY LEARNING FOR TRIP RECOMMENDATION |
| 9755 | SPATIAL-CLAP: LEARNING SPATIALLY-AWARE AUDIO–TEXT EMBEDDINGS FOR MULTI-SOURCE CONDITIONS |
| 13666 | SPATIALLY ADAPTIVE GLOBAL-LOCAL MATCHING FOR UNSUPERVISED FACIAL OPTICAL FLOW ESTIMATION |
| 5009 | SPATIALLY AWARE SELF-SUPERVISED MODELS FOR MULTI-CHANNEL NEURAL SPEAKER DIARIZATION |
| 5836 | Spatially Filtered Sparse Bayesian Learning for Direction-of-Arrival Estimation with Leaky-Wave Antennas |
| 3074 | SPATIALLY WEIGHTED FEATURES FOR SMALL OBJECT AERIAL RECOGNITION |
| 11014 | SPATIALLY-COUPLED OTFS SYSTEMS VIA BLOCK MARKOV SUPERPOSITION TRANSMISSION |
| 3728 | SPATIALNET-ECHO: REAL-TIME ACOUSTIC ECHO CANCELLATION VIA INTEGRATED NARROW-BAND AND CROSS-BAND PROCESSING |
| 4544 | Spatiotemporal Alignment for Remote Sensing Image Recovery via Terrain-Aware Diffusion |
| 12844 | SPATIOTEMPORAL GRADIENT DECOUPLING: ADVANCING ONLINE TRAINING OF RECURRENT SPIKING NEURAL NETWORKS |
| 5011 | SPATIOTEMPORAL STATE SPACE MODELING OF DYNAMIC BRAIN CONNECTIVITY IN COGNITIVELY NORMAL INDIVIDUALS AT RISK FOR ALZHEIMER’S DISEASE |
| 11901 | SPC-Seg: A SAM-Guided Progressive Consistency Framework with Anatomical Priors for Scribble-Supervised Segmentation |
| 8122 | SPDMOT: SPD AND EUCLIDEAN SYNERGIES FOR MULTI-OBJECT TRACKING |
| 15972 | Speaker Anonymisation for Speech-based Suicide Risk Detection |
| 9422 | SPEAKER ATTRIBUTED AUTOMATIC SPEECH RECOGNITION USING SPEECH AWARE LLMS |
| 18963 | Speaker-conditioned phrase break prediction for text-to-speech with phoneme-level pre-trained language model |
| 4664 | SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion |
| 10391 | SPEAKING CLEARLY: A SIMPLIFIED WHISPER-BASED CODEC FOR LOW-BITRATE SPEECH CODING |
| 11540 | Spectral Logit Sculpting: Adaptive Low-Rank Logit Transformation for Controlled Text Generation |
| 1506 | SPECTRAL OR SPATIAL? LEVERAGING BOTH FOR SPEAKER EXTRACTION IN CHALLENGING DATA CONDITIONS |
| 15420 | SPECTRAL-ALIGNED INFERENCE GUIDANCE FOR DIFFUSION-BASED IMAGE SUPER-RESOLUTION |
| 17549 | SPECTRAMORPH: STRUCTURED LATENT LEARNING FOR SELF-SUPERVISED HYPERSPECTRAL SUPER-RESOLUTION |
| 5228 | SPECTROGRAM EVENT BASED FEATURE REPRESENTATION FOR GENERALIZABLE AUTOMATIC MUSIC TRANSCRIPTION |
| 17634 | SPECTROGRAM RESTORATION AND CLASSIFICATION FRAMEWORK FOR MULTI-PERSON THROUGH-OBSTACLE HUMAN ACTIVITY RECOGNITION |
| 10906 | SPEECH EMOTION RECOGNITION BASED ON HIERARCHICAL TRANSFORMER WITH SHIFTED WINDOWS |
| 14449 | SPEECH QUALITY-BASED LOCALIZATION OF LOW-QUALITY SPEECH AND TEXT-TO-SPEECH SYNTHESIS ARTEFACTS |
| 18969 | Speech-FT: Merging Pre-Trained and Fine-Tuned Speech Representation Models for Cross-Task Generalization |
| 6180 | SPEECHMAPPER: SPEECH-TO-TEXT EMBEDDING PROJECTOR FOR LLMS |
| 5068 | SPGR: SOURCE-PATH GUIDED REPAIR FOR DEEP NEURAL NETWORKS |
| 5819 | S-PHiNe: PHYSICS-INFORMED MULTICHANNEL SPEECH ENHANCEMENT USING SPECTRO-SPATIAL FUSION FOR LOW-SNR CONDITIONS |
| 9437 | SPIDER: A SEMANTIC PRIOR-INFORMED DIFFUSION MODEL FOR ENHANCED MULTIMODAL RECOMMENDATION |
| 17519 | SPIKE-DRIVEN LOW-POWER SPEECH BANDWIDTH EXTENSION |
| 8625 | Spiking Adapter for Event-Based Action Recognition |
| 10008 | SPIKING ATTENTION NETWORK: A HYBRID NEUROMORPHIC APPROACH TO UNDERWATER ACOUSTIC LOCALIZATION AND ZERO-SHOT ADAPTATION |
| 18190 | Spiking Meets Causality: Efficient Granger Causal Discovery with Spiking Neural Networks |
| 12498 | SPIKING NEURAL NETWORKS FOR ORDINAL REGRESSION |
| 14851 | Spiking Self-Organizing Maps with Convergence Guarantees for Unsupervised Radar Signal Deinterleaving |
| 10689 | SPIKING TEMPORAL-ENHANCED NETWORK FOR ZERO-SHOT AUDIO-VISUAL LEARNING |
| 11086 | SPIKING-NEURO-OPTIMAL-TRANSPORT (S-NOT): A ROBUST SNN FRAMEWORK FOR SPATIO-TEMPORAL PATTERN LEARNING |
| 11680 | SP-MCQA: EVALUATING INTELLIGIBILITY OF TTS BEYOND THE WORD LEVEL |
| 7852 | SPOC: Safety-aware planning under partial observability and physical constraints |
| 15080 | S-PRESSO: ULTRA LOW BITRATE SOUND EFFECT COMPRESSION WITH DIFFUSION AUTOENCODERS AND OFFLINE QUANTIZATION |
| 16553 | SPRING REVERB EMULATION WITH HYBRID GATED CONVOLUTIONAL NETWORKS AND STATE SPACE MODELS |
| 15104 | SP-UNet: Robust Single-Snapshot DOA Estimation via Signal Manifold Recovery |
| 17491 | SPUTTER-AWARE FOCUSED PARTICLE BEAM MICROSCOPY |
| 2429 | SQP-Based Passive Coherent Localization via Joint DTD and AOA Measurements |
| 10984 | SRC4SYM: A WEAK SOURCE CODE ENHANCEMENT APPROACH FOR FUNCTION NAME RECOVERY UNDER VERSION MISMATCH SCENARIOS |
| 7670 | SRGAC: Self-Reference Guided Adaptive Classification for Generalizable Deepfake Detection |
| 1562 | SR-Gaussian: Depth-Feature Supervised Sparse Gaussian Splatting with Robust Initialization |
| 11378 | SROGS: Semantic-Regularized Optimization for Pose-Free Gaussian Splatting under Sparse Views |
| 3920 | SSCM: A SPATIAL SEMANTIC CONSISTENT MODEL FOR MULTI-CONTRAST MRI SUPER-RESOLUTION |
| 6785 | SSCR: EFFICIENT MULTIMODAL CLOUD REMOVAL FRAMEWORK VIA EXPLOITING STRUCTURAL SEMANTICS IN SAR |
| 15532 | SSG-DIT: A SPATIAL SIGNAL GUIDED FRAMEWORK FOR CONTROLLABLE VIDEO GENERATION |
| 9767 | SS-JDSC: SINGLE-SPEAKER JAPANESE DYSARTHRIC SPEECH CORPUS |
| 1851 | SSMEUN: SPATIAL-SPECTRAL MAMBA ENHANCED UNFOLDING NETWORK FOR PAN-SHARPENING |
| 15084 | S-SONDO: SELF-SUPERVISED KNOWLEDGE DISTILLATION FOR GENERAL AUDIO FOUNDATION MODELS |
| 16913 | SSRDWater: A Robust and Secure Watermarking of Large Language Models via Sentence-Level Semantic Relational Dependencies |
| 5252 | SSRFNet: Stage-wise SV-Mixer and RedimNet Fusion Network for Speaker Verification |
| 15666 | SSUN: Symmetric Cross-Stage State Interaction Deep Unrolling Network for Hyperspectral and Multispectral Image Fusion |
| 9808 | SSVD-O: PARAMETER-EFFICIENT FINE-TUNING WITH STRUCTURED SVD FOR SPEECH RECOGNITION |
| 9348 | Stability and Generalization of Adversarial Diffusion Training |
| 18879 | STABILIZING RED USING THE KOOPMAN OPERATOR |
| 16056 | Stable Generative Diffusion: Depth-Modality Aware and Adaptive Fusion for Camouflaged Object Detection |
| 4100 | STABLE LAYOUT IMAGE DIFFUSION FOR CONTENT-AWARE LAYOUT GENERATION |
| 10282 | STACODEC: SEMANTIC TOKEN ASSIGNMENT FOR BALANCING ACOUSTIC FIDELITY AND SEMANTIC INFORMATION IN AUDIO CODECS |
| 6060 | STAGED DIFFUSION WITH HYBRID MIXTURE-OF-EXPERTS (MOE) FOR MULTIMODAL SENTIMENT ANALYSIS |
| 11012 | STAGE-WISE ROBUST DISTILLATION FOR SPIKING NEURAL NETWORK TRAINING |
| 14616 | STAGL: A SIGN-TARGET AWARE GRAPH LEARNING FRAMEWORK FOR STANCE DETECTION |
| 14380 | STAMamba: Spatio-Temporal Adaptive State Space Model for 3D Human Pose Estimation |
| 3666 | STANCE-DRIVEN CONTROLLABLE STATEMENT GENERATION VIA COMPOSITIONAL ATTRIBUTE GRAPH PROMPTING WITH LLMS |
| 15863 | STANCEMSA: A MULTIMODAL SELF-ATTENTION FRAMEWORK FOR ACCOUNT-LEVEL IMPLICIT STANCE DETECTION IN SHORT VIDEOS |
| 3179 | STAR Meets Linear Attention: Linear Complexity-Preserving Enhanced Attention Mechanism for Vision Transformer |
| 7368 | STAR-RFF: Spatio-Temporal Sensing Assisted Robust Radio Frequency Fingerprint Identification via STGCN |
| 10584 | STARS: SPATIO-TEMPORAL REDUNDANCY-AWARE SPARSIFICATION FOR SATELLITE VIDEO OBJECT TRACKING |
| 15293 | STATE SPACE CLUSTERING FOR INTERPRETABLE FETAL HEART RATE CHARACTERIZATION |
| 8659 | ST-CFNET: A SPATIO-TEMPORAL ENHANCED NETWORK FOR REAL-TIME 4D PANOPTIC SEGMENTATION |
| 17828 | STCFORMER: ROBUST MALICIOUS TRAFFIC DETECTION VIA SHORT-TERM TRAFFIC PROFILING AND A HYBRID TRANSFORMER |
| 1234 | STD-GAUSSIANS: SPATIO-TEMPORAL DECOUPLED GAUSSIAN SPLATTING FOR SINGLE-VIEW DYNAMIC SCENE RECONSTRUCTION |
| 4600 | STDIFFUSION: A SPATIOTEMPORAL INTERPOLATION-ORIENTED DIFFUSION MODEL FOR SIGNAL SERIES LATENT REPRESENTATION GENERATION |
| 9756 | STDLC: Video Coding for Machines with Spatial-Temporal Decoupled Latent Composition |
| 2228 | STEMPHONIC: ALL-AT-ONCE FLEXIBLE MULTI-STEM MUSIC GENERATION |
| 4900 | STEP-STA: STEPWISE TOKEN-LEVEL SPATIO-TEMPORAL ATTENTION FOR ENCRYPTED TRAFFIC CLASSIFICATION |
| 15198 | StereoFoley: Object-Aware Stereo Audio Generation from Video |
| 11752 | STEREOPHONIC ACOUSTIC ECHO CANCELLATION USING AN IMPROVED AFFINE PROJECTION ALGORITHM WITH ADAPTIVE MULTIPLE SUB-FILTERS |
| 11361 | STHC-GS: SPATIO-TEMPORAL HIGH-FREQUENCY CONSISTENCY CONSTRAINTS FOR DYNAMIC URBAN SCENE RECONSTRUCTION |
| 10188 | ST-HNTM: Joint Speech-Text Neural Topic Modeling on the Hypersphere |
| 16755 | Still Thinking or Stopped Talking? Dialogue Silence Intention Classification Using Multimodal Large Language Model |
| 15364 | STIMULI-AWARE EMOTION ADAPTOR FOR ENHANCING LLM IN AFFECTIVE EXPLANATION CAPTIONING |
| 10360 | STNID: A SPATIOTEMPORAL MAMBA-BASED NEURAL IMPLICIT DYNAMICS MODEL FOR POINT CLOUD FORECASTING |
| 16524 | STOCHASTIC SHADOW DESCENT: TRAINING PARAMETRIZED QUANTUM CIRCUITS WITH SHADOWS OF GRADIENTS |
| 1503 | STPHYNET: PHYSICS-INTEGRATED SPATIOTEMPORAL NEURAL NETWORKS FOR EFFICIENT PDE SIMULATION |
| 16315 | STRATEGIC USER OFFLOADING AND SERVICE PROVIDER PRICING IN MOBILE EDGE COMPUTING |
| 16860 | STR-DIFFSEP: STREAMABLE DIFFUSION MODEL FOR SPEECH SEPARATION |
| 4115 | STREAMING SPEECH RECOGNITION WITH DECODER-ONLY LARGE LANGUAGE MODELS AND LATENCY OPTIMIZATION |
| 15697 | StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding |
| 9970 | STREAMMARK: A DEEP LEARNING-BASED SEMI-FRAGILE AUDIO WATERMARKING FOR PROACTIVE DEEPFAKE DETECTION |
| 14185 | STREAM-VOICE-ANON: ENHANCING UTILITY OF REAL-TIME SPEAKER ANONYMIZATION VIA NEURAL AUDIO CODEC AND LANGUAGE MODELS |
| 13771 | STRESS PREDICTION FROM TEMPORAL EMOTION TRAJECTORIES IN CLINICAL PATIENT-PHYSICIAN CONVERSATIONS |
| 18984 | STRIDE CONVERSION ALGORITHMS FOR CONVOLUTIONAL LAYERS AND ITS APPLICATION TO SAMPLING-FREQUENCY-INDEPENDENT DEEP NEURAL NETWORKS |
| 13324 | Strong Basin of Attraction for Unmixing Kernels With the Variable Projection Method |
| 14317 | STRONG CONVEXITY OF (KERNEL) LAPLACIAN REGULARIZATION |
| 11142 | STRPNET: VIDEO SALIENT OBJECT DETECTION VIA SPATIO-TEMPORAL SCENE RELATION PROPAGATION |
| 12035 | STRUCTSEMKGC: A STRUCTURE-AWARE KNOWLEDGE GRAPH COMPLETION METHOD WITH ADAPTIVE MULTIMODAL FUSION |
| 5080 | STRUCTSOUP: A UNIFIED ADAPTIVE FRAMEWORK FOR STRUCTURE-AWARE RETRIEVAL-AUGMENTED GENERATION |
| 13903 | STRUCTURAL DECOUPLING-DRIVEN LOSS-AWARE FILTER PRUNING |
| 12164 | STRUCTURE-AWARE ADVERSARIAL PURIFICATION: DYNAMIC MASKING AND ATTRIBUTION REFINEMENT IN DIFFUSION MODELS |
| 2420 | Structure-Aware Corpus Construction and User-Perception-Aligned Metrics for Large-Language-Model Code Completion |
| 15636 | STRUCTURE-AWARE DIFFUSION SCHRÖDINGER BRIDGE |
| 12210 | Structured Persona-Driven Authentic and Controllable Moment Generation |
| 16408 | STRUCTURED PRUNING VIA MULTI-OBSERVATION ITERATIVE HARD THRESHOLDING |
| 12913 | STRUCTURE-DRIVEN GRAPH NEURAL NETWORKS FOR SCALABLE MULTI-GROUP MULTICASTING |
| 10339 | STRUCTURE-GUIDED GRAPH REFINEMENT NETWORK FOR FACIAL AESTHETIC ASSESSMENT |
| 9866 | STRUFUZZ: ENHANCING STATEFUL PROTOCOL FUZZING WITH LLM-DRIVEN SEED STRUCTURE AWARENESS |
| 1346 | ST-WaveLLM: Spatio-Temporal Traffic Forecasting via Wavelet-Enhanced Large Language Models |
| 9845 | StyHarmo: Efficient Style-Specific Video Generation with Music Synchronization |
| 17014 | STYLE ATTACK DISGUISE: WHEN FONTS BECOME A CAMOUFLAGE FOR ADVERSARIAL INTENT |
| 10667 | STYLEBENCH: EVALUATING SPEECH LANGUAGE MODELS ON CONVERSATIONAL SPEAKING STYLE CONTROL |
| 6559 | StyleDecoupler: Generalizable Artistic Style Disentanglement |
| 15948 | STYLE-DISENTANGLED DIFFUSION FOR CONTROLLABLE AND IDENTITY-GENERALIZED SPEECH-DRIVEN BODY MOTION GENERATION |
| 4254 | STYLEPITCHER: GENERATING STYLE-FOLLOWING AND EXPRESSIVE PITCH CURVES FOR VERSATILE SINGING TASKS |
| 13817 | STYLIC: STYLIZED CAPTIONS USING CONTRASTIVE VISION-LANGUAGE MODELS |
| 11871 | Stylized Text-to-Motion Synthesis via Multi-condition Latent Diffusion |
| 5982 | STYMAM: A MAMBA-BASED GENERATOR FOR ARTISTIC STYLE TRANSFER |
| 17123 | SUBARRAY ORTHOGONAL MATCHING PURSUIT FOR BLOCK-SPARSE SIGNALS WITH UNKNOWN BLOCK PARTITIONS |
| 7848 | SUBGRAPH LOCALIZATION IN THE SUBBANDS FOR PARTIALLY SPOOFED SPEECH DETECTION |
| 2075 | SUBJECTIVE EVALUATION OF FRAME RATE IN BITRATE-CONSTRAINED LIVE STREAMING |
| 7858 | Sub-Nyquist Frequency Estimation via Amplitude-Encoding Filters |
| 11533 | SUBQRAG: SUB-QUESTION DRIVEN DYNAMIC GRAPH RAG |
| 1257 | Subsequence SDTW: Differentiable Alignment with Flexible Boundary Conditions |
| 16641 | SUBSPACE HYBRID ADAPTIVE FILTERING FOR PHONOCARDIOGRAM SIGNAL DENOISING |
| 14284 | SUBTRACTIVE MODULATIVE NETWORK WITH LEARNABLE PERIODIC ACTIVATIONS |
| 18995 | SUDAFIELD: SUBJECT- AND DATASET-AWARE NEURAL FIELD FOR HRTF MODELING |
| 19030 | SUFFICIENT CONDITIONS FOR CONVERGENCE OF RHT AND RHTP ALGORITHMS BASED ON RIC OF ORDER 2S |
| 10151 | SUMMARY ON THE MULTILINGUAL CONVERSATIONAL SPEECH LANGUAGE MODEL CHALLENGE: DATASETS, TASKS, BASELINES, AND METHODS |
| 15651 | Sum-Rate Maximization for DMA-Based Wideband Near-Field Systems with Lorentzian Responses |
| 10034 | SUNAC: Source-aware Unified Neural Audio Codec |
| 14162 | SUPER MONOTONIC ALIGNMENT SEARCH |
| 13803 | SUPERFICIAL TEXTURE SUPPRESSION NETWORK FOR GENERALIZED DEEPFAKE DETECTION |
| 5635 | SUPER-MULTIPLICATIVE NMF AND GENERIC ALGORITHMS WITH IMPROVED CONVERGENCE SPEED |
| 3599 | SUPERPIXEL INTEGRATED GRIDS FOR FAST IMAGE SEGMENTATION |
| 18018 | Superpixel-informed Continuous Low-Rank Tensor Representation for Multi-Dimensional Data Recovery |
| 17934 | SUPER-RESOLUTION GUIDED DIFFUSION NETWORK FOR MULTI-RESOLUTION REMOTE SENSING CHANGE DETECTION |
| 5862 | SUPERVISED MAKEUP TRANSFER WITH A CURATED DATASET: DECOUPLING IDENTITY AND MAKEUP FEATURES FOR ENHANCED TRANSFORMATION |
| 11524 | SUPPORT VECTOR DATA DESCRIPTION FOR RADAR TARGET DETECTION |
| 2092 | Support-Conditioned Dynamic Convolution for Few-Shot Object Detection |
| 4540 | SURE: SYNERGISTIC UNCERTAINTY-AWARE REASONING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS |
| 6762 | SURE-MED: SYSTEMATIC UNCERTAINTY REDUCTION FOR ENHANCED RELIABILITY IN MEDICAL REPORT GENERATION |
| 2769 | Surpassing Oneself: Self-Distillation from Past Failures |
| 10586 | SUSTAINABLE INCENTIVE FOR MODEL TRADING IN DECENTRALIZED AND PERSONALIZED FEDERATED LEARNING VIA DAG-BLOCKCHAIN CONSENSUS |
| 17100 | SUSTAIN-VLM: COORDINATED DATA AND COMPUTE FOR LOW-CARBON VISION LANGUAGE MODEL FINE-TUNING |
| 17089 | SVCF: ENABLING ZERO-SHOT CORRECTION OF REASONING STEPS IN MULTI-MODAL LARGE LANGUAGE MODELS |
| 7345 | SVPO: A LLM REINFORCEMENT LEARNING METHOD BASED ON STEPWISE VALUE ESTIMATION |
| 3147 | SWAN: Boosting Image Super-Resolution with Stochastic Wavelet Attention |
| 5733 | SWIN-DS: A DEEPLY SUPERVISED TRANSFORMER WITH GEOMETRIC GUIDANCE FOR ROBUST LACUNE DETECTION |
| 10795 | SWITCHCODEC: ADAPTIVE RESIDUAL-EXPERT SPARSE QUANTIZATION FOR HIGH-FIDELITY NEURAL AUDIO CODING |
| 18015 | SYMBOLIC GOAL-GUIDED INTRINSIC CURRICULA FOR LONG-HORIZON REINFORCEMENT LEARNING |
| 13794 | Symphony Rendering: MIDI and Composer-Conditioned Auto Orchestration with Flow-Matching Transformers |
| 6409 | Synaspot: A Lightweight, Streaming Multi-modal Framework for Keyword Spotting with Audio-Text Synergy |
| 5275 | SYNCHRONOUS SECONDARY PATH MODELING AND KRONECKER-FACTORIZED ADAPTIVE ALGORITHM FOR MULTICHANNEL ACTIVE NOISE CONTROL |
| 6277 | SYNCSPEECH: EFFICIENT AND LOW-LATENCY TEXT-TO-SPEECH BASED ON TEMPORAL MASKED TRANSFORMER |
| 17106 | SYNERGISTIC FOURIER-WAVELET NEURAL OPERATOR |
| 8053 | SYNERGISTIC HYBRID ATTENTION NETWORK: AN ENHANCED MULTI-MODAL INTERACTION ARCHITECTURE FOR EFFICIENT VISUAL QUESTION ANSWERING |
| 7314 | SYNERGISTIC STRUCTURE-AWARE GUIDED NETWORK FOR BINARY PROTOCOL FORMAT INFERENCE |
| 11391 | SYNERGY MAP–GUIDED SPECTRAL–DOMAIN ENHANCED NETWORK FOR CAMOUFLAGED OBJECT DETECTION |
| 10773 | SYNERGYWARPNET: ATTENTION-GUIDED COOPERATIVE WARPING FOR NEURAL PORTRAIT ANIMATION |
| 3704 | SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding |
| 4974 | SynthCloner: Synthesizer Preset Conversion via Factorized Codec with ADSR Envelope Control |
| 11183 | Synthesis-Driven Contrastive Learning for Unpaired Unsupervised Cloth-Changing Person Re-Identification |
| 3229 | Synthesized Data Selection via Score Distribution Matching for Te Reo Māori Automatic Speech Recognition |
| 3272 | SYNTHETIC DATA DOMAIN ADAPTATION FOR ASR VIA LLM-BASED TEXT AND PHONETIC RESPELLING AUGMENTATION |
| 4889 | SYNTHETIC YET STRIKING? ASSESSING VOCAL CHARISMA IN TTS VIA PERCEPTUAL AND ALGORITHMIC MEASURES |
| 18022 | SYSSEC: SECURING SYSTEM CALLS VIA MULTI-INTENT ALIGNMENT |
| 4448 | TABULAR SYNTHESIS BASED ON BI-DIRECTIONAL FEEDBACK CONDITIONAL DIFFUSION MODELS |
| 13123 | TacExpert: a Pseudo-Temporal Mixture-of-Experts Framework for Open-Set Tactile Object Recognition |
| 5433 | Tackling Data Heterogeneity in Parameter-Efficient Federated Fine-Tuning of Large Language Models |
| 11260 | Tackling Sparse Interactions In Multimodal Session-Based Recommendation |
| 16038 | TAG: TEMPORAL-AWARE AUDIO GENERATION VIA LLM-GUIDED MANUAL CONSTRUCTION AND ATTENTION CONTROL |
| 14620 | TAGARELA - A PORTUGUESE SPEECH DATASET FROM PODCASTS |
| 5123 | Tag-U: Improving Social Media Role-Playing via Multimodal Tagging Strategies |
| 12152 | TAILORED TEXT INTEGRATION AND SEMANTIC DIFFERENTIAL ENHANCEMENT FOR FEW-SHOT CLASS-INCREMENTAL LEARNING |
| 11940 | TALPS: A FRAMEWORK FOR ADAPTIVE LEARNING OF TACTICS, TECHNIQUES, AND PROCEDURES CLASSIFICATION WITH LARGE LANGUAGE MODELS |
| 14755 | Taming Audio VAEs via Target-KL Regularization |
| 16346 | TAMING THE LIGHT: ILLUMINATION-INVARIANT SEMANTIC 3DGS-SLAM |
| 4729 | TAML: TASK-AWARE METRIC-DRIVEN META LEARNING FOR FEW-SHOT ACTION RECOGNITION |
| 14912 | TARA: TOKEN-AWARE RECALIBRATION AND ATTENTION FOR EXPLAINABLE PATHOLOGY REPORTS CLASSIFICATION |
| 14657 | TARGET DETECTION IN TWO-CHANNEL PASSIVE RADARS WITH INTER-RECEIVER COLLABORATION |
| 17739 | Target speaker anonymization in multi-speaker recordings |
| 8119 | Targeted Fine-Tuning of DNN-Based Receivers via Influence Functions |
| 12422 | TARGETED POOLED LATENT-SPACE STEGANALYSIS APPLIED TO GENERATIVE STEGANOGRAPHY, WITH A FIX |
| 16985 | TARGET-SPEAKER LLM-ASR WITH SPEAKER-AWARE SPEECH ENCODER |
| 9269 | Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis |
| 17495 | Task-Aware LLM Council with Adaptive Decision Pathways for Complex Task Support |
| 2380 | Task-Aware Modality-as-Experts Fusion of NIR and Microscopic Image for Textile Analysis |
| 16678 | Task-Oriented Sound Privacy Preservation for Sound Event Detection via End-to-End Adversarial Multi-task Learning |
| 10670 | TASR: TRAINING TASK-ALIGNED SUPER-RESOLUTION WITH A SEMANTIC AUTOENCODER LOSS |
| 14878 | TASS: TASK ALIGNED SUBSPACE SELECTION FOR KNOWLEDGE PRESERVING FINE-TUNING |
| 11577 | TASU: TEXT-ONLY ALIGNMENT FOR SPEECH UNDERSTANDING |
| 1196 | TAU: A BENCHMARK FOR CULTURAL SOUND UNDERSTANDING BEYOND SEMANTICS |
| 17772 | T-CACHE: FAST INFERENCE FOR MASKED GENERATIVE TRANSFORMER-BASED TTS VIA PROMPT-AWARE FEATURE CACHING |
| 9415 | T-CAMEL: TEAMMATE-CAUSAL-AWARE MULTI-AGENT LEARNING |
| 7753 | TCC: USING TOPIC CHAINS TO COMPRESS PROMPTS FOR LONG DOCUMENT QUESTION ANSWERING |
| 12184 | TCSST: A TEMPORAL KNOWLEDGE GRAPH EMBEDDING MODEL BASED ON SPACE SPIRAL TIMELINE |
| 3215 | TCT-LOSS: SHAPE-AWARE TIME-SERIES FORECASTING WITH A ZERO-SHOT TIME-COLUMN TRANSFORMER AUTOENCODER |
| 15673 | TC-zarr: an analysis-ready storage approach for Tropical Cyclone Data |
| 1008 | TDATTACK: ENHANCING TRANSFERABILITY OF UNRESTRICTED ADVERSARIAL EXAMPLES VIA TEXT-DRIVEN DIFFUSION |
| 5006 | Teacher-Guided Pseudo Supervision and Cross-Modal Alignment for Audio-Visual Video Parsing |
| 8042 | TEACHER-STUDENT DIFFUSION MODEL FOR TEXT-DRIVEN 3D HAND MOTION GENERATION |
| 16851 | Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation |
| 7949 | TEACHING THE TEACHERS: BOOSTING UNSUPERVISED DOMAIN ADAPTATION IN SPEECH RECOGNITION BY ENSEMBLE UPDATE |
| 6195 | TEAMo: Trait and Emotion Aware Motion Generation in 3D Human |
| 5087 | Tell Me What to Track: Infusing Robust Language Guidance for Enhanced Referring Multi-Object Tracking |
| 4613 | TEMPORAL DISTILLATION FOR MUSIC REPRESENTATION LEARNING |
| 17194 | Temporal Graph Modeling for Speech Emotion Recognition Using LSTM-Aggregated Multigraph Networks |
| 15876 | TEMPORAL-AWARE HETEROGENEOUS GRAPH REASONING WITH MULTI-VIEW FUSION FOR TEMPORAL QUESTION ANSWERING |
| 16069 | Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic Event Classification |
| 16962 | Temporal-Spatial Decouple before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis |
| 10839 | TennisTV: Do Multimodal Large Language Models Understand Tennis Rallies? |
| 11451 | TENSLORA: TENSOR ALTERNATIVES FOR LOW-RANK ADAPTATION |
| 2895 | Tensorformer-Based Multimodal Depression Detection from Concurrent Gait Patterns and Physiological Signals |
| 8170 | TER: TEST-TIME EMBEDDING REGULARIZATION FOR CALIBRATION-AWARE PROMPT TUNING IN VISION-LANGUAGE MODELS |
| 15783 | TEST TIME ADAPTATION FOR SPEECH EMOTION RECOGNITION |
| 12252 | TESTAGENT: AUTOMATIC BENCHMARKING AND EXPLORATORY INTERACTION FOR EVALUATING LLMS IN VERTICAL DOMAINS |
| 4969 | TESTING THE EFFICIENT CODING HYPOTHESIS BEYOND HUMANS: THE AUDITORY KERNELS OF BAT VOCALIZATIONS |
| 10933 | TEST-TIME ADAPTATION FOR SPEECH ENHANCEMENT VIA MASK POLARIZATION |
| 12342 | TEST-TIME SCALING FOR AUDITORY COGNITION IN AUDIO LANGUAGE MODELS |
| 15513 | TEXT SEMANTICS-GUIDED DUAL-TEACHER KNOWLEDGE DISTILLATION FOR PARTIALLY RELEVANT VIDEO RETRIEVAL |
| 10581 | Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment |
| 5124 | Text2Move: Text-to-moving sound generation via trajectory prediction and temporal alignment |
| 14598 | TEXT2TRY3D: TEXT-GUIDED 3D GARMENT GENERATION ON PARAMETRIC HUMAN MODELS |
| 10295 | TEXT-GUIDED DOMAIN ADAPTATION VIA DEEP MANIFOLD CONSTRAINTS AND NEIGHBORHOOD PROPAGATION |
| 11456 | TEXT-GUIDED ROI-AWARE PRUNING METHOD FOR LANGUAGE EMBEDDED 3DGS |
| 2854 | TEXTLESSRAG: END-TO-END VISUAL DOCUMENT RAG BY SPEECH WITHOUT TEXT |
| 17017 | TEXT-ONLY ADAPTATION IN LLM-BASED ASR THROUGH TEXT DENOISING |
| 13675 | TEXT-PRIOR-DRIVEN FEATURE INTERACTION FOR OPEN-VOCABULARY OBJECT DETECTION |
| 15813 | TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution |
| 19091 | TEXT-TO-SPEECH WITH LIP SYNCHRONIZATION BASED ON SPEECH-ASSISTED TEXT-TO-VIDEO ALIGNMENT AND MASKED UNIT PREDICTION |
| 3766 | TFF-ID: A TRAINING-FREE FRAMEWORK FOR INVERTIBLE AND DIVERSIFIED FACE ANONYMIZATION |
| 3736 | TF-GS: TEMPORAL-FREQUENCY FUSED GAUSSIAN SPLATTING FOR DYNAMIC VIEW SYNTHESIS |
| 8510 | TF-MAMBANET: A TEMPORAL AND FREQUENCY FUSED BIDIRECTIONAL MAMBA ARCHITECTURE FOR PPG FOUNDATION MODEL |
| 4215 | T-GEMS: TEXT-GUIDED EXIT MODULES FOR DECREASING CLIP IMAGE ENCODER |
| 4073 | TGPO: TREE-GUIDED PREFERENCE OPTIMIZATION FOR ROBUST WEB AGENT REINFORCEMENT LEARNING |
| 7952 | THANGKA: TEXT-HIERARCHICAL ALIGNMENT FOR NARRATIVE-GUIDED KNOWLEDGE-AWARE ASSOCIATION |
| 14028 | THE 3RD CLARITY PREDICTION CHALLENGE: A MACHINE LEARNING CHALLENGE FOR HEARING AID SPEECH INTELLIGIBILITY PREDICTION |
| 14398 | The Achilles' Heel of Angular Margins: A Chebyshev Polynomial Fix for Speaker Verification |
| 14281 | THE CURIOUS CASE OF VISUAL GROUNDING: DIFFERENT EFFECTS FOR SPEECH- AND TEXT-BASED LANGUAGE ENCODERS |
| 1365 | The Example Saturation Effect: The Hidden Role of Input Difficulty in In-Context Learning |
| 11199 | The Hidden Cost of Caching: Analyzing the Energy Expenditure of Placement in Cache-aided MISO Networks |
| 10060 | THE IMPACT OF ABSTRACT AND OBJECT TAGS ON IMAGE PRIVACY CLASSIFICATION |
| 11703 | The Impact of Antenna Spacing on DOA Estimation Error in Dense Arrays |
| 15778 | THE IMPACT OF AUDIO WATERMARKING ON AUDIO ANTI-SPOOFING COUNTERMEASURES |
| 19053 | THE INVERSE DRUM MACHINE: SOURCE SEPARATION THROUGH JOINT TRANSCRIPTION AND ANALYSIS-BY-SYNTHESIS |
| 12767 | The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs |
| 10567 | THE RL-R CHAT DATASET: EGOCENTRIC CONVERSATIONS AMONG FAMILIAR INTERLOCUTORS FOR MULTI-MODAL HEARING AUGMENTATION TECHNOLOGY |
| 3841 | THE ROLE OF PROSODIC AND LEXICAL CUES IN TURN-TAKING WITH SELF-SUPERVISED SPEECH REPRESENTATIONS |
| 10055 | THE SINGING VOICE CONVERSION CHALLENGE 2025: FROM SINGER IDENTITY CONVERSION TO SINGING STYLE CONVERSION |
| 15186 | The Stability-Plasticity Dilemma Revisited: A Brain-Inspired Continual Learning Method with Representation-Function Separation |
| 13662 | THE SYNERGISTIC ROLE OF AUDIO AND LARGE VIDEO-LANGUAGE MODEL IN SOURCE-FREE VIDEO DOMAIN ADAPTATION |
| 18120 | THEMIS: Bridging Documentation and Code to Uncover Access Control Vulnerabilities in GitLab |
| 3039 | Theory and application of circular relative harmonic coefficients |
| 5179 | THINK-AUGMENTED FUNCTION CALLING: IMPROVING LLM PARAMETER ACCURACY THROUGH EMBEDDED REASONING |
| 12801 | THINK-CLIP-SAMPLE: SLOW-FAST FRAME SELECTION FOR VIDEO UNDERSTANDING |
| 1996 | Thinking in a Crowd: How Auxiliary Information Shapes LLM Reasoning |
| 13482 | THINKING WHILE LISTENING: SIMPLE TEST TIME SCALING FOR AUDIO CLASSIFICATION |
| 1908 | THREATSAGE: A MODULAR BENCHMARK FOR LLM-ORCHESTRATED BLUE TEAM DEFENSE OPERATIONS |
| 11323 | THREE SECONDS IS SUFFICIENT: A MULTI-PRONGED FRAMEWORK FOR MODEL-BASED SPEAKER ADAPTATION IN ASR UNDER DATA-SCARCE CONDITIONS |
| 6843 | THREE-STAGE DIFFUSION POLICY OPTIMIZATION FOR OFFLINE REINFORCEMENT LEARNING |
| 9936 | TICL: TEXT-EMBEDDING KNN FOR SPEECH IN-CONTEXT LEARNING UNLOCKS SPEECH RECOGNITION ABILITIES OF LARGE MULTIMODAL MODELS |
| 16230 | TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice |
| 5019 | TIERED TREATMENT EFFECT DECOMPOSITION FOR MULTI-TASK UPLIFT MODELING |
| 2299 | TIGHT REGRET BOUNDS FOR MEAN-REVERTING LINEAR BANDITS VIA RECURSIVE STATE ESTIMATION |
| 8112 | TIGHTNESS OF SEMIDEFINITE RELAXATION FOR QUATERNION-BASED ROTATION SYNCHRONIZATION PROBLEMS |
| 11768 | TIMBRE-AWARE AUDIO DIFFERENCE CAPTIONING FOR ANOMALOUS MACHINE SOUNDS WITHOUT PAIRED TRAINING DATA VIA SYNTHETIC PERTURBATIONS |
| 12236 | TIMBRE-BASED PRETRAINING WITH PSEUDO-LABELS FOR MULTI-INSTRUMENT AUTOMATIC MUSIC TRANSCRIPTION |
| 16013 | Time Series Anomaly Detection with Quantum Variational Methods and Set Covering |
| 17720 | TIME SERIES ATTRIBUTES GUIDED PRETRAINING DATA SELECTION FOR TIME SERIES FOUNDATION MODELS |
| 10293 | TIME SERIES DECOMPOSITION AND FUSION-BASED GRANGER CAUSALITY NETWORK FOR NONLINEAR CAUSAL INFERENCE |
| 16526 | TIME VS. LAYER: LOCATING PREDICTIVE CUES FOR DYSARTHRIC SPEECH DESCRIPTORS IN WAV2VEC 2.0 |
| 18032 | TIME-AWARE MULTI-EXPONENTIAL ANALYSIS TO OPTIMIZE ANALYTE IDENTIFICATION USING ZIF-8-90 |
| 5135 | TIMEDIFF: LEVERAGING DIFFERENTIAL DOMAIN REPRESENTATIONS FOR LONG TIME SERIES FORECASTING |
| 1766 | TIME-DOMAIN SYNTHESIS OF VIRTUAL SOUND SOURCE WITHIN PERSONALIZED SOUND ZONE USING A LINEAR LOUDSPEAKER ARRAY |
| 12635 | TIME-FREQUENCY ANALYSIS OF NON-UNIFORMLY SAMPLED SIGNALS VIA SAMPLE DENSITY ADAPTATION |
| 14282 | Time-Shifted Token Scheduling for Symbolic Music Generation |
| 3850 | TINYDROP: TINY MODEL GUIDED TOKEN DROPPING FOR VISION TRANSFORMERS |
| 9359 | TINYMU: A COMPACT AUDIO-LANGUAGE MODEL FOR MUSIC UNDERSTANDING |
| 17675 | TIPS Over Tricks: Simple Prompts for Effective Zero-Shot Anomaly Detection |
| 4721 | TIWNet: A Template-based Real-time Image Watermarking Method Using Invertible Neural Network |
| 7475 | TLDIFFGAN: A LATENT DIFFUSION-GAN FRAMEWORK WITH TEMPORAL INFORMATION FUSION FOR ANOMALOUS SOUND DETECTION |
| 10165 | TLD-PGD: TWO-STAGE LOW FREQUENCY DEGRADATION ADVERSARIAL ATTACK IN HYPERSPECTRAL IMAGE CLASSIFICATION |
| 13923 | TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation |
| 15497 | T-Mimi: A Transformer-based Mimi Decoder for Real-Time On-Phone TTS |
| 2122 | TMS: Text-Prompted Multi-channel Speech Separation on Smart Glasses |
| 10519 | TMT: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation |
| 14576 | TNET: TERRACE CONVOLUTIONAL DECODER NETWORK FOR REMOTE SENSING IMAGE SEMANTIC SEGMENTATION |
| 3390 | TOEPLITZ UNLABELED SENSING |
| 15774 | TOKCOINFER: TOKEN-LEVEL MULTI-MODEL COLLABORATION FOR ENERGY-EFFICIENT LLM INFERENCE |
| 7777 | TOKENCHAIN: A DISCRETE SPEECH CHAIN VIA SEMANTIC TOKEN MODELING |
| 11916 | TOP-1 COMPRESSION SUFFICES FOR FEDERATED UNLEARNING WITH THE HELP OF ADAPTIVE ERROR FEEDBACK |
| 11129 | TOPOBIND: MULTI-MODAL PREDICTION OF ANTIBODY-ANTIGEN BINDING FREE ENERGY VIA SEQUENCE EMBEDDINGS AND STRUCTURAL TOPOLOGY |
| 16142 | Topological Growth Serialization-based Mamba for 3D Point Clouds |
| 19012 | TOPOLOGICAL PERSISTENCE OF THE NEURAL EMBEDDING OF THE ARCHETYPAL SUBSPACE |
| 13805 | Topological Signal Processing for 3D Point Cloud Data |
| 16714 | TOPT: TASK-ORIENTED PROMPT TUNING FOR URBAN REGION REPRESENTATION LEARNING |
| 11401 | TOS: A TEAM OF SPECIALISTS ENSEMBLE FRAMEWORK FOR STEREO SOUND EVENT LOCALIZATION AND DETECTION WITH DISTANCE ESTIMATION IN VIDEO |
| 10631 | TOUR-TUPLE-7: A FINE-GRAINED 7-TUPLE GENERATIVE ASPECT-BASED SENTIMENT ANALYSIS BENCHMARK FOR TOURISM SERVICE QUALITY |
| 1997 | TOWARD CONVERSATIONAL USER INTERFACE VIA VOICE COMMAND CORRECTION |
| 8087 | Toward Cross-Dataset Clothes-Changing Re-Identification via Efficient Decoupled Adaptive Matching |
| 16131 | Toward Faithful Explanations in Acoustic Anomaly Detection |
| 19141 | Toward Generalized Iris Presentation Attack Detection: A Mask-and-Distill Mixture of Experts Approach |
| 12226 | TOWARD NON-PARAMETERIZED TIME SERIES EMBEDDING FOR EFFICIENT FORECASTING: A DYNAMICAL SYSTEM PERSPECTIVE |
| 11123 | TOWARD ROBUST AND EFFICIENT BEAT TRACKING VIA BEAT-AWARE ATTENTION |
| 11150 | TOWARD ROBUST IMITATION LEARNING VIA SEARCH-BASED INVERSE DYNAMICS WITH LIMITED EXPERT DEMONSTRATIONS |
| 11419 | TOWARD ROBUST NODE-LEVEL GRAPH OOD GENERALIZATION WITH SEMANTIC AWARENESS |
| 1403 | TOWARD ROBUST SAR SHIP DETECTION: DOMAIN-INVARIANT LEARNING VIA PHYSICS-DRIVEN AUGMENTATION AND RETROSPECTIVE ALIGNMENT |
| 15272 | Towards 2D Texture Binding via Personalized Text-to-Image Generation based on Texture-Object Decoupling |
| 15439 | TOWARDS ACCURATE QUANTIZATION FOR LARGE VISION-LANGUAGE MODELS VIA ZEROTH-ORDER GRADIENT OPTIMIZATION AND SECTIONED LOGARITHMIC QUANTIZER |
| 11809 | TOWARDS BLIND DATA CLEANING: A CASE STUDY IN MUSIC SOURCE SEPARATION |
| 10853 | TOWARDS BUILDING SPEECH LARGE LANGUAGE MODELS FOR MULTITASK UNDERSTANDING IN LOW-RESOURCE LANGUAGES |
| 4095 | TOWARDS DATA DRIFT MONITORING FOR SPEECH DEEPFAKE DETECTION IN THE CONTEXT OF MLOPS |
| 14844 | TOWARDS DISTANCE-AWARE SYNTHETIC AUDIO MIXTURES FOR UNIVERSAL SOUND SEPARATION |
| 13551 | Towards Dynamic World Model Generation with Monocular Video |
| 5544 | TOWARDS EFFECTIVE NEGATION MODELING IN JOINT AUDIO-TEXT MODELS FOR MUSIC |
| 6830 | Towards Evaluating Generative Audio: Insights from Neural Audio Codec Embedding Distances |
| 14115 | TOWARDS EVENT-DRIVEN RADARS: SPECTRAL SUPER-RESOLUTION AND HARDWARE |
| 3960 | Towards Explainable Privacy Preservation in Federated Learning via Shapley Value-Guided Noise Injection |
| 15257 | TOWARDS FAIR ASR FOR SECOND LANGUAGE SPEAKERS USING FAIRNESS PROMPTED FINETUNING |
| 9792 | TOWARDS LIGHTWEIGHT ADAPTATION OF SPEECH ENHANCEMENT MODELS IN REAL-WORLD ENVIRONMENTS |
| 1998 | Towards Memory-based Temporal Coherence in Pose-free 3D Gaussian Splatting |
| 9466 | TOWARDS MORE ACCURATE CROSS-MODAL VIDEO OBJECT DETECTION WITH LOWER COMPUTATIONAL COST |
| 5711 | TOWARDS MULTI-VIEW HIERARCHICAL VIDEO-TO-PIANO GENERATION WITH MIDI GUIDANCE |
| 13459 | TOWARDS NOISE-ROBUST SPEECH INVERSION THROUGH MULTI-TASK LEARNING WITH SPEECH ENHANCEMENT |
| 9663 | TOWARDS OBJECT-LEVEL MULTIMODAL TASK PLANNING FOR LONG-TERM ROBOTIC MANIPULATION WITH VISION LANGUAGE MODEL AND BEHAVIOR TREE |
| 6578 | TOWARDS OPEN-WORLD HUMAN-OBJECT INTERACTION REASONING WITH MULTIMODAL LARGE LANGUAGE MODEL |
| 17315 | TOWARDS ORTHOGRAPHICALLY-INFORMED EVALUATION OF SPEECH RECOGNITION SYSTEMS FOR INDIAN LANGUAGES |
| 10227 | TOWARDS PRACTICAL DIFFERENTIAL PRIVACY FOR DIFFUSION-BASED DATASET DISTILLATION |
| 13224 | TOWARDS PRIVACY-PRESERVING FINE-GRAINED VISUAL CLASSIFICATION VIA HIERARCHICAL LEARNING FROM LABEL PROPORTIONS |
| 14623 | TOWARDS REAL-TIME GENERATIVE SPEECH RESTORATION WITH FLOW-MATCHING |
| 12724 | TOWARDS RELIABLE TIME SERIES FORECASTING UNDER FUTURE UNCERTAINTY: AMBIGUITY AND NOVELTY REJECTION MECHANISMS |
| 1870 | TOWARDS ROBUST CROSS-COMPRESSION DEEPFAKE DETECTION |
| 14174 | TOWARDS ROBUST DYSARTHRIC SPEECH RECOGNITION: LLM-AGENT POST-ASR CORRECTION BEYOND WER |
| 9863 | Towards Robust Visual Continual Learning with Multi-Prototype Supervision |
| 2164 | TOWARDS SELF-EVALUATION OF SYCOPHANTIC HALLUCINATIONS IN MATHEMATICAL REASONING |
| 4483 | TOWARDS SEMANTICALLY FAITHFUL TEXT-TO-TIME SERIES GENERATION VIA AGENTS AND SPECTRAL CONDITIONING |
| 3350 | TOWARDS TRANSFERABLE CROSS-MODAL ADVERSARIAL ATTACKS VIA SEMANTIC CONSISTENCY DISRUPTION |
| 14112 | TPEformer: Temporal Patch Embedding Transformer |
| 4117 | TPFLOW: TOWARDS TOPOLOGICALLY-AWARE MOLECULAR GRAPH GENERATION VIA DISCRETE FLOW MATCHING |
| 5589 | TPP-LLM: TIME SERIES POPULARITY PREDICTION VIA LLM EMPOWERED BY TEXTUAL PROTOTYPE AND PROMPT |
| 14940 | TRACE: A TRIPLET-BASED ROBUSTNESS-AUGMENTED CAUSAL ENCODER FOR CAUSALITY GRAPH EVENT PREDICTION |
| 10400 | TRACE: OPTIMIZING MULTI-HOP QUESTION ANSWERING VIA CONFIDENCE-GUIDED RETRIEVAL ASSIMILATION |
| 15763 | TRACE: TRACKING AND ADDRESSING CROSS-DOMAIN CONFLICT FOR ENHANCED SEMANTIC SEGMENTATION |
| 16590 | Tracking Listener Attention: Gaze-Guided Audio-Visual Speech Enhancement Framework |
| 9113 | TRAFFIC ANOMALY DETECTION VIA DIMENSION-AWARE MULTI-VIEW ALIGNMENT |
| 9862 | TRAFFICGS: SPARSE-VIEW GAUSSIAN SPLATTING FOR DYNAMIC ROADSIDE TRAFFIC SCENE MODELING AND STREAMING |
| 6304 | TRAFFICHTG: REVOLUTIONIZING NETWORK TRAFFIC GENERATION WITH HIERARCHICAL TRANSFORMERS |
| 5884 | TRAFFICMOE: ADAPTIVE MULTI-PERSPECTIVE FEATURE FUSION FOR ENHANCING MALICIOUS TRAFFIC GENERAL DETECTION CAPABILITY |
| 3569 | Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio |
| 13782 | TRAIN2EXPLAIN: TRAINING OPTIMIZATION FOR EXPLANATION IMPROVEMENT |
| 15735 | Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction |
| 11017 | Training Flow Matching Models with Reliable Labels via Self-Purification |
| 13128 | TRAINING QUANTIZED SPIKING NEURAL NETWORKS WITH LOW-BIT GRADIENTS |
| 4570 | TRAINING STUDENTS FOR RESEARCH WITH QUANTUM AI SIMULATION TOOLS |
| 10006 | Training-Free and Interpretable Hateful Video Detection via Multi-stage Adversarial ReaSoning |
| 14452 | TRAINING-FREE FRAMEWORK FOR DEFENDING UNSAFE IMAGE SYNTHESIS ATTACK |
| 16133 | TRAINING-FREE INFERENCE-TIME SCALING FOR AUDIO SOURCE SEPARATION |
| 4364 | Training-Free Layered Framework for Geometry-Aware Multilingual Text Editing |
| 13443 | TRAINING-FREE MULTIMODAL GUIDANCE FOR VIDEO TO AUDIO GENERATION |
| 15693 | Training-Free Prompt Compression via Shallow-Layer Structural-Semantic Fusion |
| 2247 | TRAINING-FREE SIGNAL RECONSTRUCTION UNDER DRFM JAMMING VIA SKEWNESS-ADAPTIVE GATING AND GEOMETRIC REFINEMENT |
| 10479 | TRAINING-FREE TEST-TIME ADAPTATION WITH BROWNIAN DISTANCE COVARIANCE IN VISION-LANGUAGE MODELS |
| 16656 | Trajectory-Enhanced Camera Motion Understanding for Multimodal Large Language Models |
| 14145 | TrajRS: Towards Certified Robustness in Pedestrian Trajectory Prediction |
| 9641 | Transfer Learning for Paediatric Sleep Apnoea Detection Using Physiology-Guided Acoustic Models |
| 14141 | Transfer Learning in Kernel Adaptive Filters with Dynamic Embeddings |
| 16518 | Transferable Adversarial Attacks against Visual Language Models via Staged Semantic Reframing |
| 3024 | TRANSFERABLE AUDIO LOTTERY TICKETS: GRADIENT ACCUMULATION FOR EXTREME SPARSITY |
| 10938 | TransferAnything: Arbitrary Style Transfer via Frequency-Aware Latent Optimization in Diffusion Models |
| 2861 | TRANSFORMER AND LATENT SCALABLE CONTRASTIVE LEARNING FOR GHOST-FREE HIGH DYNAMIC RANGE IMAGING |
| 2059 | TRANSFORMER IMAGE QUALITY ASSESSMENT WITH MULTIMODAL FEATURES FUSION |
| 15824 | TRANSPONDER-ASSISTED DIRECT TRACKING OF TIME-VARYING EMITTERS UNDER EXTREME BLOCKAGE |
| 7016 | TRANSWNET: DUAL-STREAM HIERARCHICAL FEATURE INTEGRATED NETWORK FOR IMAGE FORGERY LOCALIZATION |
| 12980 | Tree Reparameterized Belief Propagation for Gaussian Markov Random Fields |
| 6212 | TriAD: Tri-head with Auxiliary Duplicating Permutation Invariant Training for Multi-Task Sound Event Localization and Detection |
| 5756 | Triage knowledge distillation for speaker verification |
| 7054 | TRIAGE: HIERARCHICAL VISUAL BUDGETING FOR EFFICIENT VIDEO REASONING IN VISION-LANGUAGE MODELS |
| 14667 | TRI-ATTENTION FUSION: JOINT TEMPORAL-SPECTRAL AND BIDIRECTIONAL MODELING FOR SPEECH SPOOFING DETECTION |
| 15538 | TRICON-FAIR: TRIPLET CONTRASTIVE LEARNING FOR MITIGATING SOCIAL BIAS IN PRE-TRAINED LANGUAGE MODELS |
| 7827 | TriFusion: A Self-Supervised Learning Enhanced Dual-Level Multimodal Framework for Traffic Classification |
| 13778 | Tri-Hybrid Beamforming Design for Integrated Sensing and Communications |
| 9967 | TRIM: A SELF-SUPERVISED VIDEO SUMMARIZATION FRAMEWORK MAXIMIZING TEMPORAL RELATIVE INFORMATION AND REPRESENTATIVENESS |
| 3479 | TRINET: A NOVEL AND MEMORY-EFFICIENT TENSOR NETWORK FOR HIGHER-ORDER TENSOR DECOMPOSITION |
| 17111 | TRIVISIONTALK: MANDARIN LIP-TO-SPEECH SYNTHESIS WITH MULTIPLE VISUAL PATTERN INFORMATION AND MULTI-SCALE HYBRID ATTENTION |
| 11705 | TRJSCC: Text-guided ROI-aware Deep Joint Source-Channel Coding |
| 1645 | TRM-UNET: AN EFFICIENT EVENT-GUIDED MOTION DEBLURRING NETWORK |
| 12355 | TRUST YOUR DEMONSTRATIONS: ENHANCING LLM-DRIVEN TEXT STYLE TRANSFER VIA CLUSTER-GUIDED SEMANTIC CONTRASTIVE DECODING |
| 1971 | TRUSTWORTHY AI VIA UNBIASED VALIDATION: FAIR MODEL SELECTION FOR PARKINSON’S DETECTION FROM VOICE |
| 4698 | TRUSTWORTHY AND PRIVACY-PRESERVING PERCEPTUAL HASHING WITH ZERO-KNOWLEDGE PROOFS FOR CLIENT-SIDE CONTENT SCANNING |
| 16293 | TSAD-RAG: Boosting MLLM Time Series Anomaly Detection Via Retrieval-Augmented Generation |
| 4398 | TS-Agent: Reinforcement Learning Empowered LLM Agents for Financial Time Series Forecasting |
| 14948 | TSAR: Scalable Time Series Forecasting Meets Next-Scale Autoregressive Modeling |
| 9632 | TSQLORA: TOWARDS SENSITIVITY AND QUALITY LOW-RANK ADAPTATION FOR EFFICIENT FINE-TUNING |
| 17076 | TTA: TRANSCRIBE, TRANSLATE AND ALIGNMENT FOR CROSS-LINGUAL SPEECH REPRESENTATION |
| 10899 | TTCE: TRACING TIME CYCLES FOR TEMPORAL KNOWLEDGE GRAPH EMBEDDINGS |
| 19008 | TTSOPS: A CLOSED-LOOP CORPUS OPTIMIZATION FRAMEWORK FOR TRAINING MULTI-SPEAKER TTS MODELS FROM DARK DATA |
| 11442 | TUNING-FREE FIDELITY-CONSTRAINED DECODING FOR FAITHFUL LEGAL REASONING WITH OPEN-DOMAIN LARGE LANGUAGE MODELS |
| 18937 | Tuning-Free Online Robust Principal Component Analysis through Implicit Regularization |
| 1736 | TUP: A Transferable Model for Wireless User Positioning with Few-Shot Learning |
| 4686 | TURN THE BLACK-BOX WHITE: INFERRING AGGREGATION RULES IN FEDERATED LEARNING THROUGH MULTI-TRIGGER GEOMETRY-AWARE BACKDOORS |
| 5928 | TURNING DATA HETEROGENEITY INTO A BACKDOOR SHIELD FOR PERSONALIZED FEDERATED LEARNING |
| 3787 | TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles |
| 13529 | TVP-UNET: THRESHOLD VARIANCE PENALTY U-NET FOR VOICE ACTIVITY DETECTION IN DYSARTHRIC SPEECH |
| 3541 | TWO-STAGE ATTENTION TRIPLE ENHANCEMENT AND U-KAN DIFFUSION FOR FEW-SHOT KNOWLEDGE GRAPH COMPLETION |
| 8065 | TWO-STAGE AUDIO-VISUAL TARGET SPEAKER EXTRACTION SYSTEM FOR REAL-TIME PROCESSING ON EDGE DEVICE |
| 12632 | TWO-STAGE CATEGORY-ANCHORED FACTORIZED DISENTANGLEMENT FOR CROSS-DOMAIN RECOMMENDATION |
| 6807 | TWO-STAGE GRID OPTIMIZATION FOR GROUP-WISE QUANTIZATION OF LLMS |
| 17801 | TWO-STAGE LANGUAGE MODEL FRAMEWORK FOR ACOUSTIC ECHO CANCELLATION |
| 5241 | TWO-TIMESCALE CHANNEL ESTIMATION FOR RIS-ASSISTED NEAR-FIELD COMMUNICATION |
| 11506 | UAFD: Unified Adaptive Frequency-Domain Detector for Generalizable Deepfake Detection |
| 3547 | UA-TTRL: Uncertainty-Aware Test-Time Reinforcement Learning |
| 13768 | UAV PATH PLANNING FOR RADIO FREQUENCY SIGNAL LOCALIZATION VIA CRLB-BASED UNCERTAINTY MINIMIZATION |
| 6693 | U-DAVI: UNCERTAINTY-AWARE DIFFUSION-PRIOR-BASED AMORTIZED VARIATIONAL INFERENCE FOR IMAGE RECONSTRUCTION |
| 5832 | UDT: UNSUPERVISED DUAL-PATH TARGET FEATURE REFINEMENT FOR ROBUST SAR AUTOMATIC TARGET RECOGNITION |
| 1746 | UJCODEC: AN END-TO-END UNET-STYLE CODEC FOR JOINT SPEECH COMPRESSION AND ENHANCEMENT |
| 2451 | ULTRALIGHT IPM-DAE: AN ULTRA-LIGHTWEIGHT ECG DENOISING AUTOENCODER VIA PARALLEL MAMBA AND MULTI-SCALE FUSION |
| 15822 | Ultra-Reliable Risk-Aggregated Sum Rate Maximization via Model-Aided Deep Learning |
| 3546 | ULTRASONIC IN-EAR DETECTION FOR EARBUDS |
| 4930 | UMA-SPLIT: UNIMODAL AGGREGATION FOR BOTH ENGLISH AND MANDARIN NON-AUTOREGRESSIVE SPEECH RECOGNITION |
| 19015 | U-MusT: A Unified Framework for Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio |
| 9913 | UMV: A MIXTURE-OF-EXPERTS VISION TRANSFORMER WITH MULTI-SPECTROGRAM FUSION FOR UNDERWATER SHIP NOISE CLASSIFICATION |
| 15549 | UNBOUNDED HAT: AN E2E BOUNDARY-INDEPENDENT AUTOMATIC SYLLABLE STRESS DETECTION WITH HIERARCHICAL ATTENTION BASED TIME-COMPRESSION |
| 5129 | UNCERTAINTY FACTORIZATION WITH LINEAR-TIME SEQUENTIAL MODELING FOR SPEAKER EMBEDDING |
| 10451 | UNCERTAINTY-AWARE 3D EMOTIONAL TALKING FACE SYNTHESIS WITH EMOTION PRIOR DISTILLATION |
| 15458 | Uncertainty-Aware Iterative Graph Reasoning for Document Event Causality Identification |
| 15787 | UNCERTAINTY-AWARE MULTIMODAL ADAPTIVE FUSION WITH MIXTURE-OF-EXPERTS FOR ZERO-SHOT VIDEO OBJECT SEGMENTATION |
| 9623 | Uncertainty-Aware Multi-Scale Feature Fusion with Transformer for Time Series Prediction |
| 1709 | UNCERTAINTY-AWARE PROTOTYPE LEARNING WITH VARIATIONAL INFERENCE FOR FEW-SHOT POINT CLOUD SEGMENTATION |
| 14190 | Uncertainty-Aware Sequence Classification with Probabilistic Selective State-Space Models |
| 16166 | UNCERTAINTY-AWARE WIRELESS LOCALIZATION WITH DIFFUSION MODELS |
| 5022 | UNCERTAINTY-GUIDED DOMAIN AUGMENTATION FOR DOMAIN GENERALIZATION IN SPEAKER VERIFICATION AND ANTI-SPOOFING |
| 6009 | UNCERTAINTY-GUIDED SPLATTING: A DUAL ADAPTIVE OPTIMIZATION FRAMEWORK FOR 3D SCENE RECONSTRUCTION |
| 14341 | Unconditional flow-based time series generation with equivariance-regularised latent spaces |
| 5052 | UNCOVERING PRIVACY RISKS IN TIMEGAN: NOVEL AND EFFECTIVE MEMBERSHIP INFERENCE ATTACKS |
| 15795 | UNDERSTANDING FRECHET SPEECH DISTANCE FOR SYNTHETIC SPEECH QUALITY EVALUATION |
| 2649 | UNDERSTANDING GENERALIZATION IN DECENTRALIZED LEARNING: A TIME-UNIFORM AND TOPOLOGY-AWARE ANALYSIS |
| 15443 | UNDERSTANDING PERSONALITY BASES |
| 11314 | UNDERSTANDING TEXTUAL CAPABILITY DEGRADATION IN SPEECH LLMS VIA PARAMETER IMPORTANCE ANALYSIS |
| 3343 | Understanding the Improvement in Model Quantization |
| 2830 | UNDERSTANDING THE STRENGTHS AND WEAKNESSES OF SSL MODELS FOR AUDIO DEEPFAKE MODEL ATTRIBUTION |
| 1608 | Unfettered Ink: Restoring Legibility and Stylistic Consistency in Immersive Air Handwriting |
| 2284 | UNICAMO: A UNIVERSAL PHYSICAL CAMOUFLAGE FOR MULTISPECTRAL OBJECT DETECTOR |
| 2328 | UniDiff-TTS: Aligner-Free Diffusion Speech Synthesis with Duration Guidance |
| 6838 | UNI-EDIT: CONSISTENT TEXT-DRIVEN EDITING FOR 3D GAUSSIAN SPLATTING |
| 18863 | Unified Analysis of Decentralized Gradient Descent: A Contraction Mapping Framework |
| 17145 | Unified Compression via Adaptive Bits Selection and structural reparameterization |
| 12425 | UNIFIED MODELING OF LAGGED AND SYNCHRONIZED RELATIONS IN MULTIVARIATE TIME SERIES FORECASTING |
| 12770 | UNIFIED MULTIMODAL AND MULTILINGUAL RETRIEVAL VIA MULTI-TASK LEARNING WITH NLU INTEGRATION |
| 18867 | UNIFIED NEURAL BACKDOOR REMOVAL WITH ONLY FEW CLEAN SAMPLES THROUGH UNLEARNING AND RELEARNING |
| 5500 | UNIGEO: A UNIFIED 3D INDOOR OBJECT DETECTION FRAMEWORK INTEGRATING GEOMETRY-AWARE LEARNING AND DYNAMIC CHANNEL GATING |
| 10464 | UNIKGLM: A UNIFIED LLM-DRIVEN MULTI-TASK REASONING FRAMEWORK FOR KNOWLEDGE GRAPH COMPLETION |
| 5971 | UNILORA: A UNIFIED FRAMEWORK FOR EFFICIENT AND SECURE LORA MANAGEMENT IN MULTI-TENANT LLM INFERENCE |
| 13585 | UNIMOCOLA: AN UNCERTAINTY-GUIDED MULTI-MODEL COLLABORATION FRAMEWORK FOR CROSS-LINGUAL NAMED ENTITY RECOGNITION |
| 7808 | UNIPACT: A MULTIMODAL FRAMEWORK FOR PROGNOSTIC QUESTION ANSWERING ON RAW ECG AND STRUCTURED EHR |
| 1238 | UniSTFormer: Unified Spatio-Temporal Lightweight Transformer for Efficient Skeleton-Based Action Recognition |
| 2182 | UNIVERSAL 3D POINT CLOUD ATTACK USING GAUSSIAN DISTRIBUTION MODELING |
| 1775 | UNIVERSAL DENOISING PATTERNS FOR DIFFUSION IMAGE DETECTION |
| 18968 | Universal Vessel Segmentation for Multi-Modality Retinal Images |
| 15913 | UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching |
| 16205 | UNLABELED TARGET-DOMAIN CALIBRATION FOR TABULAR CLASSIFIERS UNDER LABEL SHIFT |
| 12556 | UnlearnShield: Shielding Forgotten Privacy against Unlearning Inversion |
| 16924 | Unleashing the power of global-local synergy for multivariate time series forecasting |
| 4724 | Unleashing Vision Transformer Potential in Image Quality Assessment via Global-Local Adaptive Interaction |
| 12659 | UNLOCKING HIDDEN POTENTIAL IN POINT CLOUD NETWORKS WITH ATTENTION-GUIDED GROUPING-FEATURE COORDINATION |
| 19137 | UNLOCKING OFF-THE-GRID SPARSE RECOVERY WITH UNLIMITED SENSING: SIMULTANEOUS SUPER-RESOLUTION IN TIME AND AMPLITUDE |
| 1989 | UNLOCKING THE POTENTIAL OF SOCIAL MEDIA PREFERENCE FOR ANNOTATION-EFFICIENT LARGE LANGUAGE MODEL ALIGNMENT |
| 11824 | UNMIXX: UNTANGLING HIGHLY CORRELATED SINGING VOICES MIXTURES |
| 6929 | UNPAIRED INCREMENTAL HASHING FOR CROSS-MODAL RETRIEVAL IN NON-STATIONARY ENVIRONMENTS |
| 10174 | UNROLLED GRAPH NEURAL NETWORKS FOR CONSTRAINED OPTIMIZATION |
| 6248 | UNSEEN BUT NOT UNKNOWN: USING DATASET CONCEALMENT TO ROBUSTLY EVALUATE SPEECH QUALITY ESTIMATION MODELS |
| 13844 | UNSUPERVISED ADAPTATION OF AI DOA ESTIMATORS VIA DOWNSTREAM TRACKING |
| 14419 | Unsupervised Discovery and Analysis of the Vocal Repertoires and Patterns of Select Corvid Species |
| 2569 | UNSUPERVISED DOMAIN ADAPTATION WITH CONTRASTIVE LEARNING FOR CROSS-MODALITY AND CROSS-SITE MEDICAL IMAGE SEGMENTATION |
| 10886 | Unsupervised Learning To Hash with A Soft Winner-Take-All Mechanism |
| 1258 | UNSUPERVISED LEXICON LEARNING FROM SPEECH IS LIMITED BY REPRESENTATIONS RATHER THAN CLUSTERING |
| 8182 | UNSUPERVISED PROJECTION VIA CONVEX-HULL RADIUS MINIMIZATION FOR COMPACT CLUSTER REPRESENTATIONS |
| 11949 | UNSUPERVISED SENTENCE STRESS DETECTION IN L2 SPOKEN ENGLISH VIA ITERATIVE ADAPTATION OF WHISPER ASR FRAMEWORK |
| 16282 | Unsupervised TBD-MIG Detectors in Nonhomogeneous Clutter |
| 1600 | Unsupervised UAV Detection from Sparse LiDAR via Temporal Dispersion Signatures |
| 12859 | UNWRAPDIFF: A CONDITIONAL DIFFUSION MODEL FOR INSAR PHASE UNWRAPPING |
| 2269 | UP TO 36X SPEEDUP: MASK-BASED PARALLEL INFERENCE PARADIGM FOR KEY INFORMATION EXTRACTION IN MLLMS |
| 2956 | UP-AF: URBAN PERCEPTION VIA ACTIVE FINETUNING |
| 12579 | UPLINK PERFORMANCE OF MULTIPLE RIS-ASSISTED CELL-FREE MASSIVE MIMO SYSTEMS JOINTLY RIS PHASE SHIFT OPTIMIZATION |
| 2454 | USCTNET: A DEEP UNFOLDING NUCLEAR-NORM OPTIMIZATION SOLVER FOR PHYSICALLY CONSISTENT HSI RECONSTRUCTION |
| 10389 | USER-LEVEL SAFETY ALIGNMENT |
| 9729 | USVexplorer: Robust Detection of Ultrasonic Vocalizations with Cross Species Generalization |
| 1472 | Utilising Gradient-Based Proposals Within Sequential Monte Carlo Samplers for Training of Partial Bayesian Neural Networks |
| 15842 | Utilizing Information Theoretic Approach to Study Cochlear Neural Degeneration |
| 8855 | UTI-LLM: A Personalized Articulatory-Speech Therapy Assistance System Based on Multimodal Large Language Model |
| 2337 | UVT-LM: UNIFYING VISUAL AND TACTILE PERCEPTION WITH LANGUAGE MODEL |
| 16380 | V2A-DPO: OMNI-PREFERENCE OPTIMIZATION FOR VIDEO-TO-AUDIO GENERATION |
| 14787 | V2R2: Hierarchical Dual-View Consistency with Dual-Representations for Network Alignment |
| 15286 | VAE-GENERATED SECOND-ORDER GLOBAL PROTOTYPES FOR HETEROGENEOUS FEDERATED LEARNING |
| 1642 | VARDet: Visual Autoregressive Multi-Scale Prediction and CLIP-Guided Semantics for UAV Small-Object Detection |
| 17546 | VARIABLE METRIC STOCHASTIC LINE-SEARCH FOR PRIMAL-DUAL HYBRID GRADIENT |
| 15594 | variance & greediness: a comparative study of metric-learning losses |
| 14327 | VARIATIONAL BAYESIAN FILTERING USING GAUSSIAN MIXTURES |
| 17262 | Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition |
| 4951 | VARIATIONAL NEAREST NEIGHBOR SIGN LANGUAGE TRANSLATION |
| 17573 | VBX FOR END-TO-END NEURAL AND CLUSTERING-BASED DIARIZATION |
| 5119 | VCE: A ZERO-COST HALLUCINATION MITIGATION METHOD OF LVLMS VIA VISUAL CONTRASTIVE EDITING |
| 14024 | VChangeCodec: An ultra Low-Complexity Neural Speech Codec with Built-in Voice Changer for Customized Real-time Communication |
| 3360 | VDCKAN: A KOLMOGOROV-ARNOLD DRIVEN MODEL FOR VOLUMETRIC DATA COMPRESSION |
| 14602 | Vector Quantization-based Watermarking for Autoregressive Generated Images |
| 6088 | Vector Quantized Intent Contrastive Learning for Sequential Recommendation |
| 9972 | VELOCITY POTENTIAL NEURAL FIELD FOR EFFICIENT AMBISONICS IMPULSE RESPONSE MODELING |
| 18913 | Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance |
| 6115 | VIA SCORE TO PERFORMANCE: EFFICIENT HUMAN-CONTROLLABLE LONG SONG GENERATION WITH BAR-LEVEL SYMBOLIC NOTATION |
| 16350 | VIB2SOUND: SEPARATION OF MULTIMODAL SOUND SOURCES |
| 10490 | Video Hashing via Transformer and KAN for Retrieval |
| 15138 | VIEWLEARNER: GNN-DRIVEN PRE-BUILT VIEWS FOR MULTI-TABLE NL2SQL |
| 14491 | VioPTT: Violin Technique-Aware Transcription from Synthetic Data Augmentation |
| 13870 | VIRTUAL CONSISTENCY FOR AUDIO EDITING |
| 12138 | VISA: Virtual Identity for Secure Face Anonymization |
| 17177 | VISCORTEX: HIERARCHICAL CORTICAL FUSION FOR FMRI IMAGE DECODING |
| 1107 | VISION KAN: TOWARDS AN ATTENTION-FREE BACKBONE FOR VISION WITH KOLMOGOROV-ARNOLD NETWORKS |
| 16366 | VISION MEETS LANGUAGE: ADAPTIVE JOINT PRUNING FOR EFFICIENT MULTIMODAL MODELS |
| 16024 | VISION-ENHANCED TIME SERIES FORECASTING BY DECOMPOSED FEATURE EXTRACTION AND COMPOSED RECONSTRUCTION |
| 9717 | Visual Contrastive Guidance for Improving Generalization of Gaze Estimation |
| 15345 | VISUAL KEYS TO SYMPHONIES: LATENT DIFFUSION FOR MULTI-SCENE VIDEO-TO-MUSIC GENERATION |
| 4083 | VISUAL SALIENCY STEERING DISTILLATION FOR MULTIMODAL CHAIN-OF-THOUGHT REASONING |
| 9614 | VISUAL-AIDED AIRCRAFT ILS DEVIATION ESTIMATION USING RAO-BLACKWELLIZED PARTICLE FILTERS ON LIE GROUPS |
| 18976 | VISUAL-INFORMED SPEECH ENHANCEMENT USING ATTENTION-BASED BEAMFORMING |
| 10179 | VisualPrism: Disperse-and-Focus Token Compression |
| 18172 | VITEX: VISUAL TEXTURE CONTROL FOR MULTI-TRACK SYMBOLIC MUSIC GENERATION VIA DISCRETE DIFFUSION MODELS |
| 2620 | VividTalker: A Modular Framework for Expressive 3D Talking Avatars with Controllable Gaze and Blink |
| 16931 | VIVIDVOICE: A UNIFIED FRAMEWORK FOR SCENE-AWARE VISUALLY-DRIVEN SPEECH SYNTHESIS |
| 3231 | VKT+: ENHANCING VISUAL KNOWLEDGE TRACING VIA NEURAL NETWORK ARCHITECTURE SEARCH |
| 3873 | VKTNet: A Hybrid Visual Kolmogorov-Arnold Transformer Network for Pedestrian Intention and Trajectory Prediction |
| 6421 | VL-ANODIFF: VISION-LANGUAGE GUIDED DIFFUSION FOR FEW-SHOT INDUSTRIAL ANOMALY SYNTHESIS |
| 3551 | VMambaMorph: a 3D Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module |
| 7844 | VMSP: Video-to-Music Generation with Two-Stage Alignment and Synthesis |
| 12139 | VM-UNSSOR: Unsupervised Neural Speech Separation Enhanced by Higher-SNR Virtual Microphone Arrays |
| 8003 | VNODE: A PIECEWISE CONTINUOUS VOLTERRA NEURAL NETWORK |
| 4631 | VOCALNET-M2: ADVANCING LOW-LATENCY SPOKEN LANGUAGE MODELING VIA INTEGRATED MULTI-CODEBOOK TOKENIZATION AND MULTI-TOKEN PREDICTION |
| 17003 | VOICING-GUIDED DECOMPOSITION AND RECOMPOSITION FOR FEW-SHOT KEYWORD-INCREMENTAL LEARNING |
| 13419 | VOROGEOMNET: A GRAPH NEURAL NETWORK BASED ON VORONOI TESSELLATION FOR PROPERTY PREDICTION OF POROUS MATERIAL |
| 16796 | VOTING-BASED PITCH ESTIMATION WITH TEMPORAL AND FREQUENTIAL ALIGNMENT AND CORRELATION AWARE SELECTION |
| 15507 | VoxGuard: Evaluating user and attribute privacy in speech via Membership Inference Attacks |
| 15057 | VOXMORPH: SCALABLE ZERO-SHOT VOICE IDENTITY MORPHING VIA DISENTANGLED EMBEDDINGS |
| 4854 | VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency |
| 18058 | VP-GNN: A UNIFIED GRAPH FRAMEWORK FOR VARIABLE-WISE AND PATCH-WISE MODELING OF IRREGULAR CLINICAL TIME SERIES |
| 9940 | VQEzy: AN OPEN-SOURCE DATASET FOR PARAMETER INITIALIZATION IN VARIATIONAL QUANTUM EIGENSOLVERS |
| 7833 | VSE: VARIATIONAL STATE ESTIMATION OF COMPLEX MODEL-FREE PROCESS |
| 3716 | VSTYLE: A BENCHMARK FOR VOICE STYLE ADAPTATION WITH SPOKEN INSTRUCTIONS |
| 15067 | VT-Heads: Voice Cloning and Talking Head Generation From Text Based on V-DiT |
| 15238 | VTONGuard: Automatic Detection and Authentication of AI-Generated Virtual Try-On Content |
| 16028 | WAM-UNET: A HYBRID U-NET ARCHITECTURE WITH WMAA AND AEMAMBA FOR MEDICAL IMAGE SEGMENTATION |
| 14450 | WARM: WEIGHT ALIGNMENT AND REMAPPING FOR EFFECTIVE NEURAL NETWORK REINITIALIZATION |
| 16523 | WARP QUANTIFICATION ANALYSIS: A FRAMEWORK FOR PATH-BASED SIGNAL ALIGNMENT METRICS |
| 12456 | WATEMP: ACOUSTIC-BASED NON-CONTACT WATER TEMPERATURE MEASUREMENT SYSTEM USING SMARTPHONES |
| 9568 | WaterFlow: Explicit Physics-Prior Rectified Flow for Underwater Saliency Mask Generation |
| 13839 | Watermark Self-Repair Model: Robust Multimodal Watermark Generation via Anomaly-Aware Mask Restoration |
| 15391 | WAV2LEV: PREDICTING LEVENSHTEIN EDIT OPERATION SEQUENCES FOR FINE-GRAINED ESTIMATION OF AUTOMATIC SPEECH RECOGNITION ERROR |
| 13375 | WaveFormer: Cross-modal Fusion with Robust Multi-view Flow Representation for Encrypted Traffic Classification |
| 16080 | WaveFormer: Wavelet-Enhanced Transformer for Multi-Scale Representation Learning in Time Series Forecasting |
| 18918 | WAVEFORMS FOR COMPUTING OVER THE AIR: A GROUNDBREAKING APPROACH THAT REDEFINES DATA AGGREGATION |
| 17027 | WAVELET-AWARE ANOMALY DETECTION IN MULTI-CHANNEL USER LOGS VIA DEVIATION MODULATION AND RESOLUTION-ADAPTIVE ATTENTION |
| 5233 | WAVELET-DRIVEN SPATIAL-FREQUENCY MODULATION NETWORK FOR UNDERWATER IMAGE ENHANCEMENT |
| 18264 | WAVELETGAUSSIAN: WAVELET-DOMAIN DIFFUSION FOR SPARSE-VIEW 3D GAUSSIAN OBJECT RECONSTRUCTION |
| 12070 | WAVENEXT 2: CONVNEXT-BASED FAST NEURAL VOCODERS WITH RESIDUAL DENOISING AND SUB-MODELING FOR GAN AND DIFFUSION MODELS |
| 5326 | WAVE-PCU: WAVELET-BASED POINT CLOUD UPSAMPLING WITH HIERARCHICAL TRANSFORMERS |
| 11645 | WAVESPIKENET: A WAVELET-SPIKING FUSION ARCHITECTURE FOR AUDIO CLASSIFICATION ON EDGE DEVICES |
| 1714 | WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection |
| 14745 | WAVE-TRAINER-FIT: NEURAL VOCODER WITH TRAINABLE PRIOR AND FIXED-POINT ITERATION TOWARDS HIGH-QUALITY SPEECH GENERATION FROM SSL FEATURES |
| 19136 | WavJourney: Compositional Audio Creation With Large Language Models |
| 12540 | WAVLINK: COMPACT AUDIO–TEXT EMBEDDINGS WITH A GLOBAL WHISPER TOKEN |
| 9095 | WEATHER-R1: LOGICALLY CONSISTENT REINFORCEMENT FINE-TUNING FOR MULTIMODAL REASONING IN METEOROLOGY |
| 9511 | Weaving Time into Topics: A Neural-Dynamical Tapestry for Information Diffusion Modeling |
| 3634 | WEBEXPERT: DOMAIN-AWARE WEB AGENTS WITH CRITIC-GUIDED EXPERT EXPERIENCE FOR HIGH-PRECISION SEARCH |
| 16626 | WebRouter: Query-specific Router via Variational Information Bottleneck for Cost-sensitive Web Agent |
| 1430 | WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope |
| 13139 | WENETSPEECH-CHUAN: A LARGE-SCALE SICHUANESE CORPUS WITH RICH ANNOTATION FOR DIALECTAL SPEECH PROCESSING |
| 11920 | WGIP: LOW-LIGHT IMAGE ENHANCEMENT WITH 4D LUT BY WAVELET-GUIDED INTENSITY PRIOR |
| 15124 | WHAT IS THE RISK? EVALUATING THE IMPACT OF KNOWLEDGE DISTILLATION ON LLM VULNERABILITIES |
| 13426 | WHAT THE STUDENT LEARNS IN KNOWLEDGE DISTILLATION: A SUBSPACE VIEW AND EVIDENCE ON CONVOLUTIONAL RECURRENT NETWORK |
| 6058 | What You Feel Is Not What They See: On Predicting Self-Reported Emotion from Third-Party Observer Labels |
| 10848 | WHEN AND HOW LONG DID THERAPY HAPPEN? SOFT-SUPERVISING TEMPORAL LOCALIZATION USING AUDIO-LANGUAGE MODELS |
| 16399 | WHEN AUDIO MATTERS: A LIGHTWEIGHT, HIERARCHICAL FUSION MODEL FOR SPEECH AND NON-VERBAL EMOTION RECOGNITION |
| 19029 | WHEN BAYESIAN TENSOR COMPLETION MEETS MULTIOUTPUT GAUSSIAN PROCESSES: FUNCTIONAL UNIVERSALITY AND RANK LEARNING |
| 11360 | WHEN CHILDREN TALK AND MACHINES LISTEN: TOWARD AN INTERPRETABLE SPEECH-BASED SCREENER FOR DUTCH DEVELOPMENTAL LANGUAGE DISORDER |
| 11869 | When Differential Privacy Meets Wireless Federated Learning: An Improved Analysis for Privacy and Convergence |
| 2606 | WHEN LARGE VISION-LANGUAGE MODELS MEET PERSON RE-IDENTIFICATION |
| 5923 | WHEN MAMBA MEETS KAN: A HYBRID LEARNING NETWORK FOR ELECTRIC VEHICLE CHARGING DEMAND PREDICTION |
| 15731 | When Noise Lowers the Loss: Rethinking Likelihood-Based Evaluation in Music LLMs |
| 15891 | WHEN SIGNALS BEND: CURVATURE-GUIDED SELECTIVE GRAPH REWIRING FOR FEW-SHOT BOT DETECTION |
| 7609 | WHEN SILENCE MATTERS: THE IMPACT OF IRRELEVANT AUDIO ON TEXT REASONING IN LARGE AUDIO-LANGUAGE MODELS |
| 9868 | WHEN THREE HEADS COLLABORATE: ATTENTION-DRIVEN FUSION FOR LONG-TAILED SEMI-SUPERVISED LEARNING |
| 12131 | WHEN VOICE MATTERS: A CONTROLLED STUDY OF AUDIO LLM BEHAVIOR IN CLINICAL DECISION-MAKING |
| 14457 | WHERE, NOT WHAT: COMPELLING VIDEO LLMS TO LEARN GEOMETRIC CAUSALITY FOR 3D-GROUNDING |
| 18131 | Which private attributes do VLMs agree on and predict well? |
| 10877 | Whisper with Benefits: A Unified Approach to Speech and Speaker Attribute Recognition |
| 13748 | WHISPER: COURTSIDE EDITION - ENHANCING ASR PERFORMANCE THROUGH LLM-DRIVEN CONTEXT GENERATION |
| 9917 | WHISPER-FEST: SINGLE-CHANNEL FAR-FIELD ENHANCED SPEECH-TO-TEXT WITHOUT PARALLEL DATA |
| 6489 | WHISPER-MLA: REDUCING GPU MEMORY CONSUMPTION OF ASR MODELS BASED ON MHA2MLA CONVERSION |
| 1178 | WHISPER-QF: LEVERAGING DUAL CROSS-ATTENTION Q-FORMER FOR SPEECH EMOTION RECOGNITION WITH MULTI-TASK LEARNING |
| 12711 | Whitening Spherical Gaussian Mixtures in the Large-Dimensional Regime |
| 13362 | WHO'S RELATED? FAST AND ACCURATE FAMILY RELATIONSHIP DETECTION IN CONVERSATIONS |
| 1528 | WHY DELETE? JUST MAKE IT NATURAL. MAXIMUM ENTROPY DISTRIBUTION DISTILLATION FOR LARGE LANGUAGE MODELS UNLEARNING |
| 4839 | WHY DO SPEECH LANGUAGE MODELS FAIL TO GENERATE SEMANTICALLY COHERENT OUTPUTS? A MODALITY EVOLVING PERSPECTIVE |
| 3672 | WHY TEMPORAL MODELING MODULES FALL SHORT IN TEMPORALLY SENSITIVE VIDEO-TEXT RETRIEVAL TASKS |
| 11715 | WICON: A LIGHTWEIGHT CONTINUAL LEARNING APPROACH FOR WIFI-BASED HUMAN ACTIVITY RECOGNITION VIA MASK-ADAPTIVE CLASSIFIER EXPANSION |
| 14374 | WIDEBAND DIRECTION-OF-ARRIVAL ESTIMATION THROUGH BLIND SPARSE LEAST SQUARE REGRESSION |
| 18865 | WIDEBAND DOA ESTIMATION BASED ON STOCHASTIC MAXIMUM LIKELIHOOD ESTIMATION WITH FLAT SPECTRA ASSUMPTION |
| 18889 | WIDEBAND RELATIVE TRANSFER FUNCTION (RTF) ESTIMATION EXPLOITING FREQUENCY CORRELATIONS |
| 2957 | WIDTH-ENHANCED FINE-TUNING FOR LONG-TAILED LEARNING |
| 11066 | WiFi-based Multi-user Activity Recognition via Id-Activity Decoupling |
| 2028 | WiFi-GEN: High-Resolution Indoor Imaging from WiFi Signals Using Generative AI |
| 17464 | WIFISIM: SIMULATING WIFI PROBE REQUESTS VIA AOSP ANALYSIS AND DEVICE BEHAVIOR MODELING |
| 15217 | WINDMOE: MIXTURE-OF-EXPERTS METHOD FOR WIND POWER FORECASTING UNDER EXTREME WEATHER CONDITIONS |
| 12306 | WINDOWED SUMMARYMIXING: AN EFFICIENT FINE-TUNING OF SELF-SUPERVISED LEARNING MODELS FOR LOW-RESOURCE SPEECH RECOGNITION |
| 10123 | WIRAG: RETRIEVAL-AUGMENTED GENERATION WITH LARGE LANGUAGE MODELS (LLM) FRAMEWORK FOR WIFI-BASED HUMAN ACTIVITY RECOGNITION |
| 12968 | WMOE-CLIP: WAVELET-ENHANCED MIXTURE-OF-EXPERTS PROMPT LEARNING FOR ZERO-SHOT ANOMALY DETECTION |
| 5503 | WPGST: WAVELET POOLING GROUP SWIN TRANSFORMER FOR SUPERPIXEL SEGMENTATION |
| 14385 | WRAPPER-AWARE RATE DISTORTION OPTIMIZATION IN FEATURE CODING FOR MACHINES |
| 13514 | WTRSS: UNLEASHING THE POWER OF WAVELET TRANSFORM IN RADAR SEMANTIC SEGMENTATION |
| 7578 | XAI-PRUNER: EXPLAINABILITY-DRIVEN PRUNING OF CNN AND TRANSFORMER |
| 3241 | Xi+: Uncertainty Supervision for Robust Speaker Embedding |
| 18903 | XLSR-MAMBA: A DUAL-COLUMN BIDIRECTIONAL STATE SPACE MODEL FOR SPOOFING ATTACK DETECTION |
| 6099 | XMix: Combating Extremely Noisy Labels via Local Smoothness in Self-Supervised Feature Space |
| 19043 | XPPG-PCA: Reference-Free Automatic Speech Severity Evaluation With Principal Components |
| 10047 | YOIO: Fast and Reliable Optical Flow Estimation Using Accurate and Holistic References |
| 4222 | ZEN-DARTS: MITIGATING PERFORMANCE COLLAPSE WITH SYNFLOW METRIC REGULARIZATION AND IMPROVED ARCHITECTURE PARAMETER INITIALIZATION |
| 7755 | Zero-Shot VISUAL GROUNDING in 3D Gaussians via View Retrieval |
| 15082 | ZIFORMER: TIME EFFICIENT SO(3)-EQUIVARIANT GRAPH NEURAL NETWORK FOR MOLECULAR SYSTEMS |
| 7836 | Zip Your Data: Length-Adaptive Visual Token Optimization for Efficient Multi-Modal Training |
| 9825 | ZIV-ZAKAI BOUND FOR DISTRIBUTED-ARRAY-BASED DOA ESTIMATION |
| 10062 | ZK-VSA: ZERO-KNOWLEDGE VERIFIABLE SPEAKER ANONYMIZATION LEVERAGING PHASE VOCODER WITH TIME-SCALE MODIFICATION |
| 14713 | Z-SCORES: A METRIC FOR LINGUISTICALLY ASSESSING DISFLUENCY REMOVAL |
| 16932 | ZSDA-ICM: Zero-Shot Domain Adaptation of Image Compression for Machines in Diverse Scenes |
| 8203 | ZSV2C-MLLM: Zero-Shot Visual Voice Cloning via Multimodal Large Language Models |
| 6021 | β-AVSDNET: A NOVEL END-TO-END NEURAL NETWORK ARCHITECTURE FOR AUDIO-VISUAL SPEAKER DIARIZATION |