MMSP-P2.8

EMOTALKER: EMOTIONALLY EDITABLE TALKING FACE GENERATION VIA DIFFUSION MODEL

Bingyuan Zhang, Xulong Zhang, Ning Cheng, Ping An Technology (Shenzhen) Co., Ltd., China; Jun Yu, University of Science and Technology of China, Hefei, China; Jing Xiao, Jianzong Wang, Ping An Technology (Shenzhen) Co., Ltd., China

Session:
MMSP-P2: Multimedia Generation and Synthesis Poster

Track:
Multimedia Signal Processing

Location:
Poster Zone 5A
Poster Board PZ-5A.8

Presentation Time:
Tue, 16 Apr, 16:30 - 18:30 (UTC +9)

Session Chair:
Jongyoo Kim, Yonsei University, South Korea
Session MMSP-P2
MMSP-P2.1: ENHANCING REALISM IN 3D FACIAL ANIMATION USING CONFORMER-BASED GENERATION AND AUTOMATED POST-PROCESSING
Yi Zhao, Chunyu Qiang, Hao Li, Kuaishou, China; Yulan Hu, Renmin University of China, China; Wangjin Zhou, Kyoto University, Japan; Sheng Li, NICT, Japan
MMSP-P2.2: SimFall: A Data Generator For RF-based Fall Detection
Jiamu Li, Dongheng Zhang, Qi Chen, Yadong Li, Jianyang Wang, Wenxuan Li, Yang Hu, Qibin Sun, Yan Chen, University of Science and Technology of China, China
MMSP-P2.3: TALKING FACE GENERATION FOR IMPRESSION CONVERSION CONSIDERING SPEECH SEMANTICS
Saki Mizuno, Nobukatsu Hojo, Kazutoshi Shinoda, Keita Suzuki, Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Naotaka Kawata, Satoshi Kobashikawa, Ryo Masumura, NTT Corporation, Japan
MMSP-P2.4: Visually Guided Binaural Audio Generation with Cross-modal Consistency
Miao Liu, Jing Wang, Beijing Institute of Technology, China; Xinyuan Qian, University of Science and Technology Beijing, China; Xiang Xie, Beijing Institute of Technology, China
MMSP-P2.5: BINAURALMUSIC: A DIVERSE DATASET FOR IMPROVING CROSS-MODAL BINAURAL AUDIO GENERATION
Yunqi Li, School of Data Science and Intelligent Media, Communication University of China, Beijing 100024, China; Shulin Liu, School of Information and Communication Engineering, Communication University of China, Beijing 100024, China; Haonan Cheng, Long Ye, State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
MMSP-P2.6: ENHANCING SPATIAL AUDIO GENERATION WITH SOURCE SEPARATION AND CHANNEL PANNING LOSS
Wootaek Lim, KAIST, ETRI, Republic of Korea; Juhan Nam, KAIST, Republic of Korea
MMSP-P2.7: TEXT-DRIVEN TALKING FACE SYNTHESIS BY REPROGRAMMING AUDIO-DRIVEN MODELS
Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro, Korea Advanced Institute of Science and Technology, Republic of Korea
MMSP-P2.8: EMOTALKER: EMOTIONALLY EDITABLE TALKING FACE GENERATION VIA DIFFUSION MODEL
Bingyuan Zhang, Xulong Zhang, Ning Cheng, Ping An Technology (Shenzhen) Co., Ltd., China; Jun Yu, University of Science and Technology of China, Hefei, China; Jing Xiao, Jianzong Wang, Ping An Technology (Shenzhen) Co., Ltd., China
MMSP-P2.9: SPEECH-DRIVEN EMOTIONAL 3D TALKING FACE ANIMATION USING EMOTIONAL EMBEDDINGS
Seongmin Lee, Jeonghaeng Lee, Hyewon Song, Sanghoon Lee, Yonsei University, Republic of Korea
MMSP-P2.10: PHISANET: PHONETICALLY INFORMED SPEECH ANIMATION NETWORK
Salvador Medina, Sarah Taylor, Carsten Stoll, Epic Games, United States of America; Gareth Edwards, Cubic Motion, United Kingdom of Great Britain and Northern Ireland; Alex Hauptmann, Shinji Watanabe, Carnegie Mellon University, United States of America; Iain Matthews, Epic Games, United States of America
MMSP-P2.11: ENHANCING IMAGE-TEXT MATCHING WITH ADAPTIVE FEATURE AGGREGATION
Zuhui Wang, Yunting Yin, I.V. Ramakrishnan, State University of New York at Stony Brook, United States of America