MLSP-P40.7: HILO: HIERARCHICAL FEATURE FUSION VIA LOCAL-GLOBAL ATTENTION FOR MULTIMODAL EMBEDDINGS
Xinmeng Zuo, School of Computer Science and Technology, Xi’an Jiaotong University / Institute of Artificial Intelligence (TeleAI), China Telecom, China; Jiang Liu, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University / Institute of Artificial Intelligence (TeleAI), China Telecom, China; Tieliang Gong, School of Computer Science and Technology, Xi’an Jiaotong University, China; Zhongjiang He, Institute of Artificial Intelligence (TeleAI), China Telecom, China; Weizhan Zhang, School of Computer Science and Technology, Xi’an Jiaotong University, China; Jingfeng Chen, Carnegie Mellon University, China; Hao Sun, Institute of Artificial Intelligence (TeleAI), China Telecom, China