TP-V2.V13.7
CHINESE MANDARIN LIPREADING USING CASCADED TRANSFORMERS WITH MULTIPLE INTERMEDIATE REPRESENTATIONS
Xinghua Ma, Shilin Wang, Shanghai Jiao Tong University, China
Session:
Image & Video Interpretation and Understanding
Track:
Image and Video Analysis, Synthesis, and Retrieval
Location:
Gather.Town 13
Presentation Time:
Tue, 4 Oct, 22:00 - 23:00 China Standard Time (UTC +8)
Tue, 4 Oct, 16:00 - 17:00 Central European Time (UTC +2)
Tue, 4 Oct, 14:00 - 15:00 UTC
Tue, 4 Oct, 10:00 - 11:00 Eastern Time (UTC -4)
Session Co-Chairs:
Jean-Christophe Pesquet, CentraleSupélec; Andrea Cavallaro, Queen Mary University of London; Rebecca Willett, University of Chicago
Session TP-V2.V13
TP-V2.V13.1: SCINET: SEMANTIC CUE INFUSION NETWORK FOR LANE DETECTION
Hao Yang, Xiamen University, China; Shuyuan Lin, Jinan University, China; Lin Cheng, Yang Lu, Hanzi Wang, Xiamen University, China
TP-V2.V13.2: GCN-BASED MULTI-MODAL MULTI-LABEL ATTRIBUTE CLASSIFICATION IN ANIME ILLUSTRATION USING DOMAIN-SPECIFIC SEMANTIC FEATURES
Ziwen Lan, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama, Hokkaido University, Japan
TP-V2.V13.3: CSTNet: Enhancing Global-to-Local Interactions for Image Captioning
Xin Yang, Ying Wang, Haishun Chen, Jie Li, Xidian University, China
TP-V2.V13.4: ATTRIBUTE CONDITIONED FASHION IMAGE CAPTIONING
Chen Cai, Kim-Hui Yap, Suchen Wang, Nanyang Technological University, Singapore
TP-V2.V13.5: THE BRIO-TA DATASET: UNDERSTANDING ANOMALOUS ASSEMBLY PROCESS IN MANUFACTURING
Kosuke Moriwaki, Gaku Nakano, Tetsuo Inoshita, NEC Corporation, Japan
TP-V2.V13.6: CONTEXT RELATION FUSION MODEL FOR VISUAL QUESTION ANSWERING
Haotian Zhang, Wei Wu, Inner Mongolia University, China
TP-V2.V13.7: CHINESE MANDARIN LIPREADING USING CASCADED TRANSFORMERS WITH MULTIPLE INTERMEDIATE REPRESENTATIONS
Xinghua Ma, Shilin Wang, Shanghai Jiao Tong University, China
TP-V2.V13.8: Coupling Attention and Convolution for Heuristic Network in Visual Dialog
Zefan Zhang, Tianling Jiang, Chunping Liu, Yi Ji, Soochow University, China
TP-V2.V13.9: DETECTION-IDENTIFICATION BALANCING MARGIN LOSS FOR ONE-STAGE MULTI-OBJECT TRACKING
Heansung Lee, Suhwan Cho, Sungjun Jang, Jungho Lee, Sungmin Woo, Sangyoun Lee, Yonsei University, Korea, Republic of
TP-V2.V13.10: IMPROVING ROBUSTNESS TO OUT-OF-DISTRIBUTION DATA BY FREQUENCY-BASED AUGMENTATION
Koki Mukai, Soichiro Kumano, Toshihiko Yamasaki, The University of Tokyo, Japan
TP-V2.V13.11: Adversarial Training of Anti-Distilled Neural Network with Semantic Regulation of Class Confidence
Zi Wang, Chengcheng Li, Husheng Li, University of Tennessee, Knoxville, United States of America
TP-V2.V13.12: Polygon-free: Unconstrained Scene Text Detection with Box Annotations
Weijia Wu, Hong Zhou, Zhejiang University, China; Enze Xie, Ping Luo, The University of Hong Kong, China; Ruimao Zhang, The Chinese University of Hong Kong, Shenzhen, China; Wenhai Wang, Shanghai Artificial Intelligence Laboratory, China
TP-V2.V13.13: ADDING NON-LINEAR CONTEXT TO DEEP NETWORKS
Michele Covell, David Marwood, Shumeet Baluja, Google, Inc, United States of America
TP-V2.V13.14: Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks
Motonari Kambara, Komei Sugiura, Keio University, Japan
TP-V2.V13.15: NON-ITERATIVE OPTIMIZATION OF PSEUDO-LABELING THRESHOLDS FOR TRAINING OBJECT DETECTION MODELS FROM MULTIPLE DATASETS
Yuki Tanaka, Shuhei Yoshida, Makoto Terao, NEC Corporation, Japan