Dr. Chihao Zhang (张驰浩): Towards Understanding the Terminal Phase of Training of Deep Neural Networks

Title:
Towards Understanding the Terminal Phase of Training of Deep Neural Networks
Speaker:
Dr. Chihao Zhang (张驰浩), The University of Tokyo, Japan
Inviter: Research Professor Shihua Zhang (张世华)
Time & Venue:

2021.10.28 8:00 N625

Abstract:

Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where the training error first vanishes. During TPT, the training error stays effectively zero while the training loss is pushed towards zero. Vardan Papyan et al. characterize the TPT through Neural Collapse (NC), which involves four deeply interconnected phenomena:

(NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class-means;
(NC2) The class-means collapse to the vertices of a Simplex Equiangular Tight Frame (ETF);
(NC3) Up to rescaling, the last-layer classifiers collapse to the class-means, in other words to the Simplex ETF, i.e. to a self-dual configuration;
(NC4) For a given activation, the classifier's decision collapses to simply choosing whichever class has the closest train class-mean, i.e. the Nearest Class-Center (NCC) decision rule.

However, the NC described by Vardan Papyan et al. concerns only the last layer of deepnets, and the behavior of their intermediate layers remains unclear. In this talk, I will briefly introduce the NC phenomena and discuss future directions towards understanding the TPT of deepnets by investigating the behavior of the intermediate layers.
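Three of the NC properties above (NC1, NC2, NC4) can be checked numerically on a set of last-layer activations. The following is a minimal pure-Python sketch under stated assumptions, not the speaker's or Papyan et al.'s code. Activations are given as lists of floats grouped by class label, and classes are balanced. NC1 is measured by a simplified ratio of within-class to between-class squared spread rather than the pseudo-inverse-based statistic reported by Papyan et al. All function names (`nc1_ratio`, `simplex_etf`, `etf_deviation`, `ncc_predict`) are hypothetical. NC3 is omitted because it also requires the classifier weights.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    d = sub(a, b)
    return dot(d, d)


def class_means(features: Dict[int, List[Vector]]) -> Dict[int, Vector]:
    """Per-class mean of the last-layer activations."""
    return {c: mean(hs) for c, hs in features.items()}


def nc1_ratio(features: Dict[int, List[Vector]]) -> float:
    """Simplified NC1 proxy (an assumption, not Papyan et al.'s exact metric):
    mean within-class squared spread / mean between-class squared spread.
    It shrinks toward 0 as activations collapse onto their class-means."""
    mus = class_means(features)
    global_mu = mean(list(mus.values()))  # equals the global mean for balanced classes
    n = sum(len(hs) for hs in features.values())
    within = sum(sq_dist(h, mus[c]) for c, hs in features.items() for h in hs) / n
    between = sum(sq_dist(mu, global_mu) for mu in mus.values()) / len(mus)
    return within / between


def simplex_etf(num_classes: int) -> List[Vector]:
    """Columns of sqrt(C/(C-1)) * (I - 11^T / C): C unit vectors in R^C (needs C >= 2)."""
    C = num_classes
    scale = math.sqrt(C / (C - 1))
    return [[scale * ((1.0 if i == k else 0.0) - 1.0 / C) for i in range(C)] for k in range(C)]


def etf_deviation(mus: Dict[int, Vector]) -> Tuple[float, float]:
    """NC2 check on globally centered class-means. Returns
    (relative spread of their norms, worst |cos + 1/(C-1)| over all pairs).
    Both values are 0 for an exact simplex ETF."""
    C = len(mus)
    global_mu = mean(list(mus.values()))
    centered = [sub(mu, global_mu) for mu in mus.values()]
    norms = [math.sqrt(dot(v, v)) for v in centered]
    norm_spread = (max(norms) - min(norms)) / (sum(norms) / C)
    worst = max(
        abs(dot(centered[i], centered[j]) / (norms[i] * norms[j]) + 1.0 / (C - 1))
        for i in range(C)
        for j in range(i + 1, C)
    )
    return norm_spread, worst


def ncc_predict(h: Vector, mus: Dict[int, Vector]) -> int:
    """NC4: Nearest Class-Center rule, i.e. the class whose train class-mean is closest to h."""
    return min(mus, key=lambda c: sq_dist(h, mus[c]))


if __name__ == "__main__":
    C = 3
    etf = simplex_etf(C)
    # Hypothetical "collapsed" activations: two samples per class, each an
    # ETF vertex shifted by a tiny constant offset along the all-ones direction.
    features = {c: [[x + d for x in etf[c]] for d in (-0.01, 0.01)] for c in range(C)}
    mus = class_means(features)
    print("NC1 proxy:", nc1_ratio(features))
    print("NC2 deviation:", etf_deviation(mus))
    print("NCC prediction:", ncc_predict([x + 0.05 for x in etf[1]], mus))
```

On this synthetic, nearly collapsed data, the NC1 proxy is about 3e-4 and both NC2 deviations are essentially zero. The NCC rule assigns the perturbed vertex to class 1. On real networks, one would track these quantities across epochs and expect them to decrease during TPT.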


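Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Address: Floors 6-7, Siyuan Building; Floors 5-6 and 8, South Building; No. 55 Zhongguancun East Road, Haidian District, Beijing 100190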