FR4.R7.1

DIST-CURE: A Robust Distributed Learning Algorithm with Cubic Regularized Newton

Avishek Ghosh, Indian Institute of Technology, Bombay, India; Raj Kumar Maity, University of Massachusetts Amherst, United States; Arya Mazumdar, University of California, San Diego, United States

Session:
Distributed Learning

Track:
15: Distributed and Federated Learning

Location:
VIP

Presentation Time:
Fri, 12 Jul, 16:25 - 16:45

Session Chair:
Randall Berry, Northwestern University

Abstract
The problem of saddle-point avoidance for non-convex optimization is quite challenging in large-scale distributed learning frameworks. The celebrated cubic-regularized Newton method of Nesterov and Polyak \cite{nest} is one of the most elegant algorithms for avoiding saddle points in the standard centralized (non-distributed) setup. In this paper, we analyze the cubic-regularized Newton method in the distributed framework and simultaneously address several practical challenges that naturally arise, such as communication bottlenecks and Byzantine attacks. To that end, we propose the DISTributed CUbic REgularized Newton's method (\texttt{DIST-CURE}) and obtain convergence guarantees under several settings. We emphasize that saddle-point avoidance becomes even more crucial in the presence of Byzantine machines, since rogue machines may create \emph{fake local minima} near the saddle points of the loss function (this is known as the saddle-point attack). As a second-order method, \texttt{DIST-CURE} has much lower iteration complexity than its first-order counterparts, and its messages can be further compressed to achieve communication efficiency. To address the challenge of Byzantine resilience, we employ norm-based thresholding on the local solutions. We validate the performance of \texttt{DIST-CURE} with experiments using standard datasets and several types of Byzantine attacks, and obtain a $25\%$ improvement in total iteration complexity over first-order methods.
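For concreteness, the core update that \texttt{DIST-CURE} builds on is the cubic-regularized Newton step of Nesterov and Polyak \cite{nest}. A minimal sketch of that step, in standard notation not taken from the paper itself ($f$ is the loss, $x_t$ the current iterate, and $M > 0$ a cubic-regularization parameter tied to the Lipschitz constant of the Hessian):

$$x_{t+1} \in \operatorname*{arg\,min}_{y \in \mathbb{R}^d} \; \langle \nabla f(x_t),\, y - x_t \rangle + \frac{1}{2} \langle \nabla^2 f(x_t)\,(y - x_t),\, y - x_t \rangle + \frac{M}{6}\, \|y - x_t\|^3 .$$

When $M$ dominates the Hessian's Lipschitz constant, this cubic model is a global upper bound on $f$, which is what allows the method to make progress past strict saddle points. In the distributed setting described in the abstract, each worker solves such a subproblem on its local data, and the center aggregates the local solutions after norm-based thresholding, discarding those of abnormally large norm so that Byzantine updates are filtered out.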