Online Learning Under a Separable Stochastic Approximation Framework

Authors:

Gan, Min ^[1] | Su, Xiang-xiang ^[2] | Chen, Guang-yong ^[3] (Scholars: 陈光永) | Chen, Jing ^[4] | Chen, C. L. Philip ^[5]

Indexed by:

EI, Scopus, SCIE

Abstract:

We propose an online learning algorithm tailored for a class of machine learning models within a separable stochastic approximation framework. The central idea of our approach is to exploit the inherent separability in many models, recognizing that certain parameters are easier to optimize than others. This paper focuses on models where some parameters exhibit linear characteristics, which are common in machine learning applications. In our proposed algorithm, the linear parameters are updated using the recursive least squares (RLS) algorithm, akin to a stochastic Newton method. Subsequently, based on these updated linear parameters, the nonlinear parameters are adjusted using the stochastic gradient method (SGD). This dual-update mechanism can be viewed as a stochastic approximation variant of block coordinate gradient descent, where one subset of parameters is optimized using a second-order method while the other is handled with a first-order approach. We establish the global convergence of our online algorithm for non-convex cases in terms of the expected violation of first-order optimality conditions. Numerical experiments demonstrate that our method achieves significantly faster initial convergence and produces more robust performance compared to other popular learning algorithms. Additionally, our algorithm exhibits reduced sensitivity to learning rates and outperforms the recently proposed slimTrain algorithm (Newman et al., 2022). For validation, the code has been made available on GitHub.
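The dual update described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. It assumes a hypothetical one-hidden-layer model `y_hat = w @ tanh(V @ x)`, where `w` plays the role of the linear parameters and `V` the nonlinear ones. The function names, the squared-error loss, the forgetting factor `lam`, the learning rate `lr`, the initial covariance `10 * I`, and the synthetic teacher data are all illustrative assumptions.

```python
import numpy as np

def rls_sgd_step(w, P, V, x, y, lr=0.01, lam=1.0):
    """One online update for the assumed model y_hat = w @ tanh(V @ x)."""
    phi = np.tanh(V @ x)                      # features seen by the linear part
    # RLS update of the linear weights w (lam is the forgetting factor).
    P_phi = P @ phi
    gain = P_phi / (lam + phi @ P_phi)
    w = w + gain * (y - w @ phi)
    P = (P - np.outer(gain, P_phi)) / lam     # P stays symmetric
    # SGD update of the nonlinear weights V, using the updated w.
    err = w @ phi - y
    grad_V = np.outer(err * w * (1.0 - phi ** 2), x)  # grad of 0.5 * err**2
    V = V - lr * grad_V
    return w, P, V

def run_demo(steps=3000, hidden=4, dim=3, seed=0):
    """Stream synthetic samples from a random teacher (an assumption)."""
    rng = np.random.default_rng(seed)
    V_true = rng.normal(size=(hidden, dim))
    w_true = rng.normal(size=hidden)
    V = rng.normal(size=(hidden, dim))        # random init of nonlinear part
    w = np.zeros(hidden)                      # linear part starts at zero
    P = 10.0 * np.eye(hidden)                 # initial RLS covariance
    sq_errors = np.empty(steps)
    for t in range(steps):
        x = rng.uniform(-1.0, 1.0, size=dim)
        y = w_true @ np.tanh(V_true @ x) + 0.01 * rng.normal()
        sq_errors[t] = (y - w @ np.tanh(V @ x)) ** 2  # error before updating
        w, P, V = rls_sgd_step(w, P, V, x, y)
    # Mean squared prediction error early versus late in the stream.
    return sq_errors[:50].mean(), sq_errors[-500:].mean()

if __name__ == "__main__":
    early, late = run_demo()
    print(f"early MSE: {early:.4f}, late MSE: {late:.4f}")
```

Each sample triggers one second-order (RLS) step on `w` followed by one first-order (SGD) step on `V`, which mirrors the block-coordinate view given in the abstract. The convergence guarantees and experimental settings of the paper are not reproduced here.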

Keywords:

Approximation algorithms; Artificial neural networks; Convergence; Convex functions; Machine learning; Machine learning algorithms; Minimization; Online learning; Optimization; recursive least squares; stochastic approximation; Stochastic processes; Training; variable projection

Affiliations:

[1] [Gan, Min] Qingdao Univ, Inst Future, Qingdao 266071, Peoples R China
[2] [Gan, Min] Coll Comp Sci & Technol, Qingdao 266071, Peoples R China
[3] [Su, Xiang-xiang] Fuzhou Univ, Coll Comp & Data Sci, Fuzhou 350108, Peoples R China
[4] [Chen, Guang-yong] Fuzhou Univ, Coll Comp & Data Sci, Fuzhou 350108, Peoples R China
[5] [Su, Xiang-xiang] Univ Fujian, Fujian Key Lab Network Comp & Intelligent Informat, Key Lab Intelligent Metro, Fuzhou 350007, Peoples R China
[6] [Chen, Guang-yong] Univ Fujian, Fujian Key Lab Network Comp & Intelligent Informat, Key Lab Intelligent Metro, Fuzhou 350007, Peoples R China
[7] [Su, Xiang-xiang] Minist Educ, Engn Res Ctr Big Data Intelligence, Beijing 100101, Peoples R China
[8] [Chen, Guang-yong] Minist Educ, Engn Res Ctr Big Data Intelligence, Beijing 100101, Peoples R China
[9] [Chen, Jing] Jiangnan Univ, Sch Sci, Wuxi 214122, Peoples R China
[10] [Chen, C. L. Philip] Qingdao Univ, Coll Comp Sci & Technol, Qingdao 266071, Peoples R China
[11] [Chen, C. L. Philip] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Peoples R China