Conversation
iced depresso (icedquinn@blob.cat)'s status on Sunday, 18-Feb-2024 16:08:59 JST: huh. neat.
reading this paper on sparse neural network training. they mention trying to track which parts of each layer tend to get used and which don't.
this is what an old Ukrainian scientist did when he made something called GMDH. he didn't build a huge mesh and mask it out; instead the method grows the network layer by layer, prunes it, and continues.
i discovered this because some obscure Mac developer uses this method in software he sells to financial forecasting professionals.
iced depresso (icedquinn@blob.cat)'s status on Sunday, 18-Feb-2024 16:11:11 JST: it basically takes all the input parameters and adds a whole layer of neurons. then it runs that network through your standard gradient descent stuff. it then does a 'top k' pass, so only the top k neurons (basically the ones with the lowest error on held-out data) are kept and the rest are trashed. then it repeats the process, until some stop condition.
wasn't thinking about it much, but that's probably pretty close to what this paper is doing, just approached differently, in a way that doesn't require changing how pytorch works.
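a minimal sketch of that grow/train/prune loop, just to make it concrete. this is not the paper's method and not Ivakhnenko's actual GMDH (which classically uses pairwise polynomial units): it assumes plain numpy, linear least-squares candidate neurons over random pairs of the current features, and k, n_candidates, and max_layers are illustrative guesses.

import numpy as np

rng = np.random.default_rng(0)

def fit_candidate(X_tr, y_tr, X_val, y_val):
    # fit one candidate neuron as a linear least-squares unit and
    # score it by mean squared error on held-out data
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    out = lambda X: X @ w
    return np.mean((out(X_val) - y_val) ** 2), out

def grow_and_prune(X_tr, y_tr, X_val, y_val, k=4, n_candidates=16, max_layers=5):
    feats_tr, feats_val = X_tr, X_val
    best_err = np.inf
    for _ in range(max_layers):
        # grow: build a whole layer of candidate neurons, each seeing a
        # random pair of the current features
        candidates = []
        for _ in range(n_candidates):
            cols = rng.choice(feats_tr.shape[1],
                              size=min(2, feats_tr.shape[1]), replace=False)
            err, out = fit_candidate(feats_tr[:, cols], y_tr,
                                     feats_val[:, cols], y_val)
            candidates.append((err, cols, out))
        # prune: keep only the top-k candidates (lowest held-out error)
        candidates.sort(key=lambda c: c[0])
        kept = candidates[:k]
        if kept[0][0] >= best_err:
            break  # stop condition: the new layer didn't improve anything
        best_err = kept[0][0]
        # the surviving neurons' outputs become the next layer's inputs
        feats_tr = np.column_stack([out(feats_tr[:, cols]) for _, cols, out in kept])
        feats_val = np.column_stack([out(feats_val[:, cols]) for _, cols, out in kept])
    return best_err

usage would look something like this on a toy regression problem:

X = rng.normal(size=(200, 6))
y = X[:, 0] * 2.0 - X[:, 3] + rng.normal(scale=0.1, size=200)
print(grow_and_prune(X[:150], y[:150], X[150:], y[150:]))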
iced depresso (icedquinn@blob.cat)'s status on Sunday, 18-Feb-2024 16:12:20 JST: @Vo yeah, it's fascinating: it's a sparse deep learning model from ages past. i think he invented it in the 90s or something and it was long forgotten.
one of those things that reminds you Slavs are very smart actually
Vo (vo@noauthority.social)'s status on Sunday, 18-Feb-2024 16:12:21 JST: @icedquinn >Pruning: GME...