Conversation
iced depresso (icedquinn@blob.cat)'s status on Wednesday, 18-Sep-2024 15:33:40 JST iced depresso
i think numenta got close to an agi and then got very quiet. either that or the budget ran out, idk. they were very open about their cortical research, with one of the last papers being about training a model to recognize objects by touch (which is basically a form of hard attention), and then they suddenly clammed up and now just talk about selling standard ML services.
:neocat_what:
iced depresso (icedquinn@blob.cat)'s status on Wednesday, 18-Sep-2024 16:00:03 JST iced depresso
@s8n it's the lack of sparsity.
cortical models have an inhibition mechanism that attempts to mirror how human brains do it, where columns basically fight over potentiation/depotentiation when creating the paths that fire.
they did another paper on applying sparsity to traditional ML models, and they are cheaper and better to run that way. a different group made Top-KAST, which takes the idea and makes it easier to slap into pytorch.
basically densenets are crap and wrong
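a minimal sketch of the top-k idea being described (plain numpy, sizes and names made up for illustration; not numenta's model or the Top-KAST authors' code): keep only the k largest-magnitude entries and zero everything else.

```python
import numpy as np

def top_k_sparsify(x: np.ndarray, k: int) -> np.ndarray:
    """keep only the k largest-magnitude entries of x, zero the rest."""
    if k >= x.size:
        return x.copy()
    # the k-th largest magnitude becomes the cutoff (ties may keep a few extra)
    cutoff = np.partition(np.abs(x).ravel(), -k)[-k]
    return np.where(np.abs(x) >= cutoff, x, 0.0)

# example: a dense activation vector crumpled down to 3 active units
acts = np.array([0.1, -0.9, 0.05, 0.7, -0.02, 0.4])
print(top_k_sparsify(acts, 3))   # -> [ 0.  -0.9  0.   0.7  0.   0.4]
```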
THOT POLICE (s8n@posting.lolicon.rocks)'s status on Wednesday, 18-Sep-2024 16:00:04 JST THOT POLICE
@icedquinn nobody has solved the context problem; all the AIs are still limited by an infinitesimal amount of memory
iced depresso (icedquinn@blob.cat)'s status on Wednesday, 18-Sep-2024 16:20:43 JST iced depresso
@s8n numenta was interesting because their models are based on actual neurology.
there is an input phase where sensory data is converted to sparse bit field activations
these are all concatenated to create the engram of a sensory moment
those are then connected with weights to cortical columns (the structure human brains use); initially the connections are randomized, but it's not a full densenet.
given a sensory moment you run forward, then they crumple the result into a new engram bit field, taking only the top N highest voters.
the result is an engram (a sparse bit field) which is used in various ways to make the model's decision. updates are filtered back with hebbian learning, and only the columns involved in a decision are subject to updates.
thus it develops contextualized memory: different inputs fire different columns, and similar inputs activate similar columns. that matches some observations and theories about the brain having a universal storage format which is also capable of universal comparison.
in numenta's case it's just a bitwise intersection between engrams: the more bits overlap, the more similar they are.
they have other papers talking about how this model held up to fuzziness and damage pretty well, to the point they actually just use random subsets as a form of compression (e.g. limiting an engram to, say, 20 random bits that were active, which enacts a kind of fuzzy learning the immune system is known to use)
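a rough sketch of that flow, assuming binary numpy vectors for engrams; the layer sizes, connection sparsity, learning rate and helper names are invented for illustration and this is not numenta's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUT, N_COLUMNS, N_WINNERS = 1024, 256, 20          # invented sizes
# sparse random connections: each column only samples a small subset of input bits
weights = (rng.random((N_COLUMNS, N_INPUT)) < 0.05) * rng.random((N_COLUMNS, N_INPUT))

def forward(engram):
    """binary input engram -> binary column activity (top-N winners, everyone else inhibited)"""
    votes = weights @ engram
    out = np.zeros(N_COLUMNS)
    out[np.argsort(votes)[-N_WINNERS:]] = 1.0
    return out

def hebbian_update(engram, active_cols, lr=0.1):
    """only the columns involved in the decision get their connections adjusted"""
    for c in np.flatnonzero(active_cols):
        connected = weights[c] > 0
        # reinforce connections to bits that fired, decay connections to bits that didn't
        weights[c, connected] += lr * (engram[connected] - 0.5)
        np.clip(weights[c], 0.0, 1.0, out=weights[c])

def overlap(a, b):
    """universal comparison: similarity is just the size of the bitwise intersection"""
    return int(np.sum((a > 0) & (b > 0)))

# one "sensory moment": a sparse binary engram
moment = np.zeros(N_INPUT)
moment[rng.choice(N_INPUT, 40, replace=False)] = 1.0
cols = forward(moment)
hebbian_update(moment, cols)
print(overlap(cols, forward(moment)))   # similar input -> mostly the same columns fire
```

the overlap function is the "universal comparison" part: similarity between any two engrams is just how many active bits they share.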
THOT POLICE (s8n@posting.lolicon.rocks)'s status on Wednesday, 18-Sep-2024 16:20:44 JST THOT POLICE
@icedquinn I don't know much about the theory, just that the path is effectively a traversal through nodes in a matrix and each node is associated with some kind of action. I'm rather familiar with the end-to-end behavior of the entire system and how it responds to alterations in subsystems, like what changes by how much when one has more or less context token ability in an LLM or how likely a given model is to produce an erroneous result for a given string based on how strong an attention guidance factor you have. The comparisons between the system and the human brain are lost on me
iced depresso (icedquinn@blob.cat)'s status on Wednesday, 18-Sep-2024 16:22:26 JST iced depresso
@s8n to my knowledge their model was demoed as a very effective anomaly detector (it's how i found out about them, through a foss project doinking their model to train those)
there were a couple soft experiments here and there, but i don't think anyone other than cortical.io (and maybe some bankers being very quiet) has done much with the work. none of the people on the forums put them in a closed loop to see what they would do.
the damn things were capable of real-time learning though, which nothing else is.
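for reference, numenta's anomaly score is essentially one minus the fraction of currently-active bits the model had predicted; a toy version of that, assuming binary numpy vectors (not their code):

```python
import numpy as np

def anomaly_score(predicted, actual):
    """1.0 = the model saw nothing like this coming, 0.0 = perfectly anticipated"""
    active = actual > 0
    if not active.any():
        return 0.0
    hits = np.sum((predicted > 0) & active)
    return 1.0 - hits / active.sum()

# a step in a stream that matches the prediction scores near 0; a surprise scores near 1
predicted = np.array([1, 1, 0, 1, 0])
print(anomaly_score(predicted, np.array([1, 1, 0, 1, 0])))   # 0.0
print(anomaly_score(predicted, np.array([0, 0, 1, 0, 1])))   # 1.0
```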
iced depresso (icedquinn@blob.cat)'s status on Wednesday, 18-Sep-2024 16:57:03 JST iced depresso
@s8n who knows. i have their papers stashed on a tablet, just haven't gotten around to re-reading them all.
i'm still highly curious what happens if the system is put into a closed loop, especially since the sensorimotor paper seems very much like hard attention, meaning they figured out how to teach the damn thing to navigate spaces.
THOT POLICE (s8n@posting.lolicon.rocks)'s status on Wednesday, 18-Sep-2024 16:57:04 JST THOT POLICE
@icedquinn they probably made a deal with one of the bigger companies to trade that tech away
iced depresso (icedquinn@blob.cat)'s status on Wednesday, 18-Sep-2024 17:01:53 JST iced depresso
@s8n my brain is too zonked right now to contemplate how this knowledge could be used with mamba networks.
i remember seeing those months ago; state space models were crushing LLMs in some ways.
i'm not sure if the two could be merged in some meaningful way to make a language machine capable of real-time learning :cirno_what:
picofarad (picofarad@noauthority.social)'s status on Wednesday, 18-Sep-2024 17:25:46 JST picofarad
@icedquinn @s8n https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf
i'll just head off to bed now
iced depresso (icedquinn@blob.cat)'s status on Wednesday, 18-Sep-2024 17:26:50 JST iced depresso
@picofarad @s8n i guess, but they were more trying to emulate the hardware than anything. they were still trained with compute and data, it's just that the structure of it all is based on physical observations of a system known to do exactly the thing
iced depresso (icedquinn@blob.cat)'s status on Thursday, 19-Sep-2024 03:05:02 JST iced depresso
@picofarad @s8n there was some work on sparse compute units but it's really hard to get your own chips made.
I remember reading a couple companies were looking into making them because they are significantly more efficient to run, more biologically plausible, and handle context better.
GPUs push around huge numbers, but demos show we don't actually need bigger than 8-bit weights most of the time.
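a minimal sketch of what 8-bit weights mean in practice, assuming simple symmetric per-tensor quantization (nothing vendor-specific, names made up):

```python
import numpy as np

def quantize_int8(w):
    """symmetric per-tensor quantization: float weights -> int8 codes plus one scale"""
    scale = float(np.abs(w).max()) / 127.0 or 1.0   # avoid a zero scale on an all-zero tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())   # worst-case rounding error, roughly scale/2
```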
picofarad (picofarad@noauthority.social)'s status on Thursday, 19-Sep-2024 03:05:03 JST picofarad
@s8n @icedquinn the 3090 is the best consumer card available, for sure, but i am biased: i have one.
I know a lot of places were just racking RTX titans 10 years ago for "AI" so i'm thinking along those lines - consumer GPUs bought because L40/H100 are "too hard to get"
Also my ranking of cards is like 3090[ |ti], 2080ti, 3060 12gb. everything else is "meh" for performance/watt/dollar, and if you only care about games the 3060 is the sweet spot for FHD gaming.
THOT POLICE (s8n@posting.lolicon.rocks)'s status on Thursday, 19-Sep-2024 03:05:04 JST THOT POLICE
@picofarad @icedquinn that happens pretty rarely, there are few golden devices in the wild. I remember the Xeon 5690s that were retired from datacenters were insanely valuable wrt price/performance, but I don't remember another chip like that in recent memory
it's more likely those gpus will be mid and similar in value to new parts, especially considering they're ramping up right now with 40 series cards, and the 40 series is known to have a pretty hefty power draw. imho the 3090 is a better product than the 4090
THOT POLICE (s8n@posting.lolicon.rocks)'s status on Thursday, 19-Sep-2024 03:05:05 JST THOT POLICE
@picofarad @icedquinn this is the reason Urs Hölzle refused to deploy video cards in Google datacenters until the company essentially died in 2018. Considering the time and effort it would take to deploy that solution, and considering the rate of advancement in CPU technology, there has never been a point in history (up to and including now) at which GPU computation in the datacenter has made sense
Tech companies know that AI is the last scam they will be able to pull on investors before the next economic crash, and are deploying these resources to fleece investors who are trying to gold-rush AI technology.
picofarad (picofarad@noauthority.social)'s status on Thursday, 19-Sep-2024 03:05:05 JST picofarad
@s8n @icedquinn think of all the cheap-ass GPUs we'll be able to get after the crash though - i know so many children and low-income adults will love to have a decent computer for the first time in their lives!
iced depresso (icedquinn@blob.cat)'s status on Thursday, 19-Sep-2024 03:06:58 JST iced depresso
@picofarad @s8n I think this comes down to it being easier for sociopaths to buy more metal than it is to cultivate more talent. Some of these systems are proven to be universal approximators, and they are rich enough to just slam entire data centers at the problem. They don't have Kurzweils that actually make the bandpass filters to model a cochlea.
Partly why I don't care to read the papers anymore. They're too compute-intensive for self-learning. A brain is able to configure itself on only a 50W power supply.
iced depresso (icedquinn@blob.cat)'s status on Thursday, 19-Sep-2024 03:10:55 JST iced depresso
@picofarad @s8n I wouldn't mind a huge graph search framework though. I posted about wanting one of those for non-AI-related strategy solving, where you describe changes in world state with a simple function and the framework uses it to just grind out huge graphs.
TLA+ kind of does this, and people now use it to verify models of CPUs and other high-stakes systems.
There are some interesting applications to distributed graph searches when it comes to making little models of agents and trying to find out how to manage them
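a tiny sketch of that shape of framework: you hand it a successor function describing changes in world state and a generic breadth-first search grinds out the reachable graph (the function names and the toy world are invented for illustration):

```python
from collections import deque

def explore(start, successors, is_goal):
    """breadth-first search over a state graph defined only by a successor function"""
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            path = []                       # walk back up to recover the sequence of world states
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None

# toy world: a counter you can increment or double; how do we get from 1 to 9?
print(explore(1, lambda s: (s + 1, s * 2), lambda s: s == 9))   # [1, 2, 4, 8, 9]
```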
iced depresso (icedquinn@blob.cat)'s status on Thursday, 19-Sep-2024 03:55:27 JST iced depresso
@picofarad @s8n if it was, i don't remember it
picofarad (picofarad@noauthority.social)'s status on Thursday, 19-Sep-2024 03:55:28 JST picofarad
@icedquinn @s8n was it you that said "they're using their wealth to get skills so that the skilled can't get wealth"?
iced depresso (icedquinn@blob.cat)'s status on Thursday, 19-Sep-2024 03:59:11 JST iced depresso
@s8n @picofarad kurzweil wrote about it in how to create a mind. they used bandpass filters to emulate a cochlea and then quantized the filter outputs to a codebook so they could use markov models in dragon speech.
the reason dragon speech worked so well is that they tuned it with evolutionary solvers (gradient descent wasn't hip yet) instead of expectation-maximization.
E-M works but is kind of bad, which is why the TTS voices sound machiney (it's just averaging a lot of states together). somebody went and re-did the old mixture model tests with gradient descent and found that it did just fine.
(dragon was also capable of adaptation, which none of the google-shit tech does.)
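a toy sketch of that front end, assuming scipy butterworth bandpass filters and a nearest-neighbour codebook; the band edges, frame size and codebook size are made up and this is not Dragon's actual design:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def filterbank_energies(audio, sr, bands=((100, 400), (400, 1600), (1600, 6000))):
    """crude cochlea stand-in: bandpass the signal and take per-band energy every 20 ms"""
    hop = int(0.02 * sr)
    per_band = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        filtered = sosfiltfilt(sos, audio)
        per_band.append([np.sum(filtered[i:i + hop] ** 2)
                         for i in range(0, len(filtered) - hop, hop)])
    return np.array(per_band).T               # shape: (n_frames, n_bands)

def quantize_to_codebook(features, codebook):
    """map each frame to its nearest codebook entry; the symbol stream is what feeds an HMM"""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)

sr = 16000
audio = np.random.randn(sr)                                       # stand-in for a second of speech
feats = filterbank_energies(audio, sr)
codebook = feats[np.random.choice(len(feats), 8, replace=False)]  # toy 8-symbol codebook
print(quantize_to_codebook(feats, codebook)[:10])                 # discrete symbols per frame
```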
THOT POLICE (s8n@posting.lolicon.rocks)'s status on Thursday, 19-Sep-2024 03:59:12 JST THOT POLICE
@icedquinn @picofarad that wasn't kurzweil, I think it was Richard Lyon