@meso they're trying to work out a way to ban self-hosted AI but they can't so far because anybody with a good GPU can do it. But they're thinking hard.
@Moon@meso The amount of backpedaling and attempting to put the genie back in the bottle is insane, but not surprising. Seeing Tim Berners-Lee say if he could do the internet over again he'd make it easier to censor tells me everything I need to know about where tech is today.
@meso there are youtube videos that walk through every step. I kind of muddled through it. I'm generating cute girls right now but I am soon gonna try to generate text like stories and stuff.
@Moon@meso I tried to get ChatGPT to draw ASCII art of a cat with an ampersand in its mouth representing a mouse. It refused until I told it ampersands represented cinnamon rolls.
@Moon@Christmas_Man@meso This is the guy making 4bit quantized models for home use: https://huggingface.co/TheBloke GPTQ models are for GPU based inference, GGML are for CPU based inference (though you can get a speed boost from moving some of the load onto your GPU).
With 24GB of VRAM, you can run GPTQ 13b to 20b models with room to spare for extended (over 2048 token) context and keeping Stable Diffusion loaded at the same time. Or you are supposed to be just about able to run 30b models with 2048 context on a headless Linux machine. Expect double digit tokens per second. Answers will pop up in seconds.
With GGML models your RAM is going to be your limit, and speed is going to depend on CPU, GPU, RAM speed and how much you can offload to GPU/VRAM. In general it's likely to be MUCH slower than GPTQ: if you're running as big a model as you can fit in your machine, expect single digit tokens per second, and expect to wait sometimes over a minute for an answer. Sometimes it's worth it, sometimes not. I've heard people say that the returns from 30b to 70b are quite a bit diminished (i.e. it's not really noticeably smarter, just different).
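If you want to sanity check those sizes before downloading anything, here's a rough back-of-the-envelope sketch. It only counts the quantized weights (parameters times bits per weight, divided by 8), so treat it as a lower bound: real GPTQ/GGML files carry some extra quantization metadata, and context plus runtime overhead eat more memory on top. The function name and the 4-bit default are just my own picks for illustration, not anything from a library.

```python
def weight_gib(params_billion, bits_per_weight=4):
    """Approximate size of the quantized weights alone, in GiB.

    Assumption: every parameter is stored at exactly bits_per_weight bits.
    Context (KV cache), quantization metadata and runtime overhead are
    NOT included, so real usage will be higher than this.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Model sizes mentioned in the thread.
    for size in (13, 20, 30, 70):
        print(f"{size}b @ 4-bit: ~{weight_gib(size):.1f} GiB of weights")
```

By this estimate a 13b model at 4 bits is about 6 GiB of weights and a 30b is about 14 GiB, which lines up with 13b to 20b leaving plenty of room on a 24GB card and 30b being a tighter fit once context is added. A 70b comes out over 30 GiB, so it won't fit on a single 24GB card without offloading.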
@JAJAX@Moon@meso https://irishtechnews.ie/world-wide-web-founder-sir-tim-berners-lee-wants-google-and-facebook-to-tackle-fake-news/ https://www.wired.com/story/tim-berners-lee-world-wide-web-anniversary/ I saw it a while ago where he said he would have made it harder for things he doesn't like if he could do WWW over, but I can't find the link. I don't bookmark stuff because that was one of malware's favorite things to skim back in the day. Reading this he's talking out of both sides of his mouth. "I want data privacy (Google and Facebook end up poor) but I want Google and Facebook to work harder to 'combat misinformation' (with their ad money I took away, but this isn't going to inhibit free speech, no sir) but I want the web free and open to all (even though any political speech to the right of Ben Shapiro is viewed as misinformation and abuse by Big Tech and governments, and from what I can see, also Tim Berners-Lee)". In retrospect he's implicitly pro-censorship in that he doesn't understand the real-life consequences of his ideals.
@mrsaturday@Moon@meso The real problem with people like TBL is they're starting to realize that there's no real incentive for average people to dedicate their lives to the universal equity and brotherhood ideal, especially when it requires them to give up their own hard-won standard of living. So they've drawn back on all their ideas of freedom of speech etc, and they're gonna try to force people to behave how they think they ought to instead.