@evan@cosocial.ca @blogdiva@mastodon.social It's gonna make you look like you're a MAGA type and that's not gonna get you the attention you want, nor is it gonna make people feel safe.
Notices by Asta [AMP] (aud@fire.asta.lgbt)
Asta [AMP] (aud@fire.asta.lgbt)'s status on Saturday, 05-Apr-2025 12:02:42 JST Asta [AMP]
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 14:42:16 JST Asta [AMP]
@skinnylatte@hachyderm.io @trochee@dair-community.social haaaad a feeling, hah.
Well, and even with English, 'tokenization' is a pretty poor concept for capturing meaning.*
Hmmmmmmm... perhaps what I should do, then, is to not bake in any assumptions that I'll even be doing something like tokenization, but instead model the relationship between 'phrase' and 'document' as lightly as possible for constructing a reverse index, so that I can use whatever set of tools is appropriate for that language.
* big asterisk on this part because I suspect this is a very large and very deep can of worms and I should be careful what I say here, because even what is meant by meaning itself is probably rather contextual and...
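One way to picture "as lightly as possible" is the rough sketch below, which is an illustration and not something from the thread. It assumes the index only ever sees opaque, hashable terms, and that a swappable analyzer function (a hypothetical name, as are `ReverseIndex`, `whitespace_terms`, and `char_bigrams`) decides how a document or phrase becomes terms.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable

# An "analyzer" is any callable that turns a document (or a phrase) into
# hashable terms. The index never looks inside a term.
Analyzer = Callable[[str], Iterable[Hashable]]


class ReverseIndex:
    """Hypothetical minimal reverse index: term -> set of document ids."""

    def __init__(self, analyzer: Analyzer) -> None:
        self.analyzer = analyzer
        self.postings: dict[Hashable, set[str]] = defaultdict(set)

    def add(self, doc_id: str, content: str) -> None:
        # Record that every term produced for this content occurs in doc_id.
        for term in self.analyzer(content):
            self.postings[term].add(doc_id)

    def lookup(self, phrase: str) -> set[str]:
        """Return ids of documents that contain every term of the phrase."""
        terms = list(self.analyzer(phrase))
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def whitespace_terms(text: str) -> list[str]:
    """Stand-in analyzer for space-delimited text."""
    return text.lower().split()


def char_bigrams(text: str) -> list[str]:
    """Stand-in analyzer that assumes nothing about word boundaries."""
    compact = "".join(text.split())
    return [compact[i:i + 2] for i in range(len(compact) - 1)]
```

Swapping `whitespace_terms` for `char_bigrams` (or for a real language-specific library) changes nothing in the index itself, which is the point of keeping the phrase/document relationship this thin.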
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 14:42:15 JST Asta [AMP]
@skinnylatte@hachyderm.io @trochee@dair-community.social Now that I think about it, if you don't structure your data model on the assumption that your phrase will be "a set of tokens that are strings" matching against "tokens that are strings", that opens up the field pretty widely. If your phrase is instead an image, and your 'documents' are also images, well... (not sure that'd be easy to 'reverse index', but nothing precludes you from doing it. Even a poor attempt at 'tokenizing' images in this manner, so to speak, would likely yield some vaguely useful results).
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 14:31:25 JST Asta [AMP]
@trochee@dair-community.social @skinnylatte@hachyderm.io (to maybe more accurately phrase my question: is tokenization a concept basically born out of how text is written in Romance/Germanic/etc. languages, and is it not so appropriate to try and model certain South Asian languages with it?)
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 14:31:25 JST Asta [AMP]
@trochee@dair-community.social @skinnylatte@hachyderm.io (I'm not a polyglot, but thankfully I'm not strictly a monoglot, either) I'm pretty much going with the idea that most assumptions I would make about language based on my knowledge aren't going to hold up, especially in languages that aren't as widely spoken or read, which is sort of where I would want to pay special attention.
Hmmmm. I wonder if there's a language that is both A. "underserved" by technical tools and B. rather difficult to tokenize? Sounds like a number of languages already fill condition B... and probably fill condition A.
Is there a better... mmm, model, either in the computational sense or otherwise, with which to approach how to break up the text? Or "in theory", could tokenization work, it's just that not enough work has been done?
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 14:27:45 JST Asta [AMP]
@skinnylatte@hachyderm.io ooooh, this looks extremely promising and probably exactly what I need! Thank you!
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 14:27:41 JST Asta [AMP]
@trochee@dair-community.social @skinnylatte@hachyderm.io Yeah, I'm trying to think of how I would construct a reverse index at the database/data model level, and I want to not bake in assumptions about the language at this level.
So having a quick run-through of what's considered "best in class", library-wise, for language handling should give me an idea of what the input and output need to look like (and explicitly, how I should store them, etc).
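For the "how I should store them" part, here is a minimal sketch of one possible storage shape, using only the standard-library `sqlite3` module. Every table, column, and function name below is an assumption made up for illustration. The one deliberate choice is that the database stores whatever terms an external analyzer hands it, tagged with a language code, and never tries to split text itself.

```python
import sqlite3

# Hypothetical schema: documents, distinct terms per language, and postings
# that tie a term to a position inside a document.
SCHEMA = """
CREATE TABLE document (
    id   INTEGER PRIMARY KEY,
    lang TEXT NOT NULL,
    body TEXT NOT NULL
);
CREATE TABLE term (
    id    INTEGER PRIMARY KEY,
    lang  TEXT NOT NULL,
    value TEXT NOT NULL,
    UNIQUE (lang, value)
);
CREATE TABLE posting (
    term_id     INTEGER NOT NULL REFERENCES term(id),
    document_id INTEGER NOT NULL REFERENCES document(id),
    position    INTEGER NOT NULL,
    PRIMARY KEY (term_id, document_id, position)
);
"""


def open_index(path=":memory:"):
    """Open (or create) the index database and apply the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def index_document(conn, lang, body, terms):
    """Store a document plus the terms an external analyzer produced for it."""
    cur = conn.execute(
        "INSERT INTO document (lang, body) VALUES (?, ?)", (lang, body)
    )
    doc_id = cur.lastrowid
    for position, value in enumerate(terms):
        # Reuse the term row if this (lang, value) pair was seen before.
        conn.execute(
            "INSERT OR IGNORE INTO term (lang, value) VALUES (?, ?)",
            (lang, value),
        )
        (term_id,) = conn.execute(
            "SELECT id FROM term WHERE lang = ? AND value = ?", (lang, value)
        ).fetchone()
        conn.execute(
            "INSERT INTO posting (term_id, document_id, position) "
            "VALUES (?, ?, ?)",
            (term_id, doc_id, position),
        )
    conn.commit()
    return doc_id


def documents_with(conn, lang, value):
    """Return ids of documents in `lang` that contain the term `value`."""
    rows = conn.execute(
        "SELECT DISTINCT p.document_id FROM posting p "
        "JOIN term t ON t.id = p.term_id "
        "WHERE t.lang = ? AND t.value = ? ORDER BY p.document_id",
        (lang, value),
    )
    return [row[0] for row in rows]
```

Usage would look like `index_document(conn, "en", text, some_analyzer(text))`, where `some_analyzer` is whichever library turns out to fit the language. Terms are stored as text here only to keep the sketch short.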
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 14:10:37 JST Asta [AMP]
heeeeeey #python cats!
anyone know of a decent multi-language text tokenizer?
To be clear: I am explicitly looking to use it for non-generative-AI and other [slop/scab/labor theft] purposes.
Not sure of the specific terms I need to be looking up, frankly, since I'm mostly just finding Python's built-in tokenize library, which seems to be focused just on Python code.
Thank you!
#techPosting
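A small illustration of the mismatch described above, using only the standard library: `tokenize` lexes Python source code, while a regex over word characters is about the simplest natural-language stand-in there is. The helper names are made up for this sketch, and the regex approach is deliberately naive. It is not a recommendation for multi-language work.

```python
import io
import re
import tokenize


def python_source_tokens(source: str) -> list[str]:
    """Token strings from the stdlib `tokenize` module, which lexes Python code."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.string.strip()  # drop newline and end-of-input tokens
    ]


def naive_word_tokens(text: str) -> list[str]:
    """Unicode-aware but naive: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())
```

The naive version leans entirely on spaces and punctuation, so text written without spaces between words comes back as one long "token". That gap is what language-specific word segmentation tools exist to fill.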
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 02:57:13 JST Asta [AMP]
@c0dec0dec0de@hachyderm.io @puppygirlhornypost2@transfem.social @dalias@hachyderm.io @inthehands@hachyderm.io @carstenfranke@mastodon.social Tesla has fixed a huge social problem! Yeah, there’s no such thing as the bystander effect with a Tesla, because there’s literally no way to help you out of a burning Tesla.
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 02:57:06 JST Asta [AMP]
@c0dec0dec0de@hachyderm.io @puppygirlhornypost2@transfem.social @dalias@hachyderm.io @inthehands@hachyderm.io @carstenfranke@mastodon.social A Tesla is the safest car in the world to burn alive in… for all the people watching you from the outside.
Asta [AMP] (aud@fire.asta.lgbt)'s status on Monday, 31-Mar-2025 08:43:06 JST Asta [AMP]
Hi! If you have a link to an article about someone that has been kidnapped or otherwise imprisoned without trial, regardless of circumstances, would you mind dropping it here? I would like to start keeping track of who we know has been taken and when.
Thank you! Boosts VERY much appreciated. Even if you can only remember a name without a video or article about them, that’s appreciated!
EDIT: a hashtag would be super useful for this kind of stuff too, I think. Any ideas? #USDisappearance? #YouShouldHaveTheBody? Anything really 😅
EDIT: #NoDueProcess
Asta [AMP] (aud@fire.asta.lgbt)'s status on Friday, 28-Mar-2025 04:12:43 JST Asta [AMP]
@JessTheUnstill@infosec.exchange I’m really happy that D&D (and others?) did away with “forced alignment” for races. Such a ridiculously bigoted idea to start with, I’m glad that at least in some places it’s going away.
I remember seeing people post/boost some artwork online when it happened and it was stuff like a Kobold barista and it was adorable and delightful.
It also, unsurprisingly, leads to better storytelling (because the characters are free to be sentient entities who have a cultural background and make individual choices, rather than a cardboard box with a couple loose leaf pages from an 18th century race “science” book stuffed inside).
Asta [AMP] (aud@fire.asta.lgbt)'s status on Saturday, 22-Mar-2025 08:41:42 JST Asta [AMP]
@puppygirlhornypost2@transfem.social jesus fucking Christ, I looked at a single file (config) for two seconds and it iterates through every environment variable looking for anything that starts with JASEUR and tries to fucking parse it
fucking god, segfault and code injection city
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 20-Mar-2025 02:29:26 JST Asta [AMP]
“Why would you want to do that, Audrey?” well, couple reasons that it might be interesting (but probably not worth it and complicated as fuck):
1. Reverse engineering of proprietary stacks? Not sure how possible or relevant/useful this would be
2. Shit like QEMU VirGL? If you could emulate a device to pass through, in theory you could install the normal compute and driver stack but handle it however you want on the host side?
3. Opening up usage of CUDA on non-CUDA devices?
4. Curiosity
No idea how realistic any of that is and whether it’s worth it, but it might be interesting to see if anything exists out there. And if it did allow more flexibility in running software with hardware options, that would be neat.
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 20-Mar-2025 02:29:17 JST Asta [AMP]
video card emulators: are those a thing? I wonder if anyone has tried to emulate an NVIDIA card…
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 20-Mar-2025 00:23:37 JST Asta [AMP]
@Taweret@timeloop.cafe wow, how can a poll be so wrong.
buncha frog haters
Asta [AMP] (aud@fire.asta.lgbt)'s status on Saturday, 15-Mar-2025 04:49:28 JST Asta [AMP]
@xgranade@wandering.shop “we have created a system with 43,200 precisely broken clocks which are stochastically rotated around to show you a clock when you want to check the time. The degree of rotation (and subsequent clock you are shown) is based on how tired you sound when you ask what time it is. This is much more efficient than a non broken clock and has nothing to do with my clock making business (that is made even cheaper since I don’t have to make working ones, just ones with a face).”
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 13-Mar-2025 12:04:33 JST Asta [AMP]
@jalcine@todon.eu I made a zero context/explanation post about ChatGPT being fascist shit the other week and I stand by it.
I am sure I will eventually expand on that point in a more… verbose manner. But it is a propaganda tool that can be pressed into service as what they think a simulacrum should be, all without a human being. It does nothing efficiently or well, but when your purpose is simply “get rid of others”… well.
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 13-Mar-2025 03:49:54 JST Asta [AMP]
@mntmn@mastodon.social this looks like the old external modems (a super hot version of them anyway) and I looooove it 🖤
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 06-Mar-2025 17:46:02 JST Asta [AMP]
@skinnylatte@hachyderm.io … understand why I have unhoused neighbors. Oh, I really like the way you wrote this: “unhoused neighbors”. Because that’s exactly what they are: your neighbors. I think I am going to use that wording from now on.