"This satellite constellation could have been buried fiber."
yeah, rural internet is super bad. But it would be a lot better if internet companies hadn't taken federal and consumer funds to dig new fiber lines... and then just did not.
"you should let me be your fellow, because that sounds pretty gay since I'm a girl, right? so that's a proper good thing right there. score: audrey 1, society 0. but that's not all"
"this fellowship is good for me because I will not break up due to infighting over forbidden-yet-tempting artifacts of dark power, both because 1. I am but one person, and 2. I'm told that my haircut is all the rage in Paris right now and is literally called "the hobbit". I am not even fucking with you, this is true."
@AnarchoNinaWrites@jorts.horse the level of bullshit in their opinions. I almost wish they’d just say “I don’t give a shit, I’m a Nazi fuck and i support Nazi shit so the Nazi god gets to do what they want”. It’s the same fucking outcome but with less gaslighting.
I want to fucking take these ridiculous headlines and spam the inboxes of every person I haven’t spoken to in a decade who I know voted for this shit (and who don’t like being made Sad or Feeling Bad Vibes because of it) and scream in their fucking face. Lives are being ruined, anything good the state has done is being destroyed, people are being fucking whisked off the streets as test cases for the inevitable black bagging of known citizen dissenters but yeah, sorry you don’t like talking about “bad things”, you voted for a fucking Nazi rapist fuuuuuck you
“judge: that is a clear violation of the first amendment but I’m not here to rule about that(!?), also I assume (?!) I’m not supposed to stand in the way of the government on ‘foreign relations’ so deport away!”
Tech/security question for those who might be in the know: let’s say I’m designing a piece of software and I have an API where two instances of the software speak to each other.
Let’s say I only control one instance; logically, is there any type of validation sequence I could implement that would let me know whether the second instance is running a modified version that isn’t trivial to bypass in some form or another? I feel like this is DRM territory (not trying to use it that way at all) in terms of technology, which probably means “no, especially when it’s open source”. I wouldn’t want anything ridiculous like signing requirements or some garbage; I just wanna know if there’s a way to flag the other instance as running a modded version so that I have more information when making the decision as to whether I should allow interaction with the other server.
I suspect the answer is “no”, or effectively “no” in terms of the trade off between technical complexity and payoff.
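To make the "effectively no" concrete, here is a minimal sketch of the simplest possible check: the remote instance self-reports a digest of its build, and my side compares it against a list of known releases. Everything here is hypothetical and made up for illustration (`KNOWN_RELEASE_DIGESTS`, `classify_peer`, the peer classes, the fake "source" bytes). The point is just that a modified peer controls its own answer, so the flag is advisory at best.

```python
import hashlib

# Hypothetical: bytes standing in for the source of an official release.
OFFICIAL_SOURCE = b"release-1.0 source"

# Hypothetical allow-list of digests for releases I know about.
KNOWN_RELEASE_DIGESTS = {hashlib.sha256(OFFICIAL_SOURCE).hexdigest()}


def classify_peer(reported_digest: str) -> str:
    """Flag a peer based only on what it chooses to tell us."""
    if reported_digest in KNOWN_RELEASE_DIGESTS:
        return "matches-known-release"
    return "unknown-build"


class HonestPeer:
    """A peer that truthfully reports the digest of whatever it runs."""

    def __init__(self, source: bytes) -> None:
        self.source = source

    def report_digest(self) -> str:
        return hashlib.sha256(self.source).hexdigest()


class LyingPeer(HonestPeer):
    """A modified peer that simply replays the official digest."""

    def report_digest(self) -> str:
        return hashlib.sha256(OFFICIAL_SOURCE).hexdigest()


official = HonestPeer(OFFICIAL_SOURCE)
modded_but_honest = HonestPeer(b"release-1.0 source + local patches")
modded_and_lying = LyingPeer(b"release-1.0 source + local patches")

print(classify_peer(official.report_digest()))
print(classify_peer(modded_but_honest.report_digest()))
print(classify_peer(modded_and_lying.report_digest()))
```

So this kind of check only catches modified instances that are not trying to hide, which might still be useful information, just not a guarantee.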
@evan@cosocial.ca@blogdiva@mastodon.social It's gonna make you look like you're a MAGA type and that's not gonna get you the attention you want, nor is it gonna make people feel safe.
Well, and even with English, 'tokenization' is a pretty poor concept for capturing meaning.*
Hmmmmmmm... perhaps what I should do, then, is to not bake in any assumptions that I'll even be doing something like tokenization, but instead model the relationship between 'phrase' and 'document' as lightly as possible for constructing a reverse index so that I can use whatever set of tools are appropriate for that language.
* big asterisk on this part because I suspect this is a very large and very deep can of worms and I should be careful what I say here, because even what is meant by meaning itself is probably rather contextual and...
@skinnylatte@hachyderm.io@trochee@dair-community.social Now that I think about it, if you don't structure your data model on the assumption that your phrase will be "a set of tokens that are strings" matching against "tokens that are strings", that opens up the field pretty widely. If your phrase is instead an image, and your 'documents' are also images, well... (not sure that'd be easy to 'reverse index', but nothing precludes you from doing it. Even a poor attempt at 'tokenizing' images in this manner, so to speak, would likely yield some vaguely useful results).
@trochee@dair-community.social@skinnylatte@hachyderm.io (to maybe more accurately phrase my question: is tokenization a concept basically born out of how text is written in Romance/Germanic/etc languages, and is it not so appropriate to try and model certain south Asian languages with it?)
@trochee@dair-community.social@skinnylatte@hachyderm.io (I'm not a polyglot, but thankfully I'm not strictly a monoglot, either) I'm pretty much going with the idea that most assumptions I would make about language based on my knowledge aren't going to hold up, especially in languages that aren't as widely spoken or read, which is sort of where I would want to pay special attention.
Hmmmm. I wonder if there's a language that is both A. "underserved" by technical tools and B. rather difficult to tokenize? Sounds like a number of languages already fill condition B... and probably fill condition A.
Is there a better... mmm, model, either in the computational sense or otherwise, with which to approach how to break up the text? Or "in theory", could tokenization work, it's just that not enough work has been done?
@trochee@dair-community.social@skinnylatte@hachyderm.io Yeah, I'm trying to think of how I would construct a reverse index at the database/data model level, and I want to not bake in assumptions about the language at this level.
So having a quick run through of what's considered "best in class", library wise, for language handling should give me an idea of what the input and output need to look like (and explicitly, how I should store them, etc).
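As a rough idea of what "not baking in assumptions" could look like, here is a minimal sketch of a reverse index that only knows about opaque, hashable "terms" and takes the segmenting function as an argument. All names here (`ReverseIndex`, `whitespace_terms`, `char_bigrams`) are hypothetical, and the two segmenters are toy stand-ins for whatever real per-language tooling ends up being appropriate.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable

# A segmenter is any callable that turns raw input into index terms.
# Terms only need to be hashable; they do not have to be "words".
Segmenter = Callable[[str], Iterable[Hashable]]


class ReverseIndex:
    """Maps term -> set of document ids, with no language assumptions."""

    def __init__(self) -> None:
        self._postings: dict = defaultdict(set)

    def add(self, doc_id: str, text: str, segmenter: Segmenter) -> None:
        for term in segmenter(text):
            self._postings[term].add(doc_id)

    def lookup(self, query: str, segmenter: Segmenter) -> set:
        """Return ids of documents containing every term of the query."""
        terms = list(segmenter(query))
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


def whitespace_terms(text: str) -> list:
    """Toy segmenter for space-delimited scripts."""
    return text.lower().split()


def char_bigrams(text: str) -> list:
    """Toy segmenter for scripts written without spaces between words."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


index = ReverseIndex()
index.add("en-1", "The quick brown fox", whitespace_terms)
index.add("ja-1", "私は学生です", char_bigrams)

print(index.lookup("brown fox", whitespace_terms))
print(index.lookup("学生", char_bigrams))
```

The storage layer never looks inside a term, so swapping in a proper morphological analyzer (or something that is not text at all) would only change the segmenter. In a real schema it would probably make sense to record which segmenter produced each document's terms, which this sketch skips.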
anyone know of a decent multi-language text tokenizer?
To be clear: I am explicitly looking to use it for purposes other than generative AI and other [slop/scab/labor theft] uses.
Not sure of the specific terms I need to be looking up, frankly, since I'm mostly just finding Python's built-in tokenize library, which seems to be focused just on Python code.
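For anyone else hitting the same search results, a small sketch of the difference: Python's standard `tokenize` module lexes Python source code, while a naive "words are runs of word characters" split with `re` works for space-delimited text but returns an unspaced Japanese sentence as one lump. The helper names (`python_source_tokens`, `naive_words`) are hypothetical; only `tokenize.generate_tokens`, `token.tok_name`, and `re.findall` are standard library calls.

```python
import io
import re
import token
import tokenize


def python_source_tokens(source: str) -> list:
    """Lex Python source code with the standard tokenize module."""
    readline = io.StringIO(source).readline
    wanted = {token.NAME, token.OP, token.NUMBER, token.STRING}
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in wanted
    ]


def naive_words(text: str) -> list:
    """Split natural-language text into runs of word characters."""
    return re.findall(r"\w+", text)


# tokenize understands code, not prose:
print(python_source_tokens("x = 1\n"))

# A naive split is fine when words are separated by spaces...
print(naive_words("rural internet is bad"))

# ...but gives back a single "word" when the script has no spaces.
print(naive_words("私は学生です"))
```

Terms that tend to turn up more relevant results are "word segmentation" and "morphological analysis" rather than "tokenizer" on its own.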
Hi! If you have a link to an article about someone that has been kidnapped or otherwise imprisoned without trial, regardless of circumstances, would you mind dropping it here? I would like to start keeping track of who we know has been taken and when.
Thank you! Boosts VERY much appreciated. Even if you can only remember a name without a video or article about them, that’s appreciated!
@JessTheUnstill@infosec.exchange I’m really happy that D&D (and others?) did away with “forced alignment” for races. Such a ridiculously bigoted idea to start with, I’m glad that at least in some places it’s going away.
I remember seeing people post/boost some artwork online when it happened and it was stuff like a Kobold barista and it was adorable and delightful.
It also, unsurprisingly, leads to better storytelling (because the characters are free to be sentient entities who have a cultural background and make individual choices, rather than a cardboard box with a couple loose leaf pages from an 18th century race “science” book stuffed inside).
Gaaaaaaay transfemme STEM kid; scientist by training, burned out?/former? PhD candidate in chemistry. Molecular dynamics, AI, supercomputing. Formerly @astatide@tech.lgbt & @astatide@hachyderm.io Anti-capitalist.
Pronouns: she/her
#photography #supercomputing
Since there seems to be some disagreement about whether federating with threads is a good idea or not (eyeroll), I've switched to this account. The server isn't stable and might pop up and down. That'll be normal for a bit.