heeeeeey #python cats!
anyone know of a decent multi-language text tokenizer?
To be clear: I am explicitly NOT looking to use it for generative AI or other [slop/scab/labor theft] purposes.
Not sure of the specific terms I need to be looking up, frankly, since I'm mostly just finding Python's built-in tokenize module, which seems to be focused just on tokenizing Python source code.
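For context, here's a rough sketch of what the stdlib module does when handed plain text. It only uses `tokenize.generate_tokens` and `tokenize.tok_name` from the standard library; the helper name `python_token_names` is made up for illustration.

```python
import io
import tokenize


def python_token_names(text):
    """Run the stdlib tokenizer over `text` as if it were Python source.

    Returns (token type name, token string) pairs. `python_token_names`
    is a hypothetical helper name, not part of any library.
    """
    # generate_tokens expects a readline-style callable that returns str lines.
    readline = io.StringIO(text).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Plain words come back labelled as Python identifiers (NAME tokens),
# followed by bookkeeping tokens that only make sense for source code.
for name, string in python_token_names("hello world"):
    print(name, repr(string))
```

So it splits on Python's lexical rules (identifiers, operators, string literals), not on natural-language word boundaries, which is why it isn't what I'm after.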
Thank you!
#techPosting
Asta [AMP] (aud@fire.asta.lgbt)'s status on Thursday, 03-Apr-2025 14:10:37 JST