*neural-network-based transformation of audio to text is awesome (but notably not an LLM or text generation)