Well, the most obvious use case is an automatic caption generator: descriptions at varying levels of detail, in multiple languages, with the ability to ask follow-up questions if a person doesn't understand something from the original caption.
Models with such capabilities already exist, but whether they will be integrated accessibly into mainstream software and operating systems is an open question. Perhaps corpos will address this in some way 🤷‍♀️