@Moon @kaia The base Stable Diffusion model has some semblance of that with its natural-language approach to prompts (e.g. “a dog wearing a hat riding on a motorcycle”), but basically everything people have trained on top of it uses booru tags. You’d think someone would create an incentive to write descriptive captions so there’d be training data beyond tags.