That DΔrn Pooka :verified_think: (theququ@shitposter.world)'s status on Friday, 24-May-2024 06:13:51 JST
I have not read the LLaVA paper, but I am assuming it doesn't actually give any spatial information to the text model. It gets a lot of elements in the picture right, but can't determine their positioning. (Llama-3-8B is the model used.)