Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://shitposter.world/objects/3aa93652-81a1-41a3-b53f-fb07f57a8b9a">That DΔrn Pooka :verified_think: (theququ@shitposter.world)'s status on Friday, 24-May-2024 06:13:51 JST</a><a href="https://shitposter.world/users/TheQuQu" title="theququ@shitposter.world"><img src="https://gnusocial.jp/theme/gnusocialjp/default-avatar-stream.png" width="48" height="48" alt="That DΔrn Pooka :verified_think:" style="position: absolute; left: 0; top: 0;">That DΔrn Pooka :verified_think:</a></section><article>I have not read the LLAVA paper but I am assuming it doesn't actually give any spatial information to the text model. It gets a lot of elements in the picture right, but can't determine their positioning. (Llama-3-8B is the model used).</article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/3134515#notice-6206581">In conversation</a><time datetime="2024-05-24T06:13:51+09:00" title="Friday, 24-May-2024 06:13:51 JST">about 6 months ago</time> <span>from <span><a href="https://shitposter.world/objects/3aa93652-81a1-41a3-b53f-fb07f57a8b9a" rel="external" title="Sent from shitposter.world via ActivityPub">shitposter.world</a></span></span><a href="https://shitposter.world/objects/3aa93652-81a1-41a3-b53f-fb07f57a8b9a">permalink</a><h4>Attachments</h4><ol><li><label><a rel="external" href="https://gnusocial.jp/attachment/2682872">Untitled attachment</a></label><br><a href="https://media.shitposter.world/shitposter.club/4dbb59fac5b36a6e92fa75aa6e99a9cbe0ac7a2cebffd57f3a548f71a17be87b.png?name=D1v-Z32QdTIEtA.png" rel="external">https://media.shitposter.world/shitposter.club/4dbb59fac5b36a6e92fa75aa6e99a9cbe0ac7a2cebffd57f3a548f71a17be87b.png?name=D1v-Z32QdTIEtA.png</a></li></ol></footer></blockquote>

Corresponding Notice

That DΔrn Pooka :verified_think: (theququ@shitposter.world)'s status on Friday, 24-May-2024 06:13:51 JST
I have not read the LLaVA paper but I am assuming it doesn't actually give any spatial information to the text model. It gets a lot of elements in the picture right, but can't determine their positioning. (Llama-3-8B is the model used).
In conversation · about 6 months ago from shitposter.world · permalink
Attachments
1. Untitled attachment
  https://media.shitposter.world/shitposter.club/4dbb59fac5b36a6e92fa75aa6e99a9cbe0ac7a2cebffd57f3a548f71a17be87b.png?name=D1v-Z32QdTIEtA.png
