@TedUnderwood you are referring to #RLHF (reinforcement learning from human feedback) as a way for human authors to correct transformer output. But the technique also covers learning preferences from humans, and this aspect has received little attention in the debate about #LLMs, though it may well be decisive for ChatGPT’s success. What is your opinion on this? https://proceedings.neurips.cc//paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html