GNU social JP
GNU social JP is a Japanese GNU social server.
Untitled attachment

Notices where this attachment appears

  1. Tero Keski-Valkama (tero@rukii.net)'s status on Saturday, 27-Jul-2024 07:57:06 JST

    How to refine data for #LLMs? What does it mean that the data has high quality?

    It's not about the data having fewer typos or fewer wrong answers, unless you are training a trivia bot.

    The power of LLMs comes from their modelling the latent processes behind the data (the task trajectories), especially when those processes contain intelligent thought.

    So, when you're generating synthetic data, or refining collected data, you will need to make sure the refinery output is of higher quality than its inputs.

    This means you need to:
    - Add intelligence. Make the new task trajectories perform deeper syntheses, pull in more relevant knowledge, and take steps further. Make more complex task performances out of simpler ones. Go through more possibilities. Go a meta-level deeper, e.g. validate the validations. Use search over alternative solutions.
    - Groom out bad data. Rank, criticize, evaluate, and either improve/fix bad data or recontextualize it.
    - Collect new data which is created by the data refinement processes themselves.
    - Add knowledge from external sources, and synthesize it with the knowledge already known. Also consider the next level implications of all the knowledge already acquired.
    - Apply skills to knowledge to produce new knowledge and new skills.
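
The "groom out bad data" step above can be sketched as a single refinement pass. This is only a toy illustration under stated assumptions: `refine`, `Sample`, `score_fn`, `improve_fn`, and `threshold` are hypothetical names that do not come from the post, and in practice the scorer and improver would likely be LLM-based critics rather than the trivial lambdas used here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    text: str
    score: float
    label: str  # "kept", "fixed", or "flawed_example"


def refine(
    texts: List[str],
    score_fn: Callable[[str], float],   # hypothetical critic: higher is better
    improve_fn: Callable[[str], str],   # hypothetical fixer for weak samples
    threshold: float,
) -> List[Sample]:
    """Rank samples, try to fix weak ones, and recontextualize the rest."""
    out: List[Sample] = []
    for text in texts:
        score = score_fn(text)
        if score >= threshold:
            out.append(Sample(text, score, "kept"))
            continue
        # Below threshold: attempt an improvement and re-evaluate it.
        candidate = improve_fn(text)
        new_score = score_fn(candidate)
        if new_score >= threshold:
            out.append(Sample(candidate, new_score, "fixed"))
        else:
            # Recontextualize instead of discarding: keep the original
            # text, but mark it explicitly as a flawed example.
            out.append(Sample(text, score, "flawed_example"))
    # Rank the refined set, best first.
    out.sort(key=lambda s: s.score, reverse=True)
    return out


# Toy stand-ins: score by word count, "improve" by appending one word.
result = refine(
    ["a b c d", "a", "a b"],
    score_fn=lambda t: float(len(t.split())),
    improve_fn=lambda t: t + " x",
    threshold=3.0,
)
for sample in result:
    print(sample.label, repr(sample.text), sample.score)
```

The design choice worth noting is that weak samples are relabeled rather than deleted, which is one possible reading of "recontextualize"; the other points in the list (adding intelligence, pulling in external knowledge) are not covered by this sketch.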

    LLMs are data-defined. Data isn't a static thing; it needs to be looked at philosophically.

  2. Foone🏳️‍⚧️ (foone@digipres.club)'s status on Friday, 02-Feb-2024 04:50:54 JST

    oh my god

    I think this is even easier than I thought.

    So the function looks basically like this:
    validate_authorization_code (serial_number, expiration_date, max_users, authorization_code), right?

    and it takes the serial_number, expiration_date and max_users, and confirms that the authorization_code matches some hash or something of those values, to make sure the code fits that serial number, expiration date and user limit
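
A check of the kind guessed at above could look roughly like the following. This is a sketch only: the post says "some hash or something", so the use of HMAC-SHA256, the `SECRET_KEY` value, the `|` field separator, the 16-character code length, and the helper `compute_authorization_code` are all assumptions made for illustration, not details of the software being discussed.

```python
import hashlib
import hmac

# Hypothetical secret; the real scheme (if it is keyed at all) is unknown.
SECRET_KEY = b"hypothetical-vendor-secret"


def compute_authorization_code(
    serial_number: str, expiration_date: str, max_users: int, length: int = 16
) -> str:
    """Derive a code from the licence fields (assumed scheme: HMAC-SHA256)."""
    message = f"{serial_number}|{expiration_date}|{max_users}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return digest[:length].upper()


def validate_authorization_code(
    serial_number: str,
    expiration_date: str,
    max_users: int,
    authorization_code: str,
) -> bool:
    """Return True if the code matches the one derived from the fields."""
    expected = compute_authorization_code(serial_number, expiration_date, max_users)
    provided = authorization_code.strip().upper()
    # Constant-time comparison on bytes to avoid leaking match position.
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))


code = compute_authorization_code("SN-0001", "2025-12-31", 5)
print(validate_authorization_code("SN-0001", "2025-12-31", 5, code))
print(validate_authorization_code("SN-0001", "2025-12-31", 50, code))
```

Because the code is derived from all three fields, changing any one of them (for example raising `max_users`) invalidates the code unless it is recomputed, which is the property the post describes.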


GNU social JP is a social network, courtesy of the GNU social JP administrator (GNU social JP管理人). It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.