GNU social JP
GNU social JP is a GNU social server in Japan.
Conversation

Notices

  1. Fabian Giesen (rygorous@mastodon.gamedev.place)'s status on Thursday, 22-May-2025 09:10:57 JST

    What's that mysterious workaround?

    Core Huff6 decode step is described in https://fgiesen.wordpress.com/2023/10/29/entropy-decoding-in-oodle-data-x86-64-6-stream-huffman-decoders/

    A customer managed to get a fairly consistent repro for transient decode errors by overclocking an i7-14700KF by about 5% from stock settings ("performance" multiplier 56->59).

    It took weeks of back and forth and forensic debugging to figure out what actually happens, but TL;DR: the observed decode errors are all consistent with a single instruction misbehaving.


    • scriptjunkie repeated this.
    • Fabian Giesen (rygorous@mastodon.gamedev.place)'s status on Thursday, 22-May-2025 09:11:02 JST

      This instruction:
      mov [rDest + <index>], ch

      under these conditions, when overclocked a bit, once the machine has "warmed up", seems to have around a 1/10000 chance of actually storing the contents of CL instead of CH to memory.

      (this was "fun" to debug.)

      The workaround: when we detect Raptor Lake CPUs, we now do

      shr ecx, 8
      mov [rDest + <index>], cl

      instead. This costs more front-end (FE) and uop bandwidth, but this loop is mainly latency-limited, and the store is off the critical path.
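
A minimal sketch of why the two sequences are interchangeable, modelling ECX as a 32-bit integer and memory as a `bytearray`. The function names (`store_ch`, `store_cl_after_shift`, `faulty_store_ch`) are hypothetical and only illustrate the register arithmetic described in the post; this is not the Oodle decoder code, and it does not reproduce the hardware fault itself.

```python
def store_ch(mem, index, ecx):
    """Model of `mov [rDest + index], ch`: store bits 8..15 of ECX."""
    mem[index] = (ecx >> 8) & 0xFF
    return ecx  # ECX is left unchanged by the store


def store_cl_after_shift(mem, index, ecx):
    """Model of `shr ecx, 8` followed by `mov [rDest + index], cl`."""
    ecx = (ecx & 0xFFFFFFFF) >> 8  # logical right shift of the 32-bit register
    mem[index] = ecx & 0xFF        # CL is now what used to be CH
    return ecx  # ECX holds the shifted value afterwards


def faulty_store_ch(mem, index, ecx):
    """Model of the reported misbehaviour: CL is stored instead of CH."""
    mem[index] = ecx & 0xFF
    return ecx


if __name__ == "__main__":
    good, workaround, bad = bytearray(4), bytearray(4), bytearray(4)
    value = 0x12345678
    store_ch(good, 2, value)
    store_cl_after_shift(workaround, 2, value)
    faulty_store_ch(bad, 2, value)
    print(good.hex(), workaround.hex(), bad.hex())
```

Both sequences write the same byte. The difference is that the workaround overwrites ECX with the shifted value, so in this model it is only a drop-in replacement where the old contents of ECX are no longer needed afterwards.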

    • Rich Felker (dalias@hachyderm.io)'s status on Thursday, 22-May-2025 12:58:41 JST
      in reply to 🇺🇦 haxadecimal and Per Vognsen

      @pervognsen @brouhaha @rygorous I mean I'd just take that as indication you need to underclock these pieces of shit by 20% or treat them as ewaste. 🤷

    • 🇺🇦 haxadecimal (brouhaha@mastodon.social)'s status on Thursday, 22-May-2025 12:58:42 JST

      @rygorous
      Or don't overclock?
      But definitely mad props on the detective work.

    • Per Vognsen (pervognsen@mastodon.social)'s status on Thursday, 22-May-2025 12:58:42 JST
      in reply to 🇺🇦 haxadecimal

      @brouhaha @rygorous Fabian can clarify further but I thought it did happen without overclocking, just with extreme rarity? The overclocking is used to make it reproducible in a lab setting.

GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.