GNU social JP
  • FAQ
  • Login
GNU social JPは日本のGNU socialサーバーです。
Usage/ToS/admin/test/Pleroma FE
  • Public

    • Public
    • Network
    • Groups
    • Featured
    • Popular
    • People

Conversation

Notices

  1. Embed this notice
    Miod Vallat (miodvallat@hostux.social)'s status on Sunday, 03-Aug-2025 04:46:18 JST Miod Vallat Miod Vallat

    Side-quest in the gcc/m88k work: on 88100, the DIV instruction is expensive (38 cycles). It is a bit less expensive on the 88110 (18 cycles), but still a heavy cost compared to other instructions.

    Gcc has logic to try and replace a divide by a constant, by a multiplication by its inverse in a Z/2**NZ ring if a proper N can be found. In other words, instead of dividing, it will multiply by a large number, and shift the result to the right by 32 bits.

    1/n

    In conversation about 3 months ago from hostux.social permalink
    • Embed this notice
      Rich Felker (dalias@hachyderm.io)'s status on Sunday, 03-Aug-2025 04:46:10 JST Rich Felker Rich Felker
      in reply to

      @miodvallat Shouldn't even 4 16x16->32 muls be better?

      In conversation about 3 months ago permalink
    • Embed this notice
      Miod Vallat (miodvallat@hostux.social)'s status on Sunday, 03-Aug-2025 04:46:12 JST Miod Vallat Miod Vallat
      in reply to

      And it turns out this was a quick side quest, and a failure: the reason why this logic only triggers when targetting the 88110 is because only the 88110 has the 32x32->64 multiply instruction. 88100 can only do 32x32->32 which is useless here.

      There goes a good idea...

      In conversation about 3 months ago permalink
    • Embed this notice
      Miod Vallat (miodvallat@hostux.social)'s status on Sunday, 03-Aug-2025 04:46:14 JST Miod Vallat Miod Vallat
      in reply to

      So the side quest is to figure out whether there is a way to cause this logic to trigger, and only fallback to the convoluted 88100 logic (which is there for $REASONS, such as early processors not causing a trap when dividing by zero...) when no suitable multiplication logic can be applied. 4/4

      In conversation about 3 months ago permalink
    • Embed this notice
      Miod Vallat (miodvallat@hostux.social)'s status on Sunday, 03-Aug-2025 04:46:15 JST Miod Vallat Miod Vallat
      in reply to

      I noticed that, when compiling with the `-m88110' option, this "replace a div by a constant with magic mult" logic would fire. And give a ~50% speedup for the operation.

      But not when compiling without any processor selection option, or when compiling with `-m88100'.

      This is because, in the not-targeting-88110 case, the compiler intrinsic for integer divide is more complicated (a "define_expand" rather than a "define_insn"), and this is likely preventing the optimization from being tried. 3/n

      In conversation about 3 months ago permalink
    • Embed this notice
      Miod Vallat (miodvallat@hostux.social)'s status on Sunday, 03-Aug-2025 04:46:17 JST Miod Vallat Miod Vallat
      in reply to

      Most processors have a 32x32 -> 64 integer multiplication routine, and with this you only need to keep the upper 32 bits of the result and you're done.

      (I'm taking shortcuts in the description, but this is quite close to what needs to be done).

      On 88100, integer multiply costs 3 cycles. Even with some overhead to compute the result after this multiply, this will still run circles around the DIV logic. 2/n

      In conversation about 3 months ago permalink
    • Embed this notice
      Rich Felker (dalias@hachyderm.io)'s status on Sunday, 03-Aug-2025 12:53:24 JST Rich Felker Rich Felker
      in reply to
      • John-Mark Gurney

      @encthenet @miodvallat The (x0+x1)*(y0+y1) product is a 17x17->33 mul not a 16x16->32 mul, no?

      In conversation about 3 months ago permalink
    • Embed this notice
      John-Mark Gurney (encthenet@flyovercountry.social)'s status on Sunday, 03-Aug-2025 12:53:25 JST John-Mark Gurney John-Mark Gurney
      in reply to
      • Rich Felker

      @dalias
      You can do it in three:
      https://en.m.wikipedia.org/wiki/Karatsuba_algorithm

      But takes a few more additions.

      @miodvallat

      In conversation about 3 months ago permalink

      Attachments


Feeds

  • Activity Streams
  • RSS 2.0
  • Atom
  • Help
  • About
  • FAQ
  • TOS
  • Privacy
  • Source
  • Version
  • Contact

GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

Creative Commons Attribution 3.0 All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.