Conversation

Notices

pinskia (pinskia@hachyderm.io)'s status on Tuesday, 19-Nov-2024 11:33:10 JST pinskia

@camelcdr With a patch I am working on, aarch64 can vectorize this, but only with -fno-vect-cost-model. The code generated is bad. It looks like GCC does not realize it could unroll the loop 4x to get reasonable code generation (or, with my patch, just 2x).

In conversation about 5 months ago from hachyderm.io permalink
- pinskia (pinskia@hachyderm.io)'s status on Tuesday, 19-Nov-2024 11:50:47 JST pinskia

  @camelcdr Though looking at it again, ARMv9-a's SVE should be able to optimize it in a reasonable fashion, I think. But neither GCC nor LLVM is able to handle it with SVE either.
  
  In conversation about 5 months ago permalink
