@camelcdr Though looking at it again, ARMv9-A's SVE should be able to optimize it in a reasonable fashion, I think. But neither GCC nor LLVM is able to handle it with SVE either.