@paul @adam_chal I suspect the difference is because on x86_64 the default libc does not use FMA (since it is not part of the x86-64-v1 baseline), while on aarch64 it will use FMA (since FMA is part of armv8-a).
Especially when it is a 2 ULP difference.
Fused Multiply-Add can make a huge difference, since the product is not rounded: the addition is done on the exact result of the multiply, and only the final sum is rounded.
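To make that concrete, here is a rough sketch, not what any libc actually does. It emulates a fused multiply-add with exact rational arithmetic and a single rounding at the end. The function names and the input values are my own, picked so that the product lands exactly halfway between two doubles.

```python
from fractions import Fraction


def mul_add_separate(a, b, c):
    # Two roundings: a * b is rounded to a double, then the sum is rounded again.
    return a * b + c


def mul_add_fused(a, b, c):
    # Emulated FMA (illustrative helper, not a libc routine): compute a * b + c
    # exactly as a rational number, then round once when converting to float.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Hypothetical inputs: the exact product is 1 - 2**-54, which is not a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

separate = mul_add_separate(a, b, c)  # product rounds to 1.0, so the sum is 0.0
fused = mul_add_fused(a, b, c)        # keeps the exact product, giving -2**-54

print("separate:", separate)
print("fused:   ", fused)
```

Here the unfused version loses the answer entirely (0.0 instead of -2**-54). Inside a libm routine the effect is usually much smaller, but it is the same mechanism that can shift a result by an ULP or two. On Python 3.13 or later, `math.fma` should give the fused result directly.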
pinskia (pinskia@hachyderm.io)'s status on Monday, 18-Nov-2024 07:03:20 JST