As far as I can tell, both GCC and Clang/Rust generate code that uses every SIMD extension the host CPU supports when given "-march=native" ("-C target-cpu=native" for Rust). But do they also tune instruction scheduling for the host's microarchitecture? I'm doubtful.
To make sure the compiler generates code for Zen 3, I changed the flags in /etc/portage/make.conf to "-march=znver3" ("-C target-cpu=znver3" for Rust). The CPU is actually Zen 3+, but it uses the same core microarchitecture, so the scheduling model is the same.
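For reference, here is a minimal sketch of what the relevant make.conf lines could look like. It assumes the stock Gentoo COMMON_FLAGS layout. The "-O2 -pipe" part is just the usual default, not something this change requires. It also assumes that Portage's Rust builds honor RUSTFLAGS set in make.conf.

```shell
# /etc/portage/make.conf (excerpt, sketch)
# Target Zen 3 explicitly. In GCC, -march=znver3 also implies -mtune=znver3.
# "-O2 -pipe" is assumed to be the usual default, not part of this change.
COMMON_FLAGS="-march=znver3 -O2 -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
# Assumption: Rust packages built through Portage pick this up.
RUSTFLAGS="-C target-cpu=znver3"
```

To check what GCC actually resolves a given -march value to, including the implied tuning, you can run `gcc -march=znver3 -Q --help=target | grep -E 'march=|mtune='`. Swapping in `-march=native` shows what the native detection picked on your machine.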
Gentoo really makes me think about how packages could be optimized for many different CPU types.