@regehr @steve Also, re: ISA efficiency, I like re-posting this by-now rather old image that shows what the score really is.
This was on the Xeon Phi, but the general trend holds to this day. (Source: https://people.eecs.berkeley.edu/~ysshao/assets/papers/shao2013-islped.pdf p. 3) NB: this is an in-order core with 512-bit vector units.