@_wurli if people optimized #numpy to take automatically maximum advantage of all available hardware, e.g multi-core, gpu etc, it sort of makes moot any performance concerns around #python (for a wide range of applications).
The complexity of working with #cuda, c++ bindings or inventing new languages like #mojo would be largely mitigated on a 80/20 principle.
I dont know if there is an intrinsic obstacle for this to happen or if there is some other reason...