The wall-clock time to factor a 512-bit RSA modulus seems to be limited mostly by the linear algebra.
When I did it last month, the parts that can't practically be sped up by distributing the compute appear to have been doable in about 80 minutes.
The rest (polynomial selection and lattice sieving) appears to have needed about 14,000 core-minutes. On 512 cores that's about 27 minutes with perfect scaling, so even allowing for overhead those phases should come in under 40 minutes total.
Altogether, that's just under two hours.
Here are the timing logs for my run:
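For anyone who wants to rerun the arithmetic with a different core count, here's a rough sketch of the estimate above. The 80 minutes, 14,000 core-minutes and 512 cores are the figures from this comment. The 70% scaling efficiency is only an assumed stand-in for "overhead", not something measured, and the function name is made up for illustration.

```python
# Back-of-the-envelope wall-clock estimate for a 512-bit factorization.
# Figures from the comment above:
SERIAL_MINUTES = 80             # parts that don't distribute well
PARALLEL_CORE_MINUTES = 14_000  # polynomial selection + lattice sieving
CORES = 512

# Assumption (not measured): fraction of ideal speedup actually achieved.
ASSUMED_EFFICIENCY = 0.7


def parallel_wall_minutes(core_minutes, cores, efficiency=1.0):
    """Wall-clock minutes for the distributable phases."""
    return core_minutes / (cores * efficiency)


ideal_minutes = parallel_wall_minutes(PARALLEL_CORE_MINUTES, CORES)
parallel_minutes = parallel_wall_minutes(
    PARALLEL_CORE_MINUTES, CORES, ASSUMED_EFFICIENCY
)
total_minutes = SERIAL_MINUTES + parallel_minutes

print(f"ideal parallel phases: {ideal_minutes:.1f} min")     # 27.3 min
print(f"with assumed overhead: {parallel_minutes:.1f} min")  # 39.1 min
print(f"estimated total:       {total_minutes:.1f} min")     # 119.1 min
```

Anything above roughly 68% efficiency keeps the distributed phases under 40 minutes on 512 cores. Below that you'd need more cores to stay under two hours.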
https://gist.github.com/ryancdotorg/ec3ef7d6263b981fb1dd7dc07e0c8bd7