Found some missing timing constraints on the RGMII interface and improved the clock structure in the FPGA a bit (apparently clocking the IDDR from a BUFG instead of a BUFIO, as I'd been doing so far, was marginal at best).
It's now working (no packet loss) but still fails timing by a small amount. I could probably fix that with an IODELAY, but I'm looking to fix it with the delay lines on the PHY instead (since it has delays built in anyway).
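
To get a feel for how much clock delay is worth asking the PHY for, here is a rough back-of-the-envelope sketch in Python. It is not taken from the actual design: the 4000 ps bit time assumes gigabit RGMII (125 MHz DDR), and the skew, setup, and hold figures are placeholder assumptions that would need to be replaced with numbers from the PHY datasheet and the FPGA timing report. The function names are made up for illustration.

```python
# Rough RGMII sampling-margin estimate. All default numbers are
# illustrative assumptions, not values from a datasheet or timing report.

def sampling_margins_ps(delay_ps, bit_time_ps=4000, skew_ps=500,
                        setup_ps=200, hold_ps=200):
    """Return (setup_margin, hold_margin) in picoseconds.

    delay_ps is how far the capture clock edge is shifted after the
    nominal data transition. The data is assumed stable from skew_ps
    after the transition until skew_ps before the next one.
    """
    window_open = skew_ps                  # earliest time data is stable
    window_close = bit_time_ps - skew_ps   # latest time data is stable
    setup_margin = delay_ps - window_open - setup_ps
    hold_margin = window_close - delay_ps - hold_ps
    return setup_margin, hold_margin


def best_delay_ps(candidates_ps, **kwargs):
    """Pick the candidate delay whose worse margin (setup or hold) is largest."""
    return max(candidates_ps,
               key=lambda d: min(sampling_margins_ps(d, **kwargs)))


if __name__ == "__main__":
    for delay in (0, 1000, 2000, 3000):
        setup, hold = sampling_margins_ps(delay)
        status = "ok" if min(setup, hold) >= 0 else "FAIL"
        print(f"delay={delay:5d} ps  setup={setup:6d} ps  hold={hold:6d} ps  {status}")
```

With these assumed numbers, an edge-aligned clock (zero delay) fails setup, and the margins are balanced when the edge lands in the middle of the bit time. That is the effect the PHY's built-in delay is meant to provide.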