Happy enough with this performance to move on from optimization for now, but for future reference I've noted a few pieces of low-hanging fruit: splitting chunks into smaller pieces (16x16x16 being the obvious size) for better frustum culling and faster tessellation updates, since editing a block would only re-tessellate a small chunk; reducing the size of vertex attributes and offloading some additional computation to shaders; and possibly moving tessellation off the main thread.
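As a rough sketch of what shrinking the vertex attributes could look like once chunks are 16x16x16, each vertex can be packed into a single 32-bit integer. The bit layout and the `pack_vertex`/`unpack_vertex` helpers below are hypothetical, not taken from the engine. Chunk-local corner coordinates need only 5 bits each (0 to 16 inclusive). A face normal fits in 3 bits as an index into the six cube directions, and the remaining bits hold an ambient-occlusion level and a texture index. The vertex shader would then unpack the fields with shifts and masks and add the chunk's world-space origin, passed in as a uniform, which also moves a little work onto the GPU.

```python
CHUNK_SIZE = 16  # assumes 16x16x16 chunks

# Hypothetical bit layout for one packed 32-bit vertex (low bits first):
#   bits  0-4   x coordinate (0..16; corners extend one past the last block)
#   bits  5-9   y coordinate
#   bits 10-14  z coordinate
#   bits 15-17  face normal index (0..5, one per cube face direction)
#   bits 18-19  ambient occlusion level (0..3)
#   bits 20-31  texture index (0..4095)


def pack_vertex(x, y, z, normal, ao, texture):
    """Pack chunk-local vertex attributes into one 32-bit unsigned integer."""
    if not all(0 <= c <= CHUNK_SIZE for c in (x, y, z)):
        raise ValueError("vertex coordinate outside chunk")
    if not 0 <= normal < 6:
        raise ValueError("normal index must be in 0..5")
    if not 0 <= ao < 4:
        raise ValueError("ambient occlusion level must be in 0..3")
    if not 0 <= texture < 4096:
        raise ValueError("texture index must be in 0..4095")
    return x | (y << 5) | (z << 10) | (normal << 15) | (ao << 18) | (texture << 20)


def unpack_vertex(packed):
    """Inverse of pack_vertex; a vertex shader would do the same shifts and masks."""
    return (
        packed & 0x1F,          # x
        (packed >> 5) & 0x1F,   # y
        (packed >> 10) & 0x1F,  # z
        (packed >> 15) & 0x7,   # normal index
        (packed >> 18) & 0x3,   # ambient occlusion
        (packed >> 20) & 0xFFF, # texture index
    )
```

Compared with a layout of, say, three float positions, three float normals and two float UVs (32 bytes per vertex), this comes to 4 bytes per vertex, an eightfold reduction, at the cost of a few integer operations per vertex in the shader.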