Huh.
Ok, so parameter size is determined by (base) model architecture (which makes sense).
So... my model, currently based on GPT-2 small, is about 124M parameters.
Could definitely go higher, and I probably will... just not sure whether what I need is moar data, moar training, or MOAR PARAMETERS.
So many little levers and buttons and things to toy with.
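For what it's worth, that 124M figure falls straight out of the architecture knobs. A quick sketch of the arithmetic, assuming the standard GPT-2 small settings (12 layers, 768-dim embeddings, 50257-token vocab, 1024 context) and a weight-tied LM head:

```python
def gpt2_param_count(n_layer=12, n_embd=768, vocab_size=50257, n_ctx=1024):
    """Count parameters for a GPT-2-style model (weight-tied LM head)."""
    # embeddings: token table + learned positional table
    emb = vocab_size * n_embd + n_ctx * n_embd
    # per transformer block:
    #   two LayerNorms (weight + bias each)
    ln = 2 * (2 * n_embd)
    #   attention: fused QKV projection + output projection, both with biases
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    #   MLP: 4x expansion and back down, both with biases
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    blocks = n_layer * (ln + attn + mlp)
    # final LayerNorm; the LM head reuses the token embedding, so no extra weights
    final_ln = 2 * n_embd
    return emb + blocks + final_ln

print(gpt2_param_count())  # 124439808 — the "124M" of GPT-2 small
```

Scaling up basically means turning `n_layer` and `n_embd` — the embedding dimension dominates, since attention and MLP weights grow quadratically with it.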