@mcc It's not clean at all from a license POV
LLaMA is licensed by Facebook for non-commercial use only, and apparently they've started sending out DMCA take-downs: https://twitter.com/theshawwn/status/1638925249709240322
And the fine-tuning data was generated using OpenAI's GPT, whose terms say you can't use its output to help train a competing model
Cerebras-GPT is much more interesting - it appears to be cleanly Apache 2.0 licensed. It's not instruction-tuned yet, though: https://simonwillison.net/2023/Mar/28/cerebras-gpt/