3/
🎯"Fung’s group is curating Indonesian-language data for training models...But collecting more data is unlikely to be enough, because the reams of English text are so large—and still growing"
▶ 🎙️ Yes! You won't be able to "collect your data" out of this problem. The same approach that worked for resourceFUL languages will not work for under-resourced ones. We need to adopt radically different paradigms based on an equitable outlook.