@paninid @russellmcormond @argv_minus_one @Uair Most discussion centers on whether the corporate project can reach its business goals using the data set. Questions focus on whether the data has the needed features and labels (the structure of the data), whether the data is comprehensive enough, whether the sample distribution is similar to the intended use, whether class imbalances (too much of one label and not enough of another) exist, and how clean or dirty the data is with respect to noise or mislabeling. The potential biases of that data are not part of the conversation. Data quality is generally assessed by some engineer looking at a few samples and forming a personal opinion. The vast majority of projects I have been involved in were "fly by the seat of your pants", and it was a struggle to get people to pay attention to even most of the questions I just listed.
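
Of the questions listed above, class imbalance is probably the cheapest to check with numbers instead of an eyeballed opinion. A minimal sketch, assuming the labels are available as a plain Python list; the function names and the toy "ok"/"defect" labels are made up for illustration and do not come from any particular project:

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of samples carrying each label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def imbalance_ratio(labels):
    """Return (count of most common label) / (count of least common label)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy data: 95 samples of one class, 5 of the other.
labels = ["ok"] * 95 + ["defect"] * 5
shares = label_shares(labels)
ratio = imbalance_ratio(labels)
print(shares)
print(ratio)
```

A large ratio only says the labels are skewed. Whether that skew matches the intended use, and whether the data is biased in other ways, still has to be judged separately.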