Some comments prompted an addition:
1. If the answer is missing from the training set, it's going to go very badly, and undetectably so.
2. If the answer is only weakly represented in the training set, it's going to get very weird.
3. If the training set contains another domain's sense of the same terms, the answer will get semantically "averaged", because the model captures the language structures of both domains. This is a side effect of having a purely predictive language model with zero semantic knowledge (a toy sketch of this blending follows below).
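
To make point 3 concrete, here is a minimal toy sketch, not a description of any real LLM: a bigram counter trained on two made-up corpora that both use the word "bank". The corpora, the word choice, and the bigram model itself are illustrative assumptions; the point is only that a purely predictive model blends the continuation statistics of both domains.

```python
# Toy illustration of point 3: a purely predictive model trained on two
# domains that share a word blends their statistics, producing a
# semantically "averaged" continuation. Corpora are invented for the demo.
from collections import Counter, defaultdict

finance_corpus = [
    "the bank raised interest rates",
    "the bank approved the loan",
    "the bank reported quarterly profits",
]
river_corpus = [
    "the bank eroded after the flood",
    "the bank was covered in reeds",
    "the bank sloped down to the water",
]

def bigram_counts(sentences):
    """Count word -> next-word occurrences; pure co-occurrence, no meaning."""
    counts = defaultdict(Counter)
    for s in sentences:
        words = s.split()
        for w, nxt in zip(words, words[1:]):
            counts[w][nxt] += 1
    return counts

# Train on both domains as one undifferentiated training set.
model = bigram_counts(finance_corpus + river_corpus)

# The distribution after "bank" mixes both senses: the model only has
# language structure, so it cannot tell the domains apart.
following = model["bank"]
total = sum(following.values())
for word, count in following.most_common():
    print(f"P({word!r} | 'bank') = {count / total:.2f}")
```

Running it prints a flat mix of finance continuations ("raised", "approved", "reported") and river continuations ("eroded", "was", "sloped"), which is the "averaging" described above, just at a scale small enough to see.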