Automatic transcription takes an audio signal and produces text. It's reasonably clear how to evaluate such systems, both systematically (should we use this system for our purposes?) and in a particular use (is that really what was in the audio?).