@jonny for a spectrogram-reading tool, I'm tempted to score the feature bundles instead of the IPA symbols, so entering /b/ commits you to [+voiced +stop +labial]
you could flag each feature black/yellow/green instead of the whole phoneme, so if the right answer was /v/, I'd get 🟩+voiced, 🟩+labial, ⚫+stop
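rough sketch of the per-feature scoring, assuming a tiny hypothetical feature table (`FEATURES`, `score_guess`, `render` are all made up for illustration, not from any real tool). it only does green (the answer shares the feature) and black (it doesn't). the thread doesn't say what yellow should mean at the feature level, so that's left out:

```python
# Hypothetical, deliberately tiny feature table. A real tool would use a full
# distinctive-feature system; these bundles are illustrative assumptions only.
FEATURES: dict[str, tuple[str, ...]] = {
    "b": ("+voiced", "+stop", "+labial"),
    "p": ("-voiced", "+stop", "+labial"),
    "v": ("+voiced", "+fricative", "+labial"),
    "f": ("-voiced", "+fricative", "+labial"),
    "d": ("+voiced", "+stop", "+coronal"),
}

GREEN = "🟩"  # the answer has this feature too
BLACK = "⚫"  # the answer does not have this feature


def score_guess(guess: str, answer: str) -> dict[str, str]:
    """Flag each feature of the guessed phoneme as GREEN or BLACK against the answer."""
    target = set(FEATURES[answer])
    # Keep the guess's own feature order so the output reads like the bundle.
    return {feat: (GREEN if feat in target else BLACK) for feat in FEATURES[guess]}


def render(scores: dict[str, str]) -> str:
    """Format scores as e.g. '🟩+voiced, ⚫+stop, 🟩+labial'."""
    return ", ".join(f"{flag}{feat}" for feat, flag in scores.items())


if __name__ == "__main__":
    # Guessing /b/ when the answer is /v/.
    print(render(score_guess("b", "v")))
```

the payoff vs scoring whole phonemes: a /b/ guess against /v/ still tells you voicing and place were right and only manner was off, which is roughly the info you'd actually read off a spectrogram.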