@jonny @spacyoddity welp, yeah, the "targets" are not always consistently "motor"/production/articulatory targets, either, because THE GOAL IS COMMUNICATION
We already know there is widespread diversity in AmE production of syllable-initial /r/, for example (with at least two modes of tongue shape, retroflex and bunched), but nobody seems to care about those differences, so there is a lot of inter-speaker variation.
@trochee @spacyoddity and the fucked up part of that is how it might not even phenomenologically "sound like" there's any difference, even though there are dramatic differences in how it acoustically "sounds." I swear, I get how all the hard motor theory people just threw up their hands and went "purely auditory phonetic perception is impossible" without any practical exposure to extremely nonlinear classification problems, the kind we have a little bit of now via ANNs.
@spacyoddity @jonny one effect of treating these as articulatory-gesture targets on different tracks is that "coarticulation" explains itself:
a schwa [ə] has no tongue-position target, but does have a laryngeal activation target, so if it's preceded by a [t] and followed by a [k] it will tend to trace tongue-position space from high/front to high/back and come out sounding more like [ɨ].
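A minimal sketch of that interpolation story, with invented target coordinates (none of these numbers come from real articulatory data): segments specify targets on some tracks, and stretches with no target just get interpolated through.

```python
# Toy model: phonemes as *partial* target specifications on an
# articulator track. Segments with no tongue target (like schwa)
# inherit a trajectory interpolated between their neighbors' targets.
import numpy as np

# (tongue_frontness, tongue_height) targets; None = no tongue target.
# Values are illustrative made-up numbers, not measured data.
TONGUE_TARGETS = {
    "t": (1.0, 1.0),  # high front (alveolar closure region)
    "k": (0.0, 1.0),  # high back (velar closure region)
    "ə": None,        # schwa: no tongue-position target at all
}

def tongue_trace(segments, steps_per_seg=10):
    """Interpolate tongue position across a segment string,
    skipping segments that carry no tongue target."""
    anchors = [(i, TONGUE_TARGETS[s]) for i, s in enumerate(segments)
               if TONGUE_TARGETS[s] is not None]
    idxs = np.array([i for i, _ in anchors], dtype=float)
    vals = np.array([v for _, v in anchors], dtype=float)
    t = np.linspace(idxs[0], idxs[-1], steps_per_seg * len(segments))
    front = np.interp(t, idxs, vals[:, 0])
    height = np.interp(t, idxs, vals[:, 1])
    return np.stack([front, height], axis=1)

trace = tongue_trace(["t", "ə", "k"])
# mid-"schwa" lands around (0.5, 1.0): high and central, i.e. [ɨ]-ish
print(trace[len(trace) // 2])
```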
@spacyoddity @jonny haha I'm going to get in trouble with my phonetics teachers
... But I think I'm making the argument that a phoneme stream can be read as target points for some (but often not all) of the articulators. Vowels are usually "turn on laryngeal action" and probably also "tongue position (x, y)", but consonants are more like "articulator z executes action a".
This is related to treating each phoneme as a feature bundle, but it's not the same idea.
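A toy sketch of the difference, with invented articulator names and values: in the targets view a segment can simply leave an articulator unmentioned, whereas a classic feature bundle assigns every phoneme a value for every feature.

```python
# Segments as partial target bundles: only the articulators a segment
# actually constrains appear; everything else is free to coarticulate.
from dataclasses import dataclass, field

@dataclass
class Segment:
    symbol: str
    targets: dict = field(default_factory=dict)  # only what's constrained

# vowel: laryngeal target plus a tongue-body position target
a = Segment("a", targets={"larynx": "voicing on", "tongue_body": (0.2, 0.1)})

# consonant: one articulator executes an action; tongue-body position
# and lip rounding are simply not mentioned
t = Segment("t", targets={"tongue_blade": "alveolar closure"})

# a feature-bundle treatment would instead value *every* feature, e.g.
# t -> [-voice, -continuant, +coronal, -labial, ...], with nothing left
# "unspecified" for neighbors to fill in
```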
@jonny yeah: spectrogram reading is hard, and the nonlinearities of speech production and perception are so wacky, at least partly because speech production isn't a sequence of phonemes but the loosely time-coordinated movement of multiple articulators
Possible to think of each IPA segment as a target in an articulator Hilbert space
We could be writing something like a musical score with clefs for each instrument (lips, tongue root, tongue blade, breath, larynx) instead of IPA
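That score metaphor is essentially what Articulatory Phonology calls a gestural score. Here's a toy ASCII renderer of one, with made-up tracks, timings, and labels for the word "pan":

```python
# Toy "gestural score" printer: one staff line per articulator,
# gestures drawn as labeled spans. Timings and labels are invented
# for illustration; a real score would come from a gestural lexicon.
TRACKS = ["lips", "tongue blade", "tongue body", "velum", "larynx"]

# (track, start, end, label): a rough score for "pan"
GESTURES = [
    ("lips",         0, 2, "closure"),    # [p]
    ("larynx",       2, 8, "voicing"),    # vowel through nasal
    ("tongue body",  2, 6, "low/front"),  # [æ]
    ("tongue blade", 6, 8, "alv clo"),    # [n]
    ("velum",        5, 8, "open"),       # nasality, leaking into the vowel
]

WIDTH = 10
for track in TRACKS:
    cells = ["."] * WIDTH
    for t, s, e, label in GESTURES:
        if t == track:
            cells[s:e] = list(label[: e - s].ljust(e - s, "-"))
    print(f"{track:>13} |{''.join(cells)}|")
```

Note the velum gesture overlapping the vowel: nasalization before [n] falls out of the overlap the same way the [ɨ]-colored schwa did.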
@trochee omfg brilliant. or, like, at least having that to filter the keyboard: like, I know this is a stop, not sure what kind yet, so only show me stops.
interesting, your idea would be like a 3D Wordle, where instead of each guess being a row, each guess would be a variable-length list of features
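A tiny sketch of that keyboard filter over a toy inventory (the feature table below is a stand-in, not a real IPA chart):

```python
# Filter the symbol keyboard down to whatever matches the features
# you're sure of so far. Inventory is a small invented sample.
INVENTORY = {
    "p": {"stop", "labial"},
    "b": {"stop", "labial", "voiced"},
    "t": {"stop", "alveolar"},
    "d": {"stop", "alveolar", "voiced"},
    "k": {"stop", "velar"},
    "s": {"fricative", "alveolar"},
}

def keyboard(known: set[str]) -> list[str]:
    """Return only the symbols consistent with the known features."""
    return [sym for sym, feats in INVENTORY.items() if known <= feats]

print(keyboard({"stop"}))            # ['p', 'b', 't', 'd', 'k']
print(keyboard({"stop", "voiced"}))  # ['b', 'd']
```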
@trochee I had not even thought about it as a learning tool, but wow, that is extremely true and I'm gonna send it to my lab.
and also, I mean, c'mon, you don't deserve that kinda self-talk, since phonetic <-> acoustic mappings are famously ill-defined and interpreting a spectrogram visually is like the most vibes-based thing ever, what with all the deep and fucked up nonlinearities of audition.
@jonny as a spectrogram-reading tool I'm tempted to think about scoring the feature bundles instead of the IPA, like entering /b/ means guessing [+voiced +stop +labial]
you might flag the features as black/yellow/green instead of the entire phoneme, so if the right answer was /v/ I'd get 🟢+voiced, 🟢+labial, ⚫+stop
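A minimal sketch of that scorer, green/black only (a yellow tier for near-misses, e.g. bilabial vs. labiodental, is left out, and the three-phoneme feature table is a toy stand-in):

```python
# Score a guessed phoneme's feature bundle against the answer's:
# each guessed feature is green if the answer has it, black if not.
FEATURES = {
    "b": {"voiced", "stop", "labial"},
    "v": {"voiced", "fricative", "labial"},
    "p": {"stop", "labial"},
}

def score(guess: str, answer: str) -> list[tuple[str, str]]:
    target = FEATURES[answer]
    return [(f, "🟢" if f in target else "⚫") for f in sorted(FEATURES[guess])]

print(score("b", "v"))
# [('labial', '🟢'), ('stop', '⚫'), ('voiced', '🟢')]
```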