In the last decade, auditory-visual speech analysis has benefited greatly from advances in face motion measurement technology. Devices and systems have become more widespread, more versatile, easier to use, and cheaper. Statistical methods for handling the multichannel data returned by face motion measurements are readily available. However, no comprehensive theory or, at minimum, common framework to guide auditory-visual speech analysis has emerged. In this paper it is proposed that Articulatory Phonology [3], developed by Browman and Goldstein for auditory-articulatory speech production, is capable of filling this gap. Benefits and problems are discussed.