In face-to-face communication, both visual and auditory information play an obvious and significant role. In this presentation we discuss work, carried out primarily at KTH, that aims at analyzing and modelling verbal and non-verbal communication from a multimodal perspective. Our studies indicate that both segmental and prosodic phenomena are strongly affected by the communicative context of spoken interaction. One platform for modelling audiovisual speech communication is the embodied conversational agent (ECA). We describe how ECAs have been used in our research, including examples of applications and a series of experiments for studying multimodal aspects of speech communication.