Computational Issues in Physically-based Sound Models

Preface


In the last decade, new techniques for digital sound generation have rapidly gained popularity. These methods can be generically referred to as physically-based, since the synthesis algorithms are designed by modeling the physical mechanisms that underlie sound production. At the same time, high-quality digital audio converters have become standard hardware equipment on personal computers, and the available computational power permits real-time implementation of these synthesis algorithms. Analogously, the physically-based modeling approach has also been adopted by the computer graphics community, for instance in models of radiosity and light propagation. Research in both audio and graphics has shown that such models can provide convincing results and physically consistent control over the synthesis algorithms.

Non-speech sound conveys a large amount of information to the listener, and can be used to augment and modulate visual information. Unlike the visual channel, the auditory channel is always open, and it is of primary importance in the perception of physical events whenever visual cues are lacking or confusing. Based on these considerations, research in multimedia systems is devoting increasing attention to sound, in order to complement vision and to provide the user with a multimodal environment. Moreover, there are cases where graphic rendering is not possible or not cost-effective, whereas audio rendering can be used with little computational overhead.

An important application of physically-based sound models is human-computer interaction, where tactile or force information can be exchanged through suitable sensors and effectors. In this case, auditory information can greatly augment the user's sense of presence and the realism of the interaction. In this respect, physically-based synthesis has two main advantages over other techniques (such as sampling). First, the physical description of the sound algorithms allows the user to interact with the sound objects. As an example, in a physical model of contact, the friction sound of the user's hand on a surface changes with the pressure exerted by the user. Likewise, the sound produced by a struck object varies with the impact velocity. Second, physically-based models for audio and graphics can (in principle) be easily synchronized, so that a high degree of perceptual coherence between acoustic and visual events can be achieved. The effort invested in developing models for joint audio-visual synthesis is rewarded by a simpler design of multimedia systems.

Virtual musical instruments can be regarded as a particular case of human-computer interaction. When playing an acoustical or electro-acoustical instrument, the player interacts with it in a complex way and exerts control by exchanging gestural, tactile and force information. The techniques used in commercial synthesizers are mostly based on wavetable methods (i.e., recorded and post-processed sounds), which allow little sound manipulation. Consequently, the only interface that has been widely used so far in commercial electronic instruments is the piano keyboard, which again gives the user very little control over the synthesized sounds. In this respect, physical models of musical instruments greatly improve the possibilities of interacting with the virtual instrument. As an example, sound production in a physical model of the violin is controlled by parameters such as bow velocity and bow pressure. Analogously, the control parameters in a clarinet model are the player's blowing pressure and mechanical parameters related to the player's embouchure. The design of accurate and efficient physical models of musical instruments encourages the development of more sophisticated interfaces, which in turn give the user access to a large control space.

Many of the above considerations also hold for research in voice production and speech synthesis. Articulatory speech synthesizers produce speech signals through a physical description of the phonatory system in terms of lung pressure, vocal fold vibrations, and vocal tract shape and articulation. Research in articulatory speech synthesis has to a large extent progressed in parallel with research in non-speech sound, with little exchange of information between the two fields, even though the modeling techniques are in many cases very similar. One advantage of the physical modeling approach over other techniques (such as concatenative synthesis or LPC-based analysis/synthesis methods) is that more realistic signals can be obtained. Moreover, the models can (in principle) be controlled using physical parameters, such as lung pressure, tension of the vocal folds, and articulatory parameters of the vocal tract and the mouth. It must be stressed, however, that the problem of control is still open, since finding direct mappings between the physical parameters and perceptual dimensions (loudness, pitch, register) is not a trivial task. Finally, as with non-speech sounds, articulatory models of speech synthesis can be synchronized with graphic articulatory models. The use of so-called "talking heads", in which visual and audio speech signals are synthesized simultaneously, is known to improve speech perception considerably.

In addition to synthesis, physically-based models of both speech and non-speech sound can also be used for coding purposes. In particular, the MPEG-4 standard includes Structured Audio (SA) coding. While traditional lossless or perceptual coding techniques code and transmit the sound signal itself (i.e., the waveform), SA codes and transmits a symbolic description of the sound (i.e., the model and its parameters). The main advantage of this approach is that ultra-low bit-rate transmission can be achieved. Physically-based models are well suited to this approach, since they are a highly structured representation of sound. Moreover, their control parameters have a direct physical interpretation, and therefore vary slowly enough to be used for efficient coding.
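A rough, purely illustrative bit-rate comparison shows why transmitting slowly varying parameters can be so economical. The PCM figure is standard CD-quality stereo; the parameter stream (four control parameters quantized to 16 bits and updated 100 times per second) is a hypothetical example, not a figure taken from the MPEG-4 SA specification, and the one-time cost of transmitting the model description itself is ignored.

```latex
\begin{align*}
R_{\mathrm{PCM}} &= 44\,100~\tfrac{\text{samples}}{\text{s}} \times 16~\tfrac{\text{bits}}{\text{sample}} \times 2~\text{channels} = 1\,411\,200~\text{bit/s},\\
R_{\mathrm{par}} &= 4~\text{parameters} \times 16~\tfrac{\text{bits}}{\text{parameter}} \times 100~\tfrac{\text{updates}}{\text{s}} = 6\,400~\text{bit/s},\\
R_{\mathrm{PCM}} / R_{\mathrm{par}} &\approx 220.
\end{align*}
```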

The focus of this thesis is the development of accurate and efficient numerical methods for the design of physically-based sound models. Physical models are naturally developed in the continuous-time domain and are described through sets of ordinary and partial differential equations, which must subsequently be discretized. Accurate techniques are needed in order to minimize the numerical error introduced in the discretization step, to guarantee the stability of the numerical algorithms, and to preserve as closely as possible the behavior of the continuous systems. At the same time, the numerical techniques have to produce efficient algorithms that can be implemented in real time. These two demands of accuracy and efficiency often conflict and require trade-offs. As an example, implicit and iterative discretization methods guarantee sufficient accuracy but reduce the efficiency of the resulting algorithms. Likewise, low sampling rates are preferable for efficient implementations, but they degrade accuracy.
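The interplay between stability, accuracy and sampling rate can be seen on the simplest lumped system, an undamped mass-spring oscillator. The Python/NumPy sketch below is an illustration under stated assumptions, not code from the thesis: it discretizes x'' = -w0^2 x with the explicit forward Euler method and with the implicit trapezoidal rule (bilinear transform), using a hypothetical 1 kHz oscillator sampled at 8 kHz. The function names `update_matrix`, `normalized_energy` and `trapezoidal_frequency` are illustrative only.

```python
import numpy as np


def update_matrix(method, f0, fs):
    """Return M such that s[n+1] = M @ s[n] for x'' = -w0^2 x.

    The state is s = [x, v / w0], scaled so that both components have the
    same units and the energy is proportional to x^2 + (v / w0)^2.
    """
    a = 2.0 * np.pi * f0 / fs                  # w0 * T
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])    # continuous system: s' = w0 * J @ s
    I = np.eye(2)
    if method == "euler":
        # Explicit forward Euler: s[n+1] = s[n] + T * w0 * J @ s[n]
        return I + a * J
    if method == "trapezoidal":
        # Implicit trapezoidal rule (bilinear transform):
        # (I - a/2 J) s[n+1] = (I + a/2 J) s[n]. The system is linear, so the
        # implicit equation is solved once here rather than at every sample.
        return np.linalg.solve(I - 0.5 * a * J, I + 0.5 * a * J)
    raise ValueError(f"unknown method: {method}")


def normalized_energy(method, f0=1000.0, fs=8000.0, n_steps=200):
    """Energy after each step, relative to the initial energy."""
    M = update_matrix(method, f0, fs)
    s = np.array([1.0, 0.0])                   # unit displacement, at rest
    out = np.empty(n_steps)
    for n in range(n_steps):
        s = M @ s
        out[n] = s @ s                         # initial value of s @ s is 1
    return out


def trapezoidal_frequency(f0, fs):
    """Oscillation frequency (Hz) of the trapezoidal-rule oscillator."""
    return fs / np.pi * np.arctan(np.pi * f0 / fs)
```

Forward Euler multiplies the oscillator energy by 1 + (w0 T)^2 at every step, so this digital oscillator is unstable at any sampling rate. The trapezoidal rule preserves the energy but warps the frequency: the 1 kHz oscillator rings at roughly 953 Hz when sampled at 8 kHz and at roughly 998 Hz at 44.1 kHz. For this linear system the implicit equation costs one matrix solve, done offline; with non-linear elements it must be solved at every sample, which is where the efficiency cost of implicit methods appears.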

The first two chapters review the existing literature and present the general techniques used in the rest of the thesis. The remaining chapters contain original results on various physical models: single reed systems in wind instruments, vocal folds in the human phonatory system, and contact forces in impacts between two resonating objects. It is shown that all of these systems can be (1) modeled using the same approaches, (2) interpreted using very similar structures and functional blocks, and (3) discretized using the same numerical techniques. It is also shown that the techniques used here provide a robust method for the numerical solution of non-linear physical models, while resulting in efficient computational structures.

Chapter 1 discusses in more detail the topics already addressed in this preface. The source modeling approach is compared to other sound synthesis paradigms, then the use of physical models for synthesis and coding purposes is analyzed.

Chapter 2 presents the modeling paradigms and the numerical techniques that are used in the remainder of the thesis. One-dimensional waveguide structures and their application to the modeling of acoustic bores are reviewed in detail. Lumped elements are discussed, as well as their use in modeling a large class of mechanical and acoustic systems. Finally, the issue of discretization is addressed. In particular, a numerical method is reviewed that provides an efficient solution of delay-free computational loops in non-linear algorithms.
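To make the notion of a delay-free loop concrete, the Python/NumPy sketch below works on a hypothetical scalar loop y[n] = f(p[n] + K y[n]), where p[n] collects the terms already computable at time n, K is an instantaneous feedback gain and f is a static non-linearity. The choices f = tanh and K = -0.5 are illustrative assumptions, as are all function names. The sketch does not reproduce the method reviewed in Chapter 2; it only compares three ways of handling such a loop: breaking it with a fictitious unit delay, solving it at every sample by Newton-Raphson iteration, and tabulating the implicit solution y = h(p) offline so that the runtime cost reduces to a lookup.

```python
import numpy as np

K = -0.5  # hypothetical instantaneous feedback gain of the loop


def f(u):
    """Hypothetical static non-linearity closing the loop."""
    return np.tanh(u)


def solve_newton(p, y=0.0, tol=1e-13, max_iter=50):
    """Solve the delay-free loop y = f(p + K*y) for one sample (Newton-Raphson)."""
    for _ in range(max_iter):
        u = p + K * y
        residual = y - f(u)
        slope = 1.0 - K / np.cosh(u) ** 2   # d(residual)/dy, between 1 and 1.5 here
        step = residual / slope
        y -= step
        if abs(step) < tol:
            break
    return y


def run_delayed(p):
    """Naive scheme: break the loop with a fictitious unit delay (uses y[n-1])."""
    y = np.zeros_like(p)
    prev = 0.0
    for n, pn in enumerate(p):
        y[n] = f(pn + K * prev)
        prev = y[n]
    return y


def run_newton(p):
    """Per-sample solution of the loop: one iterative solve per sample."""
    y = np.zeros_like(p)
    prev = 0.0
    for n, pn in enumerate(p):
        y[n] = solve_newton(pn, y=prev)   # warm start from the previous sample
        prev = y[n]
    return y


# Offline: tabulate the implicit map h(p), i.e. the y that solves the loop.
P_GRID = np.linspace(-4.0, 4.0, 801)
H_TABLE = np.array([solve_newton(pg) for pg in P_GRID])


def run_table(p):
    """Runtime: replace the iterative solve by linear interpolation in the table."""
    return np.interp(p, P_GRID, H_TABLE)
```

With an input such as p[n] = 2 sin(2 pi n / 20), the unit-delay scheme already differs from the per-sample solution by more than 0.1 at the first non-zero sample, while the 801-point table stays within 1e-4 of it. The table trades memory for the per-sample iterations, which matters when the loop must run in real time.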

Chapter 3 discusses single reed modeling. A lumped model is reviewed, and an efficient and accurate discretization scheme is developed. It is shown that the behavior of the resulting digital reed closely resembles that of the physical system. The limitations of existing lumped models are discussed, and an improved non-linear model is formulated. In this new formulation, the interaction of the reed with the mouthpiece and with the player's embouchure is taken into account, at the expense of a slight increase in model complexity.

Chapter 4 presents results on models of voice production. Attention is focused on vocal fold modeling rather than on the vocal tract. The Ishizaka-Flanagan (IF) lumped model of the glottis is reviewed, and the structural similarities between this model and single reed models are pointed out. Two glottal models are proposed, both providing a simplified description of the IF model. It is shown that the models can be implemented in an efficient way, while preserving the main properties of the IF model.

Chapter 5 discusses contact models for sound rendering of impacts and develops a hammer-resonator model. It is shown that this model is structurally similar to those described in the previous chapters, and can be discretized with the same numerical techniques. The resulting numerical system has low computational costs and can be implemented in real time. The influence of the physical parameters on the model behavior is also examined. In particular, attention is devoted to the problem of embedding material properties into the model.