MOSART IHP Network

Meeting on Music Performance:
analysis, modeling, tools

16-17 March 2001

Department of Electronics and Informatics, University of Padova,
in cooperation with Conservatory B. Marcello of Venice

 

Abstracts of Young Researcher Presentations



Refined knowledge-based pitch tracking

Stéphane Rossignol
NICI Nijmegen
e-mail: S.Rossignol@nici.kun.nl
www: ...

I present here an elegant method to obtain reliable and precise f0-trajectories from monophonic audio fragments, to be used for the analysis and modeling of vibrato in music performance. Accurate f0-trajectories are needed to model the vibrato during notes and in note transitions. The proposed pitch extraction method takes advantage of the fact that the score, the timing, the instrument and even the fingering are known. The method is divided into three stages. In the first stage the audio signal is fed through a band-pass filter bank. For each harmonic one time-varying band-pass filter is used, which adjusts its width and central frequency according to the pitch information in the score. Knowledge of the instrument is used to adjust the bandwidth to the pitch and to the speed of transitions. In the second stage the frequency and energy trajectories are computed for each partial, using the signals obtained in the previous stage. In the final stage the obtained frequency and amplitude trajectories are merged to provide the optimal f0-trajectory. Here the instrument information is used to decide on the correct interpretation in situations where a higher partial is known to be a louder or more reliable source of f0 information than the fundamental itself, or where the tracks of certain harmonics of certain pitches are known to be distorted by sympathetic resonance.
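As a rough illustration of the three stages, the following Python sketch (assuming numpy and scipy are available, and a single sustained note so that the score pitch is one constant; the actual method uses time-varying filters that follow the score) band-passes each harmonic, measures its instantaneous frequency from the analytic signal, and merges the folded-back tracks using energy weights:

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def track_f0(signal, fs, f0_score, n_harmonics=5, rel_bw=0.06):
        """Estimate an f0 trajectory from a monophonic note, guided by the
        score pitch f0_score (Hz); rel_bw is the relative half-width of
        each band-pass filter (a stand-in for the instrument knowledge)."""
        freq_tracks, energy_tracks = [], []
        for k in range(1, n_harmonics + 1):
            fc = k * f0_score
            sos = butter(4, [fc * (1 - rel_bw), fc * (1 + rel_bw)],
                         btype='bandpass', fs=fs, output='sos')
            band = sosfiltfilt(sos, signal)        # stage 1: isolate partial k
            analytic = hilbert(band)
            phase = np.unwrap(np.angle(analytic))
            inst_f = np.gradient(phase) * fs / (2 * np.pi)  # stage 2: frequency
            freq_tracks.append(inst_f / k)         # fold partial k back to f0
            energy_tracks.append(np.abs(analytic) ** 2)     # stage 2: energy
        F, E = np.array(freq_tracks), np.array(energy_tracks)
        return (F * E).sum(axis=0) / E.sum(axis=0) # stage 3: weighted merge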


Optimal estimation of parameters in rule systems for musical performances

Patrick Zanon
C.S.C. DEI - Università di Padova
e-mail: patrick@dei.unipd.it
www: www.dei.unipd.it/~patrick

The use of rule systems for studying expressive deviations in musical performances is complicated by the absence of an objective method for estimating their parameters so as to emulate a given human performance. This work presents a solution to the problem that also allows a comparison between the syntheses produced by different rule systems. To achieve the best fit, Hilbert space theory is used: a performance or a rule is represented as a vector in a "performance space", in which distances are defined according to the perceptual characteristics of the human ear, and the fit is obtained by an orthogonal projection. The results confirm this methodology, and give a numerical idea of how closely the selected rule system can approach a human performance.
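A minimal numpy sketch of the projection step, under two illustrative assumptions not spelled out above: each rule contributes a fixed deviation profile (one value per note) scaled by its parameter, and the perceptual metric is a diagonal weighting of the notes:

    import numpy as np

    def fit_rule_parameters(R, p, w):
        """R: (n_rules, n_notes) deviation profiles, one row per rule;
        p: measured deviations of the human performance;
        w: perceptual weights defining the inner product <x,y> = x^T W y."""
        W = np.diag(w)
        G = R @ W @ R.T                     # Gram matrix of the rule vectors
        k = np.linalg.solve(G, R @ W @ p)   # optimal rule parameters
        p_hat = R.T @ k                     # orthogonal projection onto the rule span
        distance = np.sqrt((p - p_hat) @ W @ (p - p_hat))
        return k, distance

The length of the residual p - p_hat gives the numerical measure of how closely the rule system can approach the given performance.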


Multi-layered Mapping Strategies in Interactive Systems

Gualtiero Volpe
Laboratorio di Informatica Musicale DIST - University of Genova

e-mail: volpe@infomus.dist.unige.it
www: ...

The design and implementation of automatic systems able to operate in artistic contexts involves problems such as the automatic recognition of expressive gestures from data coming from different channels (audio, video, sensors), the generation of expressive outputs controlled by high-level expressive parameters, and the search for trade-offs between the constraints that technology imposes and the freedom and creativity of the artists involved.
This presentation focuses on a main aspect in the design of applications for art and entertainment: the strategies that should be used in order to map the inputs of a system (on several levels, from the raw data from sensors to the extracted expressive cues) onto the generated multimodal outputs. How can we take advantage of information coming from the analysis side in order to synthesize a suitable expressive content? A first multi-layered model for mapping strategies will be presented. Several kinds of mapping can be considered in this framework (a minimal code sketch of the three layers is given below):
(i) Simple direct mapping: for example, functions directly associating the detected expressive cues with the available synthesis parameters, without any kind of dynamics.
(ii) More complex mapping, including rational processes (e.g., rule-based systems) and decision making [2]. Consider, for example, a module able to make decisions on the basis of expressive and environmental information: its decisions can be used to switch between several available lower-level functions as in (i), thus allowing the direct mapping to adapt itself to the current situation.
(iii) Strategies taking into account a sort of "performance measure" of the behavior of the underlying layers with respect to the overall goals of the system (possibly including compositional, artistic goals): the selection among available sets of rules, the use of particular decision-making algorithms, and the values assigned to some decision-making parameters (e.g., the weights assigned to the different aspects on the basis of which the decisions are made) can depend on such a "performance measure".
The relationships between mapping strategies and expressive autonomy [1], i.e. the degrees of freedom the artist intends to leave to the automatic system, will also be discussed.
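A minimal Python sketch of the three layers; every name, rule and numeric value below is illustrative, not taken from an actual system:

    # Layer (i): direct mappings from a detected expressive cue to
    # synthesis parameters, without any kind of dynamics.
    direct_maps = {
        'soft': lambda cue: {'loudness': 0.3 * cue, 'brightness': 0.2},
        'hard': lambda cue: {'loudness': 0.9 * cue, 'brightness': 0.8},
    }

    class MultiLayerMapper:
        def __init__(self):
            self.threshold = 0.5          # a decision-making parameter

        def map(self, cue, context, performance_error=0.0):
            # Layer (iii): the "performance measure" of the lower layers
            # adjusts the decision-making parameter toward the overall goals.
            self.threshold += 0.1 * performance_error
            # Layer (ii): a decision module switches between the available
            # lower-level direct mappings, given the expressive context.
            key = 'hard' if context['arousal'] > self.threshold else 'soft'
            # Layer (i): apply the selected direct mapping.
            return direct_maps[key](cue)

    mapper = MultiLayerMapper()
    params = mapper.map(cue=0.7, context={'arousal': 0.8})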
References
[1] A. Camurri, P. Coletta, M. Ricchetti, and G. Volpe (In Press) "Expressiveness and Physicality in Interaction", Journal of New Music Research.
[2] A. Camurri, G. Volpe (1999) "A goal-directed rational component for emotional agents" in Proc. IEEE Intl. Conf. SMC'99, Tokyo, October 1999.


Accents in drumming: kinematics and timing

Sofia Dahl
KTH and DIST

e-mail: sofiad@speech.kth.se
www: ...

Percussionists are required to perform the same rhythmical pattern on different surfaces with different physical properties. Together with the limitations introduced by the instruments (the difficulty of changing more than the timing and dynamics of the music played), this puts demands on the performer to develop a strategy for certain musical components, e.g. accents. Playing a stroke stronger than the previous one is a simple task, but it nevertheless needs preparation in order to be performed well. The accented stroke is usually initiated from a greater height, which allows for an increased striking force with the least possible effort. The aims of this study were to investigate the different playing strategies players use when playing an accented stroke, and how the accent affects the timing.


Score Segmenting Models: Horizontal and Vertical Methods

Goffredo Haus & Massimiliano Pancini
LIM Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano

e-mail: haus@dsi.unimi.it, maestro_pan@libero.it
www: ...

In this talk, the LIM research team presents a new approach to the problem of music score segmentation, built on top of software tools developed in the past. The previous use of melodic operators was limited by the large number of comparisons needed for sequence matching and by the lack of robustness when repetitions in the musical input are affected by small variations.
Melodic operators are a family of analysis tools designed to extract musical objects. They find the composer's manipulations of a melody, such as transposition, inversion and retrograde motion. To extend the focus toward meaningful moments in music, a class of semantic musical operators has been developed. Thus, we introduce the "Harmonic Operators", which identify sets of notes played simultaneously and detect the harmonic bass; on the basis of the estimated bass note, the remaining notes are sorted, rebuilding the underlying chord. In this way we extract a harmonic track that can be further reduced by removing the less important symbols, such as ornaments or repetitions, harmonies that are too short, or harmonies not pertinent to the tonal context. The harmonic track can be considered the output of a stand-alone analysis stage, but also the central stage of a score segmenting method. Grouping all notes within a seven-symbol alphabet (i.e. the scale degrees) reduces the number of operations and increases the validity of the results.
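A toy Python sketch of a harmonic operator in this spirit, assuming notes are given as (onset, MIDI pitch) pairs and a fixed C major context (both assumptions made only for illustration):

    # Scale degrees of C major by pitch class; chromatic notes map to '?'.
    DEGREES = {0: 'I', 2: 'II', 4: 'III', 5: 'IV', 7: 'V', 9: 'VI', 11: 'VII'}

    def harmonic_track(notes, window=0.05):
        """Group notes sounding within `window` seconds, take the lowest
        note of each group as the harmonic bass, and reduce it to a
        scale-degree symbol of the seven-symbol alphabet."""
        track, group = [], []
        for onset, pitch in sorted(notes):
            if group and onset - group[0][0] > window:
                bass = min(p for _, p in group)
                track.append(DEGREES.get(bass % 12, '?'))
                group = []
            group.append((onset, pitch))
        if group:
            bass = min(p for _, p in group)
            track.append(DEGREES.get(bass % 12, '?'))
        return track                      # e.g. ['I', 'IV', 'V', 'I']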
Assembling a string from the sequence of chord symbols, two methods are designed to identify possible themes. The first is a syntactic search for suitable strings of chords. Each decision is made on semantic symbols, and a well-formed phrase is easily recognized by a syntactic cadence parser. A cadence is a sequence of harmonies governed by simple rules (e.g. "the last chord of a cadence must be built on a I, V or VI degree", or "a musical phrase starts where another one ends"). A measure of plausibility is based on sorting by duration (in bars) and articulation; in other words, the number of different chords in a phrase can be used as an index of musical utterance.
The second method is a substring search routine able to find a repeated string in a text. The result is weighted by the number of repetitions of the substring and, in case of equality, sorted by substring length. Thus, we look for the longest repeated harmonic sequence, since it is the most probable harmonization of a theme. The substring search routine is accomplished through a novel distance metric. When a mismatch is found, we examine the origin of the error: a mismatch due to a different chord with the same tonal function (e.g. I and VI) is not a "complete mismatch" but a similar interpretation of the same phrase, and the error has a minor weight compared to other errors (e.g. V and IV).
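The function-aware mismatch weight might look like the following Python sketch; the grouping of degrees into tonic, subdominant and dominant functions, and the weight values, are assumptions made for illustration:

    # Tonal function of each scale degree (T = tonic, S = subdominant,
    # D = dominant); an assumed grouping, for illustration only.
    FUNCTION = {'I': 'T', 'III': 'T', 'VI': 'T',
                'II': 'S', 'IV': 'S',
                'V': 'D', 'VII': 'D'}

    def chord_distance(a, b):
        if a == b:
            return 0.0                    # exact match
        if FUNCTION.get(a) == FUNCTION.get(b):
            return 0.3                    # same tonal function: minor weight
        return 1.0                        # complete mismatch (e.g. V vs IV)

    def sequence_distance(s1, s2):
        # distance between two equal-length chord sequences
        return sum(chord_distance(a, b) for a, b in zip(s1, s2))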
Moreover, the harmonic operators can be combined with our pre-existing operators. In fact, the rhythmic accent determination (on weak and strong beats) is already used implicitly by the harmonic analysis to mark suitable beats at which to search for the harmonic pulsation. The harmonic pulsation is defined, for our purposes, as the expected duration of chords (and the relative change between them). All the other combinations of methods have been considered in our work.
In a first stage, the harmonic syntactic parser recognizes sequences of chords respecting the above-mentioned set of simple compositional rules. The same result can be obtained through the search for substring recurrences.
These methods can be jointly adopted, using the sequences recognized in the first stage as input data for the second. In this case, the final result is the group of most frequently repeated well-formed sequences.
Furthermore, the output of the harmonic process can be used to obtain start and end markers of a melodic phrase, which can then be analyzed by the usual melodic operators. This last level of refinement ensures valuable results in the search for musically meaningful objects.
Acknowledgements
This work has been made possible by the effort of researchers and graduate students at LIM. The authors are mainly indebted to Giuseppe Frazzini, Stefano Guagnini, and Franco Lonati. This research is partially supported by a grant from the Italian National Research Council (CNR) in the frame of the Finalized Project "Beni Culturali".


Sound behaviour: taming nasty noises

Adrian Moore, Dave Moore
University of Sheffield

e-mail: a.j.moore@sheffield.ac.uk
www: www.shef.ac.uk/~mu1ajm


From physics to sounds: the piano case

Julien Bensa
LMA, CNRS Marseille

e-mail: bensa@lma.cnrs-mrs.fr
www: ...

During this talk I presented a source/resonance model of piano strings. The aim is to build a physical synthesis model able to reproduce a piano string sound measured on our experimental setup. This model takes into account physical phenomena such as the non-linear behaviour of the hammer/string contact, and the beats and double decay in the amplitude of the sound due to the transfer of energy between the two or three strings of the same note. The parameters of this model are identified through the analysis of experimental signals measured at the bridge. The calibrated model allows a resynthesis of the original sound that is perfect from a perceptual point of view, and can simulate new physical situations by appropriately modifying the parameters.
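The beats and double decay can be illustrated with a minimal Python/numpy sketch: the sum of two slightly detuned, differently damped sinusoids stands in for the coupled string pair (the values are illustrative, not measured piano data):

    import numpy as np

    fs, dur = 44100, 3.0
    t = np.arange(int(fs * dur)) / fs

    f1, f2 = 220.0, 220.8      # Hz: detuning within the string pair
    a1, a2 = 3.0, 0.8          # 1/s: fast and slow decay rates

    # The detuning produces beats; the two decay rates produce the fast
    # initial decay followed by the slower "aftersound" (double decay).
    y = (np.exp(-a1 * t) * np.sin(2 * np.pi * f1 * t)
         + np.exp(-a2 * t) * np.sin(2 * np.pi * f2 * t))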


The Hybrid Model and its applications
Kristoffer Jensen
DIKU, Copenhagen
e-mail: krist@diku.dk
www: www.diku.dk/research-groups/musinf/krist

The hybrid model is a parametric signal model of voiced musical sounds. It models the sound with a spectral envelope, a frequency envelope, envelope split-point times and relative amplitudes, envelope curve forms, and shimmer and jitter standard deviation and bandwidth. Although the hybrid model is a signal model, many of its parameters are perceptually important. The hybrid model has been used for analysis/synthesis, sound creation, morphing, classification, and the understanding of acoustic musical instruments.
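The flavour of the parameter set can be conveyed by a small Python sketch that synthesizes a voiced tone from a spectral envelope, with jitter and shimmer modelled as slowly varying Gaussian deviations (a strong simplification of the actual model; all values are illustrative):

    import numpy as np

    fs, dur, f0 = 44100, 1.0, 440.0
    t = np.arange(int(fs * dur)) / fs
    rng = np.random.default_rng(0)

    def partial(k, amp_k, jitter_std, shimmer_std):
        # slowly varying random deviations of frequency (jitter) and
        # amplitude (shimmer), interpolated from 50 control points
        knots = np.linspace(0, dur, 50)
        jitter = np.interp(t, knots, rng.normal(0, jitter_std, 50))
        shimmer = np.interp(t, knots, rng.normal(0, shimmer_std, 50))
        amp = amp_k * (1 + shimmer) * np.exp(-2.0 * t)   # simple curve form
        phase = 2 * np.pi * np.cumsum(k * f0 * (1 + jitter)) / fs
        return amp * np.sin(phase)

    # spectral envelope: harmonic k has relative amplitude 1/k (illustrative)
    sound = sum(partial(k, 1.0 / k, 0.002, 0.05) for k in range(1, 9))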


Gesture Based Performances
Declan Murphy
DIKU, Copenhagen
e-mail: declan@diku.dk
www: www.maths.tcd.ie/~dec/dec.html

Having recently started a Ph.D. at DIKU, the speaker gave an interim presentation outlining the theoretical research areas being focused on and the practical projects investigating these areas.
The research areas may be categorised as representations of music, representations of emotion/expression, and gesture capture (all as applied to music composition and performance).
On the practical side, a video based gesture capture system is being built, which is to serve as input for a suite of composition & performance tools (based on the above research areas) which are also being devised.
Three stepping-stone sub-projects were introduced: (1) a gesture based front end for the hybrid synthesiser under development at DIKU, (2) a computer conducting system, (3) a composition & performance based on reflections from conic sections, including performance-time gesture based sound manipulation.


Preserving directional cues in additive analysis/synthesis
Tue Haste Andersen
DIKU, Copenhagen

e-mail: haste@diku.dk
www: www.diku.dk/students/haste

A number of preliminary psychoacoustic experiments assessing the importance of phase information for the spatial qualities of synthesized sounds from voiced instruments are presented. Binaurally recorded sounds are synthesized using additive analysis/synthesis, by estimating amplitude, frequency and phase in small time steps for each overtone. The synthesis is done by adding all sinusoids, where the phase of each sinusoid is obtained either as the integral of frequency over time, or by cubic interpolation between the estimated frequency and phase values. By using the estimated phase values in the interpolation, the waveform characteristics are preserved in the synthesis. The experiments involve a number of test subjects, and show the importance of phase for the localization of voiced instruments. In 60% of the tests, the phase information is crucial for determining the direction of the sound. Furthermore, the average localization error for original sounds and for sounds synthesized with phase is the same; for sounds synthesized without phase, the error is substantially higher. These results clearly show that phase information is important when dealing with spatial information in binaurally recorded sounds.
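A minimal Python sketch of the two phase reconstructions compared here, using the standard McAulay-Quatieri form of the cubic interpolation for the phase-preserving case (the abstract does not name the exact formula, so this is an assumption):

    import numpy as np

    def cubic_phase(theta0, theta1, w0, w1, T, n):
        """Phase over one frame of n samples and T seconds, matching the
        estimated phases theta0, theta1 (rad) and angular frequencies
        w0, w1 (rad/s) at both frame boundaries."""
        # integer unwrapping term chosen for the smoothest phase track
        M = round(((theta0 + w0 * T - theta1) + (w1 - w0) * T / 2)
                  / (2 * np.pi))
        d = theta1 + 2 * np.pi * M - theta0 - w0 * T
        a = 3 * d / T**2 - (w1 - w0) / T
        b = -2 * d / T**3 + (w1 - w0) / T**2
        tt = np.linspace(0, T, n, endpoint=False)
        return theta0 + w0 * tt + a * tt**2 + b * tt**3

    def integrated_phase(w0, w1, T, n, theta_start=0.0):
        # phase-discarding alternative: integrate the interpolated frequency
        tt = np.linspace(0, T, n, endpoint=False)
        w = w0 + (w1 - w0) * tt / T
        return theta_start + np.cumsum(w) * (T / n)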