Meeting on Music Performance:
analysis, modeling, tools
16-17 March 2001
Department of Electronics and Informatics, University of Padova,
in cooperation with Conservatory B. Marcello of Venice
Abstracts of Young Researcher Presentations
Refined knowledge-based pitch tracking
Stéphane Rossignol
NICI Nijmegen
e-mail: S.Rossignol@nici.kun.nl
www: ...
I present here a method to obtain reliable and precise f0-trajectories from monophonic audio fragments, suitable for the analysis and modeling of vibrato in music performance. Modeling the vibrato during notes and in note transitions requires accurate f0-trajectories. The proposed pitch extraction method takes advantage of the fact that the score, the timing, the instrument and even the fingering are known. The method is divided into three stages. In the first stage the audio signal is fed through a band-pass filter bank: for each harmonic, one time-varying band-pass filter adjusts its width and central frequency according to the pitch information in the score, and information about the instrument is used to adapt the bandwidth to the pitch and to the speed of transitions. In the second stage the frequency and energy trajectories of each partial are computed from the signals obtained in the previous stage. In the final stage the frequency and energy trajectories are merged to provide the optimal f0-trajectory. Here the instrument information is used to decide on the correct interpretation in situations where a higher partial is known to be a louder or more reliable source of f0 information than the fundamental itself, or where the tracks of certain harmonics of certain pitches are known to be distorted by sympathetic resonance.
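A minimal sketch of the first two stages and a simple merging rule, assuming numpy/scipy; the filter order, relative bandwidth and Hilbert-based frequency estimator are illustrative choices, not the exact implementation:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def track_harmonic(x, fs, f_score, k, rel_bw=0.06):
    """Stages 1-2: isolate the k-th harmonic of the score pitch with a
    band-pass filter, then estimate its frequency and energy trajectories."""
    fc = k * f_score                              # centre frequency from the score
    sos = butter(4, [fc * (1 - rel_bw), fc * (1 + rel_bw)],
                 btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, x)                       # stage 1: one filter per harmonic
    z = hilbert(y)                                # analytic signal of the partial
    phase = np.unwrap(np.angle(z))
    f_inst = np.diff(phase) * fs / (2 * np.pi)    # frequency trajectory
    energy = np.abs(z[:-1]) ** 2                  # energy trajectory (same length)
    return f_inst / k, energy                     # frequency referred back to f0

def merge_f0(tracks):
    """Stage 3 (simplified): energy-weighted average of the per-harmonic
    f0 estimates, so that louder, more reliable partials dominate.
    The real final stage also uses instrument knowledge to reject
    partials distorted by sympathetic resonance."""
    f = np.array([t[0] for t in tracks])
    w = np.array([t[1] for t in tracks])
    return (f * w).sum(axis=0) / w.sum(axis=0)
```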
Optimal estimation of parameters in rule systems for musical
performances
Patrick Zanon
C.S.C. DEI - Università di Padova
e-mail: patrick@dei.unipd.it
www: www.dei.unipd.it/~patrick
The use of rule systems for studying expressive deviations in musical performances is complicated by the absence of an objective method for estimating their parameters so as to emulate a given human performance. This work presents a solution to the problem that also allows a comparison between the syntheses produced by different rule systems. To achieve the best fit, the theory of Hilbert spaces is used: a performance or a rule is represented as a vector in a "performance space", in which distances are defined according to the perceptual characteristics of the human ear, and the fit is obtained by orthogonal projection. The results confirm the methodology and give a numerical indication of how closely the selected rule system can approach a human performance.
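A minimal sketch of the projection step, assuming the human performance and each rule's deviation profile are sampled as plain numpy vectors; a perceptually weighted inner product would replace the Euclidean one used here:

```python
import numpy as np

def fit_rule_parameters(R, p):
    """Orthogonally project the performance vector p onto the subspace
    spanned by the rule vectors (columns of R); the least-squares weights
    k minimize the distance ||R @ k - p|| in the performance space."""
    k, *_ = np.linalg.lstsq(R, p, rcond=None)
    residual = np.linalg.norm(R @ k - p)   # how far the rules stay from the human
    return k, residual

# Toy example: two rules acting on a five-note phrase (deviations in ms).
R = np.array([[10., 0.], [-5., 3.], [0., 8.], [5., -2.], [-10., 1.]])
p = np.array([12., -4., 6., 4., -9.])      # measured human deviations
k, d = fit_rule_parameters(R, p)
print(k, d)
```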
Multi-layered Mapping Strategies in Interactive Systems
Gualtiero Volpe
Laboratorio di Informatica Musicale DIST - University of Genova
e-mail: volpe@infomus.dist.unige.it
www: ...
The design and implementation of automatic systems able to operate in artistic
contexts deal with problems such as the automatic recognition of expressive gestures
from data on different channels (audio, video, sensors), the generation of expressive
outputs controlled by high-level expressive parameters, and the search for trade-offs
between the constraints that technology imposes and the freedom and creativity of the
artists involved.
This presentation focuses on a main aspect in the design of applications for
art and entertainment: the strategies that should be used to map the
inputs of a system (on several levels, from the raw data from sensors to the
extracted expressive cues) onto the generated multimodal outputs. How can we
take advantage of information coming from the analysis side in order to synthesize
suitable expressive content? A first multi-layered model for mapping strategies
will be presented. Several kinds of mapping can be considered in this framework
(a sketch of the first two layers follows the list):
(i) Simple direct mapping: for example, functions directly associating the detected
expressive cues with the available synthesis parameters without any kind of
dynamics.
(ii) More complex mapping, including rational processes (e.g., rule-based systems)
and decision making [2]. Consider, for example, a module able to make decisions
on the basis of expressive and environmental information: its decisions can
be used to switch between several available lower-level functions as in (i),
thus allowing the direct mapping to adapt itself to the current situation.
(iii) Strategies taking into account a sort of "performance measure"
of the behavior of the underlying layers with respect to the overall goals of
the system (possibly also including compositional, artistic goals): the selection
among the available sets of rules, the use of particular decision-making algorithms,
and the values assigned to some decision-making parameters (e.g., the weights assigned
to the different aspects on the basis of which decisions are made) can all depend
on such a "performance measure".
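As a minimal sketch of layers (i) and (ii), with hypothetical cue names, parameter names and thresholds (not the actual system's interfaces):

```python
from typing import Callable, Dict

Cues = Dict[str, float]      # extracted expressive cues, e.g. {"energy": 0.7}
Params = Dict[str, float]    # high-level synthesis parameters

# Layer (i): simple direct mappings from cues to parameters, no dynamics.
def calm_mapping(cues: Cues) -> Params:
    return {"tempo": 0.8 + 0.2 * cues["energy"], "loudness": 0.5 * cues["energy"]}

def agitated_mapping(cues: Cues) -> Params:
    return {"tempo": 1.0 + 0.5 * cues["energy"], "loudness": cues["energy"]}

# Layer (ii): a decision module that uses expressive and environmental
# information to switch between the available lower-level mappings.
def decision_layer(cues: Cues, audience_present: bool) -> Callable[[Cues], Params]:
    if audience_present and cues["energy"] > 0.6:
        return agitated_mapping
    return calm_mapping

cues = {"energy": 0.75}
mapping = decision_layer(cues, audience_present=True)
print(mapping(cues))         # parameters handed on to the output generation
```

A layer of kind (iii) would in turn monitor a performance measure of these lower layers and adjust, for example, the switching threshold or the set of available mappings.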
The relationships between mapping strategies and expressive autonomy [1], i.e.
the degrees of freedom the artist intends to leave to the automatic system,
will also be discussed.
References
[1] A. Camurri, P. Coletta, M. Ricchetti, and G. Volpe (in press) "Expressiveness and Physicality in Interaction", Journal of New Music Research.
[2] A. Camurri and G. Volpe (1999) "A goal-directed rational component for emotional agents", in Proc. IEEE Intl. Conf. SMC'99, Tokyo, October 1999.
Accents in drumming: kinematics and timing
Sofia Dahl
KTH and DIST
e-mail: sofiad@speech.kth.se
www: ...
Percussionists are required to perform the same rhythmical pattern on different surfaces with different physical properties. Together with the limitations introduced by the instruments (little more than the timing and dynamics of the music played can be changed), this demands that the performer develop a strategy for certain musical components, e.g. accents. Playing a stroke more strongly than the previous one is a simple task, but it nevertheless needs preparation in order to be performed well: the accented stroke is usually initiated from a greater height, which allows for an increased striking force with the least possible effort. The aims of this study were to investigate the different playing strategies players use when playing an accented stroke, and how the accent affects the timing.
Score Segmenting Models: Horizontal and Vertical
Methods
Goffredo Haus & Massimiliano Pancini
LIM Dipartimento di Scienze dell'Informazione, Università degli Studi
di Milano
e-mail: haus@dsi.unimi.it,
maestro_pan@libero.it
www: ...
In this talk, the LIM research team presents a new approach to the problem of
music score segmentation, built on top of software tools developed in the past.
The previous use of melodic operators was limited by the large number of comparisons
needed for sequence matching, and by a lack of robustness when repetitions within
the musical input are affected by small variations.
Melodic operators are a family of analysis tools designed to extract musical
objects. They detect the manipulations a composer applies to a melody, such as
transposition, inversion and retrograde motion. To extend the focus toward
meaningful moments in the music, a class of semantic operators has been developed.
Thus we introduce the "Harmonic Operators", which identify sets of notes played
simultaneously and detect the harmonic bass; on the basis of the estimated bass
note, the remaining notes are sorted, rebuilding the underlying chord. In this way
we extract a harmonic track that can be further reduced by detecting the less
important symbols, such as ornaments or repetitions, harmonies that are too short,
or harmonies not pertinent to the tonal context. The harmonic track can be considered
the output of a stand-alone analysis stage, but also the central stage of a
score segmenting method. Grouping all notes within a seven-symbol alphabet (i.e.
the scale degrees) reduces the number of operations and increases the validity
of the results.
Assembling a string from the sequence of chord symbols, two methods are designed
to identify possible themes. The first is a syntactic search for suitable
strings of chords. Each decision is taken over the semantic symbols, and a
well-formed phrase is easily recognized by a syntactic cadence parser. A cadence
is a sequence of harmonies governed by simple rules (e.g. "the last chord of
a cadence must be built on the I, V or VI degree", or "a musical
phrase starts where another one ends"). A measure of plausibility is based
on sorting by duration (in bars) and by articulation; in other words, the number
of different chords in a phrase can be used as an index of musical salience.
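A minimal sketch of such a cadence-based phrase recognizer over the seven-symbol alphabet; the rule set, segmentation strategy and plausibility index are illustrative simplifications:

```python
# Chords reduced to scale degrees: the seven-symbol alphabet.
CADENCE_FINALS = {"I", "V", "VI"}   # "last chord of a cadence must be I, V or VI"

def split_phrases(chords):
    """Greedy segmentation: close a phrase on a permitted cadence chord;
    the next phrase starts where the previous one ends."""
    phrases, current = [], []
    for c in chords:
        current.append(c)
        if c in CADENCE_FINALS and len(current) > 1:
            phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases

def plausibility(phrase):
    """More distinct chords -> richer articulation -> higher index."""
    return len(set(phrase))

for ph in split_phrases(["I", "IV", "V", "I", "II", "V", "VI"]):
    print(ph, plausibility(ph))
```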
The second method is a substring search routine able to find repeated strings
in a text. The result is weighted by the number of repetitions of the substring
and, in case of equality, sorted by substring length. Thus we look for the longest
repeated harmonic sequence, since it is the most probable harmonization of a theme.
The substring search is carried out with a novel distance metric: when a mismatch
is found, we examine its origin. A mismatch between different chords with the same
tonal function (e.g. I and VI) is not a "complete mismatch" but a similar
interpretation of the same phrase, and the error is given a smaller weight than
other mismatches (e.g. V and IV).
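A sketch of the function-aware mismatch cost; the grouping of degrees into tonal functions follows standard harmony, while the weights are illustrative:

```python
# Chords sharing a tonal function mismatch at reduced cost.
TONAL_FUNCTION = {"I": "tonic", "III": "tonic", "VI": "tonic",
                  "II": "subdominant", "IV": "subdominant",
                  "V": "dominant", "VII": "dominant"}

def chord_cost(a, b):
    if a == b:
        return 0.0
    if TONAL_FUNCTION[a] == TONAL_FUNCTION[b]:
        return 0.3          # e.g. I vs VI: a similar reading of the phrase
    return 1.0              # e.g. V vs IV: a complete mismatch

def sequence_distance(s, t):
    """Total mismatch weight between two equal-length chord strings."""
    return sum(chord_cost(a, b) for a, b in zip(s, t))

print(sequence_distance(["I", "IV", "V", "I"], ["VI", "IV", "V", "I"]))  # 0.3
print(sequence_distance(["I", "IV", "V", "I"], ["I", "IV", "IV", "I"]))  # 1.0
```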
Moreover, the harmonic operators can be combined with our pre-existing operators.
In fact, rhythmic accent determination (on weak and strong beats) is already
used implicitly by the harmonic analysis to mark the beats at which to search
for the harmonic pulsation, defined for our purposes as the expected duration
of chords (and the relative changes between them). All the other combinations
of methods have been considered in our work.
In a first stage, the harmonic syntactic parser recognizes sequences of chords
respecting the above-mentioned set of simple compositional rules. The same result
can be obtained through the search for substring recurrences.
The two methods can also be adopted jointly, using the sequences recognized in
the first stage as input data for the second; in this case, the final result
is the set of most frequently repeated well-formed sequences.
Furthermore, the output of the harmonic process can be used to obtain start
and end markers of a melodic phrase, which can then be analyzed by the usual
melodic operators. This last level of refinement ensures valuable results in
the search for musically meaningful objects.
Acknowledgements
This work has been made possible by the effort of researchers and graduate students
at LIM. The authors are mainly indebted to Giuseppe Frazzini, Stefano Guagnini,
and Franco Lonati. This research is partially supported under a grant by the
Italian National Research Council (CNR) in the frame of the Finalized Project
"Beni Culturali".
Sound behaviour: taming nasty noises
Adrian Moore, Dave Moore
University of Sheffield
e-mail: a.j.moore@sheffield.ac.uk
www: www.shef.ac.uk/~mu1ajm
From physics to sounds: the piano case
Julien Bensa
LMA, CNRS Marseille
e-mail: bensa@lma.cnrs-mrs.fr
www: ...
In this talk I presented a source/resonance model of piano strings. The aim is to build a physical synthesis model able to reproduce a piano string sound measured on our experimental setup. The model takes into account physical phenomena such as the non-linear behaviour of the hammer/string contact, and the beats and double decay in the amplitude of the sound caused by the transfer of energy between the two or three strings of a single note. The parameters of the model are identified through analysis of experimental signals measured at the bridge. The calibrated model allows a resynthesis of the original sound that is perfect from a perceptual point of view, and can simulate other physical situations when the parameters are modified appropriately.
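A minimal illustration of the beat and double-decay phenomenon alone (not the full source/resonance model): two slightly mistuned, differently damped partials, one per string of the note, whose sum beats and decays in two stages. Frequencies and decay rates are invented for the example:

```python
import numpy as np

fs = 44100
t = np.arange(0, 3.0, 1 / fs)

# Two strings of the same note, slightly mistuned and differently damped:
# their sum shows beats and the characteristic double decay.
s1 = np.exp(-1.2 * t) * np.sin(2 * np.pi * 220.0 * t)
s2 = np.exp(-0.4 * t) * np.sin(2 * np.pi * 220.7 * t)
y = s1 + s2

# Envelope check: fast initial decay (both strings), slower tail (s2 alone).
env = np.abs(y)
print(env[: fs // 10].max(), env[2 * fs:].max())
```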
The Hybrid Model and its applications
Kristoffer Jensen
DIKU, Copenhagen
e-mail: krist@diku.dk
www: www.diku.dk/research-groups/musinf/krist
The hybrid model is a parametric signal model of voiced musical sounds. It models the sound through a spectral envelope, a frequency envelope, envelope split-point times and relative amplitudes, envelope curve forms, and the standard deviation and bandwidth of shimmer and jitter. Although the hybrid model is a signal model, many of its parameters are perceptually important. The hybrid model has been used for analysis/synthesis, sound creation, morphing, classification, and the understanding of acoustic musical instruments.
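A sketch of the parameter set as a data structure; names and grouping are paraphrased from the abstract rather than taken from the model's actual code:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HybridModelParams:
    """Parameters of the hybrid signal model for one voiced sound (sketch)."""
    spectral_envelope: List[float]    # amplitude of each partial
    frequency_envelope: List[float]   # frequency of each partial
    split_point_times: List[float]    # boundaries of the envelope segments
    relative_amplitudes: List[float]  # amplitude at each split point
    curve_forms: List[float]          # shape of each envelope segment
    shimmer_std: float                # std dev of amplitude irregularity
    jitter_std: float                 # std dev of frequency irregularity
    noise_bandwidth: float            # bandwidth of shimmer and jitter
```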
Gesture Based Performances
Declan Murphy
DIKU, Copenhagen
e-mail: declan@diku.dk
www: www.maths.tcd.ie/~dec/dec.html
Having recently started a Ph.D. at DIKU, the author gave an interim presentation
outlining the theoretical research areas being focused on and the practical
projects investigating these areas.
The research areas may be categorised as representations of music, representations
of emotion/expression, and gesture capture (all as applied to music composition
and performance).
On the practical side, a video based gesture capture system is being built,
which is to serve as input for a suite of composition & performance tools
(based on the above research areas) which are also being devised.
Three stepping-stone sub-projects were introduced: (1) a gesture based front
end for the hybrid synthesiser under development at DIKU, (2) a computer conducting
system, (3) a composition & performance based on reflections from conic
sections, including performance-time gesture based sound manipulation.
Preserving directional cues in additive analysis/synthesis
Tue Haste Andersen
DIKU, Copenhagen
e-mail: haste@diku.dk
www: www.diku.dk/students/haste
A number of preliminary psychoacoustic experiments assessing the importance of phase information for the spatial qualities of synthesized sounds from voiced instruments are presented. Binaurally recorded sounds are synthesized using additive analysis/synthesis, by estimating amplitude, frequency and phase in small time steps for each overtone. The synthesis is done by summing all sinusoids, where the phase of each sinusoid is obtained either as the integral of frequency over time, or by cubic interpolation between the estimated frequency and phase values. By using the estimated phase values in the interpolation, the waveform characteristics are preserved in the synthesis. The experiments involve a number of test subjects and show the importance of phase for the localization of voiced instruments. In 60% of the tests, the phase information is crucial for determining the direction of the sound. Furthermore, the average localization error is the same for original sounds and for sounds synthesized with phase, whereas for sounds synthesized without phase the error is substantially higher. These results clearly show that phase information is important when dealing with spatial information in binaurally recorded sounds.
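A sketch of the two resynthesis variants for a single partial, assuming frame-wise analysis estimates; linear interpolation of the unwrapped phase stands in for the cubic interpolation used in the actual experiments:

```python
import numpy as np

fs = 44100
hop = 256                                    # analysis hop size in samples

def synth_partial(freqs, phases, use_phase=True):
    """Resynthesize one partial from frame-wise frequency (Hz) and
    unwrapped phase (rad) estimates taken every `hop` samples."""
    n = np.arange((len(freqs) - 1) * hop)
    frames = np.arange(len(freqs)) * hop
    if use_phase:
        # Phase-preserving variant: interpolate the measured unwrapped
        # phase (linear here, cubic in the reported experiments).
        phi = np.interp(n, frames, phases)
    else:
        # Phase-free variant: phase as the integral of the interpolated
        # frequency trajectory; waveform shape is no longer preserved.
        f = np.interp(n, frames, freqs)
        phi = phases[0] + np.cumsum(2 * np.pi * f / fs)
    return np.sin(phi)

# Toy trajectories for a 440 Hz partial with a small phase offset.
nframes = 40
freqs = np.full(nframes, 440.0)
phases = 2 * np.pi * 440.0 * np.arange(nframes) * hop / fs + 0.1
y_with = synth_partial(freqs, phases, use_phase=True)
y_without = synth_partial(freqs, phases, use_phase=False)
```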