Category theory is a branch of mathematics born in the 1940s to model several branches of mathematics with the aid of points, arrows, and diagrams. Categorical thinking has proven useful in other disciplines as well, and thus *applied* category theory has developed. We talked about this in another post.

Books about category theory seem to be "abstract": we might wonder why there are so many arrows and points. But...

Some concepts of category theory can be applied to music as well as to the comparison between visual forms and musical forms. This post is part of my "virtual poster" for the conference Applied Category Theory (ACT) 2020, organized by MIT, which will be held online in July 2020.

Several scholars, including Jedrzejewski, Mazzola, Arias, Popoff, and Clark, have investigated the application of category theory to music.

I also tried to use some basic concepts and some simple connections. I use the language of categories to model musical structures and transformations and also to compare them with visual structures, forms from nature, and growth and transformation processes. In this short post, I will cite a few examples of this approach.

Too ambitious? Let's just smile about it.

Figure 1 shows two points and an arrow between them: it's a graphic representation of a morphism between two objects of a category. In the framework of music, the two points can represent two different values of intensity, such as a *piano* and a *forte*, and the arrow connecting them is a *crescendo*. (We'll see below why we can talk of a *category*).

Let's imagine a double "smile", indicating two different ways to realize a crescendo, for example, slowly or quickly. One becomes the other thanks to an "arrow between arrows" (Figure 2).

Each point itself can contain points and arrows, creating in this way a nested structure (Figure 3).

We can thus obtain a sort of face, where the eyes themselves are also faces (graphs from this article [3]). These ideas can inspire graphic works, such as a fragment of my drawing "Duality" (Figure 4).

Let's go back to music, which shows several nested structures. In the example above, a *crescendo* can be described as a variation from one intensity level to another, e.g., from *piano* to *forte*. But there are several ways to do it, e.g., faster or slower. If each crescendo is represented by an arrow, the time variation between crescendi is represented by an arrow between arrows. Therefore, let the points be the intensities and the arrows be the intensity variations. The composition of two intensity variations yields an intensity variation again. The associative property is easy to verify, and the identity on each intensity is given by the zero intensity variation.

Thus, we have a category with intensities as objects and intensity variations as morphisms. We can define 2-cells where the objects are the intensity variations (1-arrows) and the morphisms are the variations between them, such as a change of speed (2-arrows). The same musical passage can be performed by different musical instruments; comparing these performances, we can define 3-cells, and so on. See this article and this other one for details.
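The category of intensities described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric scale in `LEVELS` and the names `Variation`, `identity`, and `compose` are invented here, and the sketch allows at most one variation between any two levels, which is a simplification.

```python
from dataclasses import dataclass

# Hypothetical numeric scale for dynamics; the numbers are an assumption
# made for illustration, not values taken from the post.
LEVELS = {"piano": 2, "mezzoforte": 3, "forte": 4}


@dataclass(frozen=True)
class Variation:
    """A morphism: a change of intensity from `source` to `target`."""
    source: str
    target: str

    @property
    def amount(self) -> int:
        # Signed size of the change; 0 for an identity morphism.
        return LEVELS[self.target] - LEVELS[self.source]


def identity(level: str) -> Variation:
    """The zero intensity variation on a level."""
    return Variation(level, level)


def compose(first: Variation, second: Variation) -> Variation:
    """Perform `first`, then `second`; defined only if they are head to tail."""
    if first.target != second.source:
        raise ValueError("variations are not composable")
    return Variation(first.source, second.target)


p_to_mf = Variation("piano", "mezzoforte")
mf_to_f = Variation("mezzoforte", "forte")
f_to_p = Variation("forte", "piano")  # a diminuendo

crescendo = compose(p_to_mf, mf_to_f)  # piano -> forte

# Associativity and identity laws, checked on this small example.
assert compose(compose(p_to_mf, mf_to_f), f_to_p) == compose(p_to_mf, compose(mf_to_f, f_to_p))
assert compose(identity("piano"), p_to_mf) == p_to_mf
assert compose(p_to_mf, identity("mezzoforte")) == p_to_mf
```

Distinguishing a slow crescendo from a fast one would require an extra field on `Variation` (for example a duration), and the 2-arrows would then be the changes between such variations.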

Morphisms between categories, called *functors*, can also be defined. Functors transform objects of a category into objects of another category and morphisms of the one into morphisms of the other one. These are basic concepts in category theory, but they nevertheless have a potential for generalization already. I applied these ideas to the form of trees (Figure 5).

Figure 5 shows, on the left, a young (top) and an adult (bottom) *Butia capitata* tree, and, on the right, a young (top) and an adult (bottom) *Coccothrinax argentea* tree. Here, we are using the idea of a category to indicate a species. Morphisms within a category indicate the growth process; morphisms between categories indicate the comparison between the forms of individuals of different species ("form comparison" between objects) and the comparison between growth processes within different species ("growth comparison" between morphisms). This example comes from the book "Mathematics, Nature, Art" [1].
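As a compact reminder, the functor conditions can be written as follows. These are the standard textbook definitions rather than anything specific to the tree example, and the symbols ($F$, $\mathcal{C}$, $\mathcal{D}$, $f$, $g$) are chosen here for illustration. If $\mathcal{C}$ and $\mathcal{D}$ are read as the two species, the first line says that a growth process in one species is sent to a growth process in the other.

```latex
\begin{align*}
f \colon A \to B \ \text{in } \mathcal{C}
  &\;\Longrightarrow\; F(f) \colon F(A) \to F(B) \ \text{in } \mathcal{D}, \\
F(g \circ f) &= F(g) \circ F(f), \\
F(\mathrm{id}_A) &= \mathrm{id}_{F(A)}.
\end{align*}
```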

A suitable functor can turn trees' forms into music. Music develops in time, and thus it can represent the development of a form within the space of sounds. Several scholars see the plants' form as the result of a growth process through time. According to D’Arcy Thompson [6]:

Organic form itself is found, mathematically speaking, to be a function of time... We might call the form of an organism an event in space-time, and not merely a configuration in space.

And, as Francis Hallé [5] points out:

The idea of the form implicitly contains also the history of such a form.

The idea of functor can be applied to the forms of animals and their possible musical renditions as well; see a detailed post on this topic.

Another categorical structure is given by *natural transformations*, which, intuitively, compare the action of different functors. The same book [1] includes an example where the morphisms indicate the blossoming of flowers in an inflorescence, a functor represents the "placement" of flowers within the spherical inflorescence, and another functor represents the placement of flowers into an ellipsoidal inflorescence. The spherical inflorescence is inspired by *Echinops ritro*; the ellipsoidal inflorescence is, however, invented here (see Figure 6). Flowers of *Echinops ritro* have five petals; here, they are simplified with three petals only.
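In symbols, and using the standard definition (the names $F$, $G$, and $\eta$ are assumptions made here, with $F$ and $G$ standing for the spherical and ellipsoidal placements), a natural transformation $\eta \colon F \Rightarrow G$ assigns to every object $A$ a morphism $\eta_A \colon F(A) \to G(A)$ such that the following square commutes:

```latex
\[
\eta_B \circ F(f) \;=\; G(f) \circ \eta_A
\qquad \text{for every } f \colon A \to B \text{ in } \mathcal{C}.
\]
```

Read informally, moving a flower from the sphere to the ellipsoid and then letting it blossom gives the same result as letting it blossom first and then moving it.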

In the same book, I defined a "sonification" functor that transforms single flowers into musical themes and inflorescences into distributions of themes on a sphere (in the space of sounds). In fact, it's possible to define nested categories and functors between functors.

The sonification technique I propose is grounded in the mathematical theory of musical gestures and in their "transferability" from one domain to another.

Regarding gestures, I'd like to cite the definition of musical gesture given by Mazzola and Andreatta [4], see Figure 7, as a mapping from an oriented graph (with points and arrows) to a continuous path (that connects points in a space) within a topological space. Each vertex of the graph is mapped onto a point of the space, and each edge of the graph (arrow) onto a continuous arrow between points of the space, keeping the correspondence of original arrows' tails and heads.

In [2, 4], the oriented graph is Delta, and the set of points and continuous curves belongs to the space X. We can thus define gestures as mappings from Delta to X, and we can represent transformations between these gestures as 2-arrows, which can also be composed. In music, a non-characterized movement (gesture) can become a "piano" and then a "forte" movement (Figure 8).
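To make the definition concrete, here is a minimal Python sketch of a gesture as a mapping from an oriented graph into a space. Everything specific in it is an assumption for illustration: the space X is taken to be the plane, the skeleton is a three-vertex cycle, the coordinates are made up, and the curves are straight segments.

```python
from typing import Callable, Dict, Tuple

Point = Tuple[float, float]
Curve = Callable[[float], Point]  # continuous map from [0, 1] into the plane

# Skeleton (the oriented graph Delta): arrow name -> (tail vertex, head vertex).
skeleton: Dict[str, Tuple[str, str]] = {
    "a": ("1", "2"),
    "b": ("2", "3"),
    "c": ("3", "1"),
}

# Where each vertex lands in the space X (here the plane; coordinates are made up).
vertex_image: Dict[str, Point] = {"1": (0.0, 0.0), "2": (1.0, 0.0), "3": (0.5, 1.0)}


def segment(start: Point, end: Point) -> Curve:
    """Straight-line curve from `start` (t = 0) to `end` (t = 1)."""
    def curve(t: float) -> Point:
        return (start[0] + t * (end[0] - start[0]),
                start[1] + t * (end[1] - start[1]))
    return curve


# Body of the gesture: one continuous curve per arrow of the skeleton.
body: Dict[str, Curve] = {
    name: segment(vertex_image[tail], vertex_image[head])
    for name, (tail, head) in skeleton.items()
}


def is_gesture(skel: Dict[str, Tuple[str, str]],
               vertices: Dict[str, Point],
               curves: Dict[str, Curve]) -> bool:
    """Check that each curve starts at its tail's image and ends at its head's image."""
    return all(
        curves[name](0.0) == vertices[tail] and curves[name](1.0) == vertices[head]
        for name, (tail, head) in skel.items()
    )


assert is_gesture(skeleton, vertex_image, body)
```

Replacing `segment` with a differently shaped curve between the same endpoints gives another gesture on the same skeleton; a continuous deformation from one to the other is the kind of transformation that the 2-arrows represent.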

We can also indicate as *gesture* the set of points in space and curves connecting them. Mazzola and Andreatta use the metaphor of a dancer's continuous movements, connecting discrete steps. I started from this metaphor to run a short study on categories applied to dance (Figure 9).

The orchestral conductor, wanting to conduct, let's say, a ternary time, thinks of a scheme with three points in the air touched according to a precise order; he/she will then join these points by performing some continuous movements in space and time (Figure 10).

We can consider a whole gesture as a point, defining arrows between gestures. The compound gestures are the *hypergestures* [4], which can be recursively defined. The composition of hypergestures (paths) is, as mathematicians would say, associative up to a path of paths. We can re-define hypergestures as equivalence classes of hypergestures, in order to formally have a 2-category. These ideas are discussed and proved in two theorems; see here [3].
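The phrase "associative up to a path of paths" can be illustrated with the usual concatenation of paths. The sketch below is a simplification under stated assumptions: paths live in the real line rather than in a space of gestures, and the names `concat`, `line`, and `phi` are invented here. The two ways of bracketing three paths trace the same points but with different timing, and a reparametrization relates them.

```python
from typing import Callable

Path = Callable[[float], float]  # a path in the real line, parametrized by [0, 1]


def concat(first: Path, second: Path) -> Path:
    """Run `first` on [0, 1/2] and `second` on [1/2, 1], both at double speed."""
    def path(t: float) -> float:
        return first(2 * t) if t <= 0.5 else second(2 * t - 1)
    return path


def line(start: float, end: float) -> Path:
    """Constant-speed path from `start` to `end`."""
    return lambda t: start + t * (end - start)


f, g, h = line(0, 1), line(1, 2), line(2, 3)

left = concat(concat(f, g), h)   # (f * g) * h
right = concat(f, concat(g, h))  # f * (g * h)

# Same start, same end, same points visited, but not the same function of time.
assert left(0.5) == 2.0 and right(0.5) == 1.0


def phi(t: float) -> float:
    """Reparametrization of [0, 1] such that left(t) equals right(phi(t))."""
    if t <= 0.25:
        return 2 * t
    if t <= 0.5:
        return t + 0.25
    return 0.5 * t + 0.5


samples = [0.0, 0.1, 0.25, 0.3, 0.5, 0.8, 1.0]
assert all(abs(left(t) - right(phi(t))) < 1e-9 for t in samples)
```

Interpolating between `phi` and the identity on [0, 1] gives a continuous family of paths from `left` to `right`, which is the "path of paths" in question; passing to equivalence classes makes the composition strictly associative.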

We can take into account the cyclicity of conducting movements in correspondence with the cyclicity of musical time (e.g., the scheme 1-2-3 is repeated in a ternary time; see Figure 11).

If we consider the category having the points in space (and time) needed for orchestral conducting as objects, and the conductor baton's movements joining them (for the sake of simplicity, we are only considering the conductor's right hand) as morphisms, the 2-morphisms are the variations of these movements: speed changes, articulation changes, and so on. The formalism of 2-cells can be useful to analyze a conductor's gestures, comparing them with the movements of orchestral performers. If, for example, the conductor signals a *tutti crescendo*, musicians will make transformations of gestures (2-morphisms) that will have something in common. E.g., a pianist performs a gesture producing a *forte* dynamic; a percussionist would too. If the schemes of their gestures are different, we can call them Delta and Gamma. The pianist's and percussionist's actual movements unfold within parameter spaces X and Y, respectively. The parameters shaped by the pianist are time, hands' position on the keyboard, speed, and so on; the parameters shaped by the percussionist are time, mallets' position, instrument choice, and so on. We can compare the percussionist's and pianist's gestures by using the language of categories; see Figure 12. The double arrows (2-morphisms) act equivalently, making the gestures *similar*.

The same formalism as in Figure 12 can be applied to dance; see Figure 13.

We could investigate *similar* variations in the motion of a violinist's bow, or in a flutist's use of mouth and diaphragm. These concepts lead to gestural similarity [2]. Here is its heuristic version.

Conjecture A.1 (The heuristic conjecture) Two gestures, based on the same skeleton, are *similar* if and only if they can be connected via a transformation:

(1) that homotopically transforms one gesture into the other, and

(2) that also leads to similar changes in their respective acoustical spectra.

Homotopy is a necessary, but not sufficient, condition to get similar gestures.

When the conductor signals a *forte* for all performers, each performer will make different movements according to the parameter space for each instrument. However, the different movements will have some analogies (e.g., an increase of pressure of the bow for the violinist and of airstream pressure for the flutist) producing sound analogies (e.g., an intensity increase and a related timbre variation), which can be retrieved in spectrograms.

The idea of a functor can also be applied to the transformations from the visual symbols of the score to the sound of the performance. Natural transformations help formalize the comparison between different performances of the same composition. In this way, we can compare gestures by different musicians of different orchestras, their sounds, and so on.

Because the conductor's gesture does not directly produce a sound but instead only suggests it, the conductor's gesture appears to be more "abstract" with respect to the orchestral performers' movements. We might say that the conductor's gestures are ontologically different from performers' gestures.

As another topic, the listener and the conductor have opposite roles. Orchestral performance can be seen as a flux of arrows from the conductor (level 1) to the orchestral performers (level 2) to the listener (level 3). In category theory, the constructions which can be obtained by reversing all arrows are called *dual*. In fact, we can build an equivalent (but reversed) formalism, where the listener (level 1) pays attention to performers' movements and sound (level 2) and the performers pay attention to the conductor's gestures (level 3). In addition, the conductor's gestures synthesize the main points of the score and the essential information for performers. For these reasons, we can metaphorically be inspired by categorical constructions, such as limits and colimits, to model these interactions between different levels (Figure 14); find more detailed diagrams here [2].

A limit is the generalization of a product (i.e., simplifying, the arrows *start* from it); a colimit is the generalization of a direct sum (the arrows *converge* to it). In the article [2], the *universal properties* are not described in detail, but some conceptual explanations are given regarding why such a language could be adapted to the conductor/orchestra/listener setting.
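For reference, the universal properties of the two simplest cases, the product and its dual (the coproduct, which is the direct sum in many familiar settings), read as follows. This is the standard formulation, with symbols chosen here for illustration; it is not the specific construction of [2].

```latex
\begin{align*}
&\text{Product: for all } f \colon Z \to A,\ g \colon Z \to B
 \text{ there is a unique } u \colon Z \to A \times B
 \text{ with } \pi_A \circ u = f,\ \pi_B \circ u = g. \\
&\text{Coproduct: for all } f \colon A \to Z,\ g \colon B \to Z
 \text{ there is a unique } v \colon A + B \to Z
 \text{ with } v \circ \iota_A = f,\ v \circ \iota_B = g.
\end{align*}
```

Here $\pi_A, \pi_B$ are the projections leaving the product and $\iota_A, \iota_B$ are the inclusions arriving at the coproduct, which matches the picture of arrows that *start* from a limit and *converge* to a colimit.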

The choice of listener as a limit, even if considered as a metaphor, satisfies the universal property because all the “listening and perception activities” can be reduced to the listener, in contraposition to the sound-production activities. The conductor plays the opposite role: all the “sound production” gestural activities can be related to the conducting gesture, which is a pure gesture without any direct sound production.

The passage toward abstraction, which we can metaphorically represent as a colimit, could be applied in an extra-musical domain such as biology, for instance to species classification (taxonomy). Figure 15 shows, in the case of fish, the progressive abstraction, through arrow composition, from the single species to the genera, and moving upward to reach an equivalent of the most abstract idea of "fish," represented here by a schematic drawing. (The book [1] discusses some differences from the Platonic idea.)

The gestural similarity conjecture can be extended to the interaction between music and images. Let's think of a collection of dots on a piece of paper and a sequence of *staccato* notes: staccato notes and dots can be considered *similar* because they are produced by the same detached gesture. The notes are, so to speak, drawn in the space of sounds, generated by the same creative gesture as the dots.

Clearly, there isn't any one-to-one correspondence, because infinitely many collections of dots can be associated with a given musical sequence. To such a musical sequence we wouldn't associate a continuous line, though. In the same way, we can sonify a given set of dots with different notes, all played *staccato*. We can thus think of *equivalence classes* of possible musical renditions satisfying gestural similarity.

In this way, music generated by an algorithm or "freely" composed can create the illusion of simultaneous production of sound and image; see this interview at Ca' Foscari University in Venice. Whatever technique is chosen for a form's musical rendition, be it free or algorithmic (exploiting Lindenmayer's studies [5] in the case of plants), in my opinion, an effective rendition should satisfy gestural similarity. How to effectively translate a complex form into sound is a non-trivial problem, discussed in the article on "Quantum GestART".

To conclude, let's plunge into a mathematical ocean, where the forms are graphs of parametric equations (or solids of revolution), and the music follows the gestural similarity criterion.

----

[1] Mannone, Maria. (2019). *Mathematics, Nature, Art*. Palermo: Palermo University Press.

[2] Mannone, Maria. (2018). "Introduction to gestural similarity in music. An application of category theory to the orchestra". *Journal of Mathematics and Music*, 12(2): 63-87. https://www.tandfonline.com/doi/abs/10.1080/17459737.2018.1450902

[3] Mannone, Maria. (2018). "Knots, Music, and DNA". *Journal of Creative Music Systems*, 2(2). https://doi.org/10.5920/jcms.2018.02

[4] Mazzola, Guerino, and Andreatta, Moreno. (2007). "Diagrams, Gestures and Formulae in Music". *Journal of Mathematics and Music*, 1(1): 23–46.

[5] Prusinkiewicz, Przemyslaw, and Lindenmayer, Aristid. (2004). *The Algorithmic Beauty of Plants*. New York: Springer.

[6] Thompson, D’Arcy Wentworth. (1966). *On Growth and Form. An Abridged Edition Edited by John Tyler Bonner*. Cambridge, Massachusetts: Cambridge University Press.

"The orchestral conductor, wanting to conduct, let's say, a ternary time, thinks of a scheme with three points in the air touched according to a precise order; he/she will then join these points by performing some continuous movements in space and time ..."

But isn't this (and its concomitant hypergesturality) due more to the accident of physicality than to the necessity of communication? A conductor *must* move her arms from point to point to effect the instruction. A set of traffic lights (say) would not need to. A conductor's motions are forced to adopt some stylistics of execution. A traffic light would still, admittedly, need a 'style of transition' between states but this will be too fast to be noticed by the targets of the communication and thus would need to be inessential. Note I'm not saying conductors should be replaced by machines - I'm just wondering about the necessity of the higher levels here. It seems to be turning, somewhat unnecessarily in this case, a locutionary act into an illocutionary one.

Hi, thank you for reading the post and asking this interesting question! Yes, the physicality in conducting movements is fundamental. A piece of simple information such as the time scanning 1, 2, 3, 1, 2, 3, ..., could easily be replaced by, let's say, some light signals (such as three lights turning on or off one after the other). It's kinda the job of a metronome. But performers need more information than a metronome provides. Thus, the "accident of physicality" becomes a powerful tool, providing more information: how fast do performers have to start playing at movement 2? What is the articulation? (E.g., the conductor's gesture is continuous, but he/she can mimic a staccato as if he/she were drawing it in the air.) How fast are the changes happening? Who is playing? And many, many more musical examples. The conductor is also always a little bit ahead, so orchestral performers have the time to perform the required gestures. Higher levels in mathematics are an attempt to formally capture such musical complexity.

BTW, the comparison with the traffic light is fun; it becomes stronger if we consider an imaginary traffic-conductor that touches points and turns lights on or off. For car drivers, then, the attention shifts from the points themselves to the way of reaching them. The sequence of green and red corresponds to what they have in the score, but the traffic-conductor allows them to predict what is going on: how many car drivers will have to stop in a couple of seconds? How quickly should they restart their cars? And so on. (But it's safer to make experiments with the orchestra rather than with traffic and cars...)

I take the point about the extra information a simple traffic light would be unable to provide. But if you need that, then you need it in any case. Isn't any necessary hypergesturality embedded within arm movements - which, by the way, requires performers to appreciate a (dynamical, say) syntax arguably orthogonal to that of timing conveyed by the (hypo?)gesturality - just another signal which *could* be conveyed via another channel? The staccato example is useful (to me!) since properly prepared performers really ought to have got that more directly from the score anyway (a prior, training, channel)? But - obviously - not everything can be scored, and prompts/reminders are always helpful! A conductor is not merely a timekeeper.

To continue the analogical fun, especially wrt safety, had cars been invented today we would not allow humans anywhere near being in a position to 'conduct' them.