We have all seen, at least once in our lives, a juggler tossing balls in the air. Why is it so impressive to our eyes?

Despite having just two hands, any respectable juggler can **juggle** three balls at the same time. Assuming for simplicity that each hand can hold only one ball at a time, how is that possible?

Let's try to analyze Animation 1. We can see that each ball is tossed by one hand to the other: the right hand tosses the balls to the left hand and vice versa. Just as a ball in the air is about to fall, the juggler tosses another ball up to free his hand and catch the falling one. Juggling three or more balls is possible only by repeating this principle.

The **pattern** shown in Animation 1 is known as the three-ball *cascade*. Let's now analyze Animation 2 and compare it with Animation 1.

In this case we immediately note that the number of balls is still 3, but the pattern is different. Indeed, by observing it carefully, we see that the juggler tosses the three balls at three different heights.

As you can easily imagine, there is a wide variety of patterns and, if we were to assign a name to each one (as in the case of the *cascade*), remembering them all would require a prohibitive effort.

For this reason Paul Klimek and Don Hatch, at the beginning of the 80s, independently invented a notation system to describe and name juggling tricks, nowadays called **siteswap**. This system was later developed and extended by other jugglers, such as Bruce Tiemann, Jack Boyce and Ben Beever.

Siteswap is able to describe (and name) all juggling patterns with any number of jugglers and balls, covering both the case of *synchronous* and *asynchronous* throws. In the following two animations we can see the same pattern done in both the asynchronous and synchronous versions.

(NOTE: some patterns can be only asynchronous while others can be only synchronous).

For simplicity, we will describe the so-called **Vanilla siteswap.** This siteswap notation allows us to describe all the patterns where the balls are tossed asynchronously by a single juggler using both hands.

Before going through the description of this notation method, we must underline that siteswap has a limitation. Let's observe the following two animations.

We have already seen the left-side animation: the three-ball cascade. The right-side pattern, known as the three-ball *Mills Mess*, is still a cascade, but it is performed by alternately crossing the hands and switching their positions. Even though the two patterns look very different, they have the same siteswap notation, i.e. as far as siteswap is concerned they are identical. Indeed, if we focus on the trajectories of the balls with respect to the positions of the hands, we see that *Mills Mess* is identical to the normal cascade.

Therefore, siteswap is able to describe juggling patterns by considering the height and the direction in which the balls are tossed (a ball can be tossed to the same or to the other hand) but without considering "how" the pattern is executed.

After this quick introduction, we will now describe how siteswap works. The basic idea is very simple: we assign to each throw a positive integer that corresponds to the number of **beats** (we will explore this concept shortly) that the ball takes to complete its trajectory. We use odd numbers (**1**, **3**, **5**, ...) for throws from one hand to the other and even numbers (**2**, **4**, **6**, ...) for throws from one hand to itself. The number zero (**0**) indicates that a hand is not holding a ball during a beat.

In other words:

- A **0** means a beat when the hand is empty.
- A **1** means a direct throw from one hand to the other, during which there is no time to catch or throw other balls, i.e. it is executed in one beat.
- A **2** means a very small (almost imperceptible) throw of a ball to the same hand. While the ball completes its trajectory, the hand that tossed it has no time to do anything else, while the other hand has one beat to catch and throw another ball.
- A **3** means a throw from one hand to the other during which each hand has one beat to juggle a ball (so there is time to juggle two other balls).
- A **4** means a throw from one hand to itself during which the tossing hand can juggle one other ball while the other hand can juggle two (so there is time to juggle three other balls).
- ...

Therefore, the numbers indicate the height at which the balls are tossed relative to the execution speed of the throws. Indeed, it is possible to toss a **5** that peaks below head height if we juggle quickly, or above 3 meters if we juggle slowly. What really matters is the number of beats left to juggle other balls during the flight of the toss. This depends, of course, on the speed of the juggler.

Furthermore, as is easy to guess from the animations above, the patterns are repeated cyclically. In other words, there is a **period** after which the pattern is repeated (identically or symmetrically). With the siteswap notation we only write the throws that make up one period of the pattern. For example, the period of **531531531** is **531**. We refer to it as **531**, removing the redundant part without losing any information.

Once the concepts detailed above are clear, we can try to recognize some patterns:

Once we are familiar with the concept of siteswap we can go through a little bit of theory. Let's try to imagine the pattern **432**. First, say with the right hand, we toss a **4**, i.e. the ball will fall back into the same hand. Then we toss a **3** with the left hand, i.e. the ball will fall into the right hand. While the two balls are still completing their trajectories, the right hand executes a **2**, in other words it performs a small toss to itself. The result is that the right hand will find three balls falling onto it at the same time. In siteswap jargon this event is called a *collision*, and the pattern is impossible to repeat. Indeed, the sequence **432** is not executable.
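The collision can be made visible by listing the beat on which each throw lands. What follows is only a minimal sketch under stated assumptions: beats are numbered from 0, a throw of value `a` made on beat `t` lands on beat `t + a`, and the function names `landing_beats` and `collisions` are ours, chosen for illustration.

```python
from collections import Counter

def landing_beats(pattern, cycles=1):
    """Return the beat on which each throw of the repeated pattern lands.

    Assumption: beats are numbered from 0 and a throw of value a made on
    beat t lands on beat t + a. A 0 is an empty hand, so nothing lands.
    """
    beats = []
    for t in range(len(pattern) * cycles):
        a = pattern[t % len(pattern)]
        if a > 0:
            beats.append(t + a)
    return beats

def collisions(pattern, cycles=1):
    """Beats on which more than one ball would land."""
    counts = Counter(landing_beats(pattern, cycles))
    return sorted(beat for beat, n in counts.items() if n > 1)

print(landing_beats([4, 3, 2]))         # [4, 4, 4]: all three balls land on beat 4
print(collisions([4, 3, 2]))            # [4]
print(collisions([4, 2, 3], cycles=5))  # []
```

With the right hand throwing on beat 0, beat 4 is again a right-hand beat, which matches the description above: three balls arrive at the right hand together.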

How can we distinguish an executable sequence from a non-executable one? Fortunately maths comes to the rescue! Indeed, there is a theorem that characterizes siteswaps and gives us a condition guaranteeing that there are no collisions.

**Characterization theorem of siteswaps**

A finite sequence of non-negative integers $a_1 a_2 \ldots a_n$ (where $n$ is the number of digits) is executable if

$$(a_i + i) \bmod n \neq (a_j + j) \bmod n$$

for each $i \neq j$.

Here, the operator $\bmod$ returns the remainder of the division by $n$.

Let's come back to the previous example and verify, using the theorem, that the sequence **432** is not valid:

$$(4+1) \bmod 3 = 2, \qquad (3+2) \bmod 3 = 2, \qquad (2+3) \bmod 3 = 2.$$

In this case we get 2 for every digit of the sequence and, according to the theorem, this is not a valid siteswap. Let us now apply the theorem to a valid siteswap that can be obtained by switching the last two digits of the sequence above, **423**:

$$(4+1) \bmod 3 = 2, \qquad (2+2) \bmod 3 = 1, \qquad (3+3) \bmod 3 = 0.$$

It is clear that this siteswap respects the condition imposed by the theorem (and you can actually find it in one of the animations above).
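The same check is easy to automate. Here is a minimal sketch, assuming the condition is that the remainders of (digit + position) divided by the number of digits are all different; positions are numbered from 1, which reproduces the remainder 2 found above for every digit of **432**. The function name `is_valid_siteswap` is hypothetical.

```python
def is_valid_siteswap(pattern):
    """Return True if the remainders (a_i + i) mod n are pairwise distinct.

    Positions i run from 1 to n (an assumption chosen to match the
    worked example for 432, where every remainder equals 2).
    """
    n = len(pattern)
    remainders = [(a + i) % n for i, a in enumerate(pattern, start=1)]
    return len(set(remainders)) == n

print(is_valid_siteswap([4, 3, 2]))  # False
print(is_valid_siteswap([4, 2, 3]))  # True
print(is_valid_siteswap([5, 3, 1]))  # True
```

Numbering the positions from 0 instead would shift every remainder by the same amount, so the verdict would not change.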

Suppose now that we have a valid siteswap, for example **534**. How many balls do we need in order to execute it? Once again there is a nice and helpful theorem, used by jugglers all over the world.

**Theorem on the number of balls**

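If $a_1 a_2 \ldots a_n$ is a valid siteswap (where $n$ is the number of digits), then we have that

$$\text{number of balls} = \frac{a_1 + a_2 + \cdots + a_n}{n}.$$

Let's work out how many balls we need for the **534** pattern:

$$\frac{5 + 3 + 4}{3} = 4.$$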

The answer is 4 balls!
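As a quick sketch, the ball count is just the average of the digits. The helper below (the name `number_of_balls` is ours) assumes the sequence has already been checked for validity; it only rejects sequences whose average is not a whole number.

```python
def number_of_balls(pattern):
    """Average of the throw values, i.e. the number of balls in the pattern.

    Assumption: the pattern is a valid siteswap. A whole-number average is
    necessary but not sufficient for validity (432 averages 3 yet collides).
    """
    total, n = sum(pattern), len(pattern)
    if total % n != 0:
        raise ValueError("average is not an integer: not a siteswap")
    return total // n

print(number_of_balls([5, 3, 4]))  # 4
print(number_of_balls([4, 2, 3]))  # 3
print(number_of_balls([5, 3, 1]))  # 3
```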

Do you think that these patterns are science fiction? Watch the following video by Ofek Snir, a great juggler who performs (among other things) some very hard siteswaps with 7 balls.

As already mentioned, this article only covers *Vanilla siteswap*. There are also siteswap notations for other categories of patterns: synchronous patterns, patterns where one hand can hold and toss more than one ball at a time (in jargon, **multiplex**), and patterns executed by more than one juggler (in jargon, **passing**). Here are some examples.

PANGOLIN (Ground Pangolin, *Manis temminckii*)

A pangolin curled up in the defensive position:

When the pangolin is frightened, it curls up into a sort of armored ball that predators are unable to open, but that poachers can easily pick up.

The pangolin is characterized by a strong scale armor that makes it look almost like a small dinosaur. The pangolin is too little to run and too big to hide, thus nature gave it the armor. The particular armor of the pangolin can guide us to the discovery of some basic concepts of category theory.

Let us consider one scale.

Then, two scales.

The repetition of two scales is defined by a transformation that we call g:

The scales are our “objects,” and the arrows g are our “morphisms.” If we compose several g-arrows, we again obtain scales, as the following image shows.

If we make no repetition, that is, if we start from a scale and apply the arrow “1” that gives back the same scale, we have defined what mathematicians call an “identity.” Thus, already the very first observation allows us to define the category “scale,” with its objects, arrows, identity-arrows, and arrow composition.

The whole image of the pangolin in its closed-defensive position is much more complicated. We can try to add new “transformations” to build such a whole image.

After the “horizontal” composition, we add a vertical composition, given by the “vertical” repetition of scales, through the arrow h.

In this way, by combining g and h, we can build several rows of scales, each partially overlapping the next. However, by looking at the real image of the pangolin’s armor, we can see that the rows are offset, as if a little shift had been introduced in the even rows. Let us indicate such a transformation with Sh (where Sh stands for shift and not for ‘s composed with h’):

We obtain all the scales by repeating N-times the action of g and h. To obtain a whole image that is more realistic, we can modify the “general shape” of the image via the arrow I:

Finally, we “close” the obtained figure, to imitate the image of the defensive-position curled-up pangolin’s back. Here, we use letter L which stands for Loop.

Remaining within the category “scales,” we reconstructed a complex image through a sequence of progressive transformations.

If we keep working with an operation typical of categories, we can take such a collection of objects (points) and morphisms (arrows) and apply it to something else. It is like… having some bridges, each from an initial to a final point (to be suitably shifted onto a new initial and a new final point), and transforming each bridge into another bridge: we would create a “bridge of bridges”! The concept of “transformation between transformations” is, in fact, the “primum movens” that gave birth to category theory in the 1940s.

Now, let us try to apply all of that to music. We can move from the category “images” to the category “musical fragments.”

There are endless ways (mappings) to transfer a visual shape into a musical structure (more generally, a set of non-sound data into sound data). We are in the field of “sonification.”

In our analysis, we can choose a melody that imitates the upper contour of the scale, with a rising and falling movement. Is it a coincidence that, in English, the term “scale” applies both to music and to a part of the armor?

Now, let us apply to music, one by one, each transformation we applied to images. Our priority is to suitably translate each arrow from the visual domain into the sound domain.

The horizontal repetition g can be ‘translated’ into a time repetition (g’) of a musical fragment; the vertical repetition h becomes a simultaneous repetition of the same melodic fragment at different pitches (h’); the spatial shift of even rows becomes a time shift of even melodies (Sh’).

There are several ways to musically render the transformations I and L. One possible choice is to assign an intensity change to the change of shape (I’), and a cycle of repetition of fragments and melodies to the loop (L’). We get a series of musical fragments (objects) connected by transformations (the arrows g’, h’, Sh’, I’, L’). The composition of two or more arrows still gives musical fragments, and an arrow that does not change anything and returns a musical fragment identical to the initial one is the identity. In this way, we build up the category of musical fragments.

We obtain the same result whether we sonify each image separately or sonify only the initial one and then apply the musical transformations. Because of this invariance of results, the diagrams we obtain are called “commutative.”

The transformation that brings each image into a musical fragment (red arrows), and each visual transformation into a musical transformation (green arrows), is the “sonification functor.” In category theory, a functor is a mathematical entity that transfers objects and morphisms of a category into objects and morphisms of another category. It is a sort of… hyper-bridge between bridges and islands!

Let us call S_{1} the functor we just defined: we have g’ = S_{1}(g), h’ = S_{1}(h), …, and so on.
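The commutativity of one square of the diagram can be sketched in a few lines of code. Everything here is an assumption made for illustration and not the author's actual mapping: an image is modelled as a tuple of contour heights, `g` is the horizontal repetition, `sonify` plays the role of S_{1} by turning each height into a MIDI pitch, and `g_prime` is the time repetition g’.

```python
def g(image):
    """Horizontal repetition: place a copy of the contour next to itself."""
    return image + image

def sonify(image, base_pitch=60):
    """Hypothetical S_1 on objects: each contour height becomes a MIDI pitch."""
    return tuple(base_pitch + height for height in image)

def g_prime(fragment):
    """Time repetition: play the musical fragment twice in a row."""
    return fragment + fragment

scale = (0, 2, 3, 2, 0)  # made-up rising and falling contour of one scale

path_a = sonify(g(scale))        # transform the image, then sonify it
path_b = g_prime(sonify(scale))  # sonify the image, then transform the music
print(path_a == path_b)          # True: this square of the diagram commutes
```

The identity arrow is simply the function that returns its argument unchanged, and composing `g` with itself corresponds to composing `g_prime` with itself on the musical side.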

Let us now suppose we have another sonification functor, S_{2}. Category theory also allows us to describe what changes between the melodic fragments produced with S_{1} and those produced with S_{2}, if we restrict the analysis to the symbolic point of view. These transformations are called “natural”: they are transformations α defined from S_{1} to S_{2}. All of the above can be summarized in the following image.

With the tools provided by category theory it is then possible to create music, to analyze written music (by “decomposing” the structures as we did for the pangolin’s armor), and to analyze basic and advanced elements of musical practice. One example among many: a crescendo at the piano, from ‘piano’ to ‘forte’, can be described through an arrow; a slower crescendo and a faster one can be represented as two arrows between the same points, connected by a temporal transformation (an arrow between arrows).

Summarizing, the particular example we considered can also be interpreted as a structure to compose, program, and also as a scheme to improvise music. The following link leads us to an improvisation whose unique “score” is constituted not by notes to play, but by transformations to apply.

(Idea, study, schemes, and drawings by Maria Mannone)

**Selected References.** A very clear textbook to start the study of category theory is “Conceptual Mathematics” by Lawvere and Schanuel. For a deeper reading, we suggest the classic textbook “Categories for the Working Mathematician” by Saunders Mac Lane, and, for an interdisciplinary view on the topic, the reader can consult “Category Theory for the Sciences” by David Spivak. The recent literature on mathematics and music about the applications of category theory to music includes the works by Guerino Mazzola, Franck Jedrzejewski, and Alexandre Popoff. The Italian book “Le Figure della Musica” by the composer Salvatore Sciarrino, even if not explicitly talking about category theory, highlights the importance of concepts of elementary mathematics to compare structures and transformations between music and visual arts. The works by Lawrence Zbikowski concerning the relationships between music and movement from a cognitive point of view can find in category theory a formal explanation that graphically describes the “transformations of transformations.” The philosopher Charles Alunni uses diagrammatic thinking in his works, and the physicist John Baez, an expert on the topic, runs an interdisciplinary blog considered one of the most important references for category theory. My studies are about applications of category theory to the orchestra, structures in composition, and relationships between music and images.


I've always been amazed by the beauty of nature and its wonderful patterns: symmetries, spirals, meanders, waves, cracks or stripes.

At the very beginning I started creating artworks with basic geometry and fractals, but later I discovered the possibilities of using randomness, physics, autonomous systems, data or interaction to get more expressive and meaningful artwork.

My goal as a generative artist and experimental animator is to create autonomous systems that make the essence of nature's beauty emerge by modeling not only its appearance but its behavior.

Mathematics, physics and computation are essential tools to implement a set of rules that an autonomous system must follow to simulate nature's appearance and behavior.

When I started this project I had in mind many different ideas related to generative art and creative processes, and I was eager to explore them. But I had no idea what path to follow. "*Spaghetti coding*" is a pejorative phrase in the programming world for a piece of code that has a complex and tangled control structure. So basically I called the website "*Spaghetti Coder*" because it reflected my state of mind at the beginning of the project.

"Spaghetti coder" is a creative and experimental art project inspired by generative artists from the past century such as Ellsworth Kelly, Sol LeWitt or François Morellet, but using not only chance as the main generator but also physics, agents, data or interaction.

I was inspired by the essence of the pioneers of generative art, and I'm trying to evolve that conceptual vision using today's tools (programming, media, data, IoT) and exploring how it fits into the world of experimental animation.

Depending on the nature of the artwork I'm developing at the moment, I use many different mathematical tools: trigonometry, vector spaces and calculus, operators and matrices, densities and distributions, combinatorics, graph theory ...

I usually write my code using programming languages and frameworks such as Java, Processing, C++, openFrameworks and GLSL to generate visual products (drawings, animations, 3D graphics), or SuperCollider and ChucK to compose algorithmic audio (soundtracks, audio effects).

When I use different programs that work together, or when there is some kind of user interaction through hardware (a MIDI controller, an instrument or an IoT device), a communication protocol is required. In those cases I frequently use the OSC, MIDI or DMX protocols.

AAAC (An Autonomous Agent Choreo), from Spaghetti Coder on Vimeo.

I think I was able to make some kind of rhythm emerge through the motion and sound of more than 5,000 autonomous agents, and in the end it suggested many interesting variations in the global perception of basic elements of design (color, shape and texture). I'm also really happy with people's feedback and with how AAAC did at animation festivals. It has been screened at many of them: CutOut Fest, SIMULTAN, Tasmanian International Video Art Festival, Seoul International Cartoon & Animation Festival, Azores Fringe Festival, Bogotá Experimental Film Festival, Punto y Raya Festival...

The only experience that is worse than the annual condo meeting or queuing up at the post office is, probably, the parent-teacher conference.

It's an ordeal for the parents, forced to wait a long time. It's a cause of panic for the students, who are afraid that their parents may be mad at them. But, I assure you, it's a terrible experience for the teachers too.

Obviously, since bad marks in maths are widespread in Italian schools (but the same can be said of schools all over the world), queues to speak to the maths teachers are never-ending, exactly like queues at the post office. In such situations, the long waits bring out the best and worst behaviours one could possibly imagine.

As an experienced teacher, I tend to split parents into three categories: the strikers, the compliant and the flatterers.

The strikers are the funniest ones. They sit down, ask you how their child is doing and, before you have even finished speaking, they charge. They do it recklessly, without thinking, coming up with completely made-up arguments like:

- "Last year, with the previous teacher, my son had good marks" (Actually he had an E; with me he has at least got to a C)
- "But, in elementary school he was great" (Sure, but that was 12 years ago, now we're studying limits, you may agree that that is somewhat more difficult)
- "I am an Engineer and I know what I'm talking about. My son told me that the results were correct. I don't understand why you gave him a D" (That's correct, too bad that he copied the results without showing how he got them. How did he get them? Was it some sort of godsend?)
- "The guy who gives private lessons to my son says that, with him, he always solves the exercises." (OK, what should the poor guy say other than your son can solve the exercises... with his help?)

The flatterers are, instead, the most dangerous ones. They always start with something like:

- "My daughter said that, thanks to you, she finally gets maths. She couldn't get it with last year's teacher." You would like to say: "Look, you have probably said the same thing to the other teacher when you were talking about your other child. Also, too bad that last year's teacher was me.", but you just shut up and pretend to play along. The parent, in the meantime, scrutinizes you, trying to understand whether he fooled that idiot of a teacher or not.

- "Your explanations are excellent", they go on, "and my son is working really hard. It's a shame that the last exam didn't go so well. But I'm sure that if you could help out with the grades, he would get back on track. It would be too bad if he lost motivation given that you're such a great teacher." And then they may speak ill of the other teachers, expecting you to side with them against the guy who teaches philosophy or the one who teaches science.

With this lot, I usually keep a low profile. I don't expose myself to their trap and let them talk. Not much, but I do let them talk. Luckily the fact that there's a queue of people waiting to speak with me is a good excuse.

The compliant parents, instead, are the ones who try to plead for compassion.

"My son's going through many health problems, you know. He finds it hard to eat and has been so weak lately." And you say: "Are you talking about Alessandro, madam? The 6'4" tall guy who plays football four times a week with the semi-pro team?"

Then, after describing the most improbable health and family problems, and saying that their child has had many different teachers over the school years, the compliant parent plays what they think is the ace up their sleeve:

"Anyway, sir, we can't expect much from my son. Nobody in the family really understands a thing about maths."

Now you would like to tell them:

*"Luckily, neither stupidity nor not understanding maths are hereditary. Actually, your son has many chances to get better."* But clearly you don't say that and try to encourage them reminding them that it takes time to catch up with maths.

In the end, this is the problem: we live in a society where it's not a problem to say:

"I never understood maths."

Many are even proud of this. How do they not understand that by saying that openly, what they're actually saying to their children is:

"Give up. Maths is useless. I never studied it and yet here I am."

The problem lies with the parents (and with some of the humanities teachers) who candidly say that they do not understand maths. **They don't realize that they are indirectly giving their children permission not to care about maths!**

When this happens, a teacher is almost powerless. As long as family and society take it for granted that one can live without maths, and that maths is not part of a good citizen's cultural baggage, this is a hard battle to fight.

You will always meet students who say "Sir, I never got maths", and they will feel justified by society.

But the parent-teacher conference comes to an end too, sooner or later. At the end of the day, the maths teacher goes back home a little beaten up, a little dizzy because of all the words that have been spoken. He thinks with some melancholy about those "normal" parents (there are some, obviously) with "normal" children (what a blessing!).

The next day, you start your classes again, hoping to convince your students, who are so sure they don't get maths, to change their minds. You start your classes again, trying to help them love, as much as they can, the subject that you yourself love (thanks to some other good teacher).

P.S.

I am sure that in a few years, when I attend the parent-teacher conference as a parent, I will fall into one of the categories I've been talking about. There's nothing you can do about it, being a parent is really hard... harder even than mathematics!

(translated by Stefano)


From now on we consider a metric space. As my previous post made clear, a metric space is precisely a space in which we can define a distance. If we use a Euclidean metric, it is simple to prove, and also to understand intuitively, that the shortest path between two points is the line that goes directly from the first point to the second.

Now let us try to repeat the same with a different metric. We can use the taxi driver (taxicab) metric and draw the minimum-length curve between two points. In the following picture the minimum-length curve is the line crossing the fewest blocks.

The first important concept we can appreciate from the picture is that the minimum-length path is not unique in some metrics: in some metric spaces, more than one minimum-length curve exists.
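To make the comparison concrete, here is a small sketch of the two distances, together with a count of the shortest routes on a grid of city blocks. The function names are ours, and the count assumes the taxi may only drive along an integer street grid, one block at a time.

```python
import math

def euclidean(p, q):
    """Ordinary straight-line distance in the plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def taxicab(p, q):
    """Taxi driver distance: blocks travelled east-west plus north-south."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

def count_shortest_routes(p, q):
    """Number of distinct shortest routes on an integer street grid.

    A shortest route is a sequence of dx horizontal and dy vertical
    blocks in any order, so there are C(dx + dy, dx) of them.
    """
    dx, dy = abs(q[0] - p[0]), abs(q[1] - p[1])
    return math.comb(dx + dy, dx)

a, b = (0, 0), (3, 4)
print(euclidean(a, b))              # 5.0
print(taxicab(a, b))                # 7
print(count_shortest_routes(a, b))  # 35
```

Between these two corners there is one Euclidean segment of length 5, but 35 different taxi routes of length 7, none shorter than the others.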

Everyone can draw a straight line across a sheet of paper; how straight probably depends on how good you are at drawing. Given two points, we can draw the line from one point to the other. Our brain is used to thinking of the surrounding space in a Euclidean metric, so we picture the segment connecting two points as a piece of a straight line. It is that easy.

In some metric spaces a straight line is not a straight line in the sense we are used to. In other metric spaces a straight line doesn't exist at all. In the strange and fantastic world of math, a straight line has a definition: if in a metric space there exists a minimum-length line connecting two points, and if this line is unique, then it is the straight line for that metric space.

According to the above definition, in the taxi driver metric space there is no straight line (the minimum-length line is not unique), while in Euclidean space it is the ordinary straight line we are used to drawing.

Finally, we can picture a particular space made of only 4 lines. The man inside the space can see only the four lines and can move only along them. No other paths, lines, space or points are accessible to him.

In this case the red line represents the minimum-distance path between the man and the final point. For the man, the red line is a straight line between the two points. It is certainly not a straight line from our point of view, but that is because we have a richer concept of space and can observe his space from the outside.

We can conclude that we are in the same condition as the man described above. We are deceived by our limited perception of space. Tricking us is easier than we might think.

You are walking along train rails, straight ahead; the track lies in front of you and you keep walking in the same direction. Are you moving along a straight line? I'll let you think for a few seconds... 1... 2... 3...

No, you are not moving along a straight line. You are walking along a minimum-distance path between two points on the Earth's surface.

This is possible because we perceive the Earth as flat. We are like points in a 2D space. Our height is really small compared with the Earth's radius, so we are unable to perceive the Earth's curvature. All we perceive is a two-dimensional space around us.

When we move along the Earth's surface we don't perceive that we are moving along a curved path. Why? Simply because the path we are following is the shortest path available to us. Certainly we can see other paths between the two points, but they are longer than the one we are moving on, so this path is effectively the straight line between the two points for us.

The concept of Euclidean distance does not apply to the path we follow on the Earth's surface. When we move along the surface, the distance we measure between two points is the purple path. The true Euclidean distance is represented by the red line (see figure above).

The difference between the two lines grows as the points get farther apart. If we move from one end of a football stadium to the other, the distance we travel is small compared with the Earth's radius. In this case the purple distance and the red one are very close to each other and we can treat them as the same distance.

While the difference between the distance on the Earth's surface and the Euclidean distance is negligible for small distances, it is not for a flight from Rome to New York.
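The size of the gap can be estimated with a short sketch. It assumes a perfectly spherical Earth with a mean radius of 6371 km, and the two arc lengths (100 m for a stadium, 7000 km for a long flight) are round illustrative figures, not measured distances. For an arc of length s on a sphere of radius R, the straight chord has length 2R sin(s / 2R).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a perfect sphere

def chord_km(arc_km, radius_km=EARTH_RADIUS_KM):
    """Straight-line (Euclidean) distance between the two ends of a surface arc."""
    return 2 * radius_km * math.sin(arc_km / (2 * radius_km))

# Illustrative arc lengths in km (assumed round figures).
for label, arc in [("stadium", 0.1), ("long flight", 7000.0)]:
    gap = arc - chord_km(arc)
    print(f"{label}: surface path {arc} km, gap to straight line {gap:.3g} km")
```

For the stadium the gap is far below a millimetre, while for the long flight it amounts to hundreds of kilometres.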

Looking at a world map, we can see that Rome and New York lie almost on the same parallel. Intuitively, we might think that the shortest path between the two cities follows the parallel on which they are located. Once again our perception of space tricks us, and we are wrong.

On the Earth's surface the shortest path between two points is called a geodesic. The geodesic is obtained by intersecting the Earth with the plane passing through the two points and the Earth's center. As a consequence, the parallels (except the equator) are not geodesics, whereas the meridians are. An airplane flying from Rome to New York doesn't follow the parallel but the geodesic that passes through the two cities (see figure above).
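A rough numerical comparison shows how much the geodesic saves. This is a sketch under simplifying assumptions: a spherical Earth of radius 6371 km, and two hypothetical points placed exactly on the 41°N parallel at longitudes 12.5°E and 74°W, which only approximate the positions of Rome and New York. The geodesic length uses the haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a perfect sphere

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Geodesic length between two points on a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

def along_parallel_km(lat, lon1, lon2, radius_km=EARTH_RADIUS_KM):
    """Length of the path that stays on the parallel at latitude lat."""
    dlam = abs(math.radians(lon2 - lon1))
    return radius_km * math.cos(math.radians(lat)) * dlam

# Hypothetical points on the 41°N parallel, roughly Rome and New York.
lat, lon_east, lon_west = 41.0, 12.5, -74.0
geodesic = great_circle_km(lat, lon_east, lat, lon_west)
parallel = along_parallel_km(lat, lon_east, lon_west)
print(f"geodesic: {geodesic:.0f} km, along the parallel: {parallel:.0f} km")
print(geodesic < parallel)  # True
```

Setting the latitude to 0 makes the two lengths coincide, in agreement with the remark that the equator is the only parallel that is a geodesic.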

The concept of geodesics was introduced by Georg Friedrich Bernhard Riemann.

When, in his books of geometry, Euclid set out to lay the foundations of geometry, he fixed some basic statements. They cannot be proved, but they are so intuitive that they seem not to need a proof. They are called postulates or axioms. They give us a basic description of the basic geometric entities (points, lines, planes, circles, angles).

The most interesting postulate is the fifth. It says, more or less, that given a line and a point not on it, there exists exactly one line passing through the point and parallel to the first line; for this reason it is also called the parallel postulate. For a long time mathematicians tried to deduce the fifth postulate from the other four, because it did not seem as intuitive as the others.

For a moment, let us think of straight lines in the sense defined above: the shortest line between two points. On the Earth's surface these lines are the geodesics. We can then try to apply the fifth postulate to the Earth's surface. We draw a geodesic, pick a point on the Earth's surface that is not on it, and draw a geodesic passing through this point. The two geodesics we have drawn intersect each other in two points. If we repeat the procedure for every point on the Earth, we find that given a line and a point not on it, there is no line passing through the point and parallel to the first line. On the Earth's surface the fifth postulate fails.

Assuming that the fifth postulate is false gives us another geometry: the elliptic one. In this geometry the first four postulates of Euclid are true and the fifth isn't. Every theorem and consequence of the first four postulates still holds, but the theorems and consequences of the fifth postulate do not. We have a strange and unintuitive geometry, but a valid one. Elliptic geometry is relevant to all problems in which distances over the Earth's surface are not negligible with respect to its radius, such as navigation, orientation, positioning and long-distance travel.

Finally, there is another way in which the fifth postulate can fail: given a line and a point not on it, there exist infinitely many lines passing through the point and parallel to the first line. In this case too we obtain a different geometry. It is consistent and valid and is called hyperbolic geometry. It was developed by Lobachevsky. This type of geometry, with the addition of time, gives Minkowski geometry, which is the basis of Einstein's theory of relativity and is used to describe the behavior of the Universe over large distances.

]]>"The illusion of mind" is a very interesting topic, but a hard and subtle one to treat.

- In this post I do not want to deal with this issue from a psychological point of view; indeed, I do not have the knowledge to do so.
- My aim is to play: in several circumstances human intuition is completely wrong.
- Any resemblance to real events and/or to real persons is *not* purely coincidental.
- The topic "Paradoxes in mathematics" has already been introduced by Francesco (in his post you can read some interesting examples).

Let us start with a nice result suggested by Camilla and Ludovica Pisani. The following is one of the most famous paradoxes in mathematics: while reading it, one realizes that intuition and probability can lead us to different answers to the same problem. I invite the reader to carefully follow the steps of the proof and find the mistake!

A bag contains 2 counters, as to which nothing is known except that each is either black or white. Then, one is black, and the other is white.

We know that, if a bag contained 3 counters, two being black and one white, the chance of drawing a black one would be $2/3$; and that any other state of things would not give this chance.

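Now the chances that the given bag contains (i) BB, (ii) BW, (iii) WW are respectively $1/4$, $1/2$, $1/4$. Add a black counter.

Then the chances that it contains (i) BBB, (ii) BBW, (iii) BWW are, as before, $1/4$, $1/2$, $1/4$.

Hence the chance of now drawing a black one is

$$\frac{1}{4}\cdot 1 + \frac{1}{2}\cdot\frac{2}{3} + \frac{1}{4}\cdot\frac{1}{3} = \frac{2}{3}.$$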

Hence the bag now contains BBW (since any other state of things would not give this chance).

*Hence, before the black counter was added, it contained BW, i.e. one black counter and one white.*
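The arithmetic of the argument (not its final inference, which is left to the reader) can be checked with exact fractions. This is a small sketch, assuming the chances $1/4$, $1/2$, $1/4$ for the three possible bags; the names `prior` and `chance_black` are chosen here for illustration:

```python
from fractions import Fraction

# Chances of each possible bag after the black counter has been added.
prior = {
    "BBB": Fraction(1, 4),
    "BBW": Fraction(1, 2),
    "BWW": Fraction(1, 4),
}

def chance_black(bag):
    """Chance of drawing a black counter from a bag written as a string."""
    return Fraction(bag.count("B"), len(bag))

# Total chance of drawing black, averaged over the possible bags.
total = sum(prior[bag] * chance_black(bag) for bag in prior)
print(total)  # 2/3
```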

Another famous paradox is the Monty Hall problem. It is based on the American television game show *Let's Make a Deal* and is named after its original host, Monty Hall.

Suppose you're on a game show, and you're given the choice of three doors: behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then asks you: "Do you want to pick door No. 2?"

Is it to your advantage to switch your choice?

One would answer, following intuition and forgetting the past, that since one can choose between two doors, the probability of choosing the right one is $1/2$.

*This is wrong! If you switch, the winning probability increases to $2/3$!*
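If intuition still protests, the game can be enumerated case by case. The sketch below assumes the standard rules (the car is placed uniformly at random, the host always opens a goat door different from your pick, and chooses at random when two doors are available); the function name `win_probability` is made up for this example:

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def win_probability(switch):
    """Exact probability of winning the car when always staying or always switching."""
    wins = Fraction(0)
    # 9 equally likely combinations of car position and first pick.
    for car, pick in product(DOORS, DOORS):
        # The host may open any door that is neither the pick nor the car.
        openable = [d for d in DOORS if d != pick and d != car]
        for opened in openable:
            final = pick
            if switch:
                final = next(d for d in DOORS if d != pick and d != opened)
            if final == car:
                wins += Fraction(1, 9) / len(openable)
    return wins

print(win_probability(switch=False))  # 1/3
print(win_probability(switch=True))   # 2/3
```

Staying wins only when the first pick was already right (one case out of three), so switching wins in the other two.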

Another example is the birthday paradox. It concerns the probability that, in a set of randomly chosen people, some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% only when the number of people reaches 367. However, 99.9% probability is reached with just 70 people, and 50% probability with 23 people! These conclusions are based on the assumption that each day of the year (except February 29) is equally probable for a birthday.

The first time I heard about this I was completely shocked, but then we (my professor, my university colleagues and I) computed the probabilities and it turned out to be true.
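The computation we did can be reproduced in a few lines. This is a minimal sketch under the same assumption of 365 equally likely birthdays; the function name is hypothetical:

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday."""
    if n > days:
        return 1.0  # pigeonhole principle
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

for n in (10, 22, 23, 50, 70):
    print(n, round(shared_birthday_probability(n), 4))
```

The trick is to compute the complementary event (all birthdays distinct), which is a simple product.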

However, to find other interesting paradoxes like the ones mentioned above, you just need to google... for instance: "doubling the ball" (the Banach-Tarski paradox)

and "The barber of Seville":

]]>Hi, everyone. Today we’re going to talk about pizza!

No, I’m not crazy! I know perfectly well that this is a blog about maths!

Indeed, we want to talk about the geometry of pizza and why some people make it round and others make it square… who is making money from this?

But the real question I want to put to you today is: do you prefer a round or a square pizza?

I imagine that the average reaction is another question: “What’s the difference? It’s pizza!”

The difference is huge and I’ll show you what it is. Have fun!

**Dido and oxhide**

There are plenty of mathematical problems called *maximum/minimum problems*. In general they can be solved in many ways, but I have no intention of explaining the general theory. Some of you probably learned about it at school or at university. To those who are reading about it for the first time: “don’t be scared!”

In a few words, we try to understand how to minimize or maximize quantities under certain constraints. An example from history: once upon a time Dido, queen of Tyre, arrived in North Africa at the court of King Iarbas as a refugee. Iarbas decided to give her as much land as could be encompassed by an oxhide. Strange offer aside, Dido cut the oxhide into many strips and made a very thin rope. Finally, she used it to enclose a huge piece of land, the land on which Carthage was founded. Now, the question is: given a rope, what is the best geometrical shape to enclose the maximum area?

In this example, the question is equivalent to: **having fixed the perimeter, what’s the figure with maximum area?** The answer is: the **circle**!

Another example, in the opposite direction: **fixing the area, what is the figure with the smallest perimeter?** The answer is: the **circle** again!

Do you know how to show this fact? To simplify the problem let us consider regular polygons, that is, polygons with equal edges and equal angles. It is possible to show that, for a fixed area $A$, the perimeter of a regular $n$-gon is:

$$P_n = 2\sqrt{nA\tan\frac{\pi}{n}}$$

So, this is a very well known formula, don’t you know it?

We see that the perimeter decreases as $n$ increases, so we could say that the minimum perimeter is reached when there is an infinite number of edges… so? The circle!

If you compute the limit as $n$ goes to infinity, you get $2\sqrt{\pi A}$, and that is the perimeter of a circle of area $A$!
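To see the perimeter shrink towards that of the circle, one can simply tabulate it. The sketch below assumes the formula $P_n = 2\sqrt{nA\tan(\pi/n)}$ for the regular $n$-gon of area $A$ and $2\sqrt{\pi A}$ for the circle; the function names are chosen here for illustration:

```python
import math

def polygon_perimeter(n, area):
    """Perimeter of the regular n-gon with the given area."""
    return 2.0 * math.sqrt(n * area * math.tan(math.pi / n))

def circle_perimeter(area):
    """Perimeter of the circle with the given area."""
    return 2.0 * math.sqrt(math.pi * area)

area = 1.0
for n in (3, 4, 6, 12, 100, 10_000):
    print(n, round(polygon_perimeter(n, area), 4))
print("circle", round(circle_perimeter(area), 4))
```

For $n = 4$ and $A = 1$ the function returns the perimeter of the unit square, a quick sanity check on the formula.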

So, we answered the question!

**What a Pizza!**

Back to pizza! We discovered that, for a fixed area, a round pizza has a smaller perimeter than any other shape. But… the perimeter is one thing, the border is another. The border is an area, not a line!!

So, now the real question is: which shape has the minimal border? Which the maximal?

Let us start writing some formulas. $A$ is the fixed area, $n$ is the number of edges of the regular polygon. Let $b$ be the thickness of the border.

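The $n$-gon can be decomposed into $n$ isosceles triangles. The angle opposite the base is $2\pi/n$ and the area of each triangle is $\frac{1}{2}la$, where $l$ is the edge and $a$ is the apothem. Knowing that $l = 2a\tan\frac{\pi}{n}$, we have that the area of each triangle is $a^2\tan\frac{\pi}{n}$ and so $A = na^2\tan\frac{\pi}{n}$.

If the pizza were a circle of area $A$, then $A = \pi r^2$, where $r$ is the radius. Hence $r = \sqrt{A/\pi}$. Comparing $A = na^2\tan\frac{\pi}{n}$ and $A = \pi r^2$ we get $a = r\sqrt{\frac{\pi}{n\tan(\pi/n)}}$. That’s the relation between $a$ and $r$.

Now, from both the $n$-gon and the circle, we cut off a border of thickness $b$. We skip the computations: the internal area of each little triangle is $(a-b)^2\tan\frac{\pi}{n}$. So we get: $A_n^{\text{int}} = n(a-b)^2\tan\frac{\pi}{n}$.

For the circle the situation is similar: $A_c^{\text{int}} = \pi(r-b)^2$. We now compare the two areas by computing $A_c^{\text{int}} - A_n^{\text{int}}$. Using the relation between $r$ and $a$, we get that the difference is positive if $b$ has a value between 0 and $b_n = \frac{2\sqrt{\pi}}{\sqrt{\pi} + \sqrt{n\tan(\pi/n)}}\,r$. The value $b_n$, if $n \ge 3$, is greater than $0.87\,r$ and, as $n$ goes to infinity, it goes to $r$.

This means that if $b$ is less than 87% of the radius, then **the internal area of the circle is greater than the $n$-gon’s**, for all $n \ge 3$. **The converse holds for borders!**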
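The 87% figure can be recovered numerically. The sketch below is one way to do it, assuming the inner areas $\pi(r-b)^2$ for the circle and $n(a-b)^2\tan(\pi/n)$ for the $n$-gon, with the apothem $a$ fixed by the equal-area condition; the closed form used for the threshold is derived from these two expressions and the function names are hypothetical:

```python
import math

def border_threshold(n, r=1.0):
    """Border thickness below which the circle has the larger inner area."""
    t = n * math.tan(math.pi / n)
    return 2.0 * math.sqrt(math.pi) * r / (math.sqrt(math.pi) + math.sqrt(t))

def inner_area_difference(n, b, r=1.0):
    """Inner area of the circle minus inner area of the n-gon of equal total area."""
    t = n * math.tan(math.pi / n)
    a = r * math.sqrt(math.pi / t)  # apothem of the n-gon with area pi * r**2
    return math.pi * (r - b) ** 2 - t * (a - b) ** 2

for n in (3, 4, 6, 12, 100):
    print(n, round(border_threshold(n), 4))
```

The triangle gives the smallest threshold, roughly 0.87 of the radius, and the value creeps up towards the radius as the number of edges grows.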

So, the question “do you prefer a round or a square pizza?” now has an answer. Since the square is a 4-gon, the answer is: if you like borders, choose a square pizza, otherwise a round one!

**Conclusion (of the average man)**

With all those letters, we lost the gist. Let us fix some values.

An average pizza has an area of about $1000\ \text{cm}^2$. It turns out that, if the border is between 0 and 17 centimeters, the internal area of the square is smaller than the circle’s. Since the border is 1 or 2 cm thick, our result holds!

If you’re not crazy/deviated/maniac/sociopathic/my-cousin/curious, it’s enough for today. See you next time!

**Conclusion (of an applied mathematician, of an economist, of a financial man and similar)**

There are other questions: if a pizza company makes round pizzas, how much money does it lose?

Indeed, having a greater area to cover with ingredients, it should spend more.

I made some computations: fixing $A = 1000\ \text{cm}^2$ and $b = 1\ \text{cm}$, the internal area of a square pizza is about $877.5\ \text{cm}^2$ and that of a round pizza is about $891\ \text{cm}^2$. So a square pizza needs 1.5% less ingredients than a round one.

Simplifying, the pizza company would save 1.5% of the cost of ingredients by making square pizzas instead of round ones. Suppose that making 100 round pizzas costs 400 euros (100 euros for bases and 300 euros for ingredients) and that the pizzas are sold at 6 euros each: the company would earn 600-400=200 euros.

Keeping the price at 6 euros but making square pizzas, it would save 1.5% of the cost of ingredients, that is, 4.50 euros. It’s not much, but that’s for only 100 pizzas! Do you want to save more? Make bigger pizzas!
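For those who want to redo the sums, here is a rough sketch. The area of 1000 cm² and the border of 1 cm are assumptions, picked because they reproduce the 1.5% quoted above; the 300 euros of ingredients per 100 pizzas comes from the example, and all variable names are made up:

```python
import math

AREA_CM2 = 1000.0        # assumed area of one pizza
BORDER_CM = 1.0          # assumed border thickness
INGREDIENT_COST = 300.0  # euros of ingredients for 100 round pizzas

side = math.sqrt(AREA_CM2)              # edge of the square pizza
radius = math.sqrt(AREA_CM2 / math.pi)  # radius of the round pizza

# Area left for ingredients once the border is removed.
square_inner = (side - 2 * BORDER_CM) ** 2
round_inner = math.pi * (radius - BORDER_CM) ** 2

saving_fraction = 1 - square_inner / round_inner
print("square inner area:", round(square_inner, 1))
print("round inner area:", round(round_inner, 1))
print("ingredients saved (%):", round(100 * saving_fraction, 2))
print("euros saved per 100 pizzas:", round(INGREDIENT_COST * saving_fraction, 2))
```

Changing `BORDER_CM` to 2 shows how quickly the saving grows with a thicker border.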

**Conclusion (of a pure mathematician)**

What? Have I used numbers?

**Conclusive conclusion**

Well, this is the end! Sayonara!

]]>In a previous post we talked about Euclidean and other distances...

In that post we introduced the definitions of the Euclidean, taxi driver and infinity distances. Most of you will also remember the minimum distance: at the end of the post we explained that it is not acceptable as a measure because it does not fulfill the requirements in the definition of distance.

We can summarize the requirements a distance needs to fulfill in order to be acceptable. It should:

- be positive definite (greater than or equal to 0, and equal to 0 only for coincident points)
- be symmetric
- satisfy the triangle inequality

Reading the previous post again, a certain curiosity grew in me to know more about distance and the way it influences our life... In this post I will try to explain the concept of proximity between two geometrical entities.

For a moment, let us come back to the company and the biker of the previous post. We supposed that only one employee uses a bike to go to work; what happens if the number of employees that bike to work is $n$, as big as you want? Each of these bikers has a certain distance from the company, and it doesn't matter which definition of distance we use: the important thing is that it fulfills the distance requirements. We have a lot of pairs, written $(C, B_i)$, where $C$ is the position of the company in space and $B_i$ is one of the $n$ bikers. For each pair we can define a number $d(C, B_i)$ that represents the distance between the biker and the company.

In maths, the collection composed of the company and the bikers is called a set. If we add a definition of distance to the set we obtain a metric space. Metric spaces were introduced by Maurice Fréchet in 1906, and the name is due to Felix Hausdorff, who at the beginning of the 20th century was studying a more general class of spaces called topological spaces. Metric spaces are a special class of topological spaces.

In the previous paragraph I introduced, in an easy way, a concept that is very hard to understand. What is a topological space? A topological space is a set on which a concept of proximity is defined. Before introducing more complex concepts, it is useful to keep two questions in mind: what does proximity mean? What does it mean that two points are close? From an intuitive point of view (as often happens in maths, intuition drives us before equations...) we can agree that two points are close if the distance between them is really small. But... what is the definition of small? A mathematician could answer "as small as I want".

]]>Surely you're wondering how Mathematics fits into Game of Thrones. Well, I confess, I could have used any other TV series to tell you what follows in this post, but honestly ... what is better than Game of Thrones (GoT)?

The connection to Mathematics lies not so much in the series itself as in the type of data used. In particular, the data are the comments on GoT posted in the IMDb reviews, which were used to produce this infographic, on which I worked with my brother.

First, very briefly, let us see what types of information were extracted from the comments (671 in total).

The map on the left shows the major lineages; each one has been placed in its city of reference, indicating:

- The number of reviews in which the lineage is mentioned;
- The number of reviews that speak positively of the family;
- The number of reviews that speak negatively of the family;
- The most cited member of the lineage;
- The member of the family who is never mentioned.

The chart on the right-hand side, instead, shows:

- the trend of the mentioned characters from each house over the five seasons of the series;
- the most cited characters that are not attributable to lineages;
- the graphs of co-occurrences between characters and between lineages.

The part of the infographic on which I would like to focus, however, is the word cloud in the bottom right of the picture, which reports the adjectives most used in the reviews of each lineage.

We need to explain how to determine whether a word in a text should be regarded as an adjective rather than as a preposition or an article. And it is here that Mathematics comes to help.

To perform this type of analysis a **POS (Part of Speech) Tagger** has been used, namely a tool able to perform the grammatical analysis of a text. The POS Tagger used here comes from the OpenNLP library and is essentially based on the **Maximum Entropy model**, which we will analyze in detail.

Before examining the MaxEnt algorithm, I would like to define the concept of entropy used here.

Most commonly we talk about entropy in the following areas:

- **thermodynamics**: the entropy of a gas is a function of its temperature, volume, mass and nature; it is an index of reversibility: if the entropy does not change, the process is reversible, otherwise it is irreversible (just think of the exchange of heat between two bodies, one hot and one cold);
- **statistical mechanics**: the entropy is a measure of disorder (the inability to predict the position and velocity of the particles); the greater the knowledge, the lower the entropy, and vice versa.

To these interpretations of entropy we can add another one, which plays a very important role in information theory (especially in the field of signal processing) and is the one we will use here.

Let's consider a source of messages $X$. The amount of information transmitted by a message increases with the uncertainty about the message produced. The greater our knowledge about the message produced by the source, the lower the uncertainty, the entropy and the transmitted information.

We now formulate the concept of entropy as introduced by **Claude Shannon** in 1948.

Let $X$ be a source and $x$ a signal emitted by $X$. The information given by $x$ is called **self-information** and is defined as

$$I(x) = -\log_2 p(x)$$

where $p(x)$ is the probability that the event $x$ happens.

The entropy of a source is the expected value of the self-information, i.e. the average information contained in any signal from $X$, in particular

$$H(X) = -\sum_{i} p(x_i)\log_2 p(x_i)$$

if $X$ is a discrete variable

$$H(X) = -\int p(x)\log_2 p(x)\,dx$$

if $X$ is a continuous variable

Let $X$ be a discrete source with $n$ possible signals, then

$$0 \le H(X) \le \log_2 n$$

In particular the maximum of the entropy is reached when all the events are equally probable.

Let's see a simple example.

**Example 1**

Suppose we have a source $X$ that emits $1$ with probability $p$ and $0$ with probability $1-p$.

Then

$$H(X) = -p\log_2 p - (1-p)\log_2(1-p)$$

and the entropy is equal to $1$ if $p = 1/2$ (*unpredictable source*), while it is equal to $0$ if $p = 0$ or $p = 1$ (*predictable source*, i.e. it always outputs $0$ or always $1$).
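Example 1 is easy to try out. This is a minimal sketch, assuming the entropy of a two-symbol source is measured in bits, $H = -p\log_2 p - (1-p)\log_2(1-p)$, with the usual convention $0\log 0 = 0$; the function name is chosen here for illustration:

```python
import math

def binary_entropy(p):
    """Entropy in bits of a source emitting 1 with probability p and 0 otherwise."""
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log(0) = 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(p, round(binary_entropy(p), 4))
```

The values rise from 0 to a peak at $p = 1/2$ and fall back symmetrically: the fair coin is the least predictable source.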

The Maximum Entropy classifier is a discriminative classifier widely used in the areas of *Natural Language Processing*, *Speech Recognition* and *Information Retrieval*. In particular, it is used to solve text classification problems such as language detection, topic classification and sentiment analysis.

The Maximum Entropy algorithm is based on the **principle of Maximum Entropy**: among all the models consistent with the training set, it selects the one that has the maximum entropy (in Shannon's sense).

Recalling Bayes' theorem (see here), the Max Entropy classifier is used when you do not have any information about the prior distribution of the data and it is not appropriate to make assumptions about it.

Unlike the Naive Bayes classifier, Maximum Entropy does not assume that the variables are independent of one another. This reflects the nature of natural text, where the variables are the words, which of course are not independent of one another because of the grammatical rules of the language. On the other hand, it requires more training time, while providing more reliable results.

**Example 2**

Before going into the theory behind MaxEnt, consider an example that clarifies from the outset what will be said formally later.

Suppose we want to determine the grammatical role of the word "*set*".

The word "*set*" can take the following forms:

- **Adjective**: "*He has a set smile.*"
- **Noun**: "*Give me your chess set.*"
- **Verb**: "*They set a fast pace.*"

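We collect a large number of examples from which to extract information to determine the decision-making model $p$. The model we're going to build will assign to the word "*set*" a probability of taking each particular grammatical role.

As we don't have other information from the data, we can only impose for our model:

$$p(\text{adjective}) + p(\text{noun}) + p(\text{verb}) = 1$$

There are several models that satisfy the previous identity, including:

$$p(\text{adjective}) = 1,\quad p(\text{noun}) = p(\text{verb}) = 0$$

$$p(\text{adjective}) = p(\text{noun}) = p(\text{verb}) = \frac{1}{3}$$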

By analyzing the data set further, suppose we obtain other information, such as: every time the word "*set*" is preceded by a pronoun, it is a verb. This, added to the normalization condition, changes the possible probabilities, reducing the set of possible models.

The goal of the MaxEnt algorithm is to determine the model that is as uniform as possible (maximizing the entropy) while agreeing with the information derived from the data, without making any additional assumptions.
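To get a feeling for "as uniform as possible", we can compare the entropy of a few candidate models for the word "*set*" when the only constraint is that the probabilities sum to 1. The three candidates below are invented for illustration and are not taken from any data:

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical candidate models, each one a valid probability distribution.
models = {
    "always adjective": {"adjective": 1.0, "noun": 0.0, "verb": 0.0},
    "mostly noun": {"adjective": 0.25, "noun": 0.5, "verb": 0.25},
    "uniform": {"adjective": 1 / 3, "noun": 1 / 3, "verb": 1 / 3},
}

for name, dist in models.items():
    print(name, round(entropy(dist), 4))

best = max(models, key=lambda name: entropy(models[name]))
print("maximum entropy model:", best)
```

With no information beyond normalization, the uniform model wins; every extra constraint extracted from the data shrinks the set of candidates among which the maximum is sought.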

We now move on to the explanation of the algorithm.

Consider a text document $d$ containing the words $\{w_1, \dots, w_n\}$, to each of which corresponds a particular tag in $\{t_1, \dots, t_n\}$ (i.e. a grammatical part of speech: noun, adjective, pronoun, article, etc.). We introduce the concept of "*history*" of the word $w_i$ as the information that can be derived from the context in which $w_i$ is located, and we denote it by $h_i$.

Let us give a small example to explain what the "*history*" of a word is.

**Example 3**

Consider the sentence "*Today is a beautiful day*". The set of words is {*Today*, *is*, *a*, *beautiful*, *day*} and we call "*history*" of a word the grammatical information of the previous and next word.

For example, for the word "*beautiful*"

$h_{\text{beautiful}}$ = {singular indefinite article - "*a*", singular noun - "*day*"}

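What we have to define is a stochastic model that estimates the conditional probability of getting a tag $t$ given a particular "*history*" $h$, namely $p(t \mid h)$.

Then we follow the usual classification scheme, i.e. we build our model starting from the pairs $(h_i, t_i)$ of the training set, where $h_i$ is the "*history*" of the word $w_i$ and $t_i$ is the class assigned to it (its part of speech).

Consider the probability distribution based on the sample

$$\tilde{p}(h, t) = \frac{1}{N}\,\#(h, t)$$

where $N$ is the size of the training set, while $\#(h, t)$ is the number of occurrences of the pair $(h, t)$ in the training set.

We introduce the indicator function

$$f_j(h, t) = \begin{cases} 1 & \text{if } t = t_j \text{ and } h \text{ satisfies a given condition } c_j \\ 0 & \text{otherwise} \end{cases}$$

(for instance, $f_j = 1$ when the tag is "verb" and the previous word is a pronoun) and consider the features $f_j$ as variables for the construction of our model.

The average value of the variable $f_j$ with respect to the probability derived from the sample is

$$\tilde{E}[f_j] = \sum_{h, t} \tilde{p}(h, t)\, f_j(h, t)$$

where clearly $\tilde{p}(h, t) = 1/N$ if each pair of the training set occurs exactly once.

The average value of the variable $f_j$ with respect to the model probability $p(t \mid h)$, instead, is equal to

$$E[f_j] = \sum_{h, t} \tilde{p}(h)\, p(t \mid h)\, f_j(h, t)$$

where $\tilde{p}(h)$ is the empirical distribution of the histories. We impose the condition that the average value under the model equals the average value on the training set, i.e.

$$E[f_j] = \tilde{E}[f_j]$$

We now have one condition like the previous one for each $j$, and these conditions can be met by several models. To choose the best one, we use the principle of Maximum Entropy, selecting the model that is as close as possible to the uniform one (maximization of information).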
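The empirical quantities above are just counts. Here is a small sketch on a toy training set: the 20 pairs (history, tag), the history labels `after_pronoun` and `after_article`, and the feature are all invented for illustration and have nothing to do with the IMDb data:

```python
from collections import Counter

# Hypothetical training pairs (history, tag) for occurrences of the word "set".
training = (
    [("after_pronoun", "verb")] * 8
    + [("after_pronoun", "noun"), ("after_pronoun", "adjective")]
    + [("after_article", "noun")] * 6
    + [("after_article", "adjective")] * 3
    + [("after_article", "verb")]
)
N = len(training)
counts = Counter(training)

def p_tilde(h, t):
    """Empirical probability of the pair (h, t): occurrences divided by N."""
    return counts[(h, t)] / N

def f_verb_after_pronoun(h, t):
    """Indicator feature: 1 if the tag is 'verb' and the history is 'after_pronoun'."""
    return 1 if (h == "after_pronoun" and t == "verb") else 0

def empirical_expectation(f):
    """Average value of the feature f with respect to the empirical distribution."""
    return sum(p_tilde(h, t) * f(h, t) for (h, t) in counts)

print(empirical_expectation(f_verb_after_pronoun))  # 0.4
```

A model satisfies the constraint for this feature when its own expected value of `f_verb_after_pronoun` is also 0.4.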

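In particular, it can be shown that there exists a unique model $p^*$ that maximizes the entropy of a system with constraints.

In our case the problem is the following:

Determine $p^*$ such that

$$p^* = \arg\max_{p} H(p), \qquad H(p) = -\sum_{h, t} \tilde{p}(h)\, p(t \mid h) \log p(t \mid h)$$

with

- $p(t \mid h) \ge 0$ for all $h, t$.
- $\sum_{t} p(t \mid h) = 1$ for all $h$.
- $E[f_j] = \tilde{E}[f_j]$ for all $j$.

We use Lagrange multipliers to solve it:

$$\Lambda(p, \lambda) = H(p) + \sum_{j} \lambda_j \left( E[f_j] - \tilde{E}[f_j] \right)$$

obtaining the solution

$$p_\lambda(t \mid h) = \frac{1}{Z_\lambda(h)} \exp\Big( \sum_{j} \lambda_j f_j(h, t) \Big), \qquad Z_\lambda(h) = \sum_{t} \exp\Big( \sum_{j} \lambda_j f_j(h, t) \Big)$$

At this point we insert in the Lagrangian the values of $p_\lambda$ and $Z_\lambda$ and get $\lambda^*$ by maximizing the function

$$\Psi(\lambda) = -\sum_{h} \tilde{p}(h) \log Z_\lambda(h) + \sum_{j} \lambda_j \tilde{E}[f_j]$$

for which it holds that (without proving it here):

- $\Psi(\lambda)$ is the log-likelihood of the exponential model $p_\lambda$;
- the maximization problem cannot be solved analytically but only by numerical methods (*conjugate gradient*, *GIS - Generalized Iterative Scaling*), taking advantage of the smoothness and the concavity of $\Psi$ in each $\lambda_j$.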
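To make the last step concrete, here is a toy fit of the exponential model. It is only a sketch: it uses plain gradient ascent on the dual function instead of conjugate gradient or GIS, it is not how OpenNLP does it, and the training pairs, history labels and features are invented for illustration. The gradient with respect to each $\lambda_j$ is the empirical expectation of the feature minus its expectation under the model:

```python
import math
from collections import Counter

TAGS = ("adjective", "noun", "verb")

# Hypothetical training pairs (history, tag) for occurrences of the word "set".
training = (
    [("after_pronoun", "verb")] * 8
    + [("after_pronoun", "noun"), ("after_pronoun", "adjective")]
    + [("after_article", "noun")] * 6
    + [("after_article", "adjective")] * 3
    + [("after_article", "verb")]
)
N = len(training)
pair_counts = Counter(training)
history_counts = Counter(h for h, _ in training)

# Two indicator features, one per multiplier lambda_j.
features = [
    lambda h, t: 1.0 if (h == "after_pronoun" and t == "verb") else 0.0,
    lambda h, t: 1.0 if (h == "after_article" and t == "noun") else 0.0,
]

def model(lambdas, h):
    """Exponential model p_lambda(t | h), normalized by Z_lambda(h)."""
    scores = {
        t: math.exp(sum(l * f(h, t) for l, f in zip(lambdas, features)))
        for t in TAGS
    }
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

def empirical_expectation(f):
    return sum(c / N * f(h, t) for (h, t), c in pair_counts.items())

def model_expectation(lambdas, f):
    total = 0.0
    for h, c in history_counts.items():
        p = model(lambdas, h)
        total += c / N * sum(p[t] * f(h, t) for t in TAGS)
    return total

def fit(steps=2000, rate=1.0):
    """Gradient ascent on the dual function: move each lambda_j along its gradient."""
    lambdas = [0.0] * len(features)
    targets = [empirical_expectation(f) for f in features]
    for _ in range(steps):
        grads = [targets[j] - model_expectation(lambdas, f)
                 for j, f in enumerate(features)]
        lambdas = [l + rate * g for l, g in zip(lambdas, grads)]
    return lambdas

lambdas = fit()
print(model(lambdas, "after_pronoun"))
print(model(lambdas, "after_article"))
```

After the fit, the model reproduces the constrained frequencies (80% verb after a pronoun, 60% noun after an article in this toy data) and spreads the remaining probability uniformly over the other tags, which is exactly the maximum entropy behaviour described above.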

So, once the POS Tagger has been trained on a given set (training set), we proceed to classify new words (test set). In our case the POS Tagger, already trained on a large enough data set, is used to classify the words in the IMDb reviews; all the words that are not classified as adjectives are discarded, and the adjectives are then assigned to families based on their frequency in the reviews of a specific lineage.

Now, if you're not a fan of Game of Thrones, I repeat that this can also be applied to other TV series (House of Cards, The Walking Dead, etc.) or in completely different areas, like ... what we will see in my next post.

For now there is only one conclusion ... on 24.04 the sixth season begins.

**Winter is coming ...**

A few days ago a colleague of mine and I had a conversation about a recent law proposed by Ségolène Royal in the French Parliament.

In short, this law would provide a monetary refund for those who go to work by bike; the refund is proportional to the distance covered every day.

After a moment, each of us started thinking about the main point: everyone knows that there are lots of possible paths from a point A to a point B in a city, and everyone knows that the paths have different lengths. Which of those paths is used to decide the amount of money to be refunded? We agreed on the answer... the shortest one; and you, do you agree with us? Probably some of my readers don't know that there are a lot of ways to calculate the shortest distance between two points. In this post we will talk about distances, the ways used to calculate them and some basic concepts in topology and limits.

The easiest way to calculate the distance between two points is the Euclidean distance. The Euclidean distance is the first one we learn at school and the one we are most familiar with. Imagine an employee who lives at an imaginary point A on a two-dimensional plane. The company is situated at a point B on the same plane. The best way for him to go to work is to move straight from A to B. To draw it, we can fix two different points in the plane and connect them with a straight line: its length is the Euclidean distance.

For a more detailed discussion, let us introduce a reference system with its origin at a certain point O. In this reference system both points A and B have two coordinates, $A = (x_A, y_A)$ and $B = (x_B, y_B)$ (we are using a two-dimensional plane; in three-dimensional space there are three coordinates).

The formula for the Euclidean distance between A and B is

$$d_E(A, B) = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2}$$

really easy to figure out.

It is well known that civilization advances, and some houses are built in the space between points A and B.

Our friend now has to change his way from home to work according to the streets available between the houses.

The employee will pass two houses in the vertical direction (up-down on your screen) and three houses in the horizontal direction (left-right); this is the shortest way from home to work and it is surely greater than the Euclidean distance... a lucky fact for the company that refunds the worker.

The distance shown above is called the taxi distance or Manhattan distance and is expressed by the following equation

$$d_T(A, B) = |x_A - x_B| + |y_A - y_B|$$

where $(x_A, y_A)$ and $(x_B, y_B)$ are the coordinates of A and B. This is a valid alternative to the Euclidean distance in everyday life; for example, when we drive we use the taxi distance to decide the shortest way from our position to the final destination.

But now we have a problem: in order to save some money, the company decides to refund only the larger of the vertical and horizontal distances. In the case of our employee only the horizontal distance will be refunded (three houses, versus two houses in the vertical direction). This could seem a stretch of the taxi distance, but there is a neat equation to express this distance:

$$d_\infty(A, B) = \max\left(|x_A - x_B|,\ |y_A - y_B|\right)$$

this is called the infinity distance.

Finally, there is another distance we can investigate: the minimum distance. It is obtained by replacing max with min in the infinity distance; simply, only the shorter of the two distances is taken into account, and it is written as

$$d_{\min}(A, B) = \min\left(|x_A - x_B|,\ |y_A - y_B|\right)$$

As we have seen, there are lots of different distances between two points. We are more familiar with some definitions of distance than with others, but each definition given above can be read as a distance in the physical world.

An entirely mathematical definition raises some questions: what is the best way to measure a distance? How many different distances are there?

We will answer these questions in reverse order. With a little imagination we can suppose that there are infinitely many ways to define a distance. From a mathematical point of view, every equation involving subtractions of the coordinates of two points is a candidate distance. After this answer, some of us are starting to doubt the existence of a universal definition of distance.

That doubt is almost entirely justified. We cannot give a unique definition of distance, but we can define some criteria that any definition of distance must respect.

First of all, a distance must be "positive definite"; in mathematical language this means that the value of a distance must be greater than or equal to zero, never negative. It sounds so intuitive, and everyone agrees that a distance between two points must be positive or zero (if the starting point coincides with the arriving one). I want you to notice that we are not considering the vector displacement, which is positive in one direction and negative in the opposite one.

The second criterion is an extension of the first one; it underlines that the distance is zero only if the starting point and the arriving one coincide.

In this case a little digression is necessary. Suppose our biker travels from his house to another one, and suppose this new house is on the same street as his company.

Now he has to ride three blocks in the horizontal direction and zero in the vertical direction. If we apply the minimum distance, the distance from A to B is zero, and this is surely not true; the biker could get a little angry about it. So, we notice that the minimum distance is not a good definition and cannot be accepted.

Switching back to our criteria, the third is the symmetry of the distance. It can be written as

$$d(A, B) = d(B, A)$$

The distance from A to B is the same as from B to A, and this is intuitive too.

The last criterion is the most important one and is called the triangle inequality.

Suppose our biker, before going to work, needs to leave his son at school; to picture the scene, we suppose the school is at the point C in the above figure. If C is on the path from A to B there is no increase in distance and we can write

$$d(A, B) = d(A, C) + d(C, B)$$

If the point isn't on the path from A to B (see figure above), the distance from A to C and then from C to B is greater than the distance from A to B. In mathematical form it is

$$d(A, B) < d(A, C) + d(C, B)$$

Putting it all together we obtain the triangle inequality

$$d(A, B) \le d(A, C) + d(C, B)$$
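The four distances of this post and the criteria can be tried out on a handful of points. The sketch below assumes the biker's home is at (0, 0), the company three blocks right and two blocks up, and a second house on the same horizontal street as home; these coordinates and the helper `is_distance` are made up for this example. Note that the helper only looks for violations among the sample points, so `True` means "no violation found here", not a proof:

```python
import math
from itertools import product

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def taxi(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def infinity(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def minimum(a, b):
    return min(abs(a[0] - b[0]), abs(a[1] - b[1]))

def is_distance(d, points, tol=1e-12):
    """Check the criteria on every pair and triple of the sample points."""
    for a, b in product(points, repeat=2):
        # Non-negativity and symmetry.
        if d(a, b) < 0 or abs(d(a, b) - d(b, a)) > tol:
            return False
        # The distance is zero if and only if the two points coincide.
        if (d(a, b) <= tol) != (a == b):
            return False
    # Triangle inequality: d(A, B) <= d(A, C) + d(C, B).
    return all(
        d(a, b) <= d(a, c) + d(c, b) + tol
        for a, b, c in product(points, repeat=3)
    )

home, company, same_street = (0, 0), (3, 2), (3, 0)
points = [home, company, same_street, (1, 5)]

for d in (euclidean, taxi, infinity, minimum):
    print(d.__name__, d(home, company), is_distance(d, points))
```

The minimum distance is the only one that fails: it gives zero between `home` and `same_street` although the two houses are three blocks apart.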

With the triangle inequality we have completed the criteria, and this discussion too. In the next post we will talk about metric spaces and topology.

]]>