The term “portamento” has been used in music for centuries to describe the effect of gliding a note of one pitch into a note of a lower or higher pitch. The effect can only be achieved by instruments that can continuously vary in pitch, such as the human voice, string instruments, and trombones.
An MIT student has developed a new algorithm that creates a portamento effect in real time between any two audio streams. In tests, the algorithm seamlessly merged different audio recordings, such as a piano note gliding into a human voice and one song blending into another. His paper describing the method won the “best student paper” award at the recent International Conference on Digital Audio Effects.
The technique is based on “optimal transport,” a geometry-based framework that determines the most efficient ways to move objects (or data points) between multiple origin and destination configurations. First formulated in the 1700s, the framework has been applied to supply chains, fluid dynamics, image alignment, 3-D modeling, computer graphics, and other areas.
In work that began as a semester project, Trevor Henderson, now a graduate student in computer science, applied optimal transport to interpolating audio signals, that is, blending one sound into another. The algorithm first divides the audio signals into short segments. It then finds the best way to move the pitches in each segment to pitches in the other signal, producing a smooth portamento effect. The algorithm also uses specialized techniques to preserve the fidelity of the audio signal as it transforms.
“Optimal transport is applied here to identify how to map pitches in one sound to the pitches in the other,” says Henderson, a classically trained organist who performs electronic music and has been a DJ on WMBR 88.1, MIT’s radio station. “For example, if you’re transforming one chord into another with a different harmony or more notes, the notes will break from the first chord and find a location to fluidly glide to in the other chord.”
According to Henderson, this is one of the first techniques to apply optimal transport to transforming audio signals. He has already used the algorithm to build equipment that seamlessly transitions between tracks on his radio show. DJs could use the equipment to segue between tracks during live performances. Other musicians might use it on stage or in the studio to blend instruments and voices.
Justin Solomon, an X-Consortium Career Development Assistant Professor in the Department of Electrical Engineering and Computer Science, is Henderson’s co-author on the study. Solomon is a member of the Center for Computational Engineering and leads the Geometric Data Processing Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL). He also plays cello and piano.
Henderson took Solomon’s class 6.838 (Shape Analysis), which asks students to apply geometric tools such as optimal transport to real-world problems. Most student projects focus on 3-D shapes from virtual reality or computer graphics, so Henderson’s idea took Solomon by surprise. “Trevor recognized an abstract relationship between geometry and moving frequencies around in audio signals to create a portamento effect,” Solomon says. “Throughout the semester, he came in and out of my office with DJ equipment. It wasn’t quite what I was expecting, but it was quite entertaining.”
For Henderson, it wasn’t too much of a stretch. “I ask myself, ‘Is this related to music?’” he says of encountering a new idea. “So, as we were discussing optimal transport, I was curious what would happen if I related it to audio spectrum.”
Henderson likens optimal transport to finding a low-effort way to build a sand castle. In that analogy, the framework determines how to move each grain of sand from its position in a shapeless pile to a corresponding position in the sand castle with as little total work as possible. In computer graphics, for example, optimal transport can be used to transform or morph shapes by finding the best path from each point on one shape to a corresponding point on another.
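The sand-pile picture can be made concrete in one dimension. The sketch below is illustrative only and is not taken from Henderson’s paper: the function name `transport_plan_1d` and the example piles are hypothetical, and it assumes both piles hold the same total mass and that the cost of a move grows with distance. Under those assumptions, sorting both piles and matching them from left to right yields a least-work plan.

```python
def transport_plan_1d(src, dst):
    """Return (src_pos, dst_pos, mass) moves between two 1-D piles.

    src and dst are lists of (position, mass) pairs with equal total mass.
    In one dimension, matching the sorted piles left to right minimizes
    total work when the cost of a move grows convexly with distance.
    """
    src = sorted(src)
    dst = sorted(dst)
    plan = []
    i = j = 0
    src_left = src[0][1]   # mass still to be moved out of src[i]
    dst_left = dst[0][1]   # mass still needed at dst[j]
    while i < len(src) and j < len(dst):
        moved = min(src_left, dst_left)
        if moved > 0:
            plan.append((src[i][0], dst[j][0], moved))
        src_left -= moved
        dst_left -= moved
        if src_left <= 1e-12:          # this source grain pile is used up
            i += 1
            src_left = src[i][1] if i < len(src) else 0.0
        if dst_left <= 1e-12:          # this destination slot is full
            j += 1
            dst_left = dst[j][1] if j < len(dst) else 0.0
    return plan


# Hypothetical example: a two-point "pile" reshaped into a two-point "castle".
pile = [(0.0, 0.5), (1.0, 0.5)]
castle = [(2.0, 0.25), (3.0, 0.75)]
plan = transport_plan_1d(pile, castle)
total_work = sum(abs(d - s) * m for s, d, m in plan)
```

Note that the mass at position 0.0 is split between two destinations, which mirrors the way a note in one chord can divide and glide to more than one note in another.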
Applying this theory to audio clips requires some additional ideas from signal processing. Musical sounds are produced by vibrating components that differ by instrument: strings in violins, air in brass instruments, and vocal cords in human singers. These vibrations can be captured as audio signals, in which each pitch appears as a frequency with a given amplitude (peak height).
Traditionally, the transition between two audio streams is made with a fade, in which one signal decreases in volume while the other increases. Henderson’s technique, by contrast, smoothly slides frequency segments from one clip into the other, with no fading of volume.
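For contrast, a conventional fade can be sketched in a few lines. This is a minimal illustration, assuming a linear fade curve and two equal-length test tones; the helper names `sine` and `crossfade` and the 8,000 Hz sample rate are choices made here, not details from the article. Halfway through, both original pitches are still present at reduced volume, and no pitch ever glides.

```python
import math


def sine(freq_hz, n_samples, sample_rate=8000):
    """Generate a test tone as a list of float samples."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def crossfade(a, b):
    """Linearly fade from clip a to clip b over their common length."""
    n = min(len(a), len(b))
    out = []
    for k in range(n):
        t = k / (n - 1) if n > 1 else 1.0   # fade position, 0.0 to 1.0
        out.append((1.0 - t) * a[k] + t * b[k])
    return out


a = sine(220, 400)
b = sine(440, 400)
mixed = crossfade(a, b)   # starts as pure a, ends as pure b
```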
To do so, the algorithm splits any two audio clips into 50-millisecond windows. It then runs a Fourier transform on each window to extract its frequency components. The frequency components within a window are grouped into individual synthesized “notes.” Optimal transport then determines how the notes in one signal’s window move to the notes in the other signal’s window.
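The windowing and Fourier-transform steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: it uses non-overlapping windows, a naive discrete Fourier transform written out by hand, a low sample rate to keep that transform fast, and a crude stand-in for note detection that simply picks the strongest frequency bin. All function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000        # Hz; an assumption, kept low so the naive DFT is fast
WINDOW_SECONDS = 0.050    # the 50 ms window length mentioned in the article


def split_into_windows(signal, sample_rate=SAMPLE_RATE, seconds=WINDOW_SECONDS):
    """Cut a signal into consecutive, non-overlapping full-length windows."""
    size = int(round(sample_rate * seconds))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def magnitude_spectrum(window, sample_rate=SAMPLE_RATE):
    """Naive DFT: return (frequency_hz, magnitude) pairs up to Nyquist."""
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                    for m, x in enumerate(window))
        spectrum.append((k * sample_rate / n, abs(total)))
    return spectrum


def strongest_frequency(window):
    """Crude stand-in for note detection: the bin with the largest magnitude."""
    return max(magnitude_spectrum(window), key=lambda pair: pair[1])[0]


# A 0.1 second, 440 Hz test tone yields two 50 ms windows.
tone = [math.sin(2 * math.pi * 440 * m / SAMPLE_RATE)
        for m in range(SAMPLE_RATE // 10)]
windows = split_into_windows(tone)
peaks = [strongest_frequency(w) for w in windows]
```

A practical version would more likely use overlapping windows and a fast Fourier transform from a numerical library; the hand-written transform here only keeps the example free of dependencies.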
An “interpolation parameter” then takes over. This value determines where each note sits on the path from its starting pitch in one signal to its final pitch in the other. Changing the parameter value by hand sweeps the pitches between the two positions, producing the portamento effect. That single parameter can also be assigned to a crossfader, the slider on a DJ’s mixing board that smoothly fades between songs. As the crossfader moves, the interpolation parameter changes, creating the effect.
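The role of the interpolation parameter can be illustrated with a short sketch. It assumes that a transport plan is already available as a list of (source pitch, target pitch, weight) triples and that pitches move linearly in hertz; both the data layout and the function name `interpolate_notes` are assumptions made for this example, as are the chord frequencies.

```python
def interpolate_notes(plan, t):
    """Place each note a fraction t of the way from source to target pitch.

    plan is a list of (source_hz, target_hz, weight) triples.
    t is the interpolation parameter: 0.0 is the first sound, 1.0 the second.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie between 0 and 1")
    return [((1.0 - t) * src + t * dst, weight) for src, dst, weight in plan]


# Hypothetical plan: two notes, each gliding up by a fifth.
plan = [(220.0, 330.0, 0.5), (440.0, 660.0, 0.5)]
start = interpolate_notes(plan, 0.0)    # pitches of the first sound
middle = interpolate_notes(plan, 0.5)   # halfway through the glide
end = interpolate_notes(plan, 1.0)      # pitches of the second sound
```

Mapping a crossfader to this function amounts to feeding the slider position, scaled to the range 0 to 1, in as `t`.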
Behind the scenes, two innovations keep the output free of distortion. First, Henderson made novel use of a signal-processing technique called “frequency reassignment,” which groups frequency bins into single notes that can easily transition between signals. Second, he devised a way to synthesize new phases for each audio stream while stitching together the 50-millisecond windows, so that adjoining windows do not interfere with one another.
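Why phase matters at window boundaries can be seen in a small sketch. This is not Henderson’s phase-synthesis method, which the article does not detail; it is a simpler, commonly used idea, assumed here for illustration, in which a single oscillator carries its phase forward from one window to the next so that the waveform does not jump when the frequency changes. The function name and parameter values are hypothetical.

```python
import math


def synthesize_note(freq_per_window, sample_rate=8000, window_size=400):
    """Render one note whose frequency may change from window to window.

    The running phase is carried across window boundaries, so the waveform
    stays continuous even when the frequency steps to a new value.
    """
    samples = []
    phase = 0.0
    for freq in freq_per_window:
        step = 2 * math.pi * freq / sample_rate   # phase advance per sample
        for _ in range(window_size):
            samples.append(math.sin(phase))
            phase += step
        phase = math.fmod(phase, 2 * math.pi)     # keep the phase value small
    return samples


# A note gliding from 220 Hz to 440 Hz over five 50 ms windows.
glide = synthesize_note([220.0, 275.0, 330.0, 385.0, 440.0])
largest_jump = max(abs(b - a) for a, b in zip(glide, glide[1:]))
```

If each window instead restarted its phase at zero, the samples on either side of a boundary would generally not line up, and the resulting discontinuities would be heard as clicks.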