Who Owns The Codes? The Sound and Ownership of Music in the AI Age

Preamble

Early in 2024 I was asked to contribute a chapter to the IAEL’s book on AI in music. It was an honour to be asked, and a significant effort to deliver something that I felt would not waste the readers’ time by adding to the torrent of ill-informed texts about this curious subject. Curious, because at no point have I heard anyone complain about the paucity of supply in music; quite the opposite. As a supremely human endeavour, music sits uneasily within the framework of capital and production for profit. AI promises to remove the human, and the endeavour, so that the gains can all accrue to capital. Why we should give this idea the time of day I find baffling, and the knowledge that our elected representatives here in the UK are seriously contemplating lifting one of the great achievements of the Enlightenment – copyright in creative work – to grease the extruders is profoundly depressing.

I set off at a clip, and by the third delayed deadline and some excellent editing had delivered a shortened version of this text. Some of the shortening took out references to interesting music produced along the timeline of music’s interaction with systems and ultimately with computing, of which generative AI is arguably a continuation. The IAEL is now no doubt working on another publication of equal importance to the music industry, and I can with confidence recommend whatever they produce as I know it is the product of a very thorough, well informed, and humanistic process. So with broader interest in music and AI probably peaking here in the UK I am taking the opportunity to evade my editors and publish my first version.

Introduction

AI and music is a big subject. I propose here to discuss some of the concepts and foundational methods behind using AI to generate music. I will ask how we can know about the sources that have been reworked into new music by generative AI. Finally I will consider some of the impacts we will very likely see on culture and the music industry, including how we might need to rethink some of our ideas about creativity, ownership, and attribution.

AI is already used widely in music for tasks other than generation, some of which are certainly interesting and useful. Recently AI extracted a John Lennon vocal track from an old mono cassette demo, and gave us an unexpected and unprecedented new Beatles single. And AI is smoothing out human fallibilities in pitching and phrasing on many new recordings, and helping deliver industry-standard mixing and mastering. Assistive technologies raise very different questions however; we see far greater cultural, legal, and economic challenges when AI itself makes the music.

First the usual warning! IANAL – I am not a lawyer, but I do have over 30 years' broad and active experience in the digital music industry. And for another acronym – TLDR – here’s where I think we will land. I certainly don’t foresee wholesale destruction of either musicianship or the economics of making records; the value of these is not just an affective waveform, but rather a complex bundle of culture and identity alongside the traditional music making and recording skills. And that suggests that we will find ways to recognise and reward human creativity when it is used in, and incorporated into, new computer-generated music.

Sonification of Code

To understand how generative AI can make music we need to start with an idea of how music can be understood in terms of codes and rules. People have been using codes to generate music probably for millennia. Anyone who has been through a western music education will have learnt many sets of rules, such as those which underlie fugue and counterpoint. Some prominent composers played with their understanding of rules and chance. One thinks of Mozart’s ‘Musikalisches Würfelspiel’ (Musical Dice Game), where a roll of the dice selected precomposed bars. Outside of western music, traditional forms such as raga and gamelan are essentially heuristics performed by musicians.

Instruments such as the barrel organ and player piano, which performed mechanical representations of music, were transformative, helping sophisticated music reach further than could have happened solely with trained musicians. It was not just that the machines could be operated by an untrained musician; they could also play music that was beyond human capability. Rendering the mysteries of music as a sequence of pins or holes in a roll of card must have suggested the possibilities inherent in applying maths and geometry to expand what could be created. This characteristic was exploited to the full by Conlon Nancarrow in his Studies for Player Piano. For a fun example, track down Study for Player Piano No. 3a, which is, to my ears, remarkably similar to early generative or AI music, featuring manic boogie-woogie motifs.

It was the introduction of the computer that blew the lid off what had until the mid 20th century seemed the rather constrained field of mechanised music. The digitisation of music in western culture, as notation, chiptunes, and later as MIDI data, made the rules inherent in music much more observable. Advances in digital signal processing during the 1990s showed that audio files containing music could be analysed and musical information extracted, starting out with pitch, onset, and duration of sounds. Early computers could be applied to algorithmic composition, picking up on the idea of music as the product of heuristic or stochastic processes.

So this approach treats music as essentially a sonification of codes, and there are some quite wild examples. The fruitful interchange between modern classical practice and the computer music labs that started to be built in the second half of the 20th century looks almost like a ‘scene’. Alvin Lucier, for instance, mapped brain impulses to electronic instruments, producing a percussive soundtrack to perception, thought, and feeling. Another academic composer, Xenakis, was among those most strongly identified with computer software as a compositional tool. It’s an important distinction that Xenakis was not looking for creativity in the algorithms that assisted in his composition. Instead he saw the computer as a way to express, through the medium of orchestral music, the mathematical concepts he was familiar with. Later he developed software that could turn his drawings into sound, enabling him to apply in composition the skills he had developed as an architect.

Beyond the Code: How Generative AI Works For Music

Contemporary applications of AI in music are much more sophisticated in how closely they emulate what we think of as ‘human’ made music. And they generate music that aces the Turing test. Given how good humans are at making and following rules, that’s perhaps not so surprising. But when we can’t tell whether the pilot’s hands have ever touched the rudder, that seems to me a step change.

Many aspects of our cultural and economic life have been disrupted by the rapid global development of digital and communication technologies over the last 35 years, and the academic field known as MIR – Music Information Retrieval – is no exception. The ability to make huge collections of recorded music available digitally, to apply powerful computing resources to analysis, and quickly share the resulting codified data sets, catalysed the explosion of ‘generative’ AI that we are now in the middle of.

To understand better the impact AI will have on music, the music industry, and music culture, it helps to understand something of how those black-box algorithms can churn out tunes. With sincere apologies to the scientists and researchers whose work actually powers these extraordinary advances, here’s a very brief and sketchy overview of how generative AI works for music.

The sounds and patterns we perceive in music can be extracted from a corpus of compositions and recordings, and expressed as a set of probabilities. Some of the basics are easy to codify – pitch, basic timing blocks such as beats and bars. Different instruments can be separated out, and their roles understood, which enables the more properly ‘musical’ features such as melody, rhythm, and harmony to be delineated. Higher level structures such as verse, chorus, bridge, etc., emerge out of further analysis. Commonalities between different works and recordings can be clustered, creating sets which are analogous to genres, styles, and modes. Hierarchical sets of features further define the differing eras and traditions, enabling automated differentiation between similar sets of instruments used in different ways.
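
To make this concrete, here is a minimal sketch of that first layer of analysis using the open-source librosa library: loading a recording, estimating tempo and beats, detecting note onsets, and summarising pitch content. The file name is a placeholder, and real MIR pipelines go far beyond this.

```python
# A minimal sketch of basic musical feature extraction with librosa.
# "example_track.wav" is a placeholder path, not a real file.
import librosa

# Load an audio file; sr=None keeps the file's native sample rate.
y, sr = librosa.load("example_track.wav", sr=None)

# Tempo and beat positions give the basic timing grid (beats and bars).
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

# Onsets approximate where individual notes or hits start.
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

# Chroma features summarise pitch content, a first step towards melody and harmony.
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

print(f"Estimated tempo: {float(tempo):.1f} BPM")
print(f"First few beat times (s): {beat_times[:4]}")
print(f"Number of detected onsets: {len(onset_times)}")
print(f"Chroma matrix shape (pitch classes x frames): {chroma.shape}")
```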

These models can be thought of as the vocabulary and grammar of the language of music, but of course without the meaning that natural human languages carry. In data terms the music has been separated out into tokens, the sequential relationships between the tokens have been inferred and stored, and the inference used to improve subsequent performance of the algorithms. The software machines that do this work are known as ‘transformers’. Recent advances have been driven by the discovery that adding more parameters and more transformers hugely improves the models that can then be used to generate output. In some ways this is similar to the way computers understand images, as a grid of pixels; the image gains fidelity the more gradations in colour and brightness there are, and the denser the pixels are packed. At a threshold human perception loses the ability to see the degradation, and just sees the image. Similarly with generative AI, adding vast complexity nudges the performance up to the point where we humans simply can’t differentiate between ourselves and robots.
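
To illustrate the idea of tokens and inferred sequential relationships, here is a deliberately tiny sketch – a bigram counter over made-up note tokens rather than a real transformer – showing how ‘what tends to come next’ can be learned from a corpus and stored as probabilities.

```python
from collections import Counter, defaultdict

# A toy corpus of tokenised melodies (invented note tokens, not real data).
corpus = [
    ["C4", "E4", "G4", "E4", "C4"],
    ["C4", "E4", "G4", "C5", "G4"],
    ["G4", "E4", "C4", "E4", "G4"],
]

# Count how often each token follows each other token
# (the "sequential relationships" inferred from the corpus).
transitions = defaultdict(Counter)
for melody in corpus:
    for current, nxt in zip(melody, melody[1:]):
        transitions[current][nxt] += 1

def next_token_probabilities(token):
    """Given a token, return the probability of each observed continuation."""
    counts = transitions[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

print(next_token_probabilities("E4"))
# {'G4': 0.6, 'C4': 0.4} - the model has "learned" what tends to follow E4
```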

In AI music analysis and generation, as in other applications, this phase is known as ‘training’. Any corpus of music, which in the AI field is known as a ‘training set’, can thus be rendered as a set of descriptors, along with the probability that each musical idea or motif will happen at a given point in a piece. The result in AI terminology is called a ‘foundation model’. Prompt the foundation model with some terms that are strongly associated with a particular form or style of music and out comes what you might call a statistically probable fugue, or rockabilly song, or eurodisco track. Or a guitar part, or drum track, or string section to supply what would otherwise be provided by a musician.
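
Continuing the toy example above, here is a sketch of the generation step: given a hand-written table of next-token probabilities (standing in for what training at vast scale would produce), a ‘prompt’ token seeds a statistically probable sequence. The tokens and probabilities are invented purely for illustration.

```python
import random

# A toy "model": for each token, the probability of what comes next.
# These values are invented, standing in for a trained foundation model.
model = {
    "C4": {"E4": 0.7, "G4": 0.3},
    "E4": {"G4": 0.6, "C4": 0.4},
    "G4": {"C4": 0.5, "E4": 0.3, "C5": 0.2},
    "C5": {"G4": 1.0},
}

def generate(seed, length=8):
    """Generate a statistically probable sequence by sampling continuations
    of the current token according to the model's probabilities."""
    sequence = [seed]
    for _ in range(length - 1):
        options = model[sequence[-1]]
        nxt = random.choices(list(options), weights=options.values())[0]
        sequence.append(nxt)
    return sequence

# "Prompting" the toy model with a starting token.
print(generate("C4"))
```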

All very interesting, but if audiences don’t rate the results the whole enterprise remains, literally, academic. Early computer music did indeed sound pretty inhuman. The micro-timings and weights in human performance, and the infinite timbral spectrum of the voice and acoustic instruments, were not easy to emulate. Perhaps encouraged by the market’s reaction to his Ambient series of recordings, in the mid 1990s Brian Eno produced and released his ‘Generative Music 1’, using software created by Tim Cole & Pete Cole at their company SSEYO, which they called the ‘Koan Music Engine’. Their software emulated stochastic processes, such as the path a ball bearing takes down a chute. It lent itself naturally to contemplative, slowly developing soundscapes. Eno himself made the connection with the minimalist composers of the 1960s.

The huge scale of today’s foundation models, well beyond what even the most powerful computers could process in the 1990s, has opened up many more styles and genres of music to convincing generative production.

Owning the Codes

This raises some very fundamental questions about whether the creators of the music in the training sets have an attribution or ownership interest in the output of generative AI. It is as well to know what we could do about that before considering what we should do. Tracing the output back to the inputs in the sets is technically difficult, but by no means impossible. Any piece of music in a training corpus can be uniquely identified using methods we have developed and applied in the digital music industry. A combination of technological tools such as file hashes and audio fingerprints can be used to build strong associations between the music and copyright holders. The nature of the analysis required to produce a foundation model maps very well onto a requirement to store and manage the provenance of each inference stored as a token in the model. And when the model is used to generate a new piece of music, the tokens invoked, and therefore the provenance of each element, can be logged.
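
As a rough illustration of the tooling involved, here is a sketch using content hashes as stable identifiers and a simple provenance register linking sources to rights holders. The names and data are placeholders, and real systems would add audio fingerprints and much richer metadata.

```python
import hashlib
import json

def content_hash(data: bytes) -> str:
    """Return a SHA-256 hash of audio bytes, usable as a stable identifier."""
    return hashlib.sha256(data).hexdigest()

# Stand-in audio bytes; in practice these would be read from the source files.
source_audio = {
    "track_001": b"...raw audio bytes...",
    "track_002": b"...more raw audio bytes...",
}

# A toy provenance register linking content hashes to rights holders (placeholder names).
provenance = {
    content_hash(data): {"source": name, "rights_holder": "Example Rights Holder"}
    for name, data in source_audio.items()
}

# When the model generates a new piece, log which sources the invoked tokens derive from.
generation_log = {
    "output_id": "gen_0001",
    "source_hashes": sorted(provenance),
}

print(json.dumps(generation_log, indent=2))
```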

This describes a situation where we could have visibility and control over each step in the process, from raw material to finished product. There are two aspects to this which would present a challenge in the real world. Firstly it is data heavy, and very dependent on trust in the parties and the systems involved. While very few entities have the resources to manage the processes involved in training foundation models this might not be insurmountably costly. As this capability spreads however it will quickly become impossible to identify who has done what with which music. And that’s the second challenge; we wouldn’t necessarily think it desirable to embed into private computing systems the kind of external accountability and surveillance that would be required to make an auditable supply chain.

One response to this could be to decide just to select reputable sources and rely on what they tell us. That is the approach Google is taking with its SynthID product. When its own music generator, Lyria, has output a new piece of music, an identifier is also generated and embedded in the waveform. When the audio is rendered for playback this identifier can be extracted. It’s robust enough to withstand compression and transcoding. But it clearly has no meaning outside of a Google world, so doesn’t help us much when we leave the ecosystem. Auditing Google seems tough too, one of those ‘quis custodiet’ types of question.

Where we have no visibility or trust, either because we can’t or don’t want to embed traceability in our music production and distribution systems, there is another approach which might yield at least a reasonable degree of confidence that a source was in a training set. Just as the characteristics of sets of music are codified and stored by the foundation models in the generative AI process, the same process can be applied to the results. If the resulting models display a high degree of similarity it’s possible that the generated music was based on a highly correlated training set. That’s not quite the same as fingerprints on the murder weapon, but close.
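
A minimal sketch of that comparison, assuming each body of music has already been summarised as a feature vector (the numbers here are invented): cosine similarity gives a rough measure of how closely two statistical profiles align.

```python
import math

# Toy feature vectors summarising two bodies of music - say, the statistical
# profile of a generated catalogue and of a suspected training set.
# The values are hypothetical.
generated_profile = [0.82, 0.11, 0.40, 0.95, 0.23]
candidate_source_profile = [0.80, 0.15, 0.38, 0.97, 0.20]

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 means the profiles point in exactly the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

similarity = cosine_similarity(generated_profile, candidate_source_profile)
print(f"Profile similarity: {similarity:.3f}")
# A high score suggests, without proving, that the generated music drew on a similar corpus.
```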

Given the complexity and disproportionate cost of tracking, and some of the uncertainties, it doesn’t seem viable to me to use the same private licensing and accounting for AI-generated music that we have developed for sampling, for example. And with complete supply chain tracking unlikely to be achieved, we would naturally look for some kind of collectivisation of ownership, with distribution rules to deliver fairness rather than absolute accuracy. This comes with a note of caution, as this is a rapidly developing field of study and much remains uncertain or unknown. But if we can train the transformers to deliver an opinion much like a trained musicologist might, the same technology that generated the music could be used to apportion revenue to the creators of the training set. We can see from this how we could maintain a set of sufficiently identified works and recordings in a repertoire pool, and a set of usage data and statistical processes with which to share out revenue pools derived from AI-generated works and recordings.
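
To show the shape of such a distribution rule, here is a toy apportionment of a revenue pool in proportion to association weights; the rights holders, weights, and pool size are entirely hypothetical.

```python
# A toy apportionment: share a revenue pool across rights holders in proportion
# to how strongly the generated output is associated with their repertoire.
# The weights are hypothetical and would in practice come from the kind of
# similarity analysis sketched above.
revenue_pool = 100_000.00  # hypothetical pool derived from AI-generated works

association_weights = {
    "Rights Holder A": 0.45,
    "Rights Holder B": 0.35,
    "Rights Holder C": 0.20,
}

total_weight = sum(association_weights.values())
distribution = {
    holder: round(revenue_pool * weight / total_weight, 2)
    for holder, weight in association_weights.items()
}

for holder, amount in distribution.items():
    print(f"{holder}: {amount:.2f}")
```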

Speed and Unflagging Energy

If AI music makes some people nervous, it’s easy to see why. We’re very attached to our Romantic notions of creativity, and the myth of the inspired and lonely genius. Indeed, artistic copyright has its foundation in individualism, and the inalienable identification of the work with its author and of the performance with the artist. Mozart’s appreciation of his own genius was ahead of its time, as his letters painfully show. His miserable daily life was that of a court servant who refused to please his master.

It seems possible that our Romantic notions of creativity follow, rather than lead, our investment of copyright in the results of creative efforts. The myth of the lonely genius seems to emerge alongside the introduction of copyright, perhaps to manage the risk of being accused of plagiarism. In an inversion of the political economy that made a pauper of Mozart, the most successful copyright creators have ended up with the wealth and status of their erstwhile aristocratic patrons, while somehow preserving their bohemian freedoms. Exclusive ownership in music turned out to be very much worth having.

Even the loneliest and most inspired Romantic composer leaned on formal systems, such as scales, harmony, regular rhythms and time structures, and larger musical forms such as the symphony. Most have studied music theory, covering systems and rules as well as precedent. Innovators such as Satie and Debussy were contemporaries at the Paris Conservatoire, and today there is a Conservatoire named after Debussy, which tells you which of the two was more of an iconoclast!

Contrast the most formal approach to composing and performing music with the speed and unflagging energy of AI, processing tune after tune in seconds. Then add the notion that the AI could at some point be able to call on an almost perfect and complete set of music theory, and millions of previous examples of human music. It’s a classic B movie plot – the machines are taking over, and all we can do is build more electricity generating plants and data centres to feed their insatiable maws. And, thanks to the combination of cloud computing and the internet, the music machines are universally accessible, at low cost, and already connected to global music distribution platforms.

A Synthetic Economy of Music?

It’s here, at the intersection of culture and the economy of music, that some anticipate an AI-induced tectonic shift. There’s a dystopian way of looking at this; synthetic artists flooding synthetic markets with synthetic music. To me this seems to misunderstand the nature and meaning of value in music, severing it from its cultural and human context. That today’s markets might not be great at rewarding the value we find in music is not an argument for abandoning them!

Digital music platforms have a history of intervening in the supply side of the market; indeed the digital market was kickstarted by two big interventions: unbundling the album, and enabling mass ripping of CDs to fill iPods. Platforms have encouraged mass piracy, and more recently incubated and supported DIY music upload businesses. AI is the latest intervention and is already fuelling a massive increase in supply. Just one company, Boomy, claims to have generated nearly 20m songs.

Success in the industry equates, broadly, to high popularity over time. Music is a cultural good with very low barriers to consumption, and a typical consumer’s usage remains strongly influenced by her awareness that others like her are also enjoying the same music. Such a competitive dynamic fuels a very concentrated market; a few tracks earn almost all the revenue, while most get little or none. Even the crudest division of the market into winners and losers asks the pertinent question: which cohort is most vulnerable to competition from AI-generated music?

I suggested above that current AI-generated music aces the Turing test; this assertion needs heavy qualification. There is a big difference between ambient trance music and a complex piece involving traditional acoustic instruments, singers, and sophisticated production techniques. AI is much better at trance. A big movie soundtrack remains a big beast of a creative challenge (although one can prompt an AI music generator with ‘epic cinematic soundtrack’ – the result sounds like the band Dragonforce but without the humour). So the strength of AI music generators is their effectively infinite capacity to increase the supply of generic music.

But our collective preference is demonstrably for hits and innovation over time, rather than just more generic music with less of the cultural and human value in the bundle. Even the recommendation and curation interventions struggle to compete with audience preference. The prized Spotify editorial playlist placements have a follower-to-playcount ratio of between 1% and 5%. Unless we get to the point where we prefer AI-generated music, its impact on success for artists will be minimal.

Assimilation

As so often in culture and markets, the infiltration of generative AI into music turns out to be a question of values as much as of technology. A dystopian outlook for a human economy of music making seems to me to be founded on a view that the value in music is solely in any anonymous affect it has when heard. This is ahistorical. Music might not have the cosmological significance it once had, but it remains an experience and activity loaded with personal and societal significance. We might find comfort in familiar or unchallenging music, but the norms expressed in our copyright laws incentivise innovation and originality, and markets agree. It’s the Queen catalogue that is the Koh-i-Noor diamond for the music rights hoarders, not the tens of thousands of contemporaneous rock plodders.

Here then is a positive view of a future in which generative AI makes music. We will surely find ways to harness AI in generating therapeutic or performance-enhancing music for health and sport. Generative AI can iterate and shorten the feedback loop in ways that humans just can’t. The value of this music will be immense, and there is already a lot of literature supporting its effectiveness, even before we can adapt and optimise the music itself.

Alongside this the processes involved in generative AI are producing new insights into what music is, leading to an expanded commons of ideas and understanding available to the foundation models as well as to human artists and composers. Our copyright framework recognises the expression of an idea as worthy of protection, not the idea itself. This common stock from which musicians draw is an important cultural seedbank. We use tunings and structures in contemporary music that were fresh in the Baroque period; generative AI will perhaps help keep this stock fresh so that future musicians have their own go at making the perfect break up song or pastoral tone poem.

Our veneration for artists and creators is unlikely to be eroded by the presence of artificially produced music in our public spaces and on music platforms. We will continue to want to see the artist’s presence in and behind the music, and to do so we will surely need to strengthen the way we manage identity and authenticity in creation and performance. So with an awareness that I am writing for an audience of entertainment lawyers, here is my best guess for what the near future holds for music in the age of generative AI. We will add an AI value for the use of music and recordings to our collecting society mandates, and find inexpensive ways to distribute revenue, probably using similar technology to that which made the AI music in the first place. And we will support the deep cultural and social value our artists represent with stronger attribution rights, and technology to identify and authenticate their work.
