Deep Fake Birdsong, 2020

Artist and analog electrical engineer: Kelly Heaton
Software engineer: Johann Diedrick

Project Summary

Deep Fake Birdsong unites art with science to inquire into the electrical nature of lifeforms and to push the envelope of artificial intelligence. The project examines the art of Kelly Heaton, who builds circuits with lifelike qualities in order to question the nature of consciousness. For this collaborative work with software engineer Johann Diedrick, Heaton built an interactive analog electronic circuit that generates birdsong. In essence, Heaton's electronic sculpture is a minimal-interface analog audio synthesizer. The circuit's simplicity and low component count are conceptually important to the experiment, which endeavors to show the wide range of lifelike behaviors that can emerge from analog electronic hardware. As it turns out, the expressive power of analog electronics is tremendous: a small number of oscillators can, when combined, create a surprisingly rich diversity of sound. We demonstrate that slight alterations to the base resistance of five interconnected oscillators result in a wide variety of unique bird-like sounds. In this regard, Heaton's work has much in common with BEAM robotics (Mark W. Tilden et al.).

In a second step of the project, Heaton and Diedrick recorded audio from her birdsong-generating circuit and ran it through software designed to identify biological species of birds. Diedrick's Flights of Fancy is a software system that parses an audio recording of arbitrary length into a series of spectrograms, each one second in duration. These incremental signals are then compared against a database of actual recorded birdsong and assigned a species match along with a percent confidence rating. Of the 122 generated spectrograms parsed and analyzed by Flights of Fancy, four different species of bird were identified, with match confidence percentages ranging from 19% to 96%. The four identified species are:

Silky-tailed Nightjar: 47 matches with an average percent confidence of 93.147

Grey-breasted Seedsnipe: 16 matches with an average percent confidence of 26.515

Paramillo Tapaculo: 54 matches with an average percent confidence of 48.991

Olive-striped Flycatcher: 5 matches with an average percent confidence of 47.938

The Experiment

Heaton approached Diedrick with the following experimental idea: what would his bird identification software say about an analog electronic circuit that generates bird-like sounds (as opposed to recordings of actual birdsong)? Would the software detect one or more bird species and, if so, with what degree of confidence? Based on a recording of 2 minutes and 2 seconds (one spectrogram per second, omitting segments that contained no audio), Diedrick's software system generated 122 spectrograms. We then passed these images to our trained model to see which bird species it predicted each spectrogram came from, based on its training against the BirdCLEF dataset. Across our experiments, the system identified four bird species in the recording. For each spectrogram, the system produces a "match confidence" score: of the 1,500 bird species it can predict, the score indicates which species it most confidently believes the spectrogram belongs to. The match confidence percentages in our experiment ranged from 19% to 96%. Notably, 47 of the 122 spectrograms were matched with Antrostomus sericocaudatus (Silky-tailed Nightjar), with an average percent confidence (across all associated spectrograms) of 93%. If we were scientists instead of experimental artists, we would confidently conclude that the predominant bird species in our recording is the Silky-tailed Nightjar.
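To make the spectrogram-slicing step concrete, the following minimal sketch (our illustration, not the actual Flights of Fancy code) cuts a recording into one-second windows, skips near-silent segments, and saves one spectrogram image per window; the sample rate, file names, and use of librosa and matplotlib are assumptions:

    # Minimal sketch (not the Flights of Fancy implementation): slice a recording
    # into one-second windows and save a spectrogram image for each windowed segment.
    from pathlib import Path
    import numpy as np
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    SR = 22050  # assumed sample rate

    def recording_to_spectrograms(wav_path, out_dir="circuit_spectrograms"):
        Path(out_dir).mkdir(exist_ok=True)
        audio, sr = librosa.load(wav_path, sr=SR)
        for i in range(len(audio) // sr):
            window = audio[i * sr:(i + 1) * sr]
            if np.max(np.abs(window)) < 1e-3:   # omit segments with no audio
                continue
            spec = librosa.amplitude_to_db(np.abs(librosa.stft(window)), ref=np.max)
            plt.figure(figsize=(2, 2))
            librosa.display.specshow(spec, sr=sr)
            plt.axis("off")
            plt.savefig(f"{out_dir}/segment_{i:03d}.png", bbox_inches="tight", pad_inches=0)
            plt.close()

    # recording_to_spectrograms("deep_fake_birdsong.wav")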

Technical Details / Hardware

The "singing" electronic bird sculpture is centered on a discrete component (analog electronic) circuit. This circuit consists of five oscillators (astable multivibrators) with adjustable frequency. Each oscillator is capacitively coupled to a common signal line that is amplified to drive an audio transformer with output to a speaker. The component count for the entire circuit is 77. The hardware for this project was designed by Kelly Heaton. Please refer to project documents for the circuit schematic and link to our project website for additional photos of the resulting artwork: https://www.kellyheatonstudio.com/deep-fake-birdsong

Technical Details / Software

The possibility of both detecting and identifying birds in field recordings has been a long-standing problem in the fields of bioacoustics and acoustic scene recognition [1, 2]. Only in recent years has it become possible to do large-scale bird detection and recognition across large datasets of recordings, facilitated by convolutional neural networks (CNNs) and other deep learning techniques [3]. With deep neural networks, we can scan through recordings, identify where bird calls occur, and, further, report which bird species produced each call. In his work Flights of Fancy, Diedrick trained a CNN through transfer learning to recognize birds found in the BirdCLEF dataset, which contains over 36,000 recordings across 1,500 species of birds, primarily from South America [4, 5]. With this trained model, it is possible to take new recordings of birds that the model has never seen before, pass them through the software system, and produce a prediction as to which bird species the system thinks is contained within each recording (drawn from the bird species in the dataset).
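As a rough sketch of what inference with such a trained model can look like (assuming the fastai v1 API referenced in [6]; the exported-model file name and the folder of spectrogram images below are hypothetical, not the exact Flights of Fancy setup), one can tally predicted species and their confidences as follows:

    # Rough sketch of inference (assumed fastai v1 API; file names are hypothetical).
    from collections import defaultdict
    from pathlib import Path
    from fastai.vision import load_learner, open_image

    learn = load_learner(Path("."), "birdsong_export.pkl")    # exported trained model

    tallies = defaultdict(list)
    for png in sorted(Path("circuit_spectrograms").glob("*.png")):
        species, _, probs = learn.predict(open_image(png))
        tallies[str(species)].append(float(probs.max()))       # top-class confidence

    for species, confs in sorted(tallies.items(), key=lambda kv: -len(kv[1])):
        print(f"{species}: {len(confs)} matches, "
              f"average confidence {100 * sum(confs) / len(confs):.1f}%")

A tally of this kind is how per-species match counts and average confidence percentages, like those reported above, can be derived from individual spectrogram predictions.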

The Flights of Fancy software system was developed as follows. First, the BirdCLEF 2018 dataset was downloaded and sorted so that all of the audio recordings were organized into folders named for each species; this way, the folder name serves as the label for training, with all the recordings for a species contained within its species-named folder. Next, a data pre-processing step went through all of the recordings and segmented out bird calls using a signal-to-noise heuristic: signals with a significant amount of upward/downward variation are treated as chirps. The system saves these segments as spectrogram images (a frequency-domain representation of the signal) in species-named folders, as before, for use in training. Finally, the system uses a technique known as transfer learning [6]: it takes a specific kind of neural network (ResNet) already pre-trained on images (ImageNet) and leverages its pre-trained weights to train more effectively on our generated spectrogram images of bird calls. From there, our model was trained down to a 27% error rate at predicting the species of a bird call from spectrograms of the bird species in that dataset.
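A condensed sketch of that transfer-learning step, assuming the fastai v1 API documented in [6] (the folder layout, ResNet variant, and hyperparameters here are illustrative rather than the exact Flights of Fancy settings):

    # Condensed sketch of transfer learning on spectrogram images (assumed fastai v1 API;
    # expects one sub-folder per species, e.g. birdclef_spectrograms/<species>/*.png).
    from fastai.vision import (ImageDataBunch, cnn_learner, models,
                               get_transforms, imagenet_stats, error_rate)

    data = ImageDataBunch.from_folder(
        "birdclef_spectrograms",          # folder names double as training labels
        valid_pct=0.2,
        ds_tfms=get_transforms(do_flip=False),
        size=224,
    ).normalize(imagenet_stats)

    learn = cnn_learner(data, models.resnet34, metrics=error_rate)  # ImageNet-pretrained ResNet
    learn.fit_one_cycle(4)                                 # train the new head on spectrograms
    learn.unfreeze()
    learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-3))       # fine-tune the whole network
    learn.export("birdsong_export.pkl")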

For the collaborative work "Deep Fake Birdsong," Johann Diedrick took a recording of Kelly Heaton's electronic birdsong-generating sculpture and passed it through this model to see which bird species were correlated with the electronically produced sounds.

References: 

[1] V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J.P. Bello. "BirdVox-full-night: a dataset and benchmark for avian flight call detection", Proc. IEEE ICASSP, 2018. 

[2] J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck, and S. Kelling. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring. PLoS One, 2016. 

[3] https://arxiv.org/pdf/1804.07177.pdf 

[4] https://github.com/aquietlife/flightsoffancy/blob/master/flightsoffancy.ipynb 

[5] https://www.aicrowd.com/challenges/lifeclef-2018-bird-monophone

[6] https://docs.fast.ai/vision.learner.html#Transfer-learning

NB: Relationship to scientific endeavors

Our project relates to scientific endeavors in several ways. First, it reflects a concern for biodiversity and the importance of improving AI systems that evaluate the status of birds in their natural habitat. This work is built upon the Xeno-Canto bird recording archive, which is run by the Xeno-Canto Foundation (officially Stichting Xeno-canto voor natuurgeluiden), based in the Netherlands [1]. Through the use of this essential resource, we are able to demonstrate its importance for biological and AI research. As bird populations continue to decline across Europe, research into bird sound classification will prove vital in determining how these populations are changing over time and how successful remediation projects are in restoring them to healthy numbers [2]. Our work is one among many that could be useful within the EU to monitor, track, and potentially help revitalize bird populations across Europe.

Our project may also be of interest to researchers who are studying bird (or other animal) neurology. Heaton's use of simple analog oscillators to generate birdsong patterns is compelling insofar as it suggests a principle for the generation of complex biological behaviors. It has been shown in scientific research that there is a strong correlation between bird neurology and the songs they produce [3].

[1] https://www.xeno-canto.org/about/xeno-canto

[2] https://www.ncbi.nlm.nih.gov/pubmed/30912605

[3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3065528/