Shazam has come a long way since 2002 when a user had to dial “2580” and get an SMS identifying the song playing. With over 12 billion tags, Shazam can find almost anything you throw at it, making it one of the best music recognition apps out there.
When it comes to identifying music, humans can do it rather quickly. It takes us 100 to 300 milliseconds to recognise songs we are familiar with, but for computers it is a different ball game altogether. A computer can’t understand rhythm the way the human brain does, as all it knows is ones and zeros.
Shazam began to take shape in 1999 with the aim of solving this problem and making music identification effortless. Shazam was acquired by Apple for $400 million in 2017.
With an active user base of over 450 million users, Shazam clearly does a few things right. But how does it identify music at lightning-fast speeds and make sense of something as intangible as music?
Shazam’s music identification broken down
A lot happens in the background to identify a song you heard in a noisy environment. To make the process faster, Shazam divides its audio processing between the client and its servers.
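- Sampling: Shazam uses the microphone on your smartphone to listen to the song playing around you. It then converts this analog signal into a digital signal by sampling it at 44,100 samples per second. This rate follows from the Nyquist sampling criterion: to capture a signal without aliasing, the sampling rate must be at least twice the highest frequency present. Human hearing tops out at around 20 kHz, so 44,100 samples per second covers the audible range with a small margin.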
- Creating a spectrogram of the sampled signal: A spectrogram is a three-dimensional graph of time, frequency and amplitude, which helps a computer understand the various parameters of an audio signal. It plots time on the x-axis and frequency on the y-axis, and the colour of each point gives the amplitude. To create the spectrogram from the digital data, Shazam applies the Discrete Fourier Transform to short, successive slices of the signal.
- Transforming the 3D spectrogram to 2D: Using a 3D plot to identify music can take a lot of time as it has a lot of data points. Therefore, to minimise the computational overhead and reduce noise, Shazam creates a 2D map of the audio. To do this, it selects the frequencies with the highest amplitude and plots them against time. This map is sent to Shazam’s servers for further processing.
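The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Shazam's actual code: a generated sine tone stands in for the microphone, the sampling rate is lowered to 8,000 samples per second so that a naive Discrete Fourier Transform stays fast, and the window size and all function names (`sample_tone`, `dft_magnitudes`, `spectrogram`, `strongest_peaks`) are made up for the example. It also keeps just one peak per time slice, which is the simplest possible way to "select the frequencies with the highest amplitude".

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; assumed demo value (the article cites 44,100)
WINDOW = 256        # samples per spectrogram column; assumed value


def sample_tone(freq_hz, seconds, rate=SAMPLE_RATE):
    """Stand-in for the microphone: a pure sine tone sampled at `rate`."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def dft_magnitudes(frame):
    """Naive O(n^2) Discrete Fourier Transform; magnitudes of bins 0..n/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, window=WINDOW):
    """Cut the signal into non-overlapping windows and transform each one."""
    return [dft_magnitudes(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]


def strongest_peaks(spec, rate=SAMPLE_RATE, window=WINDOW):
    """Keep only the loudest frequency per time slice: the 2D map."""
    peaks = []
    for t, column in enumerate(spec):
        k = max(range(len(column)), key=column.__getitem__)
        # Convert window index to seconds and bin index to Hz.
        peaks.append((t * window / rate, k * rate / window))
    return peaks


signal = sample_tone(1000, 0.25)  # a quarter second of a 1 kHz tone
for time_s, freq_hz in strongest_peaks(spectrogram(signal))[:3]:
    print(f"{time_s:.3f} s -> {freq_hz:.1f} Hz")
```

A real implementation would use a Fast Fourier Transform, overlapping windows and several peaks per slice, but the shape of the result is the same: a sparse list of (time, frequency) points instead of a dense 3D plot.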
Although it is possible to find the song by comparing this 2D map to the maps of audio files on Shazam’s servers, the cost of such a brute-force search is prohibitive. To understand the problem, let us look at an analogy.
The 2D map of the audio file is like a part of a constellation, and Shazam has to find which constellation the snapshot represents. Looking at the problem this way, you can see how hard it is to find a song using the 2D map alone. So, to make things easier, Shazam uses audio fingerprinting and hashing.
- Audio fingerprinting: To create the fingerprint from the 2D map, Shazam selects a frequency point on the map. This frequency point is known as an anchor point. For each anchor point, it selects a target zone, which consists of other frequency points. After this, it calculates the time delay between the anchor point and each frequency point in the target zone.
- Hashing and comparing: Using the frequency of the anchor point, the time delay and the frequency of each point in the target zone, it creates hashes of the audio clip. Comparing these hashes to the hashes in its database helps Shazam find your music.
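To make the anchor point and target zone idea concrete, here is a minimal Python sketch. It is an assumption-laden toy rather than Shazam's implementation: peaks are given as (time frame, frequency bin) integers, the target zone is simply the next `FAN_OUT` peaks after the anchor, the two "songs" are invented data, and the function names are hypothetical. The final voting step, which counts how many hashes agree on the same time offset between clip and song, is one common way to do the comparison and is assumed here; the description above only says that the hashes are compared.

```python
from collections import Counter, defaultdict

FAN_OUT = 3  # peaks in each anchor's target zone; assumed value


def fingerprints(peaks, fan_out=FAN_OUT):
    """Yield (hash, anchor_time) pairs from time-sorted (time, freq) peaks."""
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            # Hash = anchor frequency, target frequency, time delay.
            yield (f1, f2, t2 - t1), t1


def build_index(songs):
    """Map each hash to every (song, anchor_time) where it occurs."""
    index = defaultdict(list)
    for name, peaks in songs.items():
        for h, t in fingerprints(peaks):
            index[h].append((name, t))
    return index


def best_match(index, clip_peaks):
    """Vote per (song, time offset); a true match piles up on one offset."""
    votes = Counter()
    for h, clip_t in fingerprints(clip_peaks):
        for name, song_t in index.get(h, []):
            votes[(name, song_t - clip_t)] += 1
    if not votes:
        return None
    (name, offset), _count = votes.most_common(1)[0]
    return name, offset


# Invented demo data: (time frame, frequency bin) peaks for two "songs".
songs = {
    "song_a": [(0, 40), (1, 52), (2, 40), (3, 61), (4, 47), (5, 52), (6, 33)],
    "song_b": [(0, 33), (1, 47), (2, 61), (3, 40), (4, 52), (5, 33), (6, 47)],
}
# The clip is frames 2..5 of song_a, re-timed to start at frame 0.
clip = [(0, 40), (1, 61), (2, 47), (3, 52)]

index = build_index(songs)
print(best_match(index, clip))
```

Because each hash depends only on two frequencies and the delay between them, it is the same no matter where in the song the clip starts. That is what turns the constellation search into a series of cheap dictionary lookups.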
All in all, Shazam uses a lot of audio processing and computational power to help you find the music you love. So the next time you hear a beat that makes your feet tap, just open the Shazam app.