TEL-860 - Sound and Music Processing
The Sound and Music Processing course introduces students to the fundamental principles, techniques, and methodologies for analyzing and processing audio and music signals. It focuses on three main areas: audio processing, music processing, and music information retrieval. Students explore the properties of sound, its generation and perception, room acoustics, and digital audio processing techniques. They also learn about music representation, feature extraction, and systems for genre classification, instrument recognition, and music recommendation.
The course combines theory with practical laboratory exercises, using tools like Python (LibROSA, Essentia, Madmom, Marsyas) and MATLAB to solve real-world problems in audio and music processing.
Key Topics Include:
- Introduction to Audio Signals
  - Types of sound and methods of generation and transmission.
  - Key features and descriptors of audio signals.
  - Frequency transforms: Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT).
  - Digital filters and time-varying filters for audio processing.
- Room Acoustics
  - Acoustic characteristics of indoor and outdoor environments.
  - Reverberation, echo, and Head-Related Transfer Functions (HRTF).
  - Design and simulation of acoustic spaces.
- Sound Perception and Psychoacoustics
  - Human perception and cognition of sound.
  - Psychoacoustic principles: hearing thresholds and critical bands.
- Audio Descriptors and Feature Extraction
  - Time-domain descriptors: energy, zero-crossing rate, and entropy.
  - Frequency-domain descriptors: spectral centroid, spectral entropy, MFCC, and chroma features.
  - Estimation of periodicity and harmonic content.
- Audio Processing Applications
  - Multi-microphone systems: echo cancellation, dereverberation, and blind source separation.
- Music Representation
  - Score-based, symbolic, and acoustic representations.
  - Key features of music signals: timbre, pitch, amplitude, and duration.
- Music Descriptors and Feature Extraction
  - Descriptors for timbre, rhythm, pitch, and harmony.
  - Dynamic Time Warping (DTW), tempo estimation, and beat detection.
- Structure Analysis of Musical Pieces
  - Self-similarity matrices, audio thumbnailing, and segmentation techniques.
- Music Information Retrieval
  - Data mining and retrieval techniques for music databases.
  - Recognition of musical attributes such as lyrics, genres, and cover versions.
- Music Indexing and Audio Fingerprinting
  - Music indexing techniques and fingerprint extraction for recognition systems.
- Similarity Measurement in Music
  - Methods for measuring similarity and comparing musical pieces.
- Music Recognition Systems
  - Recognition of instruments, emotions, genres, and songs.
  - Cover song detection and recommendation systems.
Laboratory Topics
- Basic Digital Audio Processing Techniques: Implementing filters, transformations, and basic editing tools for sound manipulation.
- Simulation of Room Acoustics: Designing and simulating acoustic spaces to analyze reverberation and echo effects.
- Audio Processing Applications: Echo cancellation, noise reduction, and blind source separation in audio signals.
- Introduction to Music Audio Signals: Processing music signals for feature extraction, representation, and segmentation.
- Feature Extraction from Musical Tracks: Calculating features such as MFCC, chroma, tempo, and rhythm for analysis and classification.
- Music Information Retrieval Applications: Implementing algorithms for music classification, genre recognition, and recommendation systems.
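As a taste of the first lab topic, the sketch below implements a basic digital low-pass filter with SciPy's `butter` and `lfilter`; the signal, cutoff, and filter order are illustrative choices, not lab specifications.

```python
import numpy as np
from scipy.signal import butter, lfilter

def lowpass(y, sr, cutoff_hz, order=4):
    """Apply a Butterworth low-pass filter to signal y sampled at sr Hz."""
    # butter expects the cutoff normalized to the Nyquist frequency (sr/2).
    b, a = butter(order, cutoff_hz / (sr / 2), btype="low")
    return lfilter(b, a, y)

# Demo: a 200 Hz tone mixed with an unwanted 5 kHz component.
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 5000 * t)

# Filtering at 1 kHz keeps the 200 Hz tone and suppresses the 5 kHz one.
filtered = lowpass(y, sr, cutoff_hz=1000)
```

The same pattern (design coefficients, then apply them to the samples) carries over to high-pass, band-pass, and the time-varying filters covered in the lectures.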
Project-Based Learning
The Sound and Music Processing course is project-based, emphasizing teamwork, hands-on implementation, and real-world problem-solving. Students will work in teams of two to complete a project focused on one of the topics covered in the course.
Each team is required to:
- Deliver a Written Paper: The final report must follow a conference-style format (4–6 pages), including sections such as abstract, methodology, results, and conclusions. Formatting guidelines will be provided.
- Prepare a 15-Minute Presentation: Teams will present their work, highlighting the project’s objectives, technical implementation, results, and conclusions. The presentation must also include a live demonstration of the developed system or algorithm.
At the end of the semester, all projects will be presented during a Workshop, which will be open to all members of the department. This provides students with the opportunity to showcase their work, receive feedback, and discuss their findings with peers, faculty, and researchers.
This structure encourages creativity, collaboration, and professional communication, preparing students for future academic and industry-related challenges.
Student Projects
The Sound and Music Processing course includes a project-based component where students work in teams of two to develop practical solutions to real-world problems. Below are some examples of proposed projects for this course:
- Sound Event Recognition System on Single-Board Computers: Implement a system to detect and classify sound events using Raspberry Pi or NVIDIA Jetson Nano platforms.
- Exploratory Analysis of Music Tracks from Online Databases: Create a web-based app to collect and visualize music data from platforms like Spotify, Deezer, and Last.fm using machine learning techniques.
- Continuous Audio Classification Using TinyML: Develop an always-on audio classifier using Arduino Nicla Voice and TinyML for real-time audio analysis with synthetic datasets.
- Emotion-Based Music Playlist Generator: Design a playlist generator that recommends songs based on the user's mood using valence-arousal emotional models.
- Room Acoustics Simulation and Optimization: Analyze the acoustics of enclosed spaces (e.g., classrooms or studios) and propose improvements using simulation tools.
- Diarization System for Multi-Speaker Audio Recordings: Implement a diarization system to segment and label speakers in multi-speaker recordings using PyAnnote.audio in Python.
- Voice-Controlled Digital Assistant with Raspberry Pi: Develop a digital assistant that recognizes speech commands and responds using pre-trained LLMs like LLaMA-2 or TinyLlama.
- Speech Recognition with NVIDIA RIVA: Explore NVIDIA RIVA's speech recognition system and develop a voice-driven application.
- Voice-Controlled Smart Home System: Build a speech-controlled system to operate smart home devices using NVIDIA RIVA.
- Automatic Music Genre Classification: Create a machine learning model to classify songs into genres based on their features.
- Speaker Recognition for Access Control Systems: Implement a speaker verification system for secure access control.
- AI-Powered Tourist Guide: Design a voice-interactive assistant for tourists, combining LLMs and text-to-speech synthesis.
- Digital Text Reader for Accessibility: Develop a text-to-speech application to assist users with visual impairments.
- Voice-Controlled Robotic System: Implement a robotic control system that navigates using voice commands.
- Voice-Controlled Video Game Interface: Create a voice-based control system for moving characters in video games using Unity.
- Emotion Detection in Sports Commentary: Analyze and visualize emotional variations in sports commentators' speech during live games.
- Interactive Music Discovery Map: Use the Spotify API to create a music exploration tool for discovering artists and genres interactively.
- Visualization of Music Features Using Spotify API: Build an application to display song characteristics (tempo, mood, style) using data retrieved from the Spotify API.
- Heart Sound Analysis for Diagnosing Cardiac Conditions: Implement a machine learning system to analyze heartbeats and detect cardiovascular issues using pre-recorded datasets.
Equipment
The Sound and Music Processing course incorporates a range of modern hardware and software tools to support practical learning and real-world applications. Students gain hands-on experience with digital audio processing, machine learning techniques, and embedded systems through the following equipment:
1. Raspberry Pi (Model 4 or 5)
Raspberry Pi is a compact, affordable single-board computer used for prototyping and deploying audio and music processing applications. It supports Python programming and connects seamlessly with external hardware like microphones and speakers. Students use Raspberry Pi for tasks such as sound classification, speech recognition, and smart assistant development.
2. NVIDIA Jetson Nano and Xavier NX
These AI-powered platforms are designed for machine learning and signal processing applications. They provide high-performance computing capabilities required for real-time processing tasks, such as music genre classification, audio fingerprinting, and voice-controlled smart systems.
3. Arduino Boards with Nicla Voice Module
Arduino boards combined with the Nicla Voice module enable students to develop TinyML-based audio processing systems. These setups are ideal for implementing always-on audio classifiers, voice-controlled applications, and speech recognition projects.
4. HuskyLens AI Camera
The HuskyLens AI Camera supports visual and audio processing tasks with built-in machine learning algorithms. It is used for object recognition, gesture detection, and integrating audio-visual systems for interactive applications.
5. Audio Recording and Playback Equipment
High-quality microphones and speakers are available for recording and analyzing sound signals. These tools are essential for projects like room acoustics modeling, voice diarization, and music recognition systems.
6. MATLAB
MATLAB provides a robust environment for designing and simulating audio processing algorithms. It is used in labs for tasks like signal filtering, feature extraction, and audio enhancement.
7. Python Libraries for Audio Processing
Open-source Python libraries such as LibROSA, Essentia, Madmom, and Marsyas support music and speech analysis. These libraries enable students to implement advanced techniques, including beat tracking, music feature extraction, and tempo estimation.
8. Edge Impulse Studio
Edge Impulse Studio is an AI development platform used for training and deploying TinyML models with minimal coding requirements. It is particularly useful for developing embedded systems, such as voice-controlled assistants or wearable audio devices.
Recommended Bibliography
Koutras A., Alexandraki C., Zarouchas T., Zervas P., Chatziantoniou P. Audio, Speech, and Music Processing and Analysis. Kallipos Open Academic Editions, 2023.
This comprehensive textbook delves into the fundamental principles and advanced techniques of audio, speech, and music signal processing. It covers a wide range of topics, including digital signal processing fundamentals, psychoacoustics, room acoustics, feature extraction, and various applications in audio processing and acoustics. The book is structured to provide both theoretical insights and practical applications, making it a valuable resource for students and professionals in the field.
Additionally, the following resources are recommended for a deeper understanding of the concepts covered in the Sound and Music Processing course:
- Müller M. Fundamentals of Music Processing. Springer International Publishing, 2015. ISBN: 978-3-319-21944-8.
- Giannakopoulos T., Pikrakis A. Introduction to Audio Analysis: A MATLAB® Approach. Academic Press, 2014. ISBN: 978-0-12-394443-6.
- Weihs C., Jannach D., Vatolkin I., Rudolph G. Music Data Analysis: Foundations and Applications. Chapman & Hall/CRC, 2016. ISBN: 978-1-4987-3236-8.
- Li T., Ogihara M., Tzanetakis G. Music Data Mining. Chapman & Hall/CRC, 2011. ISBN: 978-1-4398-0914-2.
- Lerch A. An Introduction to Audio Content Analysis. Wiley, 2012. ISBN: 978-1-118-26842-3.
These resources are available in the university library or can be accessed through online platforms for supplementary reading. Students are encouraged to refer to these books for additional examples, explanations, and problem-solving techniques.
Lectures are held weekly during the Spring Semester, on Thursdays 17:00–20:00 in Room K1.07.
Please refer to the official timetable on e-Class (access for registered users only) for the most up-to-date details.