PianoMentor: A Technological Framework to Teach and Practice Piano/Keyboard Online via Machine Learning with an Embedded Practice Lesson Generator
Date: 2024-07-30
Type of Degree: PhD Dissertation
Department: Computer Science and Software Engineering
Restriction Status: EMBARGOED
Restriction Type: Auburn University Users
Date Available: 07-30-2025

Abstract
In this dissertation, we aim to create an advanced artificial system that perceives, coordinates, and analyzes music scores generated from human performances in real time, transforming how music technique is learned online. By harnessing this technology, the system improves on existing music practice methods and fosters a new era of self-directed learning among students, thereby significantly impacting music education. Advances in Artificial Intelligence (AI) and Human-Computer Interaction (HCI) have enabled computer music systems to collaborate with humans across various applications. This dissertation employs machine learning and deep learning techniques to offer music students and educators a more intuitive and effective experience. The research focuses on three aspects of human-computer collaborative music practice: (1) automating music transcription to extract pitch and timing information; (2) generating real-time performance analysis reports as annotated music scores; and (3) developing practice exercises based on students' performances while adapting different practice strategies.

To achieve these goals, we use a framewise pitch detection model conditioned on pitch onset predictions, allowing a new note to start only when the onset detector confirms its existence. This integrated treatment of onsets and offsets aligns better with human musical perception.

Additionally, we propose a system that facilitates the visualization and comparison of MIDI files, addressing the abstract nature of MIDI data. Depending on user groups and tasks, we present various visualizations: card lists for browsing multiple MIDI files, heatmaps for note distribution, a Note Histogram for note occurrence counts, pitch-time charts, adapted MatrixWave visualizations for note sequences, and diagram designs for visualizing similarities between pattern representations of sequences. The system is implemented as a browser application and evaluated through a usage scenario, demonstrating its effectiveness for the specified user tasks while highlighting some limitations.

Finally, we explore the potential of large language models (LLMs) such as ChatGPT, ChatMusician, and MusicLang to generate relevant practice exercises based on students' performances. Unlike traditional methods that require deep musical and statistical knowledge, LLMs let users describe their musical goals directly. We evaluate the LLMs' composing abilities by how well their output aligns with user input and by the overall quality of their compositions, considering factors such as repetitiveness and scale diversity. This research underscores LLMs' strengths, applications, and limitations in music, paving the way for their expanded role in computer-assisted composition and the broader music industry.
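The transcription approach above gates framewise pitch predictions on onset detections so that a note can only begin where the onset detector fires. The following is a minimal decoding sketch of that idea, not the dissertation's implementation: the array shapes, thresholds, and the simple per-pitch state machine are illustrative assumptions.

```python
import numpy as np

def decode_notes(frame_probs: np.ndarray,
                 onset_probs: np.ndarray,
                 frame_thresh: float = 0.5,
                 onset_thresh: float = 0.5):
    """Decode note events from framewise model outputs.

    A new note may begin only in frames where the onset detector
    fires; it is sustained while the frame detector stays active.
    Shapes: (num_frames, num_pitches), values in [0, 1].
    """
    num_frames, num_pitches = frame_probs.shape
    active = frame_probs >= frame_thresh
    onsets = onset_probs >= onset_thresh
    notes = []  # (pitch, start_frame, end_frame)
    for p in range(num_pitches):
        start = None
        for t in range(num_frames):
            if start is None:
                # Only an onset prediction can open a note.
                if onsets[t, p] and active[t, p]:
                    start = t
            elif not active[t, p]:
                # Frame activity ended: close the note (offset).
                notes.append((p, start, t))
                start = None
        if start is not None:  # note still sounding at the end
            notes.append((p, start, num_frames))
    return notes
```

Gating note starts on onsets suppresses spurious short activations from the frame detector, which is one reason such combined decoding tends to match perceived note boundaries better than framewise thresholding alone.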
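The Note Histogram view counts how often each pitch occurs in a MIDI file. As a rough illustration of the underlying data preparation (the actual system runs in the browser, so this Python sketch using the third-party pretty_midi library is only an analogue, and the file name is hypothetical):

```python
from collections import Counter

import pretty_midi  # third-party: pip install pretty_midi

def note_histogram(midi_path: str) -> Counter:
    """Count how often each MIDI pitch occurs across all
    instruments: the raw data behind a Note Histogram view."""
    midi = pretty_midi.PrettyMIDI(midi_path)
    counts = Counter()
    for instrument in midi.instruments:
        if instrument.is_drum:
            continue  # drum tracks carry no melodic pitch
        for note in instrument.notes:
            counts[note.pitch] += 1
    return counts

# Example: the five most frequent pitches in a hypothetical file.
# hist = note_histogram("performance.mid")
# print(hist.most_common(5))
```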
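For the exercise-generation component, one plausible way to let users describe their musical goals directly is to assemble a natural-language prompt from the performance analysis and send it to an LLM. The sketch below uses the OpenAI chat API purely as an example; the model name, prompt wording, helper signature, and ABC-notation output format are assumptions, not the dissertation's setup.

```python
from openai import OpenAI  # one possible LLM client, shown as an example

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_exercise(weak_spots: list[str], level: str) -> str:
    """Ask an LLM for a short practice exercise targeting the
    passages a student struggled with (hypothetical helper)."""
    prompt = (
        f"A {level} piano student struggled with: {', '.join(weak_spots)}. "
        "Write a short practice exercise in ABC notation that isolates "
        "these difficulties, avoiding excessive repetition and using "
        "more than one scale."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example call:
# print(generate_exercise(["left-hand arpeggios in bar 12"], "intermediate"))
```

The prompt deliberately encodes the evaluation criteria named in the abstract (repetitiveness, scale diversity) so the generated exercise can be judged against them.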