PhD Studentships at the Centre for Digital Music - Autumn 2026 start
by Admin — Monday, 17 November 2025
The Centre for Digital Music at Queen Mary University of London is inviting applications for PhD study for Autumn 2026 start across various funding schemes. Below are suggested PhD topics offered by academics; interested applicants can apply for a PhD under one of those topics, or can propose their own topic. In all cases, prospective applicants are strongly encouraged to contact academics at C4DM to informally discuss prospective research topics.
Opportunities include internally and externally funded positions for PhD projects to start in Autumn 2026. It is also possible to apply as a self-funded student or with funding from another source. Studentship opportunities include:
- One UK home PhD studentship (Autumn 2026 start, UK home applicants, deadline 9 January 2026)
- S&E Doctoral Research Studentships for Underrepresented Groups (UK home applicants, Autumn 2026 start, 3 positions funded across the Faculty of Science & Engineering, deadline 28 January 2026, 5pm)
- CSC PhD Studentships in Electronic Engineering and Computer Science (Autumn 2026 start, Chinese applicants, up to 5 nominations allocated for the Centre for Digital Music, deadline 28 January 2026, 5pm)
- International PhD Funding Schemes (Autumn 2026 start, numerous international funding agencies)
Each funding scheme has its own application process and requirements. Detailed information and application instructions can be found on the respective funding scheme pages via the links above.
Understanding Neural Audio – and Building it Better
Supervisor: Mark Sandler
Eligible funding schemes: S&E Studentships for Underrepresented Groups, International PhD Funding Scheme
As Deep Learning models for Audio and Music have grown ever more complex, our ability to fully understand them has diminished. In this research, we explore how weight matrices evolve during training and how activations evolve during both training and inference. One of the key tools for this is Linear Algebra, especially Matrix and Tensor decomposition techniques. But Matrix and Tensor decomposition not only leads us to new insights; it also leads to new ways to build Deep Learning models. In particular, we have developed a new approach called Sum of Rank One (SoRO) layers, where a fully connected layer is replaced with a sum of small, rank-1 matrices which can be implemented very efficiently. Preliminary work has shown that these layers not only learn faster, they also learn better and potentially with less data.
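A minimal sketch of the idea, assuming PyTorch (the class name, initialisation and rank are illustrative, not the group's actual implementation):

```python
import torch
import torch.nn as nn

class SumOfRankOneLinear(nn.Module):
    """Illustrative stand-in for a fully connected layer: the weight matrix
    W (out x in) is parameterised as a sum of `rank` rank-1 outer products u_k v_k^T."""
    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.u = nn.Parameter(torch.randn(rank, out_features) * 0.02)
        self.v = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to x @ W.T + b with W = sum_k u_k v_k^T,
        # but computed with two thin matrix products: (x @ V.T) @ U.
        return (x @ self.v.T) @ self.u + self.bias

# Usage: replace nn.Linear(512, 256) with the low-rank sum.
layer = SumOfRankOneLinear(512, 256, rank=16)
y = layer(torch.randn(4, 512))  # -> shape (4, 256)
```

Such a layer stores rank × (in + out) parameters instead of in × out, which is where the efficiency and the altered learning dynamics come from.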
Students joining this project – which is in collaboration with our School of Mathematical Sciences – will get the opportunity to study training in existing Neural Audio models (e.g. for sound synthesis or sound source separation), to explore novel models that incorporate SoRO layers, or both.
Another possible aspect of this work is to develop ways to implement convolutional layers in the SoRO formulation and explore the changing learning dynamics of CNNs. Further potential avenues for study are Attention Heads in Transformers and the autoencoders of Diffusion Models.
Applicants should develop their own particular interest within this framework and explain it in their Research Proposal.
Differentiable Physics Neural Modelling of Strings, Membranes and Plates
Supervisor: Mark Sandler
Eligible funding schemes: S&E Studentships for Underrepresented Groups, International PhD Funding Scheme
This project will investigate a new architecture for data-driven sound generation that builds on recent advances in Neural Audio sound generation using Differentiable Digital Signal Processing (DDSP) and a new differentiable approach to more conventional physical modelling (Diaz Fernandez et al., 2024). We call this Differentiable Physics Neural Modelling (DPNM). It delivers efficient, interpretable sound generation by combining the best of physical modelling and data-driven approaches, and falls into the broader category of Model-Based Deep Learning (Shlezinger et al., 2021).
In recent years, significant research attention has been paid to Physics-Informed Neural Networks (PINNs) (Raissi et al., 2019) in a variety of fields, and it is natural to explore their use in physical modelling of sounding objects, such as plucked strings, wooden panels, membranes etc. However, extensive research by the QMUL team has shown that levels of accuracy that are sufficient in conventional physics are not accurate enough for studio-quality sound generation over adequately long periods. Preliminary work suggests that DPNM can overcome these shortcomings, making this an exciting topic for PhD research.
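As a flavour of what differentiable physical modelling means in practice, here is a minimal sketch, assuming PyTorch; the finite-difference damped string and its parameters are illustrative stand-ins, not the DPNM architecture itself. The physical parameters are fit to a target waveform by gradient descent:

```python
import torch
import torch.nn.functional as F

def simulate_string(tension: torch.Tensor, damping: torch.Tensor,
                    n_points: int = 64, n_steps: int = 400,
                    dt: float = 1e-4) -> torch.Tensor:
    """Explicit finite-difference simulation of a damped 1D wave equation
    u_tt + damping * u_t = tension * u_xx with fixed ends. Every operation is
    a torch op, so gradients flow back to the physical parameters."""
    dx = 1.0 / (n_points - 1)
    u_prev = torch.zeros(n_points)
    u = torch.zeros(n_points)
    u[n_points // 2] = 1.0                      # pluck-like initial displacement
    a = damping * dt / 2
    readout = []
    for _ in range(n_steps):
        lap = F.pad((u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2, (1, 1))
        u_next = (2 * u - u_prev + a * u_prev + dt**2 * tension * lap) / (1 + a)
        u_prev, u = u, u_next
        readout.append(u[n_points // 4])        # "pickup" position
    return torch.stack(readout)

# Fit the physical parameters to a target waveform by gradient descent
# (a synthetic target here; in practice, a recorded sample).
target = simulate_string(torch.tensor(100.0), torch.tensor(2.0)).detach()
tension = torch.tensor(80.0, requires_grad=True)
damping = torch.tensor(0.5, requires_grad=True)
opt = torch.optim.Adam([tension, damping], lr=0.5)
for step in range(200):
    opt.zero_grad()
    loss = torch.mean((simulate_string(tension, damping) - target) ** 2)
    loss.backward()
    opt.step()
```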
The student should have a background that includes understanding of Deep Learning and ODEs/PDEs.
Diaz Fernandez, R., De La Vega, M. C., & Sandler, M. (2024). Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman based Deep Learning Methods. International Conference on Digital Audio Effects, DAFx.
Shlezinger, N., Whang, J., Eldar, Y. C., & Dimakis, A. G. (2021). Model-Based Deep Learning: Key Approaches and Design Guidelines. 2021 IEEE Data Science and Learning Workshop (DSLW), 1-6. 10.1109/DSLW51110.2021.9523403
AI Models of Music Understanding
Supervisor: Simon Dixon
Eligible funding schemes: S&E Studentships for Underrepresented Groups, CSC PhD Studentships, International PhD Funding Scheme
Music information retrieval (MIR) applies computing and engineering technologies to musical data to satisfy users' information needs. This topic involves the application of artificial intelligence technologies to the processing of music, either in audio or symbolic (score, MIDI) form. The application could be e.g. for software to enhance the listening experience, for music education, for musical practice or for the scientific study of music. Examples of topics of particular interest are automatic transcription of multi-instrumental music, providing feedback to music learners, incorporation of musical knowledge into data-driven deep learning approaches, and tracing the transmission of musical styles, ideas or influences across time or locations.
This topic description is intentionally very general; applicants are expected to choose their own specific project within this broad area of research, according to their interests and experience. The research proposal should define the scope of the project, its relationship to the state of the art, the data and methods the applicant plans to use, and the expected outputs and means of evaluation.
Bridging Musical Intelligence and Machine Learning: Integrating Domain Knowledge into Music and Audio Representation Learning
Supervisor: George Fazekas
Eligible funding schemes: S&E Studentships for Underrepresented Groups, CSC PhD Studentships, International PhD Funding Scheme
Audio and music representation learning seeks to transform raw data into latent representations for downstream tasks such as classification, recommendation, retrieval and generation. While recent advances in deep learning, especially contrastive, self-supervised and diffusion-based approaches, have achieved impressive results, most remain purely data-driven and neglect domain-specific musical structures such as rhythm, melody, harmony, metrical hierarchy or genre-style traits.
This PhD project will explore ways to embed theoretical and structural knowledge into modern representation learning pipelines to enhance interpretability, controllability and performance. By incorporating, for example, symbolic or other structured representations, inductive biases, well-known principles exploited in classic DSP algorithms, or ontological constraints, the research aims to bridge the gap between data-driven models and the structured understanding of music and audio.
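As a concrete starting point, here is a minimal sketch of the contrastive (NT-Xent) objective used in SimCLR-style representation learning, assuming PyTorch. In this project the "views" fed to the encoder are where musical domain knowledge would enter, e.g. pitch-shift or tempo-stretch augmentations, or pairings derived from symbolic structure:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    """NT-Xent contrastive loss over a batch of embedding pairs (z1[i], z2[i])."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # (2N, d)
    sim = z @ z.T / temperature                         # scaled cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))          # exclude self-similarity
    # The positive of sample i is i + n (and vice versa).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```

The two embedding batches z1 and z2 would come from passing musically related excerpts through the encoder; choosing which transformations or symbolic relations define "related" is exactly where structural knowledge can be injected.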
Potential directions include hybrid models that combine deep audio and symbolic embeddings, graph-based or relational learning of musical structure, and explainable methods for music analysis, production or generation. The project will also engage with principles of Ethical and Responsible AI: reducing data bias, improving transparency and supporting fair attribution of authorship.
Examples of relevant work include, but are not limited to:
Guinot, Quinton, Fazekas: "Semi-Supervised Contrastive Learning of Musical Representations", ISMIR-2024
Yu, Fazekas: "Singing voice synthesis using differentiable LPC and glottal-flow-inspired wavetables", ISMIR-2023
Agarwal, Wang, Richard: "F-StrIPE: Fast Structure-Informed Positional Encoding for Symbolic Music Generation", ICASSP-2025
Assistive technologies for music making, production or listening using Generative AI
Supervisor: George Fazekas
Eligible funding schemes: S&E Studentships for Underrepresented Groups, CSC PhD Studentships, International PhD Funding Scheme
Applications are invited for a PhD exploring how generative AI can power new forms of assistive technology that support music creation, performance, production, and listening. As AI systems become increasingly capable of modelling human emotion, intention, and creative context, they open opportunities to help people engage with music in more intuitive, expressive and enriched ways.
This research will investigate (1) how AI can support musicians in composing or performing through, for example, intelligent accompaniment, adaptive sound design, or personalised production tools, or alternatively (2) how listeners can benefit from systems that shape or generate music in response to emotional states, physiological data, or broader wellbeing needs. Drawing inspiration from recent work on music-based self-regulation, the project may explore how generative models might interpret multimodal signals such as facial expression, movement or physiological data, and respond with music that supports focus, creativity, comfort or stress recovery.
Methodologically, the PhD may incorporate advances in deep generative audio models, music foundation models, multimodal learning, affective computing, reinforcement learning and interactive human–AI co-creation systems. The expected outcomes include novel AI-driven tools that make music creation more accessible, enhance creative workflows, or offer evidence-based benefits for listeners’ emotional and mental wellbeing, contributing both new technologies and new understanding of human–music interaction in the era of generative AI.
Relevant references include but are not limited to:
Herremans et al.: "A Functional Taxonomy of Music Generation Systems", ACM Computing Surveys, 2017
Liyanarachchi et al.: "A Survey on Multimodal Music Emotion Recognition", arXiv, 2025
Strano et al.: "STAGE: Stemmed Accompaniment Generation through Prefix-Based Conditioning", ISMIR 2025
Automated machine learning for music understanding
Supervisor: Emmanouil Benetos
Eligible funding schemes: S&E Studentships for Underrepresented Groups, CSC PhD Studentships, International PhD Funding Scheme
The field of music information retrieval (MIR) has been growing for more than 20 years, with recent advances in deep learning having revolutionised the way machines can make sense of music data. At the same time, research in the field is still constrained by laborious tasks involving data preparation, feature extraction, model selection, architecture optimisation, hyperparameter optimisation, and transfer learning, to name but a few. Some of the model and experimental design choices made by MIR researchers also reflect their own biases.
Inspired by recent developments in machine learning and automation, this PhD project will investigate and develop automated machine learning methods which can be applied at any stage of the MIR pipeline so as to build music understanding models ready for deployment across a wide range of tasks. This project will also compare the automated decisions made at every step of the MIR pipeline with the manual model design choices made by researchers. The successful candidate will investigate, propose and develop novel deep learning methods for automating music understanding, resulting in models that can accelerate MIR research and contribute to the democratisation of AI.
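One ingredient of such a pipeline is automated hyperparameter and model search. A minimal sketch using Optuna, assuming Optuna and scikit-learn are available; the synthetic features and small MLP are placeholders standing in for a real MIR dataset and model:

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Placeholder data standing in for pre-computed audio features and tags.
X, y = make_classification(n_samples=500, n_features=40, n_classes=4,
                           n_informative=10, random_state=0)

def objective(trial: optuna.Trial) -> float:
    # The search space itself is a design choice an AutoML system would manage.
    hidden = trial.suggest_int("hidden_units", 16, 256, log=True)
    alpha = trial.suggest_float("alpha", 1e-6, 1e-2, log=True)
    lr = trial.suggest_float("learning_rate_init", 1e-4, 1e-1, log=True)
    clf = MLPClassifier(hidden_layer_sizes=(hidden,), alpha=alpha,
                        learning_rate_init=lr, max_iter=300, random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```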
Sonification techniques for understanding hidden processes of LLMs
Supervisors: Anna Xambó and Charalampos Saitis
Eligible funding schemes: S&E Studentships for Underrepresented Groups, CSC PhD Studentships, International PhD Funding Scheme
Large language models (LLMs) are a type of artificial intelligence program that can recognise and generate text, trained on huge datasets through a complex network of hidden processes. This PhD topic explores sonification techniques for LLMs to better understand the way they process information. Can we treat an LLM engine such as ChatGPT as a musical instrument and listen to its internal processes? Can sonification techniques help us to hear and see how the information is processed? Compared to vinyl records or tape recordings, what is the acoustic signature, and what are the artefacts, distinctive of this new medium? This work will contribute to addressing an important challenge in AI: making the inner workings and hidden knowledge of models more interpretable for people.
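A minimal sketch of one possible mapping, assuming NumPy and using placeholder values: per-token, per-layer activation statistics (which in practice would be extracted from an LLM with forward hooks) drive a bank of sine oscillators, and the result is written to a WAV file with the standard library:

```python
import numpy as np
import wave

SR = 22050
# Placeholder: per-token, per-layer activation norms from an LLM forward pass.
rng = np.random.default_rng(0)
activations = rng.random((32, 12))           # 32 tokens x 12 layers

frames = []
for token_stats in activations:
    # Map each layer's statistic to the frequency of one partial.
    freqs = 110.0 * 2 ** (token_stats * 4)   # roughly 110 Hz to 1.76 kHz
    t = np.arange(int(0.1 * SR)) / SR        # 100 ms per token
    frame = sum(np.sin(2 * np.pi * f * t) for f in freqs) / len(freqs)
    frames.append(frame)

signal = np.concatenate(frames)
pcm = (signal / np.abs(signal).max() * 32767).astype(np.int16)
with wave.open("llm_sonification.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(SR)
    f.writeframes(pcm.tobytes())
```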
Keywords: sonification, large language models (LLMs), explainable AI
Audio-visual sensing for machine intelligence
Supervisor: Lin Wang
Eligible funding schemes: S&E Studentships for Underrepresented Groups, International PhD Funding Scheme
The project aims to develop novel audio-visual signal processing and machine learning algorithms that help improve machine intelligence and autonomy in an unknown environment, and to understand human behaviours when interacting with robots. The project will investigate the application of AI algorithms for audio-visual scene analysis in real-life environments. One example is to employ multimodal sensors, e.g. microphones and cameras, for analysing various sources and events present in the acoustic environment. Tasks to be considered include audio-visual source separation, localisation/tracking, audio-visual event detection/recognition, and audio-visual scene understanding.
Interpretable AI for Sound Event Detection and Classification
Supervisors: Lin Wang and Emmanouil Benetos
Eligible funding schemes: S&E Studentships for Underrepresented Groups, CSC PhD Studentships, International PhD Funding Scheme
Deep learning models have revolutionised state-of-the-art technologies for environmental sound recognition, motivated by their applications in healthcare, smart homes, and urban planning. However, most of the systems used for these applications are black boxes that cannot be inspected, so the rationale behind their decisions is obscure. Despite recent advances, there is still a lack of research on interpretable machine learning in the audio domain. Applicants are invited to develop ideas to reduce this gap by proposing interpretable deep learning models for automatic sound event detection and classification in real-life environments.
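A minimal sketch of one widely used post-hoc technique, an input-gradient saliency map over a spectrogram, assuming PyTorch; the untrained classifier and random input stand in for a real sound event model and recording:

```python
import torch
import torch.nn as nn

# Placeholder classifier standing in for a trained sound event model.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
model.eval()

spec = torch.randn(1, 1, 128, 256, requires_grad=True)  # fake log-mel spectrogram
logits = model(spec)
predicted = logits.argmax(dim=1)

# Gradient of the predicted class score w.r.t. the input spectrogram:
# large |gradient| marks time-frequency regions the decision is sensitive to.
logits[0, predicted.item()].backward()
saliency = spec.grad.abs().squeeze()                     # (128, 256) saliency map
```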
Using machine learning to enhance simulation of sound phenomena
Supervisor: Josh Reiss
Eligible funding schemes: S&E Studentships for Underrepresented Groups, International PhD Funding Scheme
Physical models of sound generating phenomena are widely used in digital musical instruments, noise and vibration modelling, and sound effects. They can be incredibly high quality, but they also often have a large number of free parameters that may not be specified just from an understanding of the phenomenon.
Machine learning from sample libraries could be the key to improving the physical models and speeding up the design process. Not only can optimisation approaches be used to select parameter values such that the output of the model matches samples; the accuracy of such an approach will also give us insight into the limitations of a model. It also provides the opportunity to explore the overall performance of different physical modelling approaches, and to find out whether a model can be generalised to cover a large number of sounds with a relatively small number of exposed parameters.
This work will explore such approaches. It will build on recent high-impact research from the team on the optimisation of sound effect synthesis models. Existing physical models will be used, with parameter optimisation based on gradient descent. Performance will be compared against recent neural synthesis approaches, which often provide high-quality synthesis but lack a physical basis. The work will also seek to measure the extent to which entire sample libraries could be replaced by a small number of physical models with parameters set to match the samples in the library.
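A minimal sketch of the parameter-fitting step, assuming PyTorch; the decaying-sinusoid "modal" model and the single log-spectral loss are illustrative stand-ins for a full physical model and a perceptually informed loss:

```python
import torch

def modal_synth(freqs, amps, decays, n: int = 22050, sr: int = 22050):
    """Toy modal model: a sum of exponentially decaying sinusoids."""
    t = torch.arange(n) / sr
    partials = (amps[:, None] * torch.exp(-decays[:, None] * t)
                * torch.sin(2 * torch.pi * freqs[:, None] * t))
    return partials.sum(dim=0)

def spectral_loss(a, b, n_fft: int = 1024):
    """Distance between log-magnitude STFTs, so phase differences are ignored."""
    win = torch.hann_window(n_fft)
    A = torch.stft(a, n_fft, window=win, return_complex=True).abs()
    B = torch.stft(b, n_fft, window=win, return_complex=True).abs()
    return torch.mean((torch.log1p(A) - torch.log1p(B)) ** 2)

# Hypothetical target: a recorded sample (synthesised here for the example).
target = modal_synth(torch.tensor([220., 440., 660.]),
                     torch.tensor([1.0, 0.5, 0.25]),
                     torch.tensor([3.0, 5.0, 8.0])).detach()

# Fit the free parameters of the model to the target by gradient descent.
freqs = torch.tensor([200., 400., 700.], requires_grad=True)
amps = torch.tensor([0.8, 0.8, 0.8], requires_grad=True)
decays = torch.tensor([4.0, 4.0, 4.0], requires_grad=True)
opt = torch.optim.Adam([freqs, amps, decays], lr=0.05)
for step in range(500):
    opt.zero_grad()
    loss = spectral_loss(modal_synth(freqs, amps, decays), target)
    loss.backward()
    opt.step()
```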
The student will have the opportunity to work closely with research engineers from the start-up company Nemisindo, though will also have the freedom to take the work in promising new directions. Publishing research in premier venues will be encouraged.
The project can be tailored to the skills of the researcher, and has the potential for high impact.
Intelligent audio production for the hearing impaired
Supervisor: Josh Reiss
Eligible funding schemes: S&E Studentships for Underrepresented Groups, International PhD Funding Scheme
This project will explore new approaches to audio production to address hearing loss, a growing concern with an aging population. The overall goal is to investigate, implement and validate original strategies for mixing audio content such that it can be delivered with improved perceptual quality for hearing impaired people.
Music content is typically recorded as multitracks, with different sound sources on different tracks. Similarly, soundtracks for television and radio content typically have dialogue, sound effects and music mixed together with normal-hearing listeners in mind. But a hearing impairment may result in this final mix sounding muddy and cluttered. The research team here have made strong advances on simulating hearing loss, understanding how to mix for hearing loss, and attempting to automatically deliver enhanced mixes for hearing loss. But these initial steps identified many unresolved issues and challenges. Why do hearing loss simulators differ from real world hearing loss, and how can this be corrected? How should hearing loss simulators be evaluated and how should they be used in the music production process? What is the best approach to mix audio content to address hearing loss? These questions will be investigated in this project.
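A minimal sketch of the simplest kind of hearing loss simulation mentioned above, assuming NumPy and SciPy; the audiogram values are illustrative rather than clinical, and real simulators also model effects such as loudness recruitment and reduced frequency selectivity:

```python
import numpy as np
from scipy.signal import stft, istft

SR = 44100
# Illustrative audiogram: hearing threshold elevation (dB HL) at standard frequencies.
audiogram_hz = np.array([250, 500, 1000, 2000, 4000, 8000])
threshold_db = np.array([10, 15, 25, 40, 60, 70])   # sloping high-frequency loss

def simulate_loss(x: np.ndarray, sr: int = SR) -> np.ndarray:
    """Crude simulation: attenuate each STFT bin by the interpolated threshold shift."""
    f, _, X = stft(x, fs=sr, nperseg=1024)
    atten_db = np.interp(f, audiogram_hz, threshold_db)
    X *= 10 ** (-atten_db[:, None] / 20)
    _, y = istft(X, fs=sr, nperseg=1024)
    return y

# Usage: compare a normal-hearing mix with its simulated-impairment version.
mix = np.random.randn(SR)                   # placeholder for a real mix
impaired = simulate_loss(mix)
```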
The project can be tailored to the skills of the researcher, and has the potential for high impact.
Neural Dynamics of Perceptually Aligned Artificial Intelligence
Supervisor: Iran Roman
Eligible funding schemes: Fully-funded UK Home studentship (fees and London stipend)
The brain handles complex perceptual tasks with ease, while sophisticated AI systems still struggle. This PhD project aims to bridge computational neuroscience, machine learning, and multimodal perception to build AI that perceives the world more like living organisms do. Current AI often relies on statistical shortcuts, not genuine understanding. This project will draw on Neural Resonance Theory to replicate perceptual alignment, where biological networks resonate and synchronise to embody perceptual structure. The project will investigate how principles of oscillation, resonance, and attunement can be embedded in neural networks. The successful candidate will develop new theories and algorithms. Potential applications include multimodal self-supervised learning, neurodynamical models, and embodied, interactive AI systems that can understand and anticipate actions in real time.
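A minimal sketch of one classic toy model of entrainment, assuming NumPy: a small Kuramoto network of coupled phase oscillators synchronising to a periodic stimulus, with synchrony tracked by the order parameter. This is a simplified illustration, not the canonical oscillators of Neural Resonance Theory, and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                         # number of oscillators
omega = rng.normal(2 * np.pi * 2.0, 0.5, n)    # natural frequencies around 2 Hz
theta = rng.uniform(0, 2 * np.pi, n)           # initial phases
K, F = 1.5, 1.0                                # coupling and stimulus forcing strength
dt, steps = 0.001, 10000
stim_freq = 2 * np.pi * 2.0                    # 2 Hz periodic stimulus

coherence = []
for k in range(steps):
    stim_phase = stim_freq * k * dt
    # Kuramoto dynamics: each phase is pulled toward the others and the stimulus.
    coupling = (K / n) * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    forcing = F * np.sin(stim_phase - theta)
    theta = theta + dt * (omega + coupling + forcing)
    # Order parameter |r| in [0, 1] measures how synchronised the network is.
    coherence.append(np.abs(np.exp(1j * theta).mean()))

print(f"final coherence: {coherence[-1]:.2f}")   # approaches 1 as the network entrains
```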
Deep neural modelling of music and speech perception
Supervisors: Marcus Pearce and Iran Roman
Eligible funding schemes: S&E Studentships for Underrepresented Groups, CSC PhD Studentships, International PhD Funding Scheme
Evidence suggests that speech and music perception depend on cognitive models acquired through implicit statistical learning. While deep neural networks (DNNs) use analogous mechanisms for generating music and language, it is unknown whether they truly simulate human perception. This project will develop novel neural network architectures to simulate speech and music perception. The project will use existing probabilistic methods both as a benchmark and as a tool for interpreting the abstract representations learned by the DNNs. Models will be tested through iterative comparison with behavioural and neural data from human psychological experiments. The successful candidate will also investigate cross-cultural comparisons and the psychological relationships between speech and music. The project's outcome will be a computational understanding of the psychology of human cultural learning in auditory perception.
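A minimal sketch of the kind of probabilistic benchmark referred to above, assuming NumPy: a first-order Markov model over pitch sequences whose per-note surprisal (information content) could be compared against a DNN's predictive distribution and against behavioural data. The toy corpus is a placeholder:

```python
import numpy as np
from collections import defaultdict

# Toy corpus of melodies as MIDI pitch sequences (placeholder for real data).
corpus = [
    [60, 62, 64, 65, 67, 65, 64, 62, 60],
    [60, 64, 67, 72, 67, 64, 60],
    [62, 64, 65, 67, 69, 67, 65, 64, 62],
]

# First-order Markov model with add-one smoothing over the observed pitch alphabet.
alphabet = sorted({p for mel in corpus for p in mel})
counts = defaultdict(lambda: defaultdict(int))
for mel in corpus:
    for a, b in zip(mel[:-1], mel[1:]):
        counts[a][b] += 1

def prob(prev: int, nxt: int) -> float:
    total = sum(counts[prev].values()) + len(alphabet)
    return (counts[prev][nxt] + 1) / total

def surprisal(melody):
    """Information content, in bits, of each note given the previous one."""
    return [-np.log2(prob(a, b)) for a, b in zip(melody[:-1], melody[1:])]

print(surprisal([60, 62, 64, 66]))   # the out-of-style 66 receives high surprisal
```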
Neural Resonance and the Perception of Timbre: A Neurodynamic Modeling Approach
Supervisors: Charalampos Saitis and Iran Roman
Eligible funding schemes: S&E Studentships for Underrepresented Groups, International PhD Funding Schemes
The perception of timbre is central to auditory recognition, yet its neurodynamic basis remains underexplored compared to pitch or rhythm. Neural Resonance Theory (NRT) posits that musical experience arises from brain-body dynamics entraining to structured sound, resulting in stable, pattern-forming oscillatory activity. Recent models of cochlear and brainstem activity using nonlinear resonator networks provide biologically grounded support for this view. This project investigates how networks of nonlinear oscillators and recurrent neural networks (RNNs) respond to stimuli varying only in timbre (e.g., sinusoids, violin, voice), aiming to identify differential resonance patterns attributable to spectral characteristics alone. In parallel, RNNs trained on pitch tasks will be analyzed to determine whether their emergent internal dynamics replicate resonance phenomena. Finally, EEG recordings from human listeners will be used to detect entrainment signatures matching model predictions. This interdisciplinary approach offers a novel application of NRT to timbre perception, bridging biologically inspired modeling, machine learning, and empirical neuroscience.
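A minimal sketch of the modelling ingredient described above, assuming NumPy: a bank of simplified Hopf-type oscillators, each tuned to a different frequency and driven by the same stimulus, whose amplitude pattern across the bank differs between a pure tone and a spectrally rich tone at the same pitch. This is an illustrative simplification, not the full canonical model used in Neural Resonance Theory:

```python
import numpy as np

SR = 8000
t = np.arange(SR) / SR
# Two stimuli with the same fundamental but different spectra: a crude timbre contrast.
sine = np.sin(2 * np.pi * 220 * t)
rich = sum((1.0 / k) * np.sin(2 * np.pi * 220 * k * t) for k in range(1, 6))

def hopf_bank_response(x, freqs, alpha=-50.0, beta=-100.0, gain=100.0, sr=SR):
    """Bank of Hopf-type oscillators dz/dt = z*(alpha + i*2*pi*f + beta*|z|^2) + gain*x.
    The linear part is integrated exactly per sample for numerical stability;
    the nonlinear and input terms use a forward Euler step."""
    dt = 1.0 / sr
    rot = np.exp(dt * (alpha + 1j * 2 * np.pi * freqs))   # exact decay + rotation
    z = np.zeros(len(freqs), dtype=complex)
    amp = np.zeros(len(freqs))
    for sample in x:
        z = rot * z + dt * (beta * np.abs(z) ** 2 * z + gain * sample)
        amp += np.abs(z)
    return amp / len(x)

freqs = 220.0 * np.arange(1, 9)        # oscillators tuned to the first 8 harmonics
print(np.round(hopf_bank_response(sine, freqs), 3))    # energy concentrated at 220 Hz
print(np.round(hopf_bank_response(rich, freqs), 3))    # broader pattern across harmonics
```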