Interest in federated learning, a collaborative way to train AI or machine learning models without anyone else seeing or accessing your private data, has increased rapidly in recent years. In a scoping review, researchers from the University of Manchester explore the potential of using federated learning to allow multiple organisations to create combined synthetic datasets. Creating these combined synthetic datasets could, for instance, allow multiple hospitals to work together to create one large synthetic patient dataset. Such a dataset could provide a more complete picture than any individual hospital's dataset could alone, and would be produced without any hospital sharing its own private patient data.

Synthetic data can allow governments, organisations, or businesses to release data without compromising on privacy. A synthetic dataset contains artificially generated data, created to resemble the original dataset without containing any of the original data. It can be used in place of the original to train algorithms or for analysis, and it is particularly useful where privacy constraints make it impossible to access the original dataset.

The review found that whilst the field of federated synthesis (using federated learning to generate synthetic data) is in its early stages, there is real promise in producing collaborative synthetic data in this way. However, privacy concerns need to be explored more thoroughly before this method can be trusted. Even though federated learning is thought to be privacy-preserving, there are ways in which private information can leak, and synthetic datasets can also still disclose private information. Further research is therefore needed into the privacy implications of federated synthesis, but the potential to collaboratively create a global synthetic dataset without sharing any individual data is certainly a prize worth this effort.

Lead author Dr Claire Little added that “whilst further research is required, federated synthesis presents a potentially useful way to create synthetic data collaboratively and may in future allow access to more diverse synthetic datasets.”


The full article is available open access (see the reference below).

Claire Little, Research Associate, Cathie Marsh Institute for Social Research, School of Social Sciences, University of Manchester, UK

Little, C., Elliot, M. and Allmendinger, R. (2023) “Federated learning for generating synthetic data: a scoping review”, International Journal of Population Data Science, 8(1). doi: 10.23889/ijpds.v8i1.2158.