Abstract
Healthcare data holds great societal and monetary value. It captures how disease manifests within populations over time, and could therefore be used to improve public health dramatically. For the growing AI-in-health industry, this data offers substantial potential to create markets for new healthcare technologies. However, primary care data is extremely sensitive: it contains highly personal information about individuals. As a result, many countries are reluctant to release this resource. This paper explores key issues in using synthetic data as a substitute for real primary care data: handling the complexities of real-world data so that realistic distributions and relationships are captured transparently, modelling time, and minimising the matching of real patients to synthetic data points. We show that, if the correct modelling approaches are used, transparency and trust can be ensured in the underlying distributions and relationships of the resulting synthetic datasets. Moreover, these datasets offer a strong level of privacy through a lower risk of identifying real patients.
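To make "matching of real patients to synthetic data points" concrete, here is a minimal sketch of one simple check: an exact-match comparison between real and synthetic records. It assumes each record is a tuple of already-discretised attributes (for example sex, age band, diagnosis), and the function name `matching_risk` and both measures it reports are hypothetical illustrations, not the metric used in the paper. `synthetic_match_rate` is the fraction of synthetic rows identical to some real row. `unique_real_matched` counts real rows that occur only once in the real data yet are reproduced exactly in the synthetic data.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical record type: a tuple of discretised attribute values.
Record = Tuple[Hashable, ...]


def matching_risk(real: Sequence[Record], synthetic: Sequence[Record]) -> Dict[str, float]:
    """Illustrative exact-match privacy check (an assumption, not the paper's method)."""
    real_counts = Counter(real)      # how often each real record occurs
    synth_set = set(synthetic)       # distinct synthetic records

    # Synthetic rows that reproduce some real row on every attribute.
    synth_in_real = sum(1 for row in synthetic if row in real_counts)

    # Real rows that are unique in the real data AND appear in the synthetic data;
    # these are the matches most likely to point back to a single patient.
    unique_real_matched = sum(
        1 for row, count in real_counts.items() if count == 1 and row in synth_set
    )

    return {
        "synthetic_match_rate": synth_in_real / len(synthetic) if synthetic else 0.0,
        "unique_real_matched": unique_real_matched,
    }


# Toy, made-up records for illustration only.
real = [
    ("F", "40-49", "asthma"),
    ("M", "60-69", "diabetes"),
    ("M", "60-69", "diabetes"),
    ("F", "80-89", "copd"),
]
synthetic = [
    ("M", "60-69", "diabetes"),
    ("F", "80-89", "copd"),
    ("F", "20-29", "none"),
    ("F", "20-29", "none"),
]

result = matching_risk(real, synthetic)
print(result)  # {'synthetic_match_rate': 0.5, 'unique_real_matched': 1}
```

Reporting unique matches separately reflects the intuition that a synthetic copy of a one-of-a-kind patient carries more re-identification risk than one that matches many real patients. Real evaluations would typically also consider near matches and quasi-identifier subsets rather than only exact equality on all attributes.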
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
de Benedetti, J., Oues, N., Wang, Z., Myles, P., Tucker, A. (2020). Practical Lessons from Generating Synthetic Healthcare Data with Bayesian Networks. In: Koprinska, I., et al. ECML PKDD 2020 Workshops. ECML PKDD 2020. Communications in Computer and Information Science, vol 1323. Springer, Cham. https://doi.org/10.1007/978-3-030-65965-3_3
DOI: https://doi.org/10.1007/978-3-030-65965-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65964-6
Online ISBN: 978-3-030-65965-3
eBook Packages: Computer Science, Computer Science (R0)