If you are wondering how best to de-identify free text data, particularly free text containing personal health information, this article can help you decide.

A review conducted by researchers from the Manitoba Centre for Health Policy at the University of Manitoba, Canada, has identified and categorised de-identification methods for free text data, traced how de-identification systems have evolved over time, and highlighted which hybrid approaches are the most promising for future research.

As the availability of free text for research purposes continues to increase, so too does the need to preserve privacy during research analysis, particularly for free text data containing personal health information, such as electronic health records. De-identification, the process of removing personally identifiable information such as personal names from data, has therefore become a hot topic in the free text data world. This is why we need a thorough understanding of the de-identification approaches and strategies used to safeguard individuals' privacy while still enabling valuable analysis and research opportunities. Here’s why…

  1. It provides insight into the methods used to protect the privacy and confidentiality of individuals. By removing personally identifiable information, researchers can analyse the data more securely.
  2. It gives researchers guidance on which de-identification methods to choose based on the requirements and goals of their research. Different methods offer varying levels of privacy, accuracy and usability, so researchers need to be aware of the methods available and make an informed decision in order to obtain acceptable de-identification results.

Knowing and understanding the various methods for de-identifying personal health information in free text data is crucial for maintaining privacy and complying with privacy acts and guidelines. It also helps researchers access important free text data that can be used to advance medical research.
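To make the idea of de-identification concrete, here is a minimal rule-based sketch in Python. It is not taken from the review and does not represent any of the systems it covers. The `deidentify` function, the `PATTERNS` table, the placeholder labels and the sample note are all hypothetical, chosen only to illustrate replacing personal names and other identifiers with placeholders.

```python
import re

# Hypothetical patterns for illustration only; real systems use far broader
# rule sets, often combined with machine learning in hybrid approaches.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def deidentify(text, known_names):
    """Replace known names and pattern matches with category placeholders."""
    # Dictionary step: mask every name supplied by the caller.
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    # Pattern step: mask identifiers that follow a fixed format.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Fictional example note, not real patient data.
note = "Jane Doe was seen on 2023-03-14. Call 204-555-0199 to follow up."
print(deidentify(note, ["Jane Doe"]))
```

A sketch like this only catches names it is given and formats it already knows about, so it would miss many identifiers in real clinical text.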

Lead author Bekelu Negash commented, “This research offers a chance to acquire knowledge about methods of de-identification for text data containing personal health information and areas of improvement, which is valuable in this field of study.”


Read the full open access article at https://ijpds.org/article/view/2153

Bekelu Negash, Research Assistant, Manitoba Centre for Health Policy, Department of Community Health Sciences, Rady Faculty of Health Sciences, University of Manitoba

Negash, B., Katz, A., Neilson, C. J., Moni, M., Nesca, M., Singer, A. and Enns, J. E. (2023) “De-identification of Free Text Data containing Personal Health Information: A Scoping Review of Reviews”, International Journal of Population Data Science, 8(1). Available at: https://ijpds.org/article/view/2153 (Accessed: 12 December 2023).