Extent of Open Science Practices in the Reporting of Real World Evidence Research

Lillian Liu
Louisa Jorm
Nahee Kim
Tom Honeyman
Claire M Vajdic
https://orcid.org/0000-0002-3612-8298

Abstract

Introduction
Open sharing of research methods and software code is fundamental to open science principles and reproducible research practices and has long been the norm in some scientific disciplines. Increasingly, scientific publishers are introducing policies to encourage or mandate sharing of research protocols and analytical code. Code sharing is especially important when research data cannot be shared, as is often the case in research using population data. However, the prevalence of protocol and code sharing in population data science research has been underexplored.


Objectives
To assess open science practice usage by authors in real world evidence (RWE) research published in the International Journal of Population Data Science (IJPDS).


Method
We reviewed RWE research articles publishing estimates of associations in the IJPDS from January 2019 to October 2024. We determined the proportion of published articles reporting (i) a link to a study protocol, (ii) a link to a pre-registered study protocol, (iii) a statement about the availability of the data, (iv) a link to the analytical code, and (v) reference to a reporting checklist or guideline.


Results
None of the 41 eligible articles met all five open science domains. One article included a link to the study protocol and none cited a pre-registered protocol. Fourteen (34%) articles included a statement about data availability. No articles included a link to the analytical code, although one included it in supplementary material and two indicated availability on request. Five (12%) articles referred to using a reporting checklist. There was no clear evidence of increasing adoption of open science practices over time.


Conclusions
Researcher alignment with international best practice for open science was poor among RWE articles published in IJPDS. Potential solutions to encourage an open science culture include increasing awareness through training and education, building Communities of Practice, providing incentives and implementing open science publication policies.

Introduction

In recent years, there has been a push towards open science practices to make research openly available and publicly accessible for others to reuse. Although there can be many interpretations of what open science entails [1], common practices include open access publishing, pre-registering analyses, and sharing research outputs such as protocols, data and code [2]. Open science practices aim to improve the overall quality of research by increasing accessibility and promoting research transparency, reproducibility, reliability and collaboration [3]. They provide researchers the means to verify analyses, replicate results and reduce duplication and research waste as existing knowledge is built upon [4]. Open science practices can also democratise access to high-quality research tools and methods, particularly benefiting researchers in resource-limited settings [5].

The adoption of open science practices has been facilitated by journals, funders, and institutions that have been progressively transitioning to open access publication models and introducing policies to encourage and sometimes mandate sharing of protocols, data and code. Furthermore, the FAIR (Findability, Accessibility, Interoperability and Reusability) principles [6], which guide best practice scientific data management to support open data, are increasingly being adopted by research organisations. An adapted version for open code, the “FAIR for research software (FAIR4RS) principles”, is similarly gaining traction amongst the research community [2].

Open science practices are routine, or mandated by scholarly societies and publishers, in many disciplines, including genetics [7], environmental science [8], geophysics [9], astronomy [10], life sciences [11], biological/physical/social sciences [12] and computer science and chemistry [13], where there is a long tradition of making data and code available. However, there is little evidence to demonstrate the openness of research within the population data science community, despite the field being well-suited to benefiting from an open science approach. Enhanced transparency, reproducibility and collaboration would help drive policy development informed by population data studies given the field’s multi-disciplinary nature and its use of data drawn from privacy-sensitive sources [14]. Code sharing in particular is important for supporting reproducibility in situations where research data themselves may not be readily shared to protect the privacy and confidentiality of individuals and organisations.

In health and medical research, open data and sharing of code or software remain uncommon [15–17], despite many researchers being aware of the advantages of open science and research reproducibility [4]. The situation is likely to be similar for real world evidence (RWE) studies, and indeed a framework for an explicit transparency statement for these studies has recently been proposed [18]. Despite several assessments of the open science practices in specific fields of research (e.g. cancer, surgery) [15, 19, 20], the extent of openness in RWE research has been underexplored to date. Therefore, this study aimed to examine the degree to which open science practices have been adopted amongst RWE research published between January 2019 and October 2024 in the International Journal of Population Data Science (IJPDS). Within this context, RWE research refers to studies that generate evidence from data collected from routine healthcare delivery.

Methods

We performed a review of all RWE ‘research articles’ and ‘methodological developments’ papers published in the IJPDS (https://ijpds.org) from January 2019 to October 2024. To be eligible for this review, an article had to report estimates of association based on individual-level routinely collected observational data, including administrative, electronic medical record, and registry datasets, using a single dataset or linked datasets. Ineligible articles were non-original research articles, such as perspectives, commentaries and literature reviews, as well as case studies, protocol papers and data resource profiles, as classified by the journal. Each original research article was reviewed and manually categorised by the authors; research that was qualitative only, that described only data linkage methods or algorithm development, or that reported only descriptive statistics was also ineligible for inclusion.

Two authors (NK and CMV) independently reviewed each published article for eligibility and extracted standardised information from eligible studies. For each published article we extracted the article digital object identifier (DOI), first author name, and year of publication. For ineligible articles we recorded the reason for ineligibility. For eligible articles we extracted information for each of the five transparency domains for RWE studies proposed by Wang and Pottegård [18], as described in Table 1. Wang and Pottegård selected these domains based on their alignment to the Transparency and Openness Promotion (TOP) guidelines, and because they are actions researchers can take to facilitate reproducible research [21]. Differences between reviewers regarding study eligibility, extracted information and transparency coding were resolved through discussion and confirmation against the published article. We then calculated the number and proportion of all eligible articles that met each domain.

| Open science domain [18] | Domain relevance to transparency and reproducibility in RWE research | Information abstracted |
| --- | --- | --- |
| Protocol | The a priori scope of the research question(s), study design, data sources and analyses. Plus, version control showing any improvements or additions, including justifications for these amendments. The use of structured protocol templates (e.g., the Harmonized Protocol to Enhance Reproducibility, HARPER [22]) facilitates use of best practice approaches. This transparency reduces the risk of presenting a post hoc hypothesis as an a priori hypothesis, and the selective presentation of evidence. | Did the article refer to a study protocol? If so, did the article include a weblink to the study protocol, or was it available from the authors on request? |
| Pre-registration | A time-stamped, accessible protocol. This transparency also helps reduce research waste and publication bias. It is most valuable for inferential analyses. | Was the study protocol pre-registered and if so, was a weblink provided? |
| Data | The location where others can find or regenerate the source data. As individual-level health information is rarely publicly available or able to be shared, researchers can include guidance on how to regenerate the sourced data, including version information and the date of the data extract(s). Alternatively, privacy-preserving synthetic data could be provided. This information or data will enable analyses to be reproduced. | Did the article include a data availability statement, and if so, were the details of the data repository given? |
| Code sharing | Shared code that is well annotated allows others to reproduce the study. Ideally it includes the code used to generate the analytical dataset from the source data, as well as the code that creates results, tables and figures. The generation of reproducible code is facilitated by open source version control systems such as Git. | Did the article include a statement about the availability of analytical code, including whether it was included with the article, available on request, or available online (with a weblink)? |
| Reporting checklist | Lists the most critical elements of research that must be included in publications to allow an assessment of the research quality and allow it to be reproduced. The current best practice checklist is the Reporting of Studies Conducted using Observational Routinely Collected Health Data Statement for Pharmacoepidemiology (RECORD-PE) [23]. | Did the article state whether the research had been written up in alignment with an international checklist or guideline? |
Table 1: Domain elements and data abstraction for eligible RWE articles.

Results

A total of 41 eligible and 139 ineligible research articles were published in IJPDS over the study period. Details of the ineligible studies, and the reasons they were judged to be ineligible, are given in Supplementary Table 1. The data abstracted from the eligible studies are given in Supplementary Table 2.

No eligible RWE articles satisfied the reporting requirements for all five open science domains and correspondence with individual domains was low (Table 2). There was no clear evidence of increasing reporting against any of the domains over time. Additionally, there was no apparent correlation between the open science practices. Adhering to one of the domains did not increase the likelihood of the research following the other practices.

| Open science domain [18] | Number (%) of articles meeting the domain | Publication year of article(s) |
| --- | --- | --- |
| Included a statement about access to the study protocol | 2 (5%) | 2024, 2021 |
| The study protocol was pre-registered | 0 (0%) | |
| Included a stand-alone statement about the availability of the data | 14 (34%) | 2024 (5), 2023 (1), 2022 (1), 2021 (4), 2020 (2), 2019 (1) |
| Included a statement about the availability of analytical code | 3 (7%) | 2024, 2021 (2) |
| Included a statement about aligning to reporting checklist or guideline | 5 (12%) | 2024, 2022, 2021, 2019 (2) |
Table 2: Proportion of eligible IJPDS RWE articles reporting open science domain practices, January 2019 to October 2024.

Two articles referred to a study protocol; one included a link to the study protocol and the other noted the protocol was available on request. No articles stated the study protocol was pre-registered. Fourteen articles included a stand-alone statement about the availability of the data and six of these articles included access details. No articles included a weblink to the analytical code, although one included the code in supplementary material (link not currently functional) and two indicated it was available on request. Five articles referred to using a reporting checklist.
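As a minimal sketch of the arithmetic behind Table 2, the counts reported there can be expressed as percentages of the 41 eligible articles. The counts and the denominator come from the table; the variable names, the shortened domain labels and the helper functions are illustrative assumptions, not the authors' own code.

```python
# Counts of eligible articles meeting each domain, as reported in Table 2.
# The dictionary keys are shortened labels chosen for this sketch.
TOTAL_ELIGIBLE = 41

domain_counts = {
    "Statement about access to the study protocol": 2,
    "Study protocol pre-registered": 0,
    "Stand-alone data availability statement": 14,
    "Statement about availability of analytical code": 3,
    "Statement about reporting checklist or guideline": 5,
}


def percent(count, total):
    """Return count as a percentage of total, rounded to a whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


def summarise(counts, total):
    """Format each count in the 'n (p%)' style used in Table 2."""
    return {domain: f"{n} ({percent(n, total)}%)" for domain, n in counts.items()}


if __name__ == "__main__":
    for domain, cell in summarise(domain_counts, TOTAL_ELIGIBLE).items():
        print(f"{domain}: {cell}")
```

Sharing even a short script of this kind alongside the supplementary tables would let readers regenerate the reported proportions directly from the abstracted data.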

Discussion

This paper reviewed RWE research articles recently published in the IJPDS to gain insight into the extent of contemporary open science practices in population data science. Given the IJPDS is itself an open access journal, we focused on evaluating the five open science domains described by Wang and Pottegård as the key building blocks for transparency and reproducibility in RWE studies [18]. These domains are based on the Transparency and Openness Promotion (TOP) guidelines, which outline best practice recommendations for research practices and are used as indicators to measure research reproducibility [21].

Overall, we found a low prevalence of studies reporting these open science practices, suggesting their limited adoption in the population data science community. Reporting of study protocol pre-registration was particularly poor, with no articles including a statement on this. However, these numbers may underestimate the practice if researchers had pre-registered their study protocol without reporting it in their publications. This could similarly be the case for reporting checklists, although in both cases research transparency and reproducibility are reduced when this information is not publicly shared and accessible.

Although data availability had the highest prevalence, with 34% of articles including a stand-alone statement, this proportion was lower than expected when considering the IJPDS’ requirement for all research articles to include a ‘Data Availability Statement’ when publishing. Reasons for this discrepancy are unclear but could include a lack of awareness or inconsistent enforcement of the policy. It should also be noted that inclusion of a data availability statement does not necessarily mean the data were directly accessible, and only six of the 14 articles with a data availability statement provided access details. Similar trends have been observed across other journals and funding agencies where mandating publishing of data, source code and/or software [24, 25] has not always led to a meaningful uptick in sharing rates [15, 26].

RWE studies using individual-level health data are typically unable to publish their datasets in open access repositories due to legislation and policies designed to protect the privacy of individuals and institutions. Indeed, privacy concerns and risk of data misuse have consistently been identified as among the top barriers to sharing by public health researchers [4]. One potential solution is the use of synthetic data, which can preserve the key statistical properties of the original data whilst mitigating privacy risks [27].

Sharing of analytical code generally poses minimal risk of disclosing the identities of individuals or health care organisations, or revealing sensitive health information. As such, code can often be made publicly available even if the underlying unit record data cannot. Releasing well annotated research code alongside data availability statements enables analyses to be reproduced using the same or periodically updated data, a major advantage when open data access is not possible.

However, code relating to some machine learning models may pose additional privacy risks. For example, training data may inadvertently be embedded within model descriptions or parameters, requiring careful review prior to release to ensure privacy is maintained [28]. Barring such exceptions, or licensing restrictions that explicitly prohibit the redistribution of code, there is little reason why RWE researchers using standard statistical methods, as in the articles reviewed here, should not share code.

Despite this, we found that the proportion of articles reporting code availability was even lower than the proportion reporting data availability. Moreover, none of the articles that included a statement on the availability of code provided open access to it. Statements that the code is “available on request” fall short of best practice, as this approach lacks transparency, version control, and long-term accessibility.

Commonly cited barriers to code sharing include a lack of knowledge, experience, incentives and time, as well as concerns about the inclusion of potentially sensitive information [4, 29, 30]. Additionally, emotional barriers such as feelings of embarrassment or fear are powerful disincentives towards openly sharing code [29]. Many researchers are reluctant to publish work they do not believe is “polished” or of an acceptable standard to publicly share [31], highlighting the need to provide safe environments for researchers to learn and develop the confidence to share their work. There are also fears regarding the consequences of the detection of errors, as well as loss of control over research outputs.

Shifting the culture

The Center for Open Science has a strategy for implementing culture change [32], which describes a pyramidal model of five progressive intervention levels: ‘Infrastructure, user interface/experience, communities, incentives, and policies.’ Advancements in tools and technical infrastructure have lowered barriers to sharing data and code. Open access repository-hosting platforms such as GitHub and GitLab are highly accessible and allow for easy publishing of data or code. Furthermore, digital repositories such as Zenodo, Open Science Framework and Figshare offer persistent identifiers (e.g. DOIs), enabling researchers to archive versioned code for longer-term preservation and receive academic credit through citation.

At the community level, institutions, professional societies, networks, and conference organisers can play a key role in increasing the visibility and awareness of open science practices by actively promoting their benefits. Offering training courses and workshops, particularly for early career researchers, will help ensure the next generation of researchers are advocates for reproducible research practices, which will be important in shaping the culture in the long term [4]. Early career researchers are also likely to have more time to undertake training and can in turn support more senior researchers in adopting these new practices. Establishing Communities of Practice, wherein researchers can share knowledge, skills and code in a supportive environment, may reduce apprehension about public sharing [29]. The Carpentries are an example of a Community of Practice that delivers data and coding workshops that build capacity and promote open, reproducible research practices [33].

Incentives, policies, and mandates can further reinforce open science practices. Incentive mechanisms from other disciplines can readily be adapted to population data science research. For instance, open data badges used to recognise data sharing have been demonstrated to increase data sharing within some scientific disciplines [34, 35]. Whilst comparable studies for open code and open materials badges are limited, some journals or institutions already offer these badges [36, 37] as a motivational incentive and to recognise researchers’ efforts in sharing their software. Such incentives should be underpinned by rigorous checks or peer review of quality and executability to build trust and accountability. Badges are only one possible approach. Communities, journals, and professional societies should consider other forms of awards such as grants or prizes for code sharing. For example, EarthCube, a geoscience research community, annually reviews open source computational notebooks and awards the best example code [38].

At the top of the culture change pyramid is the implementation of mandatory code sharing policies, such as journals implementing requirements for releasing code alongside publications [39, 40]. An increasing number of journals are already adopting such policies. Funding agencies can also consider mandating code to be shared as a condition of projects receiving grants, akin to the United States National Institutes of Health model which requires data sharing plans to be submitted with grant applications [41].

Adequate financial and technical support is essential, as preparing data and code for sharing is technically complex and resource-intensive. Data sharing platforms can be designed not only to support sharing through technical means, but also to address policy barriers [42]. Artificial intelligence (AI)-powered tools also have the potential to facilitate sharing by automating aspects of code and data preparation [43].

These considerations reinforce that effective code sharing requires careful implementation. While sharing code can enhance transparency and reproducibility, potential risks include the facilitation of “sloppy science” through reuse without adequate understanding of underlying assumptions and modelling choices, and the propagation of misinformation through misleading or poorly contextualised reanalyses of data [44].

International initiatives

Whilst low code sharing rates are seen across health research in general [16], internationally there are several successful initiatives that could be adapted to help encourage code sharing. The United Kingdom (UK) has OpenSAFELY, an open source platform for analysing sensitive health data. Researchers are given access to synthetic data for the purpose of developing their statistical analysis software and are required to openly publish this software for review before it can be run against the real data [45]. As researchers are not provided direct access to the real data, the risk of disclosure in the code is negligible whilst the open release of the code facilitates transparency, reproducibility, and confidence in the health research. However, development and maintenance of the OpenSAFELY infrastructure has required substantial investments from research funding agencies and the UK government [46].

Another tool used broadly across the UK, USA and Canada is the ‘concept library’. These are platforms or portals that compile and curate standard concepts, code, or algorithms for reuse across multiple projects. An example is a standardised definition for phenotyping clinical conditions using administrative records [47–49]. Concept libraries are a simple way of openly sharing best practice code and encouraging consistency across the sector. However, building a concept library relies on researchers’ contributions and is thus subject to the cultural and resource barriers discussed above.

Limitations of the study

There are several limitations to our study. Firstly, we acknowledge the pool of articles reviewed was small and from a single journal; therefore, we cannot generalise these results across the field. Furthermore, given the small sample size, we did not break down the articles into study types. However, future research could explore whether certain types of RWE studies, data sources, settings and funders are more likely to demonstrate open research practices.

We are hopeful these findings provide a starting point for discussing and improving the use of open science practices within this discipline. Understanding the barriers and enablers of open science practices amongst the population data science community, as well as implementing and evaluating cultural shift strategies, could be valuable areas of future work.

Conclusion

In summary, our analysis of RWE articles recently published in the IJPDS has provided an indication of the low prevalence of open science practices amongst population data science research. Code sharing in particular is uncommon. Culture change is an often-overlooked barrier to increasing code sharing rates. To enable a cultural shift, we must work towards normalising the practice through increasing visibility and creating supportive spaces and Communities of Practice to share knowledge and build trust. Introducing incentives and policies to reward and reinforce code sharing as a standard practice will help to sustain this culture and contribute towards improving transparency and reproducibility in population data science research. We recommend that (i) all journals publishing RWE research require articles to explicitly comply with or at least report on the five open science indicators and display Open Science Badges on publications [18], and (ii) reviewers explicitly report on the indicators when reviewing manuscripts submitted for publication. Our findings highlight an opportunity to enhance transparency in RWE research published in the IJPDS, and the need to foster a stronger culture of open science within the population data science community, so this becomes the norm.

Acknowledgements

This work was partially funded by a grant from the Australian Research Data Commons (DP793).

AI Statement

An author used Perplexity and ChatGPT for research and language editing recommendations. All text was written and edited by the authors.

Statement on Conflicts of Interest

The authors have no conflicts of interest to declare.

Ethics Statement

The research did not require ethical approval because it used publicly available information that is not personally identifiable.

Data Availability Statement

The data collected for this article can be found in Supplementary Tables 1 and 2.

Abbreviations

RWD: Real world data
RWE: Real world evidence
IJPDS: International Journal of Population Data Science

References

  1. Vicente-Saez R, Martinez-Fuentes C. Open Science now: A systematic literature review for an integrated definition. Journal of Business Research. 2018;88:428-36. 10.1016/j.jbusres.2017.12.043

  2. Barker M, Chue Hong NP, Katz DS, Lamprecht A-L, Martinez-Ortiz C, Psomopoulos F, et al. Introducing the FAIR Principles for research software. Scientific Data. 2022;9(1):622. 10.1038/s41597-022-01710-x

  3. Cobey KD, Haustein S, Brehaut J, Dirnagl U, Franzen DL, Hemkens LG, et al. Community consensus on core open science practices to monitor in biomedicine. PLoS Biology. 2023;21(1):e3001949. 10.1371/journal.pbio.3001949

  4. Harris JK, Johnson KJ, Carothers BJ, Combs TB, Luke DA, Wang X. Use of reproducible research practices in public health: A survey of public health analysts. PLoS One. 2018;13(9):e0202447. 10.1371/journal.pone.0202447

  5. Bezuidenhout LM, Leonelli S, Kelly AH, Rappert B. Beyond the digital divide: Towards a situated approach to open data. Science and Public Policy. 2017;44(4):464-75. 10.1093/scipol/scw036

  6. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3(1):160018. 10.1038/sdata.2016.18

  7. Schatz MC, Philippakis AA, Afgan E, Banks E, Carey VJ, Carroll RJ, et al. Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. Cell Genomics. 2022;2(1):100085. 10.1016/j.xgen.2021.100085

  8. Wohner C, Peterseil J, Klug H. Designing and implementing a data model for describing environmental monitoring and research sites. Ecological Informatics. 2022;70:101708. 10.1016/j.ecoinf.2022.101708

  9. American Geophysical Union. Data & software for authors. [Internet] 2024 [cited 2024 Dec 13]; Available from: https://www.agu.org/Publish-with-AGU/Publish/Author-Resources/Data-and-Software-for-Authors.

  10. American Astronomical Society. Policy statement on software. [Internet] 2024 [cited 2024 Dec 13]; Available from: https://journals.aas.org/policy-statement-on-software/.

  11. eLife. Editorial process. [Internet] 2024 [cited 2024 Dec 13]; Available from: https://elife-rp.msubmit.net/html/elife-rp_author_instructions.html#process.

  12. PNAS. Editorial and journal policies. [Internet] 2024 [cited 2024 Dec 13]; Available from: https://www.pnas.org/author-center/editorial-and-journal-policies#materials-and-data-availability.

  13. PeerJ. Policies & procedures. [Internet] 2024 [cited 2024 Dec 13]; Available from: https://peerj.com/about/policies-and-procedures/#data-materials-sharing.

  14. McGrail K, Moran R, Keefe C, Preen D, Quan H, Sanmartin C, et al. A Position Statement on Population Data Science: The science of data about people. International Journal of Population Data Science. 2018;3(1):415. 10.23889/ijpds.v3i1.415

  15. Hamilton DG, Page MJ, Finch S, Everitt S, Fidler F. How often do cancer researchers make their data and code available and what factors are associated with sharing? BMC Medicine. 2022;20(1):438. 10.1186/s12916-022-02644-2

  16. Hamilton DG, Hong K, Fraser H, Rowhani-Farid A, Fidler F, Page MJ. Prevalence and predictors of data and code sharing in the medical and health sciences: systematic review with meta-analysis of individual participant data. BMJ. 2023;382:e075767. 10.1136/bmj-2023-075767

  17. ARDC Ltd. Unearthing research software. [Internet] 2024 [cited 2024 Dec 13]; Available from 10.5281/zenodo.10530616.

  18. Wang SV, Pottegård A. Building transparency and reproducibility into the practice of pharmacoepidemiology and outcomes research. American Journal of Epidemiology. 2024;193(11):1625-31. 10.1093/aje/kwae087

  19. Collins GS, Whittle R, Bullock GS, Logullo P, Dhiman P, de Beyer JA, et al. Open science practices need substantial improvement in prognostic model studies in oncology using machine learning. Journal of Clinical Epidemiology. 2024;165:111199. 10.1016/j.jclinepi.2023.10.015

  20. Pathak K, Marwaha JS, Chen HW, Krumholz HM, Matthews JB. Open science practices in research published in surgical journals: A cross-sectional study. medRxiv. 2023.05.02.23289357. 10.1101/2023.05.02.23289357

  21. Center for Open Science. TOP Factor. [Internet] 2024 [cited 2025 June 30]; Available from: https://www.cos.io/initiatives/top-guidelines

  22. Wang SV, Pottegård A, Crown W, Arlett P, Ashcroft DM, Benchimol EI, et al. HARmonized Protocol Template to Enhance Reproducibility of hypothesis evaluating real-world evidence studies on treatment effects: A good practices report of a joint ISPE/ISPOR task force. Pharmacoepidemiology and Drug Safety. 2023;32(1):44-55. 10.1002/pds.5507

  23. Weberpals J, Wang SV. The FAIRification of research in real-world evidence: A practical introduction to reproducible analytic workflows using Git and R. Pharmacoepidemiology and Drug Safety. 2024;33(1):e5740. 10.1002/pds.5740

  24. PLOS Digital Health. Materials, Software and Code Sharing. [Internet]. 2023 [cited 2024 Dec 13]; Available from: https://journals.plos.org/plosone/s/materials-software-and-code-sharing

  25. BioMedCentral. Sharing your data, materials, and software. [Internet]. 2023 [cited 2024 Dec 13]; Available from: https://www.biomedcentral.com/getpublished/writing-resources/structuring-your-data-materials-and-software.

  26. McIntosh LD, Sumner J, Vitale C. Transparently reported research: An analysis of Wellcome-funding publications in 2016 and 2019. [Internet] 2021 [cited 2024 Dec 13]; Available from: 10.6084/m9.figshare.13810220.v1.

  27. Giuffrè M, Shung DL. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. npj Digital Medicine. 2023;6(1):186. 10.1038/s41746-023-00927-3

  28. Ritchie F, Tilbrook A, Cole C, Jefferson E, Krueger S, Mansouri-Benssassi E, et al. Machine learning models in trusted research environments – understanding operational risks. International Journal of Population Data Science. 2023;8(1):2165. 10.23889/ijpds.v8i1.2165

  29. Gomes DGE, Pottier P, Crystal-Ornelas R, Hudgins EJ, Foroughirad V, Sánchez-Reyes LL, et al. Why don’t we share data and code? Perceived barriers and benefits to public archiving practices. Proceedings of the Royal Society B (Biological Sciences). 2022;289(1987):20221113. 10.1098/rspb.2022.1113

  30. Digital Science, Simons N, Goodey G, Hardeman M, Clare C, Gonzales S, et al. The State of Open Data 2021 [Internet]. Digital Science; 2021 [cited 2024 Dec 13]. Available from: https://digitalscience.figshare.com/articles/report/The_State_of_Open_Data_2021/17061347/1.

  31. Barnes N. Publish your computer code: it is good enough. Nature. 2010;467:753. 10.1038/467753a

  32. Nosek B. Strategy for Culture Change. [Internet] Center for Open Science. 2019 [cited 2024 Dec 13]; Available from: https://www.cos.io/blog/strategy-for-culture-change.

  33. The Carpentries. Workshops. [Internet] 2025 [cited 2025 June 30]; Available from: https://carpentries.org/workshops/.

  34. Kidwell MC, Lazarević LB, Baranski E, Hardwicke TE, Piechowski S, Falkenberg L-S, et al. Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLOS Biology. 2016;14(5):e1002456. 10.1371/journal.pbio.1002456

  35. Rowhani-Farid A, Barnett A. Badges for sharing data and code at Biostatistics: an observational study. F1000Research. 2018;7(90). 10.12688/f1000research.13477.2

  36. Nüst D, Eglen SJ. CODECHECK: an Open Science initiative for the independent execution of computations underlying research articles during peer review to improve reproducibility [version 2; peer review: 2 approved]. F1000Research. 2021;10(253). 10.12688/f1000research.51738.2

  37. Sharma NK, Ayyala R, Deshpande D, Patel Y, Munteanu V, Ciorba D, et al. Analytical code sharing practices in biomedical research. PeerJ Computer Science. 2024;10:e2066. 10.7717/peerj-cs.2066

  38. EarthCube. Notebook Directory. [Internet] 2022 [cited 2024 Dec 13]; Available from: https://www.earthcube.org/notebooks

  39. BMJ. Mandatory data and code sharing for research published by the BMJ. BMJ. 2024;384:q324. 10.1136/bmj.q324

  40. Hamilton DG, Fraser H, Hoekstra R, Fidler F. Journal policies and editors’ opinions on peer review. eLife. 2020;9:e62529. 10.7554/eLife.62529

  41. Bethesda, MD: Office of The Director, National Institutes of Health;.
  42. Devriendt T, Shabani M, Lekadir K, Borry P. Data sharing platforms: instruments to inform and shape science policy on data sharing? Scientometrics. 2022;127:3007-3019. 10.1007/s11192-022-04361-2

  43. Siegel ZS, Kapoor S, Nagdir N, Stroebl B, Narayanan A. CORE-Bench: Fostering the credibility of published research through a computational reproducibility agent benchmark. arXiv:2409.11363. 10.48550/arXiv.2409.11363

  44. Santana C. The Value of Openness in Open Science. Canadian Journal of Philosophy. 2024;54(4):251–65. 10.1017/can.2024.44

  45. OpenSAFELY. About OpenSAFELY. [Internet] 2023 [cited 2024 Dec 13]; Available from: https://www.opensafely.org/about/#transparency-and-public-logs.

  46. OpenSAFELY. The past, present and future of OpenSAFELY. [Internet] 2024 [cited 2024 Dec 13]; Available from: https://www.bennett.ox.ac.uk/blog/2024/12/a-successful-symposium-thank-you-for-coming/OpenSAFELY-booklet-2024.pdf.

  47. Thayer D, Bown D, Leake T, Jones J-L, Noyce R, Brooks C, Ford D. Code List Library: A solution to improve research repeatability, transparency, and efficiency by curating lists of clinical codes. International Journal of Population Data Science. 2018;3(4). 10.23889/ijpds.v3i4.891

  48. Almowil ZA, Zhou S-M, Brophy S. Concept libraries for automatic electronic health record based phenotyping: A review. International Journal of Population Data Science. 2021;6(1):1362. 10.23889/ijpds.v6i1.1362

  49. Smith M, Turner K, Bond R, Kawakami T, Roos LL. The concept dictionary and glossary at MCHP: Tools and techniques to support a population research data repository. International Journal of Population Data Science. 2019;4(1):1124. 10.23889/ijpds.v4i1.1124


Article Details

How to Cite
Liu, L., Jorm, L., Kim, N., Honeyman, T. and Vajdic, C. M. (2026) “Extent of Open Science Practices in the Reporting of Real World Evidence Research”, International Journal of Population Data Science, 10(2). doi: 10.23889/ijpds.v10i2.2960.