Reviews for Student Achievement Trajectories in Ontario: Creating and validating a province-wide, multi-cohort and longitudinal database
By Jeanne Sinclair, Scott Davies and Magdalena Janus
Article as submitted
Article Authors
Submission Date: 17/07/2022
Round 1 Reviews
Reviewer A
Anonymous Reviewer
Completed 11/08/2022
https://doi.org/10.23889/ijpds.v8i1.1843.review.r1.reviewa
Overall, this is a fairly well written article, describes a fairly useful data linkage process and could be published. But there are a few things that should be done to make this more relevant and useful for readers. This is a complicated field, I am well aware of that, and there are substantial differences in data linkage between countries and terminology (as it appears). I’ve put down a lot of comments but that really reflects the complexity of the process not the quality of the work. You are submitting this to an international journal and as such, it should try to appeal more to a wider readership and use language that is more universal.
First, the language is very Canada focused and uses terms that are unfamiliar to an international reader, and you have submitted this to an international journal. Terms like “job action”, “jurisdiction”, “K-12” are unfamiliar, do you mean strikes, nations/countries and high school? There are some other wording issues, for example, you talk about matching, but generally this is called linkage in the UK (as is the journal), maybe change it to linkage?
The mention in the abstract that the data sources “lacked common identifiers” is also not really accurate. Sex and DOB are common identifiers. What you mean is common index numbers or personal identification numbers. It’s not uncommon that these are not present; many countries outside Northern Europe have to use probabilistic linkage using sex, DOB, address etc. You can just write “We developed an innovative matching protocol, and validated how the resulting database could be generalized to Ontario’s student population.” It was also not clear what “fully matched” cases are: are those exact links? Do you also have partial links?
I have no idea at what age children usually start kindergarten in Ontario or when they go to school. Please provide clear age ranges for the data collections and, rather than mapping out the data availability in grades, could you do that by ages?
I assume school is mandatory, but I wonder if kindergarten is? It would be important to know what percent of Canadian children go to kindergarten and get their EDI assessment. If this is not close to 100%, you start with a non-representative sample.
The mode and period of assessment for EDI is a bit confusing. You say it was run in three cycles that each ran three years. Was a child assessed only once during that time or multiple times and at what age/ages? Were some children assessed at a different age and are the assessment scores age adjusted? Were the cycles 4-5 run for children at a specific age (and were some age groups skipped) or for children of different ages? This relates back to my earlier comment that it would be better if you mapped your cohorts out by ages, rather than grades.
The CanNECD description is a bit confusing. You write “This study created 2,058 neighbourhoods covering Canada (with the exception of Nunavut), each with a minimum of 50 and a maximum of 600 valid EDI records.” And then “The index remains a permanent component of the CanNECD EDI database. As of 2014 the database contained 798,987 EDI records spanning the previous decade and including EDI data matched with the SES index from all Canadian provinces and territories except Nunavut” Is this a neighbourhood or individual level measure? What is the CanNECD EDI database, is it a different database? Maybe just focus on how many neighbourhoods are there in Ontario and what variables have you specifically used.
Page 8 – when you say “As a first step, only students with unique and exact matches on these variables were included.” I assume you later included partial matches, what is the difference here between exact and partial matches? This is a bit further confused when later on you write “The linking process created a fully matched dataset of the EDI and all four EQAO data points (n=86,778), which is a subset of a partially matched dataset (n=155,082) in which the EDI is matched to all four EQAO datapoints except those with job action or COVID-19 related cancellations.” So, when you refer to partially matched here, do you mean they are exact linked, but do not include data for all assessments?
Table 2 is a bit confusing and I recommend doing that as a timeline plot, where the x-axis is calendar time and the y-axis is your cohorts. Something like this https://stackoverflow.com/questions/44265512/creating-a-timeline-in-r but leave gaps for years for which there are no data. If EDI and CanNECD data are included for all years, just state that and exclude these from the plot/table. Again, it may be better to refer to child age at the start of the linkage or their year of birth.
You should consider adding a data linkage diagram, I think it would make it much easier for others to follow you.
I would like to see a Data access statement. How could other researchers use this linked data, or can they not? If they would like to do something similar in Ontario, what would they have to do?
Also, the discussion could include some comparison to other provinces in Canada or other countries. How do the linkage rates, available variables, time dimension etc compare to other parts of Canada. Would this stand out as a significant improvement over other regions? (Is Ontario big in population in terms of Canada and can this data set make a big impact?) Discussion could also include a section on potential links to other data sources, such as crime and justice, health? Or the work that you/others can do next. Overall, the discussion and introduction should really try to sell the work more and appeal to a wider audience. Think about why someone outside Ontario should be reading this?
Minor points:
In a couple of places you mention flexible and important partnerships and highlight these also in the abstract. If they were so important, why do you not discuss these more? Why were they crucial? Or maybe just leave that out?
Introduction, end of last paragraph. I would move “prohibitive costs” last in the list as the example that follows is on co.
Page 3 - “In Canada, over 1.5 million children have been assessed with the EDI since its inception [12].” Add “inception since xxx year.”
Page 3 – “The other major data-sharing partner was Ontario’s Educational Quality Assessment Office (EQAO) [add: a public authority/government agency ?] EQAO is Ontario’s main provider…”
Page 3 – “home environments and routines outside of school”, what is a routine outside school?
Page 4 – “Vulnerability is assigned using the first baseline dataset: children who score at or below the [add: bottom? 10th percentile?] in a given domain”
Page 5 – the order of your subsections is different, and you have not listed sex and DOB in additional variables.
Page 5-6. I would remove Table 1 or put in a supplement. This is not your core focus and takes up too much space.
Page 6 Overview of research linking EDI-EQAO thus far – I find the discussion of results a bit out of place in this section. It should go in the introduction to justify your work or in the discussion to hint at what can be done later.
Page 8 “which added 51,000 new matches to the 2005, 2008, and 2011 cohorts. ” say that in terms of how did the linkage rate improve, what was the % linked after that?
Page 8 – when you say “As a first step, only students with unique and exact matches on these variables were included.” I assume you later included partial matches? This is a bit further confused when later on you write “The linking process created a fully matched dataset of the EDI and all four EQAO data points (n=86,778), which is a subset of a partially matched dataset (n=155,082) in which the EDI is matched to all four EQAO datapoints except those with job action or COVID-19 related cancellations.”
Page 10 – “… can be interpreted like a r² value: it measures proportions of variance in a continuous variable that is explained by a grouping variable and is standardized between -1 and 1. Effect sizes are interpreted as small at 0.01, medium at 0.06, and large at 0.14 or greater.” Something is not right here, r² does not range between -1 and 1, but 0-1.
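The range issue raised in the last point can be checked numerically. The sketch below assumes the statistic in the quoted passage is eta squared (between-group sum of squares divided by total sum of squares), which the excerpt does not name explicitly; the function name and the toy data are illustrative only. Because both sums of squares are non-negative and the between-group part cannot exceed the total, the ratio lies between 0 and 1.

```python
def eta_squared(groups):
    """Proportion of variance in a continuous variable explained by grouping.

    `groups` is a list of lists of numbers, one inner list per group.
    Assumes the pooled values are not all identical (total variance > 0).
    """
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    # Total variability around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    # Variability of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


# Identical groups: grouping explains nothing.
no_effect = eta_squared([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
# No within-group spread: grouping explains everything.
full_effect = eta_squared([[1.0, 1.0], [3.0, 3.0]])
# Overlapping groups: somewhere in between.
partial = eta_squared([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

The two boundary cases give the lower and upper limits of the statistic, and the overlapping case falls strictly between them.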
Recommendation: Revisions Required
Reviewer B
Anonymous Reviewer
Completed 29/08/2022
https://doi.org/10.23889/ijpds.v8i1.1843.review.r1.reviewb
This was a well-written manuscript that describes a very complex and well-conceived study. I, however, am struggling to understand how the partial linkage of child-level EDI records furthers our understanding of children and their academic trajectories, as the EDI has not been validated for that purpose and the linkage was only partial. And if the intent is to detail an “innovative” record matching protocol, their method, which is purely deterministic, does not fit the bill. Unfortunately, on those two points, I have to recommend against publication, but appreciated the opportunity to review.
Recommendation: Decline Submission
Editor Decision
Merran Beckley Smith
Decision Date: 02/10/2022
Decision: Resubmit for Review
https://doi.org/10.23889/ijpds.v8i1.1843.review.r1.dec
Dear Jeanne Sinclair, Scott Davies, Magdalena Janus:
We have reached a decision regarding your submission to International Journal of Population Data Science, "Student Achievement Trajectories in Ontario: Creating and validating a province-wide, multi-cohort and longitudinal database".
Please address the attached reviewers' comments and return to us: one clean and one tracked changes version of your revised manuscript, plus a point by point letter of response/rebuttal, by 26 October 2022.
Our decision is to: Resubmit for Review
Author Response
Elizabeth Ford
Response Date: 22/11/2022
Dear Editor and Reviewers,
Thank you very much for your review and constructive feedback. We have incorporated your suggestions. Specific responses are found in the table below.
Kind regards,
The authors
Reviewer’s suggestion | Authors’ response |
---|---|
Reviewer A | |
You are submitting this to an international journal and as such, it should try to appeal more to a wider readership and use language that is more universal. | Thank you for this suggestion. We have attempted to update the manuscript to use language that is more universal. |
First, the language is very Canada focused and uses terms that are unfamiliar to an international reader, and you have submitted this to an international journal. Terms like “job action”, “jurisdiction”, “K-12” are unfamiliar, do you mean strikes, nations/countries and high school? | We have replaced or explicitly defined the terms mentioned here throughout the paper to make the language more clear to an international audience. |
There are some other wording issues, for example, you talk about matching, but generally this is called linkage in the UK (as is the journal), maybe change it to linkage? | Thank you for this recommendation. We have replaced references to “matching” with “linking” throughout the paper. |
The mention in the abstract that the data sources “lacked common identifiers” is not really accurate also. Sex and DOB are common identifiers. What you mean is common index numbers or personal identification numbers. It’s not uncommon that these are not present, many countries outside Northern Europe have to use probabilistic linkage using sex, DOB, address etc. You can just write “We developed an innovative matching protocol, and validated how the resulting database could be generalized to Ontario’s student population.” | We revised the sentence per your recommendation and replaced references to “common identifiers” with “personal identification numbers”. |
It was also not clear what “fully matched” cases are: are those exact links? Do you also have partial links? | Thanks for noticing this potentially confusing language. Yes, there are partial links, but we removed reference to them in the abstract and described the partial links more thoroughly in the body of the manuscript. |
I have no idea at what age children usually start kindergarten in Ontario or when they go to school. | We have included this clarification on page 4: “In Ontario, children can be enrolled in non-mandatory, publicly funded kindergarten in September of the year they turn 4 years of age (junior kindergarten), and in September of the year they turn 5 years they can be enrolled in senior kindergarten. Publicly funded compulsory schooling begins at age 6. Approximately 97% of students attending public school in Ontario entered school in kindergarten [22]. Senior kindergarten teachers complete the EDI for each child in their classroom, based on their observation of the child for at least 4 to 6 months (median age of administration in the present dataset is 5.68 years).” |
Please provide clear age ranges for the data collections and rather than mapping out the data availability in grades, could you do that by ages. | Thanks for this suggestion. Mapping the data by age does make sense, but Ontario uses the calendar year instead of the academic year to place students in the appropriate grade level. Therefore, a student entering Grade 3 for example will be 7 or 8 (they have to turn 8 by Dec 31 of the year they enter Grade 3). Because of this potentially confusing policy, we retained the organization of the data by grade, but included verbiage about the ages for each grade level. Page 5: “Ontario places children in the appropriate grade based on the calendar year; they must turn a given age by December 31. Thus, Ontario students enter Grade 3 in September at age 7-8, Grade 6 at age 10-11, Grade 9 at age 13-14, and Grade 10 at age 14-15. The EQAO assessment is taken in spring of the academic year.” |
I assume school is mandatory, but I wonder if kindergarten is? It would be important to know what percent of Canadian children go to kindergarten and get their EDI assessment. If this is not close to 100%, you start with a non-representative sample. | Thank you for this suggestion. On page 4 we now have a reference indicating that 97% of Ontario senior kindergarten-age children (who continue in public schools) are enrolled in public-school kindergartens. |
The mode and period of assessment for EDI is a bit confusing. You say it was run in three cycles that each ran three years. Was a child assessed only once during that time or multiple times and at what age/ages? Were some children assessed at a different age and are the assessment scores age adjusted? Were the cycles 4-5 run for children at a specific age (and were some age groups skipped) or for children of different ages? This relates back to my earlier comment that it would be better if you mapped your cohorts out by ages, rather than grades. | We hope the clarification on the age of EDI assessment eases the interpretation of the EDI administration. We also revised as follows on page 5: “The present Ontario-based dataset consists of the EDI data collected over nine years in three-year cycles [23]: Cycle 1: 2004-2006, Cycle 2: 2007-2009, Cycle 3: 2010-2012. In each Cycle, the EDI was completed for all senior kindergarten students in approximately a third of all publicly funded school boards in Ontario; thus by the end of the 3-year cycle the EDI assessment was implemented just once in every school district in Ontario. This roll-out strategy allowed communities to utilize longitudinal data by which they could examine changes in patterns of child vulnerabilities and thereby guide new policy and funding directions, without the cost of administering the EDI every single year.” |
The CanNECD description is a bit confusing. You write “This study created 2,058 neighbourhoods covering Canada (with the exception of Nunavut), each with a minimum of 50 and a maximum of 600 valid EDI records.” And then “The index remains a permanent component of the CanNECD EDI database. As of 2014 the database contained 798,987 EDI records spanning the previous decade and including EDI data matched with the SES index from all Canadian provinces and territories except Nunavut” Is this a neighbourhood or individual level measure? What is the CanNECD EDI database, is it a different database? Maybe just focus on how many neighbourhoods are there in Ontario and what variables have you specifically used. | We have removed these sentences pertaining to the nationwide CanNECD database: “The index remains a permanent component of the CanNECD EDI database. As of 2014 the database contained 798,987 EDI records spanning the previous decade and including EDI data matched with the SES index from all Canadian provinces and territories except Nunavut [13].” In this section (page 7), we added that the CanNECD SES Index is a neighbourhood-level measure and that there are 797 unique neighbourhoods in the present dataset. |
Page 8 – when you say “As a first step, only students with unique and exact matches on these variables were included.” I assume you later included partial matches, what is the difference here between exact and partial matches? This is a bit further confused when later on you write “The linking process created a fully matched dataset of the EDI and all four EQAO data points (n=86,778), which is a subset of a partially matched dataset (n=155,082) in which the EDI is matched to all four EQAO datapoints except those with job action or COVID-19 related cancellations.” So, when you refer to partially matched here, do you mean they are exact linked, but do not include data for all assessments? | We have modified the paragraph in question on page 7-8 to be more clear. In the revision, we did not use the term “partial match”, to hopefully avoid confusion. The paragraph in question on page 9 now reads: “The proportion of EDI participants linked to the four EQAO datapoints ranged from 42-50%, depending on the cohort year (Table 2). For cohorts from 2009-2012, which had fewer EQAO assessments to match due to job action or COVID-19, up to 57% of EDI cases were linked to at least one EQAO administration. We began the linkage process with 374,239 total EDI cases, to which 174,685 EQAO Grade 3 cases were linked using the process described above. EQAO grades 6, 9, and 10 administrations were then linked to that dataset using a stepwise inner-join linking approach. “Inner-join” refers to a linkage of two datasets that restricts the outcome dataset to participants who exist in both datasets. The result was 155,082 cases when including participants who are missing EQAO administrations due to job action or COVID-19. From there, a fully linked dataset (n=86,778) was created which includes participants who took the EDI and each of the four EQAO assessments (with no missing EQAO datapoints). Figure 1 visually depicts the linking process.” We also added this paragraph on page 8 to clarify the linkage process: “We merged the subsequent grade 6, 9, and 10 EQAO scores to the EDI-Grade 3 dataset simply by isolating each EDI administration year, linking the datasets on the EQAO student personal identification number, and then re-merging all administration years together. Three versions were created using a stepwise inner join merging strategy that restricted each cohort’s linked data to the EQAO administration with the least number of participants present for each EQAO administration that was available (see Figure 1). The least restrictive version of the dataset captures all students who were present for at least one EQAO administration; the second least restrictive restricts to students present for all four EQAO administrations (3, 6, 9, and 10) except missingness due to job action and COVID-19, and the most restrictive restricts to students who were present for all four EQAO administrations with no missingness. The fewer assessment timepoints included, the greater the size of the dataset, since each additional assessment adds the constraint that the participant be present for the linked dataset with no missingness. For researchers interested in pursuing analyses about specific academic outcomes (i.e., just literacy or numeracy, or only academic outcomes through Grade 6), the size of the dataset will be larger, as literacy is not assessed in Grade 9, and numeracy is not assessed in Grade 10. The rationale for creating a dataset with no missingness is described later in the section on validation of the linkage and discussion below.” |
Table 2 is a bit confusing and I recommend doing that as a timeline plot, where the x-axis is calendar time and the y-axis is your cohorts. Something like this https://stackoverflow.com/questions/44265512/creating-a-timeline-in-r but leave gaps for years for which there are no data. If EDI and CanNECD data are included for all years, just state that and exclude these from the plot/table. | Thank you for this suggestion. We agree Table 2 was confusing. The revised Table 2 (now Table 1) takes the form you recommend (page 9). Per your recommendation, we excluded CanNECD from this table. |
You should consider adding a data linkage diagram, I think it would make it much easier for others to follow you. | Thank you for the suggestion. We have added a data linkage diagram (Figure 1) on page 10. |
I would like to see a Data access statement. How could other researchers use this linked data, or can they not? If they would like to do something similar in Ontario, what would they have to do? | Data access statement: The data set at the centre of this study is held securely at OCCS. Data-sharing agreements prohibit OCCS from making the data set publicly available, but access may be granted to those who meet pre-specified criteria for confidential access by contacting the EDI team at OCCS at https://edi.offordcentre.com/contact. The full data set creation plan and underlying analytic code are available from the authors upon request; programs may rely upon coding templates or macros that are unique to OCCS. |
Also, the discussion could include some comparison to other provinces in Canada or other countries. How do the linkage rates, available variables, time dimension etc compare to other parts of Canada. Would this stand out as a significant improvement over other regions? (Is Ontario big in population in terms of Canada and can this data set make a big impact?) Discussion could also include a section on potential links to other data sources, such as crime and justice, health? Or the work that you/others can do next. Overall, the discussion and introduction should really try to sell the work more and appeal to a wider audience. Think about why someone outside Ontario should be reading this? | This has been added to the discussion on page 14: “While several provinces and at least one territory in Canada practice data linkage aiming to enable the study of children’s educational trajectory similar to our study goals (e.g., Manitoba [17], British Columbia [69]), the methods of linking and their success are not directly comparable with ours. The major limitation of our study was lack of common identifiers, which is a specific feature of government administration in Ontario – Canada’s most populous province – where separate identifiers are issued through health authorities and education authorities without an existing cross-walk between the two. In the future, our study may pave the way for connecting educational data to administrative databases (e.g., 64), once legislative obstacles allow the combination of health and education identifiers. In the meantime, however, our study has tested a unique method that allows building useful, linked datasets despite substantial barriers, and creates a potential to catch up to the educational research conducted in other parts of the country.” |
In a couple of places you mention flexible and important partnerships and highlight these also in the abstract. If they were so important, why do you not discuss these more? Why were they crucial? Or maybe just leave that out? | Thank you for this comment. We agree that there is not sufficient emphasis on partnerships in the body of the paper to merit multiple mentions in the abstract. We have de-emphasized partnerships in the abstract, but retained the discussion of the partnerships in the body of the paper as they contextualize the dataset creation and this study. |
Introduction, end of last paragraph. I would move “prohibitive costs” last in the list as the example that follows is on co. Page 3 - “In Canada, over 1.5 million children have been assessed with the EDI since its inception [12].” Add “inception since xxx year.” Page 3 – “The other major data-sharing partner was Ontario’s Educational Quality Assessment Office (EQAO) [add: a public authority/government agency ?] EQAO is Ontario’s main provider…” Page 3 – “home environments and routines outside of school”, what is a routine outside school? Page 4 – “Vulnerability is assigned using the first baseline dataset: children who score at or below the [add: bottom? 10th percentile?] in a given domain” Page 5 – the order of your subsections is different, and you have not listed sex and DOB in additional variables. Page 5-6. I would remove Table 1 or put in a supplement. This is not your core focus and takes up too much space. Page 6 Overview of research linking EDI-EQAO thus far – I find the discussion of results a bit out of place in this section. It should go in the introduction to justify your work or in the discussion to hint at what can be done later. Page 8 “which added 51,000 new matches to the 2005, 2008, and 2011 cohorts. ” say that in terms of how did the linkage rate improve, what was the % linked after that? Page 8 – when you say “As a first step, only students with unique and exact matches on these variables were included.” I assume you later included partial matches? This is a bit further confused when later on you write “The linking process created a fully matched dataset of the EDI and all four EQAO data points (n=86,778), which is a subset of a partially matched dataset (n=155,082) in which the EDI is matched to all four EQAO datapoints except those with job action or COVID-19 related cancellations.” Page 10 – “… can be interpreted like a r² value: it measures proportions of variance in a continuous variable that is explained by a grouping variable and is standardized between -1 and 1. Effect sizes are interpreted as small at 0.01, medium at 0.06, and large at 0.14 or greater.” Something is not right here, r² does not range between -1 and 1, but 0-1. | Thank you for these suggestions. All have been incorporated. |
Reviewer B | |
This was a well-written manuscript that describes a very complex and well-conceived study. I, however, am struggling to understand how the partial linkage of child-level EDI records furthers our understanding of children and their academic trajectories, as the EDI has not been validated for that purpose and the linkage was only partial. And if the intent is to detail an “innovative” record matching protocol, their method, which is purely deterministic, does not fit the bill. | We appreciate your feedback. We agree that the process of linkage was deterministic and have removed the word “innovative” from our description of our linkage process. However, the EDI has been utilized in many studies as a predictor of later academic outcomes, so we believe it is appropriate for this use. Our current and future research indeed focuses on further validation of the EDI for the purpose of predicting and understanding later academic outcomes at the population level. |
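
The stepwise inner-join strategy quoted in the response to Reviewer A can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the toy student identifiers, and the function names `inner_join` and `stepwise_link` are all hypothetical. It shows why each additional assessment timepoint can only shrink (or preserve) the linked dataset.

```python
from functools import reduce


def inner_join(left, right):
    """Keep only identifiers present in both datasets and merge their fields."""
    return {sid: {**left[sid], **right[sid]} for sid in left.keys() & right.keys()}


def stepwise_link(base, administrations):
    """Inner-join each later administration onto the base dataset in turn."""
    return reduce(inner_join, administrations, base)


# Hypothetical toy records keyed by a student identification number.
edi_grade3 = {
    1: {"edi": 0.4, "g3": 2},
    2: {"edi": 0.7, "g3": 3},
    3: {"edi": 0.5, "g3": 4},
}
grade6 = {1: {"g6": 3}, 2: {"g6": 3}}
grade9 = {1: {"g9": 4}, 3: {"g9": 2}}
grade10 = {1: {"g10": 1}}

# Fewer timepoints: more students retained.
through_grade6 = stepwise_link(edi_grade3, [grade6])
# All timepoints: only students present at every administration remain.
fully_linked = stepwise_link(edi_grade3, [grade6, grade9, grade10])
```

In this toy example two students survive the join through Grade 6, while only one is present at every administration, which mirrors the relationship between the larger and the fully linked datasets described above.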
Round 2 Reviews
Reviewer A
Anonymous Reviewer
Completed 06/12/2022
https://doi.org/10.23889/ijpds.v8i1.1843.review.r2.reviewa
The authors have taken care to address the comments I raised and the article can be published.
Recommendation: Accept Submission
Reviewer B
Anonymous Reviewer
Completed 15/11/2022
https://doi.org/10.23889/ijpds.v8i1.1843.review.r2.reviewb
Recommendation: Decline Submission
Editor Decision
Merran Beckley Smith
Decision Date: 14/12/2022
Decision: Accept Submission
https://doi.org/10.23889/ijpds.v8i1.1843.review.r2.dec
Dear Jeanne Sinclair, Scott Davies, Magdalena Janus:
We have reached a decision regarding your submission to International Journal of Population Data Science, "Student Achievement Trajectories in Ontario: Creating and validating a province-wide, multi-cohort and longitudinal database", and are delighted to inform you that our decision is to: Accept Submission.
We look forward to working with you through the next stages towards final publication.
Please get in touch if you have any queries going forward. Thank you.
Kind Regards