Determining Distinct Suicide Attempts From Recurrent Electronic Health Record Codes: Classification Study

Journal article

Kate H Bentley, Emily M. Madsen, Eugene Song, Yu Zhou, Victor M Castro, Hyunjoon Lee, Y. Lee, J. Smoller
JMIR Formative Research, 2023


Cite

APA
Bentley, K. H., Madsen, E. M., Song, E., Zhou, Y., Castro, V. M., Lee, H., … Smoller, J. (2023). Determining Distinct Suicide Attempts From Recurrent Electronic Health Record Codes: Classification Study. JMIR Formative Research.

Chicago/Turabian
Bentley, Kate H, Emily M. Madsen, Eugene Song, Yu Zhou, Victor M Castro, Hyunjoon Lee, Y. Lee, and J. Smoller. “Determining Distinct Suicide Attempts From Recurrent Electronic Health Record Codes: Classification Study.” JMIR Formative Research (2023).

MLA
Bentley, Kate H., et al. “Determining Distinct Suicide Attempts From Recurrent Electronic Health Record Codes: Classification Study.” JMIR Formative Research, 2023.

BibTeX

@article{kate2023a,
  title = {Determining Distinct Suicide Attempts From Recurrent Electronic Health Record Codes: Classification Study},
  year = {2023},
  journal = {JMIR Formative Research},
  author = {Bentley, Kate H and Madsen, Emily M. and Song, Eugene and Zhou, Yu and Castro, Victor M and Lee, Hyunjoon and Lee, Y. and Smoller, J.}
}

Abstract

Background: Prior suicide attempts are a relatively strong risk factor for future suicide attempts. There is growing interest in using longitudinal electronic health record (EHR) data to derive statistical risk prediction models for future suicide attempts and other suicidal behavior outcomes. However, model performance may be inflated by a largely unrecognized form of “data leakage” during model training: diagnostic codes for suicide attempt outcomes may refer to prior attempts that are also included in the model as predictors.

Objective: We aimed to develop an automated rule for determining when documented suicide attempt diagnostic codes identify distinct suicide attempt events.

Methods: From a large health care system’s EHR, we randomly sampled suicide attempt codes for 300 patients with at least one pair of suicide attempt codes documented at least one but no more than 90 days apart. Supervised chart reviewers assigned the clinical settings (ie, emergency department [ED] versus non-ED), methods of suicide attempt, and intercode interval (number of days). The probability (or positive predictive value) that the second suicide attempt code in a given pair of codes referred to a distinct suicide attempt event from its preceding suicide attempt code was calculated by clinical setting, method, and intercode interval.

Results: Of 1015 code pairs reviewed, 835 (82.3%) were nonindependent (ie, the 2 codes referred to the same suicide attempt event). When the second code in a pair was documented in a clinical setting other than the ED, it represented a distinct suicide attempt 3.3% of the time. The more time elapsed between codes, the more likely the second code in a pair referred to a distinct suicide attempt event from its preceding code. Code pairs in which the second suicide attempt code was assigned in an ED at least 5 days after its preceding suicide attempt code had a positive predictive value of 0.90.

Conclusions: EHR-based suicide risk prediction models that include International Classification of Diseases codes for prior suicide attempts as a predictor may be highly susceptible to bias due to data leakage in model training. We derived a simple rule to distinguish codes that reflect new, independent suicide attempts: suicide attempt codes documented in an ED setting at least 5 days after a preceding suicide attempt code can be confidently treated as new events in EHR-based suicide risk prediction models. This rule has the potential to minimize upward bias in model performance when prior suicide attempts are included as predictors in EHR-based suicide risk prediction models.
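
The rule stated in the conclusions (an ED-documented code at least 5 days after the preceding code counts as a new event) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `AttemptCode`, `flag_new_events`, and `positive_predictive_value` are hypothetical, and two choices are assumptions not specified in the abstract: the interval is measured against the immediately preceding code, and a patient's first code (which has no predecessor) is treated as an event. The second helper shows how a positive predictive value could be computed by comparing the rule's flags with chart-review labels.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

MIN_GAP_DAYS = 5  # threshold reported in the abstract


@dataclass(frozen=True)
class AttemptCode:
    """One documented suicide attempt code (hypothetical record layout)."""
    code_date: date
    in_ed: bool  # True if documented in an emergency department


def flag_new_events(codes: List[AttemptCode],
                    min_gap_days: int = MIN_GAP_DAYS) -> List[bool]:
    """Return one flag per code, in date order: True if treated as a new event."""
    ordered = sorted(codes, key=lambda c: c.code_date)
    flags: List[bool] = []
    for i, code in enumerate(ordered):
        if i == 0:
            # Assumption: the first code has no predecessor, so it is an event.
            flags.append(True)
            continue
        # Assumption: the gap is measured from the immediately preceding code.
        gap_days = (code.code_date - ordered[i - 1].code_date).days
        flags.append(code.in_ed and gap_days >= min_gap_days)
    return flags


def positive_predictive_value(predicted: List[bool],
                              reviewed: List[bool]) -> Optional[float]:
    """Share of rule-flagged codes that chart review confirmed as distinct."""
    confirmed = [r for p, r in zip(predicted, reviewed) if p]
    if not confirmed:
        return None  # the rule flagged nothing, so PPV is undefined
    return sum(confirmed) / len(confirmed)


# Made-up example data for one patient (not from the study).
example = [
    AttemptCode(date(2023, 1, 1), in_ed=True),
    AttemptCode(date(2023, 1, 3), in_ed=True),    # 2-day gap: same event
    AttemptCode(date(2023, 1, 10), in_ed=False),  # non-ED: same event
    AttemptCode(date(2023, 1, 20), in_ed=True),   # ED, 10-day gap: new event
]
print(flag_new_events(example))  # [True, False, False, True]
```

In a prediction pipeline, only codes flagged `True` would be counted as outcome events, so that a code which merely re-documents an earlier attempt is not scored as a new outcome.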