A Shared Task Involving Multi-label Classification of Clinical Free Text

John Pestian¹, Christopher Brew², Paweł Matykiewicz¹,⁴, DJ Hovermale², Neil Johnson¹, K. Bretonnel Cohen³, and Wlodzislaw Duch⁴
¹Department of Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, Ohio, USA
²Ohio State University, Department of Linguistics
³University of Colorado School of Medicine
⁴Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland

Abstract.

This paper reports on a shared task involving the assignment of ICD-9-CM codes to radiology reports. Two features distinguished this task from previous shared tasks in the biomedical domain. One is that it resulted in the first freely distributable corpus of fully anonymized clinical text. This resource is permanently available and will (we hope) facilitate future research. The other key feature of the task is that it required categorization with respect to a large and commercially significant set of labels. The number of participants was larger than in any previous biomedical challenge task. We describe the data production process and the evaluation measures, and give a preliminary analysis of the results. Many systems performed at levels approaching the inter-coder agreement, suggesting that human-like performance on this task is within the reach of currently available technologies.
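Because each report can receive several ICD-9-CM codes, system output and gold annotations are naturally compared as sets of labels per document. The sketch below shows micro-averaged F1 over such label sets, a common measure for multi-label classification; this abstract does not spell out the exact measures, so treat the choice of metric, the function name `micro_f1`, and the sample codes as illustrative assumptions rather than the official scorer. The same function can also compare two human coders' label sets, which is one way to express inter-coder agreement.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-document label sets (illustrative sketch).

    True positives, false positives and false negatives are pooled across
    all documents before precision and recall are computed.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # codes assigned by both
        fp += len(p - g)  # codes predicted but not in the gold set
        fn += len(g - p)  # gold codes the system missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical gold and predicted codes for two radiology reports.
    gold = [{"786.2"}, {"593.70", "599.0"}]
    predicted = [{"786.2"}, {"599.0", "780.6"}]
    print(round(micro_f1(gold, predicted), 3))  # prints 0.667
```

Pooling counts before averaging (micro-averaging) lets frequent codes dominate the score, which suits a label set with a long tail of rare codes; a macro-averaged variant would instead weight every code equally.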

Reference: Pestian J, Brew C, Matykiewicz P, Hovermale DJ, Johnson N, Cohen KB, Duch W. A shared task involving multi-label classification of clinical free text. BioNLP 2007: Biological, translational, and clinical language processing, pp. 97–104. ACL, 2007.

