TY - JOUR
T1 - Feature Selection with Small Data Sets
T2 - Identifying Feature Importance for Predictive Classification of Return-to-Work Date after Knee Arthroplasty
AU - Rietdijk, Harald H.
AU - Strijbos, Daniël O.
AU - Conde-Cespedes, Patricia
AU - Dijkhuis, Talko B.
AU - Oldenhuis, Hilbrand K.E.
AU - Trocan, Maria
PY - 2024/10/15
Y1 - 2024/10/15
N2 - In recent decades, the number of cases of knee arthroplasty among people of working age has increased. The integrated clinical pathway ‘back at work after surgery’ is an initiative to reduce the possible cost of sick leave. The evaluation of this pathway, like many clinical studies, faces the challenge of small data sets with a relatively high number of features. In this study, we investigate the possibility of identifying features that are important in determining the duration of rehabilitation, expressed in the return-to-work period, by using feature selection tools. Several models are used to classify the patients’ data into two classes, and the results are evaluated based on the accuracy and the quality of the ordering of the features, for which we introduce a ranking score. A selection of estimators is used in an optimization step, reorganizing the feature ranking. The results show that for some models, the proposed optimization results in a better ordering of the features. The ordering of the features is evaluated visually and quantified by the ranking score. Furthermore, for all models, higher accuracy, with a maximum of 91%, is achieved by applying the optimization process. The features that are identified as relevant for the duration of the return-to-work period are discussed and provide input for further research.
AB - In recent decades, the number of cases of knee arthroplasty among people of working age has increased. The integrated clinical pathway ‘back at work after surgery’ is an initiative to reduce the possible cost of sick leave. The evaluation of this pathway, like many clinical studies, faces the challenge of small data sets with a relatively high number of features. In this study, we investigate the possibility of identifying features that are important in determining the duration of rehabilitation, expressed in the return-to-work period, by using feature selection tools. Several models are used to classify the patients’ data into two classes, and the results are evaluated based on the accuracy and the quality of the ordering of the features, for which we introduce a ranking score. A selection of estimators is used in an optimization step, reorganizing the feature ranking. The results show that for some models, the proposed optimization results in a better ordering of the features. The ordering of the features is evaluated visually and quantified by the ranking score. Furthermore, for all models, higher accuracy, with a maximum of 91%, is achieved by applying the optimization process. The features that are identified as relevant for the duration of the return-to-work period are discussed and provide input for further research.
KW - feature selection
KW - knee arthroplasty
KW - machine learning
U2 - 10.3390/app14209389
DO - 10.3390/app14209389
M3 - Article
SN - 2076-3417
VL - 14
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 20
M1 - 9389
ER -