Abstract:
The Common Data Elements (CDEs) standard of the International Organization for Standardization (ISO) 11179 is commonly used in clinical data processing. The Biomedical Research Integrated Domain Group (BRIDG) model is a framework for biomedical and clinical research. Mapping CDEs to BRIDG classes (also known as CDE classification) would improve interoperability and data analysis in clinical research. However, manually mapping each CDE to its corresponding BRIDG class is time-consuming and labor-intensive. In this paper we present a new classification algorithm together with a new oversampling method. The algorithm uses Term Frequency-Inverse Document Frequency (TF-IDF) as its feature representation. By assigning different weights to different attributes, it lets the more important attributes play a larger role in the mapping process. The oversampling method generates each new attribute value in the minority class by choosing its length and its words according to the existing training set. Our contributions are as follows: (1) A new CDE classification algorithm that outperforms existing algorithms in the literature, including the Random Forest Classifier, Linear Support Vector Classification (SVC), Multinomial Naive Bayes (NB), Logistic Regression, and Long Short-Term Memory (LSTM) networks, in terms of accuracy, precision, recall, and F1 score. (2) A new oversampling method that improves CDE classification accuracy for Random Forest and Multinomial NB. (3) Our classification algorithm employs two novel attributes, namely 'Data Element Preferred Definition' and 'Document', which classify CDEs more effectively than the six attributes traditionally selected by domain experts. © 2021 IEEE.
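The abstract gives no implementation details, so the following is only a minimal sketch of one way the two ideas could be read: per-attribute TF-IDF features scaled by an attribute weight, and an oversampler that builds each synthetic attribute value by drawing a length and then words from the existing minority-class records. The function names, the weight values, the smoothed IDF formula, and the uniform sampling scheme are all assumptions made for illustration, not the authors' method. Only the two attribute names come from the abstract.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf} dict per tokenized document.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1 (an assumed variant).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def weighted_features(records, weights):
    """Compute TF-IDF per attribute, then scale each attribute by its weight.

    `records` is a list of {attribute: text} dicts; `weights` maps attribute
    names to hypothetical importance weights.
    """
    features = [dict() for _ in records]
    for attr, weight in weights.items():
        docs = [rec.get(attr, "").lower().split() for rec in records]
        for feat, vec in zip(features, tfidf_vectors(docs)):
            for term, value in vec.items():
                # Prefix with the attribute so the same word in two
                # attributes stays two separate features.
                feat[f"{attr}:{term}"] = weight * value
    return features


def oversample(minority, n_new, rng):
    """Create n_new synthetic minority records (assumed sampling scheme).

    For each attribute: pick a length from an existing value of that
    attribute, then fill it with words drawn from the existing values.
    """
    attrs = sorted({attr for rec in minority for attr in rec})
    synthetic = []
    for _ in range(n_new):
        new_rec = {}
        for attr in attrs:
            texts = [rec[attr].split() for rec in minority if rec.get(attr)]
            if not texts:
                continue
            length = len(rng.choice(texts))
            vocab = [word for text in texts for word in text]
            new_rec[attr] = " ".join(rng.choice(vocab) for _ in range(length))
        synthetic.append(new_rec)
    return synthetic


if __name__ == "__main__":
    # Toy records; the texts and weights are made up for illustration.
    records = [
        {"Data Element Preferred Definition": "date of birth of the subject",
         "Document": "demographics form"},
        {"Data Element Preferred Definition": "dose of the study drug",
         "Document": "treatment form"},
    ]
    weights = {"Data Element Preferred Definition": 2.0, "Document": 1.0}
    print(weighted_features(records, weights)[0])
    print(oversample(records, 2, random.Random(0)))
```

The resulting feature dictionaries could be passed to any classifier that accepts sparse features. Prefixing each term with its attribute name is one design choice among several; sharing a vocabulary across attributes would be an equally plausible reading of the abstract.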
Year: 2021
Page: 2788-2795
Language: English