Author:

Li, Shengyu [1] | Huang, Yulong [2] | Kasukurthi, Mohan Vamsi [3] | Yang, Jiajie [4] | Li, Dongqi [5] | Yang, Guanhuan [6] | Lin, Jingwei [7] | Tan, Shaobo [8] | Bourrie, David [9] | Ma, Bin [10] | Borchert, Glen M. [11] | Huang, Jingshan [12]

Indexed by:

EI

Abstract:

The Common Data Elements (CDEs) standard of the International Organization for Standardization (ISO) 11179 is commonly used in clinical data processing. The Biomedical Research Integrated Domain Group (BRIDG) model is a framework for biomedical and clinical research. Mapping CDEs to BRIDG classes (also known as CDE classification) would improve interoperability and data analysis in clinical research. However, manually mapping each CDE to its corresponding BRIDG class is highly time-consuming and labor-intensive. In this paper, we present a new classification algorithm along with a new oversampling method. Our algorithm uses Term Frequency-Inverse Document Frequency (TF-IDF) as the feature representation method. By assigning different weights to the attributes, we let more important attributes play a larger role in the mapping process. In addition, the oversampling method generates every new attribute in the minority class by picking the length of the new attribute and setting its words according to the existing training set. Our research outcomes contribute to the field in the following ways: (1) A new CDE classification algorithm that outperforms existing algorithms in the literature, including the Random Forest Classifier, Linear Support Vector Classification (SVC), Multinomial Naive Bayes (NB), Logistic Regression, and Long Short-Term Memory (LSTM) networks, in terms of accuracy, precision, recall, and F1 score. (2) A new oversampling method that improves CDE classification accuracy for Random Forest and Multinomial NB. (3) Our classification algorithm employs two novel attributes, namely 'Data Element Preferred Definition' and 'Document', which are more efficient at classifying CDEs than the six attributes traditionally selected by domain experts. © 2021 IEEE.
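The two ideas in the abstract, attribute-weighted TF-IDF features and length-and-word oversampling for a minority class, can be illustrated with a minimal sketch. This is only one possible reading of the abstract, not the paper's implementation: the function names, the smoothed IDF formula, the frequency-weighted word sampling, and the example weights are all assumptions introduced here for illustration.

```python
import math
import random
from collections import Counter


def tfidf_vectors(texts):
    """Return one {term: tf-idf} dict per text, using a smoothed IDF (assumed)."""
    docs = [t.lower().split() for t in texts]
    n = len(docs)
    # Document frequency: number of texts in which each term occurs.
    df = Counter(term for d in docs for term in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({t: (c / len(d)) * idf[t] for t, c in tf.items()})
    return vectors


def weighted_features(records, weights):
    """Build per-attribute TF-IDF features, scaling each attribute by its weight.

    `records` is a list of {attribute: text}; `weights` maps attribute -> float.
    Feature keys are (attribute, term) pairs so attributes never collide.
    """
    combined = [dict() for _ in records]
    for attr, w in weights.items():
        vecs = tfidf_vectors([r.get(attr, "") for r in records])
        for out, vec in zip(combined, vecs):
            for term, value in vec.items():
                out[(attr, term)] = w * value
    return combined


def oversample_attribute(minority_texts, n_new, rng):
    """Synthesize `n_new` attribute texts for a minority class.

    The length of each new text is drawn from lengths seen in the class, and
    each word is drawn from the words seen in the class (frequency-weighted).
    Both sampling rules are assumptions, not the paper's exact procedure.
    """
    token_lists = [t.lower().split() for t in minority_texts if t.split()]
    lengths = [len(toks) for toks in token_lists]
    pool = [tok for toks in token_lists for tok in toks]
    if not pool:
        return []
    return [
        " ".join(rng.choice(pool) for _ in range(rng.choice(lengths)))
        for _ in range(n_new)
    ]
```

With hypothetical records such as `{"definition": "date of patient birth", "document": "demographics form"}` and weights `{"definition": 2.0, "document": 1.0}`, `weighted_features` doubles the contribution of every term from the definition attribute relative to the document attribute. The resulting sparse dictionaries would still need to be converted to whatever input format the downstream classifier expects.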

Keyword:

Bioinformatics; Classification (of information); Clinical research; Data handling; Decision trees; Inverse problems; Logistic regression; Long short-term memory; Mapping; Random forests

Affiliation:

  • [ 1 ] [Li, Shengyu] School of Computing, University of South Alabama, Mobile, United States
  • [ 2 ] [Huang, Yulong] College of Allied Health Professions, University of South Alabama, Mobile, United States
  • [ 3 ] [Kasukurthi, Mohan Vamsi] School of Computing, University of South Alabama, Mobile, United States
  • [ 4 ] [Yang, Jiajie] Department of Mathematics, University of Illinois Urbana-Champaign, Urbana, United States
  • [ 5 ] [Li, Dongqi] School of Computing, University of South Alabama, Mobile, United States
  • [ 6 ] [Yang, Guanhuan] School of Computing, University of South Alabama, Mobile, United States
  • [ 7 ] [Lin, Jingwei] Ocean School, Fuzhou University, Fuzhou, China
  • [ 8 ] [Tan, Shaobo] School of Computing, University of South Alabama, Mobile, United States
  • [ 9 ] [Bourrie, David] School of Computing, University of South Alabama, Mobile, United States
  • [ 10 ] [Ma, Bin] Qilu University of Technology, Shandong Academy of Science, Jinan, China
  • [ 11 ] [Borchert, Glen M.] College of Medicine, University of South Alabama, Mobile, United States
  • [ 12 ] [Huang, Jingshan] College of Medicine; School of Computing, University of South Alabama, Mobile, United States

Year: 2021

Page: 2788-2795

Language: English
