Volume 24, Issue 5 (Iranian South Medical Journal 2021)
Iran South Med J 2021, 24(5): 454-468


Torkashvand Z, Mahjub H, Soltanian A R, Farhadian M. Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data. Iran South Med J 2021; 24(5): 454-468
URL: http://ismj.bpums.ac.ir/article-1-1515-en.html
1- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
2- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran
3- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
Abstract:
Background: Response variables in most medical and health-related research are ordinal. Conventional modeling methods assume that the predictor variables are independent and that the number of samples (n) is large relative to the number of covariates (p). Conventional models therefore cannot be applied to high-dimensional genetic data in which p > n. The present study compared the predictive performance of decision trees, ordinal forest, and L1 penalized continuation ratio regression.
Materials and Methods: Three data sets were used. The B-cell data set contained expression data for 12,625 genes from 128 patients, with a four-level ordinal response variable. The HCC (liver cancer) data set included 1,469 genes from 56 patients, with a three-level ordinal response variable. The Heart data set contained five variables for 294 patients undergoing angiography, with a five-level ordinal response variable. The methods were compared on the same training and test sets using indicators such as accuracy, gamma, and kappa.
Results: For the two high-dimensional data sets, the ordinal forest model had the higher predictive ability, whereas for the low-dimensional data set, the L1 penalized continuation ratio model had the better predictive performance.
Conclusion: The choice of the best prediction model depends on the data set, and for each data set several methods should be compared to find the best model.
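
The three indicators named in the Materials and Methods (accuracy, gamma, and kappa) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that "gamma" means the Goodman-Kruskal gamma and that "kappa" means the unweighted Cohen's kappa (the abstract does not say whether a weighted kappa was used). The function names and the example labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted level equals the observed level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def goodman_kruskal_gamma(y_true, y_pred):
    """(concordant - discordant) / (concordant + discordant) over all pairs.

    Pairs tied on either variable are ignored, as in the usual definition.
    Assumes the ordinal levels are coded as ordered integers.
    """
    concordant = discordant = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        sign = (t1 - t2) * (p1 - p2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else float("nan")


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal frequencies of each level.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if p_expected == 1:
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical observed and predicted ordinal levels for a test set.
    observed = [1, 1, 2, 2, 3, 3, 4, 4]
    predicted = [1, 2, 2, 2, 3, 4, 4, 3]
    print("accuracy:", accuracy(observed, predicted))
    print("gamma:   ", goodman_kruskal_gamma(observed, predicted))
    print("kappa:   ", cohen_kappa(observed, predicted))
```

Accuracy and unweighted kappa treat every misclassification alike, while gamma uses the ordering of the levels, which is one reason to report them together for an ordinal response.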
Type of Study: Original | Subject: General
Received: 2021/11/28 | Accepted: 2021/11/28 | Published: 2021/11/28


Rights and Permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
