Reliable and accurate tumor classification is crucial for the successful diagnosis and treatment of cancer. With recent advances in molecular genetics, it is possible to measure the expression levels of thousands of genes simultaneously. This makes it feasible to gain a more complete understanding of the molecular markers that distinguish tumors and to make more accurate diagnoses. Linear and quadratic discriminant analyses are common statistical approaches to classification. However, in gene expression datasets the number of genes (p) is much larger than the number of tissue samples (n). As a result, the sample covariance matrices are singular, which limits the use of these methods. Diagonal linear and diagonal quadratic discriminant analyses are more recent approaches that ignore the correlation among genes and thereby allow high-dimensional classification. The nearest shrunken centroids algorithm is an extension of diagonal discriminant analysis that also selects the genes that contribute most to class prediction. In this study, we discuss these algorithms and demonstrate their use on both microarray and RNA sequencing datasets.
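To make the idea of ignoring gene-gene correlation concrete, the following is a minimal sketch of diagonal linear discriminant analysis in plain Python. It is not the authors' implementation: the function names `fit_dlda` and `predict_dlda` are hypothetical, the expression values are made-up toy numbers, and it assumes every gene has a nonzero pooled variance. Only one variance per gene is estimated, so no p-by-p covariance matrix ever has to be inverted.

```python
import math


def fit_dlda(X, y):
    """Estimate class centroids, pooled per-gene variances, and class priors.

    X is a list of samples (each a list of p expression values),
    y is the list of class labels. Names are illustrative only.
    """
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    centroids, priors = {}, {}
    pooled = [0.0] * p
    for k in classes:
        rows = [x for x, label in zip(X, y) if label == k]
        centroids[k] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[k] = len(rows) / n
        # Accumulate within-class squared deviations gene by gene
        for x in rows:
            for j in range(p):
                pooled[j] += (x[j] - centroids[k][j]) ** 2
    # Diagonal of the pooled covariance; off-diagonal terms are ignored
    variances = [s / (n - len(classes)) for s in pooled]
    return centroids, variances, priors


def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled distance."""
    centroids, variances, priors = model

    def score(k):
        dist = sum((xj - mj) ** 2 / vj
                   for xj, mj, vj in zip(x, centroids[k], variances))
        return dist - 2.0 * math.log(priors[k])

    return min(centroids, key=score)


# Toy data (assumed for illustration): 4 samples, 5 genes, so p > n
X = [
    [1.0, 5.0, 2.0, 0.5, 3.0],
    [1.2, 4.8, 2.1, 0.7, 3.3],
    [3.0, 1.0, 2.0, 2.5, 0.9],
    [3.3, 1.2, 2.2, 2.4, 1.0],
]
y = ["A", "A", "B", "B"]

model = fit_dlda(X, y)
print(predict_dlda(model, [1.1, 5.0, 2.0, 0.6, 3.1]))
```

Replacing the pooled variances with separate per-class variances (plus a log-variance term in the score) would give the diagonal quadratic variant. The nearest shrunken centroids algorithm additionally shrinks each class centroid toward the overall centroid, which this sketch does not attempt.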
Full Citation: Zararsiz, G., Korkmaz, S., Goksuluk, D., Eldem, V., & Ozturk, A. (2015). Diagonal Discriminant Analysis for Gene-Expression Based Tumor Classification. Journal of Advances in Information Technology, 6(2).