A Gene Selection Approach based on Clustering for Classification Tasks in Colon Cancer

José Antonio CASTELLANOS GARZÓN, Juan RAMOS GONZÁLEZ

Abstract


Gene selection (GS) is an important research area in the analysis of DNA-microarray data, since it involves discovering genes that are meaningful for a particular target annotation or that can discriminate between the expression profiles of samples from different populations. In this context, a large number of filter methods have been proposed in the literature to identify subsets of relevant genes according to predefined targets. Despite these many proposals, the complexity of the GS problem remains a challenge. Hence, this paper proposes a novel gene selection approach that first applies clustering techniques and then applies filter methods to the resulting groups in order to obtain informative gene subsets. By applying our methodology to colon cancer data, we identified the most informative gene subset among several candidate subsets. The results obtained demonstrate the reliability of the proposed approach.
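The general idea in the abstract is to group genes by clustering and then run a filter method inside each group. The sketch below is one minimal way to express that idea, not the authors' method, whose specific choices the abstract does not state. Every concrete choice is an assumption for illustration: k-means on z-scored gene profiles as the clustering step, the absolute Welch t-statistic between two sample classes as the filter, keeping the top gene(s) per cluster, and the helper names `kmeans`, `welch_t` and `cluster_then_filter`.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of X (assumed clustering step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every row (gene) to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def welch_t(X, y):
    """Welch t-statistic per gene (row), class 1 vs class 0 (assumed filter)."""
    a, b = X[:, y == 1], X[:, y == 0]
    se2 = a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1]
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(se2 + 1e-12)


def cluster_then_filter(X, y, k, per_cluster=1, seed=0):
    """Hypothetical helper: cluster genes, then keep the top-scoring genes per cluster."""
    # Z-score each gene so clusters reflect expression shape, not magnitude.
    Z = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-12)
    labels = kmeans(Z, k, seed=seed)
    score = np.abs(welch_t(X, y))  # filter score computed on the raw profiles
    selected = []
    for j in np.unique(labels):
        genes = np.flatnonzero(labels == j)
        best = genes[np.argsort(score[genes])[::-1][:per_cluster]]
        selected.extend(int(g) for g in best)
    return sorted(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.array([0] * 10 + [1] * 10)  # two sample classes, e.g. normal vs tumour
    X = rng.normal(size=(40, 20))      # synthetic data: 40 genes x 20 samples
    X[:5, y == 1] += 3.0               # make genes 0-4 differentially expressed
    print(cluster_then_filter(X, y, k=4))
```

Selecting per cluster rather than from one global ranking is what lets the subset cover several co-expression groups instead of many near-redundant genes from a single group; `k` and `per_cluster` control the size of the resulting subset.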


Keywords


Gene selection; DNA-microarray; Clustering; Filter method; Colon cancer

DOI: http://dx.doi.org/10.14201/ADCAIJ201543110





This work is licensed under a Creative Commons Attribution 3.0 License.
