Workshop: CAFCW20: Sixth Computational Approaches for Cancer Workshop
Authors: Judith D. Cohn (Los Alamos National Laboratory)
Abstract: When applying machine learning in cancer research, and in particular when trying to predict drug response in cancer cell lines or other experimental systems, an ongoing problem is the small number of samples relative to the number of possible features. This is frequently compounded by a large amount of noise in the data, a lack of uniformity in the way experiments are conducted across labs, and an over-reliance on expression data.
Metapath analysis is a computational approach to similarity search that takes advantage of the links among nodes in a heterogeneous data network to encode model features. Published applications of this approach span a number of subject domains, including exploration of disease-associated genes. The base calculation for encoding metapath features is the path degree product (PDP), which is an indication of the number and length of paths through the data network between nodes of interest.
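To make the path degree product more concrete, the following is a minimal sketch and not the presenter's implementation. The abstract does not give the exact formula, so the sketch assumes one common formulation from the heterogeneous network literature: each path's PDP is the product of the edge-type-specific degrees of the nodes along the path, each raised to the power -w, and the feature value for a node pair is the sum of PDPs over all paths that follow the metapath (often called a degree-weighted path count). The toy network, the node and edge-type names, and the damping exponent w = 0.5 are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy network: (source, edge_type, target) triples.
EDGES = [
    ("cellA", "has_mutation", "TP53"),
    ("cellA", "has_mutation", "KRAS"),
    ("cellB", "has_mutation", "KRAS"),
    ("drugX", "targets", "KRAS"),
    ("drugX", "targets", "TP53"),
    ("drugY", "targets", "KRAS"),
]

# Hypothetical metapath: cell line -has_mutation- gene -targets- drug.
METAPATH = ["has_mutation", "targets"]


def build_adjacency(edges):
    """Map (node, edge_type) to its neighbours, treating edges as undirected."""
    adj = defaultdict(set)
    for src, etype, dst in edges:
        adj[(src, etype)].add(dst)
        adj[(dst, etype)].add(src)
    return dict(adj)


def paths_along(adj, start, end, metapath):
    """Yield every simple path from start to end that follows the metapath."""
    def walk(node, depth, path):
        if depth == len(metapath):
            if node == end:
                yield path
            return
        for nxt in adj.get((node, metapath[depth]), ()):
            if nxt not in path:  # do not revisit nodes
                yield from walk(nxt, depth + 1, path + [nxt])

    yield from walk(start, 0, [start])


def path_degree_product(adj, path, metapath, w=0.5):
    """Multiply the edge-type-specific degrees along one path, each raised to -w."""
    pdp = 1.0
    for i, etype in enumerate(metapath):
        pdp *= len(adj[(path[i], etype)]) ** -w      # degree of the edge's source end
        pdp *= len(adj[(path[i + 1], etype)]) ** -w  # degree of the edge's target end
    return pdp


def metapath_feature(adj, start, end, metapath, w=0.5):
    """Sum the PDPs of all paths between start and end (assumed feature encoding)."""
    return sum(
        path_degree_product(adj, path, metapath, w)
        for path in paths_along(adj, start, end, metapath)
    )


adj = build_adjacency(EDGES)
for cell in ("cellA", "cellB"):
    for drug in ("drugX", "drugY"):
        print(cell, drug, round(metapath_feature(adj, cell, drug, METAPATH), 3))
```

Under these assumptions, a pair connected by more paths scores higher (cellA and drugX share two genes), while paths through highly connected nodes are down-weighted (the path through KRAS contributes less than the path through TP53). A value like this, computed for each metapath of interest, could then serve as a numeric model feature in place of a one-hot indicator.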
In this presentation, we show that XGBoost models that include metapath-encoded features appear to perform significantly better than models using expression data alone or models using the same features with one-hot encoding. In addition, these models appear to produce more robust feature selection.