Machine Learning in Recommender Systems
At Yahoo! Labs, I worked with Dr. Deepak Agarwal to develop new methods and algorithms for recommender systems. Our primary goal was to predict a user’s reaction to a new item given their reactions to previous items. In the web context, this corresponds to showing users articles or ads and observing whether or not they click on them. We would like to use this information to improve the user’s experience.

In the context of generalized bilinear random effects models, we are developing new algorithms for more efficient model fitting on massive datasets. We also plan to extend these models to multivariate and categorical responses.
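
To make the model class concrete, here is a minimal sketch of how a bilinear random effects model could score a single user-item pair. It assumes one common logistic form, logit P(click) = mu + alpha_i + beta_j + u_i'v_j, which is an illustrative assumption and not necessarily the exact specification used in this work. The function name `click_probability`, the toy dimensions, and the randomly drawn effects are all hypothetical.

```python
import numpy as np

def click_probability(mu, alpha, beta, U, V, user, item):
    """Click probability for one (user, item) pair under an assumed
    logistic bilinear random effects model."""
    # Linear predictor: global intercept + user effect + item effect
    # + bilinear interaction between user and item latent factors.
    score = mu + alpha[user] + beta[item] + U[user] @ V[item]
    return 1.0 / (1.0 + np.exp(-score))

# Hypothetical toy dimensions and randomly drawn effects (not real data).
rng = np.random.default_rng(0)
n_users, n_items, rank = 5, 4, 2
alpha = rng.normal(scale=0.5, size=n_users)      # user random effects
beta = rng.normal(scale=0.5, size=n_items)       # item random effects
U = rng.normal(scale=0.5, size=(n_users, rank))  # user latent factors
V = rng.normal(scale=0.5, size=(n_items, rank))  # item latent factors

p = click_probability(-1.0, alpha, beta, U, V, user=2, item=3)
print(f"predicted click probability: {p:.3f}")
```

In a fitted model the effects would be estimated from observed clicks, not drawn at random. The sketch only shows how the additive and bilinear terms combine into a prediction.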

Mixture Modeling in Flow Cytometry
I work with a team of statisticians and computational biologists headed by Dr. Mike West and Dr. Cliburn Chan. Our end goal is to use Dirichlet process mixture modeling to understand the cellular makeup of blood samples, in support of developing vaccines for HIV and cancer. These datasets are intricate and have millions of observations, so I have been developing computational methods that run on graphics processing units (GPUs). GPUs are very affordable and offer up to 100-fold increases in speed.
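
One reason mixture models suit parallel hardware is that the component-membership probabilities can be computed independently for every observation. The sketch below illustrates that step on the CPU with NumPy. It assumes a finite Gaussian mixture, such as the one that arises when a Dirichlet process mixture is truncated to K components. The function names `log_gaussian` and `responsibilities` and the toy parameters are hypothetical, and this is not the team's GPU implementation.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of each row of X under a multivariate normal."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)              # cov = L @ L.T
    z = np.linalg.solve(L, (X - mean).T)     # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)            # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def responsibilities(X, weights, means, covs):
    """Posterior component-membership probabilities, one row per observation."""
    log_p = np.stack(
        [np.log(w) + log_gaussian(X, m, c)
         for w, m, c in zip(weights, means, covs)],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical two-component toy example (not flow cytometry data).
weights = np.array([0.5, 0.5])
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]
X = np.array([[0.1, -0.2], [5.2, 4.9]])

resp = responsibilities(X, weights, means, covs)
print(resp.round(3))
```

Each row of `resp` depends only on its own observation, so with millions of cells the work divides naturally across many threads.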

Now that analysis of these individual datasets is feasible, I am addressing the problem of comparing these large samples across time and across patients. Several difficulties arise, including model identification and further computational issues. I am developing new methodology and software implementations that could bring a once-infeasible problem to biologists’ desktops. See our team’s website for more information. I am also working on the dimensionality side of the problem, using more parsimonious covariance models to increase our sensitivity to rare cell subtypes, which are often missed.

Non-linear Hierarchical Regression and Registration in Metabolomics
I am working with Dr. David Banks on developing new statistical methodology in metabolomics. Our goal is to use mass spectrometry data to understand a tissue sample’s metabolic state. We are developing new Bayesian non-linear regression models that incorporate biologists’ prior knowledge about specific biological aspects of the data while still allowing flexibility in other areas. This includes matching multiple complex functions from a library of compounds to a new tissue sample in a flexible way, similar to image analysis. Computationally, this involves implementing adaptive Metropolis-Hastings within Gibbs sampling, along with some non-linear minimization techniques that have shown promising results in the early stages of research.
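
As a rough illustration of adaptive Metropolis-Hastings within Gibbs, the sketch below updates one coordinate at a time with a random-walk proposal and tunes each coordinate's step size in batches toward a target acceptance rate. Everything here is an assumption made for illustration: the toy bivariate normal target stands in for the metabolomics posterior, and the function name `adaptive_mwg`, the 0.44 target rate, the batch size, and the adaptation schedule are hypothetical choices, not the settings used in this research.

```python
import numpy as np

def adaptive_mwg(log_post, x0, n_iter=5000, batch=50, target=0.44, seed=0):
    """Coordinate-wise random-walk Metropolis with batch step-size adaptation."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    log_step = np.zeros(d)        # log proposal std dev, one per coordinate
    accepted = np.zeros(d)        # acceptances within the current batch
    samples = np.empty((n_iter, d))
    current = log_post(x)
    for t in range(n_iter):
        for j in range(d):
            # Propose a move in coordinate j only, holding the others fixed.
            prop = x.copy()
            prop[j] += np.exp(log_step[j]) * rng.normal()
            cand = log_post(prop)
            # Symmetric proposal, so the acceptance ratio is the posterior ratio.
            if np.log(rng.uniform()) < cand - current:
                x, current = prop, cand
                accepted[j] += 1
        samples[t] = x
        if (t + 1) % batch == 0:
            # Diminishing adaptation: the adjustment shrinks as batches accumulate.
            n_batches = (t + 1) // batch
            delta = min(0.1, n_batches ** -0.5)
            rate = accepted / batch
            log_step += np.where(rate > target, delta, -delta)
            accepted[:] = 0
    return samples

# Toy target (assumption): zero-mean bivariate normal with correlation 0.5.
precision = np.linalg.inv(np.array([[1.0, 0.5], [0.5, 1.0]]))
log_post = lambda x: -0.5 * x @ precision @ x

samples = adaptive_mwg(log_post, x0=[3.0, -3.0])
print(samples[1000:].mean(axis=0))
```

Steps grow when a coordinate accepts too often and shrink when it accepts too rarely, which removes most of the manual tuning that fixed-step samplers require.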