SM: papers

Unpublished

How Many Directions Determine a Shape and other Sufficiency Results for Two Topological Transforms. [  arXiv ]

Label propagation defines signaling networks associated with recurrently mutated cancer genes. [  bioRxiv ]

Dynamic linear models guide design and analysis of microbiota studies within artificial human guts. [  bioRxiv  |   code ]

Subspace-Induced Gaussian Processes. [  arXiv  |   code ]

Classical Music Composition Using State Space Models. [  arXiv  |   website  |   code ]

Functional Data Analysis using a Topological Summary Statistic: the Smooth Euler Characteristic Transform. [  arXiv  |   code ]

The Geometry of Synchronization Problems and Learning Group Actions. [  arXiv |   code ]

Approximations of Markov Chains and Bayesian Inference. [  arXiv ]

Learning Subspaces of Different Dimension. [  arXiv  |   code ]

Persistent Homology Transform for Modeling Shapes and Surfaces. [  arXiv ]

Towards Stratification Learning through Homology Inference. [  working paper ]

Multiscale factor models for molecular networks. [  working paper ]

Published/In Press

Scalable Algorithms for Learning High-Dimensional Linear Mixed Models. Conference on Uncertainty in Artificial Intelligence [  arXiv  |   code ]

Adaptive Randomized Dimension Reduction on Massive Data. Journal of Machine Learning Research [  journal  |   arXiv  |   code ]

Development and assessment of fully automated and globally transitive geometric morphometric methods, with application to a biological comparative dataset with high interspecific variation. The Anatomical Record [  journal |  bioRxiv ]


Melanoma therapeutic strategies that select against resistance by exploiting MYC-driven evolutionary convergence. Cell Reports [  journal ]

HOMINID: A framework for identifying associations between host genetic variation and microbiome composition. GigaScience [  bioRxiv  |   journal  |   code  |   webtool ]

Bayesian Approximate Kernel Regression with Variable Selection. Journal of the American Statistical Association [  journal |   arXiv |   code ]

Detecting Epistasis in Genome-wide Association Studies with the Marginal Epistasis Test. PLoS Genetics [  journal  |   bioRxiv  |   code  ]

Differential Expression Analysis for RNAseq using Poisson Mixed Models. Nucleic Acids Research [  journal  |  bioRxiv  |   code  ]

Fast moment estimation for generalized latent Dirichlet models. Journal of the American Statistical Association [  journal  |   arXiv  |   code ]

Efficient Learning of Graded Membership Models. International Conference on Machine Learning [  conference  |   arXiv  |   code ]

Geometric representations of random hypergraphs. Journal of the American Statistical Association [  journal  |   arXiv ]

A phylogenetic transform enhances analysis of compositional microbiota data. eLife. [  journal  |   bioRxiv  |   website   |   code  ]

Phylogenetic factorization of compositional data. PeerJ. [  journal  |   bioRxiv  |  code  ]

Topological consistency via kernel estimation. Bernoulli. [  journal  |   arXiv ]


Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples. Genetics. [  journal  |   bioRxiv | code ]

Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. American Journal of Human Genetics. [  journal  |   bioRxiv | code ]

Random walks on simplicial complexes and harmonics. Random Structures & Algorithms [  journal  |   arXiv ]

Bayesian group factor analysis with structured sparsity. Journal of Machine Learning Research [  journal  |   code ]


Statistical inference for dynamical systems: A review. Statistical Surveys [  journal  |   arXiv ]

Contour Trees of Uncertain Terrains. ACM SIGSPATIAL Conference on Advances in Geographic Information Systems [  journal ]

Citizen Science as a New Tool in Dog Cognition Research. PLOS One [  journal ]

Probabilistic Fréchet means for time varying persistence diagrams. Electronic Journal of Statistics [  journal  |   arXiv ]

The Information Geometry of Mirror Descent. IEEE Transactions on Information Theory [  journal  |   arXiv ]

Consistency of maximum likelihood estimation for some dynamical systems. Annals of Statistics [  journal  |   arXiv ]

The topology of probability distributions on manifolds. Probability Theory and Related Fields [  journal  |   arXiv ]


Cumulon: Cloud-Based Statistical Analysis from Users' Perspective. IEEE Bulletin on Data Engineering. [  journal ]

Core and region-enriched networks of behaviorally regulated genes and the singing genome. Science. [  journal ]

Persistent Homology Transform for Modeling Shapes and Surfaces. Information and Inference. [  journal  |   bioRxiv ]

A new fully automated approach for aligning and comparing shapes. The Anatomical Record. [  journal  |   website  |  code ]

Novel Distal eQTL Analysis Demonstrates Effect of Population Genetic Architecture on Detecting and Interpreting Associations. Genetics. [  journal  |   website ]

GSAASeqSP: A Toolset for Gene Set Association Analysis of RNA-Seq Data. Scientific Reports. [  journal  |   website ]

A Cheeger-Type Inequality on Simplicial Complexes. Advances in Applied Mathematics. [  journal  |   arxiv ]

Fréchet Means for Distributions of Persistence Diagrams. Discrete and Computational Geometry. [  journal  |   arXiv ]

A Digital Network Approach to Infer Sex Behavior in Emerging HIV Epidemics. PLOS ONE. [  journal ]

Statistical Analysis of Crystallization Database Links Protein Physico-Chemical Features with Crystallization Mechanisms. PLOS ONE. [  journal ]


Distinct and Overlapping Sarcoma Subtypes Initiated from Muscle Stem and Progenitor Cells. Cell Reports. [  journal ]

Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation. Bioinformatics. [  journal ]

DNase-seq predicts regions of rotational nucleosome stability across diverse human cell types. Genome Research. [  journal ]

A comparative study of covariance selection models for the inference of gene regulatory networks. Journal of Medical Bioinformatics. [  journal ]

Sustained-input switches for transcription factors and microRNAs are central building blocks of eukaryotic gene circuits. Genome Biology. [  journal ]

Partial factor modeling: predictor-dependent shrinkage for linear regression. Journal of the American Statistical Association. [  journal  |   arxiv  |   code ]

Kernel Sliced Inverse Regression: Regularization and Consistency. Abstract and Applied Analysis. [  journal  |   code ]

Assessing the radiation response of lung cancer with different gene mutations using genetically engineered mice. Frontiers in Oncology. [  journal ]

Dissecting High-Dimensional Phenotypes with Bayesian Sparse Factor Analysis of Genetic Covariance Matrices. Genetics. [  journal  |   code ]


Genetics of gene expression responses to temperature stress in a sea urchin gene network. Molecular Ecology. [  journal ]

Predictive Framework for Integrating Disparate Genomic Data Types Using Sample-Specific Gene Set Enrichment Analysis and Multi-Task Learning. PLOS One. [  journal ]

Genetic effects on mating success and partner choice in a social mammal. American Naturalist. [  journal ]

Cyclin-Dependent Kinases Are Regulators and Effectors of Oscillations Driven by a Transcription Factor Network. Molecular Cell. [  journal ]

Homology Transfer and Stratification Learning. ACM-SIAM Symposium on Discrete Algorithms. [  journal ]

Probability measures on the space of persistence diagrams. Inverse Problems. [  journal ]

Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets. Genome Research. [  journal  |   website ]


RS-SNP: a random-set method for genome-wide association studies. BMC Genomics. [  journal ]

Discovering genetic variants in Crohn's disease by exploring genomic regions enriched of weak association signals. Digestive and Liver Disease. [  journal ]

Cross Species Genomic Analysis Identifies a Mouse Model as Undifferentiated Pleomorphic Sarcoma/Malignant Fibrous Histiocytoma. PLOS One. [  journal ]

Estimating variable structure and dependence in Multi-task learning via gradients. Machine Learning. [  journal  |   working paper ]


On the reproducibility of results of pathway analysis in genome-wide expression studies of colorectal cancers. Journal of Biomedical Informatics. [  journal ]

Localized Sliced Inverse Regression. Journal of Computational and Graphical Statistics. [  journal  |   working paper  |   code ]

Learning gradients: predictive models that infer geometry and dependence. Journal of Machine Learning Research. [  journal ]

Supervised Dimension Reduction Using Bayesian Mixture Modeling. International Conference on Artificial Intelligence and Statistics. [  journal  |   code ]

Learning Gradients and Feature Selection on Manifolds. Bernoulli. [  journal  |   arxiv ]

Evidence-ranked motif identification. Genome Biology. [  journal  |   website ]


Comparative study of gene set enrichment methods. BMC Bioinformatics. [  journal ]

Genomic features that predict allelic imbalance in humans suggest patterns of constraint on gene expression variation. Molecular Biology and Evolution. [  journal ]

Do serum biomarkers really measure breast cancer? BMC Cancer. [  journal ]

Characterizing the developmental pathways TTF-1, NKX2-8, and PAX9 in lung cancer. Proc. Natl. Acad. Sci. USA. [  journal ]

Local sliced inverse regression. Advances in Neural Information Processing Systems 21. [  journal  |   code ]


Modeling cancer progression via pathway dependencies. PLoS Comput Biol. [  journal ]


Gene Expression Programs of Human Smooth Muscle Cells: Tissue-Specific Differentiation and Prognostic Significance in Breast Cancers. PLoS Genetics. [  journal ]

Understanding the use of unlabelled data in predictive modelling. Statistical Science. [  journal  |   arxiv ]

Characterizing the Function Space for Bayesian Kernel Models. Journal of Machine Learning Research. [  journal ]

Genomic sweeping for hypermethylated genes. Bioinformatics. [  journal ]


Evidence of influence of genomic DNA sequence on human X chromosome inactivation. PLoS Comput Biol. [  journal ]

Analysis of Sample Set Enrichment Scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles. Bioinformatics. [  journal  |   website ]

Gene expression changes and molecular pathways mediating activity-dependent plasticity in visual cortex. Nat Neurosci. [  journal ]

Estimation of Gradients and Coordinate Covariation in Classification. Journal of Machine Learning Research. [  journal ]

Learning Coordinate Covariances via Gradients. Journal of Machine Learning Research. [  journal ]

Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization. Adv Comput Math. [  journal ]


Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-wide Expression Profiles. Proc Natl Acad Sci USA. [  journal  |   website ]

An oncogenic KRAS2 expression signature identified by cross-species gene-expression analysis. Nat Genet. [  journal  ]

Stability Results in Learning Theory. Analysis and Applications. [  journal ]

Permutation Tests for Classification. Proceedings of the Conference on Learning Theory. [  journal  |   working paper ]

Risk Bounds for Mixture Density Estimation. ESAIM: Probability and Statistics. [  journal ]


Androgen-Induced Differentiation and Tumorigenicity of Human Prostate Epithelial Cells. Cancer Research. 2004. [  journal ]

Learning Theory: general conditions for predictivity. Nature. 2004. [  journal ]

Estimating Dataset Size Requirements for Classifying DNA Microarray Data. J Comput Biol. 2003. [  journal ]

An Analytical Method for Multi-class Molecular Cancer Classification. SIAM Review. 2003. [  journal ]

Optimal gene expression analysis by microarrays. Cancer Cell. 2002. [  journal ]

Gene Expression-Based Classification and Outcome Prediction of Central Nervous System Embryonal Tumors. Nature. 2002. [  journal ]

Choosing Multiple Parameters for Support Vector Machines. Machine Learning. 2002. [  journal ]

A Uniform Approach to Molecular Cancer Diagnosis Using Tumor Gene Expression Signatures. Proc Natl Acad Sci U S A. 2001. [  journal ]

Molecular classification of multiple tumor types. Bioinformatics. 2001. [  journal ]

Bounds on sample size for policy evaluation in Markov environments. Proceedings of the Conference on Learning Theory. 2001. [  journal  |   arXiv ]

Feature Selection for SVMs. Advances in Neural Information Processing Systems. 2000. [  journal ]

Support Vector Method for Multivariate Density Estimation. Advances in Neural Information Processing Systems. 1999. [  journal ]