SM: papers

Unpublished

Scalable Bayesian inference for the generalized linear mixed models. [  arxiv ]

Pragmatist Intelligence: Where the Principle of Usefulness Can Take ANNs. [  arxiv ]

Irreducibility of Markov Chains on simplicial complexes, the Spectrum of the Discrete Hodge Laplacian and Homology. [  arxiv ]

Minimum Φ-distance estimators for finite mixing measures. [  arxiv ]

Universal gut microbial relationships in the gut microbiome of wild baboons. [  bioRxiv ]

A Statistical Analysis of Compositional Surveys. [  arxiv ]

Extended probabilities in Statistics. [  arxiv ]

A Large Deviation Approach to Posterior Consistency in Dynamical Systems. [  arxiv ]

Towards Explainable Convolutional Features for Music Audio Modeling. [  arxiv ]

Random Lie Brackets that Induce Torsion: A Model for Noisy Vector Fields. [  arxiv ]

Stanza: A Nonlinear State Space Model for Probabilistic Inference in Non-Stationary Time Series. [  arxiv ]

On the geometric properties of finite mixture models. [  arxiv ]

Scalable Modeling of Spatiotemporal Data using the Variational Autoencoder: an Application in Glaucoma. [  arxiv ]

Adaptive particle-based approximations of the Gibbs posterior for inverse problems. [  arxiv ]

Measuring and Mitigating PCR Bias in Microbiome Data. [  bioRxiv ]

Gaussian Process Mixtures for Estimating Heterogeneous Treatment Effects. [  arxiv ]

Statistical Considerations in the Design and Analysis of Longitudinal Microbiome Studies. [  bioRxiv ]

Learning Integral Representations of Gaussian Processes. [  arXiv  |   code ]

Classical Music Composition Using State Space Models. [  arXiv  |   website  |   code ]

Approximations of Markov Chains and Bayesian Inference. [  arXiv ]

Persistent Homology Transform for Modeling Shapes and Surfaces. [  arXiv ]

Towards Stratification Learning through Homology Inference. [  working paper ]

Multiscale factor models for molecular networks. [  working paper ]


Published/In Press

Asymptotics of Bayesian Uncertainty Estimation in Random Features Regression. NeurIPS Proceedings. [  arxiv    |   proceeding  ]

Representing Fields without Correspondences: the Lifted Euler Characteristic Transform. Journal of Applied and Computational Topology. [  arxiv    |   journal  ]

Probabilistic Approach to Parametric Inverse Problems Using Gibbs Posteriors. Inverse Problems (in press). [  arxiv ]

A Sheaf-Theoretic Construction of Shape Space. Foundations of Computational Mathematics. [  arxiv ]

Global Optimality of Elman-type RNN in the Mean-Field Regime. ICML. [  arxiv  |   arxiv ]

Concentration inequalities and optimal number of layers for stochastic deep neural networks. IEEE Xplore. [  arxiv    |   journal  ]

Identifying risk factors for blindness from glaucoma at first presentation to a tertiary clinic. American Journal of Ophthalmology. [  arxiv  |   journal  ]

Ergodic theorems for imprecise probability kinematics. International Journal of Approximate Reasoning. [  arxiv  |   journal  ]

Multiple testing with persistent homology. Foundations of Data Science. [  arxiv  |   journal  ]

The Bulk and the Extremes of Minimal Spanning Acycles and Persistence Diagrams of Random Complexes. Discrete Analysis. [  journal ]

The accuracy of absolute differential abundance analysis from relative count data. PLoS Computational Biology. [  bioRxiv  |   journal ]

A Topological Data Analytic Approach for Discovering Biophysical Signatures in Protein Dynamics. PLoS Computational Biology. [  bioRxiv  |   journal ]

Synchrony and idiosyncrasy in the gut microbiome of wild primates. Nature Ecology & Evolution. [  bioRxiv  |   journal  ]

How Many Directions Determine a Shape and other Sufficiency Results for Two Topological Transforms. Transactions of the American Mathematical Society. [  arXiv  |   journal  ]

Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes. Journal of Machine Learning Research. [  journal  |   arXiv ]

Learning Subspaces of Different Dimension. Journal of Computational and Graphical Statistics. [  journal  |   arXiv  |   code ]

Gibbs posterior convergence and the thermodynamic formalism. Annals of Applied Probability. [  arxiv  | journal ]

Morphological and genomic shifts in mole-rat ‘queens’ increase fecundity but reduce skeletal integrity. eLife. [  journal ]

Statistical Robustness of Markov Chain Monte Carlo Accelerators. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. [  conference ]

Likelihood ratio statistics for gene set enrichment in Alzheimer's disease pathways. Alzheimer's & Dementia. [  journal ]

Subspace Clustering through Sub-Clusters. Journal of Machine Learning Research. [  arxiv  |   journal ]

A Statistical Pipeline for Identifying Physical Features that Differentiate Classes of 3D Shapes. Annals of Applied Statistics. [  bioRxiv  |   journal ]

Bayesian Non-Parametric Factor Analysis for Longitudinal Spatial Surfaces. Bayesian Analysis. [  arxiv ]

The Geometry of Synchronization Problems and Learning Group Actions. Discrete and Computational Geometry. [  journal  |   arXiv  |   code ]

Naught all zeros in sequence count data are the same. Computational and Structural Biotechnology Journal. [  journal  |  bioRxiv ]

Estimating Rates of Progression and Predicting Future Visual Fields in Glaucoma Using a Deep Variational Autoencoder. Scientific Reports. [  journal  |   bioRxiv ]

Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis. Journal of the American Statistical Association. [  journal  |   arXiv  |   code ]

Label propagation defines signaling networks associated with recurrently mutated cancer genes. Scientific Reports. [  journal  |   bioRxiv ]

A unifying framework for interpreting and predicting mutualistic systems. Nature Communications. [  journal ]

Evolution of DNA methylation in Papio baboons. Molecular Biology and Evolution. [  journal  |   bioRxiv  ]

Phylofactorization - a graph partitioning algorithm to identify phylogenetic scales of ecological data. Ecological Monographs [  bioRxiv ]

Dynamic linear models guide design and analysis of microbiota studies within artificial human guts. Microbiome [  journal   |   bioRxiv  |   code  ]

Scalable Algorithms for Learning High-Dimensional Linear Mixed Models. Conference on Uncertainty in Artificial Intelligence [  conference  |  arXiv  |   code ]

Combinations of DIPs and Dprs control organization of olfactory receptor neuron terminals in Drosophila. PLoS Genetics [  journal ]

Adaptive Randomized Dimension Reduction on Massive Data. Journal of Machine Learning Research [  journal  |   arXiv  |   code ]

Development and assessment of fully automated and globally transitive geometric morphometric methods, with application to a biological comparative dataset with high interspecific variation. The Anatomical Record [  journal |  bioRxiv ]

Fast moment estimation for generalized latent Dirichlet models. Journal of the American Statistical Association [  journal   |   arXiv  |   code ]

Bayesian Approximate Kernel Regression with Variable Selection. Journal of the American Statistical Association [  journal |   arXiv |   code ]

Melanoma therapeutic strategies that select against resistance by exploiting MYC-driven evolutionary convergence. Cell Reports [  journal ]

HOMINID: A framework for identifying associations between host genetic variation and microbiome composition. GigaScience [  bioRxiv  |   journal  |   code  |   webtool ]

Detecting Epistasis in Genome-wide Association Studies with the Marginal Epistasis Test. PLoS Genetics [  journal  |   bioRxiv  |   code  ]

Differential Expression Analysis for RNAseq using Poisson Mixed Models. Nucleic Acids Research [  journal  |  bioRxiv  |   code  ]

Partitioned Tensor Factorizations for Learning Mixed Membership Models. International Conference on Machine Learning [  conference  |   arXiv  |   code ]

Geometric representations of random hypergraphs. Journal of the American Statistical Association [  journal  |   arXiv ]

A phylogenetic transform enhances analysis of compositional microbiota data. eLife. [  journal  |   bioRxiv  |   website   |   code  ]

Phylogenetic factorization of compositional data. PeerJ. [  journal  |   bioRxiv  |  code  ]

Topological consistency via kernel estimation. Bernoulli. [  journal  |   arXiv ]

Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples. Genetics. [  journal  |   bioRxiv | code ]

Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. American Journal of Human Genetics. [  journal  |   bioRxiv | code ]

Random walks on simplicial complexes and harmonics. Random Structures & Algorithms [  journal  |   arXiv ]

Bayesian group factor analysis with structured sparsity. Journal of Machine Learning Research [  journal  |   code ]

Statistical inference for dynamical systems: A review. Statistical Surveys [  journal  |   arXiv ]

Contour Trees of Uncertain Terrains. ACM SIGSPATIAL Conference on Advances in Geographic Information Systems [  journal ]

Citizen Science as a New Tool in Dog Cognition Research. PLOS One [  journal ]

Probabilistic Fréchet means for time varying persistence diagrams. Electronic Journal of Statistics [  journal  |   arXiv ]

The Information Geometry of Mirror Descent. IEEE Transactions on Information Theory [  journal  |   arXiv ]

Consistency of maximum likelihood estimation for some dynamical systems. Annals of Statistics [  journal  |   arXiv ]

The topology of probability distributions on manifolds. Probability Theory and Related Fields [  journal  |   arXiv ]

Cumulon: Cloud-Based Statistical Analysis from Users Perspective. IEEE Bulletin on Data Engineering. [  journal ]

Core and region-enriched networks of behaviorally regulated genes and the singing genome. Science. [  journal ]

Persistent Homology Transform for Modeling Shapes and Surfaces. Information and Inference. [  journal  |   bioRxiv ]

A new fully automated approach for aligning and comparing shapes. The Anatomical Record. [  journal  |   website  |  code ]

Novel Distal eQTL Analysis Demonstrates Effect of Population Genetic Architecture on Detecting and Interpreting Associations. Genetics. [  journal  |   website ]

GSAASeqSP: A Toolset for Gene Set Association Analysis of RNA-Seq Data. Scientific Reports. [  journal  |   website ]

A Cheeger-Type Inequality on Simplicial Complexes. Advances in Applied Mathematics. [  journal  |   arxiv ]

Frechet Means for Distributions of Persistence Diagrams. Discrete and Computational Geometry. [  journal  |   arxiv ]

A Digital Network Approach to Infer Sex Behavior in Emerging HIV Epidemics. PLOS ONE. [  journal ]

Statistical Analysis of Crystallization Database Links Protein Physico-Chemical Features with Crystallization Mechanisms. PLOS ONE. [  journal ]

Distinct and Overlapping Sarcoma Subtypes Initiated from Muscle Stem and Progenitor Cells. Cell Reports. [  journal ]

Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation. Bioinformatics. [  journal ]

DNase-seq predicts regions of rotational nucleosome stability across diverse human cell types. Genome Research. [  journal ]

A comparative study of covariance selection models for the inference of gene regulatory networks. Journal of Medical Bioinformatics. [  journal ]

Sustained-input switches for transcription factors and microRNAs are central building blocks of eukaryotic gene circuits. Genome Biology. [  journal ]

Partial factor modeling: predictor-dependent shrinkage for linear regression. Journal of the American Statistical Association. [  journal  |   arxiv  |   code ]

Kernel Sliced Inverse Regression: Regularization and Consistency. Abstract and Applied Analysis. [  journal  |   code ]

Assessing the radiation response of lung cancer with different gene mutations using genetically engineered mice. Frontiers in Oncology. [  journal ]

Dissecting High-Dimensional Phenotypes with Bayesian Sparse Factor Analysis of Genetic Covariance Matrices. Genetics. [  journal  |   code ]

Genetics of gene expression responses to temperature stress in a sea urchin gene network. Molecular Ecology. [  journal ]

Predictive Framework for Integrating Disparate Genomic Data Types Using Sample-Specific Gene Set Enrichment Analysis and Multi-Task Learning. PLOS One. [  journal ]

Genetic effects on mating success and partner choice in a social mammal. American Naturalist. [  journal ]

Cyclin-Dependent Kinases Are Regulators and Effectors of Oscillations Driven by a Transcription Factor Network. Molecular Cell. [  journal ]

Homology Transfer and Stratification Learning. ACM-SIAM Symposium on Discrete Algorithms. [  journal ]

Probability measures on the space of persistence diagrams. Inverse Problems. [  journal ]

Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets. Genome Research. [  journal  |   website ]

RS-SNP: a random-set method for genome-wide association studies. BMC Genomics. [  journal ]

Discovering genetic variants in Crohn's disease by exploring genomic regions enriched of weak association signals. Digestive and Liver Disease. [  journal ]

Cross Species Genomic Analysis Identifies a Mouse Model as Undifferentiated Pleomorphic Sarcoma/Malignant Fibrous Histiocytoma. PLOS One. [  journal ]

Estimating variable structure and dependence in Multi-task learning via gradients. Machine Learning. [  journal  |   working paper ]

On the reproducibility of results of pathway analysis in genome-wide expression studies of colorectal cancers. Journal of Biomedical Informatics. [  journal ]

Localized Sliced Inverse Regression. Journal of Computational and Graphical Statistics. [  journal  |   working paper  |   code ]

Learning gradients: predictive models that infer geometry and dependence. Journal of Machine Learning Research. [  journal ]

Supervised Dimension Reduction Using Bayesian Mixture Modeling. International Conference on Artificial Intelligence and Statistics. [  journal  |   code ]

Learning Gradients and Feature Selection on Manifolds. Bernoulli. [  journal  |   arxiv ]

Evidence-ranked motif identification. Genome Biology. [  journal  |   website ]

Comparative study of gene set enrichment methods. BMC Bioinformatics. [  journal ]

Genomic features that predict allelic imbalance in humans suggest patterns of constraint on gene expression variation. Molecular Biology and Evolution. [  journal ]

Do serum biomarkers really measure breast cancer? BMC Cancer. [  journal ]

Characterizing the developmental pathways TTF-1, NKX2-8, and PAX9 in lung cancer. Proc. Natl. Acad. Sci. USA. [  journal ]

Local sliced inverse regression. Advances in Neural Information Processing Systems 21. [  journal  |   code ]

Modeling cancer progression via pathway dependencies. PLoS Comput Biol. [  journal ]

Gene Expression Programs of Human Smooth Muscle Cells: Tissue-Specific Differentiation and Prognostic Significance in Breast Cancers. PLoS Genetics. [  journal ]

Understanding the use of unlabelled data in predictive modelling. Statistical Science. [  journal  |   arxiv ]

Characterizing the Function Space for Bayesian Kernel Models. Journal of Machine Learning Research. [  journal ]

Genomic sweeping for hypermethylated genes. Bioinformatics. [  journal ]

Evidence of influence of genomic DNA sequence on human X chromosome inactivation. PLoS Comput Biol. [  journal ]

Analysis of Sample Set Enrichment Scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles. Bioinformatics. [  journal  |   website ]

Gene expression changes and molecular pathways mediating activity-dependent plasticity in visual cortex. Nat Neurosci. [  journal ]

Estimation of Gradients and Coordinate Covariation in Classification. Journal of Machine Learning Research. [  journal ]

Learning Coordinate Covariances via Gradients. Journal of Machine Learning Research. [  journal ]

Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization. Adv Comput Math. [  journal ]

Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-wide Expression Profiles. Proc Natl Acad Sci USA. [  journal  |   website ]

An oncogenic KRAS2 expression signature identified by cross-species gene-expression analysis. Nat Genet. [  journal  ]

Stability Results in Learning Theory. Anal App. [  journal ]

Permutation Tests for Classification. Proceedings of the Conference on Learning Theory. [  journal  |   working paper ]

Risk Bounds for Mixture Density Estimation. ESAIM: Probability and Statistics. [  journal ]

Androgen-Induced Differentiation and Tumorigenicity of Human Prostate Epithelial Cells. Cancer Research. 2004. [  journal ]

Learning Theory: general conditions for predictivity. Nature. 2004. [  journal ]

Estimating Dataset Size Requirements for Classifying DNA Microarray Data. J Comput Biol. 2003. [  journal ]

An Analytical Method for Multi-class Molecular Cancer Classification. SIAM Reviews. 2003. [  journal ]

Optimal gene expression analysis by microarrays. Cancer Cell. 2002. [  journal ]

Gene Expression-Based Classification and Outcome Prediction of Central Nervous System Embryonal Tumors. Nature. 2002. [  journal ]

Choosing Multiple Parameters for Support Vector Machines. Machine Learning. 2002. [  journal ]

A Uniform Approach to Molecular Cancer Diagnosis Using Tumor Gene Expression Signatures. Proc Natl Acad Sci U S A. 2001. [  journal ]

Molecular classification of multiple tumor types. Bioinformatics. 2001. [  journal ]

Bounds on sample size for policy evaluation in Markov environments. Proceedings of the Conference on Learning Theory. 2001. [  journal  |   arxiv ]

Feature Selection for SVMs. Advances in Neural Information Processing Systems. 2000. [  journal ]

Support Vector Method for Multivariate Density Estimation. Advances in Neural Information Processing Systems. 1999. [  journal ]