Tensor-based approaches for omics data analysis: applications, challenges, and future directions

Journal Paper

Amirhamzeh Khoshnam, Daniel Chafamo, Neriman Tokcan

Khoshnam, A., Chafamo, D., & Tokcan, N. (2025). Tensor-Based Approaches for Omics Data Analysis: Applications, Challenges, and Future Directions. La Matematica, 1-35.

Publication year: 2025

Omics technologies, including genomics, transcriptomics, proteomics, and metabolomics, have revolutionized biological research by enabling comprehensive, high-throughput analysis of molecular components within cells and organisms. The resulting high-dimensional datasets pose significant analytical challenges, particularly in integrating diverse data types and uncovering complex biological relationships. Tensor-based approaches have emerged as powerful tools for analyzing these high-dimensional omics datasets, offering advantages over traditional matrix-based methods in capturing complex, multi-way relationships. This review provides an overview of tensor decomposition techniques and their applications in omics data analysis, with a focus on multi-sample gene expression data, multi-omics integration, data imputation, and inference of cell-cell interactions from single-cell RNA sequencing. We discuss how tensors can naturally represent multidimensional omics datasets and how tensor factorization methods enable dimensionality reduction while preserving important structural information. A comprehensive biological background and an overview of relevant public databases and resources are provided to contextualize the computational methods. Case studies are presented to illustrate the application of tensor methods for tasks such as identifying gene expression modules, integrating multiple types of omics data, imputing missing values, and uncovering ligand-receptor interaction patterns. We highlight how tensor approaches can reveal higher-order interactions and context-dependent relationships that may be missed by traditional analyses. Challenges and future directions for tensor-based omics data analysis are also discussed, emphasizing the potential of these methods to extract meaningful biological insights from complex, heterogeneous datasets and advance our understanding of biological systems.

Tensor decompositions for signal processing: theory, advances, and applications

Journal Paper

Neriman Tokcan, Shakir Showkat Sofi, Clémence Prévost, Sofiane Kharbech, Baptiste Magnier, Thanh Phuong Nguyen, Yassine Zniyed, Lieven De Lathauwer

Tokcan, Neriman, et al. "Tensor decompositions for signal processing: Theory, advances, and applications." Signal Processing (2025): 110191.

Publication year: 2025

In the era of big data, rapid advancements in technology and data collection methods have led to the generation and accessibility of vast amounts of multi-modal, high-dimensional data across a diverse range of disciplines. Tensor methods have emerged as essential tools in signal processing, providing powerful frameworks to model and analyze such complex data effectively. This survey offers a comprehensive overview of tensor factorization techniques and their applications in key areas. We explore their role in remote sensing, focusing on tensor-based methods for analyzing hyperspectral and multispectral images, tackling challenges such as recovering super-resolution images and addressing spectral unmixing. In wireless communication, we examine tensor methods used for signal modulation in unsourced massive random access communication, which achieve strong performance in multi-antenna channel and signal modeling. We also discuss tensor applications in network compression, where they reduce the computational demands of deep neural networks, making them more feasible for edge devices. Additionally, we highlight the use of tensor methods in high-dimensional missing data completion problems, showcasing their versatility across various domains. Furthermore, we explore applications in image analysis and computer vision, where tensors are effectively utilized for motion and object tracking, 3D modeling, satellite image analysis, and medical imaging. By bridging theoretical advancements with practical applications, this survey aims to guide researchers in leveraging tensor methods to tackle emerging challenges in signal processing.

Genome-scale spatial mapping of the Hodgkin lymphoma microenvironment identifies tumor cell survival factors

Journal Paper

Vignesh Shanmugam*, Neriman Tokcan*, Daniel Chafamo, Sean Sullivan, Mehdi Borji, Haley Martin, Gail Newton, Naeem Nadaf, Saoirse Hanbury, Irving Barrera, Dylan Cable, Jackson Weir, Orr Ashenberg, Geraldine Pinkus, Scott Rodig, Caroline Uhler, Evan Macosko, Margaret Shipp, Abner Louissaint Jr, Fei Chen, Todd Golub

Nature Communications 17, 838 (2026).

Publication year: 2025

A key challenge in cancer research is to identify the secreted factors that contribute to tumor cell survival. Nowhere is this more evident than in Hodgkin lymphoma, where malignant Hodgkin Reed Sternberg (HRS) cells comprise only 1-5% of the tumor mass, the remainder being infiltrating immune cells that presumably are required for the survival of the HRS cells. Until now, there has been no way to characterize the complex Hodgkin lymphoma tumor microenvironment at genome scale. Here, we performed genome-wide transcriptional profiling with spatial and single-cell resolution. We show that the neighborhood surrounding HRS cells forms a distinct niche involving 31 immune and stromal cell types and is enriched in CD4+ T cells, myeloid and follicular dendritic cells, while being depleted of plasma cells. Moreover, we used machine learning to nominate ligand-receptor pairs enriched in the HRS cell niche. Specifically, we identified IL13 as a candidate survival factor. In support of this hypothesis, recombinant IL13 augmented the proliferation of HRS cells in vitro. In addition, genome-wide CRISPR/Cas9 loss-of-function studies across more than 1,000 human cancer cell lines showed that IL4R and IL13RA1, the heterodimeric partners that constitute the IL13 receptor, were uniquely required for the survival of HRS cells. Moreover, monoclonal antibodies targeting either IL4R or IL13R phenocopied the genetic loss of function studies. IL13-targeting antibodies are already FDA-approved for atopic dermatitis, suggesting that clinical trials testing such agents should be explored in patients with Hodgkin lymphoma.

Genome-Scale High-Resolution Spatial Mapping of the Pro-Tumorigenic Cellular Niche in Classic Hodgkin Lymphoma

Journal Paper

Vignesh Shanmugam*, Neriman Tokcan*, Daniel Chafamo, Sean Sullivan, Haley Martin, Gail A Newton, Mehdi Borji, Naeem Nadaf, Irving Barrera, Dylan Cable, Jackson Weir, Orr Ashenberg, Caroline Uhler, Geraldine Pinkus, Scott Rodig, Margaret A Shipp, Evan Macosko, Abner Louissaint, Fei Chen, Todd R Golub

Publication year: 2024

A fundamental hallmark of cancer is that tumor cells repurpose the tissue microenvironment to promote their own survival. An increased understanding of these mechanisms may lead to improved microenvironment-directed therapies, particularly in lymphoid malignancies. In classic Hodgkin lymphoma (cHL), the rare malignant Hodgkin Reed Sternberg (HRS) cells are surrounded by a CD4+ T-cell and macrophage-rich inflammatory infiltrate. Recent multiplexed immunofluorescence studies suggest that the micron-scale niche around HRS cells is composed of distinct populations of PD-L1+ macrophages and CD4+ T cells, including regulatory CTLA4+ and LAG3+ subsets (Carey et al. Blood 2017, Patel et al. Blood 2019 and Aoki et al. Cancer Discov 2020). However, the topography of the intact tumor microenvironment of cHL requires further definition. Recent single-cell RNA sequencing studies have led to important insights into the biology of cHL; however, they do not adequately capture myeloid cells, fibroblasts, and HRS cells, likely due to the relative fragility of these cells in conventional tissue dissociation protocols. In this study, we use tandem single-nucleus and spatially resolved RNA sequencing to systematically dissect the pro-tumorigenic cellular niche of cHL to define potentially targetable microenvironmental dependencies.

C-ziptf: stable tensor factorization for zero-inflated multi-dimensional genomics data

Journal Paper

Daniel Chafamo, Vignesh Shanmugam, Neriman Tokcan

Publication year: 2024

In the past two decades, genomics has advanced significantly, with single-cell RNA-sequencing (scRNA-seq) marking a pivotal milestone. ScRNA-seq provides unparalleled insights into cellular diversity and has spurred diverse studies across multiple conditions and samples, resulting in an influx of complex multidimensional genomics data. This highlights the need for robust methodologies capable of handling the complexity and multidimensionality of such genomics data. Furthermore, single-cell data grapples with sparsity due to issues like low capture efficiency and dropout effects. Tensor factorizations (TF) have emerged as powerful tools to unravel the complex patterns from multi-dimensional genomics data. Classic TF methods, based on maximum likelihood estimation, struggle with zero-inflated count data, while the inherent stochasticity in TFs further complicates result interpretation and reproducibility. Our paper introduces Zero Inflated Poisson Tensor Factorization (ZIPTF), a novel method for high-dimensional zero-inflated count data factorization. We also present Consensus-ZIPTF (C-ZIPTF), merging ZIPTF with a consensus-based approach to address stochasticity. We evaluate our proposed methods on synthetic zero-inflated count data, simulated scRNA-seq data, and real multi-sample multi-condition scRNA-seq datasets. ZIPTF consistently outperforms baseline matrix and tensor factorization methods, displaying enhanced reconstruction accuracy for zero-inflated data. When dealing with high probabilities of excess zeros, ZIPTF achieves up to $2.4 \times$ better accuracy. Moreover, C-ZIPTF notably enhances the factorization’s consistency. When tested on synthetic and real scRNA-seq data, ZIPTF and C-ZIPTF consistently uncover known and biologically meaningful gene expression programs. Access our data and code at: https://github.com/klarman-cell-observatory/scBTF and https://github.com/klarman-cell-observatory/scbtf_experiments.
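The zero-inflation idea in the abstract above can be made concrete with the standard zero-inflated Poisson (ZIP) distribution, in which a count is forced to zero with probability `pi` and is otherwise Poisson with rate `lam`. The sketch below is a generic textbook ZIP log-probability, not the authors' ZIPTF model or code; the function name `zip_log_pmf` and its parameters are hypothetical and chosen only for illustration, and `lam > 0` and `0 <= pi < 1` are assumed.

```python
import math
import numpy as np

def zip_log_pmf(k, lam, pi):
    """Log-probability of counts k under a zero-inflated Poisson.

    Assumptions (illustrative only): lam > 0 is the Poisson rate and
    0 <= pi < 1 is the probability of an excess (structural) zero.
    """
    k = np.asarray(k, dtype=float)
    # log(k!) computed via the log-gamma function
    log_fact = np.vectorize(math.lgamma, otypes=[float])(k + 1.0)
    # Ordinary Poisson log-probability
    log_pois = k * np.log(lam) - lam - log_fact
    # A zero can come from the inflation component or from the Poisson itself
    log_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))
    # Positive counts can only come from the Poisson component
    return np.where(k == 0, log_zero, np.log1p(-pi) + log_pois)
```

With `pi = 0` this reduces to the ordinary Poisson log-probability, which is one quick way to sanity-check the function.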

Multimodal Tensor-Based Method for Integrative and Continuous Patient Monitoring During Postoperative Cardiac Care

Journal Paper

Larry Hernandez, Renaid Kim, Neriman Tokcan, Harm Derksen, Ben Biesterveld, Alfred Croteau, Aaron Aaron, Michael Mathis, Kayvan Najarian, Jonathan Gryak

Submitted to Artificial Intelligence in Medicine

Publication year: 2021

Patients recovering from cardiovascular surgeries may develop life-threatening complications such as hemodynamic decompensation, making the monitoring of patients for such complications an essential component of postoperative care. However, this need has given rise to an inexorable increase in the number and modalities of data points collected, making the data challenging to analyze effectively in real time. While many algorithms exist to assist in monitoring these patients, they often lack accuracy and specificity, leading to alarm fatigue among healthcare practitioners. In this study, we propose a multimodal approach that incorporates salient physiological signals and EHR data to predict the onset of hemodynamic decompensation. A retrospective dataset of patients recovering from cardiac surgery was created and used to train predictive models. Advanced signal processing techniques were employed to extract complex features from physiological waveforms, while a novel tensor-based dimensionality reduction method was used to reduce the size of the feature space. These methods were evaluated for predicting the onset of decompensation at varying time intervals, ranging from a half-hour to 12 hours prior to a decompensation event. The best performing models achieved AUCs of 0.87 and 0.80 for the half-hour and 12-hour intervals, respectively. These analyses show that a multimodal approach can be used to develop clinical decision support systems that predict adverse events several hours in advance.

Deep learning and alignment of spatially-resolved whole transcriptomes of single cells in the mouse brain with Tangram

Journal Paper

T. Biancalani, G. Scalia, L. Buffoni, R. Avasthi, Z. Lu, A. Sanger, N. Tokcan, C. R. Vanderburg, A. Segerstolpe, M. Zhang, I. Avraham-Davidi, S. Vickovic, M. Nitzan, S. Ma, J. Buenrostro, N. B. Brown, D. Fanelli, X. Zhuang, E. Z. Macosko and A. Regev.

Publication year: 2021

Charting a biological atlas of an organ, such as the brain, requires us to spatially-resolve whole transcriptomes of single cells, and to relate such cellular features to the histological and anatomical scales. Single-cell and single-nucleus RNA-Seq (sc/snRNA-seq) can map cells comprehensively [5,6], but relating those to their histological and anatomical positions in the context of an organ’s common coordinate framework remains a major challenge and barrier to the construction of a cell atlas [7–10]. Conversely, Spatial Transcriptomics allows for in-situ measurements [11–13] at the histological level, but at lower spatial resolution and with limited sensitivity. Targeted in situ technologies [1–3] solve both issues, but are limited in gene throughput, which impedes profiling of the entire transcriptome. Finally, as samples are collected for profiling, their registration to anatomical atlases often requires human supervision, which is a major obstacle to building pipelines at scale. Here, we demonstrate spatial mapping of cells, histology, and anatomy in the somatomotor area and the visual area of the healthy adult mouse brain. We devise Tangram, a method that aligns snRNA-seq data to various forms of spatial data collected from the same brain region, including MERFISH [1], STARmap [2], smFISH [3], and Spatial Transcriptomics [4] (Visium), as well as histological images and public atlases. Tangram can map any type of sc/snRNA-seq data, including multi-modal data such as SHARE-seq data [5], which we used to reveal spatial patterns of chromatin accessibility. We equipped Tangram with a deep learning computer vision pipeline, which allows for automatic identification of anatomical annotations on histological images of mouse brain. By doing so, Tangram reconstructs a genome-wide, anatomically-integrated, spatial map of the visual and somatomotor area with ~30,000 genes at single-cell resolution, revealing spatial gene expression and chromatin accessibility patterning beyond the current limitations of in-situ technologies.

Algebraic Methods for Tensor Data

Journal Paper

Neriman Tokcan, Jonathan Gryak, Kayvan Najarian, and Harm Derksen

Accepted for publication - SIAM Journal on Applied Algebra and Geometry

Publication year: 2021

We develop algebraic methods for computations with tensor data. We give three applications: extracting features that are invariant under the orthogonal symmetries in each of the modes, approximation of the tensor spectral norm, and amplification of low rank tensor structure. We introduce colored Brauer diagrams, which are used for algebraic computations and in analyzing their computational complexity. We present numerical experiments whose results show that the performance of the alternating least squares algorithm for the low rank approximation of tensors can be improved using tensor amplification.
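For readers unfamiliar with the alternating least squares (ALS) algorithm named in the abstract above, the following is a minimal NumPy sketch of generic CP-ALS for a third-order tensor. It is a textbook version under stated assumptions, not the authors' implementation, and it does not include the tensor amplification step; the names `khatri_rao`, `unfold`, `cp_als`, and `reconstruct`, and the defaults `n_iter` and `seed`, are hypothetical choices for illustration.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), shape (I*J x R)."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)

def unfold(t, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest in C order."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def reconstruct(factors):
    """Rebuild the tensor sum_r a_r (outer) b_r (outer) c_r from its factors."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def cp_als(t, rank, n_iter=200, seed=0):
    """Rank-`rank` CP approximation of a 3-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in t.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Fix the two other factors and solve a linear least-squares problem
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            # Gram matrix of the Khatri-Rao product, as a Hadamard product
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(t, mode) @ kr @ np.linalg.pinv(gram)
    return factors
```

Each inner update is an exact least-squares solve, so the reconstruction error cannot increase from one update to the next; it can, however, stall for many sweeps, which is the kind of behavior that preprocessing steps aim to improve.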

Newton polytopes in algebraic combinatorics

Journal Paper

Cara Monical, Neriman Tokcan, Alexander Yong

Selecta Mathematica (N.S.) 25 (2019), no. 5

Publication year: 2019

A polynomial has saturated Newton polytope (SNP) if every lattice point of the convex hull of its exponent vectors corresponds to a monomial. We compile instances of SNP in algebraic combinatorics (some with proofs, others conjecturally): skew Schur polynomials; symmetric polynomials associated to reduced words, Redfield–Pólya theory, Witt vectors, and totally nonnegative matrices; resultants; discriminants (up to quartics); Macdonald polynomials; key polynomials; Demazure atoms; Schubert polynomials; and Grothendieck polynomials, among others. Our principal construction is the Schubitope. For any subset of $[n]^{2}$, we describe it by linear inequalities. This generalized permutahedron conjecturally has positive Ehrhart polynomial. We conjecture it describes the Newton polytope of Schubert and key polynomials. We also define dominance order on permutations and study its poset-theoretic properties.
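As a small illustration of the SNP definition (an example chosen here, not taken from the paper): the polynomial $f = x^2 + y^2$ has exponent vectors $(2,0)$ and $(0,2)$, whose convex hull is a segment that also contains the lattice point $(1,1)$. Since the monomial $xy$ does not appear in $f$, $f$ does not have SNP. By contrast, $g = x^2 + xy + y^2$ has exponent vectors $(2,0)$, $(1,1)$, $(0,2)$, which are exactly the lattice points of the same segment, so $g$ has SNP.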

On the Waring rank of binary forms

Journal Paper

Neriman Tokcan

Linear Algebra and Its Applications 524 (2017), 250–262

Publication year: 2017

The K-rank of a binary form f in K[x,y], K⊆ℂ, is the smallest number of d-th powers of linear forms over K of which f is a K-linear combination. We provide lower bounds for the ℂ-rank (Waring rank) and for the ℝ-rank (real Waring rank) of binary forms depending on their factorization. We completely classify binary forms of Waring rank 3.
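Two small worked examples may help fix the definition (they are standard illustrations chosen here, not results quoted from the paper). For the quadratic form $xy$,

$$xy = \tfrac{1}{4}\left[(x+y)^2 - (x-y)^2\right],$$

so $xy$ is a combination of two squares of linear forms; it is not a scalar multiple of a single square, so its rank is 2 over both ℝ and ℂ. For the cubic form $x^2y$,

$$6x^2y = (x+y)^3 - (x-y)^3 - 2y^3,$$

which exhibits $x^2y$ as a combination of three cubes. By the classical fact that $x^{d-1}y$ has Waring rank $d$, no shorter expression exists, so $x^2y$ is an example of a binary form of Waring rank 3.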

Binary forms with three different relative ranks

Journal Paper

Bruce Reznick and Neriman Tokcan

Proceedings of the American Mathematical Society, 145 (2017), 5169-5177

Publication year: 2017

Suppose f(x,y) is a binary form of degree d with coefficients in a field K⊆ℂ. The K-rank of f is the smallest number of d-th powers of linear forms over K of which f is a K-linear combination. We prove that for d≥5, there always exists a form of degree d with at least three different ranks over various fields. The K-rank of a form f (such as $x^3y^2$) may depend on whether -1 is a sum of two squares in K.

Neriman Tokcan

Broad Institute of MIT and Harvard
