The latest updates, guides and stories from DataXight

Tahoe-100M in Practice: Workflows, Pitfalls, and Pathways to Scalable scRNA Analysis
9 mins read

Single-cell transcriptomics (scRNA) studies now profile millions of cells, revealing identity, state, and tissue heterogeneity, and create unprecedented opportunities to extract biological insights that would be invisible in smaller studies. Tahoe-100M, a groundbreaking resource hosted by Arc Institute that contains 100 million cells covering 379 distinct drugs and 50 cancer cell lines, is one such study. However, at Tahoe-100M scale, even routine queries pose significant computational ch

Reproducible Proteomics Pipelines Using Galaxy
{Insight}
7 mins read

The Clinical Data Analysis Pipelines (CDAP), originally developed by the NIH Office of Cancer Clinical Proteomics Research (OCCPR), formerly the Clinical Proteomic Tumor Analysis Consortium (CPTAC), and now hosted by the NIH Proteomic Data Commons (PDC), standardize proteomics data processing to reduce variability and enable cross-dataset comparisons. Public dissemination of these Galaxy workflows on GitHub is part of the NIH's support of FAIR data principles. While these pipelines represent a promi

Processing 500,000 whole genome sequences to identify rare genetic disease variants
{Insight}
4 mins read

The Challenge: Big Data Meets Complex Biology. Developing a scalable solution for rare disease screening at population scale presents a perfect storm of computational and biological complexity. The numbers can be staggering: over a billion variant alleles distributed across hundreds of thousands of genomic shards, often totaling tens of terabytes of data. But beyond the computational demands lies an even greater hurdle: the nuanced molecular complexity of rare genetic diseases themselves. Unlik

Looking at batch effect through scRNA-seq data
{batch effect}
{scRNA-seq}
10 mins read

Single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to explore complex biological systems, but batch effects remain a significant challenge to accurate insights. In this first part of our series, we will delve into what batch effects are and how they can hinder discoveries, illustrated with data visualizations and analysis. Future posts will explore strategies for correcting and visualizing batch effects to ensure reliable and reproducible results across multiomic data.



