The latest updates, guides and stories from DataXight

Accelerating Training for Large-Scale Single-Cell Data at the NVIDIA Accelerate Omics Hackathon
7 mins read

Single-cell RNA sequencing is fueling a rapid expansion of large-scale omics data. Projects such as Tahoe 100M, which profiled 100 million cells across 50 cancer cell lines, are already opening new avenues for discovery, including predicting cellular responses to drug treatments. Not long after, the Billion Cells Project partnered with more than 10 leading laboratories and institutions across the United States, with a goal of generating nearly 500 million single-cell profiles in its first ye

Introducing PROTOplast: Scalable Machine Learning for Molecular Data Analysis
{News}
{scRNA-seq}
{PROTOplast}
3 mins read

We're excited to announce the early developer preview of PROTOplast, our new Python library designed for fast, scalable analysis of molecular data. PROTOplast addresses the unique challenges of working with large-scale molecular datasets while maintaining the flexibility needed for cutting-edge research. What is PROTOplast? PROTOplast is an open-source Python library, released under the Apache License 2.0, that bridges the gap between molecular data analysis and modern machine learning infrast

A Note on Parquet-based scRNA ML Pipelines
{Insight}
{scRNA-seq}
2 mins read

Single-cell RNA sequencing (scRNA-seq) is revolutionizing our understanding of cellular biology, but the computational challenges of processing these massive datasets continue to evolve. As datasets grow from thousands to millions of cells, the choice of data format and processing pipeline becomes critical. Parquet files, with their columnar storage and excellent compression ratios, seem like a natural fit for intermediate data storage in machine learning workflows. In a previous blog post, we



