Proteomic study of 2,002 tumors identifies 11 pan-cancer molecular subtypes spanning 14 cancer types

To facilitate gene-level queries of transcriptome sequencing data from more than 10,000 cancer patients and proteomics data from 2,000 patients, the researchers developed an easy-to-use cancer data analysis web portal called UALCAN.

A new study that analyzed protein levels in 2,002 primary tumors from 14 tissue-based cancer types identified 11 distinct molecular subtypes. The resulting systematic knowledge greatly expands a searchable online database that has become a go-to platform for cancer data analysis for users all over the world.

The University of ALabama at Birmingham CANcer data analysis portal, or UALCAN, was developed and released for public use in 2017 as an easy-to-use portal for comprehensive analysis of cancer data, including transcriptomic, epigenetic, and proteomic data. UALCAN has received nearly 920,000 visits from researchers in more than 100 countries and has been cited more than 2,750 times.

“UALCAN is an attempt to distribute comprehensive cancer data to researchers and clinicians in an easy-to-use format so they can make discoveries and find needles in the haystack,” said Sooryanarayana Varambally, Ph.D., professor in the UAB Department of Pathology's Division of Molecular and Cellular Pathology.

He is a cytopathologist and director of the Translational Oncology Research Program at UAB. "Cancer detection, diagnosis, treatment, and research require a global team effort, and the vast amount of data involved calls for ways to analyze and interpret it."

Cancer is a complex disease, and its initiation, progression, and metastasis to distant organs involve dynamic molecular changes in each type of cancer. Apart from some common genomic events, individual cancer patients also differ from one another.

In the new study, Varambally worked with longtime collaborator Chad Creighton, Ph.D., of Baylor College of Medicine in Houston, Texas. Creighton led the proteomic study, "Proteogenomic characterization of 2002 human cancers reveals pan-cancer molecular subtypes and associated pathways," published in Nature Communications. It builds on two earlier proteomic studies published in 2019 and 2021.

Previously, the team analyzed RNA transcripts to identify pathways that many forms of cancer use to aid their growth, spread, and aggressiveness, and made the data available to researchers through UALCAN. In this latest study, the team conducted a large-scale proteomic analysis and incorporated the results into UALCAN. The data and results provide new ideas for further research and possible therapeutic interventions.

A proteome is the complete set of proteins expressed in a cell or tissue, and it can be quantified thanks to recent technological advances in mass spectrometry. In cells, DNA is transcribed into mRNA, and mRNA is translated into protein, a flow of information known as the central dogma of molecular biology.

Proteins are major functional parts of cells and are essential in cell metabolism, structure, growth, signaling, and movement.

Cancers represented in the UALCAN proteomic data set include breast, colorectal, stomach, head and neck, liver, ovarian, pancreatic, prostate, renal, and uterine cancers, along with glioblastoma, lung adenocarcinoma, lung squamous cell carcinoma, and pediatric brain tumors.

The number of tumors per cancer type in the study ranged from 76 to 230, with an average of 143. Notably, the pan-cancer protein-based subtypes identified in the study cut across tumor lineages.

The combined proteomic data set was assembled from 17 individual studies. Matched multi-omics data were available for most of these tumors, including mRNA levels, small somatic DNA mutations and insertions/deletions, and somatic DNA copy number alterations.

Overall, the researchers found that the protein expression of genes across tumors broadly correlates with corresponding mRNA levels or copy number changes. However, there were some notable exceptions.
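The article does not say how the protein-mRNA agreement was measured. Purely as an illustration, here is a minimal Python sketch that assumes a per-gene Pearson correlation between protein and mRNA levels across the same set of tumors; the function names, gene names, and toy values are hypothetical and are not data or code from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


def per_gene_correlation(protein, mrna):
    """Correlate protein and mRNA levels gene by gene across tumors.

    Hypothetical input layout: each dict maps a gene name to a list of
    values, one per tumor, with tumors in the same order in both dicts.
    Genes missing from either dict are skipped.
    """
    return {
        gene: pearson(protein[gene], mrna[gene])
        for gene in protein
        if gene in mrna
    }


# Toy values for four tumors per gene (not data from the study).
protein_levels = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
mrna_levels = {"GENE_A": [2.0, 4.0, 6.0, 8.0], "GENE_B": [8.0, 6.0, 4.0, 2.0]}

correlations = per_gene_correlation(protein_levels, mrna_levels)
print(correlations)  # prints {'GENE_A': 1.0, 'GENE_B': -1.0}
```

In this framing, a gene whose protein level tracks its mRNA across tumors scores near 1, while exceptions like those the researchers noted would show weak or negative values.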

They identified 11 distinct pan-cancer protein-based subtypes, called s1 through s11, that could provide insight into the pathways and processes that are dysregulated in tumors and make them cancerous. Each subtype spanned several tissue-based cancer types, although subtype s11 was specific to brain tumors, encompassing glioblastomas and pediatric brain tumors.

Each subtype expressed specific gene classes, some of which had been seen in an earlier, less comprehensive proteomic study. Three subtypes showed novel gene classes: subtype s7 with 'axon guidance' and 'RNA splicing' genes, subtype s10 with 'DNA repair' and 'chromatin regulation' genes, and subtype s11 with 'synapse,' 'dendrite,' and 'axon' genes.

At the DNA level, the study found differences among the protein-based subtypes in overall gene copy number alterations, as well as somatic mutations associated with higher pathway activity in particular subtypes, as inferred from proteomic or transcriptomic data.

"The results of our study provide a framework for understanding the molecular landscape of cancers at the protein level, so that the data can be integrated and compared with other molecular correlates of cancers," Varambally said.

“The data sets and gene-level associations are a resource for the research community, helping to identify candidate genes for functional studies and to develop candidate prognostic markers and therapeutic targets for particular subsets of cancers.

“Furthermore, this study reinforces the notion that comprehensive surveys of cancers should be conducted at the protein level, even though expression profiling of tumors has historically been limited mostly to RNA transcripts.

“Many of the analyses are built into this cutting-edge cancer data analysis platform, which is constantly updated based on requests from users and experts, and the team is indebted for the support and encouragement of researchers who use the platform to make discoveries that make a difference in cancer research.”

Some of the large datasets in UALCAN are generated by consortia such as The Cancer Genome Atlas, or TCGA, and the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium, or CPTAC.

Precise targeting of cancer requires the identification of individual or subclass-specific genomic and molecular alterations. To help cancer researchers perform various analyses to better understand these large data sets, Darshan Shimoga Chandrashekar, Ph.D., led the development of the UALCAN portal under Varambally's supervision. Updates to this ever-evolving portal were recently published in Neoplasia.

The UALCAN initiative and its ongoing development include contributions from a team of experts, including bioinformaticians, computer scientists, statisticians, cancer biologists, pathologists, and oncologists. "It's a collaborative scientific approach to empowering the global cancer research community to fight cancer," Varambally said.

Support came from National Institutes of Health grants CA125123 and CA118948 and U.S. Department of Defense grant W81XWH-19-1-0588.

Co-authors of the study are Yiqun Zhang and Fengju Chen, Baylor College of Medicine, and Chandrashekar, UAB Department of Pathology, Division of Molecular and Cellular Pathology.

Pathology is a department in the Marnix E. Heersink School of Medicine at UAB. Varambally is a senior scientist at the O'Neal Comprehensive Cancer Center and the Informatics Institute at UAB, and he is co-director of the Cancer Biology theme in UAB's Graduate Biomedical Sciences program. He holds an adjunct position at the Michigan Center for Translational Pathology at the University of Michigan, Ann Arbor.


Materials are provided by the University of Alabama at Birmingham. Originally written by Jeff Hansen. Note: Content may be edited for style and length.


Yiqun Zhang, Fengju Chen, Darshan S. Chandrashekar, Sooryanarayana Varambally, Chad J. Creighton. Proteogenomic characterization of 2002 human cancers reveals pan-cancer molecular subtypes and associated pathways. Nature Communications, 2022; 13 (1) DOI: 10.1038/s41467-022-30342-3

