The ACRF International Centre for the Proteome of Human Cancer
ProCan is a world-first initiative developed by Professors Roger Reddel and Phil Robinson, which launched in September 2016.
“ProCan will not only advance basic research into new and better cancer treatments but soon it will also help doctors rapidly choose the best existing treatment for their patients.”
Ian Brown, CEO of ACRF
During the seven-year project, scientists at CMRI are analysing tens of thousands of examples of all types of cancer from all over the world to develop a library of information to advance scientific discovery and enhance clinical treatment worldwide.
This database will mean doctors can effectively narrow down the best type of currently available treatment to target a cancer patient’s individual diagnosis, without having to waste time trialling medications that won’t effectively treat the disease.
The Centre is acting in partnership with cancer researchers, clinicians, tumour banks, and technology experts, such as Professor Ruedi Aebersold in Zurich, whose 2015 Nature paper acted as the 'proof of concept' for us to undertake the ProCan project on a much larger scale.
PCT-SWATH mass spectrometry technology is being used to rapidly and simultaneously measure the precise levels of many thousands of proteins in very small cancer biopsies. Working with leading cancer researchers throughout Australia and around the world, the Centre is analysing samples of all types of cancer, starting with cancers of childhood.
Advanced data science and software engineering will be used to compare the protein data with the de-identified information that is already available for each cancer, including clinical records such as pathology test results, genetic analyses, genome sequencing, and any previous responses to cancer treatment. This proteogenomics approach to understanding cancer is crucial to speeding up the search for cures. Towards this end, ProCan signed an MOU with the United States' National Cancer Institute as part of former Vice President Joe Biden's Cancer Moonshot Initiative.
ProCan® is the abbreviated name of The Australian Cancer Research Foundation International Centre for the Proteome of Human Cancer. We are a major research program aiming to document the proteome of human cancer through Liquid Chromatography / tandem Mass Spectrometry analysis of cancer tissues. The goal is to achieve improvements in diagnostic and prognostic evaluation that will guide the management of cancer patients in the future. ProCan comprises teams in Oncology, Pathology, Proteomics, Data Science and Software Engineering as well as Operations.
What is Proteomics?
The proteome refers to the entire complement of proteins expressed in specific cells or tissues. Proteomics is a high-throughput method that can detect thousands of proteins in a single sample and their relative quantities. The field of proteomics is comparable to genomics or transcriptomics techniques for analysing DNA and RNA respectively, and is catching up as reliable proteomic analysis methods become available. A feature of ’omics methods is that they generate very large multi-dimensional datasets that can be searched using powerful data science approaches to identify informative patterns or signatures. More proteomics information about different cancer types could change the way we treat cancer patients.
What Will ProCan Achieve?
The cancer tissue proteome could guide individualised therapy. The essential cause of cancer is damage to DNA and mutations within the genome. However, cancer cell growth is driven by the resulting changes to the proteome of both cancer cells and normal surrounding cell types, for example those in blood vessels. Most anti-cancer agents act by interacting with proteins; therefore, a detailed knowledge of the cancer tissue proteome has the potential to assist in the selection of current cancer treatments and the development of new targeted treatments, both exciting concepts that ProCan is very interested in pursuing.
ProCan Teams Working Together
The Oncology, Pathology, Proteomics and Data Science teams work together to source, prepare, process and analyse cancer samples. They are supported by ProCan Operations and Software Engineering. Together, they help answer high value research questions and curate the collective database to draw insights into the proteomic landscape of cancer.
The ProCan Oncology group is responsible for sample acquisition, a process involving the identification of suitable collaborative partnerships, liaison with collaborators to assess collaboration opportunities, assessment of cancer sample sets that may be available for study, and the honing of high value research questions which vary based on the tumour type and the (de-identified) clinical and ‘omic data available.
This team helps to co-ordinate completion of ethics approvals, Material Transfer Agreements (MTAs), and Data Transfer Agreements (DTAs). They also complete a study design and monitor the scheduling and physical transport of the precious materials. Their role extends through the lifecycle from processing and analysis of each cohort of samples to communicating proteomics results back to the collaborator. They also participate in downstream study design if further analysis or data synthesis with other cohorts is planned.
The Pathology group is responsible for sample reception and storage processes, tumour sectioning, and preparation for proteomic processing and analysis. On arrival of samples, they attend to sample reception and logging into the ProCan Laboratory Information Management System (LIMS). Before samples are processed by the Proteomics teams, all studies require a careful design to ensure that we will have all of the information necessary to correct for any potential batch effects. The Cancer Data Science team work with the other teams on batch design and sample randomisation to ensure the highest scientific rigour and that the data will be readily interpretable.
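As a rough illustration of what batch design with sample randomisation can look like, the sketch below spreads each sample group evenly across batches and then shuffles the run order inside every batch. It is a minimal example under stated assumptions, not ProCan's actual procedure: the function name `assign_batches`, the sample identifiers, the "tumour"/"normal" groups and the batch size are all hypothetical.

```python
import random
from collections import Counter

def assign_batches(samples, batch_size, seed=0):
    """Spread each group evenly over batches, then randomise run order.

    `samples` is a list of (sample_id, group) pairs. All names here are
    illustrative assumptions, not part of any real ProCan system.
    """
    rng = random.Random(seed)                   # fixed seed keeps the design reproducible
    n_batches = -(-len(samples) // batch_size)  # ceiling division
    by_group = {}
    for sample_id, group in samples:
        by_group.setdefault(group, []).append(sample_id)

    batches = [[] for _ in range(n_batches)]
    slot = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)                        # random choice of which sample goes where
        for sample_id in ids:
            batches[slot % n_batches].append((sample_id, group))  # round-robin deal
            slot += 1
    for batch in batches:
        rng.shuffle(batch)                      # random run order within the batch
    return batches

# Twelve made-up samples, alternating between two hypothetical groups.
samples = [(f"S{i:02d}", "tumour" if i % 2 else "normal") for i in range(12)]
batches = assign_batches(samples, batch_size=4)
for number, batch in enumerate(batches, start=1):
    print(number, dict(Counter(group for _, group in batch)))
```

Balancing groups across batches in this way means a batch effect is less likely to be mistaken for a biological difference, which is the reason the design step comes before any processing.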
The Pathology team are caretakers of the physical samples, carefully labelling and storing them until scheduling determines that ProCan’s Proteomics team will soon be ready for them. Samples may be prepared onsite at ProCan or in collaborator laboratories. In general, preparation involves collection of a thin (ten-micron) section of each sample into a small tube ready for the Proteomics team. A matched section from each sample is placed on a slide and stained for digital scanning and pathology review. This step allows visualisation of tissue composition and comparison of the histopathological features of each sample with proteomic profiles. This is important because, during cell lysis, the first step of the proteomic mass spec workflow, the tissue structure is totally destroyed.
Tissue samples are processed using physical disruption, heat, pressure, enzymatic digestion, and treatment with chemicals. This lyses cells, disrupting cell membranes and breaking up the bonds that give proteins their structure. Reduction removes the last of the chemical bonds that give proteins their structure, and alkylation caps the end of these bonds so they don’t re-form. Digestion of samples using a protease cuts the protein chains at specific points to release peptides. The peptide mixtures are cleaned up on cation exchange columns to remove other substances, prior to liquid chromatography (LC) and analysis on a mass spectrometer.
The Proteomics team maintain a facility of high-performance equipment, featuring barocyclers and six mass spectrometers. They schedule and run sample cohorts in batches. The mass spectrometers analyse peptides on the basis of their mass-to-charge (m/z) ratio after ionisation, and their fragmentation pattern. They can simultaneously measure tens of thousands of peptides, representing thousands of proteins, in a single sample. The end result of the process is that a tissue is "converted" to a raw data file which is a permanent digital record of the peptides it contained. The Proteomics team tracks each step of this process in a Laboratory Information Management System (LIMS).
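To make the mass-to-charge ratio concrete, the short sketch below computes m/z for a peptide ion. It assumes positive-mode ionisation in which each unit of charge comes from one added proton; that assumption, the function name `mass_to_charge` and the example mass of 1000 Da are illustrative and not taken from ProCan's workflow.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons

def mass_to_charge(neutral_mass_da, charge):
    """Return m/z for a peptide carrying `charge` added protons (assumed model)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge

# The same hypothetical 1000 Da peptide appears at a different m/z
# for each charge state it can take on.
for z in (1, 2, 3):
    print(z, round(mass_to_charge(1000.0, z), 4))
```

Because one peptide can show up at several m/z values, the fragmentation pattern mentioned above is needed alongside m/z to tell peptides apart.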
ProCan’s Software Engineering team have developed and maintain the IT infrastructure which enables and underpins ProCan’s outcomes.
For each injection into the mass spec, around 1 gigabyte (GB) of data is generated; at full throughput this gives 100 GB of raw data to manage, curate, and process per day. By the end of the program, ProCan will have more than 200 terabytes (TB) of data just from the mass specs (in bytes, that’s a 2 with 14 zeroes after it). Even so, this is significantly less data than is held in the digital images our Pathology team is capturing, each of which can be up to 10 GB.
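The figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and treats the rounded numbers in the text as exact, so the results are order-of-magnitude estimates only.

```python
GB = 10**9   # decimal gigabyte, an assumption about the units used in the text
TB = 10**12  # decimal terabyte

bytes_per_injection = 1 * GB   # "around 1 GB" per injection
raw_bytes_per_day = 100 * GB   # "100 GB of raw data ... per day"
total_raw_bytes = 200 * TB     # "more than 200 TB" by the end of the program

injections_per_day = raw_bytes_per_day // bytes_per_injection
total_injections = total_raw_bytes // bytes_per_injection
days_at_full_throughput = total_raw_bytes // raw_bytes_per_day
zeroes_after_the_two = len(str(total_raw_bytes)) - 1

print(injections_per_day)       # injections per day at full throughput
print(total_injections)         # injections needed to reach 200 TB
print(days_at_full_throughput)  # days of full-throughput running for 200 TB
print(zeroes_after_the_two)     # zeroes after the leading 2 when written in bytes
```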
ProCan relies on a large computational infrastructure, ranging from a 500-core high-performance computer with 750 TB of storage to an auto-scaling cluster that runs on Amazon’s cloud platform. It takes in the order of an hour to process each raw data file from the mass specs through a pipeline of machine-learning algorithms. At the end of the pipeline, a data matrix is produced that lists the abundance of each peptide for each injection in a given study. Furthermore, the computational infrastructure needs to ensure that metadata is captured through every step along the way, whether it is the buffer used during sample preparation, the mass spec instrument operator, time of day, or pipeline software version. This metadata is crucial to ensuring data is reproducible and analyses are repeatable, but it is also important for the normalisation and batch correction that must be performed in the next stage of analysis.
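The idea of a peptide-by-injection matrix with metadata kept alongside it can be sketched as follows. This is a minimal illustration, not ProCan's pipeline: the record layout, the function name `build_abundance_matrix`, the peptide names, the abundance values and the metadata values are all made up for the example.

```python
def build_abundance_matrix(injections):
    """Return (peptides, injection_ids, matrix, metadata) for one study.

    matrix[i][j] is the abundance of peptides[i] in injection_ids[j],
    or None when that peptide was not reported for that injection.
    """
    peptides = sorted({p for inj in injections for p in inj["abundances"]})
    injection_ids = [inj["injection_id"] for inj in injections]
    matrix = [[inj["abundances"].get(p) for inj in injections] for p in peptides]
    # Metadata is stored per injection so later steps can use it.
    metadata = {inj["injection_id"]: inj["metadata"] for inj in injections}
    return peptides, injection_ids, matrix, metadata

# Illustrative records only: every name and value below is hypothetical.
injections = [
    {"injection_id": "inj-001",
     "abundances": {"PEPTIDE_A": 1200.0, "PEPTIDE_B": 300.0},
     "metadata": {"buffer": "buffer-1", "operator": "op-1", "pipeline_version": "1.0"}},
    {"injection_id": "inj-002",
     "abundances": {"PEPTIDE_A": 900.0, "PEPTIDE_C": 50.0},
     "metadata": {"buffer": "buffer-1", "operator": "op-2", "pipeline_version": "1.0"}},
]

peptides, injection_ids, matrix, metadata = build_abundance_matrix(injections)
for peptide, row in zip(peptides, matrix):
    print(peptide, row)
```

Keeping the metadata keyed by injection is what lets a later analysis group injections by operator, buffer or software version when looking for batch effects.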
The Data Science team are then responsible for analysing the data produced from the mass spectrometers. They work with the Software Engineering team to process raw data files, producing a dataset of values describing the intensity of every peptide measured in every sample in a given study. They then normalise the data, relying on our careful experimental design to correct for batch effects. This means that no matter when a sample was received and processed, the results from different batches will be comparable.
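One very simple way to make batches comparable is to log-transform intensities and shift each batch so that its median matches the overall median. The sketch below shows that idea for a single peptide. It is a stand-in chosen for illustration, not the method ProCan uses; the function name `median_centre_by_batch` and the intensity values are hypothetical.

```python
import math
from statistics import median

def median_centre_by_batch(log_values, batch_labels):
    """Shift each batch so its median equals the overall median."""
    overall = median(log_values)
    batch_medians = {
        batch: median(v for v, label in zip(log_values, batch_labels) if label == batch)
        for batch in set(batch_labels)
    }
    return [v - batch_medians[b] + overall for v, b in zip(log_values, batch_labels)]

# Made-up intensities for one peptide: batch B reads four times higher than batch A.
raw = [128.0, 256.0, 512.0, 512.0, 1024.0, 2048.0]
labels = ["A", "A", "A", "B", "B", "B"]

log_values = [math.log2(x) for x in raw]
corrected = median_centre_by_batch(log_values, labels)
print(corrected)
```

This kind of correction is only safe when each batch contains a similar mix of samples; otherwise real biological differences would be subtracted along with the technical ones.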
Finally, the data is ready to answer important research questions. However, to answer each research question, they need to incorporate other types of data. This includes clinical data, such as patient outcome or response to certain cancer drugs. They often use biological datasets, such as genomics (informing us of mutations in the DNA of each patient) or transcriptomics (providing expression levels for each measured gene). The Data Scientists use a variety of techniques to integrate these data. For example, they may use machine learning algorithms to investigate how well the combination of genomic and proteomic data can predict whether a patient is likely to respond to drug X, or to survive their cancer for another five years. They may use traditional statistical approaches to find proteins that are highly expressed in tumours, thus identifying a new diagnostic biomarker.
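As an example of a traditional statistical approach, the sketch below ranks proteins by Welch's t-statistic comparing tumour and normal samples. It is only an illustration under stated assumptions: the protein names and log-scale values are invented, and a real analysis would also need p-values and correction for testing many proteins at once.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t-statistic; needs at least two values per group and some spread."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical log-scale abundances for two made-up proteins.
data = {
    "PROT_A": {"tumour": [10.0, 11.0, 12.0], "normal": [7.0, 8.0, 9.0]},
    "PROT_B": {"tumour": [8.0, 9.0, 10.0], "normal": [8.0, 9.0, 10.0]},
}

scores = {name: welch_t(v["tumour"], v["normal"]) for name, v in data.items()}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, t in ranked:
    print(name, round(t, 3))
```

Proteins at the top of such a ranking are candidates for follow-up rather than confirmed biomarkers.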
Multiple skill sets - one team
With multiple concurrent projects running at any one time, the ProCan Operations team provides the operational and project management skills and processes to track and co-ordinate all of this activity. In addition, a ProCan cross-functional team is allocated to each cohort of samples to determine study design, batch design and project plan. A project manager co-ordinates the multiple cohorts being processed at any one time, tracking progress and status through to each study’s completion.
Ultimately, ProCan’s analyses have two major goals. The first goal is to build models that allow us to use proteomic data to answer clinically-relevant questions that can help doctors to make informed treatment decisions for patients. The second is to better understand cancer biology, so that we might make discoveries that help us to identify possible new cancer treatments.
If you would like to learn more or to contribute, please contact Children's Medical Research Institute.