FEATURE | AUGUST 13, 2014 | BY AMBER HARMON


Even though next-generation sequencing (NGS), in which millions or billions of DNA nucleotides are sequenced in parallel, is much less costly than first-generation sequencing, it remains too expensive for many labs. NGS platform start-up costs can easily exceed hundreds of thousands of dollars, and individual sequencing reactions can cost thousands of dollars per genome.

 

Extracting accurate information from the data can be time-consuming and requires specialized knowledge of bioinformatics. Even so, this high-throughput computational analysis is the backbone of novel discoveries in the life sciences, as well as in other domains including anthropology, social sciences, and plant sciences.

“Using next-generation sequencing you're getting a snapshot of everything that is happening in a given genome up to that point,” says Trupti Joshi, assistant research professor in computer science and core faculty at the Informatics Institute at the University of Missouri (MU), Columbia, US.

Joshi manages SoyKB (Soybean Knowledge Base), a free online data resource infrastructure that was developed as part of the Obama administration’s $200 million Big Data Research and Development Initiative. Joshi’s team is working with the iPlant Collaborative and XSEDE (Extreme Science and Engineering Discovery Environment) teams to integrate SoyKB data resources and analysis tools.

In addition to integrating SoyKB — which already includes many built-in informatics tools — with existing iPlant tools, the MU team is developing additional toolsets that will also be available to the iPlant community. “Right now we are building the infrastructure so that we can submit jobs — RNA-seq analysis is just one example — to iPlant Atmosphere.” Joshi says three to four different analysis capabilities will be available in a couple months.

SoyKB includes the tens of thousands of genes in the soybean genome, experimental data related to gene expression, fast-neutron mutation data, and GWAS (genome-wide association study) data for soybean lines. SoyKB is unique in that it includes 'multi-omics' experimental data that might otherwise be deemed irrelevant and thrown out by a particular researcher at a particular time. By making all research data available, experiments take on an increasingly important role in the bigger picture, and enable future researchers to narrow their own results.

Researchers may want to look at soybeans that have a high oil content, for example, or a high protein content. Or they may want to focus on soybean lines that are more resistant to drought, disease, or insects. Scientists can access data on particular genomic variations directly in SoyKB, using tools to quickly query and isolate items of interest.

“One of the biggest advantages here is that iPlant is an integrated environment,” says Mats Rynge, who is part of XSEDE’s Extended Collaborative Support Service Workflow Community Applications team. “The iPlant team clearly understands the science and can tailor their services and setup to a biologist.”

More than 19,000 users take part in the iPlant Collaborative, and about 2,500 of them use Atmosphere, iPlant's cloud service that is fully integrated with user management and the Data Store (570 terabytes). “Atmosphere is one of the nicest academic cloud implementations available,” says Rynge. “I would say it is on par with Amazon in terms of user interface; really well done.”

Rynge is developing a SoyKB submit infrastructure and Pegasus workflows for scientists to pull data from the data store, analyze it, and deposit the results back in the data store — all with the click of a button. The ultimate goal is to make the workflows general enough to be mapped to other infrastructures, which future sequencing groups can use as a starting point.

As NGS techniques continue to amass more data than labs and researchers can handle on their own, high-performance computing and infrastructures capable of presenting, analyzing, and storing data will remain critical resources for complex bioinformatics analysis. After all, with 50,000 to 70,000 genes in a single soybean, looking at thousands of soybean genomes can produce several gigabytes of data for each soybean line.

The progress of SoyKB as part of the Big Data Initiative was presented at the IEEE International Conference on Bioinformatics and Biomedicine, December 2013, in Shanghai, China. The US National Science Foundation funds the ongoing project. The SoyKB is also funded in part by our Root Hair NSF Plant Genome Program.

Time-lapse film of the infection of clover root hairs by rhizobia. Note that the root hair is curled, one of the first visible steps of the compatible nodulation reaction. The arrows point to the end of the growing infection thread during the infection process. The tubular infection thread is the means by which the rhizobia gain entry into the root. Once the thread exits the root hair, it ramifies into the root cortex, finally ending at a cortical cell that will become infected. Time lapse film kindly provided by Drs. S. Higashi and M. Abe, Kagoshima University, Japan.

Recent Publications

Three most recent:

Hossain MS, Kawakatsu T, Kim KD, Zhang N, Nguyen CT, Khan SM, et al. Divergent cytosine DNA methylation patterns in single-cell, soybean root hairs. New Phytol. 2017; doi:10.1111/nph.14421

Cao Y, Halane MK, Gassmann W, Stacey G. The Role of Plant Innate Immunity in the Legume-Rhizobium Symbiosis. Annu Rev Plant Biol. 2017; doi:10.1146/annurev-arplant-042916-041030

Tóth K, Stacey G. Does plant immunity have a central role in the legume-rhizobium symbiosis? Front Plant Sci. 2015 Jun 2.


This project uses functional genomics to investigate the impact of biotic and abiotic stress on legume root hairs, a single cell model for systems biology. Our vision is to utilize the soybean root hair system to explore, at a systems level, the biology of a single, differentiated plant cell type.
Work funded by the National Science Foundation (NSF) focuses on understanding the molecular processes involved in legume root hair infection by nitrogen-fixing rhizobia. This infection initiates the symbiosis between the bacterium and its host, which results in the de novo formation of a novel organ, the nodule. It is within the nodule that the bacterium fixes nitrogen, providing its host plant with an advantage in environments where this element is limiting. The establishment of the symbiosis involves a complex interplay between host and symbiont, which is orchestrated by the exchange of diffusible signal molecules.
Work funded by the Department of Energy, Office of Science (Office of Biological and Environmental Research) will focus on defining the transcriptional, metabolomic and proteomic response of the soybean root hair cell to variations in temperature and water availability. These data will allow the development of computational models to examine regulatory networks that function at a single cell level to control the response to environmental change. The data obtained should provide a better understanding of the impacts of climate change (heat and water limitation) on plant root physiology.

News

The 4th Biennial Joint Symposium between GNU and MU

MU plant sciences researchers hosted the 4th joint symposium with faculty from Gyeongsang National University (GNU), Jinju, Korea, this week (May 6-7). This symposium is a biennial exchange between MU and GNU. GNU (http://eng.gnu.ac.kr/main/) is one of Korea's strongest universities in plant science research.

 

SoyKB and iPlant streamline complex bioinformatics analysis


SoyKB: Leading the convergence of wet and dry science in the era of Big Data

 

Yaya Cui, an investigator in plant sciences at the Bond Life Sciences Center, examines data on fast-neutron soybean mutants that are represented in the SoyKB database.

The most puzzling scientific mysteries may be solved on the same machine on which you’re likely reading this sentence.
