Current experimental evidence indicates that functionally related genes show coordinated expression in order to perform their cellular functions. In this way, the cell's transcriptional machinery can respond optimally to internal or external stimuli. This provides a research opportunity to identify and study co-expressed gene modules whose transcription is controlled by shared gene regulatory networks. In our recent publication (Zhu et al., 2012; see publication list) we describe the development of an integrated set of computational methods for the analysis of differential gene expression data, including gene clustering, gene network inference, gene function prediction, and DNA motif identification. These tools automatically identify differentially co-expressed gene modules, reconstruct their regulatory networks, and validate their correctness. We tested the methods using microarray data derived from soybean cells grown under various stress conditions. Our methods identified 42 coherent gene modules, within which the average gene expression correlation coefficient is greater than 0.8, and reconstructed their putative regulatory networks. A total of 32 modules and their regulatory networks were further validated by the coherence of predicted gene functions and the consistency of putative transcription factor binding motifs. Approximately half of the 32 modules were partially supported by the literature, which demonstrates that these bioinformatic methods can help elucidate the molecular responses of soybean cells to various environmental stresses. We recently submitted a modification of these methods that is suitable for use with RNA-seq data and will post this information when available.
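To make the coherence criterion concrete, the following minimal sketch computes the average pairwise correlation within a single gene module and checks it against the 0.8 cutoff. It is an illustration only, not the published pipeline: the use of Pearson correlation, the function names, and the toy expression values are all assumptions introduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(profiles):
    """Average Pearson correlation over all gene pairs in one module."""
    pairs = list(combinations(profiles.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def is_coherent(profiles, threshold=0.8):
    """True if the module's average correlation exceeds the threshold."""
    return mean_pairwise_correlation(profiles) > threshold


# Hypothetical module: invented expression values across five conditions.
toy_module = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gene_c": [1.0, 2.5, 2.9, 4.2, 4.8],
}

# Hypothetical counter-example: two genes with opposite trends.
toy_incoherent = {
    "gene_x": [1.0, 2.0, 3.0],
    "gene_y": [3.0, 2.0, 1.0],
}

print(is_coherent(toy_module), is_coherent(toy_incoherent))
```

A module passes only when its genes rise and fall together across conditions; genes with opposing trends pull the average below the cutoff.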

The initial steps in the rhizobia-root hair infection process are known to involve specific receptor kinases and subsequent kinase cascades. In our recent publication (Nguyen et al., 2012; see publication list) we characterized the phosphoproteome of the root hairs and the corresponding stripped roots (i.e., roots from which root hairs were removed) during rhizobial colonization and infection to gain insight into the molecular mechanism of root hair cell biology. Phosphopeptides derived from root hairs and stripped roots, mock inoculated or inoculated with the soybean-specific rhizobium Bradyrhizobium japonicum, were labeled with the isobaric tag 8-plex iTRAQ, enriched using Ni-NTA magnetic beads, and subjected to nRPLC-MS/MS analysis using HCD and a decision tree-guided CID/ETD strategy. A total of 1,625 unique phosphopeptides, spanning 1,659 non-redundant phosphorylation sites, were detected from 1,126 soybean phosphoproteins. Among them, 273 phosphopeptides corresponding to 240 phosphoproteins were found to be significantly regulated (>1.5-fold abundance change) in response to inoculation with B. japonicum. The data reveal unique features of the soybean root hair phosphoproteome, including root hair-specific and stripped root-specific phosphorylation, suggesting a complex network of kinase-substrate and phosphatase-substrate interactions in response to rhizobial inoculation. Full details are available in our publication. The phosphorylation site data are available at the Plant Protein Phosphorylation Database. Note that the database link works well with Google Chrome and Firefox only.


Our recent paper (Joshi et al., 2010) describes the identification of sRNAs from root, seed, flower, and nodules. We are now analyzing data to add sRNAs from B. japonicum inoculated (0-48 HAI) soybean root hair samples and the corresponding stripped root tissues. This includes 30,250, 33,034, and 219,131 sRNA sequences that had unique genome hits in the mock inoculated root hair, inoculated root hair, and stripped root tissues, respectively. A total of 129 miRNAs were identified from root, seed, flower, and nodules. Of these, 42 matched previously identified soybean miRNAs or were conserved in other species, and the remaining 87 were novel. We also predicted the putative target genes of all identified miRNAs with computational methods and verified the predicted cleavage sites in vivo for a subset of these targets using the 5’ RACE method. Finally, we studied the relationship between miRNA expression and the expression of the respective target genes by comparison to Solexa cDNA sequencing data. A genome browser was developed that allows direct comparison of miRNA and mRNA (from our transcriptome analysis) expression.
Our manuscript (Brechenmacher et al., 2010) describes the polar and non-polar root hair and stripped root metabolites. Metabolites were analyzed by GC-MS after water and methanol/chloroform extraction and by UPLC-MS after extraction with 80% methanol. A total of 1,691 metabolites were identified by combining the GC-MS and LC-MS approaches, with 134 responding significantly to inoculation (0-48 HAI). Principal component analysis clearly segregated root hair from stripped root metabolites. A genome browser was developed that allows easy access to these data. This website is also meant to serve as a repository and analysis tool for other soybean metabolomics data.
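As an illustration of how principal component analysis can separate two tissue types, the sketch below projects a handful of samples onto their first principal component. It is a toy example under stated assumptions, not the analysis used in the publication: the abundance values are invented, the function name is hypothetical, and the first component is found by simple power iteration on the covariance matrix.

```python
from math import sqrt


def first_pc_scores(samples, iterations=200):
    """Project samples (rows) onto their first principal component."""
    n, p = len(samples), len(samples[0])
    # Center each metabolite (column) on its mean.
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (p x p).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Power iteration converges to the dominant eigenvector of cov,
    # provided the starting vector is not orthogonal to it.
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Score of each sample along the first principal component.
    return [sum(r[j] * vec[j] for j in range(p)) for r in centered]


# Invented abundances for three hypothetical metabolites per sample.
root_hair = [[10.0, 1.0, 5.0], [11.0, 1.2, 5.1], [10.5, 0.9, 4.9]]
stripped_root = [[2.0, 6.0, 5.0], [2.5, 6.3, 5.2], [1.8, 5.8, 4.8]]

scores = first_pc_scores(root_hair + stripped_root)
root_hair_scores = scores[: len(root_hair)]
stripped_root_scores = scores[len(root_hair):]
print(root_hair_scores, stripped_root_scores)
```

When the two tissues differ consistently in a few metabolites, their scores fall on opposite sides of zero along the first component, which is the kind of segregation described above.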

Over the last 100 years, the atmospheric concentration of carbon dioxide has increased dramatically, in major part due to the burning of fossil fuels, recent rapid industrialization, and land use changes. The predicted effects of continued climate change are complex but include effects on air and surface temperature, with coincident effects on water availability. Soil temperature can influence root growth, cell elongation, root length and extension, initiation of new lateral roots and root hairs, and root branching. These effects are likely manifestations of the variety of physiological effects that temperature has on plant roots, including changes in root respiration and nutrient uptake, as well as physicochemical effects on the soil environment (e.g., changes in nitrogen mineralization). Ambient temperature changes also affect other parts of the plant (e.g., photosynthetic rates), which in turn affect belowground growth and physiology. When we include in this discussion issues of plant genetic variation, as well as the effects of temperature on water availability, the full complexity of the effects of climate change on the plant root environment becomes clear.

We conducted Illumina Solexa cDNA sequencing on 14 different soybean tissues/conditions. This information is contained in a searchable soybean gene expression atlas (Libault et al., 2010).
The raw sequences were submitted to the NCBI Sequence Read Archive.
We mined the soybean genome and identified over 5,500 putative transcription factors (Libault et al., 2009). A searchable database focused on the soybean TF genes was developed (Wang et al., 2009). We utilized both qRT-PCR and Illumina Solexa cDNA sequencing to identify 204 TF genes that responded directly to B. japonicum inoculation. We also showed that RNAi silencing of a specific Myb TF gene significantly reduced soybean nodulation (Libault et al., 2009).
We utilized root hair infection by the symbiotic bacterium Bradyrhizobium japonicum as a tool to perturb cellular function. For example, Libault et al. (2010) utilized Affymetrix DNA microarray hybridization, high-throughput Illumina Solexa cDNA sequencing, and quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) to show that over 48,000 of the 69,000 predicted soybean ORFs were expressed in soybean root hairs. These expression data were used to improve the current soybean genome annotation by identifying new ORFs, defining splice variants, extending genes both 5’ and 3’, and providing transcriptional support for the annotated genes.
The success of our project rests on our development of a highly reproducible method for root hair isolation in quantities sufficient for a variety of functional genomic studies (Wan et al., 2005; see protocol). This method generates highly pure soybean root hair preparations, as well as the comparative stripped root tissue (i.e., roots after the root hairs have been removed).

A hallmark of modern biology is large-scale -omics data. These data are massive and often very complex, creating the need for extensive storage, detailed computational analysis, fast retrieval, and efficient integration in order to better understand the data and generate hypotheses about the underlying biological system. Our efforts to study the systems biology of the soybean root hair cell have generated very large datasets. In addition, other laboratories are now applying these methods to soybean to address a variety of biological questions. To address the need for web resources capable of handling the complex task of integrating soybean -omics data and to provide data annotation (e.g., pathway information), we developed the Soybean Knowledge Base.


The Soybean Knowledge Base (SoyKB) is a comprehensive web resource for soybean. SoyKB is designed to handle the storage and integration of genomics, microarray, transcriptomics, proteomics, and metabolomics data along with function and pathway information. It has four modules. The main back-end module is a MySQL database that incorporates and integrates soybean genomics and -omics data from various sources and is designed to hold information on four entities: genes, miRNAs, metabolites, and SNPs. The other three modules are front ends: the web interface, the genome browser, and pathway integration.


SoyKB has four tiers of registration, which control access to the public and private experimental datasets. Users can add comments, download data for multiple genes, and submit their own datasets. Key features of SoyKB include protein 3D-structure and pathway viewers, gene family browsers, and a BLAST sequence similarity tool.


SoyKB can be accessed at
