Bioinformatics -
6 hours and 35 minutes ago
Publication Date: 2010 Mar 15
PMID: 20231229
Authors: Yomtovian, I. - Teerakulkittipong, N. - Lee, B. - Moult, J. - Unger, R.
Journal: Bioinformatics
MOTIVATION: Intriguingly, sequence analysis of
genomes reveals that a large number of genes are unique to each organism. The origin of these
genes, termed ORFans, is not known. Here, we explore the origin of ORFan genes by defining a simple
measure called "composition bias", based on the deviation of the amino acid composition of a given
sequence from the average composition of all proteins of a given genome. RESULTS: For a set of 47
prokaryotic genomes, we show that the amino acid composition bias of real proteins, random
"proteins" (created by using the nucleotide frequencies of each genome), and "proteins" translated
from intergenic regions are distinct. For ORFans, we observed a correlation between their
composition bias and their relative evolutionary age. Recent ORFan proteins have compositions more
similar to those of random "proteins", while the compositions of more ancient ORFan proteins are
more similar to those of the set of all proteins of the organism. This observation is consistent
with an evolutionary scenario wherein ORFan genes emerged and underwent a large number of random
mutations and selection, eventually adapting to the composition preference of their organism over
time.

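The "composition bias" described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the sum of absolute deviations used here is an assumption, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def composition(seqs):
    """Return the fraction of each standard amino acid across the given sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(c for c in s.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {a: counts[a] / total for a in AMINO_ACIDS}

def composition_bias(seq, genome_composition):
    """Assumed measure: sum of absolute deviations from the genome-wide average."""
    own = composition([seq])
    return sum(abs(own[a] - genome_composition[a]) for a in AMINO_ACIDS)

# Toy "proteome" (made-up sequences, for illustration only)
proteome = ["MKTAYIAKQR", "MALWMRLLPL", "MKVLAAGIVG"]
background = composition(proteome)

bias_typical = composition_bias("MKTAYIAKQR", background)
bias_skewed = composition_bias("WWWWWWWWWW", background)
print(bias_typical, bias_skewed)
```

Under this assumed measure, a sequence whose composition matches the genome average scores 0, and the score grows as the composition diverges from it.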
Bioinformatics -
7 hours and 35 minutes ago
Publication Date: 2010 Mar 12
PMID: 20228129
Authors: Bouckaert, R. R.
Journal: Bioinformatics
MOTIVATION: Bayesian analysis through programs like BEAST (Drummond and Rambaut,
2007) and MrBayes (Huelsenbeck et al., 2001) provides a powerful method for reconstruction of
evolutionary relationships. One of the benefits of Bayesian methods is that well founded estimates
of uncertainty in models can be made available. For example, not only the mean time to the most
recent common ancestor (tMRCA) is estimated, but also its spread. This distribution over model
space is represented by a set of trees, which can be rather large and difficult to interpret.
DensiTree is a tool that helps navigate these sets of trees. RESULTS: The main idea behind
DensiTree is to draw all trees in the set transparently. As a result, areas where a lot of the
trees agree in topology and branch lengths show up as highly colored areas, while areas with little
agreement show up as webs. This makes it possible to quickly get an impression of properties of the
tree set such as well supported clades, distribution of tMRCA and areas of topological uncertainty.
Thus, DensiTree provides a quick method for qualitative analysis of tree sets. AVAILABILITY:
DensiTree is freely available from http://compevol.auckland.ac.nz/software/DensiTree/. The program
is licensed under GPL and source code is available. CONTACT: remco@cs.auckland.ac.nz.

Bioinformatics -
8 hours and 35 minutes ago
Publication Date: 2010 Mar 12
PMID: 20228128
Authors: Faust, K. - Dupont, P. - Callut, J. - van Helden, J.
Journal: Bioinformatics
MOTIVATION: Subgraph extraction is a powerful technique to predict
pathways from biological networks and a set of query items (e.g. genes, proteins, compounds...). It
can be applied to a variety of different data types, such as gene expression, protein levels,
operons or phylogenetic profiles. In this article, we investigate different approaches to extract
relevant pathways from metabolic networks. Although these approaches have been adapted to metabolic
networks, they are generic enough to be adjusted to other biological networks as well. RESULTS: We
comparatively evaluated seven sub-network extraction approaches on 71 known metabolic pathways from
S. cerevisiae and a metabolic network obtained from MetaCyc. The best performing approach is a
novel hybrid strategy, which combines a random walk-based reduction of the graph with a shortest
paths-based algorithm, and which recovers the reference pathways with an accuracy of ~ 77%.
AVAILABILITY: Most of the presented algorithms are available as part of the network analysis tool
set (NeAT). The kWalks method is released under the GPL3 license. CONTACT: kfaust@ulb.ac.be.

Bioinformatics -
4 days and 5 hours ago
Publication Date: 2010 Mar 11
PMID: 20223837
Authors: Shiraishi, Y. - Kimura, S. - Okada, M.
Journal: Bioinformatics
MOTIVATION: Clustering and gene network inference often help to predict the
biological functions of gene subsets. Recently, researchers have accumulated a large amount of
time-course transcriptome data collected under different treatment conditions to understand the
physiological states of cells in response to extracellular stimuli and to identify drug-responsive
genes. Although a variety of statistical methods for clustering and inferring gene networks from
expression profiles have been proposed, most of these are not tailored to simultaneously treat
expression data collected under multiple stimulation conditions. RESULTS: We propose a new
statistical method for analyzing temporal profiles under multiple experimental conditions. Our
method simultaneously performs clustering of temporal expression profiles and inference of
regulatory relationships among gene clusters. We applied this method to MCF7 human breast cancer
cells treated with epidermal growth factor and heregulin which induce cellular proliferation and
differentiation, respectively. The results showed that the method is useful for extracting
biologically relevant information. AVAILABILITY: A MATLAB implementation of the method is available
from http://csb.gsc.riken.jp/yshira/software/clusterNetwork.zip. CONTACT: yshira@riken.jp
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Bioinformatics -
4 days and 6 hours ago
Publication Date: 2010 Mar 11
PMID: 20223836
Authors: Xiao, S. J. - Zhang, C. - Ji, Z. L.
Journal: Bioinformatics
SUMMARY: Tissue-specific genes are a group of genes whose function and expression
are preferred in one or several tissues/cell types. Identifying these genes helps in better
understanding tissue-gene relationships and etiology, and in discovering novel tissue-specific drug
targets. In this study, a statistical method is introduced to detect tissue-specific genes from
more than 123125 gene expression profiles over 107 human tissues, 67 mouse tissues and 30 rat
tissues. As a result, a novel subject-specialized repository, namely the Tissue-Specific Genes
Database (TiSGeD), is developed to represent the analyzed results. Auxiliary information on
tissue-specific genes was also collected from the biomedical literature. AVAILABILITY:
http://bioinf.xmu.edu.cn/databases/TISGED/index.html. CONTACT: appo@bioinf.xmu.edu.cn or
zhiliang.ji@gmail.com.

Bioinformatics -
4 days and 7 hours ago
Publication Date: 2010 Mar 11
PMID: 20223835
Authors: Narang, V. - Mittal, A. - Sung, W. K.
Journal: Bioinformatics
MOTIVATION: Discovery of nucleotide motifs that are localized with respect to a
certain biological landmark is important in several applications, such as in regulatory sequences
flanking the transcription start site, in the neighborhood of known transcription factor binding
sites, and in transcription factor binding regions discovered by massively parallel sequencing
(ChIP-Seq). RESULTS: We report an algorithm called LocalMotif to discover such localized motifs.
The algorithm is based on a novel scoring function, called spatial confinement score, which can
determine the exact interval of localization of a motif. This score is combined with other existing
scoring measures including over-representation and relative entropy to determine the overall
prominence of the motif. The approach successfully discovers biologically relevant motifs and their
intervals of localization in scenarios where the motifs cannot be discovered by general motif
finding tools. It is especially useful for discovering multiple co-localized motifs in a set of
regulatory sequences, such as those identified by ChIP-Seq. Availability and Implementation: The
LocalMotif software is available at http://www.comp.nus.edu.sg/~bioinfo/LocalMotif CONTACT:
ksung@comp.nus.edu.sg SUPPLEMENTARY INFORMATION: Supplementary description of the method, algorithm
and results is available at Bioinformatics online.

Bioinformatics -
4 days and 8 hours ago
Publication Date: 2010 Mar 11
PMID: 20223834
Authors: Le Cao, K. A. - Meugnier, E. - McLachlan, G. J.
Journal: Bioinformatics
MOTIVATION: Microarrays are being increasingly used in cancer research to
better characterize and classify tumors by selecting marker genes. However, as very few of these
genes have been validated as predictive biomarkers so far, it is mostly conventional clinical and
pathological factors which are being used as prognostic indicators of clinical course. Combining
clinical data with gene expression data may add valuable information, but it is a challenging task
due to their categorical versus continuous characteristics. We have further developed the mixture
of experts methodology (ME), a promising approach to tackle complex nonlinear problems. Several
variants are proposed in integrative ME as well as the inclusion of various gene selection methods
to select a hybrid signature. RESULTS: We show on three cancer studies that prediction accuracy can
be improved when combining both types of variables. Furthermore, the selected genes were found to
be of high relevance and can be considered as potential biomarkers for the prognostic selection of
cancer therapy. AVAILABILITY: Integrative ME is implemented in the R package integrativeME
(http://cran.r-project.org/). CONTACT: k.lecao@uq.edu.au.

Bioinformatics -
5 days and 7 hours ago
Publication Date: 2010 Mar 10
PMID: 20219865
Authors: Bayjanov, J. R. - Siezen, R. J. - van Hijum, S. A.
Journal: Bioinformatics
SUMMARY: A pangenome is the total of genes present in strains of the
same species. Pangenome microarrays allow determining the genomic content of bacterial strains more
accurately than conventional comparative genome hybridization microarrays. PanCGHweb is the first
tool that effectively calls genotype based on pangenome microarray data. AVAILABILITY: The PanCGHweb
web tool is accessible from: http://bamics2.cmbi.ru.nl/websoftware/pancgh/ CONTACT:
sacha.vanhijum@nizo.nl.

Bioinformatics -
6 days and 7 hours ago
Publication Date: 2010 Mar 9
PMID: 20215462
Authors: Li, Y. - Patra, J. C.
Journal: Bioinformatics
MOTIVATION: Clinical diseases are characterized by distinct phenotypes. To identify
disease genes is to elucidate gene-phenotype relationships. Mutations in functionally related
genes may result in similar phenotypes. It is therefore reasonable to predict disease-causing genes by
integrating phenotypic data and genomic data. Some genetic diseases are genetically or phenotypically
similar and may share common pathogenetic mechanisms. Identifying the relationships between
diseases will facilitate better understanding of their pathogenetic mechanisms. RESULTS:
In this paper, we constructed a heterogeneous network by connecting the gene network and phenotype
network using the phenotype-gene relationship information from the OMIM database. We extended the
random walk with restart algorithm to the heterogeneous network. The algorithm prioritizes the
genes and phenotypes simultaneously. We used leave-one-out cross-validation to evaluate its ability
to find gene-phenotype relationships. Results showed improved performance over previous
work. We also used the algorithm to disclose hidden disease associations that cannot be found by
gene network or phenotype network alone. We identified 18 hidden disease associations, most of
which were supported by literature evidence. AVAILABILITY: The MATLAB code of the program is
available at: http://www3.ntu.edu.sg/home/aspatra/research/Yongjin_BI2010.zip. CONTACT:
yongjin.li@gmail.com; aspatra@ntu.edu.sg.

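The abstract above builds on the random walk with restart algorithm. The sketch below shows the standard algorithm on a single, small, unweighted network. It does not reproduce the authors' extension to a heterogeneous gene-phenotype network, and the restart probability, node names, and toy graph are assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that restarts at the seeds.

    adjacency: dict mapping each node to a list of neighbours (undirected, unweighted).
    seeds:     set of nodes the walker restarts from (must be keys of adjacency).
    restart:   probability of jumping back to a seed at each step (assumed value).
    """
    nodes = list(adjacency)
    if any(not adjacency[n] for n in nodes):
        raise ValueError("every node needs at least one neighbour in this sketch")
    # Initial distribution: uniform over the seed nodes
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term, then spread the remaining mass evenly along edges
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical toy network: g4 is only reachable from the seed through g3
toy_network = {
    "g1": ["g2", "g3"],
    "g2": ["g1", "g3"],
    "g3": ["g1", "g2", "g4"],
    "g4": ["g3"],
}
scores = random_walk_with_restart(toy_network, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Nodes are prioritized by their steady-state probability, so candidates close to the seed in the network rank above distant ones.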
Bioinformatics -
7 days and 7 hours ago
Publication Date: 2010 Mar 8
PMID: 20212019
Authors: Tuna, S. - Niranjan, M.
Journal: Bioinformatics
MOTIVATION: High throughput measurements of mRNA abundances from microarrays involve
several stages of preprocessing. At each stage, a user has access to a large number of algorithms
with no universally agreed guidance on which of these to use. We show that binary representations
of gene expression, retaining only information on whether a gene is expressed or not, reduce the
variability in results caused by algorithmic choice, while also improving the quality of inference
drawn from microarray studies. RESULTS: Binary representation of transcriptome data has the
desirable property of reducing the variability introduced at the preprocessing stages due to
algorithmic choice. We compare the effect of the choice of algorithms on different problems and
suggest that using binary representation of microarray data with Tanimoto kernel for SVM reduces
the effect of the choice of algorithm and simultaneously improves the performance of classification
of phenotypes. AVAILABILITY: Supplementary material is available online at Bioinformatics Webpage.
CONTACT: mn@ecs.soton.ac.uk.

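The Tanimoto kernel on binarized expression profiles, as mentioned in the abstract above, can be sketched as follows. The abstract does not say how expression is binarized, so the fixed threshold is an assumption, as is the convention of returning 1.0 for two all-zero profiles.

```python
def binarize(expression, threshold):
    """Map each expression value to 1 (expressed) or 0 (not expressed).

    The fixed threshold is an assumption made for illustration.
    """
    return [1 if x > threshold else 0 for x in expression]

def tanimoto_kernel(x, y):
    """Tanimoto similarity of two binary vectors: |x AND y| / |x OR y|."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    both = sum(a & b for a, b in zip(x, y))
    either = sum(x) + sum(y) - both
    # Convention chosen here: two all-zero profiles count as identical
    return both / either if either else 1.0

# Made-up expression values for two samples over four genes
sample_a = binarize([7.9, 8.4, 2.1, 1.0], threshold=5.0)  # [1, 1, 0, 0]
sample_b = binarize([9.3, 0.5, 6.7, 1.2], threshold=5.0)  # [1, 0, 1, 0]
similarity = tanimoto_kernel(sample_a, sample_b)
print(similarity)
```

A matrix of such pairwise similarities could then be passed to an SVM implementation that accepts a precomputed kernel.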
Bioinformatics -
8 days and 7 hours ago
Publication Date: 2010 Mar 5
PMID: 20208069
Authors: Grabherr, M. G. - Russell, P. - Meyer, M. - Mauceli, E. - Alfoldi, J. - Dipalma, F. - Lindblad-Toh, K.
Journal: Bioinformatics
MOTIVATION:
Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an
engineering perspective, the problem here is to provide maximum sensitivity (to find all there is
to find), specificity (to only find real homology), and speed (to accommodate the billions of base
pairs of vertebrate genomes). RESULTS: Satsuma addresses all three issues through novel strategies:
a) cross-correlation, implemented via Fast Fourier Transform, b) a match scoring scheme that
eliminates almost all false hits, and c) an asynchronous 'battleship'-like search, that allows for
aligning two entire fish genomes (470 Mb and 217 Mb) in 120 CPU hours using 15 processors on a
single machine. AVAILABILITY: Satsuma is part of the Spines software package, implemented in C++ on
Linux. The latest version of Spines can be freely downloaded under the LGPL license from
http://www.broadinstitute.org/science/programs/genome-biology/spines/ CONTACT:
grabherr@broadinstitute.org.

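To illustrate what cross-correlation of two sequences measures, the sketch below counts matching bases at every relative offset with a direct double loop. This is not Satsuma's implementation: the abstract does not describe its signal encoding, and a Fast Fourier Transform would be used to obtain the same per-offset match counts far more quickly on large inputs. The function name and toy sequences are hypothetical.

```python
def match_counts_by_offset(a, b):
    """For each offset d, count positions i where a[i] == b[i + d].

    Direct O(len(a) * len(b)) computation; an FFT-based cross-correlation of
    per-base indicator vectors yields the same counts in O(n log n).
    """
    counts = {}
    for d in range(-(len(a) - 1), len(b)):
        n = 0
        for i in range(len(a)):
            j = i + d
            if 0 <= j < len(b) and a[i] == b[j]:
                n += 1
        counts[d] = n
    return counts

query = "ACGTACGT"
target = "TTACGTACGTGG"  # contains the query starting at position 2
counts = match_counts_by_offset(query, target)
best_offset = max(counts, key=counts.get)
print(best_offset, counts[best_offset])
```

The offset with the highest count marks the most likely alignment diagonal, which is the signal a cross-correlation search looks for.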
Bioinformatics -
8 days and 8 hours ago
Publication Date: 2010 Mar 5
PMID: 20208068
Authors: Johannes, F. - Wardenaar, R. - Colome-Tatche, M. - Mousson, F. - de Graaf, P. - Mokry, M. - Guryev, V. - Timmers, H. T. - Cuppen, E. - Jansen, R. C.
Journal: Bioinformatics
MOTIVATION: ChIP-chip and ChIP-seq technologies provide genomewide
measurements of various types of chromatin marks at an unprecedented resolution. With ChIP samples
collected from different tissue types and/or individuals, we can now begin to characterize
stochastic or systematic changes in epigenetic patterns during development (intra-individual) or at
the population level (inter-individual). This requires statistical methods that permit a
simultaneous comparison of multiple ChIP samples on a global as well as locus-specific scale.
Current analytical approaches are mainly geared towards single sample investigations, and therefore
have limited applicability in this comparative setting. This shortcoming presents a bottleneck in
biological interpretations of multiple sample data. RESULTS: To address this limitation, we
introduce a parametric classification approach for the simultaneous analysis of two (or more) ChIP
samples. We consider several competing models that reflect alternative biological assumptions about
the global distribution of the data. Inferences about locus-specific and genomewide chromatin
differences are reached through the estimation of multivariate mixtures. Parameter estimates are
obtained using a version of the Incremental Expectation Maximization algorithm (IEM). We
demonstrate efficient scalability and application to three very diverse ChIP-chip and ChIP-seq
experiments. The proposed approach is evaluated against several published ChIP-chip and ChIP-seq
software packages. We recommend its use as a first-pass algorithm to identify candidate regions in
the epigenome, possibly followed by some type of second-pass algorithm to fine-tune detected peaks
in accordance with biological or technological criteria. AVAILABILITY: R source code is available
at http://gbic.biol.rug.nl/supplementary/2009/ChromatinProfiles/ Access to ChIP-seq data: GEO
repository GSE17937 CONTACT: f.johannes@rug.nl.

Bioinformatics -
8 days and 9 hours ago
Publication Date: 2010 Mar 15
PMID: 20207696
Authors: Sasson, A. - Michael, T. P.
Journal: Bioinformatics
SUMMARY: Here, we report the development of a filtering framework designed for
efficient identification of both polyclonal and independent errors within SOLiD sequence data. The
filtering utilizes the quality values reported by SOLiD's primary analysis for the identification
of the two different types of errors. The filtering framework facilitates the passage of
high-quality data into a variety of functional genomics applications, including de novo assemblers
and sequence matching programs for SNP calling, improving the output quality and reducing resources
necessary for analysis. AVAILABILITY: This error analysis framework is written in Perl and runs on
Mac OS and Linux/Unix systems. The filter, documentation and sample Excel files for quality
analysis are available at http://hts.rutgers.edu/filter and are distributed as Open Source software
under the GPLv3.0. CONTACT: tmichael@waksman.rutgers.edu SUPPLEMENTARY INFORMATION: Supplementary
data is available at Bioinformatics online.

Bioinformatics -
11 days and 7 hours ago
Publication Date: 2010 Mar 3
PMID: 20202974
Authors: Johnson, A. D.
Journal: Bioinformatics
The International Union of Pure and Applied Chemistry (IUPAC) code specified nearly 25 years ago
provides a nomenclature for incompletely specified nucleic acids (Cornish-Bowden 1984). The IUPAC
code has been applied in a wide-ranging manner, contributing to many biologically and chemically
meaningful representations, including: 1) recognition sequences (e.g., restriction enzymes, protein
and RNA binding sites, consensus signals), 2) codon degeneracy, 3) sequence base calling ambiguity,
4) representation of ancestral states in phylogenetics and 5) to a vast extent in the fields of
genetics and genomics in representing polymorphic nucleic acids, e.g., single nucleotide
polymorphisms (SNPs). However, no system currently exists that allows for the informatics
representation of the relative abundance at polymorphic nucleic acids (e.g., SNPs) in a single
specified character, or a string of characters. Here I propose such an information code as a
natural extension to the IUPAC nomenclature code, and present some potential uses and limitations
to such a code. The original IUPAC code remains useful in all its previous applications and is also
compatible as a subset of the extended code proposed here. The extended IUPAC code allows for new
nucleic acid representations, in single characters or character strings, with potential
applications in genetics, cross-species or cross-strain comparison, sequence alignment,
bioinformatics, genome assembly, database design and querying, and chemical sequencing and
synthesis. The primary anticipated use of this extended nomenclature code is to assist in the
representation of the rapidly growing space of information on human genetic variation.

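For context, the original IUPAC ambiguity code that the abstract above proposes to extend can be represented as a small lookup table. The sketch covers only the standard DNA symbols. The proposed extension for relative abundance is not specified in the abstract and is therefore not attempted, and the helper names are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes (DNA alphabet)
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Reverse lookup: set of bases -> single ambiguity symbol
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC_DNA.items()}

def matches(pattern, sequence):
    """True if every base of sequence is allowed by the IUPAC symbol at that position."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_DNA[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )

def encode(bases):
    """Return the single IUPAC symbol for a set of observed bases, e.g. the alleles of a SNP."""
    return BASES_TO_CODE[frozenset(b.upper() for b in bases)]

print(matches("GTYRAC", "GTCGAC"))  # Y allows C, R allows G
print(encode({"A", "G"}))           # an A/G polymorphism
```

Note that a symbol such as R records only that A and G both occur, not how often each is observed, which is the gap the abstract sets out to address.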
Bioinformatics -
11 days and 8 hours ago
Publication Date: 2010 Mar 4
PMID: 20202973
Authors: Clarke, J. - Seo, P. - Clarke, B.
Journal: Bioinformatics
MOTIVATION: Global expression patterns within cells are used for purposes ranging
from the identification of disease biomarkers to basic understanding of cellular processes.
Unfortunately, tissue samples used in cancer studies are usually composed of multiple cell types and
the non-cancerous portions can significantly affect expression profiles. This severely limits the
conclusions that can be made about the specificity of gene expression in the cell-type of interest.
However, statistical analysis can be used to identify differentially expressed genes that are
related to the biological question being studied. RESULTS: We propose a statistical approach to
expression deconvolution from mixed tissue samples in which the proportion of each component cell
type is unknown. Our method estimates the proportion of each component in a mixed tissue sample;
this estimate can be used to provide estimates of gene expression from each component. We
demonstrate our technique on xenograft samples from breast cancer research and publicly available
experimental data sets found in the National Center for Biotechnology Information (NCBI) Gene
Expression Omnibus (GEO) repository. AVAILABILITY: R code (http://www.r-project.org/) for
estimating sample proportions is freely available to non-commercial users and available at
http://www.med.miami.edu/medicine/x2691.xml CONTACT: jclarke@med.miami.edu; pseo@med.miami.edu;
bclarke2@med.miami.edu.

Bioinformatics -
12 days and 6 hours ago
Publication Date: 2010 Mar 2
PMID: 20200011
Authors: Moghaddas Gholami, A. - Fellenberg, K.
Journal: Bioinformatics
MOTIVATION: Cross-species meta-analyses of microarray data usually require prior
affiliation of genes based on orthology information that often relies on sequence similarity.
RESULTS: We present an algorithm merging microarray datasets on the basis of co-expression alone,
without any requirement for orthology information to affiliate genes. Combining existing methods
such as co-inertia analysis, back-transformation, Hungarian matching, and majority voting in an
iterative non-greedy hill-climbing approach, it affiliates arrays and genes at the same time,
maximizing the co-structure between the datasets. To introduce the method, we demonstrate its
performance on two closely and two distantly related datasets of different experimental context and
produced on different platforms. Each pair stems from two different species. The resulting
cross-species dynamic Bayesian gene networks improve on the networks inferred from each dataset
alone by yielding more significant network motifs, as well as more of the interactions already
recorded in KEGG and other databases. Also, it is shown that our algorithm converges on the optimal
number of nodes for network inference. Being readily extendable to more than two datasets, it
provides the opportunity to infer extensive gene regulatory networks. Availability and
Implementation: Source code (MATLAB and R) freely available for download at
http://www.mchips.org/supplements/moghaddasi_source.tgz CONTACT: kurt@tum.de SUPPLEMENTARY
INFORMATION: Supplementary data are available at
http://www.mchips.org/supplements/moghaddasi_supp.pdf.

Bioinformatics -
12 days and 7 hours ago
Publication Date: 2010 Mar 3
PMID: 20200010
Authors: Rebholz-Schuhmann, D. - Kavaliauskas, S. - Pezik, P.
Journal: Bioinformatics
MOTIVATION: The automatic analysis of scientific literature can
support authors in writing their manuscripts. Implementation: PaperMaker is a novel IT solution
that receives a scientific manuscript via a Web interface, automatically analyses the publication,
evaluates consistency parameters and interactively delivers feedback to the author. It analyses the
proper use of acronyms and their definitions, and the use of specialized terminology. It provides
GO and MeSH categorization of text passages, the retrieval of relevant publications from public
scientific literature repositories, and the identification of missing or unused references. RESULT:
The author receives a summary of findings, the manuscript in its corrected form and a digital
abstract containing the GO and MeSH annotations in the NLM/PubMed format. AVAILABILITY:
http://www.ebi.ac.uk/Rebholz-srv/PaperMaker.

Bioinformatics -
12 days and 8 hours ago
Publication Date: 2010 Mar 3
PMID: 20200009
Authors: Malone, J. - Holloway, E. - Adamusiak, T. - Kapushesky, M. - Zheng, J. - Kolesnikov, N. - Zhukova, A. - Brazma, A. - Parkinson, H.
Journal: Bioinformatics
MOTIVATION: Describing biological sample variables with ontologies is complex due to
the cross-domain nature of experiments. Ontologies provide annotation solutions; however, for
cross-domain investigations, multiple ontologies are needed to represent the data. These are
subject to rapid change, are often not interoperable and present complexities that are a barrier to
biological resource users. RESULTS: We present the Experimental Factor Ontology (EFO), designed to
meet cross-domain, application focused use cases for gene expression data. We describe our
methodology and open source tools used to create the ontology. These include tools for creating
ontology mappings, ontology views, detecting ontology changes and using ontologies in interfaces to
enhance querying. The application of reference ontologies to data is a key problem and this work
presents guidelines on how community ontologies can be presented in an application ontology in a
data driven way. AVAILABILITY: http://www.ebi.ac.uk/efo CONTACT: malone@ebi.ac.uk.

Bioinformatics -
13 days and 6 hours ago
Publication Date: 2010 Mar 1
PMID: 20197286
Authors: Shimamura, T. - Imoto, S. - Yamaguchi, R. - Nagasaki, M. - Miyano, S.
Journal: Bioinformatics
MOTIVATION: Elucidating the differences between
cellular responses to various biological conditions or external stimuli is an important challenge
in systems biology. Many approaches have been developed to reverse-engineer a cellular system,
called a gene network, from time-series microarray data in order to understand a transcriptomic
response under a condition of interest. Comparative topological analysis has also been applied
based on the gene networks inferred independently from each of the multiple time-series datasets
under varying conditions to find critical differences between these networks. However, these
comparisons often lead to misleading results, because each network contains considerable noise due
to the limited length of the time-series. RESULTS: We propose an integrated approach for inferring
multiple gene networks from time-series expression data under varying conditions. To the best of
our knowledge, our approach is the first reverse-engineering method that is intended for
transcriptomic network comparison between varying conditions. Furthermore, we propose a
state-of-the-art parameter estimation method, relevance-weighted recursive elastic net, for
providing higher precision and recall than existing reverse-engineering methods. We analyze
experimental data of MCF-7 human breast cancer cells stimulated by EGF or HRG with several doses
and provide novel biological hypotheses through network comparison. AVAILABILITY: The software
NETCOMP is available at http://bonsai.ims.u-tokyo.ac.jp/~shima/NETCOMP/. CONTACT:
shima@ims.u-tokyo.ac.jp SUPPLEMENTARY INFORMATION: All supplementary information can be accessed
online.

Bioinformatics -
14 days and 6 hours ago
Publication Date: 2010 Mar 1
PMID: 20194627
Authors: Chevenet, F. - Croce, O. - Hebrard, M. - Christen, R. - Berry, V.
Journal: Bioinformatics
SUMMARY: There is a large number of tools for
interactive display of phylogenetic trees. However, there is a shortage of tools for the automation
of tree rendering. Scripting phylogenetic graphics would enable the saving of graphical analyses
involving numerous and complex tree handling operations and would allow the automation of
repetitive tasks. ScripTree is a tool intended to fill this gap. It is an interpreter to be used in
batch mode. Phylogenetic graphics instructions, related to tree rendering as well as tree
annotation, are stored in a text file and processed in a sequential way. AVAILABILITY: ScripTree
can be used online or downloaded at www.scriptree.org, under the GPL license. Implementation:
ScripTree is written in Tcl/Tk and is a cross-platform application available for Windows and Unix-like
systems including OS X. It can be used either as a standalone package or included in a
bioinformatic pipeline and linked to an HTTP server. CONTACT: chevenet@ird.fr.

Bioinformatics -
14 days and 7 hours ago
Publication Date: 2010 Mar 1
PMID: 20194626
Authors: Hemmerich, C. - Buechlein, A. - Podicheti, R. - Revanna, K. V. - Dong, Q.
Journal: Bioinformatics
SUMMARY: Ergatis is a flexible workflow management
system for designing and executing complex bioinformatics pipelines. However, its complexity
restricts its usage to only highly skilled bioinformaticians. We have developed a web-based
prokaryotic genome annotation server, Integrative Services for Genomics Analysis (ISGA), which
builds upon the Ergatis workflow system, integrates other dynamic analysis tools, and provides
intuitive web interfaces for biologists to customize and execute their own annotation pipelines.
ISGA is designed to be installed at genomics core facilities and be used directly by biologists.
AVAILABILITY: ISGA is accessible at http://isga.cgb.indiana.edu/ and the system is also freely
available for local installation. CONTACT: Qunfeng.Dong@unt.edu SUPPLEMENTARY INFORMATION:
Supplementary figures are available at Bioinformatics online.

Bioinformatics -
14 days and 8 hours ago
Publication Date: 2010 Mar 1
PMID: 20194625
Authors: Tanaka, N. - Waki, K. - Kaneda, H. - Suzuki, T. - Yamada, I. - Furuse, T. - Kobayashi, K. - Motegi, H. - Toki, H. - Inoue, M. - Minowa, O. - Noda, T. - Takao, K. - Miyakawa, T. - Takahashi, A. - Koide, T. - Wakana, S. - Masuya, H.
Journal: Bioinformatics
SUMMARY: This paper reports the development of SDOP-DB, which can provide definite,
detailed, and easy comparison of experimental protocols used in mouse phenotypic analyses among
institutes or laboratories. Because SDOP-DB is fully compliant with international standards, it can
act as a practical foundation for international sharing and integration of mouse phenotypic
information. AVAILABILITY: SDOP-DB (http://www.brc.riken.jp/lab/bpmp/SDOP/) CONTACT:
knowledge-base@brc.riken.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at
Bioinformatics online.

Bioinformatics -
15 days and 6 hours ago
Publication Date: 2010 Feb 26
PMID: 20190251
Authors: Chou, W. Y. - Chou, W. I. - Pai, T. W. - Lin, S. C. - Jiang, T. Y. - Tang, C. Y. - Chang, M. D.
Journal: Bioinformatics
MOTIVATION: Carbohydrate
binding modules (CBMs) share similar secondary and tertiary topology, but their primary sequence
identity is low. Computational identification of ligand-binding residues allows biologists to
better understand the protein-carbohydrate binding mechanism. In general, functional
characterization can alternatively be approached with alignment-based methods. As alignment accuracy
based on conventional methods is often sensitive to sequence identity, low sequence identity among
query sequences makes it difficult to precisely locate small portions of relevant features.
Therefore, we propose a feature-incorporated alignment (FIA) to flexibly align conserved signatures
in CBMs. Then, an FIA-based target-template prediction model was further implemented to identify
functional ligand-binding residues. RESULTS: Arabidopsis thaliana CBM45 and CBM53 were used to
validate the FIA-based prediction model. The predicted ligand-binding residues residing on the
surface in the hypothetical structures were verified to be ligand-binding residues. In the absence
of three-dimensional structural information, FIA demonstrated significant improvement in the
estimation of sequence similarity and identity for a total of 808 sequences from 11 different CBM
families as compared with six leading tools by Friedman rank test. CONTACT:
dtchang@life.nthu.edu.tw.

|
Bioinformatics -
15 days and 7 hours ago
Publication Date: 2010 Feb 26
PMID: 20190250
Authors: Malhis, N. - Jones, S. J.
Journal: Bioinformatics
MOTIVATION: Detection of single nucleotide polymorphisms (SNPs) has been a major
application in processing second generation sequencing (SGS) data. In principle, SNPs are called on
single base differences between a reference genome and a sequence generated from SGS short reads of
a sample genome. However, this exercise is far from trivial: several parameters related to
sequencing quality and/or reference genome properties have an essential effect on the accuracy of
called SNPs, especially for shallow-coverage data. In this work, we present Slider II, an alignment
and SNP calling approach with improved algorithms that enable a larger number
of called SNPs with a lower false positive rate. In addition to the regular alignment and SNP
calling, as an optional feature, Slider II is capable of utilizing information about known SNPs of
a target genome, as priors, in the alignment and SNP calling to enhance its capability of
detecting these known SNPs and novel SNPs and mutations in their vicinity. CONTACT:
nmalhis@bcgsc.ca Supplementary information and availability:
http://www.bcgsc.ca/platform/bioinfo/software/SliderII.
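As a rough illustration of the principle described in the Slider II abstract above (a SNP is called where aligned reads disagree with the reference base, optionally using known SNPs as priors), here is a minimal Python sketch. It is not Slider II's algorithm: the function `call_snp`, its depth and fraction thresholds, and the relaxed threshold for known SNPs are all hypothetical choices made for illustration.

```python
from collections import Counter


def call_snp(ref_base, pileup, min_depth=4, min_fraction=0.8,
             known_alt=None, known_min_fraction=0.5):
    """Return the alternate base if the pileup supports a SNP, else None.

    pileup: string of read bases aligned to one reference position.
    known_alt: optional previously reported alternate base (a hypothetical
    stand-in for a prior); when the observed alternate matches it, a lower
    fraction of supporting reads is accepted.
    """
    ref_base = ref_base.upper()
    # Keep only unambiguous base calls.
    bases = [b for b in pileup.upper() if b in "ACGT"]
    depth = len(bases)
    if depth < min_depth:
        return None  # coverage too shallow to make a call

    # Count reads that disagree with the reference.
    alt_counts = Counter(b for b in bases if b != ref_base)
    if not alt_counts:
        return None  # every read agrees with the reference

    alt, support = alt_counts.most_common(1)[0]
    threshold = known_min_fraction if alt == known_alt else min_fraction
    return alt if support / depth >= threshold else None
```

For example, `call_snp("A", "GGGAA")` makes no call under the default threshold, while `call_snp("A", "GGGAA", known_alt="G")` accepts the same evidence because the alternate base matches a known SNP.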

|
Bioinformatics -
15 days and 8 hours ago
Publication Date: 2010 Feb 25
PMID: 20189941
Authors: Trudgian, D. C. - Thomas, B. - McGowan, S. J. - Kessler, B. M. - Salek, M. - Acuto, O.
Journal: Bioinformatics
SUMMARY: The Central Proteomics
Facilities Pipeline (CPFP) provides identification, validation, and quantitation of peptides and
proteins from LC-MS/MS datasets through an easy-to-use web interface. It is the first analysis
pipeline targeted specifically at the needs of proteomics core facilities, reducing the
data-analysis load on staff, and allowing facility clients to easily access and work with their
data. Identification of peptides is performed using multiple search engines, their output combined
and validated using state-of-the-art techniques for improved results. Cluster execution of jobs
allows analysis capacity to be increased easily as demand grows. AVAILABILITY: Released under the
Common Development and Distribution License (CDDL) at http://cpfp.sourceforge.net/. Demonstration
available at https://cpfp-master.molbiol.ox.ac.uk/cpfp_demo CONTACT: dctrud@ccmp.ox.ac.uk.

|
Bioinformatics -
15 days and 9 hours ago
Publication Date: 2010 Feb 25
PMID: 20189940
Authors: Corel, E. - Pitschi, F. - Morgenstern, B.
Journal: Bioinformatics
MOTIVATION: Multiple sequence alignments can be constructed on the basis
of pairwise local sequence similarities. This approach is rather flexible and can combine the
advantages of global and local alignment methods. The restriction to pairwise alignments as
building blocks, however, can lead to misalignments since weak homologies may be missed if only
pairs of sequences are compared. RESULTS: Herein, we propose a graph-theoretical approach to find
local multiple sequence similarities. Starting with pairwise alignments produced by DIALIGN, we
use a min-cut algorithm to find potential (partial) alignment columns that we use to construct a
final multiple alignment. On real and simulated benchmark data, our approach consistently
outperforms the standard version of DIALIGN where local pairwise alignments are greedily
incorporated into a multiple alignment. AVAILABILITY: The prototype is freely available under GNU
Public Licence from the first author. CONTACT: ecorel@gwdg.de.

|
Bioinformatics -
15 days and 10 hours ago
Publication Date: 2010 Feb 25
PMID: 20189939
Authors: Beisser, D. - Klau, G. W. - Dandekar, T. - Mueller, T. - Dittrich, M.
Journal: Bioinformatics
MOTIVATION: Increasing quantity and quality of
data in transcriptomics and interactomics create the need for integrative approaches to network
analysis. Here we present a comprehensive R-package for the analysis of biological networks
including an exact and a heuristic approach to identify functional modules. RESULTS: The BioNet
package provides an extensive framework for integrated network analysis in R. This includes the
statistics for the integration of transcriptomic and functional data with biological networks, the
scoring of nodes as well as methods for network search and visualization. AVAILABILITY: The BioNet
package and a tutorial are available from http://bionet.bioapps.biozentrum.uni-wuerzburg.de.
CONTACT: marcus.dittrich@biozentrum.uni-wuerzburg.de,
tobias.mueller@biozentrum.uni-wuerzburg.de.
|
Bioinformatics -
15 days and 11 hours ago
Publication Date: 2010 Feb 25
PMID: 20189938
Authors: Zehetmayer, S. - Posch, M.
Journal: Bioinformatics
BACKGROUND: The statistical power or multiple Type II error rate in large scale
multiple testing problems as, for example, in gene expression microarray experiments, depends on
typically unknown parameters and is therefore difficult to assess a priori. However, it has been
suggested to estimate the multiple Type II error rate post-hoc, based on the observed data.
METHODS: We consider a class of post-hoc estimators that are functions of the estimated proportion
of true null hypotheses among all hypotheses. Numerous estimators for this proportion have been
proposed and we investigate the statistical properties of the derived multiple Type II error rate
estimators in an extensive simulation study. RESULTS: The performance of the estimators in terms of
the mean squared error depends sensitively on the distributional scenario. Estimators based on
empirical distributions of the null hypotheses are superior in the presence of strongly correlated
test statistics. AVAILABILITY: R-code (R Development Core Team, 2008) to compute all considered
estimators based on p-values and supplementary material is available from the authors' web page
http://statistics.msi.meduniwien.ac.at/index.php?page=pageszfnr CONTACT:
martin.posch@meduniwien.ac.at.
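To make the idea in the abstract above concrete (a post-hoc estimator built as a function of the estimated proportion of true null hypotheses), the following Python sketch uses one well-known estimator of that proportion, the Storey-type estimator that counts p-values above a cutoff `lam`. The plug-in formula in `plugin_type2_rate` is an assumption made for illustration: it is one simple way to turn the proportion into a Type II error estimate, not necessarily any of the estimators the authors studied, and it is unrelated to their R code.

```python
def storey_pi0(p_values, lam=0.5):
    """Estimate the proportion of true nulls as #{p > lam} / (m * (1 - lam))."""
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def plugin_type2_rate(p_values, alpha=0.05, lam=0.5):
    """Illustrative plug-in estimate of the fraction of false nulls missed.

    Assumed construction (not from the paper): estimated false nulls are
    m * (1 - pi0); estimated true rejections are the observed rejections
    at level alpha minus the expected false rejections pi0 * m * alpha.
    Returns None when no false nulls are estimated to exist.
    """
    m = len(p_values)
    pi0 = storey_pi0(p_values, lam)
    false_nulls = m * (1.0 - pi0)
    if false_nulls == 0:
        return None
    rejections = sum(1 for p in p_values if p <= alpha)
    expected_false_rejections = pi0 * m * alpha
    true_rejections = max(0.0, rejections - expected_false_rejections)
    power = min(1.0, true_rejections / false_nulls)
    return 1.0 - power
```

Both functions take a plain list of p-values, so they can be applied to the output of any per-gene test; the cutoff `lam` and the level `alpha` are tuning choices, and the abstract notes that estimator performance depends strongly on the distributional scenario.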

|
Bioinformatics -
15 days and 12 hours ago
Publication Date: 2010 Feb 25
PMID: 20189937
Authors: Wong, G. - Leckie, C. - Gorringe, K. L. - Haviv, I. - Campbell, I. G. - Kowalczyk, A.
Journal: Bioinformatics
MOTIVATION: High-density single
nucleotide polymorphism (SNP) genotyping arrays are efficient and cost-effective platforms for the
detection of copy number variation. To ensure accuracy in probe synthesis and to minimise
production costs, short oligonucleotide probe sequences are used. The use of short probe sequences
limits the specificity of binding targets in the human genome. The specificity of these short
probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence
similarity can artificially elevate or suppress copy number measurements and hence reduce the
reliability of affected probe readings. For the purpose of detecting narrow copy number variations
reliably down to the width of a single probeset, sequence similarity is an important issue that
needs to be addressed. RESULTS: We surveyed the Affymetrix Human Mapping SNP arrays for probeset
sequence similarity against the reference human genome. Utilising sequence similarity results, we
identified a collection of fine-scaled putative copy number variations between gender from
autosomal probesets whose sequence matches various loci on the sex chromosomes. To detect these
variations, we utilised our statistical approach, DRECS, and showed that its performance was
superior to, and more stable than, that of the t-test in detecting copy number variations. Through the
application of DRECS on the HapMap population datasets with multi-matching probesets filtered, we
identified biologically relevant SNPs in aberrant regions across populations with known association
to physical traits, such as height, covered by the span of a single probe. This provided empirical
confirmation of the existence of naturally occurring narrow copy number variations as well as the
sensitivity of the Affymetrix SNP array technology in detecting them. AVAILABILITY: The MATLAB
implementation of DRECS is available at http://ww2.cs.mu.oz.au/~gwong/DRECS/index.html CONTACT:
gwong@csse.unimelb.edu.au.

|
Bioinformatics -
18 days and 7 hours ago
Publication Date: 2010 Feb 24
PMID: 20185407
Authors: Loriot, S. - Cazals, F. - Bernauer, J.
Journal: Bioinformatics
SUMMARY: The ever-increasing amount of structural biological data calls for robust
and efficient software for analysis. ESBTL (Easy Structural Biology Template Library) is a
lightweight C++ library that allows the handling of PDB data and provides a data structure suitable
for geometric constructions and analyses. The parser and data model provided by this ready-to-use
include-only library allow adequate treatment of usually discarded information (insertion code,
atom occupancy...) while still being able to detect badly formatted files. The template-based
structure allows rapid design of new computational structural biology applications and is fully
compatible with the new remediated PDB archive format. It also allows the code to be easy-to-use
while being versatile enough to allow advanced user developments. AVAILABILITY: ESBTL is freely
available under the GNU General Public License from http://esbtl.sf.net. The website provides the
source code, examples, code snippets, and documentation. CONTACT: julie.bernauer@inria.fr.

|