

Bioinformatics
Life science research is becoming more and more dependent on
computational analysis in every aspect, so I will try to collect some
related links here. Please send me your
comments or suggest your favorite links.
Databases
*If you are looking for gene expression
databases, please refer to the public
database section of My microarray
software comparison page.
Amos'
WWW links page - contains almost exclusively pointers to
information sources for life scientists with an interest in biological
macromolecules
BioMOBY.org - BioMOBY
is an international research project involving biological data hosts,
biological data service providers, and coders whose aim is to explore
various methodologies for biological data representation, distribution,
and discovery.
DBCAT -
The catalog of databases
Nucleic Acids
Research January 1, 2003 issue - The Molecular Biology Database
Collection - This Collection is intended to bring fellow
scientists' attention to high-quality databases that are available
throughout the world, rather than just be a lengthy listing of all
available databases. As such, this up-to-date listing is intended to
serve as the initial point from which to find specialized databases
that
may be of use in biological research.
General databases
Protein families & sequence motifs databases
- 3 Dee
- Database of Protein Domain Definitions
- Blocks
- a protein domain database
- CluSTr
(Clusters of SWISS-PROT+TrEMBL proteins) - automatic
classification of SWISS-PROT + TrEMBL proteins into groups of related
proteins.
- HPRD (Human Protein Reference
Database) - a centralized platform to visually depict and
integrate information pertaining to domain architecture,
post-translational modifications, interaction networks and disease
association for each protein in the human proteome. All the information
in HPRD has been manually extracted from the literature by expert
biologists who read, interpret and analyze the published data. HPRD has
been created using an object oriented database in Zope, an open source
web application server, that provides versatility in query functions
and allows data to be displayed dynamically.
- InterPro
- provides an integrated view of the commonly used signature
databases
- PIR
(Protein Information Resource) - maintained by the National
Biomedical Research Foundation
- Pfam
- Protein families database of alignments and HMMs
- PRINTS - Protein
fingerprint database
- ProDom -
The protein domain database
- Proteome Analysis @ EBI -
a research-oriented initiative to utilise all the existing
resources and provide comparative analysis of the predicted protein
coding sequences of all complete genomes. The two main projects are InterPro and CluSTr.
- PRF (Protein
Research Foundation)
- PROSITE
- Database of protein families and domains
Signal
sequence/ transcription elements/ promoter analysis
- TFSEARCH
- Searching Transcription Factor Binding Sites
- TRANSFAC
(The Transcription Factor Database) - TRANSFAC is a database
on eukaryotic cis-acting regulatory DNA elements and trans-acting
factors. It covers the whole range from yeast to human.
- WWW Promoter Scan
- Predicts Promoter regions based on scoring homologies with putative
eukaryotic Pol II promoter sequences.
- WWW Signal Scan -
Finds and lists homologies of published signal sequences with the input
DNA sequence.
Protein-protein
interaction databases
- BIND -
Biomolecular Interaction Network Database is a database designed to
store full descriptions of interactions, molecular complexes and
pathways
- DIP
- Database
of Interacting Proteins catalogs experimentally determined interactions
between proteins.
- GRID (General
Repository for Interaction Datasets) - The GRID is a database
of genetic and physical interactions. It contains interaction
data from many sources, including several genome/proteome-wide studies,
the MIPS database, and BIND.
- Interact
- An Object Oriented Database for Protein-Protein Interactions.
- Pronet
Online - provides protein-protein interaction data to
biological and medical researchers in a useful, integrated fashion.
Maintained by Myriad Genetics.
- Yeast
protein complex database
Pathways databases
- BioCyc
-
BioCyc Knowledge Library is a collection of Pathway/Genome Databases.
Each database in the BioCyc collection describes the genome and
metabolic pathways of a single organism, with the exception of the
MetaCyc database, which is a reference source on metabolic pathways
from
many organisms. There are two main categories: (1)
Literature-derived Pathway/Genome Databases like EcoCyc & MetaCyc. (2)
Computationally-derived Pathway/Genome Databases, which consist of pathways
from different organisms, e.g. YeastCyc.
- BIND -
Biomolecular Interaction Network Database is a database designed to
store full descriptions of interactions, molecular complexes and
pathways
- CSNDB (Cell Signaling
Networks Database) - a data- and knowledge-base for
signaling pathways of human cells.
- Database of Quantitative
Cellular Signaling (DOQCS) - a repository of models of
signaling pathways. It includes reaction schemes, concentrations, rate
constants, as well as annotations on the models. Reference [PubMed]
- EcoCyc Encyclopedia
- Database that describes the genome and the biochemical
machinery
of E. coli, maintained by SRI International, Menlo Park, CA.
- (EMP) Enzymes
and Metabolic Pathways database - covers all aspects of
enzymology and metabolism and represents the whole factual content of
original journal publications. A metabolic part of EMP constitutes a
separate database, EMP Pathways (earlier known as MPW). It contains
more
than 3,000 metabolic diagrams.
- ExPASy
- Biochemical Pathways - Digitized version of wall
charts courtesy Boehringer Mannheim et al, divided into Metabolic
Pathways and Cellular and Molecular Processes, maintained by the Swiss
Institute of Bioinformatics, Geneva, Switzerland.
- GeNet
(Gene Networks Database) - GeNet contains the information on
functional organization of regulatory genes networks acting at
embryogenesis.
- KEGG (Kyoto
Encyclopedia of Genes and Genomes) - an effort to
computerize current knowledge of molecular and cellular biology in
terms
of the information pathways that consist of interacting molecules or
genes and to provide links from the gene catalogs produced by genome
sequencing projects.
- Kinase
Pathway Database - an integrated database concerning
completely sequenced major eukaryotes, which contains the classification
of protein kinases and their functional conservation and orthologous
tables among species, protein-protein interaction data, domain
information, structural information, and automatic pathway graph image
interface.
- MetaCyc - Metabolic
Encyclopedia - Description of over 450 metabolic
pathways and their associated enzymes, from over 150 organisms,
maintained by SRI International, Menlo Park, CA.
- MIPS
yeast pathways
- TRANSPATH
- an information system on gene-regulatory pathways. It focuses on
pathways involved in the regulation of transcription factors. Elements
of the relevant signal transduction pathways like hormones, enzymes,
complexes and transcription factors are stored together with
information
about their interaction. All data is extracted by experts from the
scientific literature.
- UM-BBD -
Microbial Biocatalysis/Biodegradation - Microbial biocatalytic
reactions and biodegradation pathways primarily for xenobiotic,
chemical
compounds.
- WIT
- a www-based system to support the curation of function assignments
made to genes and the development of metabolic models.
Structural databases
SNPs databases
Histology
databases
- Edinburgh Mouse
ATLAS (EMAP) - a series of three-dimensional models of mouse
embryos at successive stages of development, linked to a standard
anatomical nomenclature.
Languages/ Algorithms
XML for Molecular
Biology - a comprehensive list of XMLs for molecular biology
compiled by Paul Gordon
- BioCORBA.org - The BioCORBA
Project provides an object-oriented, language neutral, platform
independent method for describing and solving bioinformatic problems.
- Biodas.org - developing an Open
Source system for exchanging annotations on genomic sequence data.
- Bioinformatic Sequence Markup Language
(BSML) - an XML format that encodes biological sequence information and
includes graphical representations of biologically meaningful objects
such as sequences, genes, electrophoresis gels, and multiple
alignments.
Further information from the XML cover page.
- BioJava.org - The BioJava
Project is an open-source project dedicated to providing Java tools for
processing biological data, including objects for manipulating
sequences, file parsers, CORBA interoperability, access to ACeDB,
dynamic programming, and simple statistical routines.
- BioPerl.org - The Bioperl
Project is an international association of developers of open source
Perl tools for bioinformatics, genomics and life science research.
- BIOpolymer Markup
Language (BIOML) - an XML format designed to be used for the
annotation of biopolymer sequence information. BIOML allows the full
specification of all experimental information known about molecular
entities composed of biopolymers, for example, proteins and genes.
Further information from the XML cover page.
- BioPython.org - The Biopython
Project is an international association of developers of freely
available Python tools for computational molecular biology
- BioXML.org - Bioxml.org is a
resource to gather XML documentation, DTDs and tools for biology in one
central location. It overlaps in interest and in tools with the BioPerl project, which also hosts a
page about XML.
- cellML - XML-based markup
language being developed to store and exchange computer-based
biological
models; aimed at describing the structure and underlying mathematics of
cellular models in a very general way and has facilities for describing
any associated metadata.
- Chemical Markup Language (CML)
- an XML format for managing chemical information. Further information from the XML cover page.
- GEML (Gene Expression Markup
Language) - an Extensible Markup Language (XML)-based tag
set developed by Rosetta Inpharmatics and others in the GEML
community to provide a standard method of exchanging gene
expression data along with the associated gene and experiment
annotation.
- Molecular Dynamics
[Markup] Language (MoDL) - XML for simulation data from
molecular dynamics; MoDL provides simple constructs like atom, bond,
molecule and TRANSLATE that mark up the simulation data. Further
information from the XML
cover page.
- MSAML - An XML
for Multiple Sequence Alignments - a set of XML compliant
markup components for describing multiple sequence alignments [amino
acids and nucleic acid sequences]. Further information from the XML cover page.
- OpenBQS -
Bibliographic Query Service is aimed at providing access to
heterogeneous bibliographic databases and the development of
interoperable clients that make use of this access.
- Protein Extensible
Markup Language (PROXIML) - Problems associated with
existing
protein data formats, such as PDB and mmCIF, indicate a need for a more
self-describing and machine-readable approach to exchanging
protein-related data. XML is an ideal solution for this particular
problem. PROXIML can encode the relevant details of protein structure
in
a more robust and well-structured fashion than other currently
available
data formats
- StarDOM -
Transforming Scientific Data into XML - a software package
to
transform data provided in the Self Defining Text Archival and
Retrieval
(STAR) format into XML. This opens new possibilities for visual
editing,
archiving, parsing and structured queries of structural biology data.
Further information from the XML cover page.
- Systems
Biology Markup Language (SBML) - XML-based exchange formats
for describing cellular models; aimed at exchanging information about
pathway and reaction models between several existing applications
Standards
- OmniGene -
OmniGene is an open source, open standards project aimed at helping
bioinformatics professionals and students exchange biological data.
Ontology
- Gene Ontology -
The goal of the Gene Ontology (GO) Consortium is to produce a
controlled vocabulary that can be applied to all organisms even as
knowledge of gene and protein roles in cells is accumulating and
changing. GO provides three structured networks of
defined terms to describe gene product attributes.
- Gene Ontology Annotation
(GOA) @ EBI - GOA is a project run by the European
Bioinformatics Institute that aims to provide assignments of gene
products to the Gene Ontology
(GO) resource.
Software/ programs/ online analysis
- ACGT
(A Comparative Genomics Tool) - a genomic DNA sequence
comparison viewer and analyzer. It can read a pair of DNA sequences in
GenBank, Embl or Fasta formats, with or without a comparison file, and
provide users with many options to view and analyze the similarities
between the input sequences. Reference [PubMed]
- Biocatalog
- a software directory of general interest in molecular biology and
genetics maintained by EBI.
- Bioinformatics.org
- a non-profit, academe-based organization committed to opening
access to bioinformatics research projects, providing Open Source
software for bioinformatics by hosting its development, and keeping
biological information freely available.
- Bionavigator.com
- a famous online analysis provider
- Biowire.com -
an online analysis provider
- BlastReport
- A Perl script to facilitate the use of sequence databases for mapping
and clustering
- Coded
Electronic Life Library (CELL) - cross-referencing of
biological entity information from numerous databases in an
ontology-based system by Incellico.
- Cyspred -
Predictor of the bonding state of cysteines in proteins
- Deep Computing
Institute - Some programs offered by the IBM bioinformatics
initiative
- EMBOSS -
The European Molecular Biology Open Software Suite: a package of
high-quality FREE Open Source software for sequence analysis.
- EMBOSS for Windows - an attempt to make the
EMBOSS package run on PCs equipped with Microsoft Windows
- ExPASy Proteomics tools
-
A large collection of software for proteomics analysis on the ExPASy (Expert Protein Analysis System)
server, maintained by the Swiss
Institute of Bioinformatics (SIB).
- Folding@home
- A Stanford project that simulates protein folding using distributed
computing. You can contribute by donating your computer's idle time!
- GCG -
Genetics Computer Group: one of the most famous software suites for
bioinformatics.
- GeneMine
- a free sequence analysis and visualization program that makes full
use
of analysis servers across the Internet
- Genome@home
- Genome@home aims to design new genes that can form working proteins in
the cell. Genome@home uses a computer algorithm (SPA), based on the
physical and biochemical rules by which genes and proteins
behave,
to design new proteins (and hence new genes) that have not been found
in
nature.
- Graphviz
- open source graph drawing software
- Intel
Philanthropic Peer-to-Peer Program - The United
Devices
Cancer Research Program, which searches for new drugs to treat leukemia
using distributed computing.
- Mathtools.net
- provides a list of biology-related programs
- MatInd and
MatInspector - fast and sensitive tools for detection of
consensus matches in nucleotide sequence data
- Model@home
- is a distributed computing environment that follows the spirit of the
famous Seti@Home project, but is not tied to a specific application;
mainly used for large scale molecular modeling
- Pathway
Hunter Tool (PHT) - Shortest path analysis for metabolomics
networks.
- PIMRider
- a proteomics software platform written by Hybrigenics, dedicated to the
exploration of protein pathways.
- Psipred
- protein secondary structure prediction server
- Swiss-PdbViewer -
a good program for analyzing protein structure; tightly linked with
SWISS-MODEL
- SWISS-MODEL
- An Automated Comparative Protein Modelling Server
- UK human
genome mapping centre - offers a lot of bioinformatics and
linkage analysis programs for registered users
- Vector
NTI suite 7.0 - a comprehensive and seamlessly integrated
sequence analysis software toolset for molecular biologists
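The Pathway Hunter Tool entry above mentions shortest path analysis on metabolic networks. As a rough illustration of that general idea only (not PHT's actual method, which is not described here), the sketch below runs a plain breadth-first search over a made-up directed network. The function name `shortest_path`, the `toy_network` dictionary, and the node labels A to F are all hypothetical.

```python
from collections import deque


def shortest_path(network, source, target):
    """Return a shortest path (fewest steps) from source to target, or None.

    `network` maps each node to the list of nodes reachable in one step.
    This is an unweighted breadth-first search, used here as an assumption;
    real pathway tools may weight or filter edges differently.
    """
    if source == target:
        return [source]
    previous = {source: None}  # records how each visited node was reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in network.get(node, ()):
            if neighbor in previous:
                continue  # already visited via an equally short or shorter route
            previous[neighbor] = node
            if neighbor == target:
                # Walk back from target to source, then reverse.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # target is not reachable from source


# Hypothetical toy network: nodes stand in for metabolites, edges for reactions.
toy_network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}

print(shortest_path(toy_network, "A", "F"))
```

Breadth-first search is enough when every reaction step counts equally; if edges carried weights, a weighted method such as Dijkstra's algorithm would be the usual substitute.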
Network/
Pathway
in silico
biology/ computational biology
- The Physiome Project - The
PHYSIOME PROJECT is an integrated multi-centric program to design,
develop, implement, test and document, archive and disseminate
quantitative information and integrative models of the functional
behavior of organelles, cells, tissues, organs, and organisms.
- Virtual Cell Project -
a remote user modeling and simulation environment utilizing Java's
Remote Method Invocation (RMI), developed by The National Resource for
Cell Analysis and Modeling (NRCAM)
Education
- A list of
bioinformatics courses - a good list of courses
maintained by WenTian Li at Rockefeller University
- Bioinformatics
@ NCBI - A simple introduction to bioinformatics
- Bioinformatics
Seminar Series - Online seminars at University of Michigan.
- GeneEd - offers
online courses for profit
- Molecular
Modeling @ NCBI - A simple introduction to protein structure
modeling
- Mathematical
Sciences Research Institute (MSRI) - provides a number of
online video seminars on mathematical analysis on genomics, microarray,
bioinformatics, proteomics, linkage analysis and genome mapping
- Online
Courses and Tutorials - a good list of online courses
maintained by Bioinformatik.de
- Protein sequence
analysis - A practical guide - The online exercises for the
book
"Introduction
to bioinformatics"
- S-Star.org - The S-Star group of
teaching institutions have formed a global alliance to provide a
global,
unified bioinformatics learning environment (GLOBULE) made up of
modular
courses in the disciplines of genomics, bioinformatics, and medical
informatics. Their mission is to provide anyone with an introductory
course in bioinformatics.
- Statistics
in Genetics - Dr. Terry Speed's statistics course notes on
genetics, covering HMMs, MCMCs, sequence alignment, motif finding, gene
finding and microarray analysis.
- Virtual Bioinformatics
Distance Learning - Bioinformatics and Functional genomics
courses offered by IMC Bioinformatics, University of Tampere
- What
is Bioinformatics - An introductory article by Mark Gerstein at
Yale
University.
Conference
Resources
Companies
and Institutes
- Accelrys
- a
company formed by MSI, Synopsys, Oxford Molecular, and GCG to provide informatics tools.
- Ariadne
Genomics Inc - develops tools for systems biology:
proprietary
natural language processing (NLP) and statistical algorithms, knowledge
bases and software to analyze molecular networks.
- Compugen
- Compugen is a pioneer in the fields of computational genomics and
proteomics. The company combines the disciplines of mathematics and
computer science with molecular biology to improve the understanding of
genomics and proteomics, the study of genes and proteins.
- Computational and
Applied Genomics Program from Duke University
- Deep Computing
Institute - IBM bioinformatics initiative.
- European
Bioinformatics Institute
- Jena
Centre for Bioinformatics (JCB) - aims to promote
inter-disciplinary research and to establish training courses in
bioinformatics in the Jena region. It seeks to stimulate collaboration
between Computer Scientists and Mathematicians on the one side and
Biologists, Chemists, Physicists and Physicians on the other side.
- Lion
Bioscience - developer of many good software products such as SRS
- Open
Bioinformatics Foundation - a non-profit, volunteer-run
organization focused on supporting open source programming in
bioinformatics.
- Ocimum
Biosolutions - Ocimum Biosolutions is a life sciences
contract
research and development company with competencies in Bioinformatics,
Genomics, Proteomics and custom contract research services, and
operations in both the USA and India.
- NCBI
(National Center for Biotechnology Information)
- Physiome
Sciences - aims to help pharmaceutical companies develop
better drugs faster through the use of biological simulations.
- UK human
genome mapping centre - offers a lot of bioinformatics and
linkage analysis courses for registered users
- Research
Group Bioinformatics/AG Bioinformatik - their focus is
regulatory genomic signals and regions, in particular those that govern
transcriptional control. They analyze and characterize the underlying
sequence elements and their context and develop database and software
tools for their identification in newly unravelled genomic sequences.
- Systems
Biology Workbench Development Group - their mission is to
develop an integrated, easy-to-use environment, the workbench, which
will enable biologists to create, manipulate, display and analyze
biological models at molecular, cellular and multicellular levels.
- Swiss
Institute
of Bioinformatics (SIB)
- The
Bioinformatics Resource of the University of Hong Kong
-
provides bioinformatics services to users at the University
of Hong Kong
- The Sanger
Centre - a genome research centre founded by the Wellcome
Trust and the Medical Research Council.
Associations
and Societies
Laboratories
and People
- Genetic Circuits Research
Group - Dr. Bernhard O. Palsson's group focuses on in silico
modeling of genetic circuits involving metabolism and gene regulation.
- Gerstein, Mark - Dr.
Gerstein's group does research in the emerging field of
bioinformatics, using computation to analyze genome sequences,
expression datasets, and macromolecular structures.
Literature
Please refer to the bioinformatics
section of My functional genomics
journal watch for the literature archives.
- Bioinform
- The Global Bioinformatics News Service
- BioInformer
- quarterly newsletter which focuses on bioinformatics research,
developments, and services at the European Bioinformatics Institute
(EBI) and elsewhere.
- Bioinformatics
- an international journal published by Oxford University Press
- Bio-IT
World magazine
- in
silico
Biology - an international journal on computational
molecular
biology
- Genome Research
- BMC Bioinformatics
- Genome Biology
- BMC Genomics
- Journal of Computational Biology
- Briefings in Bioinformatics
Books
Please refer to My
favorite bioinformatics books & data
analysis reference book section (under My
favorite microarray books) for my choice of bioinformatics books
and
data analysis references!
Glossaries
Please refer to My
functional genomics and data mining glossaries
Genomics
Glossary - A comprehensive genomic glossary draft covering
many aspects, maintained by Mary Chitty of Cambridge Healthtech Institute
last updated: 11 Sep 2004