Please contact me if you have any suggestions for this list.
Definition of data mining software - specific analysis software
Suggested readings
Data mining software - specific analysis software (in alphabetical order)
[ 50-50 MANOVA | Affymate | AIDA Array Compare | ANOVA programs for microarray data | ArrayMiner | Cleaver | Cluster Identification Tool (CIT) | CLUSFAVOR (CLUSter and Factor Analysis Using Varimax Orthogonal Rotation) | Coupled Two-Way Clustering (CTWC) | Cyber T | FDR controlling procedures | FEXAT | Gene Cluster | General Hidden Markov Model library (GHMM) | GeneViz | GIMM | GLR | INCLUSive | LACK | Machaon Clustering and Validation Environment (CVE) | Microhelper | Multi microarray normalisation | NIA Microarray ANOVA Tool | Onto-Tools | Open source clustering software | PAM (Prediction Analysis for Microarrays) | Probe Profiler | R cluster | SAM (Significance Analysis of Microarrays) | supervised Network Self-Organized Map (sNet-SOM) | SNOMAD: Standardization and NOrmalization of MicroArray Data | SparseLOGREG | TableView | Venn Mapper | VERA & SAM ]
| Product | Company/Institute | Interface/Operating System | Features | Price | Remarks |
|---|---|---|---|---|---|
| 50-50 MANOVA | MATFORSK, Norwegian Food Research Institute | Windows | New MANOVA method that handles collinear responses. Calculates adjusted p-values in general linear models by rotation testing. A false discovery rate function will be added to the program in the future. | Free | Download: 50-50 MANOVA; Rotation Tests |
| Affymate | Array Genetics | Web | Designed to rapidly analyze multiple pairs of DNA microarray data sets | No pricing information available | Demo |
| AIDA Array Compare | Raytest GmbH | Windows | Compares one master array with client arrays | No pricing information available | |
| ANOVA programs for microarray data | Churchill Statistical Genetics Group, The Jackson Laboratory | Matlab | Performs analysis of variance (ANOVA) on microarray data | Free | reference 1 [pdf]; reference 2 [pdf] |
| ArrayMiner 2 | Optimal Design | Windows/Mac/add-on to GeneSpring | Proprietary genetic algorithm for clustering; a number of visualization tools; a new class of clustering algorithm is available in version 2 | See vendor website | demo download; manual; paper comparing with K-means; version 2 white paper; also available as an optional add-on to GeneSpring |
| Cleaver 1.0 | Stanford Biomedical Informatics | Web | Classification (discriminant analysis), K-means clustering, PCA | Free | documentation; reference [PubMed] |
| Cluster Identification Tool (CIT) | Van Andel Research Institute | Windows | Statistical discrimination metric and permutation analysis to identify clusters of genes, or individual genes, that best differentiate experimental groups | Free | integrates with Cluster and TreeView; download; sample data; supplemental document |
| CLUSFAVOR 6.0 (CLUSter and Factor Analysis Using Varimax Orthogonal Rotation) | Molecular Biology Computation Resource, Baylor College of Medicine | Windows 95/98/NT/2000/XP | Performs cluster and factor analysis | Free for academic users | download; user guide; features; FAQ; troubleshooting; reference [PubMed] |
| Coupled Two-Way Clustering (CTWC) | Department of Physics of Complex Systems, Weizmann Institute of Science | Web | Coupled two-way clustering | Free for academic users | registration required; reference [PubMed][pdf]; algorithm; server: reference [PubMed] |
| Cyber T | UC Irvine | Linux/Unix with the R statistical language, or via the web interface | t-test for statistically significant differences between sets of array samples; Bayesian probabilistic framework to estimate the variance among replicates | Free | help; tutorial; download R library only; how to install; download entire web interface; CyberT has been incorporated into the GeneX database and analysis package; reference [PubMed] |
| FDR controlling procedures | Anat Reiner, Daniel Yekutieli and Yoav Benjamini | Windows | Adjusts p-values generated in multiple hypothesis testing of gene expression data obtained from cDNA microarray experiments | Free | download; source code; reference [PubMed][doc] |
| FEXAT | Kraft P, Schadt EE, Aten J, Horvath S | Linux | A family-based test for correlation between gene expression and trait values | Free | download; reference [PubMed]; help file |
| Gene Cluster 2.0 | Whitehead Institute Center for Genome Research | Java | Filters and preprocesses data in a variety of ways; self-organizing maps; supervised classification by weighted voting (WV) and k-nearest neighbors (KNN) algorithms; gene selection and permutation test methods | Free for academic users | download; manual; FAQ |
| General Hidden Markov Model library (GHMM) | Max Planck Institute for Molecular Genetics, Department of Computational Molecular Biology; University of Cologne | C library with an additional C++ API | Hidden Markov models to analyze gene expression time-course data | LGPL | Reference [PubMed] |
| GeneViz | ContentSoft AG | Windows | Double Conjugated Clustering (DCC), which clusters samples and genes simultaneously; Singular Value Decomposition (SVD) sorting | No pricing information available | demo available; publications; brochures |
| GIMM | University of Cincinnati Medical Center | Windows | A clustering procedure based on the concept of Bayesian model averaging and a precise statistical model of expression data | GNU GPL | download; source code; reference 1 [PubMed]; reference 2 [PubMed] |
| GLR | Wang S, Ethier S; Department of Mathematics, University of Utah | Windows | A statistical analysis program to identify differentially expressed genes from microarray data; implements a generalized likelihood ratio test based on a two-component model | Free | download; reference [PubMed] |
| INCLUSive | Katholieke Universiteit Leuven | Web | A suite of web-based tools aimed at the automatic multistep analysis of microarray data (clustering and motif finding). Currently, adaptive quality-based clustering, retrieval of upstream sequences and the motif sampler are accessible from the website. | Free | demo |
| LACK | Charles C Kim & Stanley Falkow, Stanford University | Windows & Perl source code | Calculates the statistical significance of apparent lexical bias in microarray datasets | Free | download; Perl source code; manual; sample data; reference [PubMed][pdf] |
| Machaon Clustering and Validation Environment (CVE) | Nadia Bolshakova | Windows | A data mining system that allows the application of different clustering and cluster validity algorithms to DNA microarray data | Free? | Program available upon request |
| Microhelper 1.02 | Chang Bioscience | Windows NT/Mac OS X | A tool for merging, filtering, normalizing and transforming data, handling missing values, selecting subsets, removing controls and annotating data | US$65 | Demo available (Windows, Mac) |
| Multi microarray normalisation | Keith Vass and Ernst Wit | Web | An ANOVA-based normalization of dye-swapped experiments, taking pin-tip effects into account | Free | Detailed description |
| NIA Microarray ANOVA Tool | National Institute on Aging (NIA), NIH | Windows, Sun Solaris, Linux | A web-based tool for analysis of variance (ANOVA) of gene microarray data | Free? | help; download |
| Onto-Tools | Intelligent Systems and Bioinformatics Laboratory, Computer Science Department, Wayne State University | Web | A set of four integrated databases: Onto-Express translates differentially regulated genes into functional profiles; Onto-Compare compares any sets of commercial or custom arrays; Onto-Design selects genes that represent given functional categories; Onto-Translate translates lists of accession numbers, UniGene clusters and Affymetrix probes into one another | Free | Registration required; reference [PubMed] |
| Open source clustering software | Laboratory of DNA Information Analysis, Human Genome Center, Institute of Medical Science, University of Tokyo | Windows, Mac OS X, Linux, Unix | k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs | Covered by the original Cluster/TreeView license | Reference [PubMed][pdf] |
| PAM (Prediction Analysis for Microarrays) | Tibshirani Lab, Department of Statistics, Stanford University | Excel add-in/R package | Performs sample classification from gene expression data; estimates prediction error via cross-validation; provides a list of significant genes whose expression characterizes each diagnostic class | Free for academic users | Excel add-in coming soon; reference [PubMed][pdf] |
| Probe Profiler | Corimbia Inc. | Windows | Assesses Affymetrix data quality by identifying bad chips or probe sets; analyzes groups of chips | Not available | brochure and references |
| R cluster | National Center for Genome Resources (NCGR) GeneX analysis server | Web | Web interface to a collection of clustering routines written in the R statistical programming language | Free | help; tutorial; a demo of permutation-based clustering is available |
| SAM (Significance Analysis of Microarrays) | Tibshirani Lab, Department of Statistics, Stanford University | Excel add-in/Web | Correlates gene expression data with a wide variety of clinical parameters, including treatment, diagnostic categories, survival time and time trends; provides an estimate of the false discovery rate for multiple testing | Free for academic users | Excel add-in (registration needed); reference [PubMed][pdf] |
| supervised Network Self-Organized Map (sNet-SOM) | Department of Medical Physics, School of Medicine, University of Patras, Greece | Source code available | Adaptively determines the number of clusters through a dynamic extension process, driven by an inhomogeneous measure that tries to balance unsupervised, supervised and model-complexity criteria | ? | Reference [PubMed] |
| SNOMAD: Standardization and NOrmalization of MicroArray Data | Pevsner Lab, Johns Hopkins University School of Public Health | Web | A collection of algorithms for the normalization and standardization of DNA microarray data; most of the transformations within SNOMAD are aimed at refining paired microarray data | Free | - |
| SparseLOGREG | S. K. Shevade and S. S. Keerthi | Linux/Unix | A new, efficient algorithm for the sparse logistic regression problem, applicable to real-world problems such as identifying marker genes and building classifiers for cancer diagnosis from microarray data | Free? | Reference [PubMed] |
| TableView | Center for Computational Genomics and Bioinformatics, University of Minnesota | Java | A generalized scientific visualization program for exploring various biological data, including EST, SAGE, microarray and annotation data | Free | Reference [PubMed] |
| Venn Mapper | Smid M, Dorssers LC, Jenster G; Department of Pathology, Josephine Nefkens Institute, The Netherlands | Windows | Compares heterologous microarray data sets based on the number of common differentially expressed genes | Free | Download; Manual; Reference [PubMed] |
| VERA & SAM | Institute for Systems Biology | Windows/Linux/Unix | VERA (Variability and ERror Assessment) estimates error model parameters from replicated, preprocessed experiments; SAM (Significance of Array Measurement) uses the error model to improve the accuracy of the expression ratio and assigns each gene a value, lambda, indicating the likelihood that the gene is differentially expressed | Free | download; Windows documentation; UNIX documentation; source code |
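Several entries in the table (FDR controlling procedures, SAM, and the planned false discovery rate function in 50-50 MANOVA) deal with controlling the false discovery rate when thousands of genes are tested at once. As a rough illustration only, the sketch below computes Benjamini-Hochberg step-up adjusted p-values, one standard FDR procedure; it is not taken from any of the listed tools, which may use other procedures (SAM, for example, estimates the FDR by permutation). The function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order.

    Illustrative sketch only; not the implementation of any listed tool.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # keeping a running minimum so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes
    print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```

Genes whose adjusted value is at most a chosen level q (for example 0.05) would then be reported; under independence of the tests (or certain kinds of positive dependence) this keeps the expected FDR at or below q.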
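k-means clustering appears in several entries (Cleaver, Open source clustering software, R cluster). The following is a minimal sketch of Lloyd's k-means algorithm on small expression profiles, assuming Python 3.8+ (for `math.dist`) and a deterministic initialization that uses the first k profiles as starting centroids; the listed tools typically use random or repeated initializations instead. The function name `kmeans` and the example profiles are hypothetical.

```python
import math


def kmeans(profiles, k, max_iter=100):
    """Lloyd's k-means on equal-length expression profiles (tuples of floats).

    Sketch only: centroids start at the first k profiles rather than at
    random, so the result depends on input order.
    """
    centroids = [tuple(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid
        # (Euclidean distance; ties go to the lowest cluster index)
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical two-condition profiles for four genes
    genes = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    labels, centroids = kmeans(genes, k=2)
    print(labels, centroids)
```

Because the loop stops as soon as an assignment pass changes nothing, the returned labels and centroids are consistent with each other; with a poor starting set of centroids, however, it can still converge to a local optimum, which is why repeated random starts are common in practice.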