My Microarray Software Comparison - R packages for microarray analysis




Please contact me if you have any suggestions for this list.



Definition of R packages for microarray analysis

Any R package that is written specifically for, or is otherwise useful for, microarray data analysis. The R environment is required to run these packages. Some S-plus programs are also listed here.

Suggested readings

  1. Dalgaard P. Introductory Statistics with R. Springer Verlag, 2002.
  2. Krause A, Olson M. The Basics of S and S-Plus (Statistics and Computing). Springer Verlag, 2000.
  3. Venables WN, Ripley BD. Modern Applied Statistics With S-Plus (Statistics and Computing). Springer Verlag, 1999.
  4. Selvin S. Modern Applied Biostatistical Methods: Using S-Plus. Oxford University Press, 1998. (I highly recommend this book because it offers (I) comprehensive coverage of statistical analysis topics carried out in S-plus/R, which makes a great refresher on the basics, and (II) the step-by-step command lines for every analysis, so you can learn to operate the program while reviewing the statistical background.)

General R packages useful for microarray analysis (in alphabetical order)

 
Package | Author | Features | Licence | Remarks
amap (Another Multidimensional Analysis Package) | Antoine Lucas | Hierarchical clustering (optimised for memory), generalised PCA and graphics for PCA | ? | download from author's site (unix/linux); manual; link at CRAN
cclust (Convex Clustering Methods and Clustering Indexes) | Evgenia Dimitriadou | Convex clustering methods, including the k-means algorithm, the on-line update algorithm (hard competitive learning) and the neural gas algorithm (soft competitive learning), and calculation of several indexes for finding the number of clusters in a data set | GNU GPL (version 2 or later) | download (unix/linux) (windows); index; manual
CGH-Miner | Stanford University | Identifies DNA copy number alterations for CGH arrays using the "Cluster Along Chromosomes (CLAC)" method | ? | download (windows); manual; reference [pdf]
cluster | S original by Peter Rousseeuw, Anja Struyf and Mia Hubert; R port by Kurt Hornik and Martin Maechler | Functions for cluster analysis | GNU GPL (version 2 or later) | download (unix/linux) (windows); index; manual
e1071 | Evgenia Dimitriadou, Kurt Hornik, Friedrich Leisch, David Meyer and Andreas Weingessel | Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, ... | GPL version 2 | download (unix/linux) (windows); index; manual
mclust | C. Fraley and A.E. Raftery; R port by Ron Wehrens | Model-based cluster analysis | Permission granted for unlimited redistribution for non-commercial use only | download (unix/linux) (windows); index; manual; website
multiv (Multivariate Data Analysis Routines) | S original by F. Murtagh; R port by Kurt Hornik, Friedrich Leisch and Achim Zeileis | Multivariate data analysis routines, including hierarchical clustering, PCA, Sammon mapping and correspondence analysis | Free redistribution for non-commercial purposes | download (unix/linux) (windows); index; manual
mva | - | Classical multivariate analysis; contains functions for hierarchical and k-means clustering, PCA, and dendrogram and heatmap drawing | GNU GPL (version 2 or later) | a basic component of R
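
Several of the packages above (cclust and mva, for example) list k-means clustering among their features. For readers who have not met the method, here is a minimal sketch of the basic iteration, written in Python purely as a language-neutral illustration. It is not code from any of the listed R packages. The function name kmeans_sketch, the choice of the first k points as starting centroids, and the toy expression profiles are all assumptions made for this example.

```python
def kmeans_sketch(points, k, iterations=100):
    """Plain k-means (Lloyd's iteration) on a list of equal-length tuples.

    Assumption for illustration: the first k points are used as the
    initial centroids, which keeps the example deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the iteration has converged
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical toy data: four "genes" measured under two conditions.
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centres = kmeans_sketch(profiles, k=2)
print(labels)
print(centres)
```

In practice the R packages listed above would be used instead; they typically add better initialisation and cluster-number indexes that this sketch leaves out.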

R packages for microarray analysis (in alphabetical order)

Package | Author | Features | Licence | Remarks
affy (Methods for Affymetrix Oligonucleotide Arrays) | Rafael A. Irizarry, Laurent Gautier; Biostatistics Department, Johns Hopkins University | Methods for the analysis of Affymetrix oligonucleotide array data | GNU GPL (version 2 or later) | description [pdf]; affy is now a part of the BioConductor project
ANOVA model for time course experiment | Park T et al. | A statistical test procedure based on the ANOVA model to identify genes that have different gene expression profiles among experimental groups in time-course experiments | - | reference [PubMed]; available upon request
BioConductor | many | An open source software project with several goals. Main goals: providing infrastructure, in terms of design and software, for analysing genomic data; some form of graphical user interface for selected libraries; and a mechanism for linking together different groups with common goals | GNU GPL (version 2 or later) | current released packages; current developmental packages; contributed packages; FAQ; vignettes; short courses (very useful!); research talks; an excellent introductory tutorial by Chis Bye; GUI for package Limma
BUM | Pounds S, Morris SW; Department of Biostatistics, St. Jude Children's Research Hospital | Estimates the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values | ? | download (S-plus); user guide; reference [PubMed]
CTC (Cluster and Tree Conversion) | Antoine Lucas | Exports R tables to Xcluster and Cluster; imports Xcluster and Cluster output to R | ? | download (unix/linux) (windows); manual
CyberT | Tony Long and Harry Mangalam (UC Irvine) | t-test for statistically significant differences between sample sets for arrays; Bayesian probabilistic framework to estimate the variance among replicates | GNU GPL (version 2 or later) | download; help
Emerging Patterns | Boulesteix AL, Tutz G, Strimmer K | A CART-based approach to discover emerging patterns (EPs) in microarray data. The method grows decision trees from which the EPs are extracted, combining pattern search with a statistical procedure based on Fisher's exact test to assess the significance of each EP. Sample classification based on the inferred EPs is then performed using maximum-likelihood linear discriminant analysis | GNU GPL | R code; readme; examples; reference [PubMed]
EMV | Raphael Gottardo | Estimation of missing values in a matrix by a k-nearest-neighbour algorithm | GPL version 2 or later | download (unix/linux) (windows); manual; reference [PubMed][pdf]
FDR controlling procedures | Anat Reiner, Daniel Yekutieli and Yoav Benjamini | Adjusts p-values generated in multiple hypothesis testing of gene expression data obtained from cDNA microarray experiments | - | download (R) (S-plus); reference [PubMed][doc]
FEXAT | Kraft P, Schadt EE, Aten J, Horvath S | A family-based test for correlation between gene expression and trait values | free | download; reference [PubMed]
GeneClust | Kim-Anh Do | A tool for exploratory analysis of gene expression microarray data; hierarchical clustering and gene shaving; simulation to assess the clustering performance | ? | requires Unix/Linux or Windows 2000 running S-plus
GeneSOM | Jun Yan | Clustering genes using a self-organizing map | GNU GPL (version 2 or later) | download (unix/linux) (windows); index; manual
GeneTS | Wichert S, Fokianos K, Strimmer K | Functions useful for microarray time series analysis, in particular cell cycle analysis and inferring graphical models from microarray data | GNU GPL | download (unix/linux) (windows); reference [PubMed]
GIN (Gene Index) | LeBlanc M et al. | A gene index technique that generalizes methods that rank genes by their univariate associations to patient outcome. Genes are ordered by simultaneously linking their expression both to patient outcome and to a specific gene of interest | - | download; reference [PubMed]
HighProbability | David R. Bickel | Estimates which genes have frequentist or Bayesian probabilities of differential expression at least as great as a specified threshold, given a list of p-values | Mozilla Public License 1.1 (http://www.mozilla.org/MPL/) | source; windows binary; manual
impute | Trevor Hastie, Robert Tibshirani, Balasubramanian Narasimhan, Gilbert Chu | Imputation for microarray data (currently KNN only) | GPL2.0 | download (unix/linux) (windows); index; manual
LogitBoost | Marcel Dettling and Peter Bühlmann | A feature preselection method, a more robust boosting procedure and a new approach for multi-categorical problems in supervised classification | Free | download (unix/linux) (windows); manual [ps][pdf]; reference [PubMed][pdf][ps]
mixture modelling | Debashis Ghosh | Mixture modelling of gene expression data from microarray experiments | ? | download; paper (pdf), (ps); requires mva and mclust
MLE adjustment for signal censoring | Ernst Wit | Calculates the maximum likelihood estimate of the parameters of a Gamma(alpha, beta) pixel intensity model when only the mean, median, variance and number of pixels are given | Free? | reference [PubMed]
OOMAL (Object-Oriented Microarray Analysis Library) (requires S-PLUS) | MD Anderson Cancer Center, The University of Texas | Object-oriented library for analyzing microarray data in S-PLUS; flexible tools for loading raw quantification data from a variety of microarray formats, normalization, identification of differentially expressed genes, and classification and discrimination between samples | ? | download source code; documentation
PAM (Prediction Analysis for Microarrays) | Tibshirani Lab, Department of Statistics, Stanford University | Performs sample classification from gene expression data; estimates prediction error via cross-validation; provides a list of significant genes whose expression characterizes each diagnostic class | GPL2.0 | download (unix/linux) (windows); manual; paper (pdf); documentation on nearest shrunken centroid classification; sample plots; reference [pdf]
permax | Robert J. Gray | A library of 7 functions intended to facilitate certain basic analyses of DNA array data, especially comparisons of expression levels between two types of tissue | GNU GPL 2 | download (unix/linux) (windows); index; manual
phyloarray | Kurt Sys | Software to process data from phylogenetic or identification microarrays. At present it is rather limited; the focus is on a fast and easy way of calculating background values by interpolation and of plotting melting curves. The functions for reading the data are similar to those used in package 'sma' (statistical microarray analysis) | GNU GPL 2 | download (unix/linux) (windows); index; manual
POE (Probability of Expression) | Elizabeth Garrett, Jiang Hu, Giovanni Parmigiani, Rob Scharpf | Statistical approaches to molecular classification that emphasize simple molecular profiles based on latent categories signifying under-, over- and baseline expression | GNU GPL 2 | download (linux); reference [PubMed][pdf]
qvalue | John D. Storey | Calculates q-values in multiple testing situations | ? | download source code (please send the author an email with "qvalue download" in the subject line); manual
R/maanova | Gary Churchill's Statistical Genetics Group, The Jackson Laboratory | An extensible, interactive environment for the analysis of variance on microarray data | free for academic use | registration required before download; reference 1 [pdf]; reference 2 [pdf]
SMA (Statistics for Microarray Analysis) | Sandrine Dudoit, Yee Hwa (Jean) Yang, Benjamin Milo Bolstad (UC Berkeley) | Simple functions for exploratory microarray analysis, M-A plots and lowess curve fitting; handles replicate array data by Bayesian methods | GNU GPL (version 2 or later) | download (unix/linux) (windows); help; index; manual; paper 1, 2, 3
SMA extension (com.braju.sma) | Henrik Bengtsson | Extensions of SMA | ? | download (unix/linux) (windows); documentation; presentation; requires the SMA library and R.classes to be installed
Spot | CSIRO Mathematical and Information Sciences | A software package for the analysis of microarray images; automatic grid location; flexible spot segmentation; morphological background estimation | Commercial package; price depends on number of users | user guide; installation instructions; demo version available upon registration
Statomics | David Bickel | A software suite for the statistical analysis of genomic and proteomic data | ? | source code; reference [PubMed][pdf]
VSN | Wolfgang Huber; Molecular Genome Analysis, National Cancer Research Institute of Germany | Variance stabilization applied to microarray data calibration and to the quantification of differential expression | Free for academic use | reference [PubMed][pdf]
YASMA (Yet Another Statistical Microarray Analysis) | Lorenz Wernisch and others | Correlation between array replicates, ANOVA analysis, p-values for ANOVA analysis, standard t-tests | ? | download (unix/linux); tutorial; related statistical notes; reference [PubMed][pdf]
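
Several entries in the table above adjust p-values for multiple testing. The best-known adjustment of this kind is the Benjamini-Hochberg step-up procedure, sketched below in Python as a language-neutral illustration. This is not code from the "FDR controlling procedures" package or from qvalue, and this page does not state which variants those packages implement. The function name bh_adjust and the example p-values are assumptions made for this sketch.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    With m tests and p-values sorted ascending as p_(1) <= ... <= p_(m),
    the adjusted value for rank i is min over j >= i of (m / j) * p_(j),
    capped at 1.
    """
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # cap at 1 and enforce monotonicity
    # Walk from the largest p-value down to the smallest.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(raw))
```

Genes whose adjusted p-value falls below a chosen level (0.05, say) are declared significant while controlling the expected proportion of false discoveries at that level, under the procedure's usual independence assumptions.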


last updated: 16 Jul 2004