Title: | CCMN and Other Normalization Methods for Metabolomics Data |
---|---|
Description: | Implements the Cross-contribution Compensating Multiple standard Normalization (CCMN) method described in Redestig et al. (2009) Analytical Chemistry https://doi.org/10.1021/ac901143w and other normalization algorithms. |
Authors: | Henning Redestig |
Maintainer: | Henning Redestig |
License: | GPL (>= 3) |
Version: | 0.0.21 |
Built: | 2024-10-30 02:59:32 UTC |
Source: | https://github.com/hredestig/crmn |
Subset a data set to contain only the analytes.
analytes(object, standards=NULL, ...)
object |
an |
standards |
a logical vector indicating which rows are internal standards |
... |
not used |
subsetted dataset
Henning Redestig
data(mix)
analytes(mix)
analytes(exprs(mix), fData(mix)$tag == 'IS')
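For matrix input, the subsetting is driven by a logical vector that flags the internal standard rows. The base-R sketch below shows that row selection on a small made-up matrix; the row names and values are hypothetical and the code does not call crmn.

```r
# Toy intensity matrix (made-up values): rows are metabolites, columns samples
Y <- matrix(c(10, 12, 11,
              200, 210, 190,
              5, 6, 7,
              50, 55, 45),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("IS_a", "glucose", "IS_b", "citrate"),
                            paste0("s", 1:3)))

# Logical vector flagging internal standards, like fData(mix)$tag == 'IS'
isIS <- c(TRUE, FALSE, TRUE, FALSE)

# Analytes: keep the rows that are NOT standards; drop = FALSE keeps a matrix
ana <- Y[!isIS, , drop = FALSE]
# Standards: keep only the flagged rows
std <- Y[isIS, , drop = FALSE]

rownames(ana)  # "glucose" "citrate"
rownames(std)  # "IS_a" "IS_b"
```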
Subset an expression set to remove the internal standards
analytes_eset(object, where = "tag", what = "IS", ...)
object |
an |
where |
Column index or name of fData which equals |
what |
What the column |
... |
not used |
ExpressionSet
Henning Redestig
data(mix)
analytes(mix)
fData(mix)$test <- fData(mix)$tag
analytes(mix, where="test")
Subset an expression set to remove the internal standards
analytes_other(object, standards, ...)
object |
an |
standards |
a logical vector indicating which rows are internal standards |
... |
not used |
ExpressionSet
Henning Redestig
data(mix)
analytes(exprs(mix), fData(mix)$tag == 'IS')
Normalize metabolomics data using CCMN and other methods
Package: | crmn |
Type: | Package |
Developed since: | 2009-05-14 |
Depends: | Biobase, pcaMethods (>= 1.20.2), pls, methods |
License: | GPL (>=3) |
LazyLoad: | yes |
A package implementing the 'Cross-contribution compensating
multiple standard normalization' described in Redestig et al. (2009)
Analytical Chemistry, https://doi.org/10.1021/ac901143w. It can be used to
normalize metabolomics data. Run openVignette("crmn") to see the manual.
Henning Redestig
Drop unused factor levels in a data frame.
dropunusedlevels(x)
x |
the data frame |
Henning Redestig
iris[1:10,]$Species
dropunusedlevels(iris[1:10,])$Species
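A minimal base-R sketch of dropping unused factor levels, assuming it amounts to re-creating each factor column so that only the levels actually present remain. The helper name drop_unused_levels_sketch is hypothetical and is not the crmn source; base R's droplevels() offers the same operation for data frames.

```r
# Hypothetical helper, not the crmn source: rebuild factor columns so that
# only the levels actually present are kept; other columns are untouched
drop_unused_levels_sketch <- function(x) {
  x[] <- lapply(x, function(col) if (is.factor(col)) factor(col) else col)
  x
}

sub <- iris[1:10, ]                              # first 10 rows are all setosa
levels(sub$Species)                              # still lists all three species
levels(drop_unused_levels_sketch(sub)$Species)   # "setosa" only
```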
Construct a design matrix
makeX(object, factors, ...)
## S4 method for signature 'ANY,matrix'
makeX(object, factors, ...)
## S4 method for signature 'ExpressionSet,character'
makeX(object, factors, ...)
object |
an |
factors |
column names from the pheno data of |
... |
not used |
Make a design matrix from the pheno data slot of an expression
set, taking care that factors and numerical variables are handled
properly. No interactions are included and the formula is the
simplest possible, i.e. y~-1+term1+term2+... . The object can also
be anything else, in which case factors must be a design matrix.
In that case the same design matrix is returned.
a design matrix
Henning Redestig
data(mix)
makeX(mix, "runorder")
runorder <- mix$runorder
makeX(mix, model.matrix(~-1+runorder))
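The simplest additive formula, y~-1+term1+term2+..., can be illustrated with base R's model.matrix() on a hypothetical pheno data frame. With the intercept removed, every level of the factor gets its own indicator column, a numeric variable enters as a single column, and no interaction columns appear. This only shows what such a formula produces; it is not makeX() itself.

```r
# Hypothetical pheno data: one factor and one numeric covariate
pheno <- data.frame(type = factor(c("a", "a", "b", "b", "c")),
                    runorder = c(1, 2, 3, 4, 5))

# -1 removes the intercept, so each level of 'type' gets an indicator column;
# the numeric 'runorder' enters as a single column; no interaction terms
X <- model.matrix(~ -1 + type + runorder, data = pheno)

colnames(X)  # "typea" "typeb" "typec" "runorder"
dim(X)       # 5 rows (samples), 4 columns
```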
Get the method
method(object, ...) method(object, ...)
object |
an |
... |
not used |
the method (content differs between normalization methods)
Henning Redestig
Get the expression data from an ExpressionSet or just return the given matrix
mexprs(object)
mexprs(object)
## S4 method for signature 'ExpressionSet'
mexprs(object)
object |
an |
the expression data
Henning Redestig
data(mix)
head(mexprs(mix))
head(mexprs(exprs(mix)))
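The getter and setter let the same code work on either an ExpressionSet or a plain matrix through S4 dispatch. A minimal sketch of that pattern with the methods package, using a hypothetical ToySet class in place of ExpressionSet and a hypothetical generic getData() so that nothing from crmn or Biobase is needed:

```r
library(methods)

# Hypothetical container standing in for Biobase's ExpressionSet
setClass("ToySet", slots = c(data = "matrix"))

# Hypothetical getter: one generic, one method per input type
setGeneric("getData", function(object) standardGeneric("getData"))
setMethod("getData", "ToySet", function(object) object@data)
setMethod("getData", "matrix", function(object) object)  # plain matrix: as-is

# Hypothetical setter: update the slot, or simply replace a plain matrix
setGeneric("getData<-", function(object, value) standardGeneric("getData<-"))
setReplaceMethod("getData", c("ToySet", "matrix"), function(object, value) {
  object@data <- value
  object
})
setReplaceMethod("getData", c("matrix", "matrix"), function(object, value) value)

m <- matrix(1:4, nrow = 2)
obj <- new("ToySet", data = m)
identical(getData(obj), getData(m))  # TRUE: same result for both inputs

getData(obj) <- m * 0                # works on the container
m2 <- m
getData(m2) <- m * 0                 # and on a bare matrix
```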
Matrix safe setter of expression slot
mexprs(object) <- value
## S4 replacement method for signature 'ExpressionSet,matrix'
mexprs(object) <- value
mexprs(object) <- value
object |
an |
value |
the value to assign |
Set the expression data in an ExpressionSet or just return the given matrix
the expression data
Henning Redestig
data(mix)
test <- mix
mexprs(test) <- exprs(mix) * 0
head(mexprs(test))
test <- exprs(mix)
mexprs(test) <- test * 0
head(mexprs(test))
Mixture dilution series
data(mix)
Multi-component dilution series. GC-TOF/MS measurements by Miyako Kusano. Input concentrations are known and given in the original publication.
Henning Redestig
data(mix)
fData(mix)
exprs(mix)
pData(mix)
Get the model
model(object, ...) model(object, ...)
object |
an |
... |
not used |
the model (content differs between normalization models)
Henning Redestig
Common class representation for normalization models.
Henning Redestig
Normalization methods for metabolomics data
normalize(object, method, segments = NULL, ...)
object |
an |
method |
the desired method |
segments |
normalization in a cross-validation setup, intended only for validation/QC purposes. |
... |
passed on to |
Wrapper function for normFit and normPred
the normalized dataset
Henning Redestig
normFit, normPred
data(mix)
normalize(mix, "crmn", factors="type", ncomp=3)
#other methods
normalize(mix, "one")
normalize(mix, "avg")
normalize(mix, "nomis")
normalize(mix, "t1")
normalize(mix, "ri")
normalize(mix, "median")
normalize(mix, "totL2")
## can also do normalization with matrices
Y <- exprs(mix)
G <- with(pData(mix), model.matrix(~-1+type))
isIS <- with(fData(mix), tag == "IS")
normalize(Y, "crmn", factors=G, ncomp=3, standards=isIS)
Fit the parameters for normalization of a metabolomics data set.
normFit( object, method, one = "Succinate_d4", factors = NULL, lg = TRUE, fitfunc = lm, formula = TRUE, ... )
object |
an |
method |
chosen normalization method |
one |
single internal standard to use for normalization |
factors |
column names in the pheno data slot describing the biological factors. Or a design matrix directly. |
lg |
logical indicating that the data should be log transformed |
fitfunc |
the function that creates the model fit for normalization, must use the same interfaces as |
formula |
if fitfunc has formula interface or not |
... |
passed on to |
Normalization is first done by fitting a model and then applying
that model either to new data or the same data using normPred.
Five different methods are implemented.
divide by row-means of the scaled internal standards
divide by value of a single, user defined, internal standard
divide by the square root of the sum of squares (L2 norm) of the full dataset
See Sysi-Aho et al.
See Redestig et al.
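Two of the simpler methods listed above, dividing by a single internal standard and dividing by the average of the scaled internal standards, can be sketched in base R as below. This is one reading of the one-line descriptions, not the package's code: the scaling (each standard divided by its mean over samples), the raw-scale division (crmn log-transforms by default, where division becomes subtraction) and all names and values are assumptions.

```r
# Made-up data: 2 internal standards (IS_1, IS_2) and 2 analytes, 3 samples
Y <- matrix(c(10, 20, 15,
              100, 180, 160,
              4, 8, 6,
              30, 70, 45),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("IS_1", "glc", "IS_2", "cit"), paste0("s", 1:3)))
isIS <- c(TRUE, FALSE, TRUE, FALSE)

# Hypothetical: divide every sample by the value of one chosen standard
one_norm <- function(Y, isIS, standard) {
  sweep(Y[!isIS, , drop = FALSE], 2, Y[standard, ], "/")
}

# Hypothetical: scale each standard by its mean over samples (mean 1), then
# divide each sample by the average of the scaled standards in that sample
avg_norm <- function(Y, isIS) {
  IS <- Y[isIS, , drop = FALSE]
  scaled <- IS / rowMeans(IS)        # row-wise division via recycling
  sweep(Y[!isIS, , drop = FALSE], 2, colMeans(scaled), "/")
}

one_norm(Y, isIS, "IS_1")   # glc row: 10 9 10.67
avg_norm(Y, isIS)           # glc row: 150 135 160
```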
a normalization model
Henning Redestig
Sysi-Aho, M.; Katajamaa, M.; Yetukuri, L. & Oresic, M. Normalization method for metabolomics data using optimal selection of multiple internal standards. BMC Bioinformatics, 2007, 8, 93
Redestig, H.; Fukushima, A.; Stenlund, H.; Moritz, T.; Arita, M.; Saito, K. & Kusano, M. Compensation for systematic cross-contribution improves normalization of mass spectrometry based metabolomics data. Anal. Chem., 2009, 81, 7974-7980
normPred, standards, model.matrix
data(mix)
nfit <- normFit(mix, "crmn", factors="type", ncomp=3)
slplot(sFit(nfit)$fit$pc, scol=as.integer(mix$runorder))
## same thing
Y <- exprs(mix)
G <- model.matrix(~-1+mix$type)
isIS <- fData(mix)$tag == 'IS'
nfit <- normFit(Y, "crmn", factors=G, ncomp=3, standards=isIS)
slplot(sFit(nfit)$fit$pc, scol=as.integer(mix$runorder))
Predict the normalized data using a previously fitted normalization model.
normPred(normObj, newdata, factors = NULL, lg = TRUE, predfunc = predict, ...)
normObj |
the result from |
newdata |
an |
factors |
column names in the pheno data slot describing the biological factors. Or a design matrix. |
lg |
logical indicating that the data should be log transformed |
predfunc |
the function to use to get predicted values from the fitted object (only for crmn) |
... |
passed on to |
Apply fitted normalization parameters to new data to get normalized data. Currently, matrices cannot be used as input for the methods 'RI' and 'one'.
the normalized data
Henning Redestig
normFit
data(mix)
nfit <- normFit(mix, "crmn", factors="type", ncomp=3)
normedData <- normPred(nfit, mix, "type")
slplot(pca(t(log2(exprs(normedData)))), scol=as.integer(mix$type))
## same thing
Y <- exprs(mix)
G <- with(pData(mix), model.matrix(~-1+type))
isIS <- fData(mix)$tag == 'IS'
nfit <- normFit(Y, "crmn", factors=G, ncomp=3, standards=isIS)
normedData <- normPred(nfit, Y, G, standards=isIS)
slplot(pca(t(log2(normedData))), scol=as.integer(mix$type))
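Fitting and prediction are separate steps: parameters are learned once and then applied to new samples. A hypothetical sketch of that split for an average-of-standards style method, where the fit stores the training mean of each standard and the prediction reuses those stored means to correct a held-out first sample, as in the example. The functions fit_avg() and pred_avg() are illustrative only and work on the raw scale.

```r
# Made-up data: 2 internal standards and 2 analytes over 3 samples
Y <- matrix(c(10, 20, 15,
              100, 180, 160,
              4, 8, 6,
              30, 70, 45),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("IS_1", "glc", "IS_2", "cit"), paste0("s", 1:3)))
isIS <- c(TRUE, FALSE, TRUE, FALSE)

# Hypothetical 'fit' step: remember which rows are standards and their means
fit_avg <- function(Y, isIS) {
  list(isIS = isIS, is_means = rowMeans(Y[isIS, , drop = FALSE]))
}

# Hypothetical 'predict' step: scale new standards by the stored means,
# average them per sample and divide the analytes by that factor
pred_avg <- function(model, newdata) {
  IS <- newdata[model$isIS, , drop = FALSE]
  factors <- colMeans(IS / model$is_means)
  sweep(newdata[!model$isIS, , drop = FALSE], 2, factors, "/")
}

model <- fit_avg(Y[, -1], isIS)                 # fit on samples 2 and 3
pred <- pred_avg(model, Y[, 1, drop = FALSE])   # apply to held-out sample 1
pred                                            # glc 175, cit 52.5
```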
PCA and Q2 issue warnings about bias and about poorly estimated PCs. The former is uninformative, and poorly estimated PCs show up as overfitting, which leads to a choice of fewer PCs, i.e. not a problem. This function is meant to muffle those warnings. It is only used for versions of pcaMethods before 1.26.0.
pcaMuffle(w)
w |
a warning |
nothing
Henning Redestig
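A warning handler like pcaMuffle() is typically installed with withCallingHandlers() and silences a warning by invoking the "muffleWarning" restart. The sketch below assumes that base-R mechanism; the message patterns it matches and the noisy() function are hypothetical, not the actual pcaMethods warnings.

```r
# Calling handler in the described spirit: silence only "uninformative" warnings.
# The patterns are hypothetical, not the exact pcaMethods messages.
muffle_pca_warnings <- function(w) {
  if (grepl("biased|poorly estimated", conditionMessage(w))) {
    invokeRestart("muffleWarning")   # swallow this warning, continue running
  }
  # returning normally lets any other warning propagate as usual
}

noisy <- function() {
  warning("estimate may be biased")
  warning("something else went wrong")
  "done"
}

# The outer handler records whatever gets past the inner one
seen <- character(0)
res <- withCallingHandlers(
  withCallingHandlers(noisy(), warning = muffle_pca_warnings),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
seen  # only "something else went wrong"
```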
Simple plot function for a CRMN normalization model.
## S3 method for class 'nFit'
plot(x, y = NULL, ...)
x |
an |
y |
not used |
... |
passed on to the scatter plot calls |
Shows Tz and the optimization (if computed) of the PCA model. The number of components used for normalization should not exceed the maximum indicated by Q2. The structure shown in the Tz plot indicates the analytical variance, which is exactly independent of the experimental design. The corresponding loading plot shows how this structure is captured by the ISs used.
nothing
Henning Redestig
slplot
data(mix)
nfit <- normFit(mix, "crmn", factors="type", ncomp=2)
plot(nfit)
Get the sFit
sFit(object, ...) sFit(object, ...)
object |
an |
... |
not used |
the sFit (only defined for CRMN)
Henning Redestig
Show some basic information for an nFit model
## S4 method for signature 'nFit'
show(object)
object |
the |
prints some basic information
Henning Redestig
data(mix)
normFit(mix, "avg")
Show method for nFit
show_nfit(object)
object |
the |
prints some basic information
Henning Redestig
Subset a data set to contain only the labeled internal standards.
standards(object, standards=NULL, ...)
object |
an |
standards |
a logical vector indicating which rows are internal standards |
... |
not used |
subsetted dataset
Henning Redestig
data(mix)
standards(mix)
standards(exprs(mix), fData(mix)$tag == 'IS')
Subset a data set to contain only the labeled internal standards.
standards_eset(object, where = "tag", what = "IS", ...)
object |
an |
where |
Column index or name in fData which equals |
what |
What the column |
... |
not used |
subsetted dataset
Henning Redestig
data(mix)
standards(mix)
fData(mix)$test <- fData(mix)$tag
standards(mix, where="test")
Subset a data set to contain only the labeled internal standards.
standards_other(object, standards, ...)
object |
an |
standards |
a logical vector indicating which rows are internal standards |
... |
not used |
subsetted dataset
Henning Redestig
data(mix)
standards(exprs(mix), fData(mix)$tag == 'IS')
Fit a model that describes the variation in the labeled internal standards that is related to the biological factors.
standardsFit(object, factors, ncomp = NULL, lg = TRUE, fitfunc = lm, ...)
object |
an |
factors |
the biological factors described in the pheno data slot if |
ncomp |
number of PCA components to use. Determined by cross-validation if left |
lg |
logical indicating that the data should be log transformed |
fitfunc |
the function that creates the model fit for normalization, must use the same interfaces as |
... |
passed on to |
There is often unwanted variation among the labeled internal standards which is related to the experimental factors due to overlapping peaks etc. This function fits a model that describes that overlapping variation using a scaled and centered PCA / multiple linear regression model. Scaling is done outside the PCA model.
a list containing the PCA/MLR model, the recommended number of components for that model, the standard deviations and mean values and Q2/R2 for the fit.
Henning Redestig
makeX
, standardsPred
data(mix)
sfit <- standardsFit(mix, "type", ncomp=3)
slplot(sfit$fit$pc)
## same thing
Y <- exprs(mix)
G <- model.matrix(~-1+mix$type)
isIS <- fData(mix)$tag == 'IS'
sfit <- standardsFit(Y, G, standards=isIS, ncomp=3)
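The scaled and centered PCA / multiple linear regression idea can be sketched on simulated data: scale the (log) standards outside the model, regress them on the design matrix, and run PCA on the residuals, whose scores (Tz) then carry the analytical variation that the design does not explain. This is an illustration, not the crmn code: it assumes samples in rows, uses prcomp() instead of pcaMethods, fixes two components instead of using cross-validation, and the simulated drift and cross-contribution are made up.

```r
set.seed(1)
n <- 12
type <- factor(rep(c("a", "b", "c"), each = 4))
X <- model.matrix(~ -1 + type)              # design matrix, samples in rows

# Simulated log-scale standards (made up): a shared analytical drift plus noise,
# and a design-related cross-contribution added to the first two standards
drift <- rnorm(n)
IS <- sapply(1:5, function(k) drift + rnorm(n, sd = 0.2))
IS[, 1:2] <- IS[, 1:2] + as.vector(X %*% c(0, 1, 2))

# 1. Scaling outside the PCA model: center and unit-variance each standard
means <- colMeans(IS)
sdev <- apply(IS, 2, sd)
Z <- scale(IS, center = means, scale = sdev)

# 2. Multiple linear regression of the scaled standards on the design
mlr <- lm(Z ~ -1 + X)

# 3. PCA on the residuals, i.e. the variation the design does not explain
E <- residuals(mlr)
pc <- prcomp(E, center = FALSE)
Tz <- pc$x[, 1:2]                           # fixed 2 components here

# The least-squares residuals are orthogonal to the design, and so are the scores
max(abs(crossprod(X, Tz)))                  # numerically ~0
```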
Predicted values for the standards
standardsPred(model, newdata, factors, lg = TRUE, ...)
model |
result from |
newdata |
an |
factors |
the biological factors described in the pheno data slot if |
lg |
logical indicating that the data should be log transformed |
... |
passed on to |
There is often unwanted variation among the labeled internal
standards which is related to the experimental factors due to
overlapping peaks etc. This function predicts that effect given a model of
the overlapping variance. The prediction is given by
the corrected data
Henning Redestig
makeX, standardsFit
data(mix)
fullFit <- standardsFit(mix, "type", ncomp=3)
sfit <- standardsFit(mix[,-1], "type", ncomp=3)
pred <- standardsPred(sfit, mix[,1], "type")
cor(scores(sfit$fit$pc)[1,], scores(fullFit$fit$pc)[1,])
## could just as well have been done as
Y <- exprs(mix)
G <- model.matrix(~-1+mix$type)
isIS <- fData(mix)$tag == 'IS'
fullFit <- standardsFit(Y, G, ncomp=3, standards=isIS)
sfit <- standardsFit(Y[,-1], G[-1,], ncomp=3, standards=isIS)
pred <- standardsPred(sfit, Y[,1,drop=FALSE], G[1,,drop=FALSE], standards=isIS)
cor(scores(sfit$fit$pc)[1,], scores(fullFit$fit$pc)[1,])
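Prediction for a new sample can be sketched by reusing everything learned on the training samples: the means and standard deviations used for scaling, the regression coefficients B and the PCA loadings P. The sketch assumes samples in rows, simulated data, prcomp() in place of pcaMethods and two components, and computes scores for a held-out first sample; the text does not state that standardsPred() works exactly this way, and predict_scores() is a hypothetical helper.

```r
set.seed(2)
type <- factor(rep(c("a", "b"), each = 5))
X <- model.matrix(~ -1 + type)
# Made-up log-scale standards with a design-related shift
IS <- matrix(rnorm(10 * 4), nrow = 10) + as.vector(X %*% c(0, 1))

# Fit on samples 2..10 (sample 1 is held out)
tr <- 2:10
means <- colMeans(IS[tr, ])
sdev <- apply(IS[tr, ], 2, sd)
Z <- scale(IS[tr, ], center = means, scale = sdev)
Xtr <- X[tr, ]
B <- solve(crossprod(Xtr), crossprod(Xtr, Z))   # MLR coefficients (2 x 4)
pc <- prcomp(Z - Xtr %*% B, center = FALSE)
P <- pc$rotation[, 1:2]                         # loadings, 2 components

# Hypothetical helper: scores for one sample from the stored parameters
predict_scores <- function(is_row, x_row) {
  z <- (is_row - means) / sdev                  # same scaling as training
  e <- z - drop(x_row %*% B)                    # remove the design part
  drop(e %*% P)                                 # project onto the loadings
}

t_new <- predict_scores(IS[1, ], X[1, ])        # held-out sample
t_new
```

Applying predict_scores() to a training sample reproduces that sample's fitted scores, which is a quick consistency check on the stored parameters.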
Normalize samples by their weight (as in grams fresh weight)
weightnorm(object, weight = "weight", lg = FALSE)
object |
an |
weight |
a string naming the pheno data column with the weight or a numeric vector with one weight value per sample. |
lg |
logical indicating whether the assay data are already on the log scale. If TRUE, the weight is also log-transformed and subtraction is used instead of division. |
Normalize each sample by dividing by the loaded sample weight. The weight argument is taken from the pheno data (or given as a numeric vector with one value per sample). Missing values are not tolerated.
the normalized expression set
Henning Redestig
data(mix)
w <- runif(ncol(mix), 1, 1.3)
weightnorm(mix, w)
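Weight normalization divides each sample by its weight, or subtracts the log weight when the data are already on the log scale. A base-R sketch of both variants on a made-up matrix with hypothetical weights; log2 is an assumed base, and the code does not call crmn.

```r
# Made-up assay matrix: 2 metabolites (rows) x 3 samples (columns)
Y <- matrix(c(10, 20, 30,
              4, 6, 8), nrow = 2, byrow = TRUE)
w <- c(1, 2, 1.5)                 # hypothetical fresh weight per sample

# Missing weights are not tolerated; one weight per sample is required
stopifnot(!anyNA(w), length(w) == ncol(Y))

# Raw scale: divide each sample (column) by its weight
raw <- sweep(Y, 2, w, "/")

# Log scale: the same correction is a subtraction of the log weight
logged <- sweep(log2(Y), 2, log2(w), "-")

all.equal(log2(raw), logged)      # TRUE
```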