Last updated: 2021-11-23
Checks: 7 0
Knit directory: paed-BAL-meth-ref/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20210927)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 5187339. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: code/deident.R
Ignored: code/geoprep.R
Ignored: data/idat/
Ignored: data/processedData.RData
Ignored: output/intensities.csv
Ignored: output/processed.csv
Unstaged changes:
Modified: .gitignore
Modified: data/Samplesheet_BAL_reference.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/dataPreprocess.Rmd
) and HTML (docs/dataPreprocess.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 5187339 | Jovana Maksimovic | 2021-11-23 | wflow_publish(c(“analysis/dataPreprocess.Rmd”, “analysis/estimateCellProportions.Rmd”)) |
html | 5952136 | Jovana Maksimovic | 2021-11-23 | Build site. |
Rmd | 4675dde | Jovana Maksimovic | 2021-11-23 | wflow_publish(c(“analysis/dataPreprocess.Rmd”, “analysis/estimateCellProportions.Rmd”)) |
html | a90dc68 | Jovana Maksimovic | 2021-09-28 | Build site. |
Rmd | e968190 | Jovana Maksimovic | 2021-09-28 | wflow_publish(c(“analysis/dataPreprocess.Rmd”, “analysis/estimateCellProportions.Rmd”)) |
html | 10147c0 | Jovana Maksimovic | 2021-09-27 | Build site. |
Rmd | 1114da3 | Jovana Maksimovic | 2021-09-27 | wflow_publish(c(“analysis/index.Rmd”, “analysis/dataPreprocess.Rmd”, |
Load all the packages required for analysis.
library(here)
library(workflowr)
library(limma)
library(minfi)
library(missMethyl)
library(matrixStats)
library(minfiData)
library(stringr)
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
library(IlluminaHumanMethylationEPICmanifest)
library(tidyverse)
library(ggplot2)
library(patchwork)
library(glue)
Load the EPIC array annotation data that describes the genomic context of each of the probes on the array.
# Get the EPICarray annotation data
annEPIC <- getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
annEPIC %>% data.frame %>%
dplyr::select("chr",
"pos",
"strand",
"UCSC_RefGene_Name",
"UCSC_RefGene_Group") %>%
head() %>% knitr::kable()
chr | pos | strand | UCSC_RefGene_Name | UCSC_RefGene_Group | |
---|---|---|---|---|---|
cg18478105 | chr20 | 61847650 | - | YTHDF1 | TSS200 |
cg09835024 | chrX | 24072640 | - | EIF2S3 | TSS1500 |
cg14361672 | chr9 | 131463936 | + | PKN3 | TSS1500 |
cg01763666 | chr17 | 80159506 | + | CCDC57 | Body |
cg12950382 | chr14 | 105176736 | + | INF2;INF2 | Body;Body |
cg02115394 | chr13 | 115000168 | + | CDC16;CDC16 | TSS200;TSS200 |
Read the sample information and IDAT file paths into R.
# absolute path to the directory where the data is (relative to the Rstudio project)
dataDirectory <- here("data")
# read in the sample sheet for the experiment
read.metharray.sheet(dataDirectory,
pattern = "Samplesheet_BAL_reference.csv") %>%
mutate(Basename = gsub("c(\"","", Basename, fixed=TRUE)) %>%
mutate(Sample_ID = paste(Slide, Array, sep = "_")) %>%
mutate_if(is.character, stringr::str_replace_all, pattern = "Granuloycte",
replacement = "Granulocyte") %>%
mutate_if(is.character, stringr::str_replace_all, pattern = "Epithelialcell",
replacement = "EpithelialCell") %>%
mutate_if(is.character, stringr::str_replace_all, pattern = "Old",
replacement = "Erasmus MC") %>%
mutate_if(is.character, stringr::str_replace_all, pattern = "New",
replacement = "GenomeScan") %>%
dplyr::select(Sample_ID, Sample_Group, Sample_source, Sample_run,
Basename, Num_individuals) -> targets
[1] "/oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-BAL-meth-ref/data/Samplesheet_BAL_reference.csv"
targets %>% dplyr::select(-Basename) %>% knitr::kable()
Sample_ID | Sample_Group | Sample_source | Sample_run | Num_individuals |
---|---|---|---|---|
202905570075_R02C01 | EpithelialCell | A1 | Erasmus MC | 12 |
202900540047_R01C01 | Macrophage | M1 | Erasmus MC | 4 |
202900540047_R02C01 | Macrophage | M2 | Erasmus MC | 4 |
202900540047_R03C01 | Macrophage | M3 | Erasmus MC | 4 |
202900540047_R04C01 | Granulocyte | G1 | Erasmus MC | 4 |
202900540047_R05C01 | Granulocyte | G2 | Erasmus MC | 4 |
202900540047_R06C01 | Granulocyte | G3 | Erasmus MC | 4 |
202900540047_R07C01 | Lymphocyte | L1 | Erasmus MC | 6 |
202900540047_R08C01 | Lymphocyte | L2 | Erasmus MC | 6 |
204071680134_R04C01 | Macrophage | M1 | GenomeScan | 2 |
204071680134_R05C01 | Macrophage | M2 | GenomeScan | 3 |
204071680134_R06C01 | Granulocyte | G1 | GenomeScan | 5 |
204071680134_R07C01 | Lymphocyte | L1 | GenomeScan | 5 |
204071680134_R08C01 | EpithelialCell | AEC1 | GenomeScan | 5 |
204074230020_R01C01 | Raw | CF4 | GenomeScan | 1 |
204074230020_R02C01 | Raw | CF1 | GenomeScan | 1 |
204074230020_R03C01 | Raw | CF3 | GenomeScan | 1 |
204074230020_R04C01 | Raw | CF2 | GenomeScan | 1 |
204074230020_R05C01 | Raw | CF5 | GenomeScan | 1 |
204074230020_R06C01 | Raw | CTRL | GenomeScan | 1 |
Read in the raw methylation data.
# read in the raw data from the IDAT files
rgSet <- read.metharray.exp(targets = targets)
rgSet
class: RGChannelSet
dim: 1051815 20
metadata(0):
assays(2): Green Red
rownames(1051815): 1600101 1600111 ... 99810990 99810992
rowData names(0):
colnames(20): 202905570075_R02C01 202900540047_R01C01 ...
204074230020_R05C01 204074230020_R06C01
colData names(7): Sample_ID Sample_Group ... Num_individuals filenames
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
Calculate the detection P-values for each probe so that we can check for any failed samples.
# calculate the detection p-values
detP <- detectionP(rgSet)
head(detP[, 1:5]) %>% knitr::kable()
202905570075_R02C01 | 202900540047_R01C01 | 202900540047_R02C01 | 202900540047_R03C01 | 202900540047_R04C01 | |
---|---|---|---|---|---|
cg18478105 | 0 | 0 | 0 | 0 | 0 |
cg09835024 | 0 | 0 | 0 | 0 | 0 |
cg14361672 | 0 | 0 | 0 | 0 | 0 |
cg01763666 | 0 | 0 | 0 | 0 | 0 |
cg12950382 | 0 | 0 | 0 | 0 | 0 |
cg02115394 | 0 | 0 | 0 | 0 | 0 |
We can see the mean detection p-values across all the samples for the two runs.
dat <- reshape2::melt(colMeans(detP))
dat$type <- targets$Sample_Group
dat$sample <- targets$Sample_source
dat$run <- targets$Sample_run
pal <- scales::hue_pal()(length(unique(targets$Sample_Group)))
names(pal) <- c("Granulocyte","Raw","Lymphocyte",
"Macrophage","EpithelialCell")
# examine mean detection p-values across all samples to identify failed samples
p <- ggplot(dat, aes(x = sample, y = value, fill = type)) +
geom_bar(stat = "identity") +
facet_wrap(vars(run), ncol = 2, scales = "free_x") +
labs(x = "Sample", y = "Detection p-value", fill = "Sample type") +
theme(axis.text.y = element_text(size = 7),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_fill_manual(values = pal)
p
Version | Author | Date |
---|---|---|
10147c0 | Jovana Maksimovic | 2021-09-27 |
Filter out samples with mean detection p-values > 0.01 to avoid filtering out too many probes downstream.
keep <- colMeans(detP) < 0.01
table(keep)
keep
TRUE
20
rgSet <- rgSet[, keep]
targets <- targets[keep, ]
detP <- detP[, keep]
Normalise the data.
# normalize the data; this results in a GenomicRatioSet object
mSetSq <- preprocessQuantile(rgSet)
[preprocessQuantile] Mapping to genome.
[preprocessQuantile] Fixing outliers.
[preprocessQuantile] Quantile normalizing.
mSetSq
class: GenomicRatioSet
dim: 865859 20
metadata(0):
assays(2): M CN
rownames(865859): cg14817997 cg26928153 ... cg07587934 cg16855331
rowData names(0):
colnames(20): 202905570075_R02C01 202900540047_R01C01 ...
204074230020_R05C01 204074230020_R06C01
colData names(10): Sample_ID Sample_Group ... yMed predictedSex
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
Preprocessing
Method: Raw (no normalization or bg correction)
minfi version: 1.36.0
Manifest version: 0.3.0
# create a MethylSet object from the raw data for plotting
mSetRaw <- preprocessRaw(rgSet)
bSq <- getBeta(mSetSq)
bRaw <- getBeta(mSetRaw)
sq <- reshape2::melt(bSq)
sq$type <- targets$Sample_Group
sq$process <- "Normalised"
raw <- reshape2::melt(bRaw)
raw$type <- targets$Sample_Group
raw$process <- "Raw"
dat <- bind_rows(sq, raw)
colnames(dat)[1:3] <- c("cpg", "ID", "beta")
ggplot(data = dat,
aes(x = beta, colour = type)) +
geom_density(show.legend = NA) +
labs(colour = "Sample type") +
facet_wrap(vars(process), ncol = 2) +
scale_color_manual(values = pal)
Warning: Removed 270 rows containing non-finite values (stat_density).
Version | Author | Date |
---|---|---|
10147c0 | Jovana Maksimovic | 2021-09-27 |
Explore the data to look for any structure. As expected, sex is the most significant source of variation. The patient samples, in particular, are clearly grouping by sex.
mDat <- getM(mSetSq)
mds <- plotMDS(mDat, top = 1000, gene.selection="common", plot = FALSE)
dat <- tibble(x = mds$x,
y = mds$y,
sample = targets$Sample_Group,
run = targets$Sample_run,
source = targets$Sample_source,
sex = targets$Sex)
p1 <- ggplot(dat, aes(x = x, y = y, colour = sample)) +
geom_point(aes(shape = run), size = 3) +
labs(colour = "Sample type", shape = "Run",
x = "Principal component 1",
y = "Principal component 2") +
ggtitle("All data") +
scale_color_manual(values = pal)
p1
Version | Author | Date |
---|---|---|
10147c0 | Jovana Maksimovic | 2021-09-27 |
Colour ONLY the sorted cells samples using different variables. Encouragingly, the cell types cluster tightly when samples from both runs are combined, indicating that differences between cell types are more significant than any bath effects between the two runs. The cell types also cluster as expected for each of the individual runs.
cells <- !(targets$Sample_Group %in% c("Raw"))
mds <- plotMDS(mDat[, cells], top = 1000, gene.selection="common", plot = FALSE)
dat <- tibble(x = mds$x,
y = mds$y,
sample = targets$Sample_Group[cells],
run = targets$Sample_run[cells],
source = targets$Sample_source[cells])
p1 <- ggplot(dat, aes(x = x, y = y, colour = sample)) +
geom_point(aes(shape = run), size = 3) +
labs(colour = "Sample type", shape = "Run",
x = "Principal component 1",
y = "Principal component 2") +
ggtitle("Cell types") +
scale_color_manual(values = pal[-2])
p1
Filter out poor performing probes, sex chromosome probes, SNP probes and cross reactive probes.
# ensure probes are in the same order in the mSetSq and detP objects
detP <- detP[match(featureNames(mSetSq), rownames(detP)),]
# remove any probes that have failed in one or more samples
keep <- rowSums(detP < 0.01) == ncol(mSetSq)
# subset data
mSetSqFlt <- mSetSq[keep,]
mSetSqFlt
class: GenomicRatioSet
dim: 862448 20
metadata(0):
assays(2): M CN
rownames(862448): cg14817997 cg26928153 ... cg07587934 cg16855331
rowData names(0):
colnames(20): 202905570075_R02C01 202900540047_R01C01 ...
204074230020_R05C01 204074230020_R06C01
colData names(10): Sample_ID Sample_Group ... yMed predictedSex
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
Preprocessing
Method: Raw (no normalization or bg correction)
minfi version: 1.36.0
Manifest version: 0.3.0
Calculate M and beta values for downstream use in analysis and visulalisation.
# calculate M-values and beta values for downstream analysis and visualisation
mVals <- getM(mSetSqFlt)
head(mVals[,1:5]) %>% knitr::kable()
202905570075_R02C01 | 202900540047_R01C01 | 202900540047_R02C01 | 202900540047_R03C01 | 202900540047_R04C01 | |
---|---|---|---|---|---|
cg14817997 | 2.4883113 | 2.225993 | 1.7929596 | 1.9652944 | 2.0662500 |
cg26928153 | 2.6937237 | 2.279881 | 1.9807748 | 2.3369908 | 2.3210052 |
cg16269199 | 1.7402849 | 1.016383 | 1.0947408 | 1.0124683 | 1.1203640 |
cg13869341 | 2.8729742 | 2.898040 | 2.5100382 | 2.5347448 | 2.9068319 |
cg14008030 | 1.5839008 | 1.271259 | 0.9548359 | 1.0340442 | 0.1394236 |
cg12045430 | -0.7965316 | -0.760002 | -0.8327795 | -0.9423224 | -0.8896792 |
bVals <- getBeta(mSetSqFlt)
head(bVals[,1:5]) %>% knitr::kable()
202905570075_R02C01 | 202900540047_R01C01 | 202900540047_R02C01 | 202900540047_R03C01 | 202900540047_R04C01 | |
---|---|---|---|---|---|
cg14817997 | 0.8487417 | 0.8238919 | 0.7760484 | 0.7961232 | 0.8072463 |
cg26928153 | 0.8661278 | 0.8292460 | 0.7978593 | 0.8347784 | 0.8332445 |
cg16269199 | 0.7696389 | 0.6691854 | 0.6810967 | 0.6685844 | 0.6849419 |
cg13869341 | 0.8798905 | 0.8817146 | 0.8506650 | 0.8528274 | 0.8823487 |
cg14008030 | 0.7498620 | 0.7070645 | 0.6596740 | 0.6718898 | 0.5241415 |
cg12045430 | 0.3653742 | 0.3712651 | 0.3595682 | 0.3422760 | 0.3505372 |
Remove SNP probes and multi-mapping probes.
mVals <- DMRcate::rmSNPandCH(mVals, mafcut = 0, rmcrosshyb = TRUE, rmXY = FALSE)
bVals <- DMRcate::rmSNPandCH(bVals, mafcut = 0, rmcrosshyb = TRUE, rmXY = FALSE)
dim(mVals)
[1] 749821 20
Remove sex chromosome probes.
mValsNoXY <- DMRcate::rmSNPandCH(mVals, mafcut = 0, rmcrosshyb = TRUE,
rmXY = TRUE)
bValsNoXY <- DMRcate::rmSNPandCH(bVals, mafcut = 0, rmcrosshyb = TRUE,
rmXY = TRUE)
dim(mValsNoXY)
[1] 732778 20
After filtering, sorted cell types still cluster tightly together. The mixed samples are centered between the sorted cell types, which is indicative of the mixtures.
mDat <- mValsNoXY
mds <- plotMDS(mDat, top = 1000, gene.selection="common", plot = FALSE)
dat <- tibble(x = mds$x,
y = mds$y,
sample = targets$Sample_Group,
run = targets$Sample_run,
source = targets$Sample_source,
sex = targets$Sex)
p1 <- ggplot(dat, aes(x = x, y = y, colour = sample)) +
geom_point(aes(shape = run), size = 3) +
labs(colour = "Sample type", shape = "Run",
x = "Principal component 1",
y = "Principal component 2") +
ggtitle("All data") +
scale_color_manual(values = pal)
p1
Version | Author | Date |
---|---|---|
10147c0 | Jovana Maksimovic | 2021-09-27 |
After filtering, the sorted cell types cluster tightly together and there is no evidence of a significant batch effect.
cells <- !(targets$Sample_Group %in% c("Raw"))
mds <- plotMDS(mDat[, cells], top = 1000, gene.selection="common", plot = FALSE)
dat <- tibble(x = mds$x,
y = mds$y,
sample = targets$Sample_Group[cells],
run = targets$Sample_run[cells],
source = targets$Sample_source[cells])
p1 <- ggplot(dat, aes(x = x, y = y, colour = sample)) +
geom_point(aes(shape = run), size = 3) +
labs(colour = "Sample type", shape = "Run",
x = "Principal component 1",
y = "Principal component 2") +
ggtitle("Cell types") +
scale_color_manual(values = pal[-2])
p1
Calculate the percentage of variance explained by principal components.
PCs <- prcomp(t(mValsNoXY), scale = TRUE)
summary(PCs)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6
Standard deviation 461.2224 385.6336 281.0357 225.07463 159.04781 145.86124
Proportion of Variance 0.2903 0.2029 0.1078 0.06913 0.03452 0.02903
Cumulative Proportion 0.2903 0.4933 0.6010 0.67016 0.70468 0.73372
PC7 PC8 PC9 PC10 PC11
Standard deviation 143.04165 138.20916 132.14354 127.12944 124.94789
Proportion of Variance 0.02792 0.02607 0.02383 0.02206 0.02131
Cumulative Proportion 0.76164 0.78771 0.81154 0.83359 0.85490
PC12 PC13 PC14 PC15 PC16
Standard deviation 121.33963 119.96664 116.90175 116.15774 114.55490
Proportion of Variance 0.02009 0.01964 0.01865 0.01841 0.01791
Cumulative Proportion 0.87499 0.89463 0.91328 0.93169 0.94960
PC17 PC18 PC19 PC20
Standard deviation 113.94919 111.17276 107.64848 4.526e-12
Proportion of Variance 0.01772 0.01687 0.01581 0.000e+00
Cumulative Proportion 0.96732 0.98419 1.00000 1.000e+00
summary(PCs)$importance %>%
t() %>%
data.frame() %>%
rownames_to_column(var = "PC") %>%
slice_head(n = 10) %>%
ggplot(aes(x = forcats::fct_reorder(PC, 1:10),
y = Proportion.of.Variance * 100,
group = 1)) +
geom_point(size = 4) +
geom_line() +
labs(y = "% Variance explained",
x = "Principal component") +
cowplot::theme_cowplot()
dims <- list(c(1,2), c(1,3), c(2,3))
pcvar <- summary(PCs)$importance
p <- vector("list", length(dims))
for(i in 1:length(p)){
mds <- plotMDS(mValsNoXY[, cells], top = 1000, gene.selection = "common",
plot = FALSE, dim.plot = dims[[i]])
dat <- tibble(x = mds$x,
y = mds$y,
sample = targets$Sample_Group[cells],
run = targets$Sample_run[cells],
source = targets$Sample_source[cells])
p[[i]] <- ggplot(dat, aes(x = x, y = y, colour = sample)) +
geom_point(aes(shape = run), size = 3) +
labs(colour = "Sample type", shape = "Run",
x = glue("Principal component {dims[[i]][1]} ({round(pcvar[2,dims[[i]][1]]*100,1)}%)"),
y = glue("Principal component {dims[[i]][2]} ({round(pcvar[2,dims[[i]][2]]*100,1)}%)")) +
scale_color_manual(values = pal[-2]) +
theme(legend.direction = "vertical")
}
(p[[1]] | p[[2]] | p[[3]]) + plot_layout(guides = "collect") &
theme(legend.position = "bottom")
Version | Author | Date |
---|---|---|
5952136 | Jovana Maksimovic | 2021-11-23 |
If the 4 sorted cell types are fully representative of raw BAL, then fully methylated/unmethylated loci across all the sorted cell types should also be fully methylated/unmethylated in raw BAL. We identified 3479 probes with mean beta values either greater than 0.95 or less than 0.05 across all the sorted cell types. These probes are consistently fully methylated or fully unmethylated across both the sorted cell types and the raw BAL samples, confirming that the raw BAL is unlikely to contain additional nucleated cell types.
cellTypes <- unique(targets$Sample_Group)[-5]
tmp <- bValsNoXY[, targets$Sample_Group %in% cellTypes]
keep <- which(rowMeans(tmp) >= 0.95 | rowMeans(tmp) <= 0.05)
o <- order(targets$Sample_Group)
NMF::aheatmap(bValsNoXY[keep, o],
annCol = list(SampleType = as.character(targets$Sample_Group[o])),
labCol = NA, labRow = NA, annColors = list(pal),
Colv = NA)
Version | Author | Date |
---|---|---|
5952136 | Jovana Maksimovic | 2021-11-23 |
We assessed the impact of our pooling strategy on the methylomes of the sorted cell types by examining the variance of pools with different numbers of contributing donors. If the variance of beta values (or M values) in the methylomes derived from pools with fewer donors is not greater than the variance of methylomes of pools from a higher number of donors, then our pooling strategy is unlikely to be significantly biasing the methylation data. Thus, we examined the relationship between the number of individuals contributing to a pooled sample and the variance across CpGs in the sample. The result indicates that there is no statistically significant relationship between the number of individuals contributing to a pool and its variance across CpGs.
data.frame(poolVars = colVars(mValsNoXY)) %>%
mutate(cellType = targets$Sample_Group,
num = targets$Num_individuals) %>%
dplyr::filter(cellType != "Raw" ) %>%
mutate(cellType = forcats::fct_drop(cellType)) -> dat
f <- lm(poolVars ~ num, data = dat)
s <- summary(f)
df <- data.frame(label = as.character(glue::glue("Adj. R^2 = {signif(s$adj.r.squared, 4)}\n
Intercept = {signif(f$coef[[1]], 4)}\n
Slope = {signif(f$coef[[2]], 4)}\n
P-value = {signif(s$coef[2, 4], 4)}")),
x = 9,
y = 6.75,
hjust = 0,
vjust = 1,
col = "black")
ggplot(dat, aes(x = num, y = poolVars, color = cellType)) +
geom_point() +
geom_abline(slope = coef(f)[[2]], intercept = coef(f)[[1]],
linetype = "dashed", colour = "red") +
scale_color_manual(values = pal[-2]) +
scale_x_continuous(breaks = 1:nrow(dat)) +
cowplot::theme_cowplot(font_size = 12) +
labs(x = "No. contributing individuals",
y = "Variance across CpGs",
color = "Sample type") +
ggtext::geom_richtext(data = df, inherit.aes = FALSE,
aes(x, y, label = label, hjust = hjust,
vjust = vjust),
fill = NA, label.color = NA,
size = 3, color = "black")
Version | Author | Date |
---|---|---|
5952136 | Jovana Maksimovic | 2021-11-23 |
The data is of good quality and shows no evidence of unexpected sources of variation. Save the various data objects for faster downstream analysis.
outFile <- here("data/processedData.RData")
if(!file.exists(outFile)){
save(annEPIC, mSetSqFlt, rgSet, mVals, bVals, targets, mValsNoXY, bValsNoXY,
pal, cells, file = outFile)
}
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /config/binaries/R/4.0.2/lib64/R/lib/libRblas.so
LAPACK: /config/binaries/R/4.0.2/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] RColorBrewer_1.1-2
[2] DMRcatedata_2.8.2
[3] ExperimentHub_1.16.0
[4] AnnotationHub_2.22.0
[5] BiocFileCache_1.14.0
[6] dbplyr_2.1.0
[7] glue_1.4.2
[8] patchwork_1.1.1
[9] forcats_0.5.1
[10] dplyr_1.0.4
[11] purrr_0.3.4
[12] readr_1.4.0
[13] tidyr_1.1.2
[14] tibble_3.1.2
[15] ggplot2_3.3.5
[16] tidyverse_1.3.0
[17] IlluminaHumanMethylationEPICmanifest_0.3.0
[18] stringr_1.4.0
[19] minfiData_0.36.0
[20] IlluminaHumanMethylation450kmanifest_0.4.0
[21] missMethyl_1.24.0
[22] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[23] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[24] minfi_1.36.0
[25] bumphunter_1.32.0
[26] locfit_1.5-9.4
[27] iterators_1.0.13
[28] foreach_1.5.1
[29] Biostrings_2.58.0
[30] XVector_0.30.0
[31] SummarizedExperiment_1.20.0
[32] Biobase_2.50.0
[33] MatrixGenerics_1.2.1
[34] matrixStats_0.59.0
[35] GenomicRanges_1.42.0
[36] GenomeInfoDb_1.26.7
[37] IRanges_2.24.1
[38] S4Vectors_0.28.1
[39] BiocGenerics_0.36.1
[40] limma_3.46.0
[41] here_1.0.1
[42] workflowr_1.6.2
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 rtracklayer_1.50.0
[3] R.methodsS3_1.8.1 pkgmaker_0.32.2
[5] bit64_4.0.5 knitr_1.31
[7] DelayedArray_0.16.3 R.utils_2.10.1
[9] data.table_1.13.6 rpart_4.1-15
[11] doParallel_1.0.16 RCurl_1.98-1.3
[13] GEOquery_2.58.0 AnnotationFilter_1.14.0
[15] generics_0.1.0 GenomicFeatures_1.42.1
[17] preprocessCore_1.52.1 cowplot_1.1.1
[19] RSQLite_2.2.5 bit_4.0.4
[21] xml2_1.3.2 lubridate_1.7.9.2
[23] httpuv_1.5.5 assertthat_0.2.1
[25] xfun_0.23 hms_1.0.0
[27] evaluate_0.14 promises_1.2.0.1
[29] fansi_0.5.0 scrime_1.3.5
[31] progress_1.2.2 readxl_1.3.1
[33] DBI_1.1.1 htmlwidgets_1.5.3
[35] reshape_0.8.8 ellipsis_0.3.2
[37] backports_1.2.1 markdown_1.1
[39] permute_0.9-5 annotate_1.68.0
[41] gridBase_0.4-7 biomaRt_2.46.3
[43] sparseMatrixStats_1.2.0 vctrs_0.3.8
[45] ensembldb_2.14.0 cachem_1.0.4
[47] withr_2.4.2 Gviz_1.34.1
[49] BSgenome_1.58.0 checkmate_2.0.0
[51] GenomicAlignments_1.26.0 prettyunits_1.1.1
[53] mclust_5.4.7 cluster_2.1.0
[55] lazyeval_0.2.2 crayon_1.4.1
[57] genefilter_1.72.1 edgeR_3.32.1
[59] pkgconfig_2.0.3 DMRcate_2.4.1
[61] labeling_0.4.2 nlme_3.1-152
[63] ProtGenerics_1.22.0 nnet_7.3-15
[65] rlang_0.4.11 lifecycle_1.0.0
[67] registry_0.5-1 modelr_0.1.8
[69] dichromat_2.0-0 cellranger_1.1.0
[71] rprojroot_2.0.2 rngtools_1.5
[73] base64_2.0 Matrix_1.3-2
[75] Rhdf5lib_1.12.1 reprex_1.0.0
[77] base64enc_0.1-3 whisker_0.4
[79] png_0.1-7 bitops_1.0-7
[81] R.oo_1.24.0 rhdf5filters_1.2.0
[83] blob_1.2.1 DelayedMatrixStats_1.12.3
[85] doRNG_1.8.2 nor1mix_1.3-0
[87] jpeg_0.1-8.1 scales_1.1.1
[89] memoise_2.0.0.9000 magrittr_2.0.1
[91] plyr_1.8.6 zlibbioc_1.36.0
[93] compiler_4.0.2 illuminaio_0.32.0
[95] Rsamtools_2.6.0 cli_3.0.0
[97] DSS_2.38.0 htmlTable_2.1.0
[99] Formula_1.2-4 MASS_7.3-53.1
[101] tidyselect_1.1.0 stringi_1.5.3
[103] highr_0.8 yaml_2.2.1
[105] askpass_1.1 latticeExtra_0.6-29
[107] grid_4.0.2 VariantAnnotation_1.36.0
[109] tools_4.0.2 rstudioapi_0.13
[111] foreign_0.8-81 git2r_0.28.0
[113] bsseq_1.26.0 gridExtra_2.3
[115] farver_2.1.0 digest_0.6.27
[117] BiocManager_1.30.10 ggtext_0.1.1
[119] shiny_1.6.0 quadprog_1.5-8
[121] gridtext_0.1.4 Rcpp_1.0.6
[123] siggenes_1.64.0 broom_0.7.4
[125] BiocVersion_3.12.0 later_1.1.0.1
[127] org.Hs.eg.db_3.12.0 httr_1.4.2
[129] AnnotationDbi_1.52.0 biovizBase_1.38.0
[131] colorspace_2.0-2 rvest_0.3.6
[133] XML_3.99-0.5 fs_1.5.0
[135] splines_4.0.2 statmod_1.4.35
[137] multtest_2.46.0 xtable_1.8-4
[139] jsonlite_1.7.2 R6_2.5.0
[141] Hmisc_4.4-2 pillar_1.6.1
[143] htmltools_0.5.1.1 mime_0.10
[145] NMF_0.23.0 fastmap_1.1.0
[147] BiocParallel_1.24.1 interactiveDisplayBase_1.28.0
[149] beanplot_1.2 codetools_0.2-18
[151] utf8_1.2.1 lattice_0.20-41
[153] curl_4.3 gtools_3.8.2
[155] openssl_1.4.3 survival_3.2-7
[157] rmarkdown_2.6 munsell_0.5.0
[159] rhdf5_2.34.0 GenomeInfoDbData_1.2.4
[161] HDF5Array_1.18.1 haven_2.3.1
[163] reshape2_1.4.4 gtable_0.3.0