Aplicaciones y Discusiones en Desarrollo Animal - Taller 2
Set your working directory and load the libraries we will need.
# set working directory
setwd("~/Documents/git_projects/ADDA_taller2/")
# install pacman if not already installed
if (!require("pacman")) install.packages("pacman")
# use pacman to load libraries
pacman::p_load(tidyverse,DESeq2, ageglm,ggVennDiagram,UpSetR,plotly,ggrepel,scico)
## Warning: package 'ageglm' is not available for this version of R
##
## A version of this package for your version of R might be available elsewhere,
## see the ideas at
## https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
## Warning in p_install(package, character.only = TRUE, ...):
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'ageglm'
## Warning in pacman::p_load(tidyverse, DESeq2, ageglm, ggVennDiagram, UpSetR, : Failed to install/load:
## ageglm
In the last exercise, we performed a differential gene expression
analysis with DESeq2
. We contrasted dorsal and ventral skin
tissues from tadpoles reared on light and dark backgrounds. We were
finished up with DESeqResults
objects for pair-wise
comparisons. These we saved as a list.
The aim of this exercise is to explore these results in greater detail. To do that, it would be good to know the functions of the genes in the Pelobates cultripes genome. One important extra file that you will use for this, is therefore an annotation file generated through BLASTing the genes against the Ensembl Xenopus tropicalis proteome.
Let's load the results files from the previous exercise, plus a file with annotations.
# DEG object
dds<-readRDS("./results/deseq2_dds.rds")
# the list of DEG results
res<-readRDS("./results/deseq2_results.rds")
# Load BLAST results
xtrop<-read_csv("data/PCU23_annotations_xtr105_genes.csv")
If we take a closer look at the annotation file, we should notice a few things:
head(xtrop, 20)
## # A tibble: 20 × 10
## gene_id transcript_id pep_id pep_description xtr_pep_id_x xtr_pep_name_x
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 PECUL23A000… PECUL23A0000… PECUL… <NA> ENSXETP0000… kiaa1522
## 2 PECUL23A000… PECUL23A0000… PECUL… <NA> ENSXETP0000… med9
## 3 PECUL23A000… PECUL23A0000… PECUL… <NA> ENSXETP0000… ENSXETG000000…
## 4 PECUL23A000… PECUL23A0000… PECUL… BPI fold-conta… ENSXETP0000… pltp
## 5 PECUL23A000… PECUL23A0000… PECUL… fragile X ment… ENSXETP0000… fxr1
## 6 PECUL23A000… PECUL23A0000… PECUL… <NA> ENSXETP0000… ENSXETG000000…
## 7 PECUL23A000… PECUL23A0000… PECUL… lysozyme g-like ENSXETP0000… lyg2
## 8 PECUL23A000… PECUL23A0000… PECUL… <NA> <NA> <NA>
## 9 PECUL23A000… PECUL23A0000… PECUL… keratin, type … ENSXETP0000… ccdc58
## 10 PECUL23A000… PECUL23A0000… PECUL… BUB3-interacti… ENSXETP0000… znf207
## 11 PECUL23A000… PECUL23A0000… PECUL… disintegrin an… ENSXETP0000… crot
## 12 PECUL23A000… PECUL23A0000… PECUL… <NA> ENSXETP0000… camlg
## 13 PECUL23A000… PECUL23A0000… PECUL… spermatid peri… ENSXETP0000… strbp
## 14 PECUL23A000… PECUL23A0000… PECUL… <NA> ENSXETP0000… ENSXETG000000…
## 15 PECUL23A000… PECUL23A0000… PECUL… <NA> ENSXETP0000… ENSXETG000000…
## 16 PECUL23A000… PECUL23A0000… PECUL… lipoyltransfer… <NA> <NA>
## 17 PECUL23A000… PECUL23A0000… PECUL… <NA> <NA> <NA>
## 18 PECUL23A000… PECUL23A0000… PECUL… AP-2 complex s… ENSXETP0000… ap2m1
## 19 PECUL23A000… PECUL23A0000… PECUL… general transc… ENSXETP0000… ENSXETG000000…
## 20 PECUL23A000… PECUL23A0000… PECUL… transmembrane … ENSXETP0000… wnk4
## # ℹ 4 more variables: xtr_pep_description_x <chr>, xtr_pep_id_p <chr>,
## # xtr_pep_name_p <chr>, xtr_pep_description_p <chr>
The annotation table has merged a couple of sources of information.
The gene_id
, transcript_id
and
pep_id
are Pelobates-specific identifiers. These
are reference IDs from the Pelobates genome. The
pep_description
are annotations from that genome. The rest
(starting with xtr_
) are Xenopus tropicalis
peptide ids, names and descriptions. These have been derived from either
a BLASTx or BLASTp search (using diamond). We used the
Xenopus tropicalis Ensembl v105 proteome as a target for this
search. These are therefore homolog annotations.
As a reminder, calling summary()
on our
DESeqResults
objects, returns the number (and percentage)
of significant up and down regulated genes for a specific pair-wise
comparison.
summary(res$bD_bV, alpha=0.05)
##
## out of 18110 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 214, 1.2%
## LFC < 0 (down) : 181, 1%
## outliers [1] : 103, 0.57%
## low counts [2] : 1756, 9.7%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
However, we have multiple sets of pair-wise comparisons, and we might be interested to see how many genes which are differentially expressed are shared among sets. In other words, how many genes that are differentially expressed when comparing dorsal and ventral skin white tadpoles are also differentially expressed in the same comparison for dark tadpoles?
More often than not, you will come across comparisons of sets of raw numbers like this displayed as Venn diagrams.
The first think we have to do, is to make lists of genes per set.
I.e. which are these 395
genes that are differentially
expressed in in the dorsal vs. ventral comparison in black tadpoles?
# For a single comparison
res$bD_bV %>%
as_tibble(rownames = "gene") %>%
filter(padj<0.05) %>%
pull(gene)
## [1] "PECUL23A000242" "PECUL23A000289" "PECUL23A000301" "PECUL23A000483"
## [5] "PECUL23A000532" "PECUL23A000547" "PECUL23A000601" "PECUL23A000996"
## [9] "PECUL23A001048" "PECUL23A001317" "PECUL23A001325" "PECUL23A001345"
## [13] "PECUL23A001399" "PECUL23A001415" "PECUL23A001772" "PECUL23A001782"
## [17] "PECUL23A002033" "PECUL23A002170" "PECUL23A002175" "PECUL23A002731"
## [21] "PECUL23A002949" "PECUL23A003209" "PECUL23A003510" "PECUL23A003564"
## [25] "PECUL23A003960" "PECUL23A004007" "PECUL23A004018" "PECUL23A004157"
## [29] "PECUL23A004243" "PECUL23A004279" "PECUL23A005059" "PECUL23A005175"
## [33] "PECUL23A005304" "PECUL23A005421" "PECUL23A005492" "PECUL23A005644"
## [37] "PECUL23A005728" "PECUL23A005890" "PECUL23A005934" "PECUL23A005997"
## [41] "PECUL23A006153" "PECUL23A006314" "PECUL23A006325" "PECUL23A006364"
## [45] "PECUL23A006474" "PECUL23A006487" "PECUL23A006615" "PECUL23A006799"
## [49] "PECUL23A006848" "PECUL23A006960" "PECUL23A007161" "PECUL23A007207"
## [53] "PECUL23A007351" "PECUL23A007359" "PECUL23A007370" "PECUL23A007854"
## [57] "PECUL23A008090" "PECUL23A008121" "PECUL23A008291" "PECUL23A008298"
## [61] "PECUL23A008357" "PECUL23A008364" "PECUL23A008406" "PECUL23A008436"
## [65] "PECUL23A008645" "PECUL23A008908" "PECUL23A009026" "PECUL23A009138"
## [69] "PECUL23A009242" "PECUL23A009671" "PECUL23A009683" "PECUL23A009836"
## [73] "PECUL23A009947" "PECUL23A010042" "PECUL23A010082" "PECUL23A010138"
## [77] "PECUL23A010569" "PECUL23A011313" "PECUL23A011414" "PECUL23A011512"
## [81] "PECUL23A011595" "PECUL23A011679" "PECUL23A011713" "PECUL23A012596"
## [85] "PECUL23A012675" "PECUL23A013228" "PECUL23A013580" "PECUL23A013867"
## [89] "PECUL23A013934" "PECUL23A013990" "PECUL23A014116" "PECUL23A014128"
## [93] "PECUL23A014246" "PECUL23A014248" "PECUL23A014270" "PECUL23A014652"
## [97] "PECUL23A014845" "PECUL23A014974" "PECUL23A015015" "PECUL23A015118"
## [101] "PECUL23A015799" "PECUL23A016000" "PECUL23A016049" "PECUL23A016263"
## [105] "PECUL23A016345" "PECUL23A016456" "PECUL23A016572" "PECUL23A016704"
## [109] "PECUL23A017095" "PECUL23A017142" "PECUL23A017272" "PECUL23A017312"
## [113] "PECUL23A017478" "PECUL23A017585" "PECUL23A017694" "PECUL23A017764"
## [117] "PECUL23A017798" "PECUL23A017859" "PECUL23A017946" "PECUL23A018361"
## [121] "PECUL23A018705" "PECUL23A018797" "PECUL23A018947" "PECUL23A018991"
## [125] "PECUL23A019228" "PECUL23A019284" "PECUL23A019391" "PECUL23A019524"
## [129] "PECUL23A019669" "PECUL23A019709" "PECUL23A019720" "PECUL23A019828"
## [133] "PECUL23A020018" "PECUL23A020060" "PECUL23A020154" "PECUL23A020206"
## [137] "PECUL23A020245" "PECUL23A020624" "PECUL23A020757" "PECUL23A020820"
## [141] "PECUL23A020868" "PECUL23A021160" "PECUL23A021181" "PECUL23A021198"
## [145] "PECUL23A021230" "PECUL23A021291" "PECUL23A021447" "PECUL23A021715"
## [149] "PECUL23A022098" "PECUL23A022797" "PECUL23A022919" "PECUL23A023185"
## [153] "PECUL23A023189" "PECUL23A023265" "PECUL23A023280" "PECUL23A023288"
## [157] "PECUL23A023354" "PECUL23A023416" "PECUL23A023445" "PECUL23A023579"
## [161] "PECUL23A023644" "PECUL23A023841" "PECUL23A024227" "PECUL23A024283"
## [165] "PECUL23A024487" "PECUL23A024569" "PECUL23A024600" "PECUL23A024834"
## [169] "PECUL23A024836" "PECUL23A024875" "PECUL23A024986" "PECUL23A025057"
## [173] "PECUL23A025261" "PECUL23A025304" "PECUL23A025483" "PECUL23A025500"
## [177] "PECUL23A026047" "PECUL23A026380" "PECUL23A026425" "PECUL23A026681"
## [181] "PECUL23A026956" "PECUL23A027266" "PECUL23A027524" "PECUL23A027613"
## [185] "PECUL23A027639" "PECUL23A027859" "PECUL23A027901" "PECUL23A027921"
## [189] "PECUL23A028526" "PECUL23A028742" "PECUL23A028929" "PECUL23A029036"
## [193] "PECUL23A029093" "PECUL23A029281" "PECUL23A029428" "PECUL23A029622"
## [197] "PECUL23A029689" "PECUL23A029700" "PECUL23A029936" "PECUL23A030813"
## [201] "PECUL23A031434" "PECUL23A031470" "PECUL23A031701" "PECUL23A031731"
## [205] "PECUL23A031934" "PECUL23A031954" "PECUL23A032195" "PECUL23A032398"
## [209] "PECUL23A032494" "PECUL23A032670" "PECUL23A032811" "PECUL23A032812"
## [213] "PECUL23A032888" "PECUL23A032916" "PECUL23A033374" "PECUL23A033445"
## [217] "PECUL23A033449" "PECUL23A033940" "PECUL23A034238" "PECUL23A034305"
## [221] "PECUL23A034484" "PECUL23A034593" "PECUL23A034809" "PECUL23A035089"
## [225] "PECUL23A035272" "PECUL23A035283" "PECUL23A035351" "PECUL23A035395"
## [229] "PECUL23A036005" "PECUL23A036393" "PECUL23A036483" "PECUL23A036592"
## [233] "PECUL23A036730" "PECUL23A036831" "PECUL23A037085" "PECUL23A037236"
## [237] "PECUL23A037292" "PECUL23A037490" "PECUL23A037833" "PECUL23A037859"
## [241] "PECUL23A037914" "PECUL23A037937" "PECUL23A038004" "PECUL23A038022"
## [245] "PECUL23A038095" "PECUL23A038230" "PECUL23A038254" "PECUL23A038345"
## [249] "PECUL23A038571" "PECUL23A038940" "PECUL23A038955" "PECUL23A039103"
## [253] "PECUL23A039662" "PECUL23A039712" "PECUL23A040111" "PECUL23A040496"
## [257] "PECUL23A040678" "PECUL23A040780" "PECUL23A040934" "PECUL23A040985"
## [261] "PECUL23A041107" "PECUL23A041125" "PECUL23A041453" "PECUL23A041490"
## [265] "PECUL23A041601" "PECUL23A041725" "PECUL23A042049" "PECUL23A042054"
## [269] "PECUL23A042243" "PECUL23A042374" "PECUL23A042395" "PECUL23A042485"
## [273] "PECUL23A042900" "PECUL23A043180" "PECUL23A043421" "PECUL23A043541"
## [277] "PECUL23A043631" "PECUL23A043688" "PECUL23A043785" "PECUL23A043862"
## [281] "PECUL23A044154" "PECUL23A044338" "PECUL23A044492" "PECUL23A044693"
## [285] "PECUL23A044716" "PECUL23A044741" "PECUL23A044790" "PECUL23A045010"
## [289] "PECUL23A045121" "PECUL23A045400" "PECUL23A045522" "PECUL23A045550"
## [293] "PECUL23A045689" "PECUL23A045704" "PECUL23A045709" "PECUL23A045929"
## [297] "PECUL23A045992" "PECUL23A046141" "PECUL23A046248" "PECUL23A046269"
## [301] "PECUL23A046282" "PECUL23A046588" "PECUL23A046907" "PECUL23A046914"
## [305] "PECUL23A047165" "PECUL23A047351" "PECUL23A047359" "PECUL23A047454"
## [309] "PECUL23A047486" "PECUL23A047519" "PECUL23A047562" "PECUL23A047567"
## [313] "PECUL23A047578" "PECUL23A047615" "PECUL23A047738" "PECUL23A047807"
## [317] "PECUL23A048049" "PECUL23A048147" "PECUL23A048451" "PECUL23A048686"
## [321] "PECUL23A048847" "PECUL23A049240" "PECUL23A049396" "PECUL23A049401"
## [325] "PECUL23A049468" "PECUL23A049670" "PECUL23A049823" "PECUL23A049908"
## [329] "PECUL23A049956" "PECUL23A050008" "PECUL23A050057" "PECUL23A050240"
## [333] "PECUL23A050285" "PECUL23A050517" "PECUL23A050520" "PECUL23A051023"
## [337] "PECUL23A051129" "PECUL23A051608" "PECUL23A051856" "PECUL23A051917"
## [341] "PECUL23A052305" "PECUL23A052472" "PECUL23A052620" "PECUL23A052743"
## [345] "PECUL23A052805" "PECUL23A052910" "PECUL23A052911" "PECUL23A053070"
## [349] "PECUL23A053380" "PECUL23A053412" "PECUL23A053625" "PECUL23A053655"
## [353] "PECUL23A053849" "PECUL23A053970" "PECUL23A054009" "PECUL23A054143"
## [357] "PECUL23A054440" "PECUL23A054533" "PECUL23A054628" "PECUL23A054646"
## [361] "PECUL23A054699" "PECUL23A054984" "PECUL23A055220" "PECUL23A055439"
## [365] "PECUL23A055459" "PECUL23A056272" "PECUL23A056429" "PECUL23A056699"
## [369] "PECUL23A056736" "PECUL23A056920" "PECUL23A056969" "PECUL23A057491"
## [373] "PECUL23A057668" "PECUL23A058171" "PECUL23A058337" "PECUL23A058375"
## [377] "PECUL23A058712" "PECUL23A059095" "PECUL23A059130" "PECUL23A059140"
## [381] "PECUL23A059202" "PECUL23A059461" "PECUL23A060059" "PECUL23A060175"
## [385] "PECUL23A060319" "PECUL23A060421" "PECUL23A060554" "PECUL23A060899"
## [389] "PECUL23A060909" "PECUL23A060987" "PECUL23A061197" "PECUL23A061578"
## [393] "PECUL23A061612" "PECUL23A061888" "PECUL23A061990"
We are going to do this repeatedly, so we can just turn this code into a function:
# turn it into a function so we can apply it to a list!
extract_degs<-function(x) {
return(
x %>%
as_tibble(rownames = "gene") %>%
filter(padj<0.05) %>%
pull(gene)
)
}
We can now apply this to our list of DESeqResults
.
# extract all
sig_deg<-lapply(res, FUN=extract_degs)
str(sig_deg)
## List of 4
## $ bD_bV: chr [1:395] "PECUL23A000242" "PECUL23A000289" "PECUL23A000301" "PECUL23A000483" ...
## $ wD_wV: chr [1:370] "PECUL23A000301" "PECUL23A000532" "PECUL23A000547" "PECUL23A000601" ...
## $ bV_wV: chr [1:98] "PECUL23A000301" "PECUL23A000532" "PECUL23A002321" "PECUL23A004402" ...
## $ bD_wD: chr [1:73] "PECUL23A000532" "PECUL23A000872" "PECUL23A002033" "PECUL23A002231" ...
We can now plot this as a Venn diagram. Let's do this just with two sets, for the dorsal-ventral comparisons for the black and the white tadpoles
# comparing just the dorsal vs. ventral
sig_deg[c("bD_bV","wD_wV")] %>%
ggVennDiagram(edge_size = 0) +
scale_fill_scico(palette = "batlow")
Question: What does this tell us?
Question: What would this look like if we compared all four sets? i.e. go ahead and make a Venn diagram comparing all four sets.
ggVennDiagram(sig_deg,edge_size = 0) +
scale_fill_scico(palette = "batlow")
8
genes that are differentially expressed in
all comparisons.Personally, once you compare more than three sets, I find Venn diagrams to become difficult to interpret. An interesting alternative is an "upset" diagram. This shows exactly the same information.
upset(fromList(sig_deg),
number.angles = 0, point.size = 3, line.size = 1,
sets.x.label = "Number of DEGs",
set_size.show = TRUE,
set_size.scale_max = max(sapply(sig_deg, length))+50, # needed only to expand the axis a bit
text.scale = c(1.2, 1.2, 1.2, 1.2, 1.5, 1.5),
order.by=c("degree","freq"))
Question: What do we think? is this easier to interpret? Is there any additional information we can gain from this?
A plot you may also have seen is a "Volcano" plot, where you are showing the log fold change plotted against the adjusted p-values per gene. To make small p-values very large, the p-values are usually -log10() transformed. The most basic plot would look like this:
res$bD_bV %>%
as_tibble(rownames = "gene_id") %>%
ggplot(aes(x=log2FoldChange, y=-log10(padj))) +
geom_point(alpha=0.75, shape=16)
## Warning: Removed 1859 rows containing missing values (`geom_point()`).
However, we want to squeeze as much information out of it as possible, so we will add the following:
gg_res <- res %>%
lapply(as_tibble,rownames = "gene_id") %>%
bind_rows(.id="comparison") %>%
drop_na(padj) %>% # drop all genes with NAs
filter(padj<0.5) %>% # reduce the number of points that need to be plotted
mutate(sig= padj<0.05 & abs(log2FoldChange)>=2) %>% # make a variable to indicate if a gene is significant based on a specific thresholds
left_join(xtrop, by=c("gene_id")) %>% # add annotations
ggplot(aes(x=log2FoldChange, y=-log10(padj), color=sig,
text=paste0("</br>Pcu23 gene: ", gene_id,
"</br>Pcu23 peptide: ", pep_description,
"</br>X.tr peptide: ", xtr_pep_name_x
)
)) +
geom_point(alpha=0.75, shape=16) +
facet_wrap(~comparison, ncol = 2, scales = "free") +
xlim(-15,15) +
theme_minimal() +
theme(legend.position = "none")
gg_res
## Warning: Removed 4 rows containing missing values (`geom_point()`).
# we can now turn this into an interactive plot:
ggplotly(gg_res, tooltip="text")
Question: What gene are positively differentially expressed when comparing the black vs. the white dorsal tissue?
Some other comments:
wD
(white dorsal) is the reference, and so
a positive fold change means genes are over-expressed in the
treatment, which is the black dorsal tissue.We could also make publication-ready, static plots with annotations of specific genes. For example, all the significant DEGs for the black dorsal vs. white dorsal.
res$bD_wD %>%
as_tibble(rownames = "gene_id") %>%
drop_na(padj) %>% # drop all genes with NAs
#filter(padj<0.99) %>% # reduce the number of points that need to be plotted
mutate(sig= padj<0.05 & abs(log2FoldChange)>=2) %>% # make a variable to indicate if a gene is significant based on a specific thresholds
left_join(xtrop, by=c("gene_id")) %>% # add annotations
filter(padj<0.5) %>%
ggplot(aes(x=log2FoldChange, y=-log10(padj), color=sig)) +
geom_point(alpha=0.75, shape=16) +
geom_text_repel(data=. %>% filter(sig),
aes(label=xtr_pep_name_x),
max.overlaps = 50,
size=2) +
xlim(-10,10) +
ggtitle("DEGs in Black Dorsal Skin in Comparison to White Dorsal Skin") +
theme_bw() +
theme(legend.position = "none")
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 3 rows containing missing values (`geom_text_repel()`).
Question: What have we learned about the gene expression that controls pigemnation plasticity in dorsal skin of these tadpoles?
Just some of the many answers:
In the previous exercise, we looked at MA-plots. We used them mostly as a diagnostic plot, but very often, you will also see them in publications, to display a ratio of gene expression (the fold change) against the mean expression of that gene (base mean). Let's quickly remind us what this looks like:
DESeq2::plotMA(res$bD_bV)
Here, the "noise" of genes with low counts is over-powering the plot.
The DESeq2 package incorporates a prior on log2 fold changes, resulting in moderated estimates from genes with low counts and highly variable counts, as can be seen by the narrowing of spread of points on the left side of the MA plot. This is called "shrinkage" and is done to avoid that these values, which otherwise would frequently be unrealistically large, dominate the top-ranked log fold changes.
A useful exercise to reduce the noise of the low-count genes is to
apply a more aggressive shrinkage method. The most widely used type is
the apglm
method.
# apeglm shrinkage can only be done on already calculated coefficients:
resultsNames(dds)
## [1] "Intercept"
## [2] "condition_black_ventral_vs_black_dorsal"
## [3] "condition_white_dorsal_vs_black_dorsal"
## [4] "condition_white_ventral_vs_black_dorsal"
bD_bV_res <- results(dds,
contrast= c("condition","white_ventral", "black_dorsal"))
bD_bV_res
## log2 fold change (MLE): condition white_ventral vs black_dorsal
## Wald test p-value: condition white ventral vs black dorsal
## DataFrame with 18110 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue
## <numeric> <numeric> <numeric> <numeric> <numeric>
## PECUL23A000004 557.4102 -0.100434 0.552488 -0.181786 0.855751
## PECUL23A000011 943.9019 -0.274858 0.186806 -1.471356 0.141195
## PECUL23A000017 308.9902 -0.089849 0.488532 -0.183916 0.854079
## PECUL23A000021 389.3509 0.161886 0.128565 1.259179 0.207966
## PECUL23A000024 61.2082 -0.697616 0.797256 -0.875021 0.381562
## ... ... ... ... ... ...
## PECUL23A062638 7640.25635 -0.331002 0.285494 -1.159400 0.2462931
## PECUL23A062639 14.84163 -0.269193 0.576309 -0.467098 0.6404298
## PECUL23A062644 1.19938 -0.600717 1.455699 -0.412665 0.6798517
## PECUL23A062646 7.21864 -1.430253 1.053540 -1.357569 0.1746006
## PECUL23A062647 109.72323 0.558908 0.272927 2.047831 0.0405766
## padj
## <numeric>
## PECUL23A000004 0.974429
## PECUL23A000011 0.611279
## PECUL23A000017 0.973855
## PECUL23A000021 0.692827
## PECUL23A000024 0.815316
## ... ...
## PECUL23A062638 0.728480
## PECUL23A062639 0.924804
## PECUL23A062644 NA
## PECUL23A062646 0.656003
## PECUL23A062647 0.361351
# with shrinkage
bD_bV_shr<-lfcShrink(dds,
coef="condition_white_ventral_vs_black_dorsal",
type="apeglm")
bD_bV_shr
## log2 fold change (MAP): condition white ventral vs black dorsal
## Wald test p-value: condition white ventral vs black dorsal
## DataFrame with 18110 rows and 5 columns
## baseMean log2FoldChange lfcSE pvalue padj
## <numeric> <numeric> <numeric> <numeric> <numeric>
## PECUL23A000004 557.4102 -0.00950165 0.172577 0.855751 0.974429
## PECUL23A000011 943.9019 -0.15418290 0.159944 0.141195 0.611279
## PECUL23A000017 308.9902 -0.01072601 0.170359 0.854079 0.973855
## PECUL23A000021 389.3509 0.11292774 0.113455 0.207966 0.692827
## PECUL23A000024 61.2082 -0.03428805 0.181538 0.381562 0.815316
## ... ... ... ... ... ...
## PECUL23A062638 7640.25635 -0.11169956 0.183750 0.2462931 0.728480
## PECUL23A062639 14.84163 -0.02433933 0.175078 0.6404298 0.924804
## PECUL23A062644 1.19938 -0.00918329 0.180246 0.6798517 NA
## PECUL23A062646 7.21864 -0.04015298 0.185490 0.1746006 0.656003
## PECUL23A062647 109.72323 0.26784749 0.280337 0.0405766 0.361351
This produces the same type of "results" object, but you should notice that the first object shows the maximum likelihood estimates (MLE) of the fold change, whereas the second shows the shrunk (MAP) fold changes.
The differences become very clear when repeating the MA-plot
par(mfrow=c(1,2))
par(mar=c(4,4,4,1))
DESeq2::plotMA(bD_bV_res, main="MLE")
DESeq2::plotMA(bD_bV_shr, main="MAP")
Very often you will see the shrinkage versions of MA-plots, simply because they allow for a more pleasing visualization of the significant genes.
Question: How do the fold changes and adjusted p-values compare?
par(mfrow=c(1,2))
par(mar=c(4,4,1,1))
# p values
plot(bD_bV_res$padj~bD_bV_shr$padj,
xlab="adjusted p-value (shrunk)",
ylab="adjusted p-value (MLE)")
# fold changes
plot(bD_bV_res$log2FoldChange~bD_bV_shr$log2FoldChange,
xlab="Fold Change (shrunk)",
ylab="Fold Change (MLE)")
par(mfrow=c(1,1))