The following code replicates the analyses shown in the Kandinsky paper aiming to identify:
The CosMx dataset analysed in this vignette has been downloaded from the CosMx SMI Datasets webpage available at the NanoString website.
It consists of 18 fields of view (FOVs) from a human pancreas FFPE tissue, profiled with the CosMx Human Whole Transcriptome panel.
Cell type annotation file obtained via the R package InSituType was also provided by NanoString together with Cosmx data.
In order to work with CosMx data using Kandinsky, we first need to set the path to the folder containing CosMx raw input data:
list.files(path)
#> [1] "CellComposite" "CellLabels"
#> [3] "CellOverlay" "CellType_Accessory_Data"
#> [5] "CompartmentLabels" "Pancreas_exprMat_file.csv"
#> [7] "Pancreas_fov_positions_file.csv" "Pancreas_metadata_file.csv"
#> [9] "Pancreas_tx_file.csv" "Pancreas-CosMx-ReadMe.html"
#> [11] "Pancreas-CosMx-WTx-FlatFiles.zip" "Pancreas-polygons.csv"
Now we can import and load CosMx data into a
Seurat object
using the Kandinsky
prepare_cosmx_seurat()
function specifying:
the path
to the input folder
the dataset.id
which will be used to name the
resulting Seurat object
Additional optional parameters are:
pattern
: to selectively import only files whose
filenames contain a specific string, useful when the folder includes
input files from multiple datasets
fovs
: to load only the cells located within a
specified list of FOVs. To load the full dataset, you can set
fovs = NULL
(default)
# Load CosMX data
cosmx = prepare_cosmx_seurat(path = path, dataset.id = 'Pancreas', pattern = 'Pancreas')
#> Preparing gene expression matrix...
#> Loading cell metadata...
#> Loading cell polygon file...
#> Creating Seurat object...
#> Locating cell transcript file...
#> Loading FOV position file...
Now that the new Seurat object
is ready, we use
Kandinsky to identify cell
neighborhoods based on cell-to-cell relationships using the
kandinsky_init()
function.
Since we are working with CosMx data, we set the parameter
tech = 'cosmx'
to let Kandinsky correctly interpret the
CosMx data format.
For neighborhood identification, we choose the membrane
distance method (nb.method = 'M'
) and set the
distance threshold to 0um (d.max = 0
). In this way, we will
define c-NBs based on the presence of physical contact between
cells.
# Initialize Kandinsky (neighbour method: membrane distance = 0um)
cosmx = kandinsky_init(cosmx, tech = 'cosmx', nb.method = 'M', d.max = 0)
#> Building Kandinsky slot "sf"...
#> Building Kandinsky slot "nb"...
#> Building Kandinsky slot "tx"...
#> Building Kandinsky slot "fov_mask"...
#> Kandinsky data loaded into Seurat object!
We can load the annotation file and store the cell type information
within the Seurat object
metadata. The dataset includes
various cell populations from the pancreatic islets (Alpha, beta, delta,
gamma, epsilon cells).
We will create a new annotation field islet
to label all
cells belonging to any of the aforementioned islet cell types.
insitu_anno = readRDS("Kandinsky/data/CosMx_test_dataset_WTX/CellType_Accessory_Data/Pancreas_celltype_InSituType.rds")
rownames(insitu_anno) = insitu_anno$cell_ID
insitu_anno = insitu_anno[cosmx$cell,]
insitu_anno[is.na(insitu_anno$cell_types),]$cell_types = 'Unknown'
cosmx$cell_types = insitu_anno$cell_types
cosmx$islet = ifelse(stringr::str_detect(cosmx$cell_types,'Alpha|Beta|Gamma|Delta|Epsilon'), 'Y', 'N')
We can now visualize a subset of our dataset (from FOV 51 to 59)
using the Kandinsky function KanPlot()
, with cells coloured
according to their annotated cell type:
pancreas_pal = c("#AEDCED", "#CC6677", "#DDCC77", "#117733", "#7E7ECE", "#AA4499", "#44AA99", "#999933", "#882255","#661100", "#6699CC", "#888888")
p1 = KanPlot(subset(cosmx,subset = fov <= 59)
, feature = 'cell_types'
, theme = 'dark', bg_img = F, palette_discrete = pancreas_pal
, show_border = T, border.size = 0.1) +
labs(fill = 'Cell Type')
#> Updating Kandinsky data
p1
Before continuing with the downstream analysis, we need to perform
gene expression normalization using Seurat function
NormalizeData()
:
Acinar cells proximal to islets are known to up-regulate trypsin and other digestive enzymes {Egozi, 2020} and undergo acinar-to-beta cell reprogramming {Dahiya, 2024}.
To assess this regulatory mechanism, we can use Kandinsky
nnMat()
function to build a single-cell
neighborhood composition matrix.
We specify the label
argument to indicate the column
containing the cell type annotations.
The resulting matrix will be stored within the nnMat
Kandinsky slot of the Seurat object
. This matrix will
include as columns:
cell type labels
the number of cells of each cell type contributing to each cell’s neighbourhood
the total number of neighbours per cell
(tot_nn
)
We can inspect the composition matrix using the
command KanData(cosmx,'nnMat')
. This will return the matrix
along with the neighbourhood identification method and the name of the
cell annotation column.
To retrieve only the composition matrix, we use
KanData(cosmx,'nnMat')$nnMat
.
Alternatively, if we prefer not to store the result inside the
Seurat object
, we can call the function
nnMat()
specifying return.seurat = F
to return
the matrix directly.
## CENTROID DISTANCE ##
# Create neighbour matrix using default neighbour approach previously defined in `kandinsky_init()`
cosmx = nnMat(cosmx, label = 'cell_types', return.seurat = T)
#> Creating neighbour matrix using pre-defined neighbour network
knitr::kable(head(KanData(cosmx,'nnMat')$nnMat) )
Acinar.1 | Acinar.2 | Active stellate | Alpha cells | Beta cells | Delta cells | Ductal | Epsilon cells | Gamma cells | Macrophage | Quiescent stellate | Unknown | tot_nn | cell_types | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
51_1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | Macrophage |
51_2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | Acinar.1 |
51_3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | Acinar.1 |
51_4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | Ductal |
51_5 | 2 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 5 | Acinar.1 |
51_6 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | Ductal |
We can then identify peri-islet acinar cells having at least one islet cell in their neighbourhood, and annotate the remaining acinar cells as non peri-islet.
This neihbourhood-based (NB) selection of acinar
cells can be performed using the Kandinsky
nn_query()
function, which internally calls
dplyr::filter()
.
The main parameters of nn_query()
are:
query
: a string that will be parsed and passed to
the internal filter()
function. It must follow the syntax
compatible with tidyverse
functions.
anno.name
: a string specyfing the name of the new
column where the output annotation will be stored
# Search for acinar cells with at least 1 neighbours belonging to an islet cell type (alpha/beta/delta/gamma/epsilon)
cosmx = nn_query(cosmx
, query = '(`Beta cells` >=1 | `Alpha cells` >=1 | `Delta cells` >=1 | `Gamma cells` >=1 | `Epsilon cells` >=1) &
cell_types %in% c("Acinar.1","Acinar.2")'
, anno_name = 'Peri_islet')
The main output of nn_query
is a new annotation column
that will be stored both in the nnMat
matrix and in the
main Seurat metadata.
This column reports a boolean annotation (‘Y’,‘N’) for each cell,
indicating whether its neighbourhood composition meets the selection
criteria specified in the query
argument.
For visualization purposes, we will rename ‘Y’ and ‘N’ cells into
peri-islet and non peri-islet,
respectively.
After renaming the cell groups, we can visualize the peri-islet and non
peri-islet populations using KanPlot()
.
cosmx$Acinar_Type = ifelse(cosmx$Peri_islet == 'Y','peri-islet','non peri-islet')
cosmx$anno_plot = ifelse(cosmx$cell_types %in% c("Acinar.1","Acinar.2"),cosmx$Acinar_Type,
ifelse(cosmx$Peri_islet == 'N' & cosmx$islet=='Y','Islet','Other'))
p2 = KanPlot(subset(cosmx,subset = fov <= 59 & anno_plot %in% c('Islet','peri-islet','non peri-islet'))
, feature = 'anno_plot'
, theme='dark', bg_img = F, palette_discrete = c('#5F5FC4','#84C441','#7B1B38')
, show_border = T, border.size = 0.1)
#> Updating Kandinsky data
p2
Additionally, by setting the diffexpr
argument to
TRUE
, we can directly perform differential
expression analysis between the newly defined ‘Y’ and ‘N’ cell
populations.
Internally, the Seurat function FindMarkers()
is used
setting ‘Y’ and ‘N’ cell groups as ident.1
and
ident.2
, respectively.
Any additional parameter that can be set within
FindMarkers()
can also be directly specified within
nn_query()
.
# Differential expression analysis between peri-islet and non peri-islet acinar cells#
de_m = nn_query(cosmx,query = '(`Beta cells` >=1 | `Alpha cells` >=1 | `Delta cells` >=1 | `Gamma cells` >=1 | `Epsilon cells` >=1)'
, anno_name = 'Peri_islet'
, diffexpr = T
, label = 'cell_types'
, which = c('Acinar.1','Acinar.2')
, logfc.threshold = log2(1.1))
#> Differential expression between 532 Peri_islet-Y and 33465 Peri_islet-N cells
de_m$gene = rownames(de_m)
# Limit DE results due to contamination by setting a threshold on both the % of expressing cells in both acinar cell populations (pct.1/pct.2)
de_m = de_m %>% filter(p_val_adj < 0.01,pct.1 >= 0.2, pct.2 > 0.05)
de_m = de_m %>% arrange(desc(avg_log2FC))
de_m$gene = factor(de_m$gene, levels = de_m$gene)
de_m$FC = 2^(de_m$avg_log2FC)
de_m$sign = ifelse(de_m$FC > 1,'Up','Down')
We can then repeat the same neighbourhood query and
differential expression analysis procedure after changing our definition
of cell neighbourhood. For instance, we can now define cell
neighbourhood based on K-nearest neighbour (KNN) method
(nb.method='K'
) and set the number of neighbours to 10
(k=10
).
The change of neighbourhood definition can be made through
nnMat()
, although only the creation of the neighbourhood
composition matrix will be affected.
To permanently change neighbourhood definition of
our Seurat object
and associated Kandinsky
data for all downstream analysis, use the nb_update()
Kandinsky function.
## K-NEAREST NEIGHBOUR ##
# Change neighbour approach to KNN (with K=10), then repeat DE analysis
cosmx = nnMat(cosmx, label = 'cell_types', method='K', k = 10, return.seurat = T)
#> Neighbour method: K_10
# Search for acinar cells with at least 1 neighbours belonging to an islet cell type (alpha/beta/delta/gamma/epsilon)
cosmx = nn_query(cosmx
, query = '(`Beta cells` >=1 | `Alpha cells` >=1 | `Delta cells` >=1 | `Gamma cells` >=1 | `Epsilon cells` >=1) &
cell_types %in% c("Acinar.1","Acinar.2")'
, anno_name = 'Peri_islet')
cosmx$Acinar_Type = ifelse(cosmx$Peri_islet == 'Y','peri-islet','non peri-islet')
cosmx$anno_plot = ifelse(cosmx$cell_types %in% c("Acinar.1","Acinar.2"),cosmx$Acinar_Type,
ifelse(cosmx$Peri_islet == 'N' & cosmx$islet == 'Y','Islet','Other'))
p3 = KanPlot(subset(cosmx,subset = fov <= 59 & anno_plot %in% c('Islet','peri-islet','non peri-islet'))
, feature = 'anno_plot'
, theme = 'light', bg_img = F, palette_discrete = c('gray40','#84C441','#7B1B38')
, show_border = T, border.size = 0.1)
#> Updating Kandinsky data
p3
# Repeat DE
de_k = nn_query(cosmx
, query = '(`Beta cells` >=1 | `Alpha cells` >=1 | `Delta cells` >=1 | `Gamma cells` >=1 | `Epsilon cells` >=1)'
, anno_name = 'Peri_islet'
, diffexpr = T, label = 'cell_types'
, which = c('Acinar.1','Acinar.2'), logfc.threshold = log2(1.1))
#> Differential expression between 1395 Peri_islet-Y and 32602 Peri_islet-N cells
de_k = de_k %>% filter(p_val_adj < 0.01, pct.1 >= 0.2, pct.2 >= 0.05)
de_k$gene = rownames(de_k)
de_k = de_k %>% arrange(desc(avg_log2FC))
de_k$gene = factor(de_k$gene, levels = de_k$gene)
de_k$FC = 2^(de_k$avg_log2FC)
de_k$sign = ifelse(de_k$FC > 1, 'Up', 'Down')
We can now assess how the differential expression results between peri-islet and non peri-islet acinar cells have been affected by their associated neighbourhood definition.
We will plot all significant differentially expressed (DE) genes in a separate barplots for each neighbourhood definition, with genes ordered according to their associated fold change:
FC > 1 : gene is more expressed in peri-islet acinar cells;
FC < 1 : gene is more expressed in non per-islet acinar cells.
# DE genes barplot
p4 = ggplot(de_m, aes(y = avg_log2FC, x = gene, fill = sign))+
theme_classic() +
geom_bar(stat = 'identity')+
scale_y_continuous(breaks = c(min(de_m$avg_log2FC),log2(1.1),log2(0.9), max(de_m$avg_log2FC)),
label = round(2^(c(min(de_m$avg_log2FC),log2(1.1),log2(0.9),max(de_m$avg_log2FC))), 2))+
geom_hline(yintercept = log2(1.1),linetype='dashed')+
geom_hline(yintercept = log2(0.9),linetype='dashed')+
xlab('Gene') + ylab('FC') + labs(fill = 'DE type')+
coord_flip() +
scale_fill_manual(values = c('Up' = '#7B1B38', 'Down' = '#84C441'))+
theme(axis.text = element_text(color = 'black'))
p5 = ggplot(de_k, aes(y = avg_log2FC, x = gene, fill = sign))+
theme_classic() +
geom_bar(stat = 'identity')+
scale_y_continuous(breaks = c(min(de_k$avg_log2FC),log2(1.1),log2(0.9), max(de_k$avg_log2FC)),
label = round(2^(c(min(de_k$avg_log2FC),log2(1.1),log2(0.9),max(de_k$avg_log2FC))), 2))+
geom_hline(yintercept = log2(1.1),linetype='dashed')+
geom_hline(yintercept = log2(0.9),linetype='dashed')+
xlab('Gene') + ylab('FC') + labs(fill = 'DE type')+
coord_flip() +
scale_fill_manual(values = c('Up' = '#7B1B38', 'Down' = '#84C441')) +
theme(axis.text = element_text(color = 'black'))
(p4 + ggtitle('method: physical contact'))+
(p5 + ggtitle('method: KNN (k=10)'))+
plot_layout(guides = 'collect', axis_titles = 'collect', ncol = 3)
The imaging mass cytometry (IMC) dataset of pancreatic ductal adenocarcinoma (PDAC) samples was described in Sussman et al..
The IMC PDAC dataset consists of a total of 34 regions of interest (ROIs) collected across 9 PDAC patients. The data can be downloaded from Zenodo using the following link
In order to work with IMC PDAC data using Kandinsky, we first need to load the dataset:
# Specify the folder where IMC PDAC dataset has been downloaded
data = list.files(path, full.names = FALSE)[1]
data
#> [1] "PDAC_IMC_Seurat_FINAL.rds"
This dataset was originally saved as a Seurat object
.
However, if it was not produced with a compatible spatial technology,
Kandinsky requires IMC data to be reported in a tabular format.
Therefore, we can extract the metadata associated with PDAC cells from
the IMC Seurat object
, including protein measurements and
cell centroid coordinates:
imc = readRDS(file.path(path, data))
# Extract metadata from Seurat object
imc = imc@meta.data
# Create unique cell identifiers
imc$cell_ID = paste0(imc$ROIs_Short, '_', rownames(imc))
imc$ROIs_Short = as.character(imc$ROIs_Short)
rownames(imc) = imc$cell_ID
Then, we can call the Kandinsky function
prepare_seurat_other()
using the following parameters:
stitch = TRUE
: to merge and align x/y coordinates
from multiple regions of interest (ROIs) acquired at different time
points in a new unique coordinate system
sample.id
: to specify the ROIs identifiers used as
sample IDs
xcoord
and ycoord
: metadata column to
use for retrieving coordinates
This function will create a new Seurat object
in a
format compatible with downstream Kandinsky analysis.
# Define columns corresponding to IMC protein measurements
markers = colnames(imc)[11:48]
markers = markers[str_detect(markers,'DNA|In1|Dy163|La139', negate = TRUE)]
# Create new Seurat object (merge coordinates from different ROIs setting stitch=T)
seurat = prepare_seurat_other(imc
, markers.ids = markers
, stitch = TRUE
, sample.id = 'ROIs_Short'
, xcoord = 'centroid-0'
, ycoord = 'centroid-1')
The new Seurat object
will contain two new columns
x_global
and y_global
as part of its metadata,
reporting the updated stitched x/y cell coordinates
from all ROIs.
Now that the new Seurat object
is ready, we can create
its associated Kandinsky data using the kandinsky_init()
function.
Since IMC is not part of the list of pre-defined spatial technologies
(CosMx, Visium/VisiumHD, Xenium, MERSCOPE), we set the tech
parameter to other
.
Additionally, we specify nb.method = 'K'
, with
k = 20
to create cell neighbourhoods considering the 20
closest neighbours to each cell using the K-nearest neighbour
(KNN) method.
Cell centroid coordinates are defined using the arguments
xcoord_other
and ycoord_other
.
# Create Kandinsky data
seurat = kandinsky_init(seurat
, tech = 'other'
, k = 20
, nb.method = 'K'
, xcoord_other = 'x_global'
, ycoord_other = 'y_global')
#> Building Kandinsky slot "sf"...
#> Building Kandinsky slot "nb"...
#> Building Kandinsky slot "tx"...
#> Building Kandinsky slot "fov_mask"...
#> Kandinsky data loaded into Seurat object!
seurat$Cell_Types = as.character(seurat$Cell_Types)
seurat$Neighborhoods = as.character(seurat$Neighborhoods)
We can now visualize a subset of our samples using the Kandinsky
KanPlot()
function, with cells colored according to their
cell type of origin described in Sussman
et al.:
samples = c('P1-R1','P4-R1','P6-R3','P8-R2')
pal = c('#ADE2D0','#FF6F00','#84D7E1','#C71000','#FF6348','#008EA0','#FF95A8','#3D3B25','#8A4198','#5A9599')
pal2 = c('#4BB749','#D42B91','#E2A725','#00807E','#8DC1E9','#ECE949','#44AA99','#999933','#864E9F','#FAC0B2')
p1 = lapply(samples, function(x){
KanPlot(seurat[,seurat$ROIs_Short == x]
, feature = 'Neighborhoods'
, pt.size = 0.25
, palette_discrete = pal2
, pt.shape = 16)
})
#> Updating Kandinsky data
#> Updating Kandinsky data
#> Updating Kandinsky data
#> Updating Kandinsky data
p1 = (wrap_plots(p1, ncol = 6, guides = 'collect') +
plot_annotation('PDAC Niches'
, theme = theme(plot.title = element_text(hjust = 0.5)))) & theme(legend.position = 'none')
p1
Having defined cell-specific neighbourhoods with Kandinksy, we can now performed an unsupervised model-based clustering of all cells across PDAC samples.
We start by defining the single cell neighbourhood
composition (in terms of cell types) using the
nnMat()
function.
This will create and store a neighbourhood composition matrix in the
Kandinsky slot, which counts the number of cells of each cell type
(annotated in the metadata variable specified through the
label
parameter), contributing to each cell
neighbourhood.
The nbCluster()
function will then be used to perform
unsupervised clustering of single cell based on the similarity of their
neighbourhood composition.
This function relies on the clustering method implemented in the R
package mclust.
The argument n_clust
in nbCluster()
allows
specifying one or more cluster numbers to test, and the optimal number
of cluster will be selected according to the Bayesian information
criterion (BIC).
For our analysis, we can directly set n_clust = 10
to
compare our clustering results with the PDAC niches described in Sussman
et al..
The Kandinsky nbCluster()
function will create a new
cluster annotation column in the Seurat metadata, labelled according to
the parameter used to initially define Kandinsky neighbourhood
(nb.method_parameter
; in our case, K_20
)
# Replicate K-nearest neighbour niches (n=10) from the original paper (only clustering algorithm will change)
seurat = nnMat(seurat, label = 'Cell_Types')
#> Creating neighbour matrix using pre-defined neighbour network
seurat = nbCluster(seurat
, n_clust = 10)
#> Performing model-based clustering using mclust...
#> fitting ...
#> | | | 0% | |===== | 7% | |========= | 13% | |============== | 20% | |=================== | 27% | |======================= | 33% | |============================ | 40% | |================================= | 47% | |===================================== | 53% | |========================================== | 60% | |=============================================== | 67% | |=================================================== | 73% | |======================================================== | 80% | |============================================================= | 87% | |================================================================= | 93% | |======================================================================| 100%
seurat$nbClust.K_20 = factor(seurat$nbClust.K_20
, levels = c(9, 1, 6, 10, 5, 2, 7, 8, 3, 4)
, labels = c(1:10))
We can now visually compare the spatial distribution of the new c-NB clusters with the PDAC niches described in the original study by Sussman et al.:
# Visualize Kandinsky clusters
pal = c('#ADE2D0','#FF6F00','#84D7E1','#C71000','#FF6348','#008EA0','#FF95A8','#3D3B25','#8A4198','#5A9599')
pal2 = c('#4BB749','#D42B91','#E2A725','#00807E','#8DC1E9','#ECE949','#44AA99','#999933','#864E9F','#FAC0B2')
p2 = lapply(samples,function(x){
KanPlot(seurat[,seurat$ROIs_Short %in% x]
, feature = 'nbClust.K_20'
, pt.size = 0.25
, palette_discrete = pal
, pt.shape = 16) +
scale_color_manual(values = setNames(pal, sort(unique(as.numeric(seurat$nbClust.K_20)))))
})
#> Updating Kandinsky data
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Updating Kandinsky data
#>
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Updating Kandinsky data
#>
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Updating Kandinsky data
#>
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
p2 = (wrap_plots(p2, ncol = 6, guides = 'collect') +
plot_annotation('c-NB Clusters'
, theme = theme(plot.title = element_text(hjust = 0.5)))) & theme(legend.position = 'none')
p2
ari = mclust::adjustedRandIndex(as.integer(as.factor(seurat$Neighborhoods)), as.integer(as.factor(seurat$nbClust.K_20)))
message('Adjusted Rand Index (ARI) between annotations: ', round(ari,2))
#> Adjusted Rand Index (ARI) between annotations: 0.41
The adjusted Rand index (ARI) of 0.41 obtained by comparing the two cluster annotations suggests a similar grouping of cells observed in both c-NB clusters and PDAC niches.
We can visualize the overlap between different cell annotations, such as cell types and Kandinsky c-NB groups, via one-tailed Fisher’s exact test.
Specifically, we will test the enrichment within each c-NB group of different PDAC niches and cell types.
The PropPlot()
function creates a stacked barplot
showing the proportion of each class of the annotation layer
var.2
within each class of the annotation layer
var.1
. This can be used to help the interpretation of
Fisher’s test enrichment results.
# Check PDAC niches and cell type enrichment within Kandinsky clusters
# Reorder levels of cell types and PDAC niches for better visualization
seurat$Cell_Types = factor(seurat$Cell_Types
, levels = rev(c('Alpha Cells', 'Beta Cells', 'Delta Cells'
, 'CD11b+ Granulocytes', 'Proliferating Tumor/Ductal Cells'
, 'Stromal/ECM-2', 'Tumor/Ductal Cells', 'Stromal/ECM-1', 'CD68+ Macrophages'
, 'Endothelial/Vascular Cells', 'CD68+ CD44+ Macrophages', 'Proliferating Macrophages'
, 'CD8+ T Cells', 'CD4+ T Cells', 'B Cells', 'Artifact'))
)
seurat$Neighborhoods = factor(seurat$Neighborhoods
, levels = c("Endocrine", "Granulocyte Enriched", "Proliferating Tumor", "Stromal Enriched-2"
, "Tumor/Ductal Enriched", "Tumor/Stromal Mixed", "Vascular", "Stromal Enriched-1"
, "Macrophage Enriched", "Lymphoid Enriched"))
prop = PropPlot(seurat
, var.1 = 'nbClust.K_20'
, var.2 = 'Neighborhoods'
, cols = setNames(object = pal2, nm = sort(unique(as.character(seurat$Neighborhoods)))))
clusts =model.matrix(data = seurat@meta.data, ~ 0 + nbClust.K_20) %>% as.data.frame()
colnames(clusts) = gsub('nbClust.K_20','',colnames(clusts))
niches = model.matrix(data = seurat@meta.data, ~ 0 + Neighborhoods) %>% as.data.frame()
colnames(niches) = gsub('Neighborhoods','',colnames(niches))
ctypes = model.matrix(data = seurat@meta.data, ~ 0 + Cell_Types) %>% as.data.frame()
colnames(ctypes) = gsub('Cell_Types','',colnames(ctypes))
##Test overlap between PDAC spatial niches and c-NB groups
overlaps = list()
for(c in colnames(clusts)){
overlap_c = list()
for(n in colnames(niches)){
prop = prop.table(table(clusts[[c]],niches[[n]]),1) %>% as.data.frame() %>% filter(Var1==1,Var2==1)
prop = prop$Freq
pval = fisher.test(table(clusts[[c]],niches[[n]]),alternative='greater')$p.value
res = data.frame(niches=n,cluster=c,overlap=prop,pval=pval)
overlap_c[[n]]=res
}
overlap_c = purrr::reduce(overlap_c,rbind)
overlap_c$padj = p.adjust(overlap_c$pval,method='BH')
overlaps[[c]] = overlap_c
}
overlaps = purrr::reduce(overlaps,rbind)
overlaps$label = ifelse(overlaps$padj < 0.01,"*","")
overlaps$niches = factor(overlaps$niches,levels = levels(seurat$Neighborhoods))
overlaps$cluster = factor(overlaps$cluster,levels=levels(seurat$nbClust.K_20))
ggplot(overlaps, aes(x = .data[["cluster"]], y = .data[["niches"]],
fill = .data[["overlap"]])) + theme_classic() + geom_tile(color = "black") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.text = element_text(color = "black"), axis.line = element_blank(),
axis.ticks = element_blank()) + labs(fill = "Proportion") +
labs(x = 'c-NB groups', y = 'PDAC Niche') + coord_fixed(ratio = 1)+scale_fill_gradient2(low = "white", high = "red",
midpoint =0, limits = c(0,round(max(overlaps$overlap),2)), breaks=c(0,round(max(overlaps$overlap),2)), oob = scales::squish)+
scale_y_discrete(limits=rev)+
geom_text(stat = "identity", aes(label = .data[["label"]]),size = 4)
##Test overlap between PDAC cell types and c-NB groups
overlaps2 = list()
for(c in colnames(clusts)){
overlap_c = list()
for(ct in colnames(ctypes)){
prop = prop.table(table(clusts[[c]],ctypes[[ct]]),1) %>% as.data.frame() %>% filter(Var1==1,Var2==1)
prop = prop$Freq
pval = fisher.test(table(clusts[[c]],ctypes[[ct]]),alternative='greater')$p.value
res = data.frame(cell_type=ct,cluster=c,overlap=prop,pval=pval)
overlap_c[[ct]]=res
}
overlap_c = purrr::reduce(overlap_c,rbind)
overlap_c$padj = p.adjust(overlap_c$pval,method='BH')
overlaps2[[c]] = overlap_c
}
overlaps2 = purrr::reduce(overlaps2,rbind)
overlaps2$label = ifelse(overlaps2$padj < 0.01,"*","")
overlaps2$cell_type = factor(overlaps2$cell_type,levels = levels(seurat$Cell_Types))
overlaps2$cluster = factor(overlaps2$cluster,levels=levels(seurat$nbClust.K_20))
ggplot(overlaps2, aes(x = .data[["cluster"]], y = .data[["cell_type"]],
fill = .data[["overlap"]])) + theme_classic() + geom_tile(color = "black") +
theme(#axis.text.x = element_text( hjust = 1),
axis.text = element_text(color = "black"), axis.line = element_blank(),
axis.ticks = element_blank()) + labs(fill = "Proportion") +
labs(x = 'c-NB groups', y = 'Cell Type') + coord_fixed(ratio = 1)+scale_fill_gradient2(low = "white", high = "red",
midpoint =0, limits = c(0,round(max(overlaps2$overlap),2)), breaks = c(0,round(max(overlaps2$overlap),2)),oob = scales::squish)+
geom_text(stat = "identity", aes(label = .data[["label"]]),size = 4)
Previous enrichment analysis suggests that Kandinsky clusters 9 and 10 are significantly enriched for the Lymphoid enriched PDAC Niche, with both clusters showing enrichment in B-cells.
However, the composition of cluster 9 seems to be more strongly
influenced by B-cells compared to cluster 10.
We can compare B-cell proportions measured in these two clusters using
Fisher’s exact test:
# Compare proportion of B cells in Kandinsky cluster 9 and 10 (both enriched for B cells)
imc_meta = seurat@meta.data
imc_meta$is_Bcell = ifelse(seurat$Cell_Types == 'B Cells','Y','N')
imc_meta$is_Bcell = factor(imc_meta$is_Bcell, levels = c('Y','N'))
fisher_data = imc_meta[imc_meta$nbClust.K_20 %in% c(9,10),] %>%
mutate(nbClust.K_20 = droplevels(nbClust.K_20))
fisher_pval = fisher.test(fisher_data$nbClust.K_20, fisher_data$is_Bcell)$p.value
if(fisher_pval < 2.2e-16){
fisher_pval = 'Fisher pvalue < 2.2e-16'
}else{
fisher_pval = paste0('Fisher pvalue = ',format(fisher_pval, scientific = TRUE, digits = 3) )
}
cluster910_prop = fisher_data %>%
group_by(nbClust.K_20) %>%
count(is_Bcell) %>%
mutate(prop = n/sum(n), percentage = prop*100) %>%
ungroup()
tot_cells = cluster910_prop %>%
group_by(nbClust.K_20) %>%
summarise(n = sum(n)) %>%
ungroup()
ggplot(cluster910_prop %>% filter(is_Bcell == 'Y'), aes(x = nbClust.K_20, y = percentage, fill = nbClust.K_20))+
theme_classic() + geom_bar(stat = 'identity', color = 'black') + scale_fill_manual(values = c('9' = '#8A4198','10' = '#5A9599'))+
scale_y_continuous(breaks = c(0, cluster910_prop[cluster910_prop$is_Bcell=='Y',]$percentage)
, labels = c(0,round(cluster910_prop[cluster910_prop$is_Bcell=='Y',]$percentage,1)))+
annotate('text', x = 1.5, y = max(cluster910_prop[cluster910_prop$is_Bcell=='Y',]$percentage+5), label = fisher_pval)+
guides(x = guide_axis(cap = "both"),y = guide_axis(cap = "both"))+
theme(axis.line = element_line(linewidth = 0.5, lineend = "round"),
axis.ticks = element_line(linewidth = 0.5))+
scale_x_discrete(labels = paste0(unique(cluster910_prop$nbClust.K_20), paste0('\n(',(tot_cells$n),')')))+
labs(x = 'Kandinsky clusters', y = 'B Cells (%)')
Indeed, cluster 9 shows a significantly higher percentage of B-cells (~35%) compared to cluster 10 (~5%).