Introduction


The following code replicates the analyses shown in the Kandinsky paper aiming to identify:


CosMx


The CosMx dataset analysed in this vignette has been downloaded from the CosMx SMI Datasets webpage available at the NanoString website.

It consists of 18 fields of view (FOVs) from a human pancreas FFPE tissue, profiled with the CosMx Human Whole Transcriptome panel.

Cell type annotation file obtained via the R package InSituType was also provided by NanoString together with Cosmx data.


1) Loading data


In order to work with CosMx data using Kandinsky, we first need to set the path to the folder containing CosMx raw input data:

path = 'Kandinsky/test_data/CosMx_test_dataset_WTX'
list.files(path)
#>  [1] "CellComposite"                    "CellLabels"                      
#>  [3] "CellOverlay"                      "CellType_Accessory_Data"         
#>  [5] "CompartmentLabels"                "Pancreas_exprMat_file.csv"       
#>  [7] "Pancreas_fov_positions_file.csv"  "Pancreas_metadata_file.csv"      
#>  [9] "Pancreas_tx_file.csv"             "Pancreas-CosMx-ReadMe.html"      
#> [11] "Pancreas-CosMx-WTx-FlatFiles.zip" "Pancreas-polygons.csv"

Now we can import and load CosMx data into a Seurat object using the Kandinsky prepare_cosmx_seurat() function specifying:

  • the path to the input folder

  • the dataset.id which will be used to name the resulting Seurat object

Additional optional parameters are:

  • pattern: to selectively import only files whose filenames contain a specific string, useful when the folder includes input files from multiple datasets

  • fovs: to load only the cells located within a specified list of FOVs. To load the full dataset, you can set fovs = NULL (default)

# Load CosMX data
cosmx = prepare_cosmx_seurat(path = path, dataset.id = 'Pancreas', pattern = 'Pancreas')
#> Preparing gene expression matrix...
#> Loading cell metadata...
#> Loading cell polygon file...
#> Creating Seurat object...
#> Locating cell transcript file...
#> Loading FOV position file...


2) c-NB identification


Now that the new Seurat object is ready, we use Kandinsky to identify cell neighborhoods based on cell-to-cell relationships using the kandinsky_init() function.
Since we are working with CosMx data, we set the parameter tech = 'cosmx' to let Kandinsky correctly interpret the CosMx data format.

For neighborhood identification, we choose the membrane distance method (nb.method = 'M') and set the distance threshold to 0um (d.max = 0). In this way, we will define c-NBs based on the presence of physical contact between cells.

# Initialize Kandinsky (neighbour method: membrane distance = 0um)
cosmx = kandinsky_init(cosmx, tech = 'cosmx', nb.method = 'M', d.max = 0)
#> Building Kandinsky slot "sf"...
#> Building Kandinsky slot "nb"...
#> Building Kandinsky slot "tx"...
#> Building Kandinsky slot "fov_mask"...
#> Kandinsky data loaded into Seurat object!


3) Loading cell type annotation


We can load the annotation file and store the cell type information within the Seurat object metadata. The dataset includes various cell populations from the pancreatic islets (Alpha, beta, delta, gamma, epsilon cells).

We will create a new annotation field islet to label all cells belonging to any of the aforementioned islet cell types.

insitu_anno = readRDS("Kandinsky/data/CosMx_test_dataset_WTX/CellType_Accessory_Data/Pancreas_celltype_InSituType.rds")
rownames(insitu_anno) = insitu_anno$cell_ID
insitu_anno = insitu_anno[cosmx$cell,]
insitu_anno[is.na(insitu_anno$cell_types),]$cell_types = 'Unknown'
cosmx$cell_types = insitu_anno$cell_types
cosmx$islet = ifelse(stringr::str_detect(cosmx$cell_types,'Alpha|Beta|Gamma|Delta|Epsilon'), 'Y', 'N')

We can now visualize a subset of our dataset (from FOV 51 to 59) using the Kandinsky function KanPlot(), with cells coloured according to their annotated cell type:

pancreas_pal = c("#AEDCED", "#CC6677", "#DDCC77", "#117733", "#7E7ECE", "#AA4499", "#44AA99", "#999933", "#882255","#661100", "#6699CC", "#888888")
p1 = KanPlot(subset(cosmx,subset = fov <= 59)
             , feature = 'cell_types'
             , theme = 'dark', bg_img = F, palette_discrete = pancreas_pal
             , show_border = T, border.size = 0.1) + 
  labs(fill = 'Cell Type')
#> Updating Kandinsky data
p1

Before continuing with the downstream analysis, we need to perform gene expression normalization using Seurat function NormalizeData():

# Normalize counts
cosmx = NormalizeData(cosmx)
#> Normalizing layer: counts


4) c-NB based composition


Acinar cells proximal to islets are known to up-regulate trypsin and other digestive enzymes {Egozi, 2020} and undergo acinar-to-beta cell reprogramming {Dahiya, 2024}.

To assess this regulatory mechanism, we can use Kandinsky nnMat() function to build a single-cell neighborhood composition matrix.
We specify the label argument to indicate the column containing the cell type annotations.

The resulting matrix will be stored within the nnMat Kandinsky slot of the Seurat object. This matrix will include as columns:

  • cell type labels

  • the number of cells of each cell type contributing to each cell’s neighbourhood

  • the total number of neighbours per cell (tot_nn)

We can inspect the composition matrix using the command KanData(cosmx,'nnMat'). This will return the matrix along with the neighbourhood identification method and the name of the cell annotation column.

To retrieve only the composition matrix, we use KanData(cosmx,'nnMat')$nnMat.

Alternatively, if we prefer not to store the result inside the Seurat object, we can call the function nnMat() specifying return.seurat = F to return the matrix directly.

## CENTROID DISTANCE ##
# Create neighbour matrix using default neighbour approach previously defined in `kandinsky_init()`
cosmx = nnMat(cosmx, label = 'cell_types', return.seurat = T)
#> Creating neighbour matrix using pre-defined neighbour network
knitr::kable(head(KanData(cosmx,'nnMat')$nnMat) )
Acinar.1 Acinar.2 Active stellate Alpha cells Beta cells Delta cells Ductal Epsilon cells Gamma cells Macrophage Quiescent stellate Unknown tot_nn cell_types
51_1 1 0 0 0 0 0 0 0 0 1 0 0 2 Macrophage
51_2 2 0 0 0 0 0 0 0 0 0 0 0 2 Acinar.1
51_3 2 0 0 0 0 0 0 0 0 0 0 0 2 Acinar.1
51_4 4 0 0 0 0 0 0 0 0 0 0 0 4 Ductal
51_5 2 1 0 0 0 0 2 0 0 0 0 0 5 Acinar.1
51_6 1 1 0 0 0 0 1 0 0 0 0 0 3 Ductal

We can then identify peri-islet acinar cells having at least one islet cell in their neighbourhood, and annotate the remaining acinar cells as non peri-islet.

This neihbourhood-based (NB) selection of acinar cells can be performed using the Kandinsky nn_query() function, which internally calls dplyr::filter().

The main parameters of nn_query() are:

  • query: a string that will be parsed and passed to the internal filter() function. It must follow the syntax compatible with tidyverse functions.

  • anno.name: a string specyfing the name of the new column where the output annotation will be stored

# Search for acinar cells with at least 1 neighbours belonging to an islet cell type (alpha/beta/delta/gamma/epsilon)
cosmx = nn_query(cosmx
                 , query = '(`Beta cells` >=1 | `Alpha cells` >=1 | `Delta cells` >=1 | `Gamma cells` >=1 | `Epsilon cells` >=1)  & 
  cell_types %in% c("Acinar.1","Acinar.2")'
  , anno_name = 'Peri_islet')

The main output of nn_query is a new annotation column that will be stored both in the nnMat matrix and in the main Seurat metadata.
This column reports a boolean annotation (‘Y’,‘N’) for each cell, indicating whether its neighbourhood composition meets the selection criteria specified in the query argument.

For visualization purposes, we will rename ‘Y’ and ‘N’ cells into peri-islet and non peri-islet, respectively.
After renaming the cell groups, we can visualize the peri-islet and non peri-islet populations using KanPlot().

cosmx$Acinar_Type = ifelse(cosmx$Peri_islet == 'Y','peri-islet','non peri-islet')
cosmx$anno_plot = ifelse(cosmx$cell_types %in% c("Acinar.1","Acinar.2"),cosmx$Acinar_Type,
                         ifelse(cosmx$Peri_islet == 'N' & cosmx$islet=='Y','Islet','Other'))

p2 = KanPlot(subset(cosmx,subset = fov <= 59 & anno_plot %in% c('Islet','peri-islet','non peri-islet'))
             , feature = 'anno_plot'
             , theme='dark', bg_img = F, palette_discrete = c('#5F5FC4','#84C441','#7B1B38')
             , show_border = T, border.size = 0.1)
#> Updating Kandinsky data
p2

Additionally, by setting the diffexpr argument to TRUE, we can directly perform differential expression analysis between the newly defined ‘Y’ and ‘N’ cell populations.

Internally, the Seurat function FindMarkers() is used setting ‘Y’ and ‘N’ cell groups as ident.1 and ident.2, respectively.
Any additional parameter that can be set within FindMarkers() can also be directly specified within nn_query().

# Differential expression analysis between peri-islet and non peri-islet acinar cells#
de_m = nn_query(cosmx,query = '(`Beta cells` >=1 | `Alpha cells` >=1 | `Delta cells` >=1 | `Gamma cells` >=1 | `Epsilon cells` >=1)'
                , anno_name = 'Peri_islet'
                , diffexpr = T
                , label = 'cell_types'
                , which = c('Acinar.1','Acinar.2')
                , logfc.threshold = log2(1.1))
#> Differential expression between 532 Peri_islet-Y and 33465 Peri_islet-N cells


de_m$gene = rownames(de_m)

# Limit DE results due to contamination by setting a threshold on both the % of expressing cells in both acinar cell populations (pct.1/pct.2)
de_m = de_m %>% filter(p_val_adj < 0.01,pct.1 >= 0.2, pct.2 > 0.05)
de_m = de_m %>% arrange(desc(avg_log2FC))
de_m$gene = factor(de_m$gene, levels = de_m$gene)
de_m$FC = 2^(de_m$avg_log2FC)
de_m$sign = ifelse(de_m$FC > 1,'Up','Down')


5) Re-defining c-NB identitification


We can then repeat the same neighbourhood query and differential expression analysis procedure after changing our definition of cell neighbourhood. For instance, we can now define cell neighbourhood based on K-nearest neighbour (KNN) method (nb.method='K') and set the number of neighbours to 10 (k=10).

The change of neighbourhood definition can be made through nnMat(), although only the creation of the neighbourhood composition matrix will be affected.

To permanently change neighbourhood definition of our Seurat object and associated Kandinsky data for all downstream analysis, use the nb_update() Kandinsky function.

## K-NEAREST NEIGHBOUR ##
# Change neighbour approach to KNN (with K=10), then repeat DE analysis
cosmx = nnMat(cosmx, label = 'cell_types', method='K', k = 10, return.seurat = T)
#> Neighbour method: K_10


# Search for acinar cells with at least 1 neighbours belonging to an islet cell type (alpha/beta/delta/gamma/epsilon)
cosmx = nn_query(cosmx
                 , query = '(`Beta cells` >=1 | `Alpha cells` >=1 | `Delta cells` >=1 | `Gamma cells` >=1 | `Epsilon cells` >=1)  & 
                 cell_types %in% c("Acinar.1","Acinar.2")'
                 , anno_name = 'Peri_islet')


cosmx$Acinar_Type = ifelse(cosmx$Peri_islet == 'Y','peri-islet','non peri-islet')
cosmx$anno_plot = ifelse(cosmx$cell_types %in% c("Acinar.1","Acinar.2"),cosmx$Acinar_Type,
                         ifelse(cosmx$Peri_islet == 'N' & cosmx$islet == 'Y','Islet','Other'))


p3 = KanPlot(subset(cosmx,subset = fov <= 59 & anno_plot %in% c('Islet','peri-islet','non peri-islet'))
             , feature = 'anno_plot'
             , theme = 'light', bg_img = F, palette_discrete = c('gray40','#84C441','#7B1B38')
             , show_border = T, border.size = 0.1)
#> Updating Kandinsky data
p3


# Repeat DE
de_k = nn_query(cosmx
                , query = '(`Beta cells` >=1 | `Alpha cells` >=1 | `Delta cells` >=1 | `Gamma cells` >=1 | `Epsilon cells` >=1)'
                , anno_name = 'Peri_islet'
                , diffexpr = T, label = 'cell_types'
                , which = c('Acinar.1','Acinar.2'), logfc.threshold = log2(1.1))
#> Differential expression between 1395 Peri_islet-Y and 32602 Peri_islet-N cells


de_k = de_k %>% filter(p_val_adj < 0.01, pct.1 >= 0.2, pct.2 >= 0.05)
de_k$gene = rownames(de_k)
de_k = de_k %>% arrange(desc(avg_log2FC))
de_k$gene = factor(de_k$gene, levels = de_k$gene)
de_k$FC = 2^(de_k$avg_log2FC)
de_k$sign = ifelse(de_k$FC > 1, 'Up', 'Down')


6) Visualizing DE analysis results


We can now assess how the differential expression results between peri-islet and non peri-islet acinar cells have been affected by their associated neighbourhood definition.

We will plot all significant differentially expressed (DE) genes in a separate barplots for each neighbourhood definition, with genes ordered according to their associated fold change:

  • FC > 1 : gene is more expressed in peri-islet acinar cells;

  • FC < 1 : gene is more expressed in non per-islet acinar cells.

# DE genes barplot

p4 = ggplot(de_m, aes(y = avg_log2FC, x = gene, fill = sign))+
  theme_classic() +
  geom_bar(stat = 'identity')+
  scale_y_continuous(breaks = c(min(de_m$avg_log2FC),log2(1.1),log2(0.9), max(de_m$avg_log2FC)),
                     label = round(2^(c(min(de_m$avg_log2FC),log2(1.1),log2(0.9),max(de_m$avg_log2FC))), 2))+
  geom_hline(yintercept = log2(1.1),linetype='dashed')+
  geom_hline(yintercept = log2(0.9),linetype='dashed')+
  xlab('Gene') + ylab('FC') + labs(fill = 'DE type')+
  coord_flip() +
  scale_fill_manual(values = c('Up' = '#7B1B38', 'Down' = '#84C441'))+
  theme(axis.text = element_text(color = 'black'))

p5 = ggplot(de_k, aes(y = avg_log2FC, x = gene, fill = sign))+
  theme_classic() +
  geom_bar(stat = 'identity')+
  scale_y_continuous(breaks = c(min(de_k$avg_log2FC),log2(1.1),log2(0.9), max(de_k$avg_log2FC)),
                     label = round(2^(c(min(de_k$avg_log2FC),log2(1.1),log2(0.9),max(de_k$avg_log2FC))), 2))+
  geom_hline(yintercept = log2(1.1),linetype='dashed')+
  geom_hline(yintercept = log2(0.9),linetype='dashed')+
  xlab('Gene') + ylab('FC') + labs(fill = 'DE type')+
  coord_flip() +
  scale_fill_manual(values = c('Up' = '#7B1B38', 'Down' = '#84C441')) +
  theme(axis.text = element_text(color = 'black'))


  (p4 + ggtitle('method: physical contact'))+
  (p5 + ggtitle('method: KNN (k=10)'))+
  plot_layout(guides = 'collect', axis_titles = 'collect', ncol = 3)


IMC


The imaging mass cytometry (IMC) dataset of pancreatic ductal adenocarcinoma (PDAC) samples was described in Sussman et al..

The IMC PDAC dataset consists of a total of 34 regions of interest (ROIs) collected across 9 PDAC patients. The data can be downloaded from Zenodo using the following link


1) Loading data


In order to work with IMC PDAC data using Kandinsky, we first need to load the dataset:

path = 'Kandinsky/data/IMC/PDAC_Sussman_2024/'
# Specify the folder where IMC PDAC dataset has been downloaded
data = list.files(path, full.names = FALSE)[1]
data
#> [1] "PDAC_IMC_Seurat_FINAL.rds"

This dataset was originally saved as a Seurat object. However, if it was not produced with a compatible spatial technology, Kandinsky requires IMC data to be reported in a tabular format.
Therefore, we can extract the metadata associated with PDAC cells from the IMC Seurat object, including protein measurements and cell centroid coordinates:

imc = readRDS(file.path(path, data))

# Extract metadata from Seurat object
imc = imc@meta.data

# Create unique cell identifiers
imc$cell_ID = paste0(imc$ROIs_Short, '_', rownames(imc))
imc$ROIs_Short = as.character(imc$ROIs_Short)
rownames(imc) = imc$cell_ID

Then, we can call the Kandinsky function prepare_seurat_other() using the following parameters:

  • stitch = TRUE: to merge and align x/y coordinates from multiple regions of interest (ROIs) acquired at different time points in a new unique coordinate system

  • sample.id: to specify the ROIs identifiers used as sample IDs

  • xcoord and ycoord: metadata column to use for retrieving coordinates

This function will create a new Seurat object in a format compatible with downstream Kandinsky analysis.

# Define columns corresponding to IMC protein measurements
markers = colnames(imc)[11:48]
markers = markers[str_detect(markers,'DNA|In1|Dy163|La139', negate = TRUE)]

# Create new Seurat object (merge coordinates from different ROIs setting stitch=T)
seurat = prepare_seurat_other(imc
                              , markers.ids = markers
                              , stitch = TRUE
                              , sample.id = 'ROIs_Short'
                              , xcoord = 'centroid-0'
                              , ycoord = 'centroid-1')

The new Seurat object will contain two new columns x_global and y_global as part of its metadata, reporting the updated stitched x/y cell coordinates from all ROIs.


2) c-NB identification


Now that the new Seurat object is ready, we can create its associated Kandinsky data using the kandinsky_init() function.
Since IMC is not part of the list of pre-defined spatial technologies (CosMx, Visium/VisiumHD, Xenium, MERSCOPE), we set the tech parameter to other.

Additionally, we specify nb.method = 'K', with k = 20 to create cell neighbourhoods considering the 20 closest neighbours to each cell using the K-nearest neighbour (KNN) method.

Cell centroid coordinates are defined using the arguments xcoord_other and ycoord_other.

# Create Kandinsky data
seurat = kandinsky_init(seurat
                        , tech = 'other'
                        , k = 20
                        , nb.method = 'K'
                        , xcoord_other = 'x_global'
                        , ycoord_other = 'y_global')
#> Building Kandinsky slot "sf"...
#> Building Kandinsky slot "nb"...
#> Building Kandinsky slot "tx"...
#> Building Kandinsky slot "fov_mask"...
#> Kandinsky data loaded into Seurat object!

seurat$Cell_Types = as.character(seurat$Cell_Types)
seurat$Neighborhoods = as.character(seurat$Neighborhoods)

We can now visualize a subset of our samples using the Kandinsky KanPlot() function, with cells colored according to their cell type of origin described in Sussman et al.:

samples = c('P1-R1','P4-R1','P6-R3','P8-R2')

pal = c('#ADE2D0','#FF6F00','#84D7E1','#C71000','#FF6348','#008EA0','#FF95A8','#3D3B25','#8A4198','#5A9599')
pal2 = c('#4BB749','#D42B91','#E2A725','#00807E','#8DC1E9','#ECE949','#44AA99','#999933','#864E9F','#FAC0B2')

p1 = lapply(samples, function(x){
  KanPlot(seurat[,seurat$ROIs_Short == x]
          , feature = 'Neighborhoods'
          , pt.size = 0.25
          , palette_discrete = pal2
          , pt.shape = 16)
})
#> Updating Kandinsky data
#> Updating Kandinsky data
#> Updating Kandinsky data
#> Updating Kandinsky data

p1 = (wrap_plots(p1, ncol = 6, guides = 'collect') + 
        plot_annotation('PDAC Niches'
                        , theme = theme(plot.title = element_text(hjust = 0.5)))) & theme(legend.position = 'none')

p1


3) c-NB based composition


Having defined cell-specific neighbourhoods with Kandinksy, we can now performed an unsupervised model-based clustering of all cells across PDAC samples.

We start by defining the single cell neighbourhood composition (in terms of cell types) using the nnMat() function.

This will create and store a neighbourhood composition matrix in the Kandinsky slot, which counts the number of cells of each cell type (annotated in the metadata variable specified through the label parameter), contributing to each cell neighbourhood.

The nbCluster() function will then be used to perform unsupervised clustering of single cell based on the similarity of their neighbourhood composition.
This function relies on the clustering method implemented in the R package mclust.

The argument n_clust in nbCluster() allows specifying one or more cluster numbers to test, and the optimal number of cluster will be selected according to the Bayesian information criterion (BIC).

For our analysis, we can directly set n_clust = 10 to compare our clustering results with the PDAC niches described in Sussman et al..

The Kandinsky nbCluster() function will create a new cluster annotation column in the Seurat metadata, labelled according to the parameter used to initially define Kandinsky neighbourhood (nb.method_parameter; in our case, K_20)

# Replicate K-nearest neighbour niches (n=10) from the original paper (only clustering algorithm will change)
seurat = nnMat(seurat, label = 'Cell_Types')
#> Creating neighbour matrix using pre-defined neighbour network
seurat = nbCluster(seurat
                   , n_clust = 10)
#> Performing model-based clustering using mclust...
#> fitting ...
#>   |                                                                              |                                                                      |   0%  |                                                                              |=====                                                                 |   7%  |                                                                              |=========                                                             |  13%  |                                                                              |==============                                                        |  20%  |                                                                              |===================                                                   |  27%  |                                                                              |=======================                                               |  33%  |                                                                              |============================                                          |  40%  |                                                                              |=================================                                     |  47%  |                                                                              |=====================================                                 |  53%  |                                                                              |==========================================                            |  60%  |                                                                              |===============================================                       |  67%  |                                                                              |===================================================                   |  73%  |                                                                              |========================================================              |  80%  |                                                                              |=============================================================         |  87%  |                                                                              |=================================================================     |  93%  |                                                                              |======================================================================| 100%

seurat$nbClust.K_20 = factor(seurat$nbClust.K_20
                             , levels = c(9, 1, 6, 10, 5, 2, 7, 8, 3, 4)
                             , labels = c(1:10))

We can now visually compare the spatial distribution of the new c-NB clusters with the PDAC niches described in the original study by Sussman et al.:

# Visualize Kandinsky clusters

pal = c('#ADE2D0','#FF6F00','#84D7E1','#C71000','#FF6348','#008EA0','#FF95A8','#3D3B25','#8A4198','#5A9599')
pal2 = c('#4BB749','#D42B91','#E2A725','#00807E','#8DC1E9','#ECE949','#44AA99','#999933','#864E9F','#FAC0B2')


p2 = lapply(samples,function(x){
  KanPlot(seurat[,seurat$ROIs_Short %in% x]
          , feature = 'nbClust.K_20'
          , pt.size = 0.25
          , palette_discrete = pal
          , pt.shape = 16) + 
    scale_color_manual(values = setNames(pal, sort(unique(as.numeric(seurat$nbClust.K_20)))))
})
#> Updating Kandinsky data
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Updating Kandinsky data
#> 
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Updating Kandinsky data
#> 
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Updating Kandinsky data
#> 
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.

p2 = (wrap_plots(p2, ncol = 6, guides = 'collect') + 
        plot_annotation('c-NB Clusters'
                        , theme = theme(plot.title = element_text(hjust = 0.5)))) & theme(legend.position = 'none')

p2


p1 / p2 & theme(legend.position = "none")

library(mclust)
ari = mclust::adjustedRandIndex(as.integer(as.factor(seurat$Neighborhoods)), as.integer(as.factor(seurat$nbClust.K_20)))
message('Adjusted Rand Index (ARI) between annotations: ', round(ari,2))
#> Adjusted Rand Index (ARI) between annotations: 0.41

The adjusted Rand index (ARI) of 0.41 obtained by comparing the two cluster annotations suggests a similar grouping of cells observed in both c-NB clusters and PDAC niches.


4) Compare overlap between Kandinsky c-NB clusters and PDAC niches and cell types


We can visualize the overlap between different cell annotations, such as cell types and Kandinsky c-NB groups, via one-tailed Fisher’s exact test.

Specifically, we will test the enrichment within each c-NB group of different PDAC niches and cell types.

The PropPlot() function creates a stacked barplot showing the proportion of each class of the annotation layer var.2 within each class of the annotation layer var.1. This can be used to help the interpretation of Fisher’s test enrichment results.

# Check PDAC niches and cell type enrichment within Kandinsky clusters

# Reorder levels of cell types and PDAC niches for better visualization
seurat$Cell_Types = factor(seurat$Cell_Types
                           , levels = rev(c('Alpha Cells', 'Beta Cells', 'Delta Cells'
                                            , 'CD11b+ Granulocytes', 'Proliferating Tumor/Ductal Cells'
                                            , 'Stromal/ECM-2', 'Tumor/Ductal Cells', 'Stromal/ECM-1', 'CD68+ Macrophages'   
                                            , 'Endothelial/Vascular Cells', 'CD68+ CD44+ Macrophages', 'Proliferating Macrophages'
                                            , 'CD8+ T Cells', 'CD4+ T Cells', 'B Cells', 'Artifact'))
)



seurat$Neighborhoods = factor(seurat$Neighborhoods
                              , levels = c("Endocrine", "Granulocyte Enriched", "Proliferating Tumor", "Stromal Enriched-2"
                                           , "Tumor/Ductal Enriched", "Tumor/Stromal Mixed", "Vascular", "Stromal Enriched-1"
                                           , "Macrophage Enriched", "Lymphoid Enriched"))

prop = PropPlot(seurat
                , var.1 = 'nbClust.K_20'
                , var.2 = 'Neighborhoods'
                , cols = setNames(object = pal2, nm = sort(unique(as.character(seurat$Neighborhoods)))))



clusts =model.matrix(data = seurat@meta.data, ~ 0 + nbClust.K_20) %>% as.data.frame()
colnames(clusts) = gsub('nbClust.K_20','',colnames(clusts))
niches = model.matrix(data = seurat@meta.data, ~ 0 + Neighborhoods) %>% as.data.frame()
colnames(niches) = gsub('Neighborhoods','',colnames(niches))

ctypes = model.matrix(data = seurat@meta.data, ~ 0 + Cell_Types) %>% as.data.frame()
colnames(ctypes) = gsub('Cell_Types','',colnames(ctypes))

##Test overlap between PDAC spatial niches and c-NB groups
overlaps = list()
for(c in colnames(clusts)){
  overlap_c = list()
  for(n in colnames(niches)){
    prop = prop.table(table(clusts[[c]],niches[[n]]),1) %>% as.data.frame() %>% filter(Var1==1,Var2==1)
    prop = prop$Freq
    pval = fisher.test(table(clusts[[c]],niches[[n]]),alternative='greater')$p.value
    res = data.frame(niches=n,cluster=c,overlap=prop,pval=pval)
    overlap_c[[n]]=res
  }
  overlap_c = purrr::reduce(overlap_c,rbind)
  overlap_c$padj = p.adjust(overlap_c$pval,method='BH')
  overlaps[[c]] = overlap_c
}
overlaps = purrr::reduce(overlaps,rbind)
overlaps$label = ifelse(overlaps$padj < 0.01,"*","")

overlaps$niches = factor(overlaps$niches,levels = levels(seurat$Neighborhoods))
overlaps$cluster = factor(overlaps$cluster,levels=levels(seurat$nbClust.K_20))


ggplot(overlaps, aes(x = .data[["cluster"]], y = .data[["niches"]], 
                     fill = .data[["overlap"]])) + theme_classic() + geom_tile(color = "black") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        axis.text = element_text(color = "black"), axis.line = element_blank(), 
        axis.ticks = element_blank()) + labs(fill = "Proportion") + 
  labs(x = 'c-NB groups', y = 'PDAC Niche') + coord_fixed(ratio = 1)+scale_fill_gradient2(low = "white", high = "red", 
                                                                                           midpoint =0, limits = c(0,round(max(overlaps$overlap),2)), breaks=c(0,round(max(overlaps$overlap),2)), oob = scales::squish)+
  scale_y_discrete(limits=rev)+
  geom_text(stat = "identity", aes(label = .data[["label"]]),size = 4)


##Test overlap between PDAC cell types and c-NB groups
overlaps2 = list()
for(c in colnames(clusts)){
  overlap_c = list()
  for(ct in colnames(ctypes)){
    prop = prop.table(table(clusts[[c]],ctypes[[ct]]),1) %>% as.data.frame() %>% filter(Var1==1,Var2==1)
    prop = prop$Freq
    pval = fisher.test(table(clusts[[c]],ctypes[[ct]]),alternative='greater')$p.value
    res = data.frame(cell_type=ct,cluster=c,overlap=prop,pval=pval)
    overlap_c[[ct]]=res
  }
  overlap_c = purrr::reduce(overlap_c,rbind)
  overlap_c$padj = p.adjust(overlap_c$pval,method='BH')
  overlaps2[[c]] = overlap_c
}
overlaps2 = purrr::reduce(overlaps2,rbind)
overlaps2$label = ifelse(overlaps2$padj < 0.01,"*","")

overlaps2$cell_type = factor(overlaps2$cell_type,levels = levels(seurat$Cell_Types))
overlaps2$cluster = factor(overlaps2$cluster,levels=levels(seurat$nbClust.K_20))


ggplot(overlaps2, aes(x = .data[["cluster"]], y = .data[["cell_type"]], 
                     fill = .data[["overlap"]])) + theme_classic() + geom_tile(color = "black") + 
  theme(#axis.text.x = element_text( hjust = 1), 
        axis.text = element_text(color = "black"), axis.line = element_blank(), 
        axis.ticks = element_blank()) + labs(fill = "Proportion") + 
  labs(x = 'c-NB groups', y = 'Cell Type') + coord_fixed(ratio = 1)+scale_fill_gradient2(low = "white", high = "red", 
                                                                                           midpoint =0, limits = c(0,round(max(overlaps2$overlap),2)), breaks = c(0,round(max(overlaps2$overlap),2)),oob = scales::squish)+
  geom_text(stat = "identity", aes(label = .data[["label"]]),size = 4)


5) Compare B-cell proportion between Kandinsky clusters


Previous enrichment analysis suggests that Kandinsky clusters 9 and 10 are significantly enriched for the Lymphoid enriched PDAC Niche, with both clusters showing enrichment in B-cells.

However, the composition of cluster 9 seems to be more strongly influenced by B-cells compared to cluster 10.
We can compare B-cell proportions measured in these two clusters using Fisher’s exact test:

# Compare proportion of B cells in Kandinsky cluster 9 and 10 (both enriched for B cells)

imc_meta = seurat@meta.data
imc_meta$is_Bcell = ifelse(seurat$Cell_Types == 'B Cells','Y','N')
imc_meta$is_Bcell = factor(imc_meta$is_Bcell, levels = c('Y','N'))

fisher_data = imc_meta[imc_meta$nbClust.K_20 %in% c(9,10),] %>% 
  mutate(nbClust.K_20 = droplevels(nbClust.K_20))

fisher_pval = fisher.test(fisher_data$nbClust.K_20, fisher_data$is_Bcell)$p.value

if(fisher_pval < 2.2e-16){
  fisher_pval = 'Fisher pvalue < 2.2e-16'
}else{
  fisher_pval = paste0('Fisher pvalue = ',format(fisher_pval, scientific = TRUE, digits = 3) )
}

cluster910_prop = fisher_data %>% 
  group_by(nbClust.K_20) %>% 
  count(is_Bcell) %>% 
  mutate(prop = n/sum(n), percentage = prop*100) %>% 
  ungroup() 

tot_cells = cluster910_prop %>% 
  group_by(nbClust.K_20) %>% 
  summarise(n = sum(n)) %>% 
  ungroup()

ggplot(cluster910_prop %>% filter(is_Bcell == 'Y'), aes(x = nbClust.K_20, y = percentage, fill = nbClust.K_20))+
  theme_classic() + geom_bar(stat = 'identity', color = 'black') + scale_fill_manual(values = c('9' = '#8A4198','10' = '#5A9599'))+
  scale_y_continuous(breaks = c(0, cluster910_prop[cluster910_prop$is_Bcell=='Y',]$percentage)
                     , labels = c(0,round(cluster910_prop[cluster910_prop$is_Bcell=='Y',]$percentage,1)))+
  annotate('text', x = 1.5, y = max(cluster910_prop[cluster910_prop$is_Bcell=='Y',]$percentage+5), label = fisher_pval)+
  guides(x = guide_axis(cap = "both"),y = guide_axis(cap = "both"))+
  theme(axis.line = element_line(linewidth = 0.5, lineend = "round"),
        axis.ticks = element_line(linewidth = 0.5))+
  scale_x_discrete(labels = paste0(unique(cluster910_prop$nbClust.K_20), paste0('\n(',(tot_cells$n),')')))+
  labs(x = 'Kandinsky clusters', y = 'B Cells (%)')

Indeed, cluster 9 shows a significantly higher percentage of B-cells (~35%) compared to cluster 10 (~5%).