scRNA-seq+scATAC-seq integration

This tutorial demonstrates loading, preprocessing, VIPCCA integration, and visualization of a scRNA-seq and a scATAC-seq dataset.

We obtained a scRNA-seq dataset consisting of gene expression measurements on 33,538 genes in 11,769 cells and a scATAC-seq dataset consisting of 89,796 open chromatin peaks in 8,728 nuclei. Both datasets were produced by the 10x Genomics Chromium system and derive from peripheral blood mononuclear cells (PBMCs).

Preprocessing with R

For the scRNA-seq data: the Seurat team has previously pre-processed and clustered this scRNA-seq dataset, annotated 13 cell types, and provided the object here. The 13 cell types comprise 460 B cell progenitors, 2,992 CD14+ Monocytes, 328 CD16+ Monocytes, 1,596 CD4 Memory T cells, 1,047 CD4 Naïve T cells, 383 CD8 effector T cells, 337 CD8 Naïve T cells, 74 Dendritic cells, 592 Double-negative T cells, 544 NK cells, 68 pDCs, 52 Platelets, and 599 pre-B cells.

For the scATAC-seq data: first, we load the provided peak matrix and collapse it into a “gene activity matrix” using Signac. We then filter out cells with fewer than 5,000 total peak counts, leaving a final set of 7,866 cells for analysis. See the Signac website for tutorials and documentation on analysing scATAC-seq data.

After preprocessing, you can convert the Seurat object to AnnData via h5Seurat using the SeuratDisk R package. In this case, the atac.h5ad file will be generated in the corresponding path.

library(SeuratDisk)
# Save the Seurat object in h5Seurat format, then convert it to AnnData (.h5ad)
SaveH5Seurat(atac, filename = "atac.h5Seurat")
Convert("atac.h5Seurat", dest = "h5ad")
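After conversion, you can quickly verify the generated file in Python (a minimal, optional check; it assumes the anndata package is installed):

import anndata

adata = anndata.read_h5ad("atac.h5ad")   # load the converted AnnData file
print(adata)                             # prints the n_obs × n_vars summary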

Alternatively, you can directly download the processed scRNA-seq dataset and scATAC-seq dataset files in h5ad format.

Importing the vipcca package

[1]:
import vipcca.model.vipcca as vip
import vipcca.tools.utils as tl
import vipcca.tools.plotting as pl
import vipcca.tools.transferLabel as tfl
import scanpy as sc
from sklearn.preprocessing import LabelEncoder

import matplotlib
matplotlib.use('TkAgg')

# Command for Jupyter notebooks only
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from matplotlib.axes._axes import _log as matplotlib_axes_logger
matplotlib_axes_logger.setLevel('ERROR')

Loading data in Python

[2]:
adata_rna = tl.read_sc_data("./rna.h5ad")
adata_atac = tl.read_sc_data("./geneact.h5ad")
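It is worth confirming that both objects loaded as expected (optional sanity check):

# The RNA data should contain 11,769 cells × 33,538 genes; the gene activity
# matrix should contain the 7,866 cells retained after filtering.
print(adata_rna)
print(adata_atac)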

Data preprocessing

Here, we filter and normalize each dataset separately and concatenate them into one AnnData object. For more details, please check the preprocessing API.

[3]:
adata_all= tl.preprocessing([adata_rna, adata_atac])
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
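For orientation, the steps performed by tl.preprocessing are roughly equivalent to the following scanpy sketch. This is an assumption based on common practice and on the 2,000-feature encoder input visible in the model summary below, not the actual vipcca implementation; consult the preprocessing API for the real behavior.

import scanpy as sc

def preprocessing_sketch(adatas, n_top_genes=2000):
    # Hypothetical rough equivalent of tl.preprocessing, not the actual code.
    for adata in adatas:
        sc.pp.filter_cells(adata, min_genes=200)      # drop near-empty cells
        sc.pp.normalize_total(adata, target_sum=1e4)  # library-size normalization
        sc.pp.log1p(adata)                            # log-transform
    # Concatenate and record the dataset of origin in obs['_batch']
    adata_all = adatas[0].concatenate(*adatas[1:], batch_key='_batch')
    sc.pp.highly_variable_genes(adata_all, n_top_genes=n_top_genes, subset=True)
    return adata_all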

VIPCCA Integration

[4]:
# Suppress TensorFlow info and warning messages
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# The four hyperparameters epochs, lambda_regulizer, batch_input_size, and batch_input_size2 are user-defined.
# Given a dataset with ~10K cells, we suggest picking:
# a value larger than 600 for epochs (we use 20 here only to keep this demo fast),
# a value in the range [2, 10] for lambda_regulizer,
# a value smaller than the number of input features for batch_input_size,
# a value in the range [8, 16] for batch_input_size2.
handle = vip.VIPCCA(adata_all=adata_all,
                    res_path='./atac_result/',
                    mode='CVAE',
                    split_by="_batch",
                    epochs=20,
                    lambda_regulizer=2,
                    batch_input_size=64,
                    batch_input_size2=14,
                    )
adata_integrate=handle.fit_integrate()
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
Model: "vae_mlp"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
encoder_input (InputLayer)      [(None, 2000)]       0
__________________________________________________________________________________________________
batch_input1 (InputLayer)       [(None, 64)]         0
__________________________________________________________________________________________________
encoder_mlp (Functional)        [(None, 16), (None,  285088      encoder_input[0][0]
                                                                 batch_input1[0][0]
__________________________________________________________________________________________________
batch_input2 (InputLayer)       [(None, 14)]         0
__________________________________________________________________________________________________
decoder_mlp (Functional)        (None, 2000)         542748      encoder_mlp[0][2]
                                                                 batch_input2[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 64)           4160        batch_input1[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 64)           192         dense[0][0]
__________________________________________________________________________________________________
activation (Activation)         (None, 64)           0           batch_normalization[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 64)           4160        activation[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 64)           192         dense_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64)           0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 2064)         0           encoder_input[0][0]
                                                                 activation_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 128)          264320      concatenate[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 128)          384         dense_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 128)          0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
dropout (Dropout)               (None, 128)          0           activation_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 64)           8256        dropout[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 64)           192         dense_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 64)           0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 64)           0           activation_3[0][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 32)           2080        dropout_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 32)           96          dense_4[0][0]
__________________________________________________________________________________________________
tf.math.subtract (TFOpLambda)   (None, 2000)         0           encoder_input[0][0]
                                                                 decoder_mlp[0][0]
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 32)           0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean (TFOpLambda (1, 1)               0           tf.math.subtract[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 32)           0           activation_4[0][0]
__________________________________________________________________________________________________
tf.math.subtract_1 (TFOpLambda) (None, 2000)         0           tf.math.subtract[0][0]
                                                                 tf.math.reduce_mean[0][0]
__________________________________________________________________________________________________
encoder_log_var (Dense)         (None, 16)           528         dropout_2[0][0]
__________________________________________________________________________________________________
encoder_mean (Dense)            (None, 16)           528         dropout_2[0][0]
__________________________________________________________________________________________________
tf.convert_to_tensor (TFOpLambd (None, 2000)         0           decoder_mlp[0][0]
__________________________________________________________________________________________________
tf.cast (TFOpLambda)            (None, 2000)         0           encoder_input[0][0]
__________________________________________________________________________________________________
tf.math.square (TFOpLambda)     (None, 2000)         0           tf.math.subtract_1[0][0]
__________________________________________________________________________________________________
tf.__operators__.add_1 (TFOpLam (None, 16)           0           encoder_log_var[0][0]
__________________________________________________________________________________________________
tf.math.square_1 (TFOpLambda)   (None, 16)           0           encoder_mean[0][0]
__________________________________________________________________________________________________
tf.math.squared_difference (TFO (None, 2000)         0           tf.convert_to_tensor[0][0]
                                                                 tf.cast[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean_1 (TFOpLamb ()                   0           tf.math.square[0][0]
__________________________________________________________________________________________________
tf.math.subtract_2 (TFOpLambda) (None, 16)           0           tf.__operators__.add_1[0][0]
                                                                 tf.math.square_1[0][0]
__________________________________________________________________________________________________
tf.math.exp (TFOpLambda)        (None, 16)           0           encoder_log_var[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean_2 (TFOpLamb (None,)              0           tf.math.squared_difference[0][0]
__________________________________________________________________________________________________
tf.math.truediv (TFOpLambda)    ()                   0           tf.math.reduce_mean_1[0][0]
__________________________________________________________________________________________________
tf.math.log (TFOpLambda)        ()                   0           tf.math.reduce_mean_1[0][0]
__________________________________________________________________________________________________
tf.math.subtract_3 (TFOpLambda) (None, 16)           0           tf.math.subtract_2[0][0]
                                                                 tf.math.exp[0][0]
__________________________________________________________________________________________________
tf.math.multiply (TFOpLambda)   (None,)              0           tf.math.reduce_mean_2[0][0]
                                                                 tf.math.truediv[0][0]
__________________________________________________________________________________________________
tf.math.multiply_1 (TFOpLambda) ()                   0           tf.math.log[0][0]
__________________________________________________________________________________________________
tf.math.reduce_sum (TFOpLambda) (None,)              0           tf.math.subtract_3[0][0]
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None,)              0           tf.math.multiply[0][0]
                                                                 tf.math.multiply_1[0][0]
__________________________________________________________________________________________________
tf.math.multiply_2 (TFOpLambda) (None,)              0           tf.math.reduce_sum[0][0]
__________________________________________________________________________________________________
tf.math.multiply_3 (TFOpLambda) (None,)              0           tf.__operators__.add[0][0]
__________________________________________________________________________________________________
tf.math.multiply_4 (TFOpLambda) (None,)              0           tf.math.multiply_2[0][0]
__________________________________________________________________________________________________
tf.__operators__.add_2 (TFOpLam (None,)              0           tf.math.multiply_3[0][0]
                                                                 tf.math.multiply_4[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean_3 (TFOpLamb ()                   0           tf.__operators__.add_2[0][0]
__________________________________________________________________________________________________
add_loss (AddLoss)              ()                   0           tf.math.reduce_mean_3[0][0]
==================================================================================================
Total params: 827,836
Trainable params: 826,080
Non-trainable params: 1,756
__________________________________________________________________________________________________
Epoch 1/20
68/68 [==============================] - 5s 24ms/step - loss: 5699.5386
Epoch 2/20
68/68 [==============================] - 1s 22ms/step - loss: 5387.9050
Epoch 3/20
68/68 [==============================] - 2s 28ms/step - loss: 5354.2480
Epoch 4/20
68/68 [==============================] - 2s 23ms/step - loss: 5326.5728
Epoch 5/20
68/68 [==============================] - 2s 25ms/step - loss: 5316.6463
Epoch 6/20
68/68 [==============================] - 1s 22ms/step - loss: 5305.0356
Epoch 7/20
68/68 [==============================] - 1s 19ms/step - loss: 5304.1235
Epoch 8/20
68/68 [==============================] - 1s 19ms/step - loss: 5302.7018
Epoch 9/20
68/68 [==============================] - 1s 19ms/step - loss: 5290.4808
Epoch 10/20
68/68 [==============================] - 1s 19ms/step - loss: 5281.9169
Epoch 11/20
68/68 [==============================] - 1s 19ms/step - loss: 5289.3734
Epoch 12/20
68/68 [==============================] - 1s 20ms/step - loss: 5285.7788
Epoch 13/20
68/68 [==============================] - 1s 19ms/step - loss: 5285.3835
Epoch 14/20
68/68 [==============================] - 1s 18ms/step - loss: 5288.9461
Epoch 15/20
68/68 [==============================] - 1s 19ms/step - loss: 5277.2054
Epoch 16/20
68/68 [==============================] - 1s 20ms/step - loss: 5281.0600
Epoch 17/20
68/68 [==============================] - 1s 19ms/step - loss: 5281.3589
Epoch 18/20
68/68 [==============================] - 1s 20ms/step - loss: 5275.7597
Epoch 19/20
68/68 [==============================] - 1s 19ms/step - loss: 5276.1251
Epoch 20/20
68/68 [==============================] - 1s 20ms/step - loss: 5279.9665
WARNING:tensorflow:Found duplicated `Variable`s in Model's `weights`. This is usually caused by `Variable`s being shared by Layers in the Model. These `Variable`s will be treated as separate `Variable`s when the Model is restored. To avoid this, please save with `save_format="tf"`.
... storing '_batch' as categorical
... storing 'celltype' as categorical
... storing 'tech' as categorical
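The integrated AnnData returned by fit_integrate() stores the learned low-dimensional representation in adata_integrate.obsm['X_vipcca'], which the UMAP step below builds on. A quick check (the 16 latent dimensions should match the encoder_mean layer in the model summary above):

print(adata_integrate.obsm['X_vipcca'].shape)   # expected: (number of cells, 16)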

Cell type prediction

[5]:
atac = tfl.findNeighbors(adata_integrate)   # predict a cell type for each scATAC-seq cell from its scRNA-seq neighbors
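Conceptually, this step transfers labels by matching each scATAC-seq cell to its nearest annotated scRNA-seq cells in the integrated latent space. Below is a minimal sketch of that idea, a hypothetical re-implementation rather than the actual vipcca code; it assumes the 'tech' column in .obs distinguishes the two modalities.

from sklearn.neighbors import KNeighborsClassifier

def transfer_labels_sketch(adata_integrate, k=15):
    # Hypothetical nearest-neighbor label transfer in the shared latent space.
    z = adata_integrate.obsm['X_vipcca']
    is_rna = (adata_integrate.obs['tech'] == 'rna').values   # assumed modality labels
    labels = adata_integrate.obs['celltype'].astype(str).values
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(z[is_rna], labels[is_rna])   # train on the annotated scRNA-seq cells
    return knn.predict(z[~is_rna])       # predict labels for the scATAC-seq cells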

Store the predicted cell types in adata_integrate.obs['celltype']:

[6]:
adata_atac.obs['celltype'] = atac['celltype']             # attach the predicted labels to the ATAC cells
adata = adata_rna.concatenate(adata_atac)                 # combine the annotations of both datasets
adata_integrate.obs['celltype'] = adata.obs['celltype']   # copy them onto the integrated object
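As a quick plausibility check, you can inspect the distribution of predicted labels across the ATAC cells:

print(adata_atac.obs['celltype'].value_counts())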

Loading result

Loading the result from an h5ad file: the output.h5ad file with the trained result can be downloaded here.

[7]:
adata_integrate = tl.read_sc_data('./atac_result/output_save.h5ad')

UMAP Visualization

[8]:
sc.pp.neighbors(adata_integrate, use_rep='X_vipcca')   # build the neighbor graph on the VIPCCA latent space
sc.tl.umap(adata_integrate)

sc.set_figure_params(figsize=[5.5, 4.5])
sc.pl.umap(adata_integrate, color=['_batch', 'celltype'])   # color by batch and by predicted cell type

[Figure: UMAP embedding of the integrated data, colored by batch ('_batch') and by cell type ('celltype').]
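Optionally, scanpy can write the figure to disk via the save argument (by default under ./figures/):

sc.pl.umap(adata_integrate, color=['_batch', 'celltype'], save='_vipcca.png')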