API
Preprocessing
vipcca.tools.utils.preprocessing(datasets, min_cells=1, min_genes=1, n_top_genes=2000, mt_ratio=0.8, lognorm=True, hvg=True, index_unique=None)
Preprocess and merge datasets from different batches.
- Parameters
datasets (list) – The list of AnnData objects from different batches.
min_cells (int, optional (default: 1)) – Minimum number of cells in which a gene must be expressed for the gene to pass filtering.
min_genes (int, optional (default: 1)) – Minimum number of genes that must be expressed in a cell for the cell to pass filtering.
n_top_genes (int, optional (default: 2000)) – Number of highly-variable genes to keep.
mt_ratio (double, optional (default: 0.8)) – Maximum proportion of mitochondrial genes for a cell to pass filtering.
lognorm (bool, optional (default: True)) – If True, execute lognorm() function.
hvg (bool, optional (default: True)) – If True, select highly variable genes for the AnnData object.
index_unique (string, optional (default: None)) – Make the index unique by joining the existing index names with the batch category, using index_unique='-', for instance. Provide None to keep existing indices.
- Returns
adata_norm
- Return type
AnnData
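As a rough illustration of the index_unique option, the sketch below mimics the joining rule described above in plain Python and then wraps a call to preprocessing with the documented defaults spelled out. The helper names join_index and merge_batches, the barcodes, and the batch category "0" are made up for this example; the wrapper is only defined, not executed, and needs vipcca to be installed.

```python
def join_index(names, batch_category, index_unique=None):
    """Hypothetical helper: mimic the documented index_unique joining rule."""
    if index_unique is None:
        # None keeps the existing index names untouched.
        return list(names)
    # Otherwise append the separator and the batch category to every name.
    return [f"{name}{index_unique}{batch_category}" for name in names]


barcodes = ["AAACCTG", "AAACGGG"]  # made-up cell barcodes
joined = join_index(barcodes, "0", index_unique="-")
kept = join_index(barcodes, "0")
print(joined)  # ['AAACCTG-0', 'AAACGGG-0']
print(kept)    # ['AAACCTG', 'AAACGGG']


def merge_batches(datasets):
    """Call preprocessing with the documented defaults, plus index_unique='-'."""
    # Imported lazily so the sketch above runs without vipcca installed.
    from vipcca.tools.utils import preprocessing

    return preprocessing(
        datasets,
        min_cells=1,
        min_genes=1,
        n_top_genes=2000,
        mt_ratio=0.8,
        lognorm=True,
        hvg=True,
        index_unique="-",
    )
```

Setting index_unique is mainly useful when the same barcode can occur in more than one batch, since the merged object then keeps distinct observation names.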
vipcca.tools.utils.read_sc_data(input_file, fmt='h5ad', backed=None, transpose=False, sparse=False, delimiter=' ', unique_name=True, batch_name=None, var_names='gene_symbols')
Read a single-cell dataset.
- Parameters
input_file (string) – The path of the file to be read.
fmt (string, optional (default: 'h5ad')) – The file type of the file to be read.
backed (Union[Literal['r', 'r+'], bool, None] (default: None)) – If 'r', load AnnData in backed mode instead of fully loading it into memory (memory mode). If you want to modify backed attributes of the AnnData object, you need to choose 'r+'.
transpose (bool, optional (default: False)) – Whether to transpose the read data.
sparse (bool, optional (default: False)) – Whether the data in the dataset is stored in sparse matrix format.
delimiter (str, optional (default: ' ')) – Delimiter that separates data within a text file. If None, splits at an arbitrary number of white spaces, which is different from enforcing splitting at a single white space ' '.
unique_name (bool, optional (default: True)) – If True, call var_names_make_unique() and obs_names_make_unique() on the AnnData object.
batch_name (string, optional (default: None)) – Batch name of the current batch data.
var_names (Literal['gene_symbols', 'gene_ids'] (default: 'gene_symbols')) – The variables index when the file type is 'mtx'.
- Returns
adata
- Return type
AnnData
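The difference between delimiter=None and delimiter=' ' is easy to miss. The sketch below uses Python's own str.split as an analogy: it is an assumption that the text reader behind read_sc_data treats fields the same way, but the wording of the parameter description matches this behaviour. The load_batch wrapper and its arguments are hypothetical; it is only defined here and needs vipcca to be installed.

```python
line = "1.0  2.0 3.0"  # note the double space after the first value

# Analogue of delimiter=None: split on any run of white space.
fields_any = line.split()
# Analogue of delimiter=' ': split at every single space, so the
# double space produces an empty field.
fields_single = line.split(" ")

print(fields_any)     # ['1.0', '2.0', '3.0']
print(fields_single)  # ['1.0', '', '2.0', '3.0']


def load_batch(path, batch_name):
    """Read one h5ad file and tag it with a batch name (hypothetical wrapper)."""
    # Imported lazily so the comparison above runs without vipcca installed.
    from vipcca.tools.utils import read_sc_data

    return read_sc_data(path, fmt="h5ad", batch_name=batch_name)
```

For text files aligned with a variable number of spaces, the None-style splitting is usually the safer choice, because single-space splitting yields empty fields.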
vipcca.tools.utils.spatial_preprocessing(datasets, min_cells=1, min_genes=1, n_top_genes=2000, lognorm=True, hvg=True)
Preprocess and merge two Visium datasets from different batches.
- Parameters
datasets (list) – The list of AnnData objects from different batches.
min_cells (int, optional (default: 1)) – Minimum number of cells in which a gene must be expressed for the gene to pass filtering.
min_genes (int, optional (default: 1)) – Minimum number of genes that must be expressed in a cell for the cell to pass filtering.
n_top_genes (int, optional (default: 2000)) – Number of highly-variable genes to keep.
lognorm (bool, optional (default: True)) – If True, execute lognorm() function.
hvg (bool, optional (default: True)) – If True, select highly variable genes for the AnnData object.
- Returns
adata
- Return type
AnnData
vipcca.tools.utils.spatial_rna_preprocessing(adata_spatial, adata_rna, lognorm=True, hvg=True, n_top_genes=2000)
Preprocess and merge a Visium dataset with an scRNA-seq dataset.
- Parameters
adata_spatial (AnnData) – AnnData object of the Visium dataset.
adata_rna (AnnData) – AnnData object of the scRNA-seq dataset.
lognorm (bool, optional (default: True)) – If True, execute lognorm() function.
hvg (bool, optional (default: True)) – If True, select highly variable genes for the AnnData object.
n_top_genes (int, optional (default: 2000)) – Number of highly-variable genes to keep.
- Returns
adata
- Return type
AnnData
Plotting
vipcca.tools.plotting.plotCorrelation(y, y_pred, save=True, result_path='./', show=True, rnum=10000.0, lim=20)
Plot the correlation between the original data and the corrected data.
- Parameters
y (matrix or csr_matrix) – The original data matrix.
y_pred (matrix or csr_matrix) – The data matrix integrated by vipcca.
save (bool, optional (default: True)) – If True, save the figure into result_path.
result_path (string, optional (default: './')) – The path for saving the figure.
show (bool, optional (default: True)) – If True, show the figure.
rnum (double, optional (default: 1e4)) – The number of points you want to sample randomly in the matrix.
lim (int, optional (default: 20)) – The value passed as the right argument of matplotlib.pyplot.xlim(left, right).
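To make the role of rnum concrete, the following sketch samples a fixed number of paired entries from two small matrices and computes their Pearson correlation in plain Python. This is not the library's implementation: the choice of Pearson correlation, the helpers sample_pairs and pearson, the seed, and the toy data are all assumptions for illustration. The compare wrapper shows a call using only the documented parameters; it is defined but not executed and needs vipcca to be installed.

```python
import math
import random


def sample_pairs(y, y_pred, rnum=1e4, seed=0):
    """Flatten two equally shaped nested lists and sample up to rnum paired entries."""
    flat_y = [v for row in y for v in row]
    flat_pred = [v for row in y_pred for v in row]
    if len(flat_y) != len(flat_pred):
        raise ValueError("y and y_pred must have the same number of entries")
    k = min(int(rnum), len(flat_y))
    # Sample positions once so both vectors stay paired.
    idx = random.Random(seed).sample(range(len(flat_y)), k)
    return [flat_y[i] for i in idx], [flat_pred[i] for i in idx]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


y = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]          # toy "original" matrix
y_pred = [[2 * v + 1 for v in row] for row in y]  # exactly linear in y
xs, ys = sample_pairs(y, y_pred, rnum=4)
r = pearson(xs, ys)
print(len(xs), round(r, 6))


def compare(y, y_pred, result_path="./"):
    """Plot without opening a window, saving the figure into result_path."""
    # Imported lazily so the sketch above runs without vipcca installed.
    from vipcca.tools.plotting import plotCorrelation

    plotCorrelation(y, y_pred, save=True, result_path=result_path,
                    show=False, rnum=1e4, lim=20)
```

Sampling keeps the scatter plot readable and cheap for large matrices; a smaller rnum trades precision of the visual impression for speed.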
VIPCCA
class vipcca.model.vipcca.VIPCCA(adata_all=None, patience_es=50, patience_lr=25, epochs=100, res_path=None, split_by='_batch', method='lognorm', hvg=True, batch_input_size=128, batch_input_size2=16, activation='softplus', dropout_rate=0.01, hidden_layers=[128, 64, 32, 16], lambda_regulizer=5.0, initializer='glorot_uniform', l1_l2=(0.0, 0.0), mode='CVAE', model_file=None, save=True)
Bases: object
Initialize VIPCCA object
Parameters
- patience_es: int, optional (default: 50)
number of epochs with no improvement after which training will be stopped.
- patience_lr: int, optional (default: 25)
number of epochs with no improvement after which learning rate will be reduced.
- epochs: int, optional (default: 100)
Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided.
- res_path: string, (default: None)
Folder path to save model training results model.h5 and output data adata.h5ad.
- split_by: string, optional (default: '_batch')
the obsm_name of obsm used to distinguish different batches.
- method: string, optional (default: 'lognorm')
the normalization method for input data, one of {"qqnorm", "count", other}.
- batch_input_size: int, optional (default: 128)
the length of the batch vector that concatenate with the input layer.
- batch_input_size2: int, optional (default: 16)
the length of the batch vector that concatenate with the latent layer.
- activation: string, optional (default: "softplus")
the activation function of hidden layers.
- dropout_rate: double, optional (default: 0.01)
the dropout rate of hidden layers.
- hidden_layers: list, optional (default: [128,64,32,16])
Number of hidden layer neurons in the model
- lambda_regulizer: double, optional (default: 5.0)
The coefficient multiplied by KL_loss
- initializer: string, optional (default: "glorot_uniform")
Initializer for the kernel weights matrix.
- l1_l2: tuple, optional (default: (0.0, 0.0))
(L1 regularization factor, L2 regularization factor).
- mode: string, optional (default: 'CVAE')
one of {"CVAE", "CVAE2", "CVAE3"}
- model_file: string, optional (default: None)
The file name of the trained model.
- save: bool, optional (default: True)
If True, save the output adata file.
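The constructor takes many keyword arguments, so it can help to collect them in one place before building the model. The sketch below is one possible way to do that, using only parameter names and defaults from the signature above. The helpers make_config and build_model and the path "./results/" are hypothetical; build_model is only defined, not executed, and needs vipcca to be installed.

```python
ALLOWED_MODES = {"CVAE", "CVAE2", "CVAE3"}  # modes listed in the parameter docs


def make_config(res_path, mode="CVAE", epochs=100):
    """Return keyword arguments for VIPCCA, checked against the documented modes."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    return dict(
        res_path=res_path,
        mode=mode,
        epochs=epochs,
        patience_es=50,
        patience_lr=25,
        split_by="_batch",
        # A fresh list per call, so configurations never share one list object.
        hidden_layers=[128, 64, 32, 16],
        lambda_regulizer=5.0,
        l1_l2=(0.0, 0.0),
        save=True,
    )


config = make_config("./results/", mode="CVAE", epochs=100)
print(config["mode"], config["hidden_layers"])


def build_model(adata_all, config):
    """Construct a VIPCCA object from a merged AnnData and a config dict."""
    # Imported lazily so the configuration code runs without vipcca installed.
    from vipcca.model.vipcca import VIPCCA

    return VIPCCA(adata_all=adata_all, **config)
```

Passing hidden_layers explicitly is a small design choice: the signature's default is a mutable list, and in Python a mutable default is shared across calls, so supplying a new list avoids accidental coupling between models.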