
UCSF RBVI Cytoscape Plugins

scNetViz: Cytoscape networks for scRNA-seq analysis

scNetViz is a Cytoscape app for identifying differentially expressed genes from single-cell RNA sequencing data and displaying networks of the corresponding proteins for further analysis. Several ways of plotting the cells and gene expression data are also available. This app enables scientists who may not be experts in scRNA-seq to explore the data and to develop biological hypotheses.

scNetViz works with other Cytoscape apps, namely stringApp, the enhancedGraphics app, cyBrowser, and cyPlot, as well as web services hosted by the RBVI, to provide:

Contents

  1. Installation
  2. Main Menu
  3. Loading an Experiment
  4. Experiment Table
  5. Plotting Cells
  6. Adding Categories
  7. Differential Expression Analysis
  8. Loading Protein Networks
  9. Results Panel
  10. Data Cleaning

Installation

To download and install scNetViz, start Cytoscape and bring up the App Manager (Apps→App Manager). You can search for scNetViz directly by name or by any of its tags: automation, integrated analysis, enrichment analysis, gene expression, and PPI-network. Select the scNetViz app and click Install.

An alternative approach is to navigate to the Cytoscape App Store using a web browser, search for and select scNetViz as above, and download the jar file. In Cytoscape, the app can be installed from file using the App Manager (Apps→App Manager).

Source code is available from https://github.com/RBVI/scNetViz/

Main Menu

scNetViz adds entries to the Cytoscape main Apps menu:

Loading an Experiment

To browse and load an experiment from an online repository:

  • Single Cell Expression Atlas (SCEA): Click the icon in the Cytoscape toolbar or use the main menu: Apps→scNetViz→Load Experiment→From Single Cell Expression Atlas
  • Human Cell Atlas (HCA): Click the icon in the Cytoscape toolbar or use the main menu: Apps→scNetViz→Load Experiment→From the Human Cell Atlas

The resulting experiment browser lists the available datasets along with their accession codes, brief descriptions, numbers of cells, and other information. Clicking a column header sorts by the contents of that column. Searching with a term of interest highlights all rows with matching text in the accession, experiment (SCEA), description (HCA), or organisms column.

SCEA Experiment Browser

Clicking a row to highlight it chooses an experiment. If multiple rows are highlighted, only the first is treated as the chosen experiment for the following actions:

The Double-Click Action setting specifies whether double-clicking a row in the experiment browser runs View Data or Create Networks. The settings also hold the default parameters for differential expression analysis and for loading networks.

The Settings dialog can be shown by choosing Apps→scNetViz→Settings from the main menu or by clicking the icon near the upper right corner of either atlas browser or any experiment table.

To load an experiment from file:

Choose Apps→scNetViz→Load Experiment→Import from file from the menu, then browse to locate and open a zip, tar.gz, tgz, or gzip file of the three MatrixMarket files (.mtx, .mtx_cols, .mtx_rows) comprising a normalized scRNA-seq quantification dataset. The species must also be specified in the dialog to enable the later step of loading networks for the corresponding proteins.
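Before importing, it can help to check that an archive has the expected three-file layout. The sketch below is a minimal illustration, not scNetViz's own importer. It assumes the archive holds exactly one file each ending in .mtx, .mtx_rows, and .mtx_cols; that the matrix is in MatrixMarket coordinate format with genes as rows and cells as columns; and that the first whitespace-separated field on each line of the row and column files is the identifier. The function names are hypothetical.

```python
import tarfile
import zipfile


def _read_members(path):
    """Return {member name: decoded text} for every file in a zip or tar(.gz) archive."""
    members = {}
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for name in zf.namelist():
                if not name.endswith("/"):  # skip directory entries
                    members[name] = zf.read(name).decode("utf-8")
    else:
        # "r:*" lets tarfile detect plain or gzip-compressed tar archives
        with tarfile.open(path, "r:*") as tf:
            for member in tf.getmembers():
                if member.isfile():
                    members[member.name] = tf.extractfile(member).read().decode("utf-8")
    return members


def _pick(members, suffix):
    """Return the text of the single archive member whose name ends with suffix."""
    hits = [name for name in members if name.endswith(suffix)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one *{suffix} file, found {len(hits)}")
    return members[hits[0]]


def _ids(text):
    """First whitespace-separated field of each non-empty line."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def load_triplet(path):
    """Load (genes, cells, {(gene, cell): value}) from a MatrixMarket triplet archive."""
    members = _read_members(path)
    mtx = _pick(members, ".mtx")  # ".mtx_rows"/".mtx_cols" do not end with ".mtx"
    genes = _ids(_pick(members, ".mtx_rows"))
    cells = _ids(_pick(members, ".mtx_cols"))

    lines = [line for line in mtx.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a MatrixMarket coordinate file")
    body = [line for line in lines if not line.startswith("%")]  # drop header and comments
    n_rows, n_cols, nnz = (int(field) for field in body[0].split())
    if (n_rows, n_cols) != (len(genes), len(cells)):
        raise ValueError("matrix size does not match the row/column files")

    values = {}
    for line in body[1:1 + nnz]:
        row, col, value = line.split()
        # MatrixMarket indices are 1-based
        values[(genes[int(row) - 1], cells[int(col) - 1])] = float(value)
    return genes, cells, values
```

The returned dictionary holds only the entries listed in the file, which is how the coordinate format stores sparse data; absent (gene, cell) pairs are zeros.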

Experiment Table

The experiment table contains all of the information loaded for an experiment, as well as analysis results. It has three tabbed sections:

Experiment Table: TPM
  • TPM – RNA quantification in transcripts per million, with genes as rows and cells as columns. Double-clicking a gene name sorts the columns by the values in that row. Clicking a column header (in this case, a cell identifier) sorts the rows by that column, as in any table. Menus and buttons across the top:

    • New Cell Plot – plot cells in 2D (see Plotting Cells) with coloring by TPM values for the currently chosen gene (the row highlighted in the table)
      • t-SNE (local)
      • UMAP
      • Graph layout
      • t-SNE (on server)

    • View <cell-plot-type>, for example, View tSNE or View UMAP – re-show the most recently calculated cell plot, but with coloring by TPM values for the currently chosen gene (the row highlighted in the table)

    • Add Category – read in or compute additional classifications of the cells (see Adding Categories)
      • Import from file
      • Louvain clustering
      • Leiden clustering

    • Export CSV – export table as a text file with comma- or tab-separated values

  • Categories – sets of labels such as cluster numbers or cell-type assignments, with categories as rows and cells as columns. Each row defines a grouping that could be used for differential expression analysis. Within a given category, some cells may lack a label (group assignment). Categories can be added from input files or clustering calculations, and the menu under Available categories can be used to switch between the resulting sets of categories. Clicking a row chooses that category, and cutoff criteria for which genes to include can be adjusted before Calculate Diff Exp is clicked to launch the analysis.

    • New Cell Plot – plot cells in 2D (see Plotting Cells) with coloring by the currently chosen category and hiding cells without labels for that category (e.g., cells not assigned to any cluster)
      • t-SNE (local)
      • UMAP
      • Graph layout
      • t-SNE (on server)

    • View <cell-plot-type>, for example, View tSNE or View UMAP – re-show the most recently calculated cell plot, but with coloring by the currently chosen category and hiding cells without labels for that category (e.g., cells not assigned to any cluster)

    The other controls are as described for the TPM tab above.

  • DiffExp – results (if any) of differential expression analysis, with genes as rows and DE statistics as columns. In this section, double-clicking a gene name opens a browser window showing information for that gene at the Ensembl website.

The table for a previously loaded experiment can be shown by choosing Apps→scNetViz→Show Experiment Tables from the main menu.

Plotting Cells

The New Cell Plot menu is available from the experiment table or under Apps→scNetViz in the main menu, with a choice of method and adjustable parameters. Hovering the mouse over a parameter in a dialog shows balloon help that explains it in more detail.

UMAP Colored by Cluster
  • t-SNE (local) – t-SNE (t-Distributed Stochastic Neighbor Embedding) calculated locally after data cleaning
    • Initial Dimensions (initial default 10)
    • Perplexity (initial default 20)
    • Number of iterations (initial default 1000)
    • Use Barnes-Hut approximation (initial default off)
    • Theta value for Barnes-Hut (min: 0, max: 2) (initial default 0.001)
    • Log normalize the data (initial default on)
    • Center and scale the data (initial default off)

    Even with unchanged parameters, t-SNE results may vary between runs due to randomization inherent in the method.

  • UMAP – Uniform Manifold Approximation and Projection, calculated on web server
    • Number of neighbors (initial default 10)
    • Minimum distance (initial default 0.5)
    • Advanced preprocessing parameters – see data cleaning

  • Graph layout – force-directed graph drawing as implemented in scanpy, calculated on web server
    • Graph layout algorithm
      • fa (ForceAtlas2) (initial default)
      • kk (Kamada-Kawai)
      • fr (Fruchterman-Reingold)
      • lgl (Large Graph Layout)
      • drl (Distributed Recursive Layout)
      • rt (Reingold-Tilford tree layout)
    • Advanced preprocessing parameters – see data cleaning

  • t-SNE (on server) – t-SNE (t-Distributed Stochastic Neighbor Embedding) calculated on web server
    • Perplexity (initial default 20)
    • Initial dimensions (initial default 0, meaning principal component analysis is not used)
    • Early exaggeration (initial default 12)
    • Learning rate (initial default 1000)
    • Advanced preprocessing parameters – see data cleaning
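The Perplexity parameter of both t-SNE options sets roughly how many neighbors each cell takes into account. In the standard t-SNE formulation, each cell's Gaussian bandwidth is tuned by binary search until the perplexity 2^H of its neighbor distribution (H being the entropy in bits) matches the requested value. The sketch below illustrates only that calibration step for one cell, given its squared distances to the other cells; it is not taken from scNetViz or the server code, and the function names are hypothetical.

```python
import math


def neighbor_probs(sq_dists, beta):
    """Conditional probabilities p_j proportional to exp(-beta * d_j^2)."""
    shift = min(sq_dists)  # subtracting the minimum avoids underflow; it cancels on normalizing
    weights = [math.exp(-beta * (d - shift)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity 2**H, where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


def calibrate_beta(sq_dists, target, tol=1e-5, max_iter=200):
    """Binary-search the precision beta = 1 / (2 * sigma^2) so the perplexity hits target.

    Perplexity falls as beta grows (beta = 0 gives len(sq_dists)), so the
    target should lie strictly between 1 and len(sq_dists).
    """
    lo, hi = 0.0, math.inf
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(neighbor_probs(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:
            # Distribution too flat: narrow the Gaussian by raising beta
            lo = beta
            beta = beta * 2 if math.isinf(hi) else (beta + hi) / 2
        else:
            # Distribution too peaked: widen the Gaussian by lowering beta
            hi = beta
            beta = (beta + lo) / 2
    return beta, neighbor_probs(sq_dists, beta)
```

A larger perplexity yields a smaller beta (a wider Gaussian), so each cell's probability mass spreads over more neighbors and the embedding tends to reflect broader structure.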

Clicking a cell in the plot scrolls to the corresponding column in the Categories tab of the experiment table. With the magnifying-glass icon chosen (initial default) in the plot window, click-dragging to select a rectangle automatically enlarges that region. Clicking the house icon resets to showing the whole plot.
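The Graph layout options are run by scanpy on the server, presumably on a neighbor graph of the cells as in scanpy's draw_graph. To illustrate the force-directed idea behind options such as fr, the sketch below implements the classic Fruchterman-Reingold scheme on a small hand-written graph: every pair of nodes repels, connected nodes attract, and a cooling "temperature" caps how far a node moves per iteration. It is a self-contained teaching sketch with a hypothetical function name, not the scanpy or igraph implementation.

```python
import math
import random


def fruchterman_reingold(nodes, edges, size=1.0, iterations=100, seed=0):
    """Return {node: (x, y)} inside a size x size square; nodes must be a list."""
    rng = random.Random(seed)
    k = size / math.sqrt(len(nodes))  # ideal distance between nodes
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    temperature = size / 10

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion k^2 / d between every pair of nodes
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force
        # Attraction d^2 / k along each edge
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force
        # Move each node at most `temperature`, keeping it inside the square
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temperature)
            pos[v][0] = min(size, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(size, max(0.0, pos[v][1] + dy / d * step))
        temperature *= 0.95  # cool down so the layout settles
    return {v: (x, y) for v, (x, y) in pos.items()}
```

Because the starting positions are random, different seeds can give different final layouts, much as the t-SNE results above can vary between runs.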

Adding Categories

For the purposes of scNetViz, a “category” is any classification or labeling of the cells. Within a given category, the cells in an experiment might all have the same label (for example, species = Homo sapiens) or different labels (for example, cluster number = 1, 2, ...). Categories can be viewed and sorted in the Categories tab of the experiment table.

A category in which the cells have at least two different labels is required for differential expression analysis.

The Add Category menu is available from the experiment table or under Apps→scNetViz in the main menu, with options:

  • Import from file – the file can be comma- or tab-separated (CSV or TSV), with categories as rows and cells as columns or vice versa. If the categories are columns, check the File needs to be pivoted option. The number of header lines and the data type should also be specified.

  • Louvain clustering – clustering with the Louvain algorithm as implemented in scanpy, calculated on web server
    • Number of neighbors (initial default 15)
    • Advanced preprocessing parameters – see data cleaning

  • Leiden clustering – clustering with the Leiden algorithm as implemented in scanpy, calculated on web server
    • Number of neighbors (initial default 15)
    • Advanced preprocessing parameters – see data cleaning
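The two file layouts accepted by Import from file can be illustrated with a small parser. The sketch below is hypothetical and does not reproduce scNetViz's importer: it assumes a single header line, that the first column holds row labels, and that an empty field means the cell has no label in that category. With pivoted=True it treats rows as cells and columns as categories, mirroring the File needs to be pivoted option.

```python
import csv
import io


def read_categories(text, delimiter=",", pivoted=False):
    """Return {category: {cell: label}} from CSV/TSV text with one header line.

    Default layout: categories as rows, cells as columns.
    pivoted=True: cells as rows, categories as columns.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    if pivoted:
        # Transpose so that rows become categories and columns become cells
        transposed = [list(col) for col in zip(*([header] + data))]
        header, data = transposed[0], transposed[1:]
    cells = header[1:]
    categories = {}
    for row in data:
        name, labels = row[0], row[1:]
        # Cells with an empty field are left out: they have no label in this category
        categories[name] = {cell: lab for cell, lab in zip(cells, labels) if lab != ""}
    return categories
```

Both layouts produce the same result, so a cell-per-row export from another tool can be read once the pivot flag is set.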
Experiment Table: Categories

Differential Expression Analysis

In the Categories tab of the experiment table, each row defines a category or grouping that could be used for differential expression (DE) analysis. Cutoffs indicate which genes should be included, with factory defaults:

  • absolute value of Log2FC (log2 fold change) at least 0.5
  • gene detected in at least 10% of cells (Min.pct) in at least one of the two comparison sets

The values can be edited directly, and defaults (the values shown initially) adjusted in the settings. Genes not meeting the criteria will still be listed in the results, but without significance values.

The default grouping (category) for analysis of SCEA data is the clustering with sel.K value true, if any, or else the first clustering listed. A different category can be chosen by clicking its row. Clicking the Calculate Diff Exp button performs the analysis with the current settings. Some cells may not be assigned to any cluster, and more generally, some cells may lack a label (may not be assigned to a group) within the chosen category; these cells are excluded from the analysis.

Expression is compared between each group and the set of all other groups in that category. With the default cutoffs, a gene is omitted from the calculation if the absolute magnitude of its log2 fold change (ratio of expression levels for the two sets of cells) is less than 0.5 or the gene is detected in fewer than 10% of the cells in each of the two sets.
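The two default cutoffs can be written as a short check for one gene and one group. In this minimal sketch (the function name is hypothetical), values are TPMs, a gene counts as detected in a cell when its value is above zero, and the fold change is the mean TPM of the group divided by the mean TPM of the comparison set, as defined for the DiffExp columns. How scNetViz handles a zero denominator is not stated, so the sketch simply returns an infinite log2FC in that case.

```python
import math


def passes_de_cutoffs(group, rest, min_abs_log2fc=0.5, min_pct=10.0):
    """Return (passes, log2fc, pct) for one gene: group cells vs. all other cells."""
    mean_group = sum(group) / len(group)
    mean_rest = sum(rest) / len(rest)
    if mean_group == 0 and mean_rest == 0:
        log2fc = 0.0
    elif mean_rest == 0:
        log2fc = math.inf  # assumption: no pseudocount is added
    elif mean_group == 0:
        log2fc = -math.inf
    else:
        log2fc = math.log2(mean_group / mean_rest)
    # Percent of cells with the gene detected, in whichever set is higher (Min.pct)
    pct_group = 100.0 * sum(v > 0 for v in group) / len(group)
    pct_rest = 100.0 * sum(v > 0 for v in rest) / len(rest)
    pct = max(pct_group, pct_rest)
    passes = abs(log2fc) >= min_abs_log2fc and pct >= min_pct
    return passes, log2fc, pct
```

For example, a group mean of 2.0 TPM against a comparison mean of 0.5 TPM gives FC = 4 and log2FC = 2, which clears the 0.5 cutoff.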

In the DiffExp tab of the experiment table, the rows are genes, and result columns for each group (e.g. cluster) include:

  • MTC – mean transcript count (in TPM), i.e., average over all cells in the group
  • Min.pct – percent of cells with gene detected in the group or in the comparison set, whichever is more
  • MDTC – mean (data-available) transcript count, i.e., average over cells in the group with the gene detected
  • log2FC – log2 of the fold change (FC). FC = (MTC of the group) ÷ (MTC of the comparison set)
  • pValue – p-value for expression difference, group vs. comparison set, from the Mann-Whitney U (Wilcoxon rank-sum) test
  • FDR – false discovery rate according to the Benjamini-Hochberg procedure
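The pValue and FDR columns can be illustrated with a stdlib-only sketch: a two-sided Mann-Whitney U test using the normal approximation with tie and continuity corrections, followed by the Benjamini-Hochberg adjustment across genes. These are textbook forms of the tests named above; whether scNetViz uses the same approximation and corrections is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided p-value of the Mann-Whitney U test (normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: many cells share the value 0 in scRNA-seq data
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all values identical: no evidence of a difference
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)  # continuity correction
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, mann_whitney_p would be applied per gene (group cells vs. all other cells), and benjamini_hochberg to the resulting list of p-values for that comparison.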

The menus under Comparison can be used to show the results for different clusterings (different values of k) or categories. Menus and buttons on the top right:

Heatmap

The top differentially expressed genes as shown in the heatmaps and networks may be considered putative markers, but their biological relevance cannot be assessed by statistics alone. Important factors include the specific experiment, its scope and conditions, and the category groupings used for differential expression analysis.

Loading Protein Networks

Protein networks are fetched from the STRING database, either automatically or when the Create Networks button in the DiffExp tab of the experiment table is clicked.

Network analysis cutoffs indicate which proteins should be loaded as a network for each comparison in the differential expression analysis. Factory defaults are to include only the proteins for genes with:

  • FDR (false discovery rate) no greater than 0.05
  • Log2FC (log2 fold change) absolute magnitude at least 1.0

Entering a value for the Max genes further limits the set of proteins to no more than the specified number of top hits ranked by log2FC (factory default 200). The Positive only option indicates whether only genes with positive log2FC values (higher expression than in the comparison set) should be included, with factory default off, meaning to include genes with both higher and lower expression than the comparison set. The values can be edited directly, and defaults (the values shown initially) adjusted in the settings. Networks are generated according to the current criteria when the Create Networks button is clicked.
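The network cutoffs amount to a filter followed by a ranking. In this sketch the DE results are held in a hypothetical DEResult record, genes without a significance value (those that failed the DE cutoffs) have fdr set to None, and genes are ranked by the absolute value of log2FC, consistent with the log2FC magnitude rank reported in the Node Table. Tie-breaking in scNetViz is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DEResult:
    gene: str
    log2fc: float
    fdr: Optional[float]  # None when the gene has no significance value


def select_network_genes(results: List[DEResult], max_fdr: float = 0.05,
                         min_abs_log2fc: float = 1.0, max_genes: int = 200,
                         positive_only: bool = False) -> List[str]:
    """Genes whose proteins would be queried in STRING, best-ranked first."""
    kept = [
        r for r in results
        if r.fdr is not None
        and r.fdr <= max_fdr
        and abs(r.log2fc) >= min_abs_log2fc
        and (r.log2fc > 0 or not positive_only)
    ]
    # Rank by log2FC magnitude, largest first, then keep the top max_genes
    kept.sort(key=lambda r: abs(r.log2fc), reverse=True)
    return [r.gene for r in kept[:max_genes]]
```

With Positive only off, strongly down-regulated genes compete with up-regulated ones for the Max genes slots; turning it on reserves all slots for genes with higher expression in the group.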

Assuming some genes meet the criteria for each comparison, the number of networks loaded will be the same as the number of groups in the category, plus one network that is the union of the others. Network node coloring is by log2FC, from red for most positive to blue for most negative, as in the heatmap.
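One way to realize a red-to-blue node coloring is a diverging color map centered on zero. The sketch below is an assumption about the mapping, not scNetViz's actual visual style: it scales positive values by the most positive log2FC and negative values by the most negative one, uses white for zero, and returns hexadecimal RGB strings. The function name is hypothetical.

```python
def log2fc_color(value, most_negative, most_positive):
    """Hex color: blue at most_negative, white at 0, red at most_positive."""
    if value >= 0:
        frac = value / most_positive if most_positive > 0 else 0.0
        # Fade from white (255, 255, 255) toward red (255, 0, 0)
        level = round(255 * (1 - min(frac, 1.0)))
        return f"#ff{level:02x}{level:02x}"
    frac = value / most_negative if most_negative < 0 else 0.0
    # Fade from white toward blue (0, 0, 255)
    level = round(255 * (1 - min(frac, 1.0)))
    return f"#{level:02x}{level:02x}ff"
```

Scaling each side separately guarantees that the most positive node is pure red and the most negative is pure blue even when the log2FC range is asymmetric.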

Some of the top-ranked genes as shown in the heatmap may be missing from the network because there is no corresponding protein in STRING (for example, noncoding RNA).

The Cytoscape Node Table lists attributes of each node (protein) including its log2FC magnitude rank in the network that is being viewed. Sorting on the rank column gives the top genes from differential expression analysis, essentially the protein version of Seurat FindMarkers results. For example, the scNetViz Cluster 5 Rank column gives the top putative markers for the comparison of cluster 5 vs. all others.

Networks from scNetViz

Results Panel

The Results Panel within the main Cytoscape window includes options similar to those appearing elsewhere in scNetViz, plus controls for performing enrichment analyses of terms (annotations) in the network vs. the whole genome of the organism. The panel can be shown by choosing Apps→scNetViz→Show Results Panel from the main menu.

To View data and plots:
  • Tables – the respective sections of the experiment table:
    • TPM Table
    • Category Table
    • DE Table
  • Plots – differential expression plots:
    • Heatmap – a heat map of genes colored by log2FC for each comparison in the current category, as above; if nodes are selected in the network, only the corresponding genes are included
    • Violin – the log2FC distribution of genes for each comparison in the current category, as above; if nodes are selected in the network, only the corresponding genes are included
To Reanalyze:
  • The menus under Comparison specify what category or clustering (different values of k) should be used to group the cells for differential expression analysis, and which pairwise comparisons should be made.
  • Several cutoffs indicate which proteins should be included in the networks based on the differential expression of their genes:
    • FDR – maximum false discovery rate for expression difference
    • Log2FC – minimum absolute log2 fold change
    • Max genes – include no more than the specified number of top hits ranked by log2 fold change
    • Positive only – whether to include only those with higher expression than the comparison set

    Clicking Create Networks generates the networks according to the current criteria. Default values can be adjusted in the settings.

    Network Showing Enrichment
To Get Enrichment of terms (annotations from Gene Ontology and other sources) within a network relative to the organism's genome, specify which proteins to include:
  • Entire network
  • Positive only – only use the proteins with higher expression than the comparison set
  • Negative only – only use the proteins with lower expression than the comparison set
  • Selected only – only use the proteins corresponding to the currently selected nodes
  • FDR cutoff – false discovery rate above which to exclude proteins from enrichment analysis

    Clicking Retrieve Table performs the enrichment analysis and loads the STRING Enrichment table of results, with redundancies removed. The terms found to be enriched for a network are displayed as colored segments (a “donut chart”) encircling the corresponding nodes, with color-coding as shown in the table.

Enrichment analysis is described in more detail in the stringApp paper:

Cytoscape stringApp: Network analysis and visualization of proteomics data. Doncheva NT, Morris J, Gorodkin J, Jensen LJ. J Proteome Res. 2019 Feb 1;18(2):623-632.
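Term enrichment of this kind is commonly assessed with a one-sided hypergeometric test, followed by a multiple-testing correction such as Benjamini-Hochberg across terms. The sketch below computes that p-value for a single term; treating it as the exact statistic behind the STRING Enrichment table is an assumption, and the function name is hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N genome genes, K annotated, n in network.

    k is the number of network proteins annotated with the term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

A small p-value means that more network proteins carry the term than random draws of the same size from the genome would typically give.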

Data Cleaning

For cell-plot and clustering calculations on the web server, data cleaning settings can be adjusted in the Advanced preprocessing parameters section of the respective dialogs:

  • Minimum number of genes/cell (initial default 100)
  • Minimum number of cells/gene (initial default 1)
  • Normalize (initial default on)
  • Log transform (initial default on)
  • Highly variable genes (initial default on)
  • Scale the final matrix (initial default on)

For local t-SNE calculations, data cleaning entails:

  • removing all genes for which expression was not detected in any cells and all cells in which no genes were found to be expressed
  • log-normalizing expression values (TPMs)
  • limiting the gene set to the top variable genes using a reimplementation of the Seurat “find variable genes” routine
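The local cleaning steps can be sketched as follows. This is a simplified, hypothetical version: it removes genes never detected and cells with no detected genes, log-transforms the TPMs with log(1 + x), and keeps the n_top genes with the highest variance of the log values. The Seurat-style variable-gene routine that scNetViz reimplements is more involved (it bins genes by mean expression and standardizes their dispersion within bins), so the variance ranking here is a stand-in, not that routine.

```python
import math
from statistics import pvariance


def clean_for_tsne(tpm, genes, cells, n_top):
    """tpm[g][c] holds the TPM of gene g in cell c. Returns (genes, cells, log matrix)."""
    # 1. Drop genes never detected, then cells in which no remaining gene is detected
    gene_idx = [g for g, row in enumerate(tpm) if any(v > 0 for v in row)]
    cell_idx = [c for c in range(len(cells)) if any(tpm[g][c] > 0 for g in gene_idx)]
    # 2. Log-transform the (already depth-normalized) TPM values
    logged = {g: [math.log1p(tpm[g][c]) for c in cell_idx] for g in gene_idx}
    # 3. Keep the n_top most variable genes, preserving their original order
    top = sorted(gene_idx, key=lambda g: pvariance(logged[g]), reverse=True)[:n_top]
    top.sort()
    return ([genes[g] for g in top],
            [cells[c] for c in cell_idx],
            [logged[g] for g in top])
```

A gene with the same expression in every remaining cell has zero variance and is the first to be dropped, since it cannot help separate cells in the embedding.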

Last updated on November 21, 2019



Copyright 2018 Regents of the University of California. All rights reserved.
