TIMEOR accepts two input types: (1) raw .fastq files and an SraRunTable (e.g. here), or (2) an RNA-seq time-series read count matrix (e.g. here) and a metadata file (e.g. here).
TIMEOR is available online at https://timeor.brown.edu.
Import an SraRunTable from GEO*. TIMEOR will then process the raw data by retrieving the .fastq files and performing quality control, alignment, and read count matrix creation. Read the first tab of TIMEOR (Getting Started) for information about this input specification. Read this section for information about how to process these data in TIMEOR. We strongly encourage users to upload a read count matrix, or to process raw .fastq data locally through TIMEOR’s interface using Docker (see 4 steps here).
Import a metadata file** and count matrix*** (skipping raw data retrieval, quality control, alignment, and read count matrix creation) and proceed straight to normalization and correction. Read the first tab of TIMEOR (Getting Started) for information about this input specification. Read this section for information about how to process these data in TIMEOR.
Then simply follow the prompts. Fill out the grey boxes to begin interacting with each stage and tab.
NOTE: see the first tab of TIMEOR, called Getting Started, for specifications.
Metadata file columns: ID, condition, time, batch
An example might be:
ID batch condition time
simT0.1 1 control 0
simT0.2 2 control 0
simT0.3 3 control 0
simT1.1 1 case 1
simT1.2 2 case 1
simT1.3 3 case 1
simT2.1 1 case 2
simT2.2 2 case 2
simT2.3 3 case 2
simT3.1 1 case 3
simT3.2 2 case 3
simT3.3 3 case 3
NOTE: Importantly, we assume that each replicate batch is sampled at every time point. For example, with 4 time points, batch 1 contributes replicate 1 at time point 1, replicate 1 at time point 2, replicate 1 at time point 3, and replicate 1 at time point 4, and the same holds for every other batch. This structure is adopted to work with all three differential expression methods. Moreover, it is a common structure for controlling non-biological variation in a time-series experiment. For example, say RNA-seq was performed on a cell line after insulin stimulation at 10 consecutive time points, every 20 minutes, with three biological replicates (such as in our manuscript). Comparing the three biological replicates at each time point allows non-biological factors to be taken into account when determining temporal differential expression.
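The design assumption above (every batch sampled at every time point) can be checked before uploading. The following is a minimal sketch, not part of TIMEOR: `check_metadata_design` is a hypothetical helper, and it assumes a metadata file with a header row containing `batch` and `time` columns, whitespace-separated as in the example above (pass `,` as the second argument if the file is comma-separated).

```shell
# Hypothetical helper (not part of TIMEOR): verify that every batch has a
# sample at every time point in a metadata file with a header row.
# Usage: check_metadata_design FILE [FIELD_SEPARATOR]
check_metadata_design() {
  awk -F "${2:- }" '
    NR == 1 {
      # Locate the batch and time columns by name, since column order may vary.
      for (i = 1; i <= NF; i++) {
        if ($i == "batch") b = i
        if ($i == "time")  t = i
      }
      if (!b || !t) { print "header must contain batch and time"; bad = 1; exit }
      next
    }
    NF {
      # Record each batch, each time point, and each observed combination.
      batches[$b] = 1; times[$t] = 1; seen[$b, $t] = 1
    }
    END {
      if (bad) exit 1
      for (x in batches)
        for (y in times)
          if (!((x, y) in seen)) {
            print "batch " x " has no sample at time " y
            bad = 1
          }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}
```

The function exits with status 0 when the design is complete and prints one line per missing batch/time combination otherwise, e.g. `check_metadata_design metadata.txt && echo "design looks complete"` (the file name here is only an illustration).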
Please read through this first page of information to help guide you through TIMEOR.
Importantly, before beginning the analysis, TIMEOR requires the user to provide details about the data in order to set the adaptive default methods used downstream. The section “Suggestions for How to Answer Six ‘Determine Adaptive Default Methods’ Questions” (here) in the ‘Getting Started’ tab above provides suggestions for how to answer these questions.
NOTE: We strongly recommend keeping ‘Yes’ as the answer for Question 5 to compare multiple methods, as this highlights valuable TIMEOR functionality.
This tutorial uses a subset of the real data used in the TIMEOR publication to take the user through TIMEOR’s “Process Raw Data” tab. You will first see this pop-up; please read it. There are 4 steps.
NOTE: When possible we advise the user to start from a read count matrix, as our file size limit is 10GB. For larger datasets, the user can use TIMEOR locally (still through web interface) via Docker, by following the 4 steps below, and here.
This tutorial uses simulated data and takes the user through TIMEOR’s full functionality, beginning from a read count matrix (genes x sample/time). NOTE: figures with two panels show the same page, just split. There are 20 steps.
The user can begin this tutorial before or after following “Run TIMEOR from Raw Data: Starting from .fastq Time-Series RNA-seq”.
In the far-left side-bar click on “Example Data” and then under “Load simulated data” click on “Metadata & read count file”.
Follow the pop-up to explore results on the Pre-processing tabs “Process Raw Data” and “Process Count Matrix”.
Proceed to Primary Analysis and click “Run”.
A pop-up at the bottom right will prompt you to click “Render Venn Diagram” (top right) to compare differential expression results between three methods (ImpulseDE2, Next maSigPro, and DESeq2) and choose which method’s results to proceed with. You will then see a pop-up saying that you have completed Primary Analysis. Feel free to move on with TIMEOR’s default parameters, or explore Primary Analysis options (see next step).
Examine the differential expression method results in the bottom row. Under “Display Desired Differential Expression Method Results” on the left, toggle between ImpulseDE2, Next maSigPro, and DESeq2; the interactive clustermap with automated clustering will display the differentially expressed gene trajectories for the chosen method. TIMEOR provides an automatic clustering option (shown) which takes the mode of three unsupervised clustering methods (partition around medoids, Silhouette, and Calinski criterion) to automatically return the number of gene trajectory clusters to the user. A PDF of the clustering plots is available on download. You can also toggle “Cluster Gene Expression Trajectories” to choose the desired number of clusters. For this demonstration, we chose the “ImpulseDE2” differentially expressed gene output and “automatic” clustering of gene trajectories. On new data the user can choose these two parameters. NOTE: ImpulseDE2 is chosen because it has the largest differentially expressed gene overlap with the previous study and the other methods.
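To illustrate what "taking the mode" of the three clustering methods means, here is a minimal sketch. It is not TIMEOR's code: `mode_of` is a hypothetical helper that returns the most frequent value among its arguments, where each argument stands for the cluster number suggested by one method. How TIMEOR resolves a three-way tie is not described here; this sketch arbitrarily returns the smallest value in that case.

```shell
# Hypothetical helper (not part of TIMEOR): print the most frequent value
# among the arguments. Ties are broken by choosing the smallest value,
# which is an arbitrary choice made for this sketch.
mode_of() {
  printf '%s\n' "$@" | awk '
    { count[$1]++ }
    END {
      for (v in count)
        if (count[v] > best || (count[v] == best && v + 0 < mode + 0)) {
          best = count[v]
          mode = v
        }
      print mode
    }'
}
```

For example, if two methods suggest 3 clusters and one suggests 4, `mode_of 3 3 4` prints 3.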
Click ‘Fix cluster number’ to solidify the number of clusters. This button fetches the genomic sequences of each gene to perform motif enrichment per gene trajectory cluster for tab ‘Factor Binding’ in the Secondary Analysis stage.
As stated in the pop-up, proceed to the Secondary Analysis tab in the side-bar.
Clusters are labeled in ascending order from 1 for top-most cluster. Under Gene Expression Trajectory Clusters choose cluster 1, 2, or 3 in the dropdown. On the right under “Chosen Cluster Gene Set” you will see the genes in that cluster. Genes are the same color as the gene trajectory cluster to which they belong.
Once you have chosen which gene set to test for enrichment, click the “Analyze” toggle to “ON”.
Wait to view any enriched gene ontology (GO) terms (Molecular Function, Biological Process, or Cellular Component), pathway, network, and/or motif analysis. NOTE: you may download the interactive motif results for viewing.
Toggle the “Analyze” button to “OFF” to choose another gene set, and repeat steps 9-12. NOTE: if no images show, simply toggle ‘OFF’ then ‘ON’ again to view any results. If no enrichment is found, that box is left blank.
In that same table on the far right you will see ENCODE IDs indicating published ChIP-seq data for the predicted transcription factors. For this tutorial, you can see an example provided with “pho”. You may also either download these read-depth normalized .bigWig files here (ENCFF467OWR, ENCFF609FCZ, ENCFF346CDA) or follow the prompts (step 16) in the grey box under “Upload .bigWig Files”. Any .bigWig files from protein-DNA data are accepted.
If you are interested, click on the “+” under “See each method’s predicted transcription factors:” to see the ranked lists of transcription factors and motifs by method. Blanks indicate an enriched motif is not assigned to a transcription factor region (to see motifs click ‘Download interactive cluster motif result’). Search for a method (e.g. transfac in blue box), enrichment score, etc. Row names are top 1 - 4 transcription factors.
To run TIMEOR outside of the website (recommended for preprocessing from raw .fastq files), users may use Docker and Docker Hub. First, the TIMEOR repository must be cloned (https://github.com/ashleymaeconard/TIMEOR.git). To use Docker, it must be installed (version 20.10.0 recommended).
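Download the genome folder for the organism of interest and move it (/genomes_info/) into the desired location (e.g. /Users/USERNAME/Desktop/test_folder/genomes_info/) to mount later. The organism folders are:
/genomes_info/dme/ (Drosophila melanogaster)
/genomes_info/mmu/ (Mus musculus)
/genomes_info/hsa/ (Homo sapiens)
Link to /genomes_info/: https://drive.google.com/drive/folders/1KEnpCOU0dQU5p1tnEy3o9l02NE0uYnpm?usp=sharing
Make sure the /genomes_info/ folders are readable, e.g.:
$ chmod -R 777 /Users/USERNAME/Desktop/test_folder/genomes_info/dme/
Pull the TIMEOR image from Docker Hub:
$ docker pull ashleymaeconard/timeor:latest
List the local images to find the IMAGE_ID:
$ docker images
Run the image, mounting the folder that contains genomes_info/ at /srv/ and publishing port 3838:
$ docker run -v /Users/USERNAME/Desktop/test_folder/:/srv/ -p 3838:3838 <IMAGE_ID>
Then open localhost:3838 in a browser.
To build the Docker image locally from the cloned repository instead, follow these commands. NOTE: This could take a while.
$ cd /PATH/TO/TIMEOR/
$ docker build -t timeor_env .
To open a shell inside a running container, list the containers to find the CONTAINER_NAME and then attach to it:
$ docker container ls
$ docker exec -it <CONTAINER_NAME> /bin/bash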
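Before starting the container, it can help to confirm that the folder passed to `docker run -v` actually contains a readable genome folder for the organism of interest. The following is a minimal sketch, not part of TIMEOR: `check_genomes_dir` is a hypothetical helper, and it assumes the layout `<MOUNT_DIR>/genomes_info/<organism>/` with organism codes `dme`, `mmu`, or `hsa` as listed above.

```shell
# Hypothetical pre-flight check (not part of TIMEOR): confirm that the folder
# to be mounted contains a readable genomes_info/<organism>/ directory.
# Usage: check_genomes_dir MOUNT_DIR ORGANISM   (ORGANISM: dme, mmu, or hsa)
check_genomes_dir() {
  mount_dir="${1%/}"   # drop a trailing slash, if any
  organism="$2"
  case "$organism" in
    dme|mmu|hsa) ;;
    *) echo "unknown organism code: $organism" >&2; return 2 ;;
  esac
  target="$mount_dir/genomes_info/$organism"
  if [ ! -d "$target" ]; then
    echo "missing folder: $target" >&2
    return 1
  fi
  # A directory must be both readable and searchable to list and open its files.
  if [ ! -r "$target" ] || [ ! -x "$target" ]; then
    echo "folder is not readable: $target" >&2
    return 1
  fi
  echo "ok: $target"
}
```

For example, `check_genomes_dir /Users/USERNAME/Desktop/test_folder/ dme` prints the checked path on success and returns a non-zero status with a message on standard error otherwise.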
The original temporal RNA-seq data analyzed in our paper come from Zirin et al. (2019). In this tutorial, SRR8843750 and SRR8843738 are analyzed to demonstrate the “Process Raw Data” tab, in which raw RNA-seq data are retrieved, quality checked, aligned (with HISAT2 and Bowtie2), and converted to a read count matrix. The real data subset folder (which TIMEOR automatically generates) can be downloaded here.
The original simulated data folder can be downloaded here.
To get the top 4 TFs, a 25% consensus threshold was used, with a normalized enrichment score threshold of 3.
Command used: Rscript get_top_tfs.r /PATH/TO/simulated_results/ dme 3 4 25 /PATH/TO/TIMEOR/
The following bigWig files were collected:
ENCFF467OWR (read-depth normalized signal between both replicates) within dataset ENCSR240ADR for Stat92E
ENCFF609FCZ (read-depth normalized signal between both replicates) within dataset ENCSR681YMA for pho
ENCFF346CDA (read-depth normalized signal between all three replicates) within dataset ENCSR776AVR for CG7786
The results presented in TIMEOR’s publication can be downloaded in TIMEOR’s automatically generated folders here.