FPKM (Fragments Per Kilobase per Million) - a measure of gene expression. The larger the FPKM, the greater the expression for that gene in the SCN. Generated using cuffQuant (see methodology).
Expression level (counts) - a measure of gene expression. The mean number of reads mapped to the gene over 24 hours. (Note: this measure of expression is confounded by gene length so FPKM is a better measure of expression if comparing different genes.)
SCN enriched - whether gene expression is significantly enriched in the SCN relative to the brain (see methodology).
Enrichment - the fold-change difference in expression between the SCN and brain tissues (e.g. an enrichment score of 10 indicates that expression is 10-fold higher in the SCN).
Fluctuating - whether the gene is defined as significantly fluctuating over time (see methodology).
JTK.padj - whether gene expression cycles sinusoidally. A p.adj < 0.05 suggests sinusoidal cycling.
DESeq2.padj - whether gene expression fluctuates over time. A p.adj < 0.05 suggests a significant change over time.
WGCNA module - the coexpression module into which that gene was partitioned. Note: the twin-peaking genes form the lightsteelblue1 module. Generated using WGCNA (see methodology).
Peak phase - the phase at which gene expression is seen to peak. (Note: this statistic is most applicable to sinusoidal cycling genes.)
Peak phase (binned) - the phase at which gene expression is seen to peak, binned into 6 timepoints (ZT2, ZT6, ZT10, ZT14, ZT18, ZT22).
Dynamic range - the fold-change difference between the peak and trough of gene expression over time.
SCN Panda - whether the gene cycles in the SCN array study (Panda et al., 2002; value < 0.1 suggests significant sinusoidal cycling).
Tissue cyc no - the number of other tissues (max. 12) in which the gene displays a sinusoidal expression pattern (JTK.padj < 0.05; see methodology; Zhang et al., 2014).
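The peak phase, binned peak phase and dynamic range columns can be illustrated with a minimal sketch. It assumes one mean expression value per sampled timepoint. The function names and example values are hypothetical, and the values in the table may have been derived differently (for example, from a fitted waveform).

```python
# Hypothetical helpers: summarise a 24-hour expression profile.
# Assumes one mean expression value per sampled Zeitgeber time (ZT).

ZT_BINS = [2, 6, 10, 14, 18, 22]  # the six binned timepoints listed above


def bin_phase(phase_hours):
    """Assign a phase (in hours) to the nearest ZT bin on a 24-hour circle."""
    def circular_distance(a, b):
        d = abs(a - b) % 24
        return min(d, 24 - d)
    return min(ZT_BINS, key=lambda zt: circular_distance(phase_hours, zt))


def dynamic_range(expression_by_zt):
    """Peak-to-trough fold change for a {ZT: expression} mapping."""
    peak = max(expression_by_zt.values())
    trough = min(expression_by_zt.values())
    # The fold change is undefined when the trough is zero.
    return peak / trough if trough > 0 else float("inf")


profile = {2: 10.0, 6: 25.0, 10: 40.0, 14: 30.0, 18: 12.0, 22: 8.0}
peak_zt = max(profile, key=profile.get)
print(peak_zt, bin_phase(11.3), dynamic_range(profile))
```

Binning on a circle means that a phase of 23.5 h is assigned to ZT22 rather than being treated as far from every bin.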
To determine the expression level for each gene in each sample, aligned reads were quantified using either HTSeq (Anders et al., 2015) or cuffNorm. Both methods used a GTF file generated by combining the most recent Ensembl annotation with the identified set of lincRNAs. FPKM values were generated using cuffQuant and normalised with cuffNorm, whereas count tables were generated using HTSeq and normalised using DESeq2 (Love et al., 2014).
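To show why raw counts are confounded by gene length while FPKM is not, here is a minimal sketch of the textbook FPKM definition. It is not the cuffQuant/cuffNorm implementation, which applies its own normalisation; the function name and the example numbers are hypothetical.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of gene per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Two hypothetical genes with the same raw count but different lengths,
# in a library of 20 million mapped fragments.
short_gene = fpkm(fragments=500, gene_length_bp=2000, total_mapped_fragments=20_000_000)
long_gene = fpkm(fragments=500, gene_length_bp=4000, total_mapped_fragments=20_000_000)
print(short_gene, long_gene)
```

The two genes have identical counts, but the longer gene has half the FPKM because the same number of fragments is spread over twice the length.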
For this process additional RNA-seq datasets from six independent studies were used (GSE54124, GSE30352, GSE54652, GSE41637, GSE41338, and GSE36874).
To identify SCN enriched genes, genes were required to show significantly greater expression (q < 0.05) in the brain than in liver and significantly greater expression (q < 0.05) in both this study's SCN dataset and the published SCN dataset (Azzi, 2014) relative to the brain using ANOVA in R. FPKM values were then averaged for each study and quantile normalised. Fold change (FC) enrichment scores were calculated by averaging the FPKM values for each study for a particular tissue and then comparing this average to the mean FPKM value for another tissue. A gene was defined as being SCN enriched if it showed a 3-fold enrichment in the SCN over brain samples.
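The fold-change step described above can be sketched as follows. This is one reading of the text, not the original analysis code: it assumes that FPKM values are first averaged within each study and then across studies for a tissue, and that the 3-fold cut-off is inclusive. The ANOVA and quantile normalisation steps are omitted, and all names and values are hypothetical.

```python
from statistics import mean


def tissue_mean(fpkm_by_study):
    """Average FPKM within each study, then across studies (assumed order)."""
    return mean(mean(values) for values in fpkm_by_study.values())


def scn_enrichment(scn_by_study, brain_by_study, threshold=3.0):
    """Return (fold change, enriched?) for one gene, SCN relative to brain."""
    brain = tissue_mean(brain_by_study)
    # The fold change is undefined when the gene is not detected in brain.
    fold_change = tissue_mean(scn_by_study) / brain if brain > 0 else float("inf")
    return fold_change, fold_change >= threshold  # inclusive cut-off is an assumption


# Hypothetical FPKM values for a single gene.
scn = {"this_study": [30.0, 34.0], "published_scn": [28.0, 36.0]}
brain = {"study_a": [4.0, 4.0], "study_b": [3.0, 5.0]}
print(scn_enrichment(scn, brain))
```

Averaging within studies first stops a study with many samples from dominating the tissue mean.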
The R packages JTK_CYCLE (Hughes, 2010) and DESeq2 were used together to identify genes whose expression significantly altered over time with a sinusoidal oscillatory pattern. DESeq2 was used to perform a likelihood ratio test to determine the genes whose expression significantly altered over time (q < 0.05), and JTK_CYCLE was used to identify genes whose expression followed 24-hour periodic waveforms (q < 0.05). To identify genes with non-sinusoidal oscillatory patterns, signed WGCNA (Langfelder, 2008) was used to identify modules based upon gene coexpression. Modules were merged if their similarity was greater than 0.3 according to dendrogram height. Of the resultant 23 modules, only the lightsteelblue1 module was investigated further because it was the only non-sinusoidal module whose expression pattern was not influenced by anomalous expression of a single sample at a particular timepoint. Genes of this module were defined as fluctuating if their expression significantly altered over time (q < 0.05, DESeq2).
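The way the two adjusted p-values and the WGCNA module combine into a call can be sketched as below. This is an interpretation of the paragraph above rather than the original analysis code; the function name, the labels and the handling of genes that change over time but fit neither pattern are assumptions.

```python
def classify_gene(deseq2_padj, jtk_padj, wgcna_module, alpha=0.05):
    """Label a gene from its DESeq2 and JTK_CYCLE q-values and its WGCNA module."""
    if deseq2_padj >= alpha:
        # No significant change over time in the likelihood ratio test.
        return "not fluctuating"
    if jtk_padj < alpha:
        # Changes over time and follows a 24-hour periodic waveform.
        return "sinusoidal"
    if wgcna_module == "lightsteelblue1":
        # Changes over time and belongs to the non-sinusoidal module.
        return "twin-peaking"
    # Assumption: significant change without either pattern is not called fluctuating.
    return "not fluctuating"


print(classify_gene(0.001, 0.01, "blue"))
print(classify_gene(0.001, 0.40, "lightsteelblue1"))
print(classify_gene(0.200, 0.01, "lightsteelblue1"))
```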
Nearest PC L / R - the closest protein-coding gene located to the left / right of the lincRNA.
Dist PC L / R - the distance from the lincRNA to that protein-coding gene.
FPKM (Fragments Per Kilobase per Million) - a measure of gene expression. The larger the FPKM, the greater the expression for that gene in the SCN. Generated using cuffQuant (see methodology).
Expression level (counts) - a measure of gene expression. The mean number of reads mapped to the gene over 24 hours. (Note: this measure of expression is confounded by gene length so FPKM is a better measure of expression if comparing different genes.)
Fluctuating - whether the lincRNA is defined as significantly fluctuating over time (see methodology).
JTK.padj - whether lincRNA expression cycles sinusoidally. A p.adj < 0.05 suggests sinusoidal cycling.
DESeq2.padj - whether lincRNA expression fluctuates over time. A p.adj < 0.05 suggests a significant change over time.
WGCNA module - the coexpression module into which that lincRNA was partitioned. Note: the twin-peaking genes form the lightsteelblue1 module. Generated using WGCNA (see methodology).
Peak phase - the phase at which lincRNA expression is seen to peak. (Note: this statistic is most applicable to sinusoidal cycling genes.)
Dynamic range - the fold-change difference between the peak and trough of lincRNA expression over time.
PC L/R Correlation - the Pearson's correlation in expression between the lincRNA and the closest protein-coding gene located to its left / right.
linc.PHAST - the phastCons score for the lincRNA (a high phastCons score suggests strong conservation).
z.score TE - the conservation of the lincRNA relative to a proxy for neutral evolution (the conservation score for transposable elements (TEs) within 50 kb of the lincRNA locus), i.e. a z-score of 3 suggests the lincRNA is 3-fold conserved relative to neutral sequence.
z.score Flank - the conservation of the lincRNA relative to a proxy for neutral evolution (a 500 bp flanking region 2 kb away from the lincRNA locus).
GeneName - the name of the gene, if it is annotated.
Belgard - lincRNA previously identified in cortex (Belgard et al. 2011).
Ramos - lincRNA previously identified in neural stem cells (Ramos et al. 2013).
CAGE permissive (Cap Analysis of Gene Expression) - whether there is evidence for a transcription start site at this locus (http://fantom.gsc.riken.jp/protocols/basic.html).
Retro Pseudogene - whether this lincRNA overlaps a retroposed pseudogene.
TE Overlap - whether this lincRNA overlaps a transposable element.
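As an illustration of the PC L/R Correlation column, the sketch below computes a Pearson's correlation between a lincRNA and a neighbouring protein-coding gene. It assumes both expression vectors are measured over the same samples in the same order; the function name and the expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


# Hypothetical expression across six timepoints.
linc_expr = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
left_pc_expr = [2.0, 4.0, 6.0, 8.0, 6.0, 4.0]   # rises and falls with the lincRNA
right_pc_expr = [8.0, 6.0, 4.0, 2.0, 4.0, 6.0]  # mirrors the lincRNA
print(pearson(linc_expr, left_pc_expr), pearson(linc_expr, right_pc_expr))
```

A value near 1 suggests the lincRNA and its neighbour rise and fall together, and a value near -1 suggests opposing patterns.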
lincRNAs were identified using the CGAT NGS pipelines rnaseqtranscripts.py and rnaseqlncrna.py (Sims, 2014). The first pipeline identifies transfrags using cufflinks and retains those present in at least two samples. The second pipeline predicts lncRNAs by removing transfrags which overlap protein-coding exons. These lncRNAs are then assessed for coding potential using the coding potential calculator (CPC; Kong, 2007) and removed if annotated as 'coding' (CP score > 1). Only the intergenic (> 2 kb from any protein-coding gene), multiexonic lncRNAs with expression in over 6 biological replicates were chosen for further investigation.
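The final selection criteria above can be sketched as a simple filter. This is not the CGAT pipeline code: the record type, its field names and the example values are hypothetical, and each field is assumed to have been computed upstream.

```python
from dataclasses import dataclass


@dataclass
class Transfrag:
    """Hypothetical per-transcript summary; fields are assumed precomputed."""
    name: str
    n_exons: int
    cpc_score: float
    dist_to_nearest_pc_bp: int
    n_replicates_expressed: int
    overlaps_pc_exon: bool


def is_candidate_lincrna(t):
    """Apply the selection criteria described in the text to one transfrag."""
    return (
        not t.overlaps_pc_exon               # removed if it overlaps a protein-coding exon
        and t.cpc_score <= 1                 # CP score > 1 is annotated as 'coding'
        and t.dist_to_nearest_pc_bp > 2000   # intergenic: > 2 kb from any protein-coding gene
        and t.n_exons >= 2                   # multiexonic
        and t.n_replicates_expressed > 6     # expressed in over 6 biological replicates
    )


candidates = [
    Transfrag("linc_a", 3, -0.8, 15000, 12, False),
    Transfrag("linc_b", 1, -1.1, 30000, 10, False),  # single exon
    Transfrag("linc_c", 2, 2.4, 8000, 9, False),     # predicted coding
]
print([t.name for t in candidates if is_candidate_lincrna(t)])
```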
See the methodology in the 'Gene table' section.
De novo transcript assembly was conducted using cufflinks v2.0.2 (Trapnell et al., 2010), allowing identification of 6,610 novel exons. These exons were required not to overlap, but to lie within a 10 kb window of, any feature in the most recent Ensembl or UCSC gene annotation sets. Of these, only exons i) with a reciprocal overlap of <25% with any retroposed pseudogene (ucscRetroAli6), ii) with a reciprocal overlap of <50% with any transposable element, iii) with over 20 spliced reads in the novel exon, iv) with over 10 spliced reads per 100 bp of the novel exon, v) with over 20 reads that splice into a known transcript and vi) shorter than 3,000 bp were retained, providing a robust set of 1,013 novel exons.
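The six retention criteria can be sketched as a filter over candidate exons. This is an illustration under stated assumptions, not the original code: reciprocal overlap is taken here to be the smaller of the two overlap fractions, using half-open coordinates, and the function and parameter names are hypothetical.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for half-open intervals (assumed definition)."""
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def keep_novel_exon(length_bp, retro_overlap, te_overlap,
                    spliced_reads, reads_into_known):
    """Apply criteria i) to vi) from the text to one candidate exon."""
    return (
        retro_overlap < 0.25                         # i) retroposed pseudogene overlap
        and te_overlap < 0.50                        # ii) transposable element overlap
        and spliced_reads > 20                       # iii) spliced reads in the exon
        and spliced_reads / length_bp * 100 > 10     # iv) spliced reads per 100 bp
        and reads_into_known > 20                    # v) reads splicing into a known transcript
        and length_bp < 3000                         # vi) exon length
    )


# Hypothetical exon at 100-250 and a transposable element at 200-400.
te_fraction = reciprocal_overlap(100, 250, 200, 400)
print(te_fraction, keep_novel_exon(150, 0.0, te_fraction, 30, 25))
```

In this example the exon and the element share 50 bp, which is one third of the exon but one quarter of the element, so the reciprocal overlap is 0.25 and the exon passes criterion ii).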