Plot

Determine the circadian expression profile in the SCN for a given gene (e.g. Avp / ENSMUSG00000037727).




Gene table

Move the slider to set the threshold for gene expression level. Note: the keyboard arrow keys can also be used to adjust the value.

Optional Columns

Definitions


Expression level:

FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) - a measure of gene expression normalised for gene length and sequencing depth. The larger the FPKM, the greater the expression of that gene in the SCN. Generated using cuffQuant (see methodology).

Expression level (counts) - a measure of gene expression. The mean number of reads mapped to the gene over 24 hours. (Note: this measure is confounded by gene length, so FPKM is the better measure when comparing different genes.)
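The note on gene length can be illustrated with the textbook FPKM formula. This is a minimal sketch only: the function name and the example numbers are hypothetical, and cuffQuant itself estimates abundances with a fuller statistical model than this arithmetic.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments."""
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragment_count / (kilobases * millions)


# Two hypothetical genes with the same raw count but different lengths.
short_gene = fpkm(fragment_count=500, gene_length_bp=1_000, total_mapped_fragments=20_000_000)
long_gene = fpkm(fragment_count=500, gene_length_bp=5_000, total_mapped_fragments=20_000_000)
print(short_gene, long_gene)
```

The two genes have identical counts, yet the shorter gene has the higher FPKM, which is why counts alone are a poor basis for comparing different genes.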

Enrichment:

SCN enriched - whether expression of the gene is significantly enriched in the SCN as opposed to the brain (see methodology).

Enrichment - the fold-change difference in expression between the SCN and brain tissues (i.e. an enrichment score of 10 indicates that expression is 10-fold higher in the SCN).

Temporal expression:

Fluctuating - whether the gene is defined as significantly fluctuating over time (see methodology).

JTK.padj - the adjusted p-value (JTK_CYCLE) for whether gene expression cycles sinusoidally. A gene with a p.adj < 0.05 suggests sinusoidal cycling.

DESeq2.padj - the adjusted p-value (DESeq2) for whether gene expression fluctuates over time. A gene with a p.adj < 0.05 suggests significant change over time.

WGCNA module - the coexpression module into which that gene was partitioned. Note: the twin-peaking genes form the lightsteelblue1 module. Generated using WGCNA (see methodology).

Peak phase - the phase at which gene expression is seen to peak. (Note: this statistic is most applicable to sinusoidal cycling genes.)

Peak phase (binned) - the phase at which gene expression is seen to peak, binned into 6 timepoints (ZT2, ZT6, ZT10, ZT14, ZT18, ZT22).

Dynamic range - the fold-change difference between the peak and trough of gene expression over time.

SCN Panda - whether the gene cycles in the SCN array study (Panda et al., 2002; value < 0.1 suggests significant sinusoidal cycling).

Tissue cyc no - the number of other tissues (max. 12) in which the gene is seen to display a sinusoidal expression pattern (JTK.padj < 0.05; Zhang et al., 2014; see methodology).
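As an illustration of the Peak phase (binned) and Dynamic range columns defined above, the sketch below assigns a phase to the nearest of the six listed timepoints and computes a peak-to-trough fold-change. Both functions are hypothetical: nearest-centre binning on a 24-hour circle and the use of raw observed values are assumptions, not a description of how the table was generated.

```python
BIN_CENTRES = [2, 6, 10, 14, 18, 22]  # the six ZT bins listed in the table


def bin_peak_phase(phase_hours):
    """Assign a peak phase (in ZT hours) to the nearest bin centre on a 24 h circle."""
    phase = phase_hours % 24

    def circular_distance(centre):
        direct = abs(phase - centre)
        return min(direct, 24 - direct)  # wrap around midnight

    return "ZT%d" % min(BIN_CENTRES, key=circular_distance)


def dynamic_range(expression):
    """Fold-change between the highest and lowest expression value over time."""
    peak, trough = max(expression), min(expression)
    if trough <= 0:
        raise ValueError("trough must be positive to compute a fold-change")
    return peak / trough


print(bin_peak_phase(23.5), dynamic_range([4, 8, 16, 8, 4, 2]))
```

Measuring distance on a circle matters near the day boundary: a peak at ZT23.5 is closer to ZT22 than to ZT2, and a peak at ZT0.5 is closer to ZT2.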

Expression quantification

To determine the expression level for each gene in each sample, aligned reads were quantified using either HTSeq (Anders et al., 2015) or cuffNorm. Both methods used a GTF file generated by combining the most recent Ensembl annotation with the identified set of lincRNAs. FPKM values were generated using cuffQuant and normalised with cuffNorm, whereas count tables were generated using HTSeq and normalised using DESeq2 (Love et al., 2014).

Identification of SCN enriched genes

For this process additional RNA-seq datasets from six independent studies were used (GSE54124, GSE30352, GSE54652, GSE41637, GSE41338, and GSE36874).

To identify SCN enriched genes, two significance criteria were tested using ANOVA in R. Genes were required to show significantly greater expression (q < 0.05) in the brain than in liver, and significantly greater expression (q < 0.05) in both this study's SCN dataset and the published SCN dataset (Azzi, 2014) relative to the brain. FPKM values were then averaged for each study and quantile normalised. Fold change (FC) enrichment scores were calculated by averaging the FPKM values for each study for a particular tissue and then comparing this average to the mean FPKM value for another tissue. A gene was defined as being SCN enriched if it showed a 3-fold enrichment in the SCN over brain samples.
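The fold-change step described above can be sketched as follows. The function names and input values are hypothetical; the sketch assumes that the per-study FPKM averages have already been quantile normalised, that the significance tests have been applied separately, and that "3-fold" means a ratio of at least 3.

```python
from statistics import mean


def fold_enrichment(scn_study_means, brain_study_means):
    """Ratio of the mean SCN FPKM to the mean brain FPKM, averaged across studies."""
    return mean(scn_study_means) / mean(brain_study_means)


def is_scn_enriched(scn_study_means, brain_study_means, min_fold=3.0):
    """Apply the fold-change cut-off only (assumed inclusive of exactly 3-fold)."""
    return fold_enrichment(scn_study_means, brain_study_means) >= min_fold


# Hypothetical per-study mean FPKM values for one gene.
score = fold_enrichment([30.0, 50.0], [4.0])
print(score, is_scn_enriched([30.0, 50.0], [4.0]))
```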

Identification of fluctuating genes

The R packages JTK_CYCLE (Hughes, 2010) and DESeq2 were used together to identify genes whose expression significantly altered over time with a sinusoidal oscillatory pattern. DESeq2 was used to perform a likelihood ratio test to determine the genes whose expression significantly altered over time (q < 0.05), and JTK_CYCLE was used to identify genes whose expression followed 24-hr periodic waveforms (q < 0.05). To identify genes with non-sinusoidal oscillatory patterns, signed WGCNA (Langfelder, 2008) was used to identify modules based upon gene coexpression. Modules were merged if their similarity was greater than 0.3 according to dendrogram height. Of the resultant 23 modules, only the lightsteelblue1 module was investigated further, because it was the only non-sinusoidal module whose expression pattern was not influenced by anomalous expression of a single sample at a particular timepoint. Genes of this module were defined as fluctuating if their expression significantly altered over time (q < 0.05, DESeq2).
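One reading of the rule above is that a gene is fluctuating when DESeq2 detects change over time and the gene is either sinusoidal by JTK_CYCLE or a member of the lightsteelblue1 module. The sketch below encodes that reading; the function and its argument names are hypothetical, and the exact logic behind the Fluctuating column may differ.

```python
def is_fluctuating(deseq2_padj, jtk_padj, wgcna_module, alpha=0.05):
    """Combine the two adjusted p-values and module membership (assumed logic)."""
    changes_over_time = deseq2_padj is not None and deseq2_padj < alpha
    sinusoidal = jtk_padj is not None and jtk_padj < alpha
    twin_peaking = wgcna_module == "lightsteelblue1"
    return changes_over_time and (sinusoidal or twin_peaking)


# Hypothetical genes: sinusoidal, twin-peaking, and changing without either pattern.
print(is_fluctuating(0.001, 0.01, "blue"))
print(is_fluctuating(0.001, 0.60, "lightsteelblue1"))
print(is_fluctuating(0.001, 0.60, "blue"))
```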

Plot

Determine the circadian expression profile in the SCN for an identified lincRNA (e.g. NONCO7761).



Table

Move the slider to set the threshold for gene expression level. Note: the keyboard arrow keys can also be used to adjust the value.

Optional Columns

Definitions


Location:

Nearest PC L / R - the closest protein coding gene located to the left / right of the lincRNA.

Dist PC L / R - the distance of the lincRNA to that protein coding gene.

Expression level:

FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) - a measure of gene expression normalised for gene length and sequencing depth. The larger the FPKM, the greater the expression of that gene in the SCN. Generated using cuffQuant (see methodology).

Expression level (counts) - a measure of gene expression. The mean number of reads mapped to the gene over 24 hours. (Note: this measure is confounded by gene length, so FPKM is the better measure when comparing different genes.)

Temporal expression:

Fluctuating - whether the lincRNA is defined as significantly fluctuating over time (see methodology).

JTK.padj - the adjusted p-value (JTK_CYCLE) for whether lincRNA expression cycles sinusoidally. A lincRNA with a p.adj < 0.05 suggests sinusoidal cycling.

DESeq2.padj - the adjusted p-value (DESeq2) for whether lincRNA expression fluctuates over time. A lincRNA with a p.adj < 0.05 suggests significant change over time.

WGCNA module - the coexpression module into which that lincRNA was partitioned. Note: the twin-peaking genes form the lightsteelblue1 module. Generated using WGCNA (see methodology).

Peak phase - the phase at which lincRNA expression is seen to peak. (Note: this statistic is most applicable to sinusoidal cycling genes.)

Dynamic range - the fold-change difference between the peak and trough of lincRNA expression over time.

PC L/R Correlation - the Pearson correlation of expression between the lincRNA and the closest protein coding gene located to its left / right.
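For reference, the Pearson correlation behind the PC L/R Correlation column can be computed as in the sketch below. The expression values are hypothetical, and which values were actually correlated (for example per-sample FPKMs or timepoint means) is not stated here, so that choice is left open.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 values)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / (spread_x * spread_y)


# Hypothetical profiles for a lincRNA and its neighbouring protein coding gene.
lincrna = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
neighbour = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
print(pearson(lincrna, neighbour))
```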

Conservation:

linc.PHAST - the phastCons score for the lincRNA (a high phastCons score suggests strong conservation).

z.score TE - the conservation of the lincRNA relative to a proxy for neutral evolution (the conservation score for transposable elements (TEs) within 50 kb of the lincRNA locus), i.e. a z-score of 3 suggests the lincRNA is 3-fold conserved relative to neutral sequence.

z.score Flank - the conservation of the lincRNA relative to a proxy for neutral evolution (a 500 bp flanking region 2 kb away from the lincRNA locus).

Previously Identified:

GeneName - the name of the gene, if it is annotated.

Belgard - lincRNA previously identified in cortex (Belgard et al. 2011).

Ramos - lincRNA previously identified in neural stem cells (Ramos et al. 2013).

Other:

CAGE permissive (Cap Analysis of Gene Expression) - whether there is evidence for a transcription start site at this locus (http://fantom.gsc.riken.jp/protocols/basic.html).

Retro Pseudogene - whether this lincRNA overlaps a retroposed pseudogene.

TE Overlap - whether this lincRNA overlaps a transposable element.

Identification of lincRNAs

lincRNAs were identified using the CGAT NGS pipelines rnaseqtranscripts.py and rnaseqlncrna.py (Sims, 2014). The first pipeline identifies transfrags using cufflinks and retains those present in at least two samples. The second pipeline predicts lncRNAs by removing transfrags that overlap protein coding exons. These lncRNAs are then assessed for coding potential using the Coding Potential Calculator (CPC; Kong, 2007) and removed if annotated as 'coding' (CP score > 1). Only the intergenic (> 2 kb from any protein coding gene), multiexonic lncRNAs with expression in over 6 biological replicates were chosen for further investigation.
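The final selection criteria above can be written as a single filter. This is a minimal sketch on a hypothetical record type; the field names are invented for illustration and do not correspond to the output of the CGAT pipelines or of CPC.

```python
from dataclasses import dataclass


@dataclass
class Transfrag:
    """Hypothetical summary of one candidate lncRNA."""
    name: str
    cpc_score: float                # coding potential score from CPC
    distance_to_nearest_pc_bp: int  # distance to the nearest protein coding gene
    exon_count: int
    replicates_expressed: int       # biological replicates showing expression


def passes_lincrna_filters(t):
    """Keep non-coding, intergenic, multiexonic candidates seen in over 6 replicates."""
    return (
        t.cpc_score <= 1                           # 'coding' is CP score > 1
        and t.distance_to_nearest_pc_bp > 2_000    # intergenic: > 2 kb away
        and t.exon_count >= 2                      # multiexonic
        and t.replicates_expressed > 6
    )


candidates = [
    Transfrag("candidate_a", -0.8, 15_000, 3, 12),
    Transfrag("candidate_b", 2.4, 15_000, 3, 12),   # fails: coding
    Transfrag("candidate_c", -0.8, 1_500, 3, 12),   # fails: too close to a gene
    Transfrag("candidate_d", -0.8, 15_000, 1, 12),  # fails: single exon
]
kept = [t.name for t in candidates if passes_lincrna_filters(t)]
print(kept)
```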

Identification of fluctuating genes

See the methodology in the 'Gene table' section.

Novel exons

Search for a given gene using the search box in the top right corner and view it on the UCSC Genome Browser via the link.

Identification of novel exons

De novo transcript assembly was conducted using cufflinks v2.0.2 (Trapnell et al., 2010), allowing identification of 6,610 novel exons. These were required not to overlap any feature in the most recent Ensembl or UCSC gene annotation sets, but to lie within a 10 kb window of one. Of these exons, only those i) with a reciprocal overlap of <25% with any retroposed pseudogene (ucscRetroAli6), ii) with a reciprocal overlap of <50% with any transposable element, iii) with over 20 spliced reads in the novel exon, iv) with over 10 spliced reads per 100 bp of the novel exon, v) with over 20 reads that splice into a known transcript and vi) shorter than 3,000 bp were retained, providing a robust set of 1,013 novel exons.
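The six retention criteria above translate directly into a filter. The sketch below uses a hypothetical record type with invented field names, and assumes that the spliced-read density in criterion iv is the number of spliced reads divided by exon length, scaled to 100 bp.

```python
from dataclasses import dataclass


@dataclass
class NovelExon:
    """Hypothetical summary of one candidate novel exon."""
    length_bp: int
    retro_pseudogene_overlap: float  # reciprocal overlap, as a fraction (0-1)
    te_overlap: float                # reciprocal overlap, as a fraction (0-1)
    spliced_reads: int               # spliced reads in the novel exon
    reads_into_known_transcript: int


def is_robust(exon):
    """Apply criteria i) to vi) from the text, in order."""
    spliced_per_100bp = exon.spliced_reads / exon.length_bp * 100
    return (
        exon.retro_pseudogene_overlap < 0.25        # i)
        and exon.te_overlap < 0.50                  # ii)
        and exon.spliced_reads > 20                 # iii)
        and spliced_per_100bp > 10                  # iv)
        and exon.reads_into_known_transcript > 20   # v)
        and exon.length_bp < 3_000                  # vi)
    )


good = NovelExon(150, 0.0, 0.10, 40, 25)
sparse = NovelExon(400, 0.0, 0.10, 30, 25)  # only 7.5 spliced reads per 100 bp
print(is_robust(good), is_robust(sparse))
```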

Outline


The suprachiasmatic nucleus (SCN) is the master daily pacemaker in mammals, responsible for driving daily rhythmic behaviour and physiology. By combining laser capture microdissection (LCM) and RNA-seq, we provide the first detailed examination of the mammalian SCN transcriptome over a 24-hour light / dark cycle.


Methodology

SCN RNA sample preparation

Male C3H/HeH mice at 10-12 weeks of age were group housed in light-tight chambers equipped with LED lighting at 150 lux at the cage floor. After 7 days of acclimatisation, mice were singly housed for 7 days prior to tissue harvesting. Brains were removed at one of six zeitgeber times (ZT) and immediately frozen on dry ice in OCT mounting media. Eighteen frozen coronal sections, each 15 µm thick, were cut spanning the entire rostral to caudal region of the SCN and mounted onto polyethylene naphthalate (PEN) slides. Laser-capture microdissection was carried out from dehydrated, Nissl-stained sections using the PALM system as previously described (Dulneva et al., 2015). Tissue from five animals was pooled to generate one biological replicate sample and total RNA was purified using the RNeasy micro kit (Qiagen). RNA quality was determined using an RNA Picochip (Agilent), with RIN values over 8 for all samples and an average yield of approximately 10 ng per replicate. Approximately 1 ng of RNA was amplified using the SMARTer protocol and the resulting libraries were 100 bp paired-end sequenced on the HiSeq (Illumina; Wellcome Trust Centre for Human Genetics Sequencing Core). The average read pair count obtained was ~35M per technical replicate (~100M per biological replicate).

Additional methodology is provided within each subsection.


Further data

Raw and mapped data files and count tables will be available for download upon publication.


The SCNseq data explorer was built by William Pembroke using R Shiny.