mRNA summary

The mRNA summary report builder provides expression sequencing data for the supplied samples. If a supplied tumor sample has expression data for its corresponding normal sample, this report builder also provides a rough estimate of the fold change of each gene, along with information about the TCGA project and the availability of other data types (i.e., WES, CNV, miRNA, and methylation) for the same individual.

Figure: mRNA summary module in Sequence Miner

The expression value can be viewed in one of the following three formats (one non-normalized and two normalized), depending on the value selected in the expression_value field.

Expression value format

Description

HTSeq

Raw (non-normalized) read counts at the individual gene level, calculated using the HTSeq tool [1]

FPKM

FPKM, or fragments per kilobase of transcript per million mapped reads, is a normalized expression value that scales the read count by gene length and by the total number of mapped reads [2]. It is calculated as follows:

\[FPKM = \frac{RM_g \times 10^9}{RM_t \times L}\]
  • RM_g: The number of reads mapped to the gene

  • RM_t: The total number of reads mapped to protein-coding sequences in the alignment

  • L: The length of the gene in base pairs

FPKM UQ (default)

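FPKM UQ, or fragments per kilobase of transcript per million mapped reads upper quartile, is a modified version of the FPKM normalization method in which the total number of mapped reads is replaced by the number of reads mapped to the 75th percentile gene [3]. It is calculated as follows: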

\[FPKM\text{-}UQ = \frac{RM_g \times 10^9}{RM_{75} \times L}\]
  • RM_g: The number of reads mapped to the gene

  • RM_75: The number of reads mapped to the 75th percentile gene in the alignment

  • L: The length of the gene in base pairs
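As a rough illustration of the two normalization formulas in the table above, the following Python sketch computes FPKM and FPKM UQ for a single gene. The function names and the example read counts are invented for illustration and are not part of Sequence Miner; the code only restates the arithmetic.

```python
def fpkm(reads_gene: int, reads_total: int, gene_length_bp: int) -> float:
    """FPKM = (RM_g * 10^9) / (RM_t * L)."""
    return reads_gene * 1e9 / (reads_total * gene_length_bp)


def fpkm_uq(reads_gene: int, reads_q75: int, gene_length_bp: int) -> float:
    """FPKM UQ = (RM_g * 10^9) / (RM_75 * L)."""
    return reads_gene * 1e9 / (reads_q75 * gene_length_bp)


# Hypothetical gene: 500 mapped reads, 2,000 bp long.
# Assumed alignment totals: 20,000,000 protein-coding reads,
# and 4,000 reads mapped to the 75th percentile gene.
example_fpkm = fpkm(500, 20_000_000, 2_000)
example_fpkm_uq = fpkm_uq(500, 4_000, 2_000)

print(example_fpkm)     # 12.5
print(example_fpkm_uq)  # 62500.0
```

Because the 75th percentile gene count is much smaller than the total number of mapped reads, FPKM UQ values are typically far larger than FPKM values for the same gene, so the two formats should not be compared directly.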

Interpreting the output

Output columns can be viewed in different Perspectives (see also Column descriptions).

The Default view perspective shows the expression value of each gene per supplied sample, whereas the Candidate genes perspective shows only the genes provided in the candidate_gene_report_grid or candidate_gene_report_file fields.

If available, expression values are provided for both the tumor sample and its corresponding normal sample, and a log2 fold change in the expression of each gene in the tumor sample is calculated relative to the normal sample. A positive value indicates higher expression in the tumor, and a negative value indicates lower expression. The fold change can help identify the genes that are most over- or under-expressed in the tumor, at the individual sample level or the group level.
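The fold change calculation can be sketched in a few lines of Python, using the formula given for the Expr_value_log2_fold_change column: log2((expression value in tumor + 1)/(expression value in normal + 1)). The function name and the example values are hypothetical; the added 1 in the formula keeps the ratio defined when either expression value is 0.

```python
import math


def log2_fold_change(tumor: float, normal: float) -> float:
    """log2((tumor + 1) / (normal + 1)), as described for Expr_value_log2_fold_change."""
    return math.log2((tumor + 1) / (normal + 1))


# Hypothetical expression values for three genes.
print(log2_fold_change(7, 1))  # 2.0  (higher in tumor)
print(log2_fold_change(0, 3))  # -2.0 (lower in tumor)
print(log2_fold_change(5, 5))  # 0.0  (no change)
```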

The PN and Flag output columns indicate which other data types are available for the tumor sample and its corresponding normal sample, which can be useful for a multi-omic study.
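For example, the Flag columns could be used to pick out individuals that have every data type needed for a multi-omic analysis. The sketch below assumes the report rows have already been loaded into Python dictionaries; the row layout, the subject identifiers, and the function name are hypothetical and do not reflect an actual Sequence Miner export format.

```python
# Flag columns required for a hypothetical tumor-only multi-omic study.
REQUIRED_FLAGS = ["CNV_tumor", "Methyl_tumor", "miRNA_tumor", "RNASeq_tumor", "WES"]


def subjects_with_all_data(rows, required=REQUIRED_FLAGS):
    """Return the SubjectID of every row whose required Flag values are all true."""
    return [
        row["SubjectID"]
        for row in rows
        if all(row["Flag"].get(column, False) for column in required)
    ]


# Made-up rows; real SubjectID values would be TCGA IDs.
rows = [
    {"SubjectID": "subject-A",
     "Flag": {"CNV_tumor": True, "Methyl_tumor": True, "miRNA_tumor": True,
              "RNASeq_tumor": True, "WES": True}},
    {"SubjectID": "subject-B",
     "Flag": {"CNV_tumor": True, "Methyl_tumor": False, "miRNA_tumor": True,
              "RNASeq_tumor": True, "WES": True}},
]

print(subjects_with_all_data(rows))  # ['subject-A']
```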

Column descriptions

Report output columns and descriptions

Group

Column

Description

FPKM

These columns are displayed only when either FPKM or FPKM UQ is selected in the expression_value field.

normal

FPKM normalized expression value of the corresponding normal sample

tumor

FPKM normalized expression value of the tumor sample

UQ_normal

FPKM (upper quartile) normalized expression value of the corresponding normal sample

UQ_tumor

FPKM (upper quartile) normalized expression value of the tumor sample

read

These columns are displayed only when HTSeq is selected in the expression_value field.

count_normal

Raw (non-normalized) read counts from HTSeq of the corresponding normal sample

count_tumor

Raw (non-normalized) read counts from HTSeq of the tumor sample

Other columns

candidate_gene

A binary value (0 or 1) indicating whether the gene is one of the candidate genes selected in the candidate_gene_report_grid or candidate_gene_report_file; 1 indicates TRUE and 0 indicates FALSE

Expr_value_log2_fold_change

The log2 fold change in expression in the tumor relative to normal; a positive value indicates higher expression in the tumor, and a negative value indicates lower expression; the formula for calculation is as follows: log2((expression value in tumor + 1)/(expression value in normal + 1))

cdsStart

Start of the protein coding sequence (CDS) of the gene

cdsEnd

End of the protein coding sequence (CDS) of the gene

Biotype

The biological type of the gene (protein coding, lincRNA, miRNA, etc.)

Description

The name of the gene and other information about the gene, in a longer format

SubjectID

The TCGA ID of the individual (TCGA--**)

Disease_type

The type of cancer

Primary_site

The primary site or organ of the cancer type

TCGA_project

A four-letter code for the project of the cancer type

PN

CNV_normal

Sample name/ID of the CNV data of the tumor sample’s corresponding normal sample

CNV_tumor

Sample name/ID of the CNV data of the same tumor sample

Methyl_normal

Sample name/ID of the methylation sequencing data of the tumor sample’s corresponding normal sample

Methyl_tumor

Sample name/ID of methylation sequencing data of the same tumor sample

miRNA_normal

Sample name/ID of the miRNA sequencing data of the tumor sample’s corresponding normal sample

miRNA_tumor

Sample name/ID of miRNA sequencing data of the same tumor sample

RNASeq_normal

Sample name/ID of the mRNA sequencing data of the tumor sample’s corresponding normal sample

RNASeq_tumor

Sample name/ID of mRNA sequencing data of the same tumor sample

WES

Sample name/ID of WES data of the same tumor sample

Flag

CNV_normal

A logical value (true/false) indicating whether the CNV data of the corresponding normal sample is available for the same tumor sample

CNV_tumor

A logical value (true/false) indicating whether CNV data of the same tumor sample is available

Methyl_normal

A logical value (true/false) indicating whether methylation sequencing data of the corresponding normal sample is available for the same tumor sample

Methyl_tumor

A logical value (true/false) indicating whether methylation sequencing data of the same tumor sample is available

miRNA_normal

A logical value (true/false) indicating whether miRNA sequencing data of the corresponding normal sample is available for the same tumor sample

miRNA_tumor

A logical value (true/false) indicating whether miRNA sequencing data of the same tumor sample is available

RNASeq_normal

A logical value (true/false) indicating whether mRNA sequencing data of the corresponding normal sample is available for the same tumor sample

RNASeq_tumor

A logical value (true/false) indicating whether mRNA sequencing data of the same tumor sample is available

WES

A logical value (true/false) indicating whether WES data of the same tumor sample is available

Perspective views

Perspective subtabs focus on subsets of the data shown in the Default view.

Perspectives

Perspective

Description

Candidate genes

Displays only the genes selected in the candidate_gene_report_grid or candidate_gene_report_file fields.

Default view

Displays all genes

References

[1] https://docs.gdc.cancer.gov/Encyclopedia/pages/HTSeq-Counts/

[2] https://docs.gdc.cancer.gov/Encyclopedia/pages/HTSeq-FPKM/

[3] https://docs.gdc.cancer.gov/Encyclopedia/pages/HTSeq-FPKM-UQ/