  This service is provided free to academic non-profit institutions. ALL other users require a Commercial License from XenneX, Inc.

What's in a GeneCard?


This page provides information about the various GeneCards sections and tables.

General Comments

  • Each of the sections below is the target of the About, About this table, About this scheme, and About these images links found in the corresponding GeneCards section.
  • Superscripts in the data refer to the sources (listed in the left column of the card) from which the data were extracted.
  • Tooltips with explanatory information about images appear when you place the mouse pointer over the images (see, for example, the expression images).
  • The text background color changes to make mouseover items easy to identify.
  • To keep the GeneCards page compact, many tables and columns initially show only partial, high-scoring results (e.g., the top 10 SNPs sorted by type, with coding non-synonymous SNPs shown first, or the chemical compounds with the highest match to the gene). In these cases, a hyperlink is always provided for viewing all of the information available for that item.

GeneCards Categories

    Category         Description
    protein-coding   Entrez Gene type is 'protein coding', or the data source is Ensembl and an Ensembl protein exists
    pseudogene       Entrez Gene type is 'pseudo', or the symbol contains 'pseudo'
    RNA gene         Entrez Gene type ends with 'RNA'
    gene cluster     symbol ends with '@'
    genetic locus    none of the above, but there is disease information, or 'QTL' in the symbol
    uncategorized    none of the above

  • Categories are based on Entrez Gene type and status, as well as several other factors.
  • The former categories 'predicted' and 'predicted with support' are now expressed as the attribute 'predicted', which appears in the upper-left box next to the category and GCid. This attribute means that the Entrez Gene status is 'PREDICTED', 'INFERRED', or 'MODEL', or that the symbol source is Ensembl.
  • The former category 'reserved symbol' no longer exists because HUGO no longer uses it.
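The category rules in the table and the 'predicted' attribute can be read as an ordered decision procedure. The Python sketch below assumes the rows are checked from top to bottom and the first match wins. The function and parameter names, and the case-insensitive 'pseudo' test, are illustrative assumptions, not the GeneCards implementation.

```python
# Illustrative sketch of the GeneCards category rules; not the official code.
# Assumption: rules are tested in table order and the first match wins.

PREDICTED_STATUSES = {"PREDICTED", "INFERRED", "MODEL"}


def gene_category(entrez_type, symbol, data_source="",
                  has_ensembl_protein=False, has_disease_info=False):
    """Return a GeneCards-style category for one gene (hypothetical helper)."""
    entrez_type = entrez_type or ""
    if entrez_type == "protein coding" or (data_source == "Ensembl" and has_ensembl_protein):
        return "protein-coding"
    # Assumption: the 'pseudo' substring test on the symbol ignores case.
    if entrez_type == "pseudo" or "pseudo" in symbol.lower():
        return "pseudogene"
    if entrez_type.endswith("RNA"):
        return "RNA gene"
    if symbol.endswith("@"):
        return "gene cluster"
    if has_disease_info or "QTL" in symbol:
        return "genetic locus"
    return "uncategorized"


def is_predicted(entrez_status, symbol_source):
    """The 'predicted' attribute shown next to the category and GCid."""
    return entrez_status in PREDICTED_STATUSES or symbol_source == "Ensembl"
```

For example, `gene_category("other", "HOXA@")` falls through the first three rules and returns "gene cluster".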

GeneCard Header

This section provides the gene's symbol, GCid, and category in the box on the left-hand side.
Each gene category has its own color: protein-coding, pseudogene, RNA gene, gene cluster, genetic locus, and uncategorized.
The gene's symbol and GCid are shown in the color of the gene's category.
The header also includes a short description of the gene and indicates whether the gene symbol is approved by the HUGO Gene Nomenclature Committee (HGNC).

Aliases & Descriptions

This section displays synonyms and aliases for the GeneCards gene, as extracted from GDB, OMIM, HGNC, Entrez Gene, UniProt (Swiss-Prot/TrEMBL), GeneLoc, and Ensembl. Also shown are accessions from HGNC, Entrez Gene, UniProt, and/or Ensembl, and, where relevant, previous GC identifiers (for cases in which GeneLoc deemed it necessary to assign a new identifier to a gene based on updated information about its chromosomal location). Such GC ids always remain with their original genes and are never reused for other symbols.

Genomic Location

This section displays the chromosome, cytogenetic band and map location of the GeneCards gene as extracted from GeneLoc, HGNC, Entrez Gene, and miRBase, as well as genomic views from UCSC and Ensembl. The GeneLoc integrated location is shown in red on the image. If this differs from the location provided by Entrez Gene and/or Ensembl, their locations are shown on the image in green and/or blue respectively. Also provided are links to the GeneLoc gene density information for this gene's chromosome, which shows the number of genes in each 1 Mb interval along the chromosome, and to detailed exon information as provided by GeneLoc.
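The gene density view counts genes in consecutive 1 Mb intervals along a chromosome. A minimal Python sketch follows, assuming each gene is assigned to the interval that contains its 1-based start coordinate; GeneLoc's exact assignment rule is not described here, and the function name is hypothetical.

```python
def gene_density(gene_starts, chrom_length, bin_size=1_000_000):
    """Count genes per bin_size interval along a chromosome.

    gene_starts: 1-based start coordinates of the genes (assumed input).
    The i-th entry of the result is the number of genes whose start lies
    in the interval [i * bin_size + 1, (i + 1) * bin_size].
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for start in gene_starts:
        if not 1 <= start <= chrom_length:
            raise ValueError(f"start {start} lies outside the chromosome")
        counts[(start - 1) // bin_size] += 1
    return counts
```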

Proteins

This section provides annotated information about the proteins encoded by GeneCards genes according to UniProt and/or Ensembl; the capability to view phosphorylation sites using PhosphoSite and Invitrogen; reference sequences (RefSeq) according to NCBI; cellular component ontologies visualized by the Gene Ontology Consortium (more information); and links for ordering antibodies from Invitrogen, Millipore, BIOMOL, Cell Signaling Technology, Abcam, and/or GeneTex. Direct links to three-dimensional visualizations of PDB structures in the OCA browser are provided via the (3D) hyperlink shown next to each PDB identifier.

Protein Domains/Families

This section provides annotated information about protein domains and families according to InterPro, ProtoNet, UniProt and Blocks.

For InterPro entries, one can view other genes that have these domains or belong to these families.
Selecting the InterPro entries of interest and clicking the GeneDecks link leads to a results page listing these genes and their descriptions.
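Looking up other genes that share the selected InterPro entries amounts to an inverted index from annotation to genes. The sketch below is hypothetical: the data structures are assumptions, and it returns genes annotated with every selected entry, although GeneDecks may combine selections differently (for example, as a union).

```python
from collections import defaultdict


def build_index(gene_to_entries):
    """Invert {gene: set of InterPro ids} into {InterPro id: set of genes}."""
    index = defaultdict(set)
    for gene, entries in gene_to_entries.items():
        for entry in entries:
            index[entry].add(gene)
    return index


def genes_sharing(index, selected_entries):
    """Genes annotated with every selected entry (assumed AND semantics)."""
    gene_sets = [index.get(e, set()) for e in selected_entries]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)
```

The same kind of lookup applies to the pathway selections described under Pathways & Interactions.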

Gene Function

This section provides annotated information about gene function according to MGD, UniProt, IUBMB, and Genatlas, including RNAi and clone products from Invitrogen and Millipore, primer products from Invitrogen, and Cell-based and Kinase Function and Binding Assays from DiscoveRx, as well as molecular function ontologies visualized by the Gene Ontology Consortium (more information). Information from MGD includes phenotypes for mouse orthologs and a popup table with information on phenotypic alleles of the orthologs. This table presents the following columns:

  • Allele Name - Official symbol for the allele with link to MGD record
  • MGI id - MGI identifier of the allele (linked to in previous column)
  • Category - Type of allele by mode of origin
  • Observed Phenotypes in Mouse - Phenotypic details for all genotypes that include at least one of the alleles

GeneCards Inferred Functionality Scores (GIFTS)

The GIFTS algorithm uses the wealth of GeneCards annotations to produce scores aimed at predicting the degree of each gene's functionality. Since the degree of known functionality is correlated with the amount of research done on a particular gene or its product, we use these annotations in a scoring system aimed at inferring functionality. Note that while the accumulation of data for a specific gene in certain databases is merely correlated with functionality, many GeneCards sources, such as the Gene Ontology (GO) Consortium and Genatlas, provide definitive information about functionality.

Our goal is to use these two types of annotations to measure the functionality of GeneCards genes. Initially, we calculate scores based on the existence of data for each GeneCards gene in selected sources that were previously found to be correlated with functionality, namely the list used as the basis for finding random annotated genes. Each gene receives the sum of binary scores over selected fields (listed below): the existence of data in a field for a given gene contributes a score of 1, and its absence contributes 0. The sum of these scores defines the current version of GIFTS.

We have started to investigate giving larger weights to data from sources that provide more definitive indications of functionality, and we expect GIFTS to improve accordingly.

The prediction of functionality is based on the existence of data in the following sources:
AceView, Aliases, ASD, Atlas of Genetics and Cytogenetics in Oncology and Haematology, AKS, CGAP, Cell Signaling Technology (CST), CGAP, Database of Transcribed Sequences (DoTS), ECgene, Ensembl, Entrez Gene, Genatlas, GeneAnnot, GeneLoc, GeneNote, GeneTide, Gene Ontology Consortium, HUGO Gene Nomenclature Committee (HGNC) database, HomoloGene, InterPro, Invitrogen, Kyoto Encyclopedia of Genes and Genomes (KEGG), Research Articles, MINT, MGD, ProtoNet, SNPs/Variants, Swiss-Prot, TrEMBL, UniGene.
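Because each field contributes 1 when data exist and 0 otherwise, the current GIFTS value is a simple count. A minimal Python sketch, assuming a gene's annotations arrive as a mapping from source name to data; only a subset of the sources listed above is included, and the constant and function names are hypothetical.

```python
# Hypothetical subset of the GIFTS sources listed above.
GIFTS_SOURCES = (
    "Entrez Gene", "Ensembl", "Gene Ontology Consortium", "Genatlas",
    "InterPro", "MGD", "Swiss-Prot", "TrEMBL",
)


def gifts_score(annotations, sources=GIFTS_SOURCES):
    """Sum of binary scores: 1 for each source with non-empty data, else 0.

    annotations: {source name: data}; missing or empty data scores 0.
    """
    return sum(1 for source in sources if annotations.get(source))
```

Weighting sources differently would replace the constant 1 with a per-source weight.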



Pathways & Interactions

This section provides links to pathways and interactions according to information extracted from Invitrogen iPath, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Cell Signaling Technology (CST), UniProt, and MINT; ELISA, Kinase Binding, and Pathway Activation Assays from DiscoveRx (DiscoveRx users need to log in once per session; first-time users require free registration); and biological process ontologies visualized by the Gene Ontology Consortium (more information).

For KEGG, CST, and Invitrogen iPath pathways, one can view other genes that participate in these pathways.
Selecting the pathways of interest and clicking the GeneDecks link leads to a results page listing these genes and their descriptions.

Interacting proteins

Each line in this table represents one interacting protein, according to EBI-IntAct, MINT, or both. The following columns are presented:

  • Interactant - Links to the GeneCards page (first sub-column) and the UniProt page (second sub-column) for the interacting protein. Superscript links: 1 - the comments section in the UniProt page for the interactant; 2 - the page of all interactions between the two proteins, or all experiments supporting them, in the MINT database.
  • Interaction Details - Links to the interaction page in the database from which it was retrieved. In the case of IntAct, this page may include several different experiments supporting the same interaction. In the MINT database, each distinct interaction definition, and each experiment supporting it, is assigned a different MINT id; all of these ids are presented.

Drugs & Chemical Compounds

This section provides relationships between GeneCards genes and both chemical compounds and drugs, as well as links to drugs and compounds for purchase at BIOMOL. Chemical compound relationships are from AKS. Drug compound relationships are from PharmGKB.

AKS chemical compound relationships.

This table presents the following columns:

  • Compound - The name of the chemical compound related to this GeneCards gene.
  • Score - The AKS score of the relevance of the chemical compound to this gene, based on their literature text-mining algorithms.
  • Articles - The number of articles in which both the gene's symbol and the compound appear.
  • PubMed IDs for Articles with Shared Sentences (# sentences) - PubMed IDs of articles in which both the gene symbol and the compound appear in the same sentence, sorted by the number of sentences (shown in parentheses in the column) in which they both appear.

PharmGKB drug compound relationships.

This table presents the following columns:

  • Drug Compound - The name of the drug compound related to this GeneCards gene.
  • PharmGKB Relations - description of the relationship between the gene and the drug:
    • CO - Clinical Outcome
    • PD - Pharmacodynamics and Drug Response
    • PK - Pharmacokinetics
    • FA - Molecular and Cellular Functional Assays
    • GN - Genotype
  • PubMed IDs for articles supporting these relationships - PubMed IDs of articles in which both the gene symbol and the drug are discussed.

Transcripts

This section contains associated UniGene clusters and representative sequences; RefSeq mRNAs, with associated expression assays from Applied Biosystems when available; RNAi products from Invitrogen and Millipore; OriGene clones; DoTS assemblies (sorted by a scoring scheme that gives preference to mRNA over EST associations); GeneTide's highest-scoring ESTs; transcript and alignment information from AceView; additional gene/cDNA sequences from GenBank; alternative splicing information; and transcript links to Ensembl.

Alternative Splicing

This subsection contains alternative splicing information from ASD, followed by alternative splicing isoforms from ECgene. Exons with alternative splice sites in different isoforms were broken into Exonic Units (ExUns), and the letters indicate the order of the ExUns within the exon. The symbol ' ^ ' between ExUns indicates an intron, while ' · ' indicates the junction of two ExUns.
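The isoform notation can be parsed by splitting on the intron symbol and then on the junction symbol. This Python sketch assumes an isoform is written as a string such as "1a·1b ^ 2a" (the exon and ExUn labels are hypothetical) and returns one list of ExUns per exon.

```python
def parse_isoform(structure):
    """Split an isoform string into exons ('^' marks an intron) and each
    exon into its Exonic Units ('·' marks the junction of two ExUns)."""
    return [
        [exun.strip() for exun in exon.split("·")]
        for exon in structure.split("^")
    ]
```

For instance, "1a·1b ^ 2a" yields `[["1a", "1b"], ["2a"]]`: two exons, the first made of two ExUns.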

Expression in Human Tissues

This section contains links to expression assays from Applied Biosystems, experimental results from GeneNote, probeset-to-gene annotations from GeneAnnot and GeneTide, electronic Northern data images and clone count from UniGene, SAGE expression data images and tag counts based on data extracted from CGAP, followed by links to SOURCE, and/or EXPOLDB, and/or tissue specificity data from UniProt.

A table presents the association of GeneCards genes with Affymetrix probe-sets through GeneAnnot and GeneTide.
One can view other genes that share binary patterns of normal tissue expression based on GeneNote data (HG-U95 arrays).
Selecting the probe-sets of interest and clicking the GeneDecks link leads to a results page listing these genes and their descriptions.

Other columns include data from GeneAnnot and GeneTide, where an asterisk next to the probe set name indicates lower quality annotation, as follows:

  • Array - The Affymetrix GeneChip® expression array. Note that U95-A refers to Affymetrix array AV2 (version 2).
GeneAnnot
  • # genes - The number of genes related to this probe set.
  • Sensitivity - The fraction of probes that hit transcripts related to that gene (range: 0-1).
  • Specificity - The degree to which the individual probes of a given probe set match a certain gene and only that gene (range: 0-1, where 1 is most specific).
GeneNote
  • Correlation - see description below in "GeneNote - individual probe-sets variation" (range: 0-1).
  • Length - see description below in "GeneNote - individual probe-sets variation" (range: 0-4).
GeneTide
  • Gb_Accession - The mRNA's GenBank accession number.
  • Consensus - The fraction of annotating resources that agree that the cDNA belongs to the gene (range: 0-1).
  • Uniqueness - A confidence score reflecting how convinced each resource is that this is the only gene that could be associated with the sequence (range: 0-1).
  • Score - The 'Consensus' and 'Uniqueness' parameters collapsed into one score (range: 0-1).
  • Rank - The position of the specific gene among all other genes associated with this transcript.
After the table, three pairs of images, for GeneNote, electronic Northern, and SAGE tissue expression data, respectively, are presented with the following tooltips:

GeneNote - expression arrays

Experimental tissue vectors: Duplicate measurements were obtained for twelve normal human tissues hybridized against Affymetrix GeneChips HG-U95A-E. The intensity values (shown on the y-axis) were normalized and drawn on a novel scale that is intermediate between log and linear scales. This allows several orders of magnitude to be displayed on the same graph while emphasizing the differences between them. Noise was not subtracted, so values below 10 may be suspect. Further, each probe-set's expression profile was converted into binary form when possible. At most 5 unique binary patterns, which reflect over-expression (in black) and under-expression (in white) in different tissues, are shown per gene, with their counts on the left. (Grey stripes show undefined binary patterns.) Please note: under-expression does not always mean a lack of expression.
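Showing at most five unique binary patterns per gene, each with its count, is a frequency count over the gene's probe-sets. A Python sketch, assuming each probe-set's profile has already been binarized into a string of 1s (over-expressed) and 0s (under-expressed) over the twelve tissues, with None for an undefined pattern; the binarization rule itself is not described here, and the function name is hypothetical.

```python
from collections import Counter


def top_binary_patterns(patterns, max_patterns=5):
    """Return up to max_patterns (pattern, count) pairs, most frequent first.

    Undefined patterns (None) are skipped; ties keep first-seen order.
    """
    counts = Counter(p for p in patterns if p is not None)
    return counts.most_common(max_patterns)
```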

GeneNote - individual probe-sets variation

Multiple probe-sets corresponding to this gene are included in its tissue vector calculation only if their normalized intensity levels reach a threshold in at least one tissue. The variation of included and excluded probe-sets is visualized in the x-y plane: the x-axis shows Pearson's correlation between each individual probe-set vector and the average tissue vector; the y-axis shows the relative length of an individual probe-set vector (its scalar length divided by that of the average vector). The average is shown as a black square, while individual probe-sets are depicted as colored circles.
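The two coordinates of each point in this plot can be computed directly from the probe-set vectors. A Python sketch, assuming the normalized intensities are given per tissue and that the average tissue vector is the element-wise mean of the included probe-sets; the threshold is a parameter because its value is not stated here, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation of two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def norm(v):
    """Scalar (Euclidean) length of a vector."""
    return math.sqrt(sum(a * a for a in v))


def probe_set_variation(probe_sets, threshold):
    """Map each probe-set to (correlation, relative length, included).

    probe_sets: {name: normalized intensities, one value per tissue}.
    A probe-set is included if it reaches `threshold` in at least one tissue;
    the average tissue vector is built from included probe-sets only.
    """
    included = {n: v for n, v in probe_sets.items() if max(v) >= threshold}
    if not included:
        raise ValueError("no probe-set reaches the threshold")
    n_tissues = len(next(iter(included.values())))
    average = [sum(v[i] for v in included.values()) / len(included)
               for i in range(n_tissues)]
    avg_len = norm(average)
    return {
        n: (pearson(v, average), norm(v) / avg_len, n in included)
        for n, v in probe_sets.items()
    }
```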

UniGene - electronic Northern

Electronic Northern: For the shown set of non-fetal normal human tissues, NCBI's UniGene dataset (Hs.data) is mined for the number of unique clones per gene per tissue. Clones are assigned to particular tissues by applying data-mining heuristics to UniGene's library information file (Hs.lib.info). Electronic expression results were calculated by dividing the number of clones per gene by the number of clones per tissue. They were then normalized by multiplying by 1M, and the resulting normalized counts are presented on the same root scale as the experimental tissue vectors.
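The electronic Northern value is a per-tissue fraction scaled by 1M. A Python sketch, assuming the clone counts have already been mined from Hs.data and Hs.lib.info into dictionaries; the function and argument names are hypothetical, and the root-scale display transform is not reproduced.

```python
def electronic_northern(gene_clones, tissue_clones, scale=1_000_000):
    """Normalized clone counts per tissue for one gene.

    gene_clones: {tissue: unique clones of this gene in the tissue}
    tissue_clones: {tissue: total clones assigned to the tissue}
    Each value is (gene clones / tissue clones) * scale.
    """
    return {
        tissue: gene_clones.get(tissue, 0) * scale / total
        for tissue, total in tissue_clones.items()
        if total > 0
    }
```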

CGAP:SAGE

Serial Analysis of Gene Expression: Data are shown for ten normal human tissues (the relevant SAGE libraries are currently not available for spleen and thymus, which are shown in lower case and flagged with a *). The CGAP datasets Hs.frequencies and Hs.libraries are mined for the number of SAGE tags per tissue. Tags are reassigned to a UniGene cluster and then to a particular gene by mining Hs.best_gene, Hs.best_tag, and Hs_GeneData. The expression level of a particular gene in a particular tissue was calculated as the number of appearances of the corresponding tag divided by the total number of tags in libraries derived from that tissue. These fractions were then normalized by multiplying by 1.2M, and the resulting normalized counts are presented on the same root scale as that used for the electronic Northern pictures. Please note: currently, only associations with minimal ambiguity participate in the analysis.
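The SAGE calculation follows the same pattern, with tags mapped to genes before counting. A Python sketch, assuming the two-step tag-to-cluster-to-gene assignment has been collapsed into a single hypothetical `tag_to_gene` mapping from which ambiguous tags are simply omitted; tag names and function names are illustrative.

```python
def sage_expression(tag_counts_by_tissue, tag_to_gene, gene, scale=1_200_000):
    """Normalized SAGE expression of `gene` per tissue.

    tag_counts_by_tissue: {tissue: {tag: appearances in that tissue's libraries}}
    tag_to_gene: {tag: gene} for unambiguous assignments only.
    Each value is (tags assigned to gene / all tags in tissue) * scale.
    """
    result = {}
    for tissue, tag_counts in tag_counts_by_tissue.items():
        total = sum(tag_counts.values())
        if total == 0:
            continue  # no SAGE libraries for this tissue
        gene_tags = sum(n for tag, n in tag_counts.items()
                        if tag_to_gene.get(tag) == gene)
        result[tissue] = gene_tags * scale / total
    return result
```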

Similar Genes in Other Organisms

This section contains similar genes in other organisms from HomoloGene, euGenes, SGD, and MGD, with possible further links to Flybase and WormBase.

Orthologs

This table presents the following columns:

  • Organism - The names of the homologous species, using both scientific and popular terminology.
  • Gene - The symbol for the gene in the homologous species.
  • Locus - The position of the gene in the homologous species.
  • Description - Its description.
  • Human Similarity - The percent similarity to the human gene, followed either by (n) where the comparison was based on nucleic acids or (a) for amino acid based comparisons.
  • NCBI accessions - links to the sequences for the gene in NCBI databases including GenBank and Entrez Gene.

Upon clicking the "Species with no ortholog" link, a pop-up window appears. It lists the species that do NOT have an ortholog to the relevant gene.

Superscripts represent the source from which this data was extracted. If a '~' follows the superscript for HomoloGene, it means that data for this species exists only in the older version of HomoloGene, which used unfinished genomes and where the homologs found might not be true orthologs.

Following the table is a link to Ensembl gene trees.

Paralogs

This section contains paralogs from HomoloGene and Ensembl, and pseudogenes from Pseudogene.org.

SNPs/Variants

This section contains SNPs/Variants from the NCBI SNP Database, Ensembl, and PupaSUITE/PupaSNP, with descriptions from UniProt, Linkage Disequilibrium images from HapMap, and links for ordering reagents from Applied Biosystems.

NCBI SNPs

SNP information is currently extracted from dbSNP XML files. Filtering retains only SNPs that are not artifacts, not connected to gene duplication, not withdrawn by NCBI, fully specified, free of ambiguous locations or low map quality, and associated with a single Entrez Gene id and a single contig id. The user can determine the order of a gene's displayed SNPs. By default, SNPs are sorted first (shown in the select box as 1st) by validation status (validated before non-validated); then, within these groups, by location type as the secondary (2nd) nested criterion, in the order coding non-synonymous, coding synonymous, coding, splice site, mRNA-UTR, intron, locus, reference, and exception; and finally by the number of validations (up to 4). The user can change this default sort order and define up to three hierarchical sorting priorities using the select boxes above the relevant columns on the section's button line. The available fields are: rs-numbers (ascending order), validation status, position on the chromosome (ascending order), location type, allele frequencies (existing information before non-existing), population types (alphabetical order), and total sample size (largest to smallest). Each displayed line includes genomic, expression, and allele frequency data sections. Only a summary is shown for the expression and allele frequency sections, with a link to the detailed information (via the magnifying glass icon).
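The default three-level ordering can be expressed as a composite sort key. A Python sketch, assuming the location types map to the Type codes listed in the table that follows and that SNPs with more validations come first (the text does not state the direction); the `Snp` record is hypothetical.

```python
from dataclasses import dataclass

# Location types in the default secondary sort order described above.
LOCATION_ORDER = ("nonsynon", "synon", "cds", "spl", "utr", "int", "loc", "ref", "exc")
LOCATION_RANK = {t: i for i, t in enumerate(LOCATION_ORDER)}


@dataclass
class Snp:
    rs_id: str
    validated: bool
    location_type: str   # one of the Type codes, e.g. "nonsynon"
    n_validations: int   # 0-4


def default_sort_key(snp):
    """Validated first, then by location type, then more validations first."""
    return (
        not snp.validated,  # False (validated) sorts before True
        LOCATION_RANK.get(snp.location_type, len(LOCATION_ORDER)),
        -snp.n_validations,  # assumed descending
    )
```

Usage: `sorted(snps, key=default_sort_key)`. A user-chosen order would swap in a different tuple of fields.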

This table presents the following columns:

  • AB - The AB logo is shown if an Applied Biosystems TaqMan genotyping assay exists for the SNP. Click the logo to access the relevant page at AB.
  • SNP ID - The NCBI rs number for this SNP
  • Valid - The validation method(s) associated with this SNP:
    • C - by-cluster
      has 2+ submissions, with 1+ submissions assayed with a non-computational method
    • A - by-2hit-2allele
      all alleles have been observed in 2+ chromosomes
    • F - by-frequency
      subsnp has frequency data submitted
    • H - by-hapmap
      validated by HapMap project
    • O - by-other-pop
  • Chr pos - Chromosomal position: position of the variation (strand).
  • Sequence - The sequence flanking the base pair variation (highlighted in blue/orange/green/pink). Lower case letters indicate repetitive or low-complexity sequence.
  • Recs - number of records for expression/allele frequency data
  • AAChg - The change in amino acid resulting from this SNP
  • Type - The SNP type:
    • nonsynon - coding, non-synonymous
      change in peptide with respect to contig sequence
    • synon - coding, synonymous
      no change in peptide for allele with respect to contig seq
    • cds - coding
      variation in coding region of gene, assigned if allele-specific class unknown
    • spl - splice-site
      variation in first 2 or last 2 bases of intron
    • utr - mRNA untranslated region
      variation in transcript, but not in coding region interval
    • int - intron
      variation in intron, but not in first 2 or last 2 bases of intron
    • exc - exception
      variation in coding region with an exception raised on alignment. This occurs when a protein with a gap
      in its sequence is aligned back to the contig sequence; variations 3' of the gap have undefined functional inference.
    • ref - reference
      allele observed in reference contig sequence
    • loc - locus-region
      variation in region of gene, but not in transcript
    • PupaSNP Designations:
      ese - exonic splicing enhancer
      spl - splice-site
      trp - triplex forming sequences
      tfbs - transcription factor binding sites
  • More - View individual records
  • Allele freq - Average frequency of the alleles across all populations, displayed as a pie chart (only if there are 2 alleles). Alleles are shown in the same orientation and color as the displayed SNP sequence. Numeric information about the frequencies is available via mouseover.
  • Pop - population type
  • Total sample - total data sample size (number of chromosomes)
Additional columns in Expression data popup:
  • mRNA Accession - The mRNA sequence at NCBI
  • Protein Accession - The protein sequence at NCBI
  • Phase - Codon position (1, 2, or 3).
  • Protein Position - Position number of the amino acid in the protein.
Additional columns in Allele Frequency data popup:
  • Het - estimated heterozygosity of population
  • Sample Size - population data sample size (number of chromosomes)
Additional SNPs, found in Applied Biosystems data source but not in NCBI, are displayed under the table (both "see all" options display these SNPs).
This section also provides Linkage Disequilibrium (LD) information from HapMap. The first link opens a popup window showing the LD map for the length of the gene in population CEU (Utah residents with ancestry from northern and western Europe). For other populations (at HapMap), click on the second link.

Disorders & Mutations

This section contains Disorders & Mutations in which GeneCards genes are involved, according to OMIM, UniProt, AKS, PharmGKB, Genatlas, GeneTests, HGMD, GAD, HuGENet, BCGD, and/or TGDB.

AKS disease relationships

This table presents the following columns:

  • Disease - The name of the disease related to this GeneCards gene.
  • Score - The AKS score of the relevance of the disease to this gene, based on their literature text-mining algorithms.
  • Articles - The number of articles in which both the gene's symbol or description and the disease appear.
  • PubMed IDs for Articles with Shared Sentences (# sentences) - PubMed IDs of articles in which both the gene symbol and the disease appear in the same sentence, sorted by the number of sentences (shown in parentheses in the column) in which they both appear.

PharmGKB disease relationships

This table presents the following columns:

  • Disease - The name of the disease related to this GeneCards gene.
  • PharmGKB Relations - description of the relationship between the gene and the disease:
    • CO - Clinical Outcome
    • PD - Pharmacodynamics and Drug Response
    • PK - Pharmacokinetics
    • FA - Molecular and Cellular Functional Assays
    • GN - Genotype
  • PubMed IDs for articles supporting these relationships - PubMed IDs of articles in which both the gene symbol and the disease are discussed.

Medical News

This section provides links to possibly related articles in Doctor's Guide.

Research Articles

This section provides titles of and links to research articles in PubMed, as associated via AKS, HGNC, Entrez Gene, UniProt, PharmGKB, and/or GAD.

The articles are ranked, first according to the number of GeneCards sources that associate the article with this gene, then by date of publication, and then according to the AKS score for this article/gene relationship. The year of publication appears in parentheses after the title of each article. Lower ranked articles may also appear in partial results if their titles or authors contain your search term.
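The article ranking is a three-level sort. A Python sketch, assuming publication date is reduced to the year and that newer articles and higher AKS scores rank first (the text does not state either direction); the `Article` record and its pmid values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    pmid: str
    n_sources: int     # GeneCards sources associating the article with the gene
    year: int
    aks_score: float


def article_rank_key(article):
    """More sources first, then newer articles, then higher AKS score."""
    return (-article.n_sources, -article.year, -article.aks_score)
```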

Search Box

This section allows the user to search PubMed, OMIM, or NCBI Bookshelf. The current gene's aliases and disorders, as well as the search string that led to the gene, are provided as candidate search terms. The user can also add new search terms.

Databases

These sections provide links to the GeneCards genes in other databases:

Licensable Technologies

This section features technologies that are available for licensing. Institutions currently featured include the Weizmann Institute of Science, the Salk Institute for Biological Studies, and Tufts University.

Services

This section provides links to reagents available from Applied Biosystems, Invitrogen, Millipore, and/or R&D Systems, antibodies available from Cell Signaling Technology, Abcam, GeneTex, Invitrogen, Millipore, R&D Systems and/or Sigma-Aldrich, clones available from OriGene, Invitrogen, and/or RZPD, and GPCR/Kinase Profiling, Assay development, β-Arrestin, GPCR & ELISA assays available from DiscoveRx and/or R&D Systems.


Gene Ontology (GO) Tables

The Gene Ontology sections in Proteins, Gene Function, and Pathways & Interactions display a table with the following columns:

  • GO ID
    The identifier used by GO and linked to the GO entry
  • Qualified GO term
    The description of this entry, possibly qualified with "NOT", "colocalizes with", or "contributes to"
  • Evidence
    A 2 or 3 letter code
    • Curator-assigned Evidence Codes
         Experimental Evidence Codes:
      IDA: Inferred from Direct Assay
      IPI: Inferred from Physical Interaction
      IMP: Inferred from Mutant Phenotype
      IGI: Inferred from Genetic Interaction
      IEP: Inferred from Expression Pattern
         Computational Analysis Evidence Codes:
      ISS: Inferred from Sequence or Structural Similarity
      IGC: Inferred from Genetic Context
      RCA: Inferred from Reviewed Computational Analysis
         Author Statement Evidence Codes:
      TAS: Traceable Author Statement
      NAS: Non-traceable Author Statement
         Curator Statement Evidence Codes:
      IC: Inferred by Curator
      ND: No biological Data available
    • Automatically-assigned Evidence Codes
      IEA: Inferred from Electronic Annotation
    • Obsolete Evidence Codes
      NR: Not Recorded
  • PubMed ids
    References in the literature, if relevant, obtained from Entrez Gene
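A lookup table from evidence code to group makes it easy, for example, to keep only curator-assigned annotations. A Python sketch using the codes listed above; the helper names and the (GO ID, code) tuple format are hypothetical.

```python
# Evidence codes grouped as in the list above.
EVIDENCE_GROUPS = {
    "experimental": {"IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational analysis": {"ISS", "IGC", "RCA"},
    "author statement": {"TAS", "NAS"},
    "curator statement": {"IC", "ND"},
    "automatically assigned": {"IEA"},
    "obsolete": {"NR"},
}
CODE_TO_GROUP = {code: group for group, codes in EVIDENCE_GROUPS.items() for code in codes}
NOT_CURATED = {"automatically assigned", "obsolete", "unknown"}


def evidence_group(code):
    """Return the group of a GO evidence code, or 'unknown'."""
    return CODE_TO_GROUP.get(code.upper(), "unknown")


def curator_assigned(annotations):
    """Keep (GO ID, evidence code) pairs whose evidence a curator assigned."""
    return [(go_id, code) for go_id, code in annotations
            if evidence_group(code) not in NOT_CURATED]
```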

Selected Algorithms

AKS Scoring Algorithm

The relevance scores of elements related to genes (chemical substances and diseases) are based on the analysis of co-occurrences of two elements in Medline documents. The observed number of documents in which both elements appear together, and the numbers of documents in which each appears independently, are compared to an expected value based on a hypergeometric distribution. The more co-occurrences are observed relative to the number expected, the less likely it is that they occurred by chance, and the higher the score. Note that the absolute values are not meaningful and only indicate an order of importance: in the list of chemicals related to a gene, for example, the order is meaningful, and the first chemicals in the list are statistically more strongly related to the gene than those that follow, but the absolute values of the scores may change from one release to another.
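The comparison of observed and expected co-occurrences can be illustrated with the hypergeometric tail probability. This Python sketch computes the expected count and the probability of seeing at least the observed number of shared documents by chance, then uses -log10 of that probability as an illustrative score. The exact transformation AKS applies is not described here, so `relevance_score` is an assumption, as are all function names.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def expected_cooccurrence(total_docs, docs_a, docs_b):
    """Expected number of documents mentioning both elements if independent."""
    return docs_a * docs_b / total_docs


def cooccurrence_pvalue(total_docs, docs_a, docs_b, docs_both):
    """Hypergeometric P(X >= docs_both): chance of at least the observed overlap."""
    log_total = log_comb(total_docs, docs_b)
    p = 0.0
    for i in range(docs_both, min(docs_a, docs_b) + 1):
        if docs_b - i > total_docs - docs_a:
            continue  # this overlap is impossible; the term is zero
        p += math.exp(log_comb(docs_a, i)
                      + log_comb(total_docs - docs_a, docs_b - i)
                      - log_total)
    return min(p, 1.0)


def relevance_score(total_docs, docs_a, docs_b, docs_both):
    """Illustrative score: larger when the overlap is less likely by chance."""
    p = cooccurrence_pvalue(total_docs, docs_a, docs_b, docs_both)
    return -math.log10(max(p, 1e-300))
```

Working in log space with `math.lgamma` keeps the binomial coefficients manageable for Medline-sized document counts.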











Developed at the Crown Human Genome Center & Weizmann Institute of Science



Copyright © 1997-2008, Weizmann Institute of Science. All Rights Reserved.