Explanation for supplementary table S1
de Lichtenberg et al., some journal, 2004
(last updated the 13th of July 2004)
Columns
A) The systematic Open Reading Frame name according to the Saccharomyces Genome Database (SGD), located at http://www.yeastgenome.org.
B) The standard name of the gene according to the Saccharomyces Genome Database (SGD), located at http://www.yeastgenome.org.
C) Periodicity analysis performed in this study. For each gene, a Fourier score is calculated based on the expression profile during the two cell cycles covered in the experiment (see the paper).
D) The Fourier score is compared to a distribution of scores from random perturbations of the same expression profile to assign a p-value (i.e. the chance of observing a better score by random shuffling of the observed points).
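The scoring and shuffling steps described for columns C and D can be sketched as follows. This is a minimal illustration only: the exact form of the Fourier score is given in the paper, not here, so the score below (the magnitude of the Fourier component at the cell cycle period) is an assumption, as are the function names, the number of shuffles and the add-one correction.

```python
import math
import random


def fourier_score(times, values, period):
    """Magnitude of the Fourier component at the given period.

    Assumed form of the score; the paper defines the one actually used.
    """
    omega = 2.0 * math.pi / period
    s = sum(v * math.sin(omega * t) for t, v in zip(times, values))
    c = sum(v * math.cos(omega * t) for t, v in zip(times, values))
    return math.hypot(s, c)


def permutation_p_value(times, values, period, n_shuffles=1000, seed=0):
    """Fraction of shuffled profiles scoring at least as well as the observed one."""
    rng = random.Random(seed)
    observed = fourier_score(times, values, period)
    shuffled = list(values)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys the time order, keeps the values
        if fourier_score(times, shuffled, period) >= observed:
            at_least_as_good += 1
    # Add-one correction (an assumption) so the estimate is never exactly zero.
    return (at_least_as_good + 1) / (n_shuffles + 1)
```

With a clean sine profile sampled over two cycles the p-value comes out close to its lower bound of 1 / (n_shuffles + 1), while a completely flat profile gives a p-value of 1.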
E) The false discovery rate (FDR) is estimated by the multiple testing correction procedure of Benjamini and Hochberg, J. Roy. Stat. Soc. B Met., 57(1): 289-300, 1995. The FDR at a given p-value gives the expected proportion of false positives among the genes with a p-value at or below that value. Here, false positives should be understood as genes that are wrongly classified as periodic although in reality they are not.
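For readers who want to recompute the FDR column from a list of p-values, the Benjamini and Hochberg step-up adjustment can be sketched as below. The function name is hypothetical, and whether the table values were produced in exactly this way (for example with respect to ties) is not stated here.

```python
def benjamini_hochberg(p_values):
    """Return the Benjamini-Hochberg adjusted value for each p-value, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    fdr = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        fdr[i] = running_min
    return fdr
```

For example, the p-values 0.01, 0.04, 0.03 and 0.005 are adjusted to approximately 0.02, 0.04, 0.04 and 0.02.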
F) As described in the paper, the offset of the best-fitting sine curve is calculated for each profile and converted into an estimate of the time of peak expression during the cell cycle. This column contains that estimate, in units of cell cycle progression, with 0 and 1 being the time of cell division. Note that a value is estimated for every gene, even when the profile does not appear periodic. We therefore advise taking all three scores (columns C-E) into account when evaluating the periodicity and peaktime of a gene.
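The conversion from a sine offset to a peaktime can be illustrated with the sketch below. It assumes evenly spaced samples covering whole cycles, in which case the phase of the Fourier component at the cell cycle period equals the offset of the least-squares sine fit. The function name and the division_time parameter (the time point taken as cell division) are hypothetical; the paper describes the conversion actually used.

```python
import math


def peak_fraction(times, values, period, division_time=0.0):
    """Estimate the time of peak expression as a fraction of the cell cycle.

    0 (equivalently 1) corresponds to division_time, an assumed reference point.
    """
    omega = 2.0 * math.pi / period
    s = sum(v * math.sin(omega * t) for t, v in zip(times, values))
    c = sum(v * math.cos(omega * t) for t, v in zip(times, values))
    # For values proportional to cos(omega * (t - t_peak)), atan2(s, c)
    # recovers omega * t_peak up to a multiple of 2*pi.
    t_peak = math.atan2(s, c) / omega
    return ((t_peak - division_time) / period) % 1.0
```

As with the table itself, the function returns a value for any profile, periodic or not, so the result is only meaningful together with the periodicity scores.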
G) The phase in which the periodic gene peaks in expression. These phase estimates were obtained by comparing the peaktime of every gene to the peaktime distribution of the phase-specific gene clusters defined by Spellman et al. and Johansson et al., as summarized in Figure 3 of the Supplementary Information.
H) The prediction score for the protein encoded by this gene. The score is based on the predictions of an ensemble of artificial neural networks trained to recognize cell cycle regulated proteins from their features. These include subcellular localization, phosphorylation and glycosylation potential, degradation signals, isoelectric point and other physicochemical properties. For details, see de Lichtenberg et al., J. Mol. Biol., 329(4): 663-74, 2003.
I-N) In this study, we compare our results to two different analyses of the three publicly available cell cycle data sets in yeast (Alpha, Cdc15 and Cdc28) originally published by Spellman et al., Mol. Biol. Cell, 9: 3273-3297, 1998. The first analysis is that of Johansson et al., Bioinformatics, 19(4): 467-473, 2003, which identifies 178, 227 and 151 genes, respectively, as significantly periodic in the three experiments. The second reference study is Luan and Li, Bioinformatics, 20(3): 332-339, 2004. These authors found 297, 482 and 623 genes significantly periodic in the Alpha, Cdc15 and Cdc28 data sets, respectively. For each experiment we use the notation:
● = Johansson et al. ○ = Luan & Li
O-P) These columns report periodically expressed homologs identified by microarray studies in other species (human and fission yeast).
For human, we compared to the results of Whitfield et al. (Molecular Biology of the Cell, 13, p. 1977-2000, 2002), who identified 875 periodically expressed genes in a microarray study of synchronized human HeLa cells. Unigene clusters reported in this experiment were matched against human protein sequences, requiring 90% identity over at least 100 amino acids. Protein hits were compared (using BLAST) with the yeast proteome, defining human-yeast homologs as those with an E-value below 1e-5. The human column reports the protein identifiers (from SwissProt, PIR or RefSeq) of human homologs that were among the 875 periodically expressed human genes.
S. pombe (fission yeast) orthologs of S. cerevisiae proteins were taken from the study by Rustici et al. (Nature Genetics, 2004). These authors identified 407 periodically expressed genes in S. pombe. In cases where several periodic S. pombe proteins matched an S. cerevisiae protein, only one is listed and the total number of periodic orthologs is given in parentheses.
Q-AA) In these columns we visualize data from studies on the genome-wide location of transcription factor binding, i.e. which transcription factors are physically associated with the promoter region of the gene. The first two columns show data from a study by Iyer et al., Nature, 409: 533-538, 2001, on the binding of the two complexes SBF (consisting of the proteins Swi4p and Swi6p) and MBF (consisting of the proteins Mbp1p and Swi6p).
The remaining nine columns represent the combination of two studies from the same lab, namely Simon et al., Science, 298: 799-804, 2002 and Lee et al., Cell, 106: 697-708, 2001. In both of these studies, nine known cell cycle transcription factors were investigated and a p-value was estimated for the binding of each factor to each promoter in the S. cerevisiae genome. Here we report binding of a factor only in cases where the p-value was lower than 0.01 in both studies.
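The rule of reporting binding only when both studies agree can be written as a simple filter. The sketch below assumes each study is available as a dictionary keyed by (gene, factor) pairs; that layout, the function name and the placeholder keys in the usage example are illustrative assumptions and not the format of the published data.

```python
def bound_in_both(p_first, p_second, threshold=0.01):
    """Return the (gene, factor) pairs with p < threshold in both studies.

    Pairs missing from the second study are treated as not bound.
    """
    return sorted(
        key
        for key, p in p_first.items()
        if p < threshold and p_second.get(key, 1.0) < threshold
    )


# Usage with made-up placeholder values:
first = {("GENE1", "factorX"): 0.001, ("GENE2", "factorY"): 0.02, ("GENE3", "factorX"): 0.005}
second = {("GENE1", "factorX"): 0.003, ("GENE2", "factorY"): 0.001}
bound = bound_in_both(first, second)
```

Here only the pair ("GENE1", "factorX") is kept: the second pair fails the threshold in the first study and the third is absent from the second study.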
AB) Data is presented from a study of targets of Cdk1 (Cdc28) in S. cerevisiae, performed by Ubersax et al., Nature, 425(6960): 859-864, 2003. The authors report a list of high-confidence targets and a list of lower-confidence targets discovered in their screen. These are indicated in the table as high and low, respectively.
AC-AE) Gene Ontology (http://www.geneontology.org/) annotations of Biological Process, Molecular Function and Cellular Component according to the Saccharomyces Genome Database (SGD), located at http://www.yeastgenome.org.
The * denotes that the gene is annotated with more GO terms than the one shown.