OBJECTIVE: To assess whether Parkinson disease (PD) genes are somatically mutated in cutaneous melanoma (CM) tissue, because CM occurs in patients with PD at higher rates than in the general population and PD is more common than expected in CM cohorts. METHODS: We cross-referenced somatic mutations in metastatic CM detected by whole-exome sequencing with the 15 known PD (PARK) genes. We computed the empirical distribution of the sum of mutations in each gene (SMut) and of the number of tissue samples in which a given gene was mutated at least once (SSampl) for each of the analyzable genes, determined the 90th and 95th percentiles of the empirical distributions of these sums, and verified the location of PARK genes in these distributions. Identical analyses were applied to adenocarcinoma of lung (ADENOCA-LUNG) and squamous cell carcinoma of lung (SQUAMCA-LUNG). We also analyzed the distribution of the number of mutated PARK genes in CM samples vs the 2 lung cancers. RESULTS: Somatic CM mutation analysis (n = 246) detected 315,914 mutations in 18,758 genes. Somatic CM mutations were found in 14 of 15 PARK genes. Forty-eight percent of CM samples carried ≥1 PARK mutation and 25% carried multiple PARK mutations. PARK8 mutations occurred above the 95th percentile of the empirical distribution for SMut and SSampl. Significantly more CM samples harbored multiple PARK gene mutations compared with SQUAMCA-LUNG (p = 0.0026) and with ADENOCA-LUNG (p
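The per-gene sums and percentile check described in the METHODS above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input format (one `(sample_id, gene)` pair per somatic mutation), the function names, and the nearest-rank percentile definition are all assumptions made for the example.

```python
from collections import defaultdict
import math


def gene_sums(mutations):
    """Compute SMut and SSampl per gene.

    mutations: iterable of (sample_id, gene) pairs, one per somatic mutation
    (hypothetical input format). Returns two dicts keyed by gene:
    smut   = total number of mutations in the gene,
    ssampl = number of samples with at least one mutation in the gene.
    """
    smut = defaultdict(int)
    samples = defaultdict(set)
    for sample_id, gene in mutations:
        smut[gene] += 1
        samples[gene].add(sample_id)
    ssampl = {gene: len(ids) for gene, ids in samples.items()}
    return dict(smut), ssampl


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def genes_above(sums, genes_of_interest, q=95):
    """Return the genes of interest whose sum exceeds the q-th percentile
    of the empirical distribution over all genes in `sums`."""
    cutoff = percentile(list(sums.values()), q)
    return sorted(g for g in genes_of_interest if sums.get(g, 0) > cutoff)
```

Calling `genes_above(smut, park_genes, q=95)` and `genes_above(ssampl, park_genes, q=95)` on real data would list the PARK genes lying above the 95th percentile of each distribution; `park_genes` is a placeholder for the 15 PARK gene symbols.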
Biocatalysts showcase the upper limit obtainable for high-speed molecular processing and transformation. Efforts to engineer functionality in synthetic nanostructured materials are guided by the increasing knowledge of evolving architectures, which enable controlled molecular motion and precise molecular recognition. The cellulosome is a biological nanomachine, which, as a fundamental component of the plant-digestion machinery from bacterial cells, has a key potential role in the successful development of environmentally-friendly processes to produce biofuels and fine chemicals from the breakdown of biomass waste. Here, the progress toward so-called designer cellulosomes, which provide an elegant alternative to enzyme cocktails for lignocellulose breakdown, is reviewed. Particular attention is paid to rational design via computational modeling coupled with nanoscale characterization and engineering tools. Remaining challenges and potential routes to industrial application are put forward.
In the course of investigating anti-DNA autoantibodies, we examined IgM and IgG antibodies to poly-G and other oligonucleotides in the sera of healthy persons and those diagnosed with systemic lupus erythematosus (SLE), scleroderma (SSc), or pemphigus vulgaris (PV); we used an antigen microarray and informatic analysis. We now report that all of the 135 humans studied, irrespective of health or autoimmune disease, manifested relatively high amounts of IgG antibodies binding to the 20-mer G oligonucleotide (G20); no participants entirely lacked this reactivity. IgG antibodies to homo-nucleotides A20, C20 or T20 were present only in the sera of SLE patients who were positive for antibodies to dsDNA. The prevalence of anti-G20 antibodies led us to survey human, mouse and Drosophila melanogaster (fruit fly) genomes for runs of T20 and G20 or more: runs of T20 appear >170000 times compared with only 93 runs of G20 or more in the human genome; of these runs, 40 were close to brain-associated genes. Mouse and fruit fly genomes showed significantly lower T20/G20 ratios than did human genomes. Moreover, sera from both healthy and SLE mice contained relatively little or no anti-G20 antibodies; so natural anti-G20 antibodies appear to be characteristic of humans. These unexpected observations invite investigation of the immune functions of anti-G20 antibodies in human health and disease and of runs of G20 in the human genome.
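The genome survey for runs of G20 or T20 "or more" can be illustrated with a short sketch. It assumes the genome is available as a plain nucleotide string and counts maximal runs on the given strand only; whether the original survey also scanned the reverse complement is not stated in the abstract, and the function name is hypothetical.

```python
import re


def count_runs(sequence, base, min_len=20):
    """Count maximal runs of `base` that are at least `min_len` long.

    A run of 45 consecutive G's counts once, not 26 times, because the
    greedy pattern consumes the whole run before the search resumes.
    Matching is case-insensitive so soft-masked (lowercase) sequence
    is included.
    """
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}", re.IGNORECASE)
    return sum(1 for _ in pattern.finditer(sequence))
```

A T20/G20 ratio for one chromosome string `seq` would then be `count_runs(seq, "T") / count_runs(seq, "G")`, provided the denominator is non-zero.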
MicroRNAs (miRs) regulate a variety of cellular processes, and their impaired expression is involved in cancer. Silencing of tumor-suppressive miRs in cancer can occur through epigenetic modifications, including DNA methylation and histone deacetylation. We performed comparative miR profiling on cultured lung cancer cells before and after treatment with 5'aza-deoxycytidine plus Trichostatin A to reverse DNA methylation and histone deacetylation, respectively. Several tens of miRs were strongly induced by such 'epigenetic therapy'. Two representatives, miR-512-5p (miR-512) and miR-373, were selected for further analysis. Both miRs were secreted in exosomes. Re-expression of both miRs augmented cisplatin-induced apoptosis and inhibited cell migration; miR-512 also reduced cell proliferation. TEAD4 mRNA was confirmed as a direct target of miR-512; likewise, miR-373 was found to target RelA and PIK3CA mRNA directly. Our results imply that miR-512 and miR-373 exert cell-autonomous and non-autonomous tumor-suppressive effects in lung cancer cells, where their re-expression may benefit epigenetic cancer therapy.
p53 is a pivotal tumor suppressor and a major barrier against cancer. We now report that silencing of the Hippo pathway tumor suppressors LATS1 and LATS2 in nontransformed mammary epithelial cells reduces p53 phosphorylation and increases its association with the p52 NF-kappaB subunit. Moreover, it partly shifts p53's conformation and transcriptional output toward a state resembling cancer-associated p53 mutants and endows p53 with the ability to promote cell migration. Notably, LATS1 and LATS2 are frequently down-regulated in breast cancer; we propose that such down-regulation might benefit cancer by converting p53 from a tumor suppressor into a tumor facilitator.
Background: HOX genes are a family of developmental genes that are expressed neither in the developing forebrain nor in the normal brain. Aberrant expression of a HOX-gene dominated stem-cell signature in glioblastoma has been linked with increased resistance to chemo-radiotherapy and sustained proliferation of glioma initiating cells. Here we describe the epigenetic and genetic alterations and their interactions associated with the expression of this signature in glioblastoma. Results: We observe prominent hypermethylation of the HOXA locus 7p15.2 in glioblastoma in contrast to non-tumoral brain. Hypermethylation is associated with a gain of chromosome 7, a hallmark of glioblastoma, and may compensate for tumor-driven enhanced gene dosage as a rescue mechanism by preventing undue gene expression. We identify the CpG island of the HOXA10 alternative promoter that appears to escape hypermethylation in the HOX-high glioblastoma. An additive effect of gene copy gain at 7p15.2 and DNA methylation at key regulatory CpGs in HOXA10 is significantly associated with HOX-signature expression. Additionally, we show concordance between methylation status and presence of active or inactive chromatin marks in glioblastoma-derived spheres that are HOX-high or HOX-low, respectively. Conclusions: Based on these findings, we propose co-evolution and interaction between gene copy gain, associated with a gain of chromosome 7, and additional epigenetic alterations as key mechanisms triggering a coordinated, but inappropriate, HOX transcriptional program in glioblastoma.
The number of Genome Wide Association Studies (GWAS) of schizophrenia is rapidly growing. However, the small effect of individual genes limits the number of reliably implicated genes, which are too few and too diverse to perform reliable pathway analysis; hence the biological roles of the genes implicated in schizophrenia are unclear. To overcome these limitations we combine GWAS with genome-wide expression data from human post-mortem brain samples of schizophrenia patients and controls, taking these steps: 1) Identify 36 GWAS-based genes which are expressed in our dataset. 2) Find a cluster of 19 genes with highly correlated expression. We show that this correlation pattern is robust and statistically significant. 3) GO-enrichment analysis of these 19 genes reveals significant enrichment of ion channels and calcium-related processes. This finding (based on analyzing a small number of coherently expressed genes) is validated and enhanced in two ways: First, the emergence of calcium channels and calcium signaling is corroborated by identifying proteins that interact with those encoded by the cluster of 19. Second, we extend the 19 cluster genes into 1028 genes, whose expression is highly correlated with the cluster's average profile. When GO-enrichment analysis is performed on this extended set, many schizophrenia-related pathways appear, with calcium-related processes enriched with high statistical significance. Our results give further, expression-based validation to GWAS results, support a central role of calcium signaling in the pathogenesis of schizophrenia, and point to additional pathways potentially related to the disease. (C) 2015 Elsevier B.V. All rights reserved.
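The extension step, from a small cluster to all genes whose expression tracks the cluster's average profile, can be sketched as below. The abstract does not give the correlation measure or cutoff, so the use of Pearson correlation, the `threshold` value, and the input layout (a dict mapping gene to a list of per-sample expression values) are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def extend_cluster(expression, cluster_genes, threshold=0.8):
    """Return genes outside the cluster whose profile correlates with the
    cluster's average profile at or above `threshold` (hypothetical cutoff)."""
    n_samples = len(next(iter(expression.values())))
    mean_profile = [
        sum(expression[g][i] for g in cluster_genes) / len(cluster_genes)
        for i in range(n_samples)
    ]
    return sorted(
        gene
        for gene, profile in expression.items()
        if gene not in cluster_genes and pearson(profile, mean_profile) >= threshold
    )
```

The extended gene list returned here would be the input to a GO-enrichment tool; that step is not shown.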
Cells cope with replication-blocking lesions via translesion DNA synthesis (TLS). TLS is carried out by low-fidelity DNA polymerases that replicate across lesions, thereby preventing genome instability at the cost of increased point mutations. Here we perform a two-stage siRNA-based functional screen for mammalian TLS genes and identify 17 validated TLS genes. One of the genes, NPM1, is frequently mutated in acute myeloid leukaemia (AML). We show that NPM1 (nucleophosmin) regulates TLS via interaction with the catalytic core of DNA polymerase-eta (pol eta), and that NPM1 deficiency causes a TLS defect due to proteasomal degradation of pol eta. Moreover, the prevalent NPM1c+ mutation that causes NPM1 mislocalization in ~30% of AML patients results in excessive degradation of pol eta. These results establish the role of NPM1 as a key TLS regulator, and suggest a mechanism for the better prognosis of AML patients carrying mutations in NPM1.
Systemic lupus erythematosus (SLE) is an autoimmune disease that can attack many different body organs; the triggering event is unknown. SLE has been associated with more than 100 different autoantibody reactivities, among which anti-dsDNA is prominent. Nevertheless, autoantibodies to dsDNA occur in only two-thirds of SLE patients. We previously reported the use of an antigen microarray to characterize SLE serology. We now report the results of an expanded study of serology in SLE patients and scleroderma (SSc) patients compared with healthy controls. The analysis validated and extended previous findings: two-thirds of SLE patients reacted to a large spectrum of self-molecules that overlapped with their reactivity to dsDNA; moreover, some SLE patients manifested a deficiency of natural IgM autoantibodies. Most significant was the finding that many SLE patients who were negative for autoantibodies to dsDNA manifested abnormal antibody responses to Epstein-Barr virus (EBV): these subjects either made IgG antibodies to EBV antigens to which healthy subjects did not respond, or failed to make antibodies to EBV antigens to which healthy subjects did respond. This observation suggests that SLE may be associated with a defective immune response to EBV. The SSc patients shared many of these serological abnormalities with SLE patients, but differed from them in increased IgG autoantibodies to topoisomerase and centromere B; 84% of SLE patients and 58% of SSc patients could be detected by their abnormal antibodies to EBV. Hence an aberrant immune response to a ubiquitous viral infection such as EBV might set the stage for an autoimmune disease.
Oocyte quality is a well-established determinant of embryonic fate. However, the molecular participants and biological markers that affect and may predict adequate embryonic development are largely elusive. Our aim was to identify the components of the oocyte molecular machinery that take part in the production of a healthy embryo. For this purpose, we used an animal model, generated by us previously, the oocytes of which do not express Cx43 (Cx43(del/del)). In these mice, oogenesis appears normal, fertilisation does occur, early embryonic development is successful but implantation fails. We used magnetic resonance imaging analysis combined with histological examination to characterise the embryonic developmental incompetence. Reciprocal embryo transfer confirmed that the blastocyst evolved from the Cx43(del/del) oocyte is responsible for the implantation disorder. In order to unveil the genes whose impaired expression brings about the development of defective embryos, we carried out a genomic screening of both the oocytes and the resulting blastocysts. This microarray analysis revealed a low expression of Egr1, Rpl21 and Eif4a1 in Cx43(del/del) oocytes and downregulation of Rpl15 and Eif4g2 in the resulting blastocysts. We propose that global deficiencies in genes related to the expression of ribosomal proteins and translation initiation factors in apparently normal oocytes bring about accumulation of defects, which significantly compromise their developmental capacity. The blastocysts resulting from such oocytes, which grow within a confined space until implantation, may be unable to generate enough biological mass to allow their expansion. This information could be applied to the diagnosis and treatment of infertility, particularly in IVF.
Accurate prognosis and prediction of response to therapy are essential for personalized treatment of cancer. Even though many prognostic gene lists and predictors have been proposed, especially for breast cancer, high-throughput "omic" methods have so far not revolutionized clinical practice, and their clinical utility has not been satisfactorily established. Different prognostic gene lists have very few shared genes, the biological meaning of most signatures is unclear, and the published success rates are considered to be overoptimistic. This review examines critically the manner in which prognostic classifiers are derived using machine-learning methods and suggests reasons for the shortcomings and problems listed above. Two approaches that may hold hope for obtaining improved prognosis are presented. Both are based on using existing prior knowledge; one proposes combining molecular "omic" predictors with established clinical ones, and the second infers biologically relevant pathway deregulation scores for each tumor from expression data, and uses this representation to study and stratify individual tumors. Approaches such as the second one are referred to in the physics literature as "phenomenology"; they will, hopefully, play a significant role in future studies of cancer. (C) 2014 AACR.
Pemphigus vulgaris (PV) is an autoimmune skin disease, which has been characterized by IgG autoantibodies to desmoglein 3. Here we studied the antibody signatures of PV patients compared with healthy subjects and with patients with two other autoimmune diseases with skin manifestations (systemic lupus erythematosus and scleroderma), using an antigen microarray and informatics analysis. We now report a previously unobserved phenomenon - patients with PV, compared with the healthy subjects and the two other diseases, show a significant decrease in IgG autoantibodies to a specific set of self-antigens. This novel finding demonstrates that an autoimmune disease may be associated with a loss of specific, healthy IgG autoantibodies and not only with a gain of specific, pathogenic IgG autoantibodies.
Signal transduction by receptor tyrosine kinases (RTKs) and nuclear receptors for steroid hormones is essential for body homeostasis, but the cross-talk between these receptor families is poorly understood. We observed that glucocorticoids inhibit signalling downstream of EGFR, an RTK. The underlying mechanism entails suppression of EGFR's positive feedback loops and simultaneous triggering of negative feedback loops that normally restrain EGFR. Our studies in mice reveal that the regulation of EGFR's feedback loops by glucocorticoids translates to circadian control of EGFR signalling: EGFR signals are suppressed by high glucocorticoids during the active phase (night-time in rodents), while EGFR signals are enhanced during the resting phase. Consistent with this pattern, treatment of animals bearing EGFR-driven tumours with a specific kinase inhibitor was more effective if administered during the resting phase of the day, when glucocorticoids are low. These findings support a circadian clock-based paradigm in cancer therapy.
Collective migration is an important cellular trait, which is intensely studied by both basic and translational researchers. Investigation of the underlying mechanisms necessitates high-throughput assays and computational algorithms capable of generating reproducible quantitative measurements of cell migration. We present a desktop tool that can be used easily by any researcher, to quantify both fluorescent and phase-contrast images produced in the course of commonly used gap closure ("scratch," "wound healing") collective migration assays. The software has a simple graphical interface that allows the user to tune the relevant parameters and process large numbers of images (or movies). The output contains segmented images and the numerical values inferred from them, allowing easy quantitative analysis of the results.
We propose a method, Temperature Integration, which allows an efficient calculation of free energy differences between two systems of interest, with the same degrees of freedom, which may have rough energy landscapes. The method is based on calculating, for each single system, the difference between the values of ln Z at two temperatures, using a Parallel Tempering procedure. If our two systems of interest have the same phase space volume, they have the same values of ln Z at high-T, and we can obtain the free energy difference between them, using the two single-system calculations described above. If the phase space volume of a system is known, our method can be used to calculate its absolute (versus relative) free energy as well. We apply our method and demonstrate its efficiency on a "toy model" of hard rods on a 1-dimensional ring. (C) 2013 Elsevier B.V. All rights reserved.
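The reasoning in this abstract can be made explicit with the standard thermodynamic identity relating ln Z to the mean energy. The block below is a sketch under stated assumptions, not the paper's exact estimator: it assumes a canonical ensemble with inverse temperature \beta = 1/(k_B T), writes \beta_0 for the high-temperature point at which the two systems A and B share the same ln Z, and takes the mean energies \langle E \rangle_\beta to be the quantities sampled at each temperature of the Parallel Tempering ladder.

```latex
\begin{align}
Z(\beta) &= \int e^{-\beta E(x)}\,dx, \qquad F(\beta) = -\frac{1}{\beta}\ln Z(\beta),\\
\frac{\partial \ln Z}{\partial \beta} &= -\langle E \rangle_\beta,\\
\ln Z(\beta) - \ln Z(\beta_0) &= -\int_{\beta_0}^{\beta} \langle E \rangle_{\beta'}\,d\beta',\\
F_A(\beta) - F_B(\beta) &= \frac{1}{\beta}\int_{\beta_0}^{\beta}
  \left( \langle E_A \rangle_{\beta'} - \langle E_B \rangle_{\beta'} \right) d\beta'
  \quad \text{if } \ln Z_A(\beta_0) = \ln Z_B(\beta_0).
\end{align}
```

The last line follows by applying the third line to each system separately and subtracting; the shared high-temperature term cancels, which is why equal phase space volume is required for the relative free energy, and why a known volume fixes the absolute value.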
Signal-induced transcript isoform variation (TIV) includes alternative promoter usage as well as alternative splicing and alternative polyadenylation of mRNA. To assess the phenotypic relevance of signal-induced TIV, we employed exon arrays and breast epithelial cells, which migrate in response to the epidermal growth factor (EGF). We show that EGF rapidly - within one hour - induces widespread TIV in a significant fraction of the transcriptome. Importantly, TIV characterizes many genes that display no differential expression upon stimulus. In addition, similar EGF-dependent changes are shared by a panel of mammary cell lines. A functional screen, which utilized isoform-specific siRNA oligonucleotides, indicated that several isoforms play essential, non-redundant roles in EGF-induced mammary cell migration. Taken together, our findings highlight the importance of TIV in the rapid evolvement of a phenotypic response to extracellular signals.
Contemporary microRNA research has led to significant advances in our understanding of the process of tumorigenesis. MicroRNAs participate in different events of a cancer cell's life, through their ability to target hundreds of putative transcripts involved in almost every cellular function, including cell cycle, apoptosis, and differentiation. The relevance of these small molecules is even more evident in light of the emerging linkage between their expression and both prognosis and clinical outcome of many types of human cancers. This identifies microRNAs as potential therapeutic modifiers of cancer phenotypes. From this perspective, we overview here the miR-10b locus and its involvement in cancer, focusing on its role in the establishment (miR-10b*) and spreading (miR-10b) of breast cancer. We conclude that targeting the locus of microRNA 10b holds great potential for cancer treatment.
TP53 mutation is associated with decreased survival rate in head and neck squamous cell carcinoma (HNSCC) patients. We set out to identify microRNAs (miRNAs) whose expression associates with TP53 mutation and survival in HNSCC. We analyzed TP53 status by direct sequencing of exons 2 through 11 of a prospective series of 121 HNSCC samples and assessed its association with outcome in 109 followed-up patients. We carried out miRNA expression profiling on 121 HNSCC samples and 66 normal counterparts. miRNA associations with TP53 mutations and outcome were evaluated. A TP53 mutation was present in 58% of the tumors and TP53 mutations were significantly associated with a shorter recurrence-free survival. This association was stronger in the clinical subgroup of patients subjected to adjuvant therapy after surgery. The expression of 49 miRNAs was significantly associated with TP53 status. Among these 49, we identified a group of 12 miRNAs whose expression correlates with recurrence-free survival and a group of 4 miRNAs whose expression correlates with cancer-specific survival. The two groups share three miRNAs. Importantly, miRNAs that correlate with survival are independent prognostic factors either when considered individually or as signatures. miRNA expression is associated with TP53 status and with reduced survival after surgical treatment of squamous cell carcinoma of the head and neck.
The transmembrane neural cell adhesion receptor L1 is a Wnt/beta-catenin target gene expressed in many tumor types. In human colorectal cancer, L1 localizes preferentially to the invasive front of tumors and when overexpressed in colorectal cancer cells, it facilitates their metastasis to the liver. In this study, we investigated genes that are regulated in human colorectal cancer and by the L1-NF-kappa B pathway that has been implicated in liver metastasis. c-Kit was the most highly suppressed gene in both colorectal cancer tissue and the L1-NF-kB pathway. c-Kit suppression that resulted from L1-mediated signaling relied upon NF-kappa B, which directly inhibited the transcription of SP1, a major activator of the c-Kit gene promoter. Reconstituting c-Kit expression in L1-transfected cells blocked the biological effects conferred by L1 overexpression in driving motility and liver metastasis. We found that c-Kit expression in colorectal cancer cells is associated with a more pronounced epithelial morphology, along with increased expression of E-cadherin and decreased expression of Slug. Although c-Kit overexpression inhibited the motility and metastasis of L1-expressing colorectal cancer cells, it enhanced colorectal cancer cell proliferation and tumorigenesis, arguing that separate pathways mediate tumorigenicity and metastasis by c-Kit. Our findings provide insights into how colorectal cancer metastasizes to the liver, the most common site of dissemination in this cancer. (C) 2013 AACR.
We introduce Pathifier, an algorithm that infers pathway deregulation scores for each tumor sample on the basis of expression data. This score is determined, in a context-specific manner, for every particular dataset and type of cancer that is being investigated. The algorithm transforms gene-level information into pathway-level information, generating a compact and biologically relevant representation of each sample. We demonstrate the algorithm's performance on three colorectal cancer datasets and two glioblastoma multiforme datasets and show that our multipathway-based representation is reproducible, preserves much of the original information, and allows inference of complex biologically significant information. We discovered several pathways that were significantly associated with survival of glioblastoma patients and two whose scores are predictive of survival in colorectal cancer: CXCR3-mediated signaling and oxidative phosphorylation. We also identified a subclass of proneural and neural glioblastoma with significantly better survival, and an EGF receptor-deregulated subclass of colon cancers.
Motivation: Real-time quantitative polymerase chain reaction (qPCR) is an important tool in quantitative studies of DNA and RNA molecules, especially in transcriptome studies, where different primer combinations allow identification of specific transcripts such as splice variants or precursor messenger RNA. Several software packages that implement various rules for optimal primer design are available. Nevertheless, as designing qPCR primers needs to be done manually, the repeated task is tedious, time-consuming and prone to errors. Results: We used a set of rules to automatically design all possible exon-exon and intron-exon junctions in the human and mouse transcriptomes. The resulting database is included as a track in the UCSC genome browser, making it widely accessible and easy to use.
The signaling pathways that commit cells to migration are incompletely understood. We employed human mammary cells and two stimuli: epidermal growth factor (EGF), which induced cellular migration, and serum factors, which stimulated cell growth. In addition to strong activation of ERK by EGF, and AKT by serum, early transcription remarkably differed: while EGF induced early growth response-1 (EGR1), and this was required for migration, serum induced c-Fos and FosB to enhance proliferation. We demonstrate that induction of EGR1 involves ERK-mediated down-regulation of microRNA-191 and phosphorylation of the ETS2 repressor factor (ERF) repressor, which subsequently leaves the nucleus. Unexpectedly, knockdown of ERF inhibited migration, which implies migratory roles for exported ERF molecules. On the other hand, chromatin immunoprecipitation identified a subset of direct EGR1 targets, including EGR1 autostimulation and SERPINB2, whose transcription is essential for EGF-induced cell migration. In summary, EGR1 and the EGF-ERK-ERF axis emerge from our study as major drivers of growth factor-induced mammary cell migration. Tarcic, G., Avraham, R., Pines, G., Amit, I., Shay, T., Lu, Y., Zwang, Y., Katz, M., Ben-Chetrit, N., Jacob-Hirsch, J., Virgilio, L., Rechavi, G., Mavrothalassitis, G., Mills, G. B., Domany, E., Yarden, Y. EGR1 and the ERK-ERF axis drive mammary cell migration in response to EGF. FASEB J. 26, 1582-1592 (2012). www.fasebj.org
MicroRNAs (miRs) function primarily as post-transcriptional negative regulators of gene expression through binding to their mRNA targets. Reliable prediction of a miR's targets is a considerable bioinformatic challenge of great importance for inferring the miR's function. Sequence-based prediction algorithms have high false-positive rates, are not in agreement, and are not biological context specific. Here we introduce CoSMic (Context-Specific MicroRNA analysis), an algorithm that combines sequence-based prediction with miR and mRNA expression data. CoSMic differs from existing methods: it identifies miRs that play active roles in the specific biological system of interest and predicts their functional targets with fewer false positives. We applied CoSMic to search for miRs that regulate the migratory response of human mammary cells to epidermal growth factor (EGF) stimulation. Several such miRs, whose putative targets were significantly enriched for migration processes, were identified. We tested three of these miRs experimentally, and showed that they indeed affected the migratory phenotype; we also tested three negative controls. In comparison to other algorithms, CoSMic indeed filters out false positives and allows improved identification of context-specific targets. CoSMic can greatly facilitate miR research in general and, in particular, advance our understanding of individual miRs' function in a specific context.
Embryonic stem cells (ESCs) maintain high genomic plasticity, which is essential for their capacity to enter diverse differentiation pathways. Posttranslational modifications of chromatin histones play a pivotal role in maintaining this plasticity. We now report that one such modification, monoubiquitylation of histone H2B on lysine 120 (H2Bub1), catalyzed by the E3 ligase RNF20, increases during ESC differentiation and is required for efficient execution of this process. This increase is particularly important for the transcriptional induction of relatively long genes during ESC differentiation. Furthermore, we identify the deubiquitinase USP44 as a negative regulator of H2B ubiquitylation, whose downregulation during ESC differentiation contributes to the increase in H2Bub1. Our findings suggest that optimal ESC differentiation requires dynamic changes in H2B ubiquitylation patterns, which must occur in a timely and well-coordinated manner.
Deregulated proliferation is a hallmark of cancer cells. Here, we show that microRNA-10b* is a master regulator of breast cancer cell proliferation and is downregulated in tumoural samples versus matched peritumoural counterparts. Two canonical CpG islands (5 kb) upstream from the precursor sequence are hypermethylated in the analysed breast cancer tissues. Ectopic delivery of synthetic microRNA-10b* in breast cancer cell lines or into xenograft mouse breast tumours inhibits cell proliferation and impairs tumour growth in vivo, respectively. We identified and validated in vitro and in vivo three novel target mRNAs of miR-10b* (BUB1, PLK1 and CCNA2), which play a remarkable role in cell cycle regulation and whose high expression in breast cancer patients is associated with reduced disease-free survival, relapse-free survival and metastasis-free survival when compared to patients with low expression. This also suggests that restoration of microRNA-10b* expression might have therapeutic promise.
A large fraction of ductal carcinoma in situ (DCIS), a non-invasive precursor lesion of invasive breast cancer, overexpresses the HER2/neu oncogene. The ducts of DCIS are abnormally filled with cells that evade apoptosis, but the underlying mechanisms remain incompletely understood. We overexpressed HER2 in mammary epithelial cells and observed growth factor-independent proliferation. When grown in extracellular matrix as three-dimensional spheroids, control cells developed a hollow lumen, but HER2-overexpressing cells populated the lumen by evading apoptosis. We demonstrate that HER2 overexpression in this cellular model of DCIS drives transcriptional upregulation of multiple components of the Notch survival pathway. Importantly, luminal filling required upregulation of a signaling pathway comprising Notch3, its cleaved intracellular domain and the transcriptional regulator HES1, resulting in elevated levels of c-MYC and cyclin D1. In line with HER2-Notch3 collaboration, drugs intercepting either arm reverted the DCIS-like phenotype. In addition, we report upregulation of Notch3 in hyperplastic lesions of HER2 transgenic animals, as well as an association between HER2 levels and expression levels of components of the Notch pathway in tumor specimens of breast cancer patients. Therefore, it is conceivable that the integration of the Notch and HER2 signaling pathways contributes to the pathophysiology of DCIS. Oncogene (2012) 31, 907-917; doi:10.1038/onc.2011.279; published online 11 July 2011
The HER2/neu oncogene encodes a receptor-like tyrosine kinase whose overexpression in breast cancer predicts poor prognosis and resistance to conventional therapies. However, the mechanisms underlying aggressiveness of HER2 (human epidermal growth factor receptor 2)-overexpressing tumors remain incompletely understood. Because it assists epidermal growth factor (EGF) and neuregulin receptors, we overexpressed HER2 in MCF10A mammary cells and applied growth factors. HER2-overexpressing cells grown in extracellular matrix formed filled spheroids, which protruded outgrowths upon growth factor stimulation. Our transcriptome analyses imply a two-hit model for invasive growth: HER2-induced proliferation and evasion from anoikis generate filled structures, which are morphologically and transcriptionally analogous to preinvasive patients' lesions. In the second hit, EGF escalates signaling and transcriptional responses leading to invasive growth. Consistent with clinical relevance, a gene expression signature based on the HER2/EGF-activated transcriptional program can predict poorer prognosis of a subgroup of HER2-overexpressing patients. In conclusion, the integration of a three-dimensional cellular model and clinical data attributes progression of HER2-overexpressing lesions to EGF-like growth factors acting in the context of the tumor's microenvironment. Oncogene (2012) 31, 3569-3583; doi:10.1038/onc.2011.547; published online 5 December 2011
Duplication of chromosomal arm 20q occurs in prostate, cervical, colon, gastric, bladder, melanoma, pancreas and breast cancer, suggesting that 20q amplification may play a causal role in tumorigenesis. According to an alternative view, chromosomal imbalance is mainly a common side effect of cancer progression. To test whether a specific genomic aberration might serve as a cancer initiating event, we established an in vitro system that models the evolutionary process of early stages of prostate tumor formation; normal prostate cells were immortalized by the over-expression of human telomerase catalytic subunit hTERT, and cultured for 650 days until several transformation hallmarks were observed. Gene expression patterns were measured and chromosomal aberrations were monitored by spectral karyotype analysis at different times. Several chromosomal aberrations, in particular duplication of chromosomal arm 20q, occurred early in the process and were fixed in the cell populations, while other aberrations became extinct shortly after their appearance. A wide range of bioinformatic tools, applied to our data and to data from several cancer databases, revealed that spontaneous 20q amplification can promote cancer initiation. Our computational model suggests that 20q amplification induced deregulation of several specific cancer-related pathways including the MAPK pathway, the p53 pathway and Polycomb group factors. In addition, activation of Myc, AML, β-catenin and the ETS family transcription factors was identified as an important step in cancer development driven by 20q amplification. Finally, we identified 13 "cancer initiating genes", located on 20q13, which were significantly over-expressed in many tumors, with expression levels correlated with tumor grade and outcome, suggesting that these genes induce the malignant process upon 20q amplification.
The steep rise in availability and usage of high-throughput technologies in biology brought with it a clear need for methods to control the False Discovery Rate (FDR) in multiple tests. Benjamini and Hochberg (BH) introduced in 1995 a simple procedure and proved that it provided a bound on the expected value, FDR
Background: The Cox-2 inhibitor, celecoxib (Pfizer Inc., N.Y., USA), is a promising chemopreventive agent [Arber et al.: N Engl J Med 2006;355:885-895; Bertagnolli et al.: N Engl J Med 2006;355:873-884]. This study aims to explore its mechanism by defining changes in gene expression between neoplastic and normal tissue samples before and after treatment. Methods: Patients with documented colorectal neoplasia in screening colonoscopy, destined to undergo surgical colectomy, were randomized for treatment with celecoxib (n = 11; 400 mg/day) or placebo (n = 3) for 30 days. Tissue samples were taken from the tumor and from normal adjacent mucosa during both colonoscopy and surgery. RNA was extracted and analyzed using Affymetrix GeneChip(R). Results: 687 genes differentiated tumor samples before and after treatment, among which 310 genes did not show the same differential expression in the placebo group or normal samples. These genes were significantly related to pathways of cell cycle regulation and inflammation, and of note was the TGF-beta pathway, which held a strong association with the list of genes formerly found to be associated with the colorectal cancer expression profile in microarray analyses, as summarized in a meta-analysis by Cardoso et al. [Biochim Biophys Acta 2007;1775:103-137]. Conclusions: Celecoxib selectively affects genes and pathways involved in inflammation and malignant transformation in tumor but not normal tissues; this may assist in the development of safer and more effective chemopreventive agents. Copyright (C) 2011 S. Karger AG, Basel
Background: Alzheimer's disease and Schizophrenia are two common diseases of the brain with significant differences in neuropathology, etiology and symptoms. This dissimilarity in the two diseases makes a comparison of the two ideal for detecting molecular substrates that are common to brain disorders in general. Methods: In this study, we compared gene expression profiles across multiple brain areas, taken postmortem from patients with well-characterized Alzheimer's disease and Schizophrenia, and from a cognitively normal control group with no neuro- or psychopathology. Results: Although the totality of gene expression changes in the two diseases is dissimilar, a subset of genes appears to play a role in both diseases in specific brain regions. We find at Brodmann area 22, the superior temporal gyrus, a statistically significant number of genes with apparently dysregulated expression in both diseases. Furthermore, we found genes that differentiate the two diseases from the control across multiple brain regions, and note that these genes were usually down-regulated. Brodmann area 8, part of the superior frontal cortex, is relatively abundant with them. Conclusion: We show overwhelming statistical evidence for Alzheimer's and Schizophrenia sharing a specific molecular background at the superior temporal gyrus. We suggest that impairment of the regulation of the autophagy pathway is shared, in BA 22, by the two diseases.
Normal cells require continuous exposure to growth factors in order to cross a restriction point and commit to cell-cycle progression. This can be replaced by two short, appropriately spaced pulses of growth factors, where the first pulse primes a process, which is completed by the second pulse, and enables restriction point crossing. Through integration of comprehensive proteomic and transcriptomic analyses of each pulse, we identified three processes that regulate restriction point crossing: (1) The first pulse induces essential metabolic enzymes and activates p53-dependent restraining processes. (2) The second pulse eliminates, via the PI3K/AKT pathway, the suppressive action of p53, as well as (3) sets an ERK-EGR1 threshold mechanism, which digitizes graded external signals into an all-or-none decision obligatory for S phase entry. Together, our findings uncover two gating mechanisms, which ensure that cells ignore fortuitous growth factors and undergo proliferation only in response to consistent mitogenic signals.
A mutation within one allele of the p53 tumor suppressor gene can inactivate the remaining wild-type allele in a dominant-negative manner and in some cases can exert an additional oncogenic activity, known as mutant p53 'gain of function' (GOF). To study the role of p53 mutations in prostate cancer and to discriminate between the dominant-negative effect and the GOF activity of mutant p53, we measured, using microarrays, the expression profiles of three immortalized prostate epithelial cultures expressing wild-type, inactivated p53 or mutated p53. Analysis of these gene expression profiles showed that both inactivated p53 and p53(R175H) mutant expression resulted in the upregulation of cell cycle progression genes. A second group, which was upregulated exclusively by mutant p53(R175H), was predominantly enriched in developmental genes. This group of genes included Twist1, a regulator of metastasis and epithelial-mesenchymal transition (EMT). Twist1 levels were also elevated in the metastatic prostate cancer-derived cell line DU145, in immortalized lung fibroblasts and in a subset of lung cancer samples, all in a mutant p53-dependent manner. Immortalized epithelial cells bearing the p53(R175H) mutant showed typical features of EMT, such as higher expression of mesenchymal markers, lower expression of epithelial markers and enhanced invasive properties in vitro. The mechanism by which the p53(R175H) mutant induces Twist1 expression involves alleviation of epigenetic repression. Our data suggest that Twist1 expression might be upregulated following p53 mutation in cancer cells. Cell Death and Differentiation (2011) 18, 271-281; doi:10.1038/cdd.2010.94; published online 6 August 2010
Up to one in four lung-transplanted patients develop pulmonary infiltrates and impaired oxygenation within the first days after lung transplantation. Known as primary graft dysfunction (PGD), this condition increases mortality significantly. Complex interactions between donor lung and recipient immune system are the suspected cause. We took an integrative, systems-level approach by first exploring whether the recipient's immune response to PGD includes the development of long-lasting autoreactivity. We next explored whether proteins displaying such differential autoreactivity also display differential gene expression in donor lungs that later develop PGD compared with those that did not. We evaluated 39 patients from whom autoantibody profiles were already available for PGD based on chest radiographs and oxygenation data. An additional nine patients were evaluated for PGD based on their medical records and set aside for validation. From two recent donor lung gene expression studies, we reanalysed and paired gene profiles with autoantibody profiles. Primary graft dysfunction can be distinguished by a profile of differentially reactive autoantibodies binding to 17 proteins. Functional analysis showed that 12 of these proteins are part of a protein-protein interaction network (P = 3 × 10^-6) involved in proliferative processes. A nearest centroid classifier assigned correct PGD grades to eight out of the nine patients in the validation cohort (P = 0.048). We observed significant positive correlation (r = 0.63, P = 0.011) between differences in IgM reactivity and differences in gene expression levels. This connection between donor lung gene expression and long-lasting recipient IgM autoantibodies towards a specific set of proteins suggests a mechanism for the development of autoimmunity in PGD.
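The nearest centroid classification mentioned in the abstract above can be illustrated with a minimal sketch. The function names, the Euclidean distance, the labels and the toy reactivity values are all illustrative assumptions; the abstract does not state the distance measure or preprocessing that was used.

```python
from math import dist
from statistics import fmean


def fit_centroids(profiles, labels):
    """Return {label: centroid}; each centroid is the per-feature mean of its class."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    return {
        label: [fmean(column) for column in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, profile):
    """Assign the label whose centroid is nearest (Euclidean distance, an assumption)."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))


# Hypothetical two-feature reactivity profiles, for illustration only.
train_profiles = [[0.1, 0.2], [0.0, 0.4], [1.0, 1.1], [1.2, 0.9]]
train_labels = ["no_PGD", "no_PGD", "PGD", "PGD"]

centroids = fit_centroids(train_profiles, train_labels)
print(predict(centroids, [0.9, 1.0]))
print(predict(centroids, [0.2, 0.1]))
```

A held-out cohort would be scored by calling `predict` on each validation profile and comparing the result with the clinically assigned grade.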
Hypoxia-inducible factor 1 (HIF-1), the major transcription factor specifically activated during hypoxia, regulates genes involved in critical aspects of cancer biology, including angiogenesis, cell proliferation, glycolysis and invasion. The HIF-1α subunit is stabilized by low oxygen, genetic alteration and cobaltous ions, and its over-expression correlates with drug resistance and increased cancer mortality in various cancer types, therefore representing an important anticancer target. Zinc supplementation has been shown to counteract the hypoxic phenotype in cancer cells, in vitro and in vivo; hence, understanding the molecular pathways modulated by zinc under hypoxia may provide the basis for reprogramming signalling pathways for anticancer therapy. Here we performed genome-wide analyses of colon cancer cells treated with combinations of cobalt, zinc and an anticancer drug and evaluated the effect of zinc on gene expression patterns. Using Principal Component Analysis we found that zinc markedly reverted the cobalt-induced changes of gene expression, with reactivation of the drug-induced transcription of pro-apoptotic genes. We conclude that the hypoxia pathway is a potential therapeutic target addressed by zinc that also influences the tumor cell response to the anticancer drug.
The fact that there is very little if any overlap between the genes of different prognostic signatures for early-discovery breast cancer is well documented. The reasons for this apparent discrepancy have been explained by the limits of simple machine-learning identification and ranking techniques, and the biological relevance and meaning of the prognostic gene lists were questioned. Subsequently, proponents of the prognostic gene lists claimed that different lists do capture similar underlying biological processes and pathways. The present study places under scrutiny the validity of this claim, for two important gene lists that are at the focus of current large-scale validation efforts. We performed careful enrichment analysis, controlling the effects of multiple testing in a manner which takes into account the nested dependent structure of gene ontologies. In contradiction to several previous publications, we find that the only biological process or pathway for which statistically significant concordance can be claimed is cell proliferation, a process whose relevance and prognostic value was well known long before gene expression profiling. The claims reported by others, of wider concordance between the biological processes captured by the two prognostic signatures studied, were found either to lack statistical rigor or to be based on addressing some other question.
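As a rough illustration of what a single enrichment test involves, the sketch below computes a one-sided hypergeometric p-value for the overlap between a gene list and one annotation term. It is only the generic per-term test, with a hypothetical function name and made-up counts; it does not reproduce the study's multiple-testing control over the nested ontology structure, which the abstract does not spell out.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_list, n_overlap):
    """P(X >= n_overlap) when n_list genes are drawn without replacement
    from n_universe genes, of which n_annotated carry the term."""
    total = comb(n_universe, n_list)
    upper = min(n_annotated, n_list)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes, 400 annotated to a term,
# a 70-gene signature, 9 signature genes carrying the term.
p_value = hypergeom_enrichment_p(10_000, 400, 70, 9)
print(p_value)
```

Because many terms are tested and ontology terms are nested, such raw p-values would still need a correction that respects that dependence before any concordance claim is made.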
Aberrant activation of kinases has emerged to be a key event along with tumor progression, maintenance of tumor phenotype and response to anticancer treatments. This study documents the existence of an oncogenic autoregulatory feedback loop that includes the polo-like kinase-2 (Snk/Plk2) and mutant p53 proteins. Plk2 protein binds to and phosphorylates mutant p53, thereby potentiating its oncogenic activities. Phosphorylated mutant p53 binds more efficiently to p300, consequently strengthening its own transcriptional activity. Plk2 gene is regulated at a transcriptional level by both wt- and mutant p53 proteins. This leads to growth suppression or enhanced cell proliferation and chemo-resistance, respectively. In turn, the siRNA-mediated knockdown of either mutant p53 or Plk2 proteins significantly curtails the growth properties of tumor cells and their chemo-resistance to anticancer treatments. Therefore, this paper identifies a novel tumor network including Plk2 and mutant p53 proteins whose triggering in response to DNA damage might disclose important implications for the treatment of human cancers.
Gene expression-based prediction of genomic copy number aberrations in the chromosomal region 12q13 to 12q15 that is flanked by MDM2 and CDK4 identified Wnt inhibitory factor 1 (WIF1) as a candidate tumor suppressor gene in glioblastoma. WIF1 encodes a secreted Wnt antagonist and was strongly downregulated in most glioblastomas as compared with normal brain, implying deregulation of Wnt signaling, which is associated with cancer. WIF1 silencing was mediated by deletion (7/69, 10%) or epigenetic silencing by promoter hypermethylation (29/110, 26%). Co-amplification of MDM2 and CDK4 that is present in 10% of glioblastomas was associated in most cases with deletion of the whole genomic region enclosed, including the WIF1 locus. This interesting pathogenetic constellation targets the RB and p53 tumor suppressor pathways in tandem, while simultaneously activating oncogenic Wnt signaling. Ectopic expression of WIF1 in glioblastoma cell lines revealed a dose-dependent decrease of Wnt pathway activity. Furthermore, WIF1 expression inhibited cell proliferation in vitro, reduced anchorage-independent growth in soft agar, and completely abolished tumorigenicity in vivo. Interestingly, WIF1 overexpression in glioblastoma cells induced a senescence-like phenotype that was dose dependent. These results provide evidence that WIF1 has tumor suppressing properties. Downregulation of WIF1 in 75% of glioblastomas indicates frequent involvement of aberrant Wnt signaling and, hence, may render glioblastomas sensitive to inhibitors of Wnt signaling, potentially by diverting the tumor cells into a senescence-like state.
Transcriptional responses to extracellular stimuli involve tuning the rates of transcript production and degradation. Here, we show that the time-dependent profiles of these rates can be inferred from simultaneous measurements of precursor mRNA (pre-mRNA) and mature mRNA profiles. Transcriptome-wide measurements demonstrate that genes with similar mRNA profiles often exhibit marked differences in the amplitude and onset of their production rate. The latter is characterized by a large dynamic range, with a group of genes exhibiting an unexpectedly strong transient production overshoot, thereby accelerating their induction and, when combined with time-dependent degradation, shaping transient responses with precise timing and amplitude. Molecular Systems Biology 7: 529; published online 13 September 2011; doi:10.1038/msb.2011.62
Patients with systemic lupus erythematosus (SLE) produce antibodies to many different self-antigens. Here, we investigated antibodies in SLE sera using an antigen microarray containing many hundreds of antigens, mostly self-antigens. The aim was to detect sets of antibody reactivities characteristic of SLE patients in each of various clinical states - SLE patients with acute lupus nephritis, SLE patients in renal remission, and SLE patients who had never had renal involvement. The analysis produced two novel findings: (i) an SLE antibody profile persists independently of disease activity and despite long-term clinical remission, and (ii) this SLE antibody profile includes increases in four specific immunoglobulin G (IgG) reactivities to double-stranded DNA (dsDNA), single-stranded DNA (ssDNA), Epstein-Barr virus (EBV) and hyaluronic acid; the profile also includes decreases in specific IgM reactivities to myeloperoxidase (MPO), CD99, collagen III, insulin-like growth factor binding protein 1 (IGFBP1) and cardiolipin. The reactivities together showed high sensitivity (> 93%) and high specificity for SLE (> 88%). A healthy control subject who had the SLE antibody profile was later found to develop clinical SLE. The present study did not detect antibody reactivities that differentiated among the various subgroups of SLE subjects with statistical significance. Thus, SLE is characterized by an enduring antibody profile irrespective of clinical state. The association of SLE with decreased IgM natural autoantibodies suggests that these autoantibodies might enhance resistance to SLE.
Obliterative bronchiolitis (OB) continues to be the major limitation to long-term survival after lung transplantation. The specific aetiology and pathogenesis of OB are not well understood. To explore the role of autoreactivity in OB, we spotted 751 different self molecules onto glass slides, and used these antigen microarrays to profile 48 human serum samples for immunoglobulin G (IgG) and IgM autoantibodies; 27 patients showed no or mild bronchiolitis obliterans syndrome (BOS; a clinical correlate of OB) and 15 patients showed medium to severe BOS. We now report that these BOS grades could be differentiated by a profile of autoantibodies binding to 28 proteins or their peptides. The informative autoantibody profile included down-regulation as well as up-regulation of both IgM and IgG specific reactivities. This profile was evaluated for robustness using a panel of six independent test patients. Analysis of the functions of the 28 informative self antigens showed that eight of them are connected in an interaction network involved in apoptosis and protein metabolism. Thus, a profile of autoantibodies may reflect pathological processes in the lung allograft, suggesting a role for autoimmunity in chronic rejection leading to OB.
The anterior heart field (AHF) encompasses a niche in which mesoderm-derived cardiac progenitors maintain their multipotent and undifferentiated nature in response to signals from surrounding tissues. Here, we investigate the signaling mechanism that promotes the shift from proliferating cardiac progenitors to differentiating cardiomyocytes in chick embryos. Genomic and systems biology approaches, as well as perturbations of signaling molecules, in vitro and in vivo, reveal tight crosstalk between the bone morphogenetic protein (BMP) and fibroblast growth factor (FGF) signaling pathways within the AHF niche: BMP4 promotes myofibrillar gene expression and cardiomyocyte contraction by blocking FGF signaling. Furthermore, inhibition of the FGF-ERK pathway is both sufficient and necessary for these processes, suggesting that FGF signaling blocks premature differentiation of cardiac progenitors in the AHF. We further revealed that BMP4 induced a set of neural crest-related genes, including MSX1. Overexpression of Msx1 was sufficient to repress FGF gene expression and cell proliferation, thereby promoting cardiomyocyte differentiation. Finally, we show that BMP-induced cardiomyocyte differentiation is diminished following cranial neural crest ablation, underscoring the key roles of these cells in the regulation of AHF cell differentiation. Hence, BMP and FGF signaling pathways act via inter- and intra-regulatory loops in multiple tissues, to coordinate the balance between proliferation and differentiation of cardiac progenitors.
The recent decade has witnessed a surge of physicists to biology. Some of the activities of the participating groups focus on bona fide physics questions, posed on biological systems (such as the physics of molecular motors, for example). Another kind of research in which physicists take part alongside computer scientists and applied mathematicians, deals with questions that are of direct interest to biologists: they come under the umbrella of computational and systems biology. The topic of these lectures lies at the most biological end of this spectrum, addressing problems of clinical relevance which were posed and initiated by biologists. The objective of these lectures is to help the curious physicist to learn and to understand more about this emerging, highly interdisciplinary field of research, by providing brief introductions to molecular biology and cancer research. This is followed by a cursory review of some recent research done by the "Domany group" and its collaborations with biological and clinical labs. Furthermore, we mention (mainly in footnotes) a small subset of studies in which physicists have contributed to this field during the past years. A more detailed review of recent contributions by physicists is beyond the scope of this introductory text. The introductory nature of these lecture notes naturally induces a strong bias regarding publications cited: consequently, these lecture notes do not provide a fair, historically correct and updated review of relevant literature. (C) 2010 Elsevier B.V. All rights reserved.
One proposed strategy for bone regeneration involves ex vivo tissue engineering, accomplished using bone-forming cells, biodegradable scaffolds, and dynamic culture systems, with the goal of three-dimensional tissue formation. Rotating wall vessel bioreactors generate simulated microgravity conditions ex vivo, which lead to cell aggregation. Human mesenchymal stem cells (hMSCs) have been extensively investigated and shown to possess the potential to differentiate into several cell lineages. The goal of the present study was to evaluate the effect of simulated microgravity on all genes expressed in hMSCs, with the underlying hypothesis that many important pathways are affected during culture within a rotating wall vessel system. Gene expression was analyzed using a whole genome microarray and clustering with the aid of the National Institutes of Health's Database for Annotation, Visualization and Integrated Discovery database and gene ontology analysis. Our analysis showed 882 genes that were downregulated and 505 genes that were upregulated after exposure to simulated microgravity. Gene ontology clustering revealed a wide variety of affected genes with respect to cell compartment, biological process, and signaling pathway clusters. The data sets showed significant decreases in osteogenic and chondrogenic gene expression and an increase in adipogenic gene expression, indicating that ex vivo adipose tissue engineering may benefit from simulated microgravity. This finding was supported by an adipogenic differentiation assay. These data are essential for further understanding of ex vivo tissue engineering using hMSCs.
Background: In many microarray experiments, analysis is severely hindered by a major difficulty: the small number of samples for which expression data has been measured. When one searches for differentially expressed genes, the small number of samples gives rise to an inaccurate estimation of the experimental noise. This, in turn, leads to loss of statistical power. Results: We show that the measurement noise of genes with similar expression levels (intensity) is identically and independently distributed, and that this (intensity dependent) distribution is approximately normal. Our method can be easily adapted and used to test whether these statements hold for data from any particular microarray experiment. We propose a method that provides an accurate estimation of the intensity-dependent variance of the noise distribution, and demonstrate that using this estimation we can detect differential expression with much better statistical power than that of the standard t test, and can compare the noise levels of different experiments and platforms. Conclusions: When the number of samples is small, the simple method we propose significantly improves the statistical power in identifying differentially expressed genes.
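One simple way to picture an intensity-dependent noise estimate is sketched below. It is an assumption-laden illustration rather than the authors' procedure: genes are grouped into equal-count bins by mean log-intensity, the noise variance of each bin is estimated from replicate differences, and a difference between two conditions is then scaled by the variance of the nearest bin. All function names and numbers are hypothetical.

```python
from statistics import fmean


def binned_noise_variance(replicate_pairs, bin_size):
    """Estimate noise variance as a function of intensity.

    replicate_pairs: (x1, x2) log-intensities of the same gene in two replicates.
    Returns a list of (mean_intensity, variance), ordered by intensity.
    Genes left over after the last full bin are ignored in this sketch.
    """
    ordered = sorted(replicate_pairs, key=lambda pair: (pair[0] + pair[1]) / 2)
    bins = []
    for start in range(0, len(ordered) - bin_size + 1, bin_size):
        chunk = ordered[start:start + bin_size]
        intensity = fmean((a + b) / 2 for a, b in chunk)
        # For independent, identically distributed noise, Var(x1 - x2) = 2 * sigma^2.
        sigma2 = fmean((a - b) ** 2 for a, b in chunk) / 2
        bins.append((intensity, sigma2))
    return bins


def variance_at(bins, intensity):
    """Variance of the bin whose mean intensity is closest to `intensity`."""
    return min(bins, key=lambda item: abs(item[0] - intensity))[1]


def z_score(x_a, x_b, bins):
    """Difference between two single measurements in units of the local noise."""
    sigma2 = variance_at(bins, (x_a + x_b) / 2)
    return (x_a - x_b) / (2 * sigma2) ** 0.5


# Hypothetical replicate data: noisy at low intensity, tight at high intensity.
pairs = [(1.0, 1.2), (1.2, 1.0), (5.0, 5.02), (5.02, 5.0)]
bins = binned_noise_variance(pairs, bin_size=2)
print(z_score(1.5, 1.0, bins), z_score(5.5, 5.0, bins))
```

The same 0.5 log-unit difference scores far higher at high intensity, where the estimated noise is small, which is the intuition behind pooling noise information across genes of similar intensity when samples are few.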
The genetic profiling of B-cell malignancies is rapidly expanding, providing important information on the tumorigenic potential, response to treatment, and clinical outcome of these diseases. However, the relative contributions of inherent gene expression versus microenvironmental effects are poorly understood. The regulation of gene expression programs by means of adhesive interactions was studied here in ARH-77 human malignant B-cell variants, derived from the same cell line by selective adhesion to a fibronectin matrix. The populations included cells that adhere to fibronectin and are highly tumorigenic (designated "type A" cells) and cells that fail to adhere to fibronectin and fail to develop tumors in vivo ("type F" cells). To identify genes directly affected by cell adhesion to fibronectin, type A cells deprived of an adhesive substrate (designated "AF cells") were also examined. Bioinformatic analyses revealed a remarkable correlation between cell adhesion and both B-cell differentiation state and the expression of multiple myeloma (MM)-associated genes. The highly adherent type A cells expressed higher levels of NF-κB-regulated genes, many of them associated with MM. Moreover, we found that the transcription of several MM-related proto-oncogenes is stimulated by adhesion to fibronectin. In contrast, type F cells, which display poor adhesive and tumorigenic properties, expressed genes associated with higher levels of B-cell differentiation. Our findings indicate that B-cell differentiation, as manifested by gene expression profiles, is attenuated by cell adhesion to fibronectin, leading to upregulation of specific genes known to be associated with the pathogenesis of MM. Mol Cancer Res; 8(4); 482-93. (C) 2010 AACR.
Highly regenerative tissues such as blood must possess effective DNA damage responses (DDR) that balance long-term regeneration with protection from leukemogenesis. Hematopoietic stem cells (HSCs) sustain life-long blood production, yet their response to DNA damage remains largely unexplored. We report that human HSCs exhibit delayed DNA double-strand break rejoining, persistent gamma H2AX foci, and enhanced p53- and ASPP1-dependent apoptosis after gamma-radiation compared to progenitors. p53 inactivation or Bcl-2 overexpression reduced radiation-induced apoptosis and preserved in vivo repopulating HSC function. Despite similar protection from irradiation-induced apoptosis, only Bcl-2-overexpressing HSCs showed higher self-renewal capacity, establishing that intact p53 positively regulates self-renewal independently from apoptosis. The reduced self-renewal of HSCs with inactivated p53 was associated with increased spontaneous gamma H2AX foci in secondary transplants of HSCs. Our data reveal distinct physiological roles of p53 that together ensure optimal HSC function: apoptosis regulation and prevention of gamma H2AX foci accumulation upon HSC self-renewal.
Prostate cancer (PC) is a heterogeneous disease whose aggressive phenotype is the second leading cause of cancer-related death in men. The identification of key molecules and pathways that play a pivotal role in PC progression towards an aggressive form is crucial. A major effort towards this end has been made through global analyses of gene expression profiles. However, the large body of data did not provide a definitive idea about the genes which are associated with the aggressive growth of PC. In order to identify such genes, we performed an interspecies comparison between several human data sets and high quality microarray data that we generated from the transgenic adenocarcinoma of mouse prostate (TRAMP) strain. The TRAMP PC mimics the histological and pathological appearance as well as the aggressive phenotype of human PC (huPC). Analysis of the microarray data, derived from microdissected TRAMP specimens removed at different stages of the disease, yielded genetic signatures delineating the TRAMP PC development and progression. Comparison of the TRAMP data with a set of genes representing the core expression signature of huPC yielded a limited set of genes. Some of these genes are known predictors of poor prognosis in huPC. Interestingly, the modulation of genes responsible for the invasive phenotype of huPC occurs in TRAMP already during the transition to prostate intraepithelial neoplasia (PIN) and onwards to localized tumors. We therefore suggest that critical oncogenic events leading to an aggressive phenotype of huPC can be studied in the PIN stage of TRAMP. Prostate 69:1034-1044, 2009. (C) 2009 Wiley-Liss, Inc.
ID4 (inhibitor of DNA binding 4) is a member of a family of proteins that function as dominant-negative regulators of basic helix-loop-helix transcription factors. Growing evidence links ID proteins to cell proliferation, differentiation and tumorigenesis. Here we identify ID4 as a transcriptional target of gain-of-function p53 mutants R175H, R273H and R280K. Depletion of mutant p53 protein severely impairs ID4 expression in proliferating tumor cells. The protein complex mutant p53-E2F1 assembles on specific regions of the ID4 promoter and positively controls ID4 expression. The ID4 protein binds to and stabilizes mRNAs encoding pro-angiogenic factors IL8 and GRO-alpha. This results in the increase of the angiogenic potential of cancer cells expressing mutant p53. These findings highlight the transcriptional axis mutant p53, E2F1 and ID4 as a still undefined molecular mechanism contributing to tumor neo-angiogenesis.
During disease progression the cells that comprise solid malignancies undergo significant changes in gene copy number and chromosome structure. Colorectal cancer provides an excellent model to study this process. To identify and characterize chromosomal abnormalities in colorectal cancer, we performed a statistical analysis of 299 expression and 130 SNP arrays profiled at different stages of the disease, including normal tissue, adenoma, stages 1-4 adenocarcinoma, and metastasis. We identified broad (> 1/2 chromosomal arm) and focal (
Aging is often associated with a decline in hippocampus-dependent spatial memory. Here, we show that functional cell-mediated immunity is required for the maintenance of hippocampus-dependent spatial memory. Sudden imposition of immune compromise in young mice caused spatial memory impairment, whereas immune reconstitution reversed memory deficit in immune-deficient mice. Analysis of hippocampal gene expression suggested that immune-dependent spatial memory performance was associated with the expression of insulin-like growth factor (Igf1) and of genes encoding proteins related to presynaptic activity (Syt10, Cplx2). We further showed that memory loss in aged mice could be attributed to age-related attenuation of the immune response and could be reversed by immune system activation. Homeostatic-driven proliferation of lymphocytes, which expands the existing T cell repertoire, restored spatial memory deficits in aged mice. Thus, our results identify a novel function of the immune system in the maintenance of spatial memory and suggest an original approach for arresting or reversing age-associated memory loss.
Purpose: Glioblastomas are notorious for resistance to therapy, which has been attributed to DNA-repair proficiency, a multitude of deregulated molecular pathways, and, more recently, to the particular biologic behavior of tumor stem-like cells. Here, we aimed to identify molecular profiles specific for treatment resistance to the current standard of care of concomitant chemoradiotherapy with the alkylating agent temozolomide. Patients and Methods: Gene expression profiles of 80 glioblastomas were interrogated for associations with resistance to therapy. Patients were treated within clinical trials testing the addition of concomitant and adjuvant temozolomide to radiotherapy. Results: An expression signature dominated by HOX genes, which comprises Prominin-1 (CD133), emerged as a predictor for poor survival in patients treated with concomitant chemoradiotherapy (n = 42; hazard ratio = 2.69; 95% CI, 1.38 to 5.26; P = .004). This association could be validated in an independent data set. Provocatively, the HOX cluster was reminiscent of a "self-renewal" signature (P = .008; Gene Set Enrichment Analysis) recently characterized in a mouse leukemia model. The HOX signature and EGFR expression were independent prognostic factors in multivariate analysis, adjusted for the O-6-methylguanine-DNA methyltransferase (MGMT) methylation status, a known predictive factor for benefit from temozolomide, and age. Better outcome was associated with gene clusters characterizing features of tumor-host interaction including tumor vascularization and cell adhesion, and innate immune response. Conclusion: This study provides first clinical evidence for the implication of a "glioma stem cell" or "self-renewal" phenotype in treatment resistance of glioblastoma. Biologic mechanisms identified here to be relevant for resistance will guide future targeted therapies and respective marker development for individualized treatment and patient selection.
We developed a method for estimating the positional distribution of transcription factor (TF) binding sites using ChIP-chip data, and applied it to recently published experiments on binding sites of nine TFs: OCT4, SOX2, NANOG, HNF1A, HNF4A, HNF6, FOXA2, USF1 and CREB1. The data were obtained from a genome-wide coverage of promoter regions from 8-kb upstream of the transcription start site (TSS) to 2-kb downstream. The number of target genes of each TF ranges from a few hundred to several thousand. We found that for each of the nine TFs the estimated binding site distribution is closely approximated by a mixture of two components: a narrow peak, localized within 300-bp upstream of the TSS, and a distribution of almost uniform density within the tested region. Using Gene Ontology (GO) and Enrichment analysis, we were able to associate (for each of the TFs studied) the target genes of both types of binding with known biological processes. Most GO terms were enriched either among the proximal targets or among those with a uniform distribution of binding sites. For example, the three stemness-related TFs have several hundred target genes that belong to development and morphogenesis whose binding sites belong to the uniform distribution.
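The two-component description in the abstract above (a narrow peak near the TSS plus an almost uniform background over the tested region) can be illustrated with a small expectation-maximization sketch. This is not the authors' estimation procedure: the box-shaped peak on [-300, 0], the function names, and the choice to fit only the mixing weight are all assumptions made for illustration; only the region bounds (8 kb upstream to 2 kb downstream) and the 300-bp peak width come from the abstract.

```python
import random

REGION = (-8000.0, 2000.0)  # tested promoter region relative to the TSS (from the abstract)
PEAK = (-300.0, 0.0)        # assumed box-shaped stand-in for the narrow proximal peak


def density_uniform(x, lo, hi):
    """Density of a uniform distribution on [lo, hi] evaluated at x."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def fit_peak_weight(positions, n_iter=200):
    """EM estimate of the fraction of binding sites that belong to the peak.

    Both component shapes are held fixed; only the mixing weight is fitted.
    All positions are expected to lie inside REGION.
    """
    w = 0.5
    for _ in range(n_iter):
        responsibilities = []
        for x in positions:
            p_peak = w * density_uniform(x, *PEAK)
            p_background = (1.0 - w) * density_uniform(x, *REGION)
            responsibilities.append(p_peak / (p_peak + p_background))
        w = sum(responsibilities) / len(responsibilities)
    return w
```

Applied to synthetic positions in which a known fraction is drawn from the peak, the fitted weight should land close to that fraction; a real analysis would also need to fit the peak shape and account for ChIP-chip probe resolution.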
About half of cancers sustain mutations in the TP53 gene, whereas the other half maintain a wild-type p53 (wtp53) but may compromise the p53 response because of other alterations. Homeodomain-interacting protein kinase-2 (HIPK2) is a positive regulator of p53 oncosuppressor function. Here, we show, by microarray analysis, that wtp53 lost target gene activation following stable knockdown of HIPK2 (HIPK2i) in a colon cancer cell line. Our data show that the stable knockdown of HIPK2 led to wtp53 misfolding, as detected by p53 immunoprecipitation with conformation-specific antibodies, and that p53 protein misfolding impaired p53 DNA binding and transcription of target genes. We present evidence that zinc supplementation to HIPK2i cells increased p53 reactivity to conformation-sensitive PAb1620 (wild-type conformation) antibody and restored p53 sequence-specific DNA binding in vivo and transcription of target genes in response to Adriamycin treatment. Finally, combination of zinc and Adriamycin suppressed tumor growth in vivo and activated misfolded p53 that induced its target genes in nude mice tumor xenografts derived from HIPK2i cells. Bioinformatics analysis of microarray data from colon cancer patients showed significant association of poor survival with low HIPK2 expression only in tumors expressing wtp53. These results show a critical role of HIPK2 in maintaining the transactivation activity of wtp53 and further suggest that low expression of HIPK2 may impair the p53 function in tumors harboring wtp53.
p73 has been identified as a structural and functional homolog of the tumor suppressor p53. The transcriptional coactivator Yes-associated protein (YAP) has been demonstrated to interact with and to enhance p73-dependent apoptosis in response to DNA damage. Here, we show the existence of a proapoptotic autoregulatory feedback loop between p73, YAP, and the promyelocytic leukemia (PML) tumor suppressor gene. We demonstrate that PML is a direct transcriptional target of p73/YAP, and we show that PML transcriptional activation by p73/YAP is under the negative control of the proto-oncogenic Akt/PKB kinase. Importantly, we find that PML and YAP physically interact through their PVPVY and WW domains, respectively, causing PML-mediated sumoylation and stabilization of YAP. Hence, we determine a mechanistic pathway in response to DNA damage that could have relevant implications for the treatment of human cancer.
Mouse models of hepatocellular carcinoma (HCC) simulate specific subgroups of human HCC. We investigated hepatocarcinogenesis in Mdr2-knockout (Mdr2-KO) mice, a model of inflammation-associated HCC, using gene expression profiling and immunohistochemical analyses. Gene expression profiling showed that although Mdr2-KO mice differ from other published murine HCC models, they share several important deregulated pathways and many coordinately differentially expressed genes with human HCC data sets. Analysis of genome positions of differentially expressed genes in liver tumors revealed a prolonged region of down-regulated genes on murine chromosome 8 in three of the six analyzed tumor samples. This region is syntenic to human chromosomal regions that are frequently deleted in human HCC and harbor multiple tumor suppressor genes. Real-time reverse transcription-PCR analysis of 16 tumor samples confirmed down-regulation of several tumor suppressors in most tumors. We show that in the aged Mdr2-KO mice, cyclin D1 nuclear level is increased in dysplastic hepatocytes that do not form nodules; however, it is decreased in most dysplastic nodules and in liver tumors. We found that this decrease is mostly at the protein, rather than the mRNA, level. These findings raise the question of the role of cyclin D1 at early stages of hepatocarcinogenesis in the Mdr2-KO HCC model. Furthermore, we show that most liver tumors in Mdr2-KO mice were characterized by the absence of beta-catenin activation. In conclusion, the Mdr2-KO mouse may serve as a model for the beta-catenin-negative subgroup of human HCCs characterized by low nuclear cyclin D1 levels in tumor cells and by down-regulation of multiple tumor suppressor genes.
Background. Transcription factors (TF) regulate expression by binding to specific DNA sequences. A binding event is functional when it affects gene expression. Functionality of a binding site is reflected in conservation of the binding sequence during evolution and in overrepresented binding in gene groups with coherent biological functions. Functionality is governed by several parameters such as the TF-DNA binding strength, distance of the binding site from the transcription start site (TSS), DNA packing, and more. Understanding how these parameters control functionality of different TFs in different biological contexts is a must for identifying functional TF binding sites and for understanding regulation of transcription. Methodology/Principal Findings. We introduce a novel method to screen the promoters of a set of genes with shared biological function (obtained from the functional Gene Ontology (GO) classification) against a precompiled library of motifs, and find those motifs which are statistically over-represented in the gene set. More than 8000 human (and 23,000 mouse) genes were assigned to one of 134 GO sets. Their promoters were searched (from 200 bp downstream to 1000 bp upstream of the TSS) for 414 known DNA motifs. We optimized the sequence similarity score threshold, independently for every location window, taking into account nucleotide heterogeneity along the promoters of the target genes. The method, combined with binding sequence and location conservation between human and mouse, identifies with high probability functional binding sites for groups of functionally-related genes. We found many location-sensitive functional binding events and showed that they clustered close to the TSS. Our method and findings were tested experimentally. Conclusions/Significance. We reliably identified functional TF binding sites. This is an essential step towards constructing regulatory networks. The promoter region proximal to the TSS is of central importance for
Signaling pathways invoke interplays between forward signaling and feedback to drive robust cellular response. In this study, we address the dynamics of growth factor signaling through profiling of protein phosphorylation and gene expression, demonstrating the presence of a kinetically defined cluster of delayed early genes that function to attenuate the early events of growth factor signaling. Using epidermal growth factor receptor signaling as the major model system and concentrating on regulation of transcription and mRNA stability, we demonstrate that a number of genes within the delayed early gene cluster function as feedback regulators of immediate early genes. Consistent with their role in negative regulation of cell signaling, genes within this cluster are downregulated in diverse tumor types, in correlation with clinical outcome. More generally, our study proposes a mechanistic description of the cellular response to growth factors by defining architectural motifs that underlie the function of signaling networks.
Primary immune response to pathogens involves the maturation of antigen-presenting dendritic cells (DC). Bacterial lipopolysaccharide (LPS) is a potent inducer of DC maturation, whereas the transforming growth factor beta (TGF beta) attenuates much of this process. Here, we analyzed the global gene expression pattern in LPS-treated bone marrow-derived DC during inhibition of their maturation process by TGF beta. Exposure of DC to LPS induces a pronounced cell response, manifested in altered expression of a large number of genes. Interestingly, TGF beta did not affect most of the LPS responding genes. Nevertheless, analysis identified a subset of genes that did respond to TGF beta, among them the two inflammatory cytokines interleukin (IL)-12 and IL-18. Expression of IL-12, the major proinflammatory cytokine secreted by mature DC, was downregulated by TGF beta, whereas the expression level of the proinflammatory cytokine IL-18, known to potentiate the IL-12 effect, was upregulated. Expression of the peroxisome proliferator-activated receptor gamma (PPAR gamma) increased in response to TGF beta, concomitantly with reduced expression of chemokine receptor 7 (CCR7). This finding supports the possibility that TGF beta-dependent inhibition of CCR7 expression in DC is mediated by PPAR gamma.
We investigate the distributions of the link overlap, P(Q), in three-dimensional Ising spin glasses. We use clustering methodology to identify a set of pairs of states from different Gibbs states and calculate its contribution to P(Q). We find that the distribution over this set does not become trivial as the system size increases.
L1-CAM, a neuronal cell adhesion receptor, is also expressed in a variety of cancer cells. Recent studies identified L1-CAM as a target gene of beta-catenin-T-cell factor (TCF) signaling expressed at the invasive front of human colon cancer tissue. We found that L1-CAM expression in colon cancer cells lacking L1-CAM confers metastatic capacity, and mice injected in their spleen with such cells form liver metastases. We identified ADAM10, a metalloproteinase that cleaves the L1-CAM extracellular domain, as a novel target gene of beta-catenin-TCF signaling. ADAM10 overexpression in colon cancer cells displaying endogenous L1-CAM enhanced L1-CAM cleavage and induced liver metastasis, and ADAM10 also enhanced metastasis in colon cancer cells stably transfected with L1-CAM. DNA microarray analysis of genes induced by L1-CAM in colon cancer cells identified a cluster of genes also elevated in a large set of human colon carcinoma tissue samples. Expression of these genes in normal colon epithelium was low. These results indicate that there is a gene program induced by L1-CAM in colon cancer cells that is also present in colorectal cancer tissue and suggest that L1-CAM can serve as target for colon cancer therapy.
Chromosomal aneuploidy is commonly observed in neoplastic diseases and is an important prognostic marker. Here we examine how gene expression profiles reflect aneuploidy and whether these profiles can be used to detect changes in chromosome copy number. We developed two methods for detecting such changes in the gene expression profile of a single sample. The first method, fold-change analysis, relies on the availability of gene expression data from a large cohort of patients with the same disease. The expression profile of the sample is compared with that of the dataset. The second method, chromosomal relative expression analysis, is more general and requires the expression data from the tested sample only. We found that the relative expression values are stable among different chromosomes and exhibit little variation between different normal tissues. We exploited this novel finding to establish the set of reference values needed to detect changes in the copy number of chromosomes in a single sample on the basis of gene expression levels. We measured the accuracy of the performance of each method by applying them to two independent leukemia datasets. The second method was also applied to two solid tumor datasets. We conclude that chromosomal aneuploidy can be detected and predicted by analysis of gene expression profiles. This article contains Supplementary Material available at http://www.interscience.wiley.com/jpages/1045-2257/suppmat. (c) 2006 Wiley-Liss, Inc.
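As a rough illustration of the second, single-sample approach described in the abstract above, the sketch below compares each chromosome's typical expression with the sample-wide typical expression and checks the result against reference values. The use of medians, the 20% tolerance, the function names, and the toy numbers are all assumptions made for illustration; the abstract does not specify how relative expression is computed or how reference values are thresholded.

```python
from statistics import median


def chromosome_relative_expression(expr_by_chrom):
    """Median expression of each chromosome divided by the sample-wide median.

    expr_by_chrom maps a chromosome name to the expression values of its genes,
    all taken from a single sample.
    """
    all_values = [v for values in expr_by_chrom.values() for v in values]
    sample_median = median(all_values)
    return {chrom: median(values) / sample_median
            for chrom, values in expr_by_chrom.items()}


def flag_copy_number_changes(relative, reference, tolerance=0.2):
    """Label each chromosome by comparing it with its reference value.

    The tolerance is a hypothetical cut-off, not one taken from the paper.
    """
    calls = {}
    for chrom, value in relative.items():
        ratio = value / reference[chrom]
        if ratio > 1.0 + tolerance:
            calls[chrom] = "gain"
        elif ratio < 1.0 - tolerance:
            calls[chrom] = "loss"
        else:
            calls[chrom] = "normal"
    return calls
```

In practice the reference values would come from normal tissues, where the abstract reports that relative expression varies little; here they are simply passed in as a dictionary.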
Motivation: Existing computational methods that identify transcription factor (TF) binding sites on a gene's promoter are plagued by significant inaccuracies. Binding of a TF to a particular sequence is assessed by comparing its similarity score, obtained from the TF's known position weight matrix (PWM), to a threshold. If the similarity score is above the threshold, the sequence is considered a putative binding site. Determining this threshold is a central part of the problem, for which no satisfactory biologically based solution exists. Results: We present here a method that integrates gene expression data with sequence-based scoring of TF binding sites, for determining a global score threshold for each TF. We validate our method, STOP (Searching TFs Of Promoters), in several ways: (1) we calculate the average expression values of groups of human putative target genes of each TF, and compare them to similar averages derived for random gene groups. The groups of putative targets show significantly higher relative average expression. (2) We find high consistency between the induced lists of putative targets in human and in mouse. (3) The expression patterns associated with human and mouse genes (ordered by PWM scores for each TF) exhibit high similarity between human and mouse, indicating that our method has firm biological basis. (4) Comparison of results obtained by STOP and PRIMA (Elkon et al., 2003) suggests that determining the score threshold using gene expression, as is done in STOP, is more biologically tuned.
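The generic scoring step that the abstract above starts from (a PWM similarity score compared with a threshold) can be sketched as follows. This shows only the standard log-odds scan, not STOP itself: the expression-based choice of threshold is not reproduced, and the toy matrix, the uniform background of 0.25, and the function names are assumptions made for illustration.

```python
import math

# Hypothetical 3-position PWM (per-position base probabilities), for illustration only.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def pwm_score(window, pwm, background=0.25):
    """Log-odds similarity score of a sequence window against a PWM."""
    return sum(math.log2(pwm[i][base] / background)
               for i, base in enumerate(window))


def putative_sites(sequence, pwm, threshold):
    """Return (start, score) for every window whose score reaches the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = pwm_score(sequence[start:start + width], pwm)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Raising or lowering `threshold` trades false positives against missed sites, which is why the abstract treats the choice of threshold as the central problem.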
Dietary antioxidants and selenium compounds were shown to have a therapeutic effect against hepatocellular carcinoma in several mouse models. We tested the effects of tannic acid and selenomethionine on hepatocellular carcinoma development in Mdr2 knockout (Mdr2-KO) mice. Mdr2-KO and age-matched Mdr2 heterozygous control mice were fed with tannic acid or selenomethionine during the first 3 months of life. Then, several mice from each group were sacrificed, and liver tissue samples were removed for analysis. The remaining mice were fed a regular diet until the age of 16 months, at which time the number and size of liver tumors were determined. Liver tissue samples of 3-month-old mice were subjected to gene expression profiling analysis using cDNA macroarrays containing probes for 240 genes that regulate responses to oxidative stress and inflammation or lipid metabolism. Both tannic acid and selenomethionine had a partial chemopreventive effect on development of hepatocellular carcinoma in Mdr2-KO mice: they reduced the incidence of large tumor nodules (diameter > 1 cm) at age 16 months. Both agents inhibited gene expression and reversed up-regulation of many genes that control inflammation or response to oxidative stress in Mdr2-KO livers at age 3 months. This inhibitory effect on gene expression correlated with the ability of agents to reduce incidence of large tumors: selenomethionine was more active than tannic acid in both aspects. Understanding the molecular mechanism of chemoprevention effect could improve our therapeutic modalities while using these agents.
Predicting at the time of discovery the prognosis and metastatic potential of cancer is a major challenge in current clinical research. Numerous recent studies searched for gene expression signatures that outperform traditionally used clinical parameters in outcome prediction. Finding such a signature will free many patients of the suffering and toxicity associated with adjuvant chemotherapy given to them under current protocols, even though they do not need such treatment. A reliable set of predictive genes also will contribute to a better understanding of the biological mechanism of metastasis. Several groups have published lists of predictive genes and reported good predictive performance based on them. However, the gene lists obtained for the same clinical types of patients by different groups differed widely and had only very few genes in common. This lack of agreement raised doubts about the reliability and robustness of the reported predictive gene lists, and the main source of the problem was shown to be the small number of samples that were used to generate the gene lists. Here, we introduce a previously undescribed mathematical method, probably approximately correct (PAC) sorting, for evaluating the robustness of such lists. We calculate for several published data sets the number of samples that are needed to achieve any desired level of reproducibility. For example, to achieve a typical overlap of 50% between two predictive lists of genes, breast cancer studies would need the expression profiles of several thousand early discovery patients.
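The quantity at stake in the abstract above, the overlap between two predictive gene lists derived from different patient samples, can be made concrete with a short sketch. This is not the PAC sorting method, which the abstract does not describe in detail; it only computes the overlap fraction that the method is said to reason about. The per-gene scores, the use of absolute score for ranking, and the function names are assumptions made for illustration.

```python
def top_genes(scores, n):
    """Set of the n genes with the largest absolute score.

    scores maps a gene name to a per-gene statistic, for example its
    correlation with outcome in one group of patients (hypothetical input).
    """
    ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
    return set(ranked[:n])


def list_overlap(scores_a, scores_b, n):
    """Fraction of genes shared by the top-n lists of two independent samples."""
    return len(top_genes(scores_a, n) & top_genes(scores_b, n)) / n
```

An overlap of 0.5 for `n` genes corresponds to the "typical overlap of 50%" used as the example target in the abstract.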
A bioinformatics-based analysis of endochondral bone formation model detected several genes upregulated in this process. Among these genes the dickkopf homolog 3 (Dkk3) was upregulated and further studies showed that its expression affects in vitro and in vivo osteogenesis. This study indicates a possible role of Dkk3 in regulating bone formation. Introduction: Endochondral bone formation is a complex biological process involving numerous chondrogenic, osteogenic, and angiogenic proteins, only some of which have been well studied. Additional key genes may have important roles as well. We hypothesized that to identify key genes and signaling pathways crucial for bone formation, a comprehensive gene discovery strategy should be applied to an established in vivo model of osteogenesis. Materials and Methods: We used in vivo implanted C3H10T1/2 cells that had been genetically engineered to express human bone morphogenetic protein-2 (BMP2) in a tetracycline-regulated system that controls osteogenic differentiation. Oligonucleotide microarray data from the implants (n = 4 repeats) was analyzed using coupled two-way clustering (CTWC) and statistical methods. For studying the effects of dickkopf homolog 3 (Dkk3) in chondrogenesis and osteogenesis, C3H10T1/2 mesenchymal progenitors were used. Results: The CTWC revealed temporal expression of Dkk3 with other chondrogenesis-, osteogenesis-, and Wnt-related genes. Quantitative RT-PCR confirmed the expression of Dkk3 in the implants. C3H10T1/2 cells that expressed Dkk3 in the presence of BMP2 displayed lower levels of alkaline phosphatase and collagen I mRNA expression than control C3H10T1/2 cells that did not express Dkk3. Interestingly, the levels of collagen II mRNA expression, Alcian blue staining, and glycosaminoglycans (GAGs) production were not influenced by Dkk3 expression. In vivo micro-CT and bioluminescence imaging revealed that co-expression of Dkk3 and BMP2 by implanted C3H10T1/2 cells induced the formation of sig
Molecular events preceding the development of hepatocellular carcinoma were studied in the Mdr2-knockout (Mdr2-KO) mice. These mice lack the liver-specific P-glycoprotein responsible for phosphatidylcholine transport across the canalicular membrane. Portal inflammation ensues at an early age followed by hepatocellular carcinoma development after the age of 1 year. Liver tissue samples of Mdr2-KO mice in the early and late precancerous stages of liver disease were subjected to histologic, biochemical, and gene expression profiling analysis. In an early stage, multiple protective mechanisms were found, including induction of many antiinflammatory and antioxidant genes and increase of total antioxidant capacity of liver tissue. Despite stimulation of hepatocyte DNA replication, their mitotic activity was blocked at this stage. In the late stage of the disease, although the total antioxidant capacity of liver tissue of Mdr2-KO mice was normal, and inflammation was less prominent, many protective genes remained overexpressed. Increased mitotic activity of hepatocytes resulted in multiple dysplastic nodules, some of them being steatotic. Expression of many genes regulating lipid and phospholipid metabolism was distorted, including up-regulation of choline kinase A, a known oncogene. Many other oncogenes, including cyclin D1, Jun, and some Ras homologues, were up-regulated in Mdr2-KO mice at both stages of liver disease. However, we found no increase of Ras activation. Our data suggest that some of the adaptive mechanisms induced in the early stages of hepatic disease, which protect the liver from injury, could have an effect in hepatocarcinogenesis at later stages of the disease in this hepatocellular carcinoma model.
A recent result presented the expansion for the entropy rate of a hidden Markov process (HMP) as a power series in the noise variable epsilon. The coefficients of the expansion around the noiseless (epsilon = 0) limit were calculated up to 11th order, using a conjecture that relates the entropy rate of an HMP to the entropy of a process of finite length (which is calculated analytically). In this letter, we generalize and prove the conjecture and discuss its theoretical and practical consequences.
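To make the "entropy of a process of finite length" mentioned above concrete, the sketch below computes the finite-length conditional entropy H(Y_n | Y_1, ..., Y_{n-1}) by brute-force enumeration. The abstract does not say which HMP is studied, so the model here is an assumption: a stationary binary symmetric Markov chain with flip probability `p`, observed through a binary symmetric channel with noise `eps`. The function names are likewise hypothetical, and the sketch does not reproduce the series expansion or the proof.

```python
import math
from itertools import product


def sequence_probability(y, p, eps):
    """Probability of the observed binary sequence y (forward algorithm).

    Assumed model: hidden symmetric two-state chain that flips with probability p,
    started from its stationary distribution; each observation is the hidden
    state flipped with probability eps.
    """
    alpha = [0.5 * ((1 - eps) if y[0] == s else eps) for s in (0, 1)]
    for obs in y[1:]:
        new_alpha = []
        for s in (0, 1):
            prior = sum(alpha[r] * ((1 - p) if r == s else p) for r in (0, 1))
            new_alpha.append(prior * ((1 - eps) if obs == s else eps))
        alpha = new_alpha
    return sum(alpha)


def block_entropy(n, p, eps):
    """Entropy in bits of the first n observations, by enumerating all 2**n sequences."""
    h = 0.0
    for y in product((0, 1), repeat=n):
        q = sequence_probability(y, p, eps)
        if q > 0:
            h -= q * math.log2(q)
    return h


def conditional_entropy(n, p, eps):
    """H(Y_n | Y_1..Y_{n-1}) in bits, for n >= 2."""
    return block_entropy(n, p, eps) - block_entropy(n - 1, p, eps)
```

In the noiseless case (`eps = 0`) the observations are the Markov chain itself, so the conditional entropy equals the binary entropy of `p` for every `n >= 2`; with noise it is larger and depends on `n`. The enumeration is exponential in `n`, so this is only practical for short lengths.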
Finding the entropy rate of Hidden Markov Processes is an active research topic, of both theoretical and practical importance. A recently used approach is studying the asymptotic behavior of the entropy rate in various regimes. In this paper we generalize and prove a previous conjecture relating the entropy rate to entropies of finite systems. We use the proof to establish series expansions for the entropy rate in two different regimes. We also study the radius of convergence of the two series expansions.
Traditionally, immunologic diagnosis has been based on an attempt to correlate each disease with a specific immune reactivity, such as an antibody or a T-cell response to a single antigen specific for the disease entity. The state of the body, however, appears to be encoded by the immune system in collectives of reactivities and not by single reactivities. Here we describe our use of microarray technology and informatics to develop an antigen chip capable of detecting global patterns of antibodies binding to hundreds of antigens simultaneously. The patterns fashion diagnostic signatures.
Tumors contain a fraction of cancer stem cells that maintain the propagation of the disease. The CD34+CD38- cells, isolated from acute myeloid leukemia (AML), were shown to be enriched in leukemic stem cells (LSC). We isolated the CD34+CD38- cell fraction from AML and compared their gene expression profiles to the CD34+CD38+ cell fraction, using microarrays. We found 409 genes that were at least twofold over- or underexpressed between the two cell populations. These include underexpression of DNA repair, signal transduction and cell cycle genes, consistent with the relative quiescence of stem cells, and chromosomal aberrations and mutations of leukemic cells. Comparison of the LSC expression data to that of normal hematopoietic stem cells (HSC) revealed that 34% of the modulated genes are shared by both LSC and HSC, supporting the suggestion that the LSC originated within the HSC progenitors. We focused on the Notch pathway since Jagged-2, a Notch ligand, was found to be overexpressed in the LSC samples. We show that DAPT, an inhibitor of gamma-secretase, a protease that is involved in Jagged and Notch signaling, inhibits LSC growth in colony formation assays. Identification of additional genes that regulate LSC self-renewal may provide new targets for therapy.
The transcription factor Nanog is uniquely expressed in embryonic stem (ES) cells and in germ cell tumors and is important for self-renewal. To understand the relation between this and cell transformation, we expressed Nanog in NIH3T3 cells, and these cells showed an increased growth rate and a transformed phenotype as demonstrated by foci formation and colony growth in soft agar. This suggests that Nanog possesses an oncogenic potential that may be related to the role it plays in germ cell tumors and to its function in self-renewal of ES cells. We studied the transcription targets of Nanog using microarrays to identify Nanog regulated genes. The list of genes regulated by Nanog was unique for each cell type and more than 10% of the Nanog regulated genes, including transcription factors, are primary Nanog targets since their promoters bind Nanog in ES cells. Some of these target genes can explain the transformation of NIH3T3. (c) 2006 Elsevier Inc. All rights reserved.
Background: The human genome contains over one million Alu repeat elements whose distribution is not uniform. While metabolism-related genes were shown to be enriched with Alu, in structural genes Alu elements are under-represented. Such observations led researchers to suggest that Alu elements were involved in gene regulation and were selected to be present in some genes and absent from others. This hypothesis is gaining strength due to findings that indicate involvement of Alu elements in a variety of functions; for example, Alu sequences were found to contain several functional transcription factor (TF) binding sites (BSs). We performed a search for new putative BSs on Alu elements, using a database of Position Specific Score Matrices (PSSMs). We searched consensus Alu sequences as well as specific Alu elements that appear on the 5 Kbp regions upstream to the transcription start site (TSS) of about 14000 genes. Results: We found that the upstream regions of the TSS are enriched with Alu elements, and the Alu consensus sequences contain dozens of putative BSs for TFs. Hence several TFs have Alu-associated BSs upstream of the TSS of many genes. For several TFs most of the putative BSs reside on Alu; a few of these were previously found and their association with Alu was also reported. In four cases the fact that the identified BSs resided on Alu went unnoticed, and we report this association for the first time. We found dozens of new putative BSs. Interestingly, many of the corresponding TFs are associated with early markers of development, even though the upstream regions of development-related genes are Alu-poor, compared with translational and protein biosynthesis related genes, which are Alu-rich. Finally, we found a correlation between the mouse B1 and human Alu densities within the corresponding upstream regions of orthologous genes. Conclusion: We propose that evolution used transposable elements to insert TF binding motifs into promoter regions. We obser
Several studies have verified the existence of multiple chromosomal abnormalities in colon cancer. However, the relationships between DNA copy number and gene expression have not been adequately explored nor globally monitored during the progression of the disease. In this work, three types of array-generated data (expression, single nucleotide polymorphism, and comparative genomic hybridization) were collected from a large set of colon cancer patients at various stages of the disease. Probes were annotated to specific chromosomal locations and coordinated alterations in DNA copy number and transcription levels were revealed at specific positions. We show that across many large regions of the genome, changes in expression level are correlated with alterations in DNA content. Often, large chromosomal segments, containing multiple genes, are transcriptionally affected in a coordinated way, and we show that the underlying mechanism is a corresponding change in DNA content. This implies that whereas specific chromosomal abnormalities may arise stochastically, the associated changes in expression of some or all of the affected genes are responsible for selecting cells bearing these abnormalities for clonal expansion. Indeed, particular chromosomal regions are frequently gained and overexpressed (e.g., 7p, 8q, 13q, and 20q) or lost and underexpressed (e.g., 1p, 4, 5q, 8p, 14q, 15q, and 18) in primary colon tumors, making it likely that these changes favor tumorigenicity. Furthermore, we show that these aberrations are absent in normal colon mucosa, appear in benign adenomas (albeit only in a small fraction of the samples), become more frequent as disease advances, and are found in the majority of metastatic samples.
The difficulty of dissecting a complex phenotype of established malignant cells into several critical transcriptional programs greatly impedes our understanding of the malignant transformation. The genetic elements required to transform some primary human cells to a tumorigenic state were described in several recent studies. We took advantage of the global genomic profiling approach and tried to go one step further in the dissection of the transformation network. We sought to identify the genetic signatures and key target genes, which underlie the genetic alterations in p53, Ras, INK4A locus, and telomerase, introduced in a stepwise manner into primary human fibroblasts. Here, we show that these are the minimally required genetic alterations for sarcomagenesis in vivo. A genome-wide expression profiling identified distinct genetic signatures corresponding to the genetic alterations listed above. Most importantly, unique transformation hallmarks, such as differentiation block, aberrant mitotic progression, increased angiogenesis, and invasiveness, were identified and coupled with genetic signatures assigned for the genetic alterations in the p53, INK4A locus, and H-Ras, respectively. Furthermore, a transcriptional program that defines the cellular response to p53 inactivation was an excellent predictor of metastasis development and bad prognosis in breast cancer patients. Deciphering these transformation fingerprints, which are affected by the most common oncogenic mutations, provides considerable insight into regulatory circuits controlling malignant transformation and will hopefully open new avenues for rational therapeutic decisions.
Specific HPV DNA sequences are associated with more than 90% of invasive carcinomas of the uterine cervix. Viral E6 and E7 oncogenes are key mediators in cell transformation by disrupting TP53 and RB pathways. To investigate molecular mechanisms involved in the progression of invasive cervical carcinoma, we performed a gene expression study on cases selected according to viral and clinical parameters. Using Coupled Two-Way Clustering and Sorting Points Into Neighbourhoods methods, we identified a 'cervical cancer proliferation cluster' composed of 163 highly correlated transcripts. Most of these transcripts corresponded to E2F pathway genes controlling cell division or proliferation, whereas none was known as TP53 primary target. The average expression level of the genes of this cluster was higher in tumours with an early relapse than in tumours with a favourable course (P = 0.026). Moreover, we found that E6/E7 mRNA expression level was positively correlated with the expression level of the cluster genes and with viral DNA load. These findings suggest that HPV E6/E7 expression level plays a key role in the progression of invasive carcinoma of the uterine cervix via the deregulation of cellular genes controlling tumour cell proliferation. HPV expression level may thus provide a biological marker useful for prognosis assessment and specific therapy of the disease.
Deciphering regulatory events that drive malignant transformation represents a major challenge for systems biology. Here, we analyzed genome-wide transcription profiling of an in vitro cancerous transformation process. We focused on a cluster of genes whose expression levels increased as a function of p53 and p16(INK4A) tumor suppressors inactivation. This cluster predominantly consists of cell cycle genes and constitutes a signature of a diversity of cancers. By linking expression profiles of the genes in the cluster with the dynamic behavior of p53 and p16(INK4A), we identified a promoter architecture that integrates signals from the two tumor suppressive channels and that maps their activity onto distinct levels of expression of the cell cycle genes, which, in turn, correspond to different cellular proliferation rates. Taking components of the mitotic spindle as an example, we experimentally verified our predictions that p53-mediated transcriptional repression of several of these novel targets is dependent on the activities of p21, NFY, and E2F. Our study demonstrates how a well-controlled transformation process allows linking between gene expression, promoter architecture, and activity of upstream signaling molecules.
Biology has undergone a revolution during the past decade. Deciphering the human genome has opened new horizons, among which the advent of DNA microarrays has been perhaps the most significant. These miniature measuring devices report the levels at which tens of thousands of genes are expressed in a collection of cells of interest (such as tissue from a tumor). I describe here briefly this technology and present an example of how analysis of data obtained from such high throughput experiments provides insights of possible clinical and therapeutic relevance for Acute Lymphoblastic Leukemia. Next, I describe how gene expression data is used to deduce a new design principle, "Just In Case", used by stem cells. Finally I briefly review a different novel technology, of antigen chips, which provide a fingerprint of a subject's immune system and may become a predictive clinical tool. The work reviewed here was done in collaboration with numerous colleagues and students. (c) 2005 Elsevier B.V. All rights reserved.
The entropy of a binary symmetric Hidden Markov Process is calculated as an expansion in the noise parameter epsilon. We map the problem onto a one-dimensional Ising model in a large field of random signs and calculate the expansion coefficients up to second order in epsilon. Using a conjecture we extend the calculation to 11th order and discuss the convergence of the resulting series.
We calculate the Shannon entropy rate of a binary Hidden Markov Process (HMP), of given transition rate and noise epsilon (emission), as a series expansion in epsilon. The first two orders are calculated exactly. We then evaluate, for finite histories, simple upper bounds of Cover and Thomas. Surprisingly, we find that for a fixed order k and history of n steps, the bounds become independent of n for large enough n. This observation is the basis of a conjecture that the upper bound obtained for n >= (k + 3)/2 gives the exact entropy rate for any desired order k of epsilon.
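The finite-history upper bound mentioned above can be evaluated by brute force for short histories. The following is a minimal sketch, assuming a symmetric hidden chain with flip probability p, a symmetric emission channel with flip probability epsilon, and a uniform stationary start; the function names are illustrative and are not taken from the paper. It computes H(Y_n | Y_1..Y_{n-1}) as a difference of block entropies, using the forward algorithm to obtain the probability of each observed sequence.

```python
from itertools import product
from math import log2


def sequence_probability(ys, p, eps):
    """P(Y_1..Y_n = ys) for a binary symmetric HMP, via the forward algorithm."""
    # alpha[x] = P(y_1..y_t, X_t = x); the stationary hidden state is uniform.
    alpha = [0.5 * ((1 - eps) if x == ys[0] else eps) for x in (0, 1)]
    for y in ys[1:]:
        alpha = [
            sum(alpha[prev] * ((1 - p) if prev == x else p) for prev in (0, 1))
            * ((1 - eps) if x == y else eps)
            for x in (0, 1)
        ]
    return sum(alpha)


def block_entropy(n, p, eps):
    """Shannon entropy (bits) of n consecutive observations."""
    total = 0.0
    for ys in product((0, 1), repeat=n):
        q = sequence_probability(ys, p, eps) if n > 0 else 1.0
        if q > 0:
            total -= q * log2(q)
    return total


def upper_bound(n, p, eps):
    """Conditional entropy H(Y_n | Y_1..Y_{n-1}) in bits, an upper bound on the rate."""
    return block_entropy(n, p, eps) - block_entropy(n - 1, p, eps)


def binary_entropy(p):
    """Entropy (bits) of a Bernoulli(p) variable."""
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)
```

The enumeration costs 2^n sequence evaluations, so this is only practical for small n. At epsilon = 0 the process reduces to the hidden Markov chain itself, and for n >= 2 the bound equals the binary entropy of p, which is a convenient sanity check.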
Motivation: Genes are often characterized dichotomously as either housekeeping or single-tissue specific. We conjectured that crucial functional information resides in genes with midrange profiles of expression. Results: To obtain such novel information genome-wide, we have determined the mRNA expression levels for one of the largest hitherto analyzed sets of 62 839 probesets in 12 representative normal human tissues. Indeed, when using a newly defined graded tissue specificity index tau, valued between 0 for housekeeping genes and 1 for tissue-specific genes, genes with midrange profiles having 0.15 50% of all expression patterns. We developed a binary classification, indicating for every gene the I_B tissues in which it is overly expressed, and the 12 - I_B tissues in which it shows low expression. The 85 dominant midrange patterns with I_B = 2-11 were found to be bimodally distributed, and to contribute most significantly to the definition of tissue specification dendrograms. Our analyses provide a novel route to infer expression profiles for presumed ancestral nodes in the tissue dendrogram. Such definition has uncovered an unsuspected correlation, whereby de novo enhancement and diminution of gene expression go hand in hand. These findings highlight the importance of gene suppression events, with implications to the course of tissue specification in ontogeny and phylogeny.
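The abstract does not spell out how tau is computed. The sketch below assumes the form in which this index is commonly cited, tau = sum_i (1 - x_i / x_max) / (N - 1) over the N tissue expression values of one gene; the function name and the input convention (non-negative values, at least two tissues) are assumptions made for illustration.

```python
def tissue_specificity_tau(expression):
    """Graded tissue specificity index for one gene.

    `expression` holds one non-negative expression value per tissue.
    Returns 0 for perfectly uniform expression and 1 when a single
    tissue carries all of the expression.
    """
    n_tissues = len(expression)
    if n_tissues < 2:
        raise ValueError("need at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        raise ValueError("gene is not expressed in any tissue")
    # Each tissue contributes how far it falls below the maximal tissue.
    return sum(1 - x / x_max for x in expression) / (n_tissues - 1)
```

For example, a gene expressed equally in all 12 tissues gives tau = 0, and a gene expressed in exactly one tissue gives tau = 1, matching the two extremes described above.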
We have analyzed gene expression data from three different kinds of samples: normal human tissues, human cancer cell lines, and leukemic cells from lymphoid and myeloid leukemia pediatric patients. We have searched for genes that are overexpressed in human cancer and also show specific patterns of tissue-dependent expression in normal tissues. Using the expression data of the normal tissues, we identified 4,346 genes with a high variability of expression and clustered these genes according to their relative expression level. Of 91 stable clusters obtained, 24 clusters included genes preferentially expressed either only in hematopoietic tissues or in hematopoietic and one to two other tissues; 28 clusters included genes preferentially expressed in various nonhematopoietic tissues such as neuronal, testis, liver, kidney, muscle, lung, pancreas, and placenta. Analysis of the expression levels of these two groups of genes in the human cancer cell lines and leukemias identified genes that were highly expressed in cancer cells but not in their normal counterparts and, thus, were overexpressed in the cancers. The different cancer cell lines and leukemias varied in the number and identity of these overexpressed genes. The results indicate that many genes that are overexpressed in human cancer cells are specific to a variety of normal tissues, including normal tissues other than those from which the cancer originated. It is suggested that this general property of cancer cells plays a major role in determining the behavior of the cancers, including their metastatic potential.
We introduce a novel unsupervised approach for the organization and visualization of multidimensional data. At the heart of the method is a presentation of the full pairwise distance matrix of the data points, viewed in pseudocolor. The ordering of points is iteratively permuted in search of a linear ordering, which can be used to study embedded shapes. Several examples indicate how the shapes of certain structures in the data (elongated, circular and compact) manifest themselves visually in our permuted distance matrix. It is important to identify the elongated objects since they are often associated with a set of hidden variables, underlying continuous variation in the data. The problem of determining an optimal linear ordering is shown to be NP-Complete, and therefore an iterative search algorithm with O(n^3) step-complexity is suggested. By using Sorting Points Into Neighborhoods (SPIN) to analyze colon cancer expression data, we were able to address the serious problem of sample heterogeneity, which hinders identification of metastasis-related genes in our data. Our methodology brings to light the continuous variation of heterogeneity, starting with homogeneous tumor samples and gradually increasing the amount of another tissue. Ordering the samples according to their degree of contamination by unrelated tissue allows the separation of genes associated with irrelevant contamination from those related to cancer progression.
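To make the idea of a permuted distance matrix concrete, here is a small sketch that is explicitly not the SPIN algorithm. It assumes a stand-in cost (the sum of distances between neighbours in the ordering) and finds the best ordering by exhaustive search, which is feasible only for a handful of points and hints at why an iterative heuristic is needed for real data. All names are hypothetical.

```python
from itertools import permutations


def distance_matrix(positions):
    """Pairwise distances for points on a line (a toy 'elongated' data set)."""
    return [[abs(a - b) for b in positions] for a in positions]


def adjacent_cost(order, dist):
    """Assumed cost: total distance between consecutive points in the ordering."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def best_linear_order(dist):
    """Exhaustive search over all n! orderings; usable only for very small n."""
    n = len(dist)
    return min(permutations(range(n)), key=lambda order: adjacent_cost(order, dist))


def permuted(dist, order):
    """Distance matrix with rows and columns rearranged by the ordering."""
    return [[dist[a][b] for b in order] for a in order]
```

For points on a line given in shuffled order, the minimizing ordering is monotone, and the permuted matrix then shows small values near the diagonal and the largest value in the corners, which is the visual signature of an elongated structure.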
The study of the cascade of events of induction and sequential gene activation that takes place during human embryonic development is hindered by the unavailability of postimplantation embryos at different stages of development. Spontaneous differentiation of human embryonic stem cells (hESCs) can occur by means of the formation of embryoid bodies (EBs), which resemble certain aspects of early embryos to some extent. Embryonic vascular formation, vasculogenesis, is a sequential process that involves complex regulatory cascades. In this study, changes of gene expression along the development of human EBs for 4 weeks were studied by large-scale gene screening. Two main clusters were identified: one of down-regulated genes such as POU5, NANOG, TDGF1/Cripto (TDGF, teratocarcinoma-derived growth factor-1), LIN28, CD24, TERF1 (telomeric repeat binding factor-1), and LEFTB (left-right determination, factor B), and a second of up-regulated genes such as TWIST, WNT5A, WT1, AFP, ALB, and NCAM1. Focusing on the vascular system development, genes known to be involved in vasculogenesis and angiogenesis were explored. Up-regulated genes include vasculogenic growth factors such as VEGFA, VEGFC, FIGF (VEGFD), ANG1, ANG2, TGFbeta3, and PDGFB, as well as the related receptors FLT1, FLT4, PDGFRB, TGFbetaR2, and TGFbetaR3. The reproducibility of the array data was verified independently and illustrated that many genes known to be involved in vascular development are activated during the differentiation of hESCs in culture. Hence, the analysis of the vascular system can be extended to other differentiation pathways, allocating human EBs as an in vitro model to study early human development. (C) 2004 Wiley-Liss, Inc.
On the basis of epidemiological studies, infection was suggested to play a role in the etiology of human cancer. While for some cancers such a role was indeed demonstrated, there is no direct biological support for the role of viral pathogens in the pathogenesis of childhood leukemia. Using a novel bioinformatic tool that alternates between clustering and standard statistical methods of analysis, we performed a 'double-blind' search of published gene expression data of subjects with different childhood acute lymphoblastic leukemia (ALL) subtypes, looking for unanticipated partitions of patients, induced by unexpected groups of genes with correlated expression. We discovered a group of about 30 genes, related to the interferon response pathway, whose expression levels divide the ALL samples into two subgroups; high in 50, low in 285 patients. Leukemic subclasses prevalent in early childhood (the age most susceptible to infection) are over-represented in the high-expression subgroup. Similar partitions, induced by the same genes, were found also in breast and ovarian cancer but not in lung cancer, prostate cancer and lymphoma. About 40% of breast cancer samples expressed the 'interferon-related' signature. It is of interest that several studies demonstrated mouse mammary tumor virus-like sequences in about 40% of breast cancer samples. Our discovery of an unanticipated strong signature of an interferon-induced pathway provides molecular support for a role for either inflammation or viral infection in the pathogenesis of childhood leukemia as well as breast and ovarian cancer.
Motivation: Predicting the metastatic potential of primary malignant tissues has direct bearing on the choice of therapy. Several microarray studies yielded gene sets whose expression profiles successfully predicted survival. Nevertheless, the overlap between these gene sets is almost zero. Such small overlaps were observed also in other complex diseases, and the variables that could account for the differences have evoked wide interest. One of the main open questions in this context is whether the disparity can be attributed only to trivial reasons such as different technologies, different patients and different types of analyses. Results: To answer this question, we concentrated on a single breast cancer dataset, and analyzed it by a single method, the one which was used by van't Veer et al. to produce a set of outcome-predictive genes. We showed that, in fact, the resulting set of genes is not unique; it is strongly influenced by the subset of patients used for gene selection. Many equally predictive lists could have been produced from the same analysis. Three main properties of the data explain this sensitivity: (1) many genes are correlated with survival; (2) the differences between these correlations are small; (3) the correlations fluctuate strongly when measured over different subsets of patients. A possible biological explanation for these properties is discussed.
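The sensitivity of a gene list to the choice of patients can be illustrated on synthetic data. The sketch below is an assumption-laden toy, not the analysis of the paper: every simulated gene carries the same weak outcome signal plus independent noise (so properties 1 to 3 hold by construction), genes are ranked by absolute Pearson correlation with the outcome, and the top-k lists from two disjoint halves of the patients are compared. All names, sizes, and the signal strength 0.3 are hypothetical.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def top_genes(expression, outcome, patients, k):
    """Indices of the k genes most correlated with outcome on the given patients."""
    scored = []
    for gene, row in enumerate(expression):
        r = pearson([row[p] for p in patients], [outcome[p] for p in patients])
        scored.append((abs(r), gene))
    scored.sort(reverse=True)
    return {gene for _, gene in scored[:k]}


def overlap_fraction(list_a, list_b):
    """Fraction of list_a that also appears in list_b."""
    return len(list_a & list_b) / len(list_a)


def simulate_overlap(n_genes=200, n_patients=60, k=20, seed=1):
    """Overlap of top-k gene lists selected on two disjoint patient halves."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    # Assumed model: identical weak signal in every gene, independent noise.
    expression = [
        [0.3 * outcome[p] + rng.gauss(0, 1) for p in range(n_patients)]
        for _ in range(n_genes)
    ]
    patients = list(range(n_patients))
    rng.shuffle(patients)
    half = n_patients // 2
    first = top_genes(expression, outcome, patients[:half], k)
    second = top_genes(expression, outcome, patients[half:], k)
    return overlap_fraction(first, second)
```

Calling `simulate_overlap()` with different seeds gives a feel for how much two equally legitimate lists can differ when the per-gene correlations are all weak and close to one another.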
A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The availability of whole genome sequences opens the way for computational methods to search for the key elements in transcription regulation. These include methods for discovering the binding sites of DNA-binding proteins, such as transcription factors. A common representation of transcription factor binding sites is a position specific score matrix (PSSM). We developed a probabilistic approach for searching for putative binding sites. Given a promoter sequence and a PSSM, we scan the promoter and find the position with the maximal score. Then we calculate the probability of obtaining such a maximal score or higher on a random promoter. This is the p-value of the putative binding site. In this way, we searched for putative binding sites in the upstream sequences of Saccharomyces cerevisiae, where some binding sites are known (according to the Saccharomyces cerevisiae Promoters Database, SCPD). Our method produces either exact p-values, or a better estimate for them than other methods, and this improves the results of the search. For each gene we found its statistically significant putative binding sites. We measured the rates of true positives, by a comparison to the known binding sites, and also compared our results to those of MatInspector, a commercially available software package that looks for putative binding sites in DNA sequences according to PSSMs. Our results were significantly better. In contrast to our method, MatInspector does not calculate the exact statistical significance of its results.
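The scan-then-score procedure can be sketched as follows. Note the substitution: the abstract describes exact p-values, whereas this sketch uses a plain Monte Carlo estimate over random promoters with independent, uniformly distributed bases, which is a simplifying assumption. The PSSM layout (one dictionary per motif position, mapping base to score) and all function names are hypothetical.

```python
import random

BASES = "ACGT"


def max_pssm_score(sequence, pssm):
    """Highest PSSM score over all windows of the sequence."""
    width = len(pssm)
    return max(
        sum(pssm[k][sequence[start + k]] for k in range(width))
        for start in range(len(sequence) - width + 1)
    )


def empirical_p_value(observed, pssm, promoter_length, trials=2000, seed=0):
    """Estimated P(max score >= observed) on random promoters.

    Assumes i.i.d. uniform bases; a real background model would differ.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        promoter = "".join(rng.choice(BASES) for _ in range(promoter_length))
        if max_pssm_score(promoter, pssm) >= observed:
            hits += 1
    return hits / trials


def consensus_pssm(motif):
    """Toy PSSM: +1 for the consensus base at each position, -1 otherwise."""
    return [{base: (1 if base == wanted else -1) for base in BASES} for wanted in motif]
```

A possible usage is to score a promoter with `max_pssm_score(promoter, consensus_pssm("TATA"))` and pass the result to `empirical_p_value` together with the promoter length. The resolution of the estimate is limited to 1/trials, which is one reason an exact calculation is preferable for very small p-values.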
Human embryonic stem cells (ESC) are undifferentiated and are endowed with the capacities of self-renewal and pluripotential differentiation. Adult stem cells renew their own tissue, but whether they can transdifferentiate to other tissues is still controversial. To understand the genetic program that underlies the pluripotency of stem cells, we compared the transcription profile of ESC with that of progenitor/stem cells of human hematopoietic and keratinocytic origins, along with their mature cells to be viewed as snapshots along tissue differentiation. ESC gene profiles show higher complexity with significantly more highly expressed genes than adult cells. We hypothesize that ESC use a strategy of expressing genes that represent various differentiation pathways and selection of only a few for continuous expression upon differentiation to a particular target. Such a strategy may be necessary for the pluripotency of ESC. The progenitors of either hematopoietic or keratinocytic cells also follow the same design principle. Using advanced clustering, we show that many of the ESC expressed genes are turned off in the progenitors/stem cells followed by a further down-regulation in adult tissues. Concomitantly, genes specific to the target tissue are up-regulated toward mature cells of skin or blood.
One's present repertoire of antibodies encodes the history of one's past immunological experience. Can the present autoantibody repertoire be consulted to predict resistance or susceptibility to the future development of an autoimmune disease? Here, we developed an antigen microarray chip and used bioinformatic analysis to study a model of type 1 diabetes developing in nonobese diabetic male mice in which the disease was accelerated and synchronized by exposing the mice to cyclophosphamide at 4 weeks of age. We obtained sera from 19 individual mice, treated the mice to induce cyclophosphamide-accelerated diabetes (CAD), and found, as expected, that 9 mice became severely diabetic, whereas 10 mice permanently resisted diabetes. We again obtained serum from each mouse after CAD induction. We then analyzed, by using rank-order and superparamagnetic clustering, the patterns of antibodies in individual mice to 266 different antigens spotted on the chip. A selected panel of 27 different antigens (10% of the array) revealed a pattern of IgG antibody reactivity in the pre-CAD sera that discriminated between the mice resistant or susceptible to CAD with 100% sensitivity and 82% specificity (P = 0.017). Surprisingly, the set of IgG antibodies that was informative before CAD induction did not separate the resistant and susceptible groups after the onset of CAD; new antigens became critical for post-CAD repertoire discrimination. Thus, at least for a model disease, present antibody repertoires can predict future disease, predictive and diagnostic repertoires can differ, and decisive information about immune system behavior can be mined by bioinformatic technology. Repertoires matter.
Using DNA microarray and cluster analysis of expressed genes in a cloned line (M1-t-p53) of myeloid leukemic cells, we have analyzed the expression of genes that are preferentially expressed in different normal tissues. Clustering of 547 highly expressed genes in these leukemic cells showed 38 genes preferentially expressed in normal hematopoietic tissues and 122 other genes preferentially expressed in different normal nonhematopoietic tissues, including neuronal tissues, muscle, liver, and testis. We have also analyzed the genes whose expression in the leukemic cells changed after activation of WT p53 and treatment with the cytokine IL-6 or the calcium mobilizer thapsigargin. Of 620 such genes in the leukemic cells that were differentially expressed in normal tissues, clustering showed 80 genes that were preferentially expressed in hematopoietic tissues and 132 genes in different normal nonhematopoietic tissues that also included neuronal tissues, muscle, liver, and testis. Activation of p53 and treatment with IL-6 or thapsigargin induced different changes in the genes preferentially expressed in these normal tissues. These myeloid leukemic cells thus express genes that are expressed in normal nonhematopoietic tissues, and various treatments can reprogram these cells to induce other such nonhematopoietic genes. The results indicate that these leukemic cells share with normal hematopoietic stem cells the plasticity of differentiation to different cell types. It is suggested that this reprogramming to induce in malignant cells genes that are expressed in different normal tissues may be of clinical value in therapy.
We study the low-temperature spin-glass phases of the Sherrington-Kirkpatrick (SK) model and of the 3-dimensional short-range Ising spin-glass (3DISG). By using clustering to focus on the relevant parts of phase space and reduce finite size effects, we found that for the SK model ultrametricity becomes clearer as the system size increases, while for the short-range case our results indicate the opposite, i.e., lack of ultrametricity. Another method, which does not rely on clustering, indicates that the mean-field solution works for the SK model but does not apply in detail to the 3DISG, for which stochastic stability is also violated.
To gain insight into the transformation of epidermal cells into squamous carcinoma cells (SCC), we compared the response to ultraviolet B radiation (UVB) of normal human epidermal keratinocytes (NHEK) versus their transformed counterpart, SCC, using biological and molecular profiling. DNA microarray analyses (Affymetrix®, ~12,000 genes) indicated that the major group of upregulated genes in keratinocytes fall into three categories: (i) antiapoptotic and cell survival factors, including chemokines of the CXC/CC subfamilies (e.g. IL-8, GRO-1, -2, -3, SCYA20), growth factors (e.g. HB-EGF, CTGF, INSL-4), and proinflammatory mediators (e.g. COX-2, S100A9), (ii) DNA repair-related genes (e.g. GADD45, ERCC, BTG-1, Histones), and (iii) ECM proteases (MMP-1, -10). The major downregulated genes are DeltaNp63 and PUMILIO, two potential markers for the maintenance of keratinocyte stem cells. NHEK were found to be more resistant than SCC to UVB-induced apoptosis and this resistance was mainly because of the protection from cell death by secreted survival factors, since it can be transferred from NHEK to SCC cultures by the conditioned medium. Whereas the response of keratinocytes to UVB involved regulation of key checkpoint genes (p53, MDM2, p21(Cip1), DeltaNp63), as well as antiapoptotic and DNA repair-related genes, no or little regulation of these genes was observed in SCC. The effect of UVB on NHEK and SCC resulted in upregulation of 251 and 127 genes, respectively, and downregulation of 322 genes in NHEK and 117 genes in SCC. To further analyse these changes, we used a novel unsupervised coupled two-way clustering method that allowed the identification of groups of genes that clearly partitioned keratinocytes from SCC, including a group of genes whose constitutive expression levels were similar before UVB. This allowed the identification of discriminating genes not otherwise revealed by simple static comparison in the absence of UVB irradiation. The implicati
The development of targeted treatment strategies adapted to individual patients requires identification of the different tumor classes according to their biology and prognosis. We focus here on the molecular aspects underlying these differences, in terms of sets of genes that control pathogenesis of the different subtypes of astrocytic glioma. By performing cDNA-array analysis of 53 patient biopsies, comprising low-grade astrocytoma, secondary glioblastoma (respective recurrent high-grade tumors), and newly diagnosed primary glioblastoma, we demonstrate that human gliomas can be differentiated according to their gene expression. We found that low-grade astrocytoma have the most specific and similar expression profiles, whereas primary glioblastoma exhibit much larger variation between tumors. Secondary glioblastoma display features of both other groups. We identified several sets of genes with relatively highly correlated expression within groups that: (a) can be associated with specific biological functions; and (b) effectively differentiate tumor class. One prominent gene cluster discriminating primary versus nonprimary glioblastoma comprises mostly genes involved in angiogenesis, including VEGF and fms-related tyrosine kinase 1, but also IGFBP2, which has not yet been directly linked to angiogenesis. In situ hybridization demonstrating coexpression of IGFBP2 and VEGF in pseudopalisading cells surrounding tumor necrosis provided further evidence for a possible involvement of IGFBP2 in angiogenesis. The separating groups of genes were found by the unsupervised coupled two-way clustering method, and their classification power was validated by a supervised construction of a nearly perfect glioma classifier.
The ALL-1 gene is directly involved in 5-10% of acute lymphoblastic leukemias (ALLs) and acute myeloid leukemias (AMLs) by fusion to other genes or through internal rearrangements. DNA microarrays were used to determine expression profiles of ALLs and AMLs with ALL-1 rearrangements. These profiles distinguish those tumors from other ALLs and AMLs. The expression patterns of ALL-1-associated tumors, in particular ALLs, involve oncogenes, tumor suppressors, antiapoptotic genes, drug-resistance genes, etc., and correlate with the aggressive nature of the tumors. The genes whose expression differentiates between ALLs with and without ALL-1 rearrangement were further divided into several groups, enabling separation of ALL-1-associated ALLs into two subclasses. One of the groups included 43 genes that exhibited expression profiles closely linked to ALLs with ALL-1 rearrangements. Further, there were evident differences between the expression profiles of AMLs in which ALL-1 had undergone fusion to other genes and AMLs with partial duplication of ALL-1. The extensive analysis described here pinpointed genes that might have a direct role in pathogenesis.
The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers, which come in the form of a table of several thousand rows (one for each gene) and 50-100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what gene expression is, and how it is measured by DNA chips. Next I explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments, and present results obtained from analysis of data from colon cancer, brain tumors and breast cancer.
Using DNA microarray and clustering of expressed genes we have analyzed the mechanism of inhibition of wild-type p53-induced apoptosis by the cytokine interleukin 6 (IL-6) and the calcium mobilizer thapsigargin (TG). Clustering analysis of 1,786 genes, the expression level of which changed after activation of wild-type p53 in the absence or presence of IL-6 or TG, showed that these compounds did not cause a general inhibition of the ability of p53 to up-regulate or down-regulate gene expression. Expression of various p53 targets implicated as mediators of p53-induced apoptosis was also not affected by IL-6 or TG. These compounds thus can bypass the effect of wild-type p53 on gene expression and inhibit apoptosis. IL-6 and TG activated different p53-independent pathways of gene expression that include up-regulation of antiapoptotic genes. IL-6 and TG also activated different differentiation-associated genes. The ability of compounds such as cytokines and calcium mobilizers to inhibit p53-mediated apoptosis without generally inhibiting gene expression regulated by p53 can facilitate tumor development and tumor resistance to radiation and chemotherapy in cells that retain wild-type p53.
A novel data set, GeneNote (Gene Normal Tissue Expression), was produced to portray complete gene expression profiles in healthy human tissues using the Affymetrix GeneChip HG-U95 set, which includes 62 839 probe-sets. The hybridization intensities of two replicates were processed and analyzed to yield the complete transcriptome for twelve human tissues. Abundant novel information on tissue specificity provides a baseline for past and future expression studies related to diseases. The data is posted in GeneNote (http://genecards.weizmann.ac.il/genenote/), a widely used compendium of human genes (http://bioinfo.weizmann.ac.il/genecards). (C) 2003 Academie des sciences. Published by Elsevier SAS. All rights reserved.
Clustering gene expression data by exploiting phase transitions in granular ferromagnets requires transforming the data to a granular substrate. We present a method using the recently introduced homogeneity order parameter Lambda [H. Agrawal, Phys. Rev. Lett. 89, 268702 (2002)] for optimizing the parameter controlling the "granularity" and thus the stability of partitions. The model substrates obtained for gene expression data have a highly granular structure. We explore properties of phase transition in high q ferromagnetic Potts models on these substrates and show that the maximum of the width of superparamagnetic domain, corresponding to maximally stable partitions, coincides with the minimum of Lambda.
We present and review coupled two-way clustering, a method designed to mine gene expression data. The method identifies submatrices of the total expression matrix, whose clustering analysis reveals partitions of samples (and genes) into biologically relevant classes. We demonstrate, on data from colon and breast cancer, that we are able to identify partitions that elude standard clustering analysis.
Informatic methodologies are being applied successfully to analyze the complexity of the genome. But beyond the genome, the immune system reflects the state of the body in health and disease. Traditionally, immunologists have reduced the immune system, where possible, to one-to-one relationships between particular antigens and particular antibodies or T-cell clones. Autoimmune diseases, caused by an immune attack against a body component, are usually investigated by following the response to single self-antigens. In this study, we apply informatics to analyze patterns of autoantibodies rather than single species of autoantibodies. This study was designed not to replace traditional approaches to immune diagnosis, but to test whether meaningful patterns of autoantibodies might exist. Using an unbiased solid-phase ELISA antibody test, we detected serum IgG and IgM antibodies in the sera of 20 healthy persons and 20 persons with type I diabetes mellitus binding to an array of 87 different antigens, mostly self-antigens. The healthy subjects manifested autoantibodies to a variety of self-antigens, many known to be associated with autoimmune diseases. We investigated the patterns of these autoantibodies using a coupled two-way clustering algorithm developed for analyzing data from gene arrays. We now report that the reactivity patterns of autoantibodies to particular subsets of self-antigens exhibited non-trivial structure, which significantly discriminated between healthy persons and persons with type I diabetes. The results show that despite the wide prevalence of autoantibodies, the patterns of reactivity to defined subsets of self-antigens can provide information about the state of the body. (C) 2003 Elsevier Ltd. All rights reserved.
We present an automated procedure to assign CATH and SCOP classifications to proteins whose FSSP score is available. CATH classification is assigned down to the topology level, and SCOP classification is assigned to the fold level. Because the FSSP database is updated weekly, this method makes it possible to update also CATH and SCOP with the same frequency. Our predictions have a nearly perfect success rate when ambiguous cases are discarded. These ambiguous cases are intrinsic in any protein structure classification that relies on structural information alone. Hence, we introduce the "twilight zone for structure classification." We further suggest that to resolve these ambiguous cases, other criteria of classification, based also on information about sequence and function, must be used. (C) 2002 Wiley-Liss, Inc.
We study the statistical properties of contact vectors, a construct to characterize a protein's structure. The contact vector of an N-residue protein is a list of N integers n(i), representing the number of residues in contact with residue i. We study analytically (at mean-field level) and numerically the amount of structural information contained in a contact vector. Analytical calculations reveal that a large variance in the contact numbers reduces the degeneracy of the mapping between contact vectors and structures. Exact enumeration for lengths up to N=16 on the three-dimensional cubic lattice indicates that the growth rate of number of contact vectors as a function of N is only 3% less than that for contact maps. In particular, for compact structures we present numerical evidence that, practically, each contact vector corresponds to only a handful of structures. We discuss how this information can be used for better structure prediction.
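The contact vector defined in the abstract above can be illustrated with a minimal sketch. The contact criterion is an assumption made here for illustration: two residues are taken to be in contact when they are at least `min_separation` positions apart along the chain and within a cutoff distance `r_c`. The function name and both parameters are hypothetical and are not taken from the paper.

```python
from math import dist

def contact_vector(coords, r_c=1.0, min_separation=2):
    """Return the list n(i): number of residues in contact with residue i.

    Assumed contact rule (illustrative only): residues i and j are in
    contact if |i - j| >= min_separation and their distance is <= r_c.
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            # small tolerance so lattice neighbours at distance exactly r_c count
            if dist(coords[i], coords[j]) <= r_c + 1e-9:
                counts[i] += 1
                counts[j] += 1
    return counts

# Four residues on a unit square: only the two chain ends touch.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(contact_vector(square))
```

Summing a row of the contact map gives the same numbers, which is why the contact vector carries less information than the map itself.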
DNA chips are novel experimental tools that have revolutionized research in molecular biology and generated considerable excitement. A single chip allows simultaneous measurement of the level at which thousands of genes are expressed. A typical experiment uses a few tens of such chips, each focusing on one sample such as material extracted from a particular tumor. Hence the results of such an experiment contain several hundred thousand numbers, that come in the form of a table, of several thousand rows (one for each gene) and 50 - 100 columns (one for each sample). We developed a clustering methodology to mine such data. I provide here a very basic introduction to the subject, with no prior knowledge of any biology assumed. I will explain what genes are, what is gene expression and how it is measured by DNA chips. I will also explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments. I will present results obtained from analysis of data obtained from brain tumors and breast cancer.
We introduce and study an artificial neural network inspired by the probabilistic receptor affinity distribution model of olfaction. Our system consists of N sensory neurons whose outputs converge on a single processing linear threshold element. The system's aim is to model discrimination of a single target odorant from a large number p of background odorants within a range of odorant concentrations. We show that this is possible provided p does not exceed a critical value p(c) and calculate the critical capacity a(c) = p(c)/N. The critical capacity depends on the range of concentrations in which the discrimination is to be accomplished. If the olfactory bulb may be thought of as a collection of such processing elements, each responsible for the discrimination of a single odorant, our study provides a quantitative analysis of the potential computational properties of the olfactory bulb. The mathematical formulation of the problem we consider is one of determining the capacity for linear separability of continuous curves, embedded in a large-dimensional space. This is accomplished here by a numerical study, using a method that signals whether the discrimination task is realizable, together with a finite-size scaling analysis.
We introduce a method for validation of results obtained by clustering analysis of data. The method is based on resampling the available data. A figure of merit that measures the stability of clustering solutions against resampling is introduced. Clusters that are stable against resampling give rise to local maxima of this figure of merit. This is presented first for a one-dimensional data set, for which an analytic approximation for the figure of merit is derived and compared with numerical measurements. Next, the applicability of the method is demonstrated for higher-dimensional data, including gene microarray expression data.
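The resampling idea in the abstract above can be sketched for one-dimensional data. Everything specific in this sketch is an assumption: the abstract does not define the figure of merit, so the pairwise co-membership agreement used here, the gap-based clustering rule, and the names `gap_clusters` and `stability` are illustrative stand-ins rather than the authors' definitions.

```python
import random
from itertools import combinations

def gap_clusters(points, gap):
    """Label 1-D points; a new cluster starts where sorted neighbours differ by > gap."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    labels = [0] * len(points)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if points[nxt] - points[prev] > gap:
            current += 1
        labels[nxt] = current
    return labels

def stability(points, gap, n_resamples=20, fraction=0.7, seed=0):
    """Assumed figure of merit: mean fraction of point pairs whose
    'same cluster / different cluster' status agrees between the
    full-data clustering and the clustering of a random subsample."""
    rng = random.Random(seed)
    full = gap_clusters(points, gap)
    size = max(2, int(fraction * len(points)))
    scores = []
    for _ in range(n_resamples):
        subset = rng.sample(range(len(points)), size)
        sub = gap_clusters([points[i] for i in subset], gap)
        agree = total = 0
        for a, b in combinations(range(size), 2):
            same_full = full[subset[a]] == full[subset[b]]
            same_sub = sub[a] == sub[b]
            agree += same_full == same_sub
            total += 1
        scores.append(agree / total)
    return sum(scores) / len(scores)

two_groups = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
print(stability(two_groups, gap=1.0))
```

Scanning `gap` (the resolution parameter in this sketch) and looking for values where the score peaks mirrors the abstract's statement that stable clusters give rise to local maxima of the figure of merit.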
We discovered that the distribution of dividends in Korean horse races follows a power law. A simple model of betting is proposed, which reproduces the observed distribution. The model provides a mechanism to arrive at the true underlying winning probabilities, which are initially unknown, in a self-organized collective fashion, through the dynamic process of betting. Numerical simulations yield excellent agreement with the empirical data.
We study in d = 3 dimensions the short-range Ising spin glass with J(ij) = +/-1 couplings and periodic boundary conditions at T = 0. We show that the overlap distribution is non-trivial in the limit of large system size.
Unbiased samples of ground states were generated for the short-range Ising spin glass with J(ij) = +/-1, in three dimensions. Clustering the ground states revealed their hierarchical structure, which is explained by correlated spin domains, serving as cores for macroscopic zero energy "excitations."
We generate equilibrium configurations for the three- and four-dimensional Ising spin glass with Gaussian distributed couplings at temperatures well below the transition temperature T(c). These states are analyzed by a recently proposed method using clustering. The analysis reveals a hierarchical state space structure. At each level of the hierarchy states are labeled by the orientations of a set of correlated macroscopic spin domains. Our picture of the low temperature phase of short-range spin glasses is that of a state hierarchy induced by correlated spin domains (SHICS). The complexity of the low temperature phase is manifest in the fact that the composition of such a spin domain (i.e., its constituent spins), as well as its identifying label, are defined and determined by the "location" in the state hierarchy at which it appears. Mapping out the phase space structure by means of the orientations assumed by these domains enhances our ability to investigate the overlap distribution, which we find to be nontrivial. Evidence is also presented that these states may have a nonultrametric structure.
The transcriptional program regulated by the tumor suppressor p53 was analysed using oligonucleotide microarrays. A human lung cancer cell line that expresses the temperature-sensitive murine p53 was utilized to quantitate mRNA levels of various genes at different time points after shifting the temperature to 32 °C. Inhibition of protein synthesis by cycloheximide (CHX) was used to distinguish between primary and secondary target genes regulated by p53. In the absence of CHX, 259 and 125 genes were up- or down-regulated, respectively; only 38 and 24 of these genes were up- and down-regulated by p53 also in the presence of CHX and are considered primary targets in this cell line. Cluster analysis of these data using the super-paramagnetic clustering (SPC) algorithm demonstrates that the primary genes can be distinguished as a single cluster among a large pool of p53-regulated genes. This procedure identified additional genes that co-cluster with the primary targets and can also be classified as such genes. In addition to cell cycle (e.g., p21, TGF-beta, Cyclin E) and apoptosis (e.g., Fas, Bak, IAP) related genes, the primary targets of p53 include genes involved in many aspects of cell function, including cell adhesion (e.g., Thymosin, Smoothelin), signaling (e.g., H-Ras, Diacylglycerol kinase), transcription (e.g., ATF3, LISCH7), neuronal growth (e.g., Ninjurin, NSCL2) and DNA repair (e.g., BTG2, DDB2). The results suggest that p53 activates concerted opposing signals and exerts its effect through a diverse network of transcriptional changes that collectively alter the cell phenotype in response to stress.
High-density DNA arrays, used to monitor gene expression at a genomic scale, have produced vast amounts of information which require the development of efficient computational methods to analyze them. The important first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of a novel clustering algorithm, super-paramagnetic clustering (SPC), to the analysis of gene expression profiles that were generated recently during a study of the yeast cell cycle. SPC was used to organize genes into biologically relevant clusters that are suggestive of their co-regulation. Some of the advantages of SPC are its robustness against noise and initialization, a clear signature of cluster formation and splitting, and an unsupervised self-organized determination of the number of clusters at each resolution. Our analysis revealed interesting correlated behavior of several groups of genes which has not been previously identified. (C) 2000 Elsevier Science B.V. All rights reserved.
A simple model for flowing sand on an inclined plane is introduced. The model is related to recent experiments by Douady and Daerr and reproduces some of the experimentally observed features. Avalanches of intermediate size appear to be compact, placing the critical behavior of the model into the universality class of compact directed percolation. On very large scales, however, the avalanches break up into several branches, leading to a crossover from compact to ordinary directed percolation. Thus, systems of flowing granular matter on an inclined plane could serve as a first physical realization of directed percolation.
Two methods were proposed recently to derive energy parameters from known native protein conformations and corresponding sets of decoys. One is based on finding, by means of a perceptron learning scheme, energy parameters such that the native conformations have lower energies than the decoys. The second method maximizes the difference between the native energy and the average energy of the decoys, measured in terms of the width of the decoys' energy distribution (Z-score). Whereas the perceptron method is sensitive mainly to "outlier" (i.e., extremal) decoys, the Z-score optimization is governed by the high density regions in decoy-space. We compare the two methods by deriving contact energies for two very different sets of decoys: the first obtained for model lattice proteins and the second by threading. We find that the potentials derived by the two methods are of similar quality and fairly closely related. This finding indicates that standard, naturally occurring sets of decoys are distributed in a way that yields robust energy parameters (that are quite insensitive to the particular method used to derive them). The main practical implication of this finding is that it is not necessary to fine-tune the potential search method to the particular set of decoys used. Proteins 2000;41:192-201. (C) 2000 Wiley-Liss, Inc.
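The two criteria compared in the abstract above can be written compactly. The notation is introduced here for illustration and is not taken from the paper: $E_{\mathrm{nat}}$ is the native energy, $E_\mu$ the energy of decoy $\mu$, and $\langle\cdot\rangle_{\mathrm{dec}}$ an average over the decoy set.

```latex
% Perceptron criterion: the native state lies below every decoy.
E_{\mathrm{nat}} < E_{\mu} \quad \text{for every decoy } \mu .

% Z-score: gap between the native energy and the mean decoy energy,
% measured in units of the width of the decoy energy distribution.
Z = \frac{E_{\mathrm{nat}} - \langle E \rangle_{\mathrm{dec}}}{\sigma_{\mathrm{dec}}},
\qquad
\sigma_{\mathrm{dec}} = \sqrt{\langle E^{2} \rangle_{\mathrm{dec}} - \langle E \rangle_{\mathrm{dec}}^{2}} .
```

With this sign convention a good potential makes $Z$ large and negative. The first condition is set by the lowest-lying decoys, whereas $Z$ depends only on the mean and width of the decoy distribution, consistent with the abstract's remark about outliers versus high-density regions.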
We present a method to derive contact energy parameters from large sets of proteins. The basic requirement on which our method is based is that for each protein in the database the native contact map has lower energy than all its decoy conformations that are obtained by threading. Only when this condition is satisfied one can use the proposed energy function for fold identification. Such a set of parameters can be found (by perceptron learning) if M-p, the number of proteins in the database, is not too large. Other aspects that influence the existence of such a solution are the exact definition of contact and the value of the critical distance R-c, below which two residues are considered to be in contact. Another important novel feature of our approach is its ability to determine whether an energy function of some suitable proposed form can or cannot be parameterized in a way that satisfies our basic requirement. As a demonstration of this, we determine the region in the (R-c, M-p) plane in which the problem is solvable, i.e., we can find a set of contact parameters that stabilize simultaneously all the native conformations. We show that for large enough databases the contact approximation to the energy cannot stabilize all the native folds even against the decoys obtained by gapless threading. Proteins 2000;38:134-148. (C) 2000 Wiley-Liss, Inc.
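The requirement stated in the abstract above can be sketched as a perceptron problem. The sketch assumes a contact energy that is linear in its parameters, E = w · c, where c counts the contacts of each residue-pair type in a map; each training example is then the difference c(decoy) - c(native), and the requirement becomes w · x > 0 for every example. This is the plain textbook perceptron update, not necessarily the variant used by the authors, and the function name and toy data are hypothetical.

```python
def perceptron(examples, max_sweeps=1000, eta=1.0):
    """Search for weights w with w . x > 0 for every example x.

    Each x is assumed to be a contact-count difference (decoy minus
    native), so w . x > 0 means the native map has the lower energy.
    Returns w, or None if no solution is found within max_sweeps
    (which, for a finite sweep budget, only suggests unsolvability).
    """
    dim = len(examples[0])
    w = [0.0] * dim
    for _ in range(max_sweeps):
        updated = False
        for x in examples:
            if sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                # misclassified example: move w toward x
                w = [wi + eta * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:
            return w
    return None

# Toy difference vectors for a hypothetical two-parameter energy.
solvable = [(1, -1), (2, 1), (0, 1)]
unsolvable = [(1, 0), (-1, 0)]   # contradictory requirements
print(perceptron(solvable), perceptron(unsolvable))
```

Repeating such a test over a grid of cutoff distances and database sizes is one way to picture the solvable region in the (R-c, M-p) plane that the abstract describes.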
We analyzed several energy functions for predicting the native state of proteins from an energy minimization procedure. We derived the parameters of a given energy function by imposing the basic requirement that the energy of the native conformation of a protein is lower than that of any conformation chosen from a set of decoys. Our work is motivated by a recent result which proved that the simple pairwise contact approximation of the energy is insufficient to satisfy simultaneously such a basic requirement for all the proteins in a database. Here, we investigate the reasons for such negative results and show how to improve the predictive power of methods based on energy minimization. We generated decoys by gapless threading, and we derived energy parameters by perceptron learning. We first considered hydrophobic contributions to the energy, defined in several ways, and showed that the additional hydrophobic terms enlarge slightly the number of proteins that can be stabilized together. Next, we performed various modifications of the pairwise energy term. We introduced (1) a distinction between inter-residue contacts on the surface and in the core of a protein and (2) a simple distance-dependent pairwise interaction in which a two-tier definition of contact replaces the original (single-tier) one. Our results suggest that a detailed treatment of the pairwise potential is likely to be more relevant than the consideration of other forces. (C) 2000 Wiley-Liss, Inc.
We represent a protein's structure by its contact map. Our aim is to identify the unknown fold of a known sequence by minimizing a (free) energy defined in the space of contact maps. To this end, we developed an efficient method to search this space and to generate low energy maps that are also physical. We proved that the standard pairwise approximation to the free energy is unable to stabilize the native fold of a single protein against a set of carefully generated decoys. (C) 2000 Elsevier Science B.V. All rights reserved.
We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.
We introduce and investigate a simple model to describe recent experiments by Douady and Daerr on flowing sand. The model reproduces experimentally observed compact avalanches, whose opening angle decreases linearly as a threshold is approached. On large scales the model exhibits a crossover from compact directed percolation to directed percolation; we predict similar behavior for the experimental system. We estimate the regime where "true" directed percolation morphology and exponents will be observed, providing the first experimental realization for this class of models.
Changing a few contacts in a contact map corresponds to a large-scale move in conformation space; hence, one gains a lot by using the contact map representation for protein folding. We developed an efficient search procedure in the space of physical contact maps, which could identify the native fold as the map of lowest free energy, provided one had a free energy function whose ground state is the native map. We prove rigorously that the widely used pairwise contact approximation to the free energy cannot stabilize even a single protein's native map. Testing the native map against a set of decoys obtained by gapless threading, one may be misled to the opposite conclusion. [S0031-9007(98)08231-3].
A contact map is a simple representation of the structure of proteins and other chainlike macromolecules. This representation is quite amenable to numerical studies of folding. We show that the number of contact maps corresponding to the possible configurations of a polypeptide chain of N amino acids, represented by (N - 1)-step self-avoiding walks on a lattice, grows exponentially with N for all dimensions D > 1. We carry out exact enumerations in D = 2 on the square and triangular lattices for walks of up to 20 steps and investigate various statistical properties of contact maps corresponding to such walks. We also study the exact statistics of contact maps generated by walks on a ladder. [S1063-651X(99)10101-6].
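The exact enumeration described in the abstract above can be sketched for the square lattice. The contact rule is an assumption made for illustration: two residues are in contact when they occupy nearest-neighbour lattice sites and are not consecutive along the chain. The function name is hypothetical, and this brute-force recursion is practical only for short walks, well below the 20 steps reported in the abstract.

```python
def enumerate_contact_maps(n_steps):
    """Return (number of self-avoiding walks, number of distinct contact maps)
    for n_steps-step walks on the square lattice starting at the origin."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    maps = set()
    n_walks = 0

    def contact_map(path):
        # map each occupied site to its index along the chain
        pos = {p: i for i, p in enumerate(path)}
        contacts = set()
        for i, (x, y) in enumerate(path):
            for dx, dy in moves:
                j = pos.get((x + dx, y + dy))
                if j is not None and j > i + 1:  # skip chain neighbours
                    contacts.add((i, j))
        return frozenset(contacts)

    def extend(path, occupied):
        nonlocal n_walks
        if len(path) == n_steps + 1:
            n_walks += 1
            maps.add(contact_map(path))
            return
        x, y = path[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return n_walks, len(maps)

for steps in range(2, 7):
    print(steps, enumerate_contact_maps(steps))
```

Because many walks share one contact map, the second number grows more slowly than the first; comparing the two growth rates is the kind of statistic the abstract refers to.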
Clustering is an important technique in exploratory data analysis, with applications in image processing, object classification, target recognition, data mining etc. The aim is to partition data according to natural classes present in it, assigning data points that are "more similar" to the same "cluster". We solved this ill-posed problem without making any assumptions about the structure of the data, by using a physical system as an analog computer. The physical system we use is a disordered (granular) magnet. The method was tested successfully on a variety of artificial and real-life problems, such as classification of flowers, processing of satellite images, speech recognition and identification of textures and images. We are currently involved in several collaborations, applying the method to image classification, fMRI data analysis and classification of protein structures.
The aim of clustering is to partition data according to natural classes present in it. We proposed recently a method that makes no explicit assumption about the structure of the data and under very general and natural assumptions solves the clustering problem by evaluating thermal properties of a disordered (granular) magnet. The method was tested successfully on a variety of artificial and real-life problems; here we emphasize its application to analyze results obtained by a novel method of computer vision. The combination of these two techniques provides a powerful tool that succeeded to cluster properly 90 images of 6 objects on the basis of their pairwise dissimilarities. These dissimilarities, which constitute a highly non-metric set of pairwise distances between the images, form the input for clustering. A hierarchical organization of the images that agrees with human intuition, was obtained without assigning to the images coordinates in some abstract space. (C) 1999 Published by Elsevier Science B.V. All rights reserved.
We studied the possibility to approximate a Lennard-Jones interaction by a pairwise contact potential. First we used a Lennard-Jones potential to design off-lattice, protein-like heteropolymer sequences, whose lowest energy (native) conformations were then identified by molecular dynamics. Then we turned to investigate whether one can find a pairwise contact potential, whose ground states are the contact maps associated with these native conformations. We show that such a requirement cannot be satisfied exactly, i.e., no such contact parameters exist. Nevertheless, we found that one can find contact energy parameters for which an energy minimization procedure, acting in the space of contact maps, yields maps whose corresponding structures are close to the native ones. Finally, we show that when these structures are used as the initial point of a molecular dynamics energy minimization process, the correct native folds are recovered with high probability. Proteins 1999;37:544-553. (C) 1999 Wiley-Liss, Inc.
Damage spreading for Ising cluster dynamics is investigated numerically by using random numbers in a way that conforms with the notion of submitting the two evolving replicas to the same thermal noise. Two damage spreading transitions are found; damage does not spread either at low or high temperatures. We determine some critical exponents at the high-temperature transition point, which seem consistent with directed percolation.
We present a systematic quasi-mean-field model of the Ostwald ripening process in two dimensions. Our approach yields a set of dynamic equations for the temporal evolution of the minority phase droplets' radii. The equations contain only pairwise interactions between the droplets; these interactions are evaluated in a mean-field-type manner. We proceed to solve numerically the dynamic equations for systems of tens of thousands of interacting droplets. The numerical results are compared with the experimental data obtained by Krichevsky and Stavans [Phys. Rev. Lett. 70, 1473 (1993); Phys. Rev. E 52, 1818 (1995)] for the relatively large volume fraction phi=0.13. We found good agreement with experiment even for various correlation functions. [S1063-651X(98)04402-X].
Ostwald ripening is the last stage of the evolution of a system with two coexisting phases. It is a relatively simple nonequilibrium phenomenon with several interesting features. For example, as the system coarsens it goes through a scaling state, one which looks the same (up to an overall length scale, which grows) at all times. The dynamics of the problem can be mapped, in two dimensions, onto an evolving Coulomb system. In this work we present a brief summary of a novel theoretical approach to this problem, based on an analytic derivation (using a mean-field approach) of an effective two-body interaction between droplets of the minority phase. The resulting interacting many-body dynamics is solved by a very efficient numerical algorithm, allowing us to follow the evolution of more than 10^6 droplets on a simple workstation. The results are in excellent agreement with recent experiments.
The physical aspects of a recently introduced method for data clustering are considered in detail. This method is based on an inhomogeneous Potts model; no assumption concerning the underlying distribution of the data is made. A Potts spin is assigned to each data point and short range interactions between neighboring points are introduced. Spin-spin correlations (measured by Monte Carlo computations) serve to partition the data points into clusters. In this paper we examine the effects of varying different details of the method such as the definition of neighbors, the type of interaction, and the number of Potts states q. In addition, we present and solve a granular mean field Potts model relevant to the clustering method. The model consists of strongly coupled groups of spins coupled to noise spins, which are themselves weakly coupled. The phase diagram is computed by solving the model analytically in various limits. Our main result is that in the range of parameters of interest the existence of the superparamagnetic phase is independent of the ordering process of the noise spins. Next we use the known properties of regular and inhomogeneous Potts models in finite dimensions to discuss the performance of the clustering method. In particular, the spatial resolution of the clustering method is argued to be connected to the correlation length of spin fluctuations. The behavior of the method, as more and more data points are sampled, is also investigated.
We simulated site-dilute Ising models in d = 3 dimensions for several lattice sizes L. For each L, singular thermodynamic quantities X were measured at criticality and their distributions P(X) were determined. For $L \to \infty$ the relative width of P(X) tends to a universal constant: there is no self-averaging. The width of the distribution of the sample ($i$) dependent pseudocritical temperatures $T_c(i,L)$ scales as $\delta T_c(L) \sim L^{-1/\nu}$ and not as $\sim L^{-d/2}$. The sample dependence of $X_i(T,L)$ enters dominantly, but not exclusively, via $T_c(i,L)$.
Clustering is an important technique in exploratory data analysis, with applications in image processing, object classification, target recognition, data mining etc. The aim is to partition data according to natural classes present in it, assigning data points that are "more similar" to the same "cluster". We solved this ill-posed problem without making any assumptions about the structure of the data, by using a physical system as an analog computer. The physical system we use is a disordered (granular) magnet. The method was tested successfully on a variety of artificial and real-life problems, such as classification of flowers, processing of satellite images, speech recognition and identification of textures and images.
We discuss the results of our attempt to predict the folded state of proteins by using the contact map representation. In order to use contact maps, one has to solve a few methodological problems. First, we propose an efficient way to explore the space of contact maps to generate creditable candidates for the native state. Second, we introduce a procedure to ensure that the generated maps are physical. Third, and most important, we present a method to derive contact energy parameters based on perceptron learning. The energy function must be able to discriminate between the native state and a set of decoys, simultaneously for all the proteins in a given database. We show that such an energy function exists if the candidates are produced by gapless threading, for a database of 153 proteins. If, however, we use as decoys maps of low energy that are generated by our dynamical scheme, not even one single protein can be stabilized. We conclude by discussing the perspectives of our approach.
We evaluate by Monte Carlo simulations various singular thermodynamic quantities X for ensembles of quenched random Ising and Ashkin-Teller models. The measurements are taken at $T_c$ and we study how the distributions P(X) (and, in particular, their relative squared width, $R_X$) over the ensemble depend on the system size l. The Ashkin-Teller model was studied in the regime where bond randomness is irrelevant and we found weak self-averaging; $R_X \sim l^{\alpha/\nu} \to 0$, where alpha
Background: Two problems are of major importance in protein fold prediction: how to generate plausible conformations, and how to choose an energy function to identify the native state. Contact maps are a simple representation of protein structure and offer a promising framework to address these two issues. Results: In this work we develop Monte Carlo dynamics in contact map space. The procedure is divided into four steps: non-local dynamics, in which large-scale 'cluster' moves are performed (clusters are in approximate correspondence with secondary structure elements); local dynamics, in which secondary structure location is optimized; reconstruction, in which the physicality of the contact map is restored; and refinement, which consists of a further Monte Carlo energy minimization in real space. We demonstrate that such a dynamical procedure is effective in producing uncorrelated low-energy states. Conclusions: The procedure introduced in this paper very effectively generates a representative ensemble of conformations. We are able to show that existing sets of pairwise contact energy parameters are not suitable to single out the native state within this ensemble. The remaining outstanding issue in protein folding is to find an energy function that can discriminate the native state from decoys.
We demonstrate that pairwise contact potentials alone cannot be used to predict the native fold of a protein. Ideally, one would hope that a universal energy function exists, for which the native folds of all proteins are the respective ground states. Here we pose a much more restricted question: Is it possible to find a set of contact parameters for which the energy of the native contact map of a single protein (crambin) is lower than that of all possible physically realizable decoy maps? The set of maps we used was derived by energy minimization (not by threading). We seek such a set of parameters by perceptron learning, a procedure which is guaranteed to find such a set if it exists. We found that it is impossible to fine-tune contact parameters that will assign all alternative conformations higher energy than that of the native map. This finding proves that there is no pairwise contact potential that can be used to fold any given protein. Inclusion of additional energy terms, such as hydrophobic (solvation), hydrogen bond, or multibody interactions may help to attain foldability within specific structural families. (C) 1998 American Institute of Physics. [S0021-9606(98)50247-4].
We present two interesting results regarding damage spreading in ferromagnetic Ising models. First, we show that a damage spreading transition can occur in an Ising chain that evolves in contact with a thermal reservoir. Damage heals at low temperature and spreads at high T. The dynamic rules for the system's evolution for which such a transition is observed are as legitimate as the conventional rules (Glauber, Metropolis, heat bath). Our second result is that such transitions are not always in the directed percolation universality class.
Background: Prediction of a protein's structure from its amino acid sequence is a key issue in molecular biology. While dynamics, performed in the space of two-dimensional contact maps, eases the necessary conformational search, it may also lead to maps that do not correspond to any real three-dimensional structure. To remedy this, an efficient procedure is needed to reconstruct three-dimensional conformations from their contact maps. Results: We present an efficient algorithm to recover the three-dimensional structure of a protein from its contact map representation. We show that when a physically realizable map is used as target, our method generates a structure whose contact map is essentially similar to the target. Furthermore, the reconstructed and original structures are similar up to the resolution of the contact map representation. Next, we use nonphysical target maps, obtained by corrupting a physical one; in this case, our method essentially recovers the underlying physical map and structure. Hence, our algorithm will help to fold proteins, using dynamics in the space of contact maps. Finally, we investigate the manner in which the quality of the recovered structure degrades when the number of contacts is reduced. Conclusions: The procedure is capable of assigning quickly and reliably a three-dimensional structure to a given contact map; it is well suited for use in parallel with dynamics in contact map space to project a contact map onto its closest physically allowed structural counterpart.
We investigate the long-time behavior of the survivors' area in the scaling state of two-dimensional soap froth. We relate this problem to the recently studied temporal decay of the fraction of Potts spins that have never been flipped till time t. The results of our topological simulations are consistent with the value theta = 1 for the scaling exponent of the survivors' areas, in agreement with a recently obtained analytical result. We find, however, that the relaxation time needed to get into the scaling regime depends on the degree of randomness in the topological rearrangements and becomes very large in the deterministic limit.
We present a general definition of damage spreading in a pair of models. Using this general framework, one can define damage spreading in an objective manner that does not depend on the particular dynamic procedure that is being used. The formalism can be used for any spin-model or cellular automaton, with sequential or parallel update rules. At this point we present its application to the Domany-Kinzel cellular automaton in one dimension, this being the simplest model in which damage spreading has been found and studied extensively. We show that the active phase of this model consists of three subphases characterized by different damage-spreading properties.
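For orientation, the conventional way of measuring damage in the Domany-Kinzel automaton can be sketched as follows. This is the common "same random numbers for both replicas" procedure, not the general definition proposed in the abstract above, which is designed precisely to avoid dependence on such a choice. The sketch uses the standard rule in which a site becomes active with probability p1 if exactly one of its two neighbours is active, p2 if both are, and never if neither is; the function names, ring geometry, and single-site initial damage are illustrative assumptions.

```python
import random

def dk_step(state, p1, p2, rnd):
    """One parallel update of the Domany-Kinzel automaton on a ring.

    rnd[i] is the random number used at site i; passing the same list to
    two replicas subjects them to identical noise.
    """
    n = len(state)
    new = [0] * n
    for i in range(n):
        active = state[i - 1] + state[(i + 1) % n]  # 0, 1 or 2 active neighbours
        prob = (0.0, p1, p2)[active]
        new[i] = 1 if rnd[i] < prob else 0
    return new

def damage_history(n_sites, p1, p2, n_steps, seed=0):
    """Hamming distance between two replicas that start one site apart."""
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(n_sites)]
    b = list(a)
    b[n_sites // 2] ^= 1  # flip one site to create the initial damage
    history = []
    for _ in range(n_steps):
        rnd = [rng.random() for _ in range(n_sites)]
        a = dk_step(a, p1, p2, rnd)
        b = dk_step(b, p1, p2, rnd)
        history.append(sum(x != y for x, y in zip(a, b)))
    return history

print(damage_history(n_sites=200, p1=0.8, p2=0.3, n_steps=20))
```

Whether the damage grows or heals in such a run depends on (p1, p2) and on how the random numbers are shared, which is the procedural ambiguity the abstract's general framework is meant to remove.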
We present a new approach to clustering, based on the physical properties of an inhomogeneous ferromagnet. No assumption is made regarding the underlying distribution of the data. We assign a Potts spin to each data point and introduce an interaction between neighboring points, whose strength is a decreasing function of the distance between the neighbors. This magnetic system exhibits three phases. At very low temperatures, it is completely ordered; all spins are aligned. At very high temperatures, the system does not exhibit any ordering, and in an intermediate regime, clusters of relatively strongly coupled spins become ordered, whereas different clusters remain uncorrelated. This intermediate phase is identified by a jump in the order parameters. The spin-spin correlation function is used to partition the spins and the corresponding data points into clusters. We demonstrate on three synthetic and three real data sets how the method works. Detailed comparison to the performance of other techniques clearly indicates the relative success of our method.
We summarize some recent developments in approximate descriptions of soap froth evolution in 2D. The questions addressed concern temporal correlations in the scaling state, characterization of transient behavior and evolution from very special initial states. We observed that for these delicate issues mean field theory fails; moreover, we found that widely accepted topological simulation methods also disagree significantly with experiments. We identified the source of these discrepancies in the manner the models select the topological rearrangement that follows a bubble's disappearance. Properly modified topological simulations do yield agreement with experiments. Our analysis allows identification of the manner in which different aspects of the microscopic dynamics affect the long time behavior of the system.
We introduce an energy function for contact maps of proteins. In addition to the standard term, which takes into account pairwise interactions between amino acids, our potential contains a new hydrophobic energy term. Parameters of the energy function were obtained from a statistical analysis of the contact maps of known structures. The quality of our energy function was tested extensively in a variety of ways. In particular, fold recognition experiments revealed that for a fixed sequence the native map is identified correctly in an overwhelming majority of the cases tested. We succeeded in identifying the structure of some proteins that are known to pose difficulties for such tests (BPTI, spectrin, and cro-protein). In addition, many known pairs of homologous structures were correctly identified, even when the two sequences had relatively low sequence homology. We also introduced a dynamic Monte Carlo procedure in the space of contact maps, taking topological and polymeric constraints into account by restrictive dynamic rules. Various aspects of protein dynamics, including high-temperature melting and refolding, were simulated. Perspectives of application of the energy function and the method for structure checking and fold prediction are discussed. (C) 1996 Wiley-Liss, Inc.
Simulations performed using a recently introduced deterministic topological model do not agree with some very recent results concerning the evolution of a single perturbed cluster. We analyze the source of the discrepancy and introduce a topological model that is in very good agreement with experiments and simulations available up to now.
The dynamic behavior of cluster algorithms is analyzed in the classical mean-field limit. Rigorous analytical results below T_c establish that the dynamic exponent has the value z_SW = 1 for the Swendsen-Wang algorithm and z_W = 0 for the Wolff algorithm. An efficient Monte Carlo implementation is introduced, adapted for using these algorithms for fully connected graphs. Extensive simulations both above and below T_c demonstrate scaling and evaluate the finite-size scaling function by means of a rather impressive collapse of the data.
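As an illustration of what a Wolff cluster update looks like on a fully connected graph, here is a naive sketch. It is not the efficient implementation introduced in the abstract, since it tests every candidate bond explicitly. It assumes the mean-field Ising Hamiltonian H = -(J/N) sum_{i<j} s_i s_j, so that the bond-activation probability is 1 - exp(-2*beta*J/N). The function name and the parameter `beta_j` (standing for beta*J) are hypothetical.

```python
import math
import random


def wolff_update_complete_graph(spins, beta_j, rng):
    """Grow and flip one Wolff cluster on a fully connected Ising model.

    `spins` is a list of +1/-1 values and is modified in place.
    Assumes couplings J/N between every pair of sites (mean-field limit).
    Returns the size of the flipped cluster.
    """
    n = len(spins)
    p_add = 1.0 - math.exp(-2.0 * beta_j / n)  # bond-activation probability
    seed = rng.randrange(n)
    seed_spin = spins[seed]
    in_cluster = [False] * n
    in_cluster[seed] = True
    stack = [seed]
    while stack:
        stack.pop()
        # On a complete graph every other site neighbors the popped site,
        # so each aligned site outside the cluster gets one bond trial.
        for j in range(n):
            if not in_cluster[j] and spins[j] == seed_spin and rng.random() < p_add:
                in_cluster[j] = True
                stack.append(j)
    size = 0
    for j in range(n):
        if in_cluster[j]:
            spins[j] = -seed_spin
            size += 1
    return size


rng = random.Random(0)
spins = [1] * 20
cluster_size = wolff_update_complete_graph(spins, beta_j=1.0, rng=rng)
```

Because each cluster site scans all N sites, one update costs O(N * cluster size) here; this cost is the kind of overhead an adapted implementation would aim to avoid.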
We present a new approach for clustering, based on the physical properties of an inhomogeneous ferromagnetic model. We do not assume any structure of the underlying distribution of the data. A Potts spin is assigned to each data point and short-range interactions between neighboring points are introduced. Spin-spin correlations, measured (by a Monte Carlo procedure) in a superparamagnetic regime in which aligned domains appear, serve to partition the data points into clusters. Our method outperforms other algorithms for toy problems as well as for real data.
We investigate an antiferromagnetic triangular Ising model with anisotropic ferromagnetic interactions between further neighbors, originally proposed by Kitatani and Oguchi [J. Phys. Soc. Jpn. 57, 1344 (1988)]. The phase diagram as a function of temperature and the ratio between first- and second-neighbor interaction strengths is thoroughly examined. We search for a Kosterlitz-Thouless transition to a state with algebraic decay of correlations, calculating the correlation lengths on strips of width up to 15 sites by transfer-matrix methods. Phenomenological renormalization, conformal invariance arguments, the Roomany-Wyld approximation, and a direct analysis of the scaled mass gaps are used. Our results provide limited evidence that a Kosterlitz-Thouless phase is present. Alternative scenarios are discussed.
We introduce a modified topological model for the evolution of two-dimensional soap froth. The topological rearrangement associated with a T2 process is deterministic; the final topology depends on the areas of the neighbouring cells. The new model gives agreement with experiments in the transient regime, where the previous models failed qualitatively, and also improves agreement in the scaling state.
We consider the sample-to-sample fluctuations that occur in the value of a thermodynamic quantity P in an ensemble of finite systems with quenched disorder, at equilibrium. The variance of P, V_P, which characterizes these fluctuations, is calculated as a function of the systems' linear size l, focusing on the behavior at the critical point. The specific model considered is the bond-disordered Ashkin-Teller model on a square lattice [Phys. Rev. 64, 178 (1943)]. Using extensive Monte Carlo simulations, several bond-disordered Ashkin-Teller models were examined, including the bond-disordered Ising model and the bond-disordered four-state Potts model. It was found that far from criticality all thermodynamic quantities which were examined (energy, magnetization, specific heat, susceptibility) are strongly self-averaging, that is, V_P ~ l^(-d) (where d = 2 is the dimension). At criticality, though, the results indicate that the magnetization M and the susceptibility chi are non-self-averaging, i.e., V_chi/chi^2 and V_M/M^2 do not tend to 0. The energy E at criticality is clearly weakly self-averaging; that is, V_E ~ l^(-y_v) with 0 [...] infinity) to the bond-disordered Ashkin-Teller model where alpha/nu = 0+. Nonetheless in the accessible range of lattice sizes we found very good agreement between the theory and the data for V_chi and V_E. The theory may also be compatible
We consider two-layered perceptrons consisting of N binary input units, K binary hidden units and one binary output unit, in the limit N >> K ≥ 1. We prove that the weights of a regular irreducible network are uniquely determined by its input-output map up to some obvious global symmetries. A network is regular if its K weight vectors from the input layer to the K hidden units are linearly independent. A (single-layered) perceptron is said to be irreducible if its output depends on every one of its input units; and a two-layered perceptron is irreducible if the K + 1 perceptrons that constitute such a network are irreducible. By global symmetries we mean, for instance, permuting the labels of the hidden units. Hence, two irreducible regular two-layered perceptrons that implement the same Boolean function must have the same number of hidden units, and must be composed of equivalent perceptrons.
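The global symmetries mentioned above can be checked by brute force on a tiny network. The sketch below enumerates the Boolean function of a hypothetical two-layered perceptron with N = 3 and K = 3, then verifies that relabeling the hidden units leaves the input-output map unchanged. The weights are arbitrary illustrative values (odd integers, so no field is ever zero). The sign-flip check at the end is an extra symmetry assumed here for illustration and is not named in the abstract.

```python
from itertools import product


def sign(x):
    """Binary threshold; ties are mapped to +1 (they never occur below)."""
    return 1 if x >= 0 else -1


def two_layer_output(W, v, x):
    """Output of a two-layered perceptron with hidden weights W and
    second-layer weights v on a +/-1 input vector x."""
    hidden = [sign(sum(w_i * x_i for w_i, x_i in zip(w, x))) for w in W]
    return sign(sum(v_k * h_k for v_k, h_k in zip(v, hidden)))


def truth_table(W, v):
    """Outputs for all 2^N inputs, i.e. the implemented Boolean function."""
    n_inputs = len(W[0])
    return tuple(two_layer_output(W, v, x)
                 for x in product((-1, 1), repeat=n_inputs))


# Hypothetical weights: three linearly independent hidden weight vectors.
W = [[1, 3, -5], [3, -1, 1], [-1, 1, 3]]
v = [1, 1, -1]

# Symmetry 1: permute the labels of the hidden units.
perm = [2, 0, 1]
W_perm = [W[k] for k in perm]
v_perm = [v[k] for k in perm]

# Symmetry 2 (assumed for illustration): flip the sign of one hidden
# unit's weight vector together with its second-layer weight.
W_flip = [[-w for w in W[0]]] + W[1:]
v_flip = [-v[0]] + v[1:]

same_function = (truth_table(W, v) == truth_table(W_perm, v_perm)
                 == truth_table(W_flip, v_flip))
```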
Time-dependent correlations in the scaling state of an evolving two-dimensional soap froth are studied. In particular, we consider the topological distribution function of those cells that are destined to survive for long times. Experimental results are compared with mean-field based dynamic equations and with topological simulations.
We study the extent to which fixing the second-layer weights reduces the capacity and generalization ability of a two-layer perceptron. Architectures with N inputs, K hidden units, and a single output are considered, with both overlapping and nonoverlapping receptive fields. We obtain from simulations one measure of the strength of a network: its critical capacity, alpha_c. Using the ansatz tau_med ∝ (alpha_c - alpha)^(-2) to describe the manner in which the median learning time diverges as alpha_c is approached, we estimate alpha_c in a manner that does not depend on arbitrary impatience parameters. The CHIR learning algorithm is used in our simulations. For K = 3 and overlapping receptive fields we show that the general machine is equivalent to the committee machine with the same architecture. For K = 5 and the same connectivity the general machine is the union of four distinct networks with fixed second-layer weights, of which the committee machine is the one with the highest alpha_c. Since the capacity of the union of a finite set of machines equals that of the strongest constituent, the capacity of the general machine with K = 5 equals that of the committee machine. We were not able to prove this for general K, but believe that it does hold. We investigated the internal representations used by different machines, and found that high correlations between the hidden units and the output reduce the capacity. Finally we studied the Boolean functions that can be realized by networks with fixed second-layer weights. We discovered that two different machines implement two completely distinct sets of Boolean functions.
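One way to use the divergence ansatz for the median learning time is to linearize it: if tau_med = A (alpha_c - alpha)^(-2), then tau_med^(-1/2) is a straight line in alpha that crosses zero at alpha = alpha_c. The sketch below fits that line by ordinary least squares. It is an assumed fitting procedure, not necessarily the one used in the paper, and the data are synthetic values generated from the ansatz itself.

```python
def estimate_alpha_c(alphas, median_times):
    """Estimate alpha_c from (alpha, tau_med) pairs.

    Fits tau_med**-0.5 = intercept + slope * alpha by least squares and
    returns the zero crossing of the line, -intercept / slope.
    """
    ys = [t ** -0.5 for t in median_times]
    n = len(alphas)
    mean_x = sum(alphas) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in alphas)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(alphas, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope


# Synthetic data: hypothetical alpha_c = 2.5 and amplitude A = 10.
true_alpha_c = 2.5
alphas = [0.5, 1.0, 1.5, 2.0]
median_times = [10.0 / (true_alpha_c - a) ** 2 for a in alphas]
alpha_c_hat = estimate_alpha_c(alphas, median_times)
```

Because the fit extrapolates to the point where the learning time diverges, it does not require choosing a cutoff time after which a run is declared a failure.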
A cluster Monte Carlo algorithm for the Ashkin-Teller (AT) model is constructed according to the guidelines of a general scheme for such algorithms. Its dynamical behavior is tested for the square lattice AT model. We perform simulations on the line of critical points along which the exponents vary continuously, and find that critical slowing down is significantly reduced. We find continuous variation of the dynamical exponent z along the line, following the variation of the ratio of specific-heat and correlation-length exponents alpha/nu, in a manner which satisfies the Li-Sokal bound z_cluster ≥ alpha/nu, which had so far been proved only for Potts models.
We consider an exclusion process with particles injected with rate alpha at the origin and removed with rate beta at the right boundary of a one-dimensional chain of sites. The particles are allowed to hop onto unoccupied sites, to the right only. For the special case of alpha = beta = 1 the model was solved previously by Derrida et al. Here we extend the solution to general alpha, beta. The phase diagram obtained from our exact solution differs from the one predicted by the mean-field approximation.
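The exclusion process described above is straightforward to simulate, which gives a way to compare numerical density profiles with exact or mean-field predictions. The following sketch uses random sequential updates with injection probability alpha at the first site, removal probability beta at the last site, and rightward hops onto empty sites. The function name, the update scheme, and the run lengths are assumptions made for illustration; alpha and beta must not exceed 1 in this scheme.

```python
import random


def simulate_tasep(n_sites, alpha, beta, sweeps, burn_in, seed=0):
    """Return the time-averaged occupation of each site.

    One sweep consists of n_sites + 1 random sequential update attempts:
    index 0 is injection, index n_sites is removal, and index k in
    between is a hop from site k - 1 to site k.
    """
    rng = random.Random(seed)
    occ = [0] * n_sites
    totals = [0] * n_sites
    for sweep in range(burn_in + sweeps):
        for _ in range(n_sites + 1):
            k = rng.randrange(n_sites + 1)
            if k == 0:
                # Inject a particle at the left boundary.
                if occ[0] == 0 and rng.random() < alpha:
                    occ[0] = 1
            elif k == n_sites:
                # Remove a particle at the right boundary.
                if occ[-1] == 1 and rng.random() < beta:
                    occ[-1] = 0
            elif occ[k - 1] == 1 and occ[k] == 0:
                # Hop to the right onto an unoccupied site.
                occ[k - 1], occ[k] = 0, 1
        if sweep >= burn_in:
            for i, o in enumerate(occ):
                totals[i] += o
    return [t / sweeps for t in totals]


profile = simulate_tasep(30, alpha=0.2, beta=0.9, sweeps=2000, burn_in=500, seed=1)
```

Scanning alpha and beta with this function and recording the bulk density or the current is one way to map out a numerical phase diagram.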
A simple asymmetric exclusion model with open boundaries is solved exactly in one dimension. The exact solution is obtained by deriving a recursion relation for the steady state: if the steady state is known for all system sizes less than N, then our equation (8) gives the steady state for size N. Using this recursion, we obtain closed expressions (48) for the average occupations of all sites. The results are compared to the predictions of a mean field theory. In particular, for infinitely large systems, the effect of the boundary decays as the distance to the power -1/2 instead of the inverse of the distance, as predicted by the mean field theory.
We have developed a cluster algorithm for Monte Carlo simulations of the fully frustrated Ising model on the square lattice. The method does not suffer from problems of metastability, and is extremely efficient even at T = 0, as we demonstrated previously. Here we describe results of more extensive simulations of the model at T = 0 which allow us to extract the effective dynamical critical exponent of the algorithm, z ≈ 0.55, as well as the static exponent eta = 0.5. We also provide an argument that explains why our method works.
Several cluster Monte Carlo methods were developed recently and proved to be very efficient in accelerating simulations of various models. We present a general cluster method for Monte Carlo simulations that unifies many of the previously developed algorithms. Our general scheme satisfies the detailed-balance condition, and may therefore serve as a framework for developing new cluster acceleration techniques.
We report calculations of the critical capacity of perceptrons that are subject to pattern-dependent noise. We also calculate the capacity of a perceptron whose weights take values on a shifted sphere, and show how this system resembles 'noisy' behaviour in some limits.
The time evolution of a wide variety of physical systems exhibiting two-dimensional cellular structures has recently been studied and found to lead to a universal distribution x(l) of the number of sides, l, of the cells. A simple model for the evolution of these structures is presented and analysed. The model exhibits a one-parameter family of fixed-point distributions x(l)*(sigma). Within this model, universality is maintained by a mechanism in which a particular marginally stable fixed point is selected. The predictions of the model compare well with experimental observations in soap froths.