The flashcards below were created by user
on FreezingBlue Flashcards.
What are the major classes of DNA sequences that make up the human genome? See figure 9.1.
- The human genome has two components: the nuclear genome and the mitochondrial genome.
- Nuclear genome:
- 3.1 billion bp (3.1 Gb); ~93.5% is euchromatin (i.e. potentially transcriptionally active)
- Remainder (~6.5%) = constitutive heterochromatin
- Highly conserved sequences are protein-coding genes (1.1%) and other conserved sequences such as RNA genes and regulatory sequences (~4%).
- About 45% of the genome is transposon-based repeats: sequences that are, or at one time were, evolutionarily derived from transposons, which can amplify themselves. About 6.5% is constitutive heterochromatin, with no identifiable genes and always in a compact state. The remaining ~44% is other sequence whose function we don’t know. One of the purposes of ENCODE was to begin identifying what all of that sequence does. Focus will be here!
- Studies have also shown that, even though we don’t know the function of most of our genome, >90% of it is transcribed.
DNA and human genome table
- Only a small fraction codes for protein, although a lot of genes are expressed.
- Poorly conserved sequences are transposon-based repeats, heterochromatin and other sequences.
- The table summarizes the key numbers. The genome is organized into 23 pairs of chromosomes.
- A big surprise from the human genome project was the total number of protein-coding genes: ~21000 is the most recent estimate. As for genes that produce only RNA: prior to the human genome project we knew about rRNAs, tRNAs and snRNAs, but whole masses of genes were discovered whose final product is not a protein; the estimate is 6000.
What is a karyogram? What are the rules used for ordering and orientation of chromosomes in a karyogram?
- Karyogram of the human genome: 23 pairs of chromosomes (diploid)
- Chromosomes 1-22 are autosomes (same for males and females) and carry most of the genetic info
- X and Y are the sex chromosomes (differ between males and females).
- Organized by size, largest to smallest, and grouped by centromere placement.
- Divided into 7 groups of chromosomes.
- 1st group: 1-3, metacentric,
- 2nd group: 4 and 5, similar in size to 1-3 but with the centromere more off center, giving distinct short and long arms.
- 6-12 and X, mostly submetacentric,
- 13-15, subtelocentric (acrocentric) centromere placement.
- 16-18, 19-20, and 21-22 plus Y form the last three groups; these are the smallest chromosomes.
- Shorter arm = p arm, longer arm = q arm. In a karyogram the p arm is always at the top and the q arm at the bottom.
- A karyogram is specific to each species. Chromosomes can be stained in many different ways, each giving a different banding pattern.
Largest vs smallest chromosome size and expression
- Chromosome 1 is the largest, at 249 Mb,
- the smallest, chromosome 21, has only 51 Mb of DNA.
- Chromosomes also differ in their proportions of euchromatin and heterochromatin. For most chromosomes the bulk is euchromatin, where genes can potentially be expressed. Heterochromatic regions are not typically expressed. Over 50% of the Y chromosome is heterochromatic, and very few genes have been mapped to it. One of them is SRY (sex-determining region Y), the locus on the Y chromosome that determines maleness.
Discuss the roles that tandem duplications and transposons have played in the evolution of eukaryotic genomes, including the human genome.
- Gene duplications and chromosome rearrangements have occurred a lot over the course of evolution. Gene duplications range from relatively short noncoding sequences to large sequences that may include genes or large parts of genes.
- e.g. the amylase gene: 1-6 copies of the gene per genome, with the number of copies varying between individuals. Is this due to diet?
- 1.1% of our genome is protein coding.
- Very few genes in our genome are considered unique. Most arose via 1. gene duplication and 2. exon shuffling.
- Tandem duplications typically arise through unequal crossing over. *Repetitive DNA sequences allow misalignment of homologues.
- If the pairing is not exact, a crossover can leave one chromosome with two copies of ‘A’ and 3 copies of the repeat, while the other chromosome has just a single copy of the repeat element and no ‘A’. The two products end up in different gametes. If the first is passed on, the offspring gets an extra copy of ‘A’; if the second is passed on, the embryo will not survive due to LACK of ‘A’.
- Transposable elements create REPEAT elements in our genome, creating conditions for UNEQUAL crossing over.
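- LINEs (long interspersed elements) and SINEs (short interspersed elements) = 30% of genome
- LTR elements and DNA transposons make up the rest
- All transposons together = 40-45% of genome
- 2 modes of transposition:
- Copy & Paste: used by retrotransposons (LINEs, SINEs, LTR elements) via an RNA intermediate; the original stays in place and a copy integrates elsewhere. Can also duplicate genes located between transposons on the same chromosome
- Cut & Paste: occurs with DNA transposons; the element is excised and integrates into a NEW location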
- Most “single” genes are members of gene families.
- family members may be clustered or dispersed in the genome.
- The extent of sequence and structural identity/similarity can vary significantly between gene families.
- Gene families where conservation extends throughout the coding regions or multiple functional domains:
- - tubulin and actin genes, histone genes
- - motor protein genes (myosins,kinesins and dyneins), cell surface receptor genes
- Groups of genes that share functional properties but often have little sequence similarity:
- Ig (immunoglobulin) – includes genes that encode antibodies AS WELL AS those having Ig-like domains.
- GPCR (G-protein coupled receptor) also known as 7 helix transmembrane receptors. All activate intracellular signaling cascades via binding and activation of trimeric G-proteins. One subfamily =odorant receptors, ~900 members.
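The unequal crossing over described earlier in this answer can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: each chromosome is a list of labeled segments ("R" for a repeat, "A" for the gene), and the function name `unequal_crossover` and its breakpoint indices are hypothetical, not part of the lecture material.

```python
def unequal_crossover(chrom_1, chrom_2, break_1, break_2):
    """Swap the segments that follow each breakpoint.

    break_1 and break_2 are list indices. Equal indices model a normal
    (aligned) crossover; unequal indices model misaligned pairing.
    """
    product_1 = chrom_1[:break_1] + chrom_2[break_2:]
    product_2 = chrom_2[:break_2] + chrom_1[break_1:]
    return product_1, product_2


# Both homologues start as repeat - gene A - repeat.
homologue_1 = ["R", "A", "R"]
homologue_2 = ["R", "A", "R"]

# Misalignment: the second repeat of homologue 1 pairs with the first
# repeat of homologue 2, and the exchange happens at the end of that repeat.
long_product, short_product = unequal_crossover(homologue_1, homologue_2, 3, 1)

print(long_product)   # ['R', 'A', 'R', 'A', 'R']
print(short_product)  # ['R']
```

One product carries two copies of A and three repeats; the other keeps a single repeat and has lost A, matching the outcome described in the notes.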
What are retrogenes and how are they distinguished from other genes?
- RETROGENES: genes that appear to have arisen via retrotransposition. (DNA gene copied back from RNA by reverse transcription)
- Highly similar to other genes in genome, are expressed, but lack introns found in “parent” gene.
- Retrogenes can still code for a protein (unlike processed pseudogenes, which are nonfunctional copies).
What are transposons? What are the four major classes of transposable elements found in the human genome? Which method of transposition do they utilize? Briefly describe the copy and paste method of transposition.
- Four major classes identified in human genome.
- (ALL Transposons together make up 45% of our genome)
- 1. LINEs – largest transposons.
- all autonomous: they could transpose under the appropriate conditions
- a full-length element is 6-8 kb
- Two ORFs (ORF1 and ORF2); ORF2 encodes the polymerase (reverse transcriptase). They replicate themselves in this manner and spread through the genome.
- Transcribed by RNA polymerase II
- 2. SINEs – three classes of these: Alu, MIR and MIR3.
- nonautonomous, but new copies can still arise throughout the genome.
- The reverse transcriptase encoded by LINEs can make a DNA copy from the transcript of one of these SINE elements.
- Transcribed by RNA polymerase III
- 3. LTR (retrovirus-like)- two classes: HERV and MaLR.
- less common in the genome
- May or may not encode pol, which determines whether an element is autonomous or nonautonomous.
- Autonomous versions have an intact pol; nonautonomous versions do not.
- Gag, pol and env gene identify that these elements evolved from retroviruses.
- Move via COPY and paste
- 4. DNA transposons are thought to be fossil elements.
- Most no longer move around, but the original elements encoded transposase.
- Move via CUT and paste mechanism.
- Transposase makes cuts in DNA molecule.
- DNA is the intermediate: the element integrates into a new location WITHOUT an RNA intermediate (unlike the three classes above)!
- All transposons create repetitive elements in our genome, which can create the conditions for unequal crossing over. Unequal crossing over produces tandem repeats; transposition produces repeat sequences dispersed through the genome.
- piRNA: associates with a particular class of proteins called Piwi proteins; its primary function is to repress transposition in the germline. Only expressed in germline cells.
Transposition mode of copy and paste (appears most common in human genome) and cut and paste.
Copy and paste: the element is copied via an RNA intermediate and the copy inserts elsewhere; this can also duplicate genes located between transposons on the same chromosome. Cut and paste: the element itself is excised and reinserted at a new location. Most extant transposons are defective and unable to transpose, but there are recent cases of LINE transposition: a child with hemophilia, 2 separate cases of muscular dystrophy, and various reports of transpositions in specific cancers. There are also recent cases of SINE (Alu element) insertions: BRCA2 (familial breast cancer), factor IX (hemophilia B), ChE (acholinesterasemia), and NF1 (neurofibromatosis).
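The difference between the two modes can be sketched in Python by treating a genome as a list of labeled segments. This is a minimal illustration only; the function names `copy_and_paste` and `cut_and_paste` and the list representation are assumptions, and no RNA intermediate or enzyme is modeled.

```python
def copy_and_paste(genome, source, target):
    """Retrotransposon-style move: the original stays, a copy is inserted.

    target is an index into the original genome list.
    """
    element = genome[source]
    return genome[:target] + [element] + genome[target:]


def cut_and_paste(genome, source, target):
    """DNA-transposon-style move: the element is excised, then reinserted.

    target is an index into the genome after the element has been removed.
    """
    element = genome[source]
    remaining = genome[:source] + genome[source + 1:]
    return remaining[:target] + [element] + remaining[target:]


genome = ["gene1", "TE", "gene2", "gene3"]

after_copy = copy_and_paste(genome, source=1, target=3)
after_cut = cut_and_paste(genome, source=1, target=3)

print(after_copy)  # ['gene1', 'TE', 'gene2', 'TE', 'gene3']
print(after_cut)   # ['gene1', 'gene2', 'gene3', 'TE']
```

Copy and paste increases the number of copies of the element, which is why retrotransposons can accumulate to a large share of a genome, while cut and paste only relocates it.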
One of the surprising findings coming from the human genome project and subsequent studies is the large number of genes that encode functional RNAs. What is meant by the term “functional RNA”? What are some examples of functional RNA genes (gene classes) and what are the functions of each class?
- A functional RNA is an RNA that is itself the final gene product and may have enzymatic/effector roles.
- Non-coding RNA (ncRNA) is a functional RNA molecule that is NOT translated into a protein.
- Examples of functional RNA are:
- 1. Ribosomal RNA (rRNA): 16S and 18S rRNA function as components of the mitochondrial and cytoplasmic ribosomes, respectively.
- 2. Small nuclear RNA (snRNA): involved in RNA splicing. U1, U2, U4, U5 process GU-AG introns.
- 3. Long regulatory RNAs: can act as trans-acting regulators, e.g. HOTAIR acts in trans (it regulates genes at a locus other than the one it is transcribed from).
What are the classes of RNA genes that are involved in gene regulation? At what level of gene regulation do they function? Note that the last will vary between different classes of regulatory RNA genes.
- - micro RNA (miRNA)~22nt: 1000 different types;
- o level of gene regulation that they function:
- • multiple important roles in gene regulation, notably in development, and implicated in some cancers.
- • RNA based gene-silencing.
- - Piwi-binding RNA (piRNA)
- o level of gene regulation that they function:
- • Derived from repeats
- • expressed only in germ-line cells, where they LIMIT excess transposon activity
- - Endogenous short interfering RNA (endo-siRNA)
- o level of gene regulation that they function:
- • Often derived from pseudogenes, inverted repeats, etc.
- • Involved in gene regulation in somatic cells
- • May also be involved in regulating some types of transposon.
- - Long noncoding regulatory RNA;
- o level of gene regulation that they function:
- • involved in regulating gene expression;
- • some are involved in monoallelic expression and/or as antisense regulators.
During lecture I mentioned several surprising findings that came out of the human genome project. What were they and why were these findings surprising? You should be able to list at least three “surprises” from the human genome project.
- 1. Big surprise out of human genome project: Only a small portion of our genome is made up of ‘protein-coding’ genes.
- a. Total number of protein coding genes. ~21000 is most recent estimate.
- b. These protein-coding genes are mostly members of gene families and superfamilies.
- c. Total number of genes that produce only RNA: prior to the human genome project we knew about rRNAs, tRNAs and snRNAs, but whole masses of genes were discovered whose final product is not a protein; the estimate is 6000.
- 2. On the other hand, transposons make up a significant proportion of our genome (45%). Transposons aid gene duplication and chromosome rearrangement, both of which have occurred a lot over the course of evolution and continue to occur. This may be one reason why gene therapies may not work, given the unspecified amount of tandem gene duplication and transposition.
- a. Gene duplications range from relatively short noncoding sequences to whole genes. e.g. the amylase gene: individuals range from having a single copy up to six copies per genome. Is this due to diet?
- 3. It is also surprising how many genes encode functional RNAs. This supports the theory that DNA evolved from RNA.
- a. The functions of these RNAs include:
- i. protein synthesis and support: mRNA, rRNA, tRNAs
- ii. RNA maturation: splicing (snRNA), base modification
- iii. DNA synthesis: TERC (RNA assoc. w/telomerase)
- iv. Gene regulation: miRNA, piRNA, endo-siRNA, long noncoding regulatory RNA (explained in previous question).
- v. Transposon control: piRNA (suppression of more transposons, prevents mutations)
What was the purpose of the HapMap project and what was/is its projected/expected utility to human genetics research?
- Aka: Purpose of HapMap & projected utility to human genetics research
- Goal: to make a map with high density of SNP (single nucleotide polymorphism – point mutations) markers throughout genome
- 1. used to identify disease causing mutations: through looking for correlations b/w SNPs & ppl with certain phenotype/disease
- 2. specifically looking for genes involved in simple (involves just one gene – Mendelian inheritance) or complex diseases
- 3. also ID regions of genome that have undergone RECENT selection in order to: gain MORE info about human evolution.
What is a haplotype within the context of the HapMap project? What are tagSNPs and how are they used?
- Haplotype: 2 or more genes/alleles that tend to be co-inherited/show linkage disequilibrium (i.e. they do NOT assort independently as Mendelian inheritance predicts). Usually they are on the SAME chromosome
- For Complex diseases: haplotype may include UNLINKED genes that MUST be co-inherited for disease to persist.
- TagSNPs: SNPs within a region with HIGH linkage disequilibrium. These are useful to identify specific genetic variation in genes/alleles WITHOUT identifying EVERY SNP in the chromosomal region.
- (Only the tagSNPs need to be genotyped to identify which haplotype an individual has; there is no need to sequence all SNPs)
- Estimated that about 300,000-600,000 tagSNPs will give COMPLETE INFO re: an individual’s genotype
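The idea behind tagSNPs can be sketched in Python: given the haplotypes present in a region, find the smallest set of SNP positions that still tells every haplotype apart. This is a brute-force illustration on invented data, not the method the HapMap project used; the function name `find_tag_snps` and the four example haplotypes are assumptions.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes all haplotypes.

    Assumes the haplotypes are distinct strings of equal length.
    """
    n_sites = len(haplotypes[0])
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            # Alleles each haplotype shows at the candidate positions.
            signatures = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return sites
    return tuple(range(n_sites))


# Four hypothetical haplotypes over six SNP sites. Sites 0-2 are inherited
# together, as are sites 4-5; site 3 does not vary in this sample.
haplotypes = [
    "ACGTAC",
    "ACGTGT",
    "GTATAC",
    "GTATGT",
]

tags = find_tag_snps(haplotypes)
print(tags)  # (0, 4)
```

Here two tagSNPs are enough to identify which of the four haplotypes a chromosome carries, so the other four sites do not need to be typed.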
****In genome wide association studies (GWAS) the term haplotype has a slightly different meaning. How is this term used in GWAS studies?
It is used to refer to 2 or more genes/alleles that are involved in association with a specific disease???
What is meant by the term Linkage Disequilibrium and how is it used in genome wide association studies? Within the context of these types of studies, what is meant by the term, association?
- Linkage disequilibrium: non-random association of alleles at different loci, i.e. the genes/alleles do NOT assort independently as Mendelian inheritance predicts.
- Measure of the degree of linkage between the genes and the associated disease/phenotype.
- The more TIGHTLY linked the genes that contribute to the disease/phenotype, the HIGHER the level of linkage disequilibrium.
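Linkage disequilibrium between two loci is commonly quantified with the coefficient D and the squared correlation r². The notes do not give these formulas, so the sketch below uses the standard population-genetics definitions as an assumption; the function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for allele A at one locus and allele B at another.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    # D is the excess of AB haplotypes over what independent assortment predicts.
    d = p_ab - p_a * p_b
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Alleles assorting independently: no linkage disequilibrium.
print(linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5))  # (0.0, 0.0)

# A and B always inherited together: complete linkage disequilibrium.
print(linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5))   # (0.25, 1.0)
```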
Briefly describe how the HapMap Project was done. What was the source of the genome samples, i.e. what was the composition of the study cohort?
- Began with looking at 269 individuals from 4 population groups:
- 1. 90 (30 trios) of European ancestry (Utah)
- 2. 90 (30 trios) from the Yoruba African tribe
- 3. 45 unrelated Chinese
- 4. 44 unrelated Japanese
- Looked for COMMON SNPs: sites where a minimum of 2 alleles are present in the gene pool and the frequency of the less common allele is at least 0.05
- Chose 10 regions (500 kb each) over 7 chromosomes and sequenced them in 48 individuals from the study.
- The SNP sites identified in these 48 were then genotyped in all 269 ppl.
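The "common SNP" criterion above (less common allele at a frequency of at least 0.05) can be checked with a few lines of Python. This is a minimal sketch; the function names and the example allele counts are made up for illustration.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele among the sampled chromosomes."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0  # only one allele observed: the site is not polymorphic
    return counts[1][1] / len(alleles)


def is_common_snp(alleles, threshold=0.05):
    """Apply the cutoff: less common allele at frequency >= threshold."""
    return minor_allele_frequency(alleles) >= threshold


# Hypothetical site typed on 90 chromosomes: 84 carry C, 6 carry T.
site_1 = ["C"] * 84 + ["T"] * 6
# Hypothetical site typed on 100 chromosomes: 98 carry C, 2 carry T.
site_2 = ["C"] * 98 + ["T"] * 2

print(is_common_snp(site_1))  # True  (6/90 is about 0.067)
print(is_common_snp(site_2))  # False (2/100 = 0.02)
```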
What were some of the major findings of the HapMap project?
- SNP density is HIGHER than expected: 1 SNP every 297bp on avg.
- SNPs Not just clustered in CODING regions. (interSNP distances typically LESS than 10kb)
- The amount of SNP sharing between the European ancestry (Utah), Yoruba African tribe, Chinese and Japanese populations is similar…BUT the Chinese and Japanese populations are CLOSER to each other than to the others
What are microarrays, how are they produced & how are they used?
- Microarray protocol:
- 1. Obtain genomic DNA sample (or reverse transcribe RNA) and biotin or fluorescently label
- 2. Hybridize to microarray (aka Gene chip w/millions of probes/features) of KNOWN DNA seq/alleles
- 3. Detect sites on microarray where sample has hybridized using FLUORESCENT probe
- Microarray chip details:
- sequences printed on glass wafers.
- Protective groups are removed by light deprotection (only in the areas where a specific nucleotide, e.g. T, will be attached); that nucleotide (e.g. T) then adds onto the linker molecule, and the process repeats until about 25 nucleotides have attached to each sequence
- 25 nucleotides gives best specificity results
- Phosphate groups present to prevent branching between sequences, but removed at final step
- Any deprotected sites that fail to react in a nucleotide addition step are CAPPED to prevent ‘mutant’ sequences
- Fluorescence from a feature indicates that the sample has bound to the sequences in that feature
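The cycle of light deprotection and nucleotide addition described above can be sketched as a small Python simulation. It is a simplified model under stated assumptions: bases are offered in a fixed A, C, G, T rotation, every exposed feature couples successfully (so capping is not modeled), and the function name `synthesize_probes` and the three short target sequences are hypothetical.

```python
def synthesize_probes(targets, base_order="ACGT"):
    """Grow one probe per feature, offering one base per coupling cycle.

    Assumes every target contains only the letters in base_order.
    Returns the finished probes and the number of cycles used.
    """
    probes = ["" for _ in targets]
    cycles = 0
    while any(p != t for p, t in zip(probes, targets)):
        base = base_order[cycles % len(base_order)]
        # The mask exposes (deprotects) only features whose next base is `base`.
        mask = [len(p) < len(t) and t[len(p)] == base
                for p, t in zip(probes, targets)]
        probes = [p + base if exposed else p
                  for p, exposed in zip(probes, mask)]
        cycles += 1
    return probes, cycles


targets = ["ACG", "TTA", "GCA"]
probes, cycles = synthesize_probes(targets)

print(probes)  # ['ACG', 'TTA', 'GCA']
print(cycles)  # 9
```

With a fixed four-base rotation, a probe of length n never needs more than 4n cycles, so a chip of 25-mers can be built in at most 100 coupling cycles no matter how many features it carries.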
In the Styrkarsdottir et al. paper the p-value they used to indicate statistical significance at the genome level was p ≤ 1.7 x 10^-7. Why did they set their p-value for significance so low? Why didn’t they use the p ≤ 0.05 value that is commonly used in statistics?
The threshold was set so low because the study tested 301,019 SNPs for association with bone mineral density of the hip and lumbar spine. With that many tests, a cutoff of 0.05 per test would be expected to give about 15,000 false positives by chance alone, so the threshold has to account for all 300,000+ tests to reach genome-wide significance. The p-value was therefore taken as 0.05 (the standardly used p-value in stats) DIVIDED by 301,019 SNPs, which gives about 1.7 x 10^-7.
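The arithmetic behind that threshold is a division of the usual cutoff by the number of tests, commonly called a Bonferroni correction (the name is not used in the notes). A minimal Python check, with a hypothetical function name:

```python
def genome_wide_threshold(alpha, n_tests):
    """Per-test significance threshold after dividing alpha by the number of tests."""
    return alpha / n_tests


n_snps = 301_019
threshold = genome_wide_threshold(0.05, n_snps)

# Expected number of chance "hits" if 0.05 were used for every SNP.
expected_false_positives = 0.05 * n_snps

print(f"{threshold:.1e}")               # 1.7e-07
print(round(expected_false_positives))  # 15051
```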
****What is one of the major limitations of the HapMap data? To put that another way, now that research groups are beginning to use the HapMap data to look for genes associated with complex diseases or traits, what is one of the “issues” that they have encountered?
- For some experiments, such as in the bone mineral density and fractures article, the sample sets included do not represent the entire species, but only subsets of the population.
- They also discovered that the contribution of each allele to bone mineral density may be very small.
- There may be many more genes or loci with SNPs that also show a correlation but have not been studied yet.
What are the functions of the different types of RNA polymerase in eukaryotic cells?
- RNA pol I: transcribes rRNA (structural component of translational machinery)
- pol II: TRANSCRIBES all PROTEIN coding and many FUNCTIONAL RNA genes, including snoRNAs and microRNAs
- Pol III: TRANSCRIBES tRNA and 5S rRNA genes
- Mitochondrial RNA pol: nuclear encoded protein, but is structurally and functionally related to bacteriophage RNA pol.
What are the different types of “common” promoter elements?
- Are these promoter elements absolutely required for transcription? What role(s) do the common promoter elements play in transcription regulation?
- Common promoter elements:
- 1. GC or CAAT box: binding sites for transcription factors
- 2. TATA box: facilitates BINDING of TBP (TATA-binding protein). Is in 32% of promoters
- 3. BRE: transcription factor recognition site for TFIIB, which positions RNA pol at the start site of transcription
- 4. Inr and DPE: other core promoter elements
- Common promoter elements are neither ESSENTIAL nor SUFFICIENT for initiation of transcription
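Scanning a sequence for these elements can be sketched in Python with regular expressions. The consensus patterns below (TATA(A/T)A(A/T) for the TATA box, GGGCGG for the GC box, CCAAT for the CAAT box) are common textbook consensus sequences and are an assumption here, since the notes do not give them; the example promoter sequence is invented.

```python
import re

# Assumed textbook consensus sequences, written as regular expressions.
CORE_MOTIFS = {
    "TATA box": r"TATA[AT]A[AT]",
    "GC box": r"GGGCGG",
    "CAAT box": r"CCAAT",
}


def find_promoter_motifs(sequence):
    """Return (name, 0-based start, matched text) for each motif hit, sorted by position."""
    seq = sequence.upper()
    hits = []
    for name, pattern in CORE_MOTIFS.items():
        for match in re.finditer(pattern, seq):
            hits.append((name, match.start(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Invented sequence with one copy of each element (shown in upper case).
promoter = "ttgGGGCGGaacCCAATctgacTATAAAAggc"

for hit in find_promoter_motifs(promoter):
    print(hit)
# ('GC box', 3, 'GGGCGG')
# ('CAAT box', 12, 'CCAAT')
# ('TATA box', 22, 'TATAAAA')
```

A match only shows that a candidate binding site is present. As noted above, these elements are neither essential nor sufficient, so a hit does not mean the promoter is active.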
What are the various components of the RNA pol II initiation complex and what are their functions?
- Composed of many transcription factors, as well as RNA pol II:
- TBP (TATA binding protein): recognizes TATA box
- TAF subunits: recognize DNA seq. near start point, also regulate DNA binding by TBP
- TFIIB: Recognizes BRE element, used to position RNA pol. @ start site of transcription
- TFIIF: stabilizes RNA pol interaction with TBP and TFIIB, also attracts other TFs (TFIIE and TFIIH)
- TFIIE: attracts and regulates TFIIH
- TFIIH: UNWINDS DNA @ transcription start point, phosphorylates Ser 5 of RNA pol C-terminal domain, also RELEASES RNA pol. from promoter
What are enhancers and silencers and how do they differ from the “common” promoter elements? Why can enhancers and silencers be located 1000’s of bps away from the genes they regulate?
- Enhancers and silencers are bound by proteins (transcription factors) that activate or repress transcription.
- Enhancers and silencers are sequences that may lie within the promoter region but may also be 1000’s of bps from the start of transcription, in which case DNA looping lets the TFs bound to them interact w/pol II and other TFs at the start site of transcription.
- The same element can act as an enhancer for one gene and as a silencer for a different gene.
- Some enhancers bind TFs that must dimerize to function; these TFs act within extensive regulatory complexes that respond to various developmental/environmental signals
What are the major & minor grooves of the DNA molecule? Discuss the significance of the major and minor grooves as they relate to gene regulation.
Bases are more accessible in the major groove. Methyl groups lie in the MAJOR groove of DNA. Proteins such as transcription factors also bind in the major groove, giving it an important role in gene regulation
What are the different components of eukaryotic chromatin, how do they interact and what is the importance of chromatin organization to eukaryotic organisms?
- Chromatin is composed of DNA and multiple histones, which aid in the condensing and looping of DNA (described in previous question).
- Euchromatin: DNA wrapped around histones, forming nucleosomes (aka: beads on a string); fairly open form, higher transcriptional activity than the more compact form
- Euchromatin form: residues in histone tails affect nucleosome interactions, chromatin packing, etc.
- Heterochromatin: 30nm chromatin fibers, COMPACT form, less transcriptional activity (still accessible to transcription machinery)
What are the common types of histone modifications and what effect do they have on chromatin structure? What classes of enzymes are responsible for these modifications?
- Histones help to coil and pack the chromatin structure thereby affecting gene regulation
- Histone modifications are often interdependent (e.g. phosphorylation of H3S10 promotes acetylation and INHIBITS methylation of H3K9)
- 1. nucleosome: DNA wrapped around histones. These separated by linker DNA, which H1 histone interacts with
- 2. solenoid structure: 30nm chromatin fiber, although still accessible to transcription machinery
- 3. looped domain: loops of the 30nm chromatin fiber, which can uncoil to give high-gene expression
- Enzymes responsible for modifications:
- Acetylases: act on lysine; acetylated chromatin is open and more active.
- Deacetylases: act on lysine; hypoacetylated chromatin is coiled up and closed, with inactive gene expression
- Methylases and demethylases: act on lysine and arginine; at promoters, methylation goes with low gene expression and demethylation with high gene expression
- Kinases and phosphatases: act on serine; a kinase adds a phosphate group (phosphorylation) and a phosphatase removes it
What is DNA methylation (be specific) and how/why is it important to gene regulation?
- DNA methylation: occurs at specific dinucleotides (CpG). The process involves converting the cytosine to 5-methylcytosine (5MeC)
- Methyl groups lie in the MAJOR groove of DNA: can affect binding of activator and silencing proteins
- Interaction with MeCpG-binding proteins: play a role in chromatin STRUCTURE and gene REGULATION
- Also MeCpG important in epigenetics and genetic memory
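Because methylation targets the C of a CpG dinucleotide, the candidate sites in a sequence can be listed by scanning for "CG". A minimal Python sketch follows; the function name `cpg_sites` and the example sequence are made up, and the code only locates potential sites without saying whether any of them is actually methylated.

```python
def cpg_sites(sequence):
    """Return the 0-based position of the C in every CpG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


example = "ATCGGCGTACCGA"
sites = cpg_sites(example)

print(sites)       # [2, 5, 10]
print(len(sites))  # 3
```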