Even more bioinformatics

The flashcards below were created by user lukemlj on FreezingBlue Flashcards.

  1. What is SMART?
    Simple Modular Architecture Research Tool
  2. At the 5' end of a DNA sequence, what is attached to the 5' carbon?
    phosphate group
  3. What tool at SDSC returns multiple alignment sequences in color-coded rows/columns?
  4. What tool at SDSC performs multiple alignments?
  5. What are co-linear sequences?
    Sequences that have the same protein domains in the same order
  6. In the NCBI map viewer, what does the down arrow represent?
    positive strand
  7. What is BRCA1?
    A transcription factor
  8. What does “nr” stand for?
    Non-redundant (as in NCBI's nr database)
  9. What is “special” about the Swiss-Prot database?
    Every protein has been curated by a human
  10. What are two definitions of an ORF?
    • A reading frame that starts with ATG and ends with TAG, TGA, or TAA
    • A sequence that starts with ATG and ends with TAG, TGA, or TAA; and has a length that is a multiple of 3.
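As a rough illustration of the second definition, the sketch below scans the three forward reading frames for ATG followed by the first in-frame TAA, TAG or TGA. It is a minimal sketch, not a production ORF finder: the function name `find_orfs` is hypothetical, only the forward strand is scanned (the reverse complement would supply the other three frames), and ATGs nested inside an ORF are not reported separately.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq):
    """Return (start, end) pairs, 0-based and end-exclusive, for forward-strand ORFs.

    Each ORF begins at ATG and ends with the first in-frame stop codon
    (TAA, TAG or TGA). The stop codon is included, so every ORF length
    is a multiple of 3.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                  # the three forward reading frames
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # walk codon by codon until an in-frame stop codon appears
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        orfs.append((i, j + 3))
                        i = j               # resume scanning after the stop codon
                        break
            i += 3
    return orfs
```

For example, `find_orfs("CCATGAAATAGGG")` returns `[(2, 11)]`, i.e. the 9-base ORF `ATGAAATAG` in the third frame.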
  11. What is the difference between a CDS and an ORF?
    A CDS always comes from a gene; an ORF does not have to
  12. ___ CDSs are ORFs.
    All
  13. ___ ORFs are CDSs.
    Not all
  14. What are two databases of UniProt?
    • Swiss-Prot: manually curated
    • TrEMBL: automatically annotated
  15. What is sequence alignment?
    The procedure of comparing sequences by searching for a series of individual characters or patterns that are in the same order in the sequences.
  16. What can we infer from pairwise sequence alignment?
    Biological relationships from sequence similarity
  17. What can we infer from multiple sequence alignment?
    Sequence similarity from biological relationships
  18. When do you do pairwise alignment?
    When you suspect two sequences are homologous.
  19. What is "stringency" with respect to dot matrices?
    The number of characters that have to match exactly within the window
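A minimal sketch of a windowed dot matrix, assuming a simple identity comparison: a cell is marked when the two windows of length `window` share at least `stringency` identical characters at the same offsets. The names `dot_matrix` and `render` are hypothetical, and this version places the dot at the start of each window, whereas many tools place it at the window's center.

```python
def dot_matrix(seq1, seq2, window=3, stringency=2):
    """Return a grid of booleans: True where the window starting at seq1[i]
    and the window starting at seq2[j] share at least `stringency`
    identical characters at the same offsets."""
    rows = max(len(seq1) - window + 1, 0)
    cols = max(len(seq2) - window + 1, 0)
    plot = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            plot[i][j] = matches >= stringency
    return plot

def render(plot):
    """Draw the grid as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in plot)
```

Raising `stringency` toward `window` removes noisy isolated dots; plotting a sequence against itself always yields the main diagonal, and repeats show up as the extra lines described in the next cards.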
  20. How are internal repeats visualized in a dot matrix?
    Lines parallel to main diagonal
  21. How are inverted repeats visualized in a dot matrix?
    Lines perpendicular to main diagonal
  22. How are tandem repeats visualized in a dot matrix?
    Boxes composed of diagonal lines parallel to the main diagonal
  23. What algorithm is used for global alignments?
    Needleman-Wunsch
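A compact sketch of Needleman-Wunsch global alignment by dynamic programming. The scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions; real protein alignments would use a substitution matrix such as PAM or BLOSUM and usually affine gap penalties. The function name is hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                  # a[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # match/mismatch
                          F[i - 1][j] + gap,                        # gap in b
                          F[i][j - 1] + gap)                        # gap in a
    # trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these scores, `needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`: every residue of both sequences appears in the alignment, which is what makes it global.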
  24. What modification of Needleman-Wunsch is used for local alignments?
    Smith-Waterman
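The local variant changes two things relative to Needleman-Wunsch: each cell is floored at 0 so that a poorly scoring prefix can be discarded, and the traceback starts from the highest-scoring cell anywhere in the matrix and stops at the first 0. The sketch below assumes illustrative scores (match +2, mismatch -1, gap -1) and a hypothetical function name.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty; returns (score, local_a, local_b)."""
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,                       # start a fresh local alignment
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # trace back from the best cell until a zero cell is reached
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTTACGTTT", "GGACGGG")` reports only the shared `ACG` (score 6) and ignores the unrelated flanks.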
  25. What does PAM stand for?
    Percent Accepted Mutation
  26. What does BLOSUM stand for?
    Blocks Substitution Matrix
  27. What are the three steps of ClustalW?
    • Pairwise alignment and calculate distance matrix
    • Create guide tree
    • Use tree to align sequences
  28. When does ClustalW work best?
    When the set of sequences is co-linear
  29. What website visually showed us the role of BRCA1?
    Cancer Genome Anatomy Project (CGAP); its legend labels BRCA1 a transcription factor
  30. What website deals with PWMs and Logos?
  31. What is a retrovirus?
    A virus with an RNA genome that uses reverse transcriptase to convert its RNA into dsDNA
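  32. When creating a log ratio table from a PWM, what ratio do you take?
    The observed frequency of each base at each position divided by its background (expected) frequency; the table holds the log of that ratio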
  33. How is a log-odds table used?
    Line the sequence up against the log-odds table and sum the values for the base at each position. Larger positive sums mean better matches.
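A minimal sketch of building a log-odds table from aligned binding sites and using it to score sequences, assuming a uniform background of 0.25 per base, log base 2, and a pseudocount of 1 so that a base never seen at a position does not produce log(0). The function names and the example sites are hypothetical.

```python
import math

DNA_BASES = "ACGT"

def log_odds_table(sites, background=None, pseudocount=1.0):
    """Position-specific table of log2(observed frequency / background frequency)."""
    if background is None:
        background = {base: 0.25 for base in DNA_BASES}   # assumed uniform background
    table = []
    for position in range(len(sites[0])):
        column = [site[position] for site in sites]
        total = len(column) + pseudocount * len(DNA_BASES)
        row = {}
        for base in DNA_BASES:
            observed = (column.count(base) + pseudocount) / total
            row[base] = math.log2(observed / background[base])
        table.append(row)
    return table

def score_site(seq, table):
    """Sum the table values for the base found at each position of seq."""
    return sum(row[base] for row, base in zip(table, seq.upper()))

def scan(seq, table):
    """Score every motif-length window of seq; returns (start, score) pairs."""
    width = len(table)
    return [(i, score_site(seq[i:i + width], table)) for i in range(len(seq) - width + 1)]
```

For example, with `table = log_odds_table(["TATA", "TATA", "TATT"])`, the site `TATA` scores positive and `GCGC` negative, and scanning `GGTATAGG` gives the highest score at start position 2.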
  34. What is a logo?
    • A visual representation of a set of aligned sequences that indicates the positional preferences given by information theory.
    • It gives a visual representation of a motif.
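Sequence logos commonly set the height of the letter stack at each position to its information content, log2(4) - H bits for DNA, where H = -Σ p log2 p is the Shannon entropy of that alignment column, and then scale each letter by its frequency. A minimal sketch, assuming no small-sample correction (some logo tools apply one) and hypothetical function names:

```python
import math
from collections import Counter

def column_information(column, alphabet_size=4):
    """Information content (bits) of one alignment column: log2(alphabet_size) - H."""
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return math.log2(alphabet_size) - entropy

def letter_heights(column, alphabet_size=4):
    """Height of each letter in the stack: its frequency times the column's information."""
    info = column_information(column, alphabet_size)
    n = len(column)
    return {letter: (c / n) * info for letter, c in Counter(column).items()}
```

A fully conserved DNA column (`"AAAA"`) gets the maximum of 2 bits, while a column with all four bases equally frequent (`"ACGT"`) gets 0 bits and is drawn empty. Columns of an alignment can be taken with `zip(*sites)`.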
  35. What website was used to construct a PWM?
    Regulatory Sequence Analysis Tools (RSAT)
  36. When analyzing sequences with evolution in mind, is it the differences between them that we need to quantify and score?
    Yes, according to the text, these differences are summarized by “evolutionary or genetic distance.”
  37. What is the purpose of the phylogenetic tree representation?
    The purpose “is to summarize the key aspects of a reconstructed evolutionary history.”
  38. What are species trees?
    These are phylogenetic trees that show the hypothesized relationship between species based on their orthologous sequences.
  39. How are species trees constructed?
    • From the analysis of orthologous sequences
    • Morphological features used in traditional taxonomy
    • The presence of certain restriction sites in the DNA
    • The order of a particular set of genes in the genome
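  40. Is the evolutionary history of a set of related genes always the same as that of the species from which the genes were selected?
    No. Events such as gene duplication, gene loss and horizontal gene transfer can make a gene's history differ from that of the species.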
  41. What is a speciation event?
    An event that produces divergent species.
  42. How is a speciation event represented in a species tree?
    A speciation event is represented by an internal branch point.
  43. What does the root represent in rooted trees?
    The root represents the last common ancestor of the species represented by the branches coming off of it.
  44. What is the major task of phylogenetic tree reconstruction?
    It is “to identify from the numerous alternatives the topology that best describes the evolution of the data.”
  45. Describe cladograms.
    The topology has meaning, but the branch lengths do not
  46. Describe additive trees.
    Branch lengths are a measure of evolutionary divergence (e.g. number of mutations per site).
  47. Describe ultrametric trees.
    An additive tree with the added property that the rate of mutation is assumed to be constant along all branches, thus allowing the measurement of actual time.
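One way to check the ultrametric property directly on a distance matrix is the three-point condition: for every three taxa, the two largest of their three pairwise distances must be equal, which is what a constant rate along all branches implies. A minimal sketch with a hypothetical function name and an assumed floating-point tolerance:

```python
from itertools import combinations

def is_ultrametric(dist, tol=1e-9):
    """True if every triple of taxa satisfies the three-point condition."""
    n = len(dist)
    for i, j, k in combinations(range(n), 3):
        smallest, middle, largest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        if abs(largest - middle) > tol:    # the two largest distances must match
            return False
    return True
```

A matrix that fails this test can still fit an additive tree; it just cannot be drawn with all leaves equidistant from the root.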
  48. What is meant by bootstrap analysis?
    Bootstrap analysis uses subsets of the original data to make estimates of the support for particular topological features for a given tree construction method.
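In the usual nonparametric phylogenetic bootstrap, each pseudo-replicate alignment is built by sampling alignment columns with replacement until it reaches the original length; a tree is then built from each replicate with the same method as the original. The sketch below covers only the resampling step; the function names, the default replicate count and the fixed seed are assumptions.

```python
import random

def bootstrap_replicate(alignment, rng):
    """One pseudo-replicate: sample column indices with replacement, same length as the original."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]

def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates pseudo-replicate alignments with a reproducible RNG."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]
```

Because whole columns are resampled, each replicate keeps the sequences aligned; some original columns appear several times and others not at all.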
  49. How can bootstrap analysis be used to construct condensed trees?
    “A condensed tree is produced by removing internal branches that are supported by less than 60% of the bootstrap trees.”
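A minimal sketch of the 60% rule, assuming each tree is represented by the set of clades its internal branches define, with each clade a frozenset of taxon names. Only the clades of the reference tree are scored, and any clade found in fewer than 60% of the bootstrap trees is dropped (collapsed). The function names and the toy data are hypothetical.

```python
def clade_support(reference_clades, bootstrap_trees):
    """Fraction of bootstrap trees that contain each clade of the reference tree."""
    n = len(bootstrap_trees)
    return {clade: sum(clade in tree for tree in bootstrap_trees) / n
            for clade in reference_clades}

def condense(reference_clades, bootstrap_trees, threshold=0.60):
    """Keep only clades (internal branches) supported by at least `threshold` of the trees."""
    support = clade_support(reference_clades, bootstrap_trees)
    return {clade for clade, value in support.items() if value >= threshold}
```

For example, if clade {A, B} appears in 4 of 5 bootstrap trees (80%) and {A, B, C} in only 2 of 5 (40%), the condensed tree keeps {A, B} and collapses the branch defining {A, B, C}.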
  50. What are the two conditions that would have made phylogenetic tree reconstruction from a set of homologous sequences considerably easier had they held during sequence evolution?
    • “… all the sequences evolved at a constant mutation rate for all mutations at all times”
    • “… the sequences have only diverged to a moderate degree such that no position has been subjected to more than one mutation.”
  51. Where do most mutations that are retained in DNA come from?
    Most retained mutations come from uncorrected errors during replication.
  52. What is the difference between synonymous and nonsynonymous mutations?
    Synonymous mutations do not change the amino acid, while nonsynonymous ones do.
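A minimal sketch that classifies a single-codon change using the standard genetic code. The 64-letter string lists amino acids (one-letter codes, `*` for stop) with codons ordered by first, second and third base in TCAG order. The function name is hypothetical, and a change between two stop codons counts as synonymous here.

```python
BASE_ORDER = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASE_ORDER)
    for j, second in enumerate(BASE_ORDER)
    for k, third in enumerate(BASE_ORDER)
}

def classify_substitution(codon_before, codon_after):
    """'synonymous' if both codons encode the same amino acid (or both are stops)."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"
```

For example, `CTT` to `CTC` is synonymous (both leucine), while `GAA` to `GAT` is nonsynonymous (glutamate to aspartate).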
  53. When is it useful to remove the third codon sites from the data before any further analysis?
    It may be useful when “the dataset involves long evolutionary timescales….”
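Removing third codon positions is a simple filtering step, assuming every sequence is an in-frame coding alignment that starts at the first base of a codon. A commonly cited reason is that third positions change fastest, largely through synonymous substitutions, and so become saturated over long timescales. The function names are hypothetical.

```python
def drop_third_positions(seq):
    """Keep codon positions 1 and 2 and discard every third base (seq must start in frame)."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)

def drop_third_positions_alignment(alignment):
    """Apply the same filter to every sequence of an in-frame coding alignment."""
    return [drop_third_positions(seq) for seq in alignment]
```

For example, `ATGGCCTAA` becomes `ATGCTA`.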
  54. What is the key assumption that is made when constructing a phylogenetic tree from a set of sequences?
    The sequences “are all derived from a single ancestral sequence.”
  55. Explain the process of gene loss. Does gene loss occur solely because of gene duplication?
    • Sometimes, after gene duplication, one of the copies loses its function through mutation and becomes a pseudogene. Additional mutations can make it unrecognizable, resulting in gene loss.
    • According to the text, “gene loss can occur without gene duplication.”
  56. What is meant by homoplasy?
    “Sequence similarity not due to homology”
  57. What is meant by “horizontal gene transfer” (also known as “lateral gene transfer”)? Why is it called “horizontal”?
    • It is the transfer of genes between organisms not by reproduction (Wikipedia)
    • It is called “horizontal” because it does not occur from parent to offspring (vertical transfer).
  58. What are syntenic regions? Are they easily detected? Explain.
    • “equivalent regions in different species”, i.e. “regions containing related genes in the same order”
    • Apparently they are not as easily detected as researchers had hoped, because large-scale rearrangements of chromosomes and genomes shuffle the locations of the related genes.
  59. When comparing sequences from two closely related species, which regions will convey useful information for the construction of phylogenetic trees?
    • “The ideal is a genomic region that occurs in every species but only occurs once in the genome.”
    • “There should be little if any HGT within this region.”
    • “The rate of change in this sequence segment must be fast enough to distinguish between closely related species, but not so fast that the regions from very distantly related species cannot be confidently aligned.”
    • The three requirements above can often be satisfied “… by a single sequence that has some highly conserved regions and other regions that are more variable between species.”
  60. The analysis of which genomic sequence led to the discovery that prokaryotes comprised two quite distinct domains?
    “The DNA sequence specifying the small ribosomal subunit rRNA (called 16S RNA in prokaryotes)….”
  61. What is phylogeny?
    The history of descent of a group of organisms from a common ancestor. The inference of evolutionary relationships.
  62. What is taxonomy?
    The science of classification of organisms.
  63. What is the aim of phylogenetic analysis?
    To discover all of the branching relationships in the tree and the branch lengths.
  64. Why are phylogenetic trees constructed?
    • To understand lineage.
    • To understand how functions evolved.
    • To perform multiple alignment.
  65. What do internal nodes represent in a phylogenetic tree?
    Hypothetical ancestral units.
  66. In a ___, the path from the root to a node represents ___.
    rooted tree, evolutionary path
  67. A(n) ___ specifies ___, but not evolutionary paths.
    unrooted tree, relationships among objects
  68. All objects in a ___ have a single common ancestor.
    rooted tree
  69. What might tree construction be based on?
    morphological features or sequence data.
  70. Describe a cladogram.
    Branch length carries no meaning.
  71. Describe an additive tree.
    Branch length measures evolutionary divergence.
  72. Describe an ultrametric tree.
    • An additive tree where there is a constant rate of mutation.
    • Horizontal lines are not important.
  73. What is significant about an additive tree with an outgroup?
    An outgroup can be used to convert an unrooted tree to a rooted tree.
  74. How is a tree rooted?
    Place the candidate root halfway between the outgroup and the closest node.
  75. What is UPGMA?
    Distance-based method - Unweighted Pair Group Method using Arithmetic averages
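A minimal UPGMA sketch: repeatedly merge the closest pair of clusters, place the new node at half their distance, and set the merged cluster's distance to every other cluster to the size-weighted (arithmetic) average of its members' distances. Trees are returned as nested tuples plus a dictionary of node heights; the function name and this representation are assumptions, and ties are broken by whichever pair is found first.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of taxon labels; dist: symmetric matrix (list of lists).
    Returns (tree, heights): tree is a nested tuple of labels, and heights maps
    each internal node (a tuple) to half the distance at which it was formed.
    """
    # active clusters: id -> (subtree, number of taxa it contains)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    heights = {}
    next_id = len(names)
    while len(clusters) > 1:
        # 1. find the closest pair of clusters
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        node = (tree_i, tree_j)
        heights[node] = dij / 2            # equal branch lengths below the node (constant rate)
        del d[(i, j)]
        # 2. size-weighted average distance from the merged cluster to each remaining one
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        clusters[next_id] = (node, size_i + size_j)
        next_id += 1
    [(tree, _size)] = list(clusters.values())
    return tree, heights
```

Because every leaf below a node sits at the same height, the result is rooted and ultrametric, which is why UPGMA relies on the constant-rate assumption.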
  76. UPGMA method results in what kind of tree?
    A rooted, ultrametric tree
  77. What is ENIGMA?
    It is the Evidence-based Network for the Interpretation of Germline Mutant Alleles.
  78. What is the goal of ENIGMA?
    The group was started “to evaluate and implement strategies to characterize the clinical significance of BRCA1 and BRCA2 variants.”
  79. What is BIC?
    The Breast Cancer Information Core database.
  80. What is the Splicing Working Group?
    It’s a group within ENIGMA which “has initiated several projects, including studies aimed at identifying optimal standardized protocols and prediction tools for characterizing splicing aberrations, and assessing the consistency of interpretation of clinical significance of splicing assay results.”
  81. What is HGVS?
    Human Genome Variation Society
  82. What are cryptic sites?
    A cryptic site is “a site defined by the wildtype sequence, but only used when a variant disrupts the native donor or acceptor site”
  83. What website did we use for finding donor/acceptor sites?
    NNSplice at the Berkeley Drosophila Genome Project
  84. What is a phylogenetic analysis of a family of related nucleic acid or protein sequences?
    It is a determination of how the family *might* have been derived during evolution
  85. What are some advantages of using molecular traits in building phylogenetic trees?
    • They directly reflect the underlying process of evolution
    • There are a vast number of potential traits
    • They can detect differences between closely related organisms
    • They are not affected by the environment
  86. Internal nodes are ___.
    hypothetical ancestral units
  87. In an unrooted tree, we ___ tell if two leaves have a common ancestor.
  88. What is parallel evolution?
    Independent evolution of common traits in organisms sharing distant relatives
  89. Describe distance-based tree construction.
    Calculate the distance between all pairs of sequences, then construct the tree from those distances
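The first step of a distance-based method can be sketched as a p-distance matrix: the proportion of aligned positions at which two sequences differ, skipping positions with a gap in either sequence. This is an assumed, uncorrected distance (no correction for multiple substitutions at one site), it fails if two sequences share no ungapped positions, and the function names are hypothetical.

```python
def p_distance(s1, s2):
    """Proportion of compared positions that differ, ignoring columns with a gap ('-') in either sequence."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances for aligned sequences."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = p_distance(seqs[i], seqs[j])
    return matrix
```

The resulting matrix is the input a distance-based method such as UPGMA (card 75) works from.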
  90. Describe character-based tree construction.
    Use individual substitutions among sequences to determine the most likely ancestral relationships
  91. What are the assumptions of UPGMA?
    • Mutation rate is constant
    • Distance is linear with time
Card Set:
Even more bioinformatics
2013-12-12 11:57:10