Bioinformatics final

Card Set Information

Author:
quietstorm
ID:
187321
Filename:
Bioinformatics final
Updated:
2012-12-05 17:27:07
Tags:
Bioinformatics
Folders:

Description:
Final

The flashcards below were created by user quietstorm on FreezingBlue Flashcards.


  1. What are the two types of databases?
    Primary / curated
  2. A scientist has solved a structure and would like to create functional rules that can be propagated to rest of the members of the family. What kind of rule do you think he can make?
    Site rule
  3. What is the basic unit of a PIRSF?
    Homeomorphic family
  4. What tool would you use to find a template for modeling?
    BLAST
  5. Which of these are elements of protein secondary structure?
    • A) Helix
    • B) Sheet
    • C) Loop
    • D) All of the above

    D: all of the above
  6. What is the only amino acid found at position 3 in a Type II turn?
    Glycine
  7. Which of the following databases is based on HMM?
    • A) Uniprot
    • B) Pfam
    • C) CDD
    • D) GenBank

    B: Pfam
  8. “A subgroup of two or more taxa or DNA/protein sequences that includes both their common ancestor and all of their descendants.” What is this a definition of?
    Clade
  9. What is central to molecular recognition?
    Protein-ligand interactions
  10. Which of the following is a supersecondary structure?
    • A) alpha helix
    • B) gamma turns
    • C) beta sheet
    • D) Greek key

    D: Greek key
  11. What is the guiding principle that is used in assigning protein function?
    % identity
  12. Which one of the following methods is based on clustering algorithms?
    • A) Least Squares
    • B) Neighbor Joining
    • C) Minimum Evolution
    • D) Maximum Parsimony

    B: Neighbor joining
  13. Name one database that stores homology models?
    MODBASE
  14. Two structures with sequence identity below 20% can still be closely related, with an RMSD < 1 Å, and belong to the same family. What can you attribute this to?
    structures evolve slowly
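
A minimal sketch of the RMSD mentioned in card 14, assuming the two structures have already been superimposed and their atoms paired one-to-one (for example, matching C-alpha atoms). The function name `rmsd` and the input layout are illustrative choices, not taken from the course material.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between paired atoms
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

The result is in the same unit as the coordinates (Å for PDB files). This sketch does not perform the superposition itself; structure comparison tools do that before reporting an RMSD.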
  15. What is the other term used for beta turns?
    Reverse Turns
  16. Which of the following is a domain database?
    TigrFams
  17. You are given the two accessions gi|4503055 & gi|117351. Which database would you go to retrieve the Fasta formatted file of these sequences?
    NCBI
  18. Where would you go to find the neighbors for proteins with ids 1P91, 2ZDZ, 1BNL
    DALI
  19. Which of the elements of protein structure is most flexible?
    Loops
  20. Which of the following is a Uniprot database?
    UniParc
  21. Homologs produced by gene duplication are called what?
    Paralogs
  22. Which of the following databases is based on HMMs?
    SMART
  23. Sequences are considered to be in the Twilight Zone if their sequence identity is:
    < 30%
  24. Which pair is considered a conservative substitution?
    Y/F
  25. Which of the following is a Supersecondary structure?
    helix-loop-helix
  26. Which of the following is best suited to get divergent sequences?
    • A) Profile HMMs
    • B) BLASTN
    • C) TBLASTN
    • D) BLASTP

    A: Profile HMMs
  27. What is the most important parameter used in interpreting the results of sequence comparisons?
    E-value
  28. At what sequence identity cut-off is it safe to model?
    > 60%
  29. Who came up with the first evolutionary tree for globins?
    M.O. Dayhoff
  30. What is the typical word size used when using blastp?
    3
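
As a rough illustration of what a word size of 3 means, the sketch below lists the overlapping three-residue words of a protein query, which is the kind of word list BLAST builds before scanning a database. The function name `query_words` and the example peptide are hypothetical; real BLAST also expands each word into a neighborhood of similar words, which is not shown here.

```python
def query_words(sequence, word_size=3):
    """Return all overlapping words of length word_size from a sequence."""
    return [
        sequence[i:i + word_size]
        for i in range(len(sequence) - word_size + 1)
    ]


print(query_words("MKVLA"))
```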
  31. What is Bioinformatics?
    Integration of Omic terms
  32. Name the three distinct domains of life. Roughly how many billion years ago did each of them evolve?
    • Bacteria (2.6 billion years)
    • Archaea (3.5 billion years)
    • Eukarya (2.2 billion years)
  33. When was the first draft of the human genome sequencing project completed (roughly)?
    2001: Completion of the human draft genome!
  34. What are accession numbers and IDs? What type of ID is used by PDB? An example, please?
    • Identifier: string of letters and digits; can change
    • Accession: letters and numbers; stable
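
The sketch below shows one way to tell the identifier styles used in this card set apart. It assumes the classic four-character PDB ID format (a digit followed by three letters or digits, such as 1P91) and the accession pattern published in the UniProt documentation (which covers O95050 and Q12400). The function name `guess_id_type` is hypothetical, and the patterns only check the shape of a string, not whether the entry exists.

```python
import re

# Classic PDB IDs: one digit (1-9) followed by three alphanumeric characters
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")
# UniProtKB accession pattern as given in the UniProt help pages
UNIPROT_ACC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def guess_id_type(identifier):
    """Return 'pdb', 'uniprot', or 'unknown' based only on the string's shape."""
    if PDB_ID.fullmatch(identifier):
        return "pdb"
    if UNIPROT_ACC.fullmatch(identifier):
        return "uniprot"
    return "unknown"
```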
  35. What are gaps? How many gaps do you see in the following alignment?
    3 gaps
  36. Name one tool that you would use to align multiple sequences.
    • 1. CLUSTALW (Progressive Method)
    • 2. MUSCLE (Iterative method)
    • 3. T-Coffee/Expresso (Structure based)
  37. What tool/database would you use to get structural neighbors?
    • VAST
    • DALI
    • CE
  38. What tool would you use to browse a genome and find the Chromosomal location of a gene?
    UCSC
  39. What tool would you use to browse a genome and find the chromosomal location of a gene?
    • PSI-BLAST
    • Profile HMMs
  40. Name one tool that you would use to align two sequences to do a pair-wise alignment?
    Align using BLAST
  41. What is a domain? Give names of at least three domain families you have encountered so far.
    A domain is an evolutionarily mobile unit of a protein
  42. Define homology. Will two proteins belonging to the same family be considered homologous? Why? At what sequence identity cut-offs can two proteins safely be considered homologous?
    Homology: Two sequences or structures are said to be homologs, or homologous to each other, if they are related by divergence from a common ancestor. Homology = descent from a common ancestor.

    Yes, two proteins belonging to the same family can be considered homologous since they share a common ancestor.
  43. Define Orthologs and Paralogs. Give an example for each.
    • Orthologs: homologs produced by speciation
    • Paralogs: homologs produced by gene duplication
    • Xenologs: homologs resulting from horizontal transfer of a gene b/w 2 organisms
  44. What does sequence identity mean?
    Sequence identity: the extent to which two sequences are invariant.
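
A minimal sketch of how percent identity could be computed from two already aligned sequences. It assumes gaps are written as `-` and that identity is counted over columns where neither sequence has a gap; other tools use different denominators (for example, the full alignment length), so treat the exact definition as an assumption. The function name `percent_identity` is illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)
```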
  45. What is a PAM matrix? What does PAM1 mean? What kind of alignment is it based on?
    • PAM matrices: point accepted mutations
    • Based on global alignments of closely related proteins
    • PAM1: calculated from comparisons of sequences with no more than 1% divergence
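
Higher PAM matrices are extrapolated from PAM1 by repeated matrix multiplication (PAM250 corresponds to PAM1 raised to the 250th power). The sketch below shows the idea on a toy two-letter alphabet; the matrix `TOY_PAM1` is invented for illustration and is not Dayhoff's data, and real PAM matrices are 20 × 20 and are published as log-odds scores derived from these probabilities.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def pam_n(pam1, n):
    """Mutation probability matrix after n PAM units (pam1 multiplied by itself n times)."""
    result = pam1
    for _ in range(n - 1):
        result = mat_mul(result, pam1)
    return result


# Hypothetical PAM1 for a two-letter alphabet: 1% chance of change per PAM unit
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]
```

Each row of the result still sums to 1, and the diagonal (the probability of staying unchanged) shrinks as n grows, which is why high PAM numbers suit weak, distant similarities.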
  46. What are the two types of alignments? What type does BLAST use?
    Global and local alignments

    Local alignment is almost always used for database searches such as BLAST
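
To make the global/local distinction concrete, here is a small dynamic-programming sketch in the style of Needleman-Wunsch (global) and Smith-Waterman (local). The scoring values (match +1, mismatch -1, gap -2) and the function name `alignment_score` are assumptions chosen for illustration; BLAST itself uses a faster heuristic rather than this full table.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score of a and b: global by default, local if local=True."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment penalizes leading gaps; local alignment starts at 0
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    # Global: score of aligning both sequences end to end; local: best cell anywhere
    return best if local else score[-1][-1]
```

With `"AAACGTAAA"` and `"TTCGTTT"`, the local score rewards only the shared `CGT` core, while the global score is dragged down by the unrelated flanks. That is why local alignment suits database searches where only part of a sequence may match.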
  47. What are the different components of a protein?
    • Motifs
    • domains
    • Full-length Protein
    • Integrated family databases
    • 3D structure
  48. What is a substitution matrix? Name the two major types of matrices.
    • Contains values proportional to the probability that amino acid i mutates into amino acid j, for all pairs of amino acids.
    • Constructed by assembling a large and diverse sample of verified pairwise alignments (or multiple sequence alignments) of amino acids.
    • Should reflect the true probabilities of mutations occurring through a period of evolution.
    • The two major types of substitution matrices are PAM and BLOSUM.
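
In practice, substitution matrix entries are usually stored as log-odds scores: the observed frequency of a residue pair in trusted alignments divided by the frequency expected by chance, on a log scale. The sketch below assumes half-bit units (a scale factor of 2 with log base 2, as used for BLOSUM matrices); the frequencies in the example calls are made up, and `log_odds_score` is a hypothetical helper name.

```python
import math


def log_odds_score(pair_freq, freq_i, freq_j, scale=2.0):
    """Rounded log-odds score for a residue pair.

    pair_freq: observed frequency of the pair (i, j) in the alignments
    freq_i, freq_j: background frequencies of residues i and j
    """
    return round(scale * math.log2(pair_freq / (freq_i * freq_j)))


# Pair seen twice as often as expected by chance: positive score
print(log_odds_score(0.02, 0.1, 0.1))
# Pair seen half as often as expected by chance: negative score
print(log_odds_score(0.005, 0.1, 0.1))
```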
  49. Name two protein domain databases.
    • Pfam
    • SMART
    • CDD
  50. Comparing two sequences is the cornerstone of any bioinformatics analysis. Can you explain why that is so.
    • It helps us to understand if two proteins are functionally related
    • It helps us to understand if two sequences are structurally related
    • It helps us to understand if two sequences are structurally or functionally related, or both
    • It helps us to identify common domains and motifs (ligand-binding sites; metal sites; active sites)
    • More importantly, it helps us to understand the differences and their functional divergence, and hence helps us look back at events billions of years ago (evolution!)
  51. Which database serves as a universal hub for protein structures determined by X-ray, NMR or EM?
    PDB
  52. What are the different Uniprot databases?
    • UniProtKB
    • UniRef
    • UniParc
  53. Name the database you would use to get information about diseases.
    OMIM: Online Mendelian inheritance in man
  54. Name two structure classification databases.
    • SCOP
    • CATH
  55. Name two full-length classification databases.
    • COG/KOG
    • PIRSF
    • PANTHER
  56. You are given the two accessions O95050 & Q12400. Which database would you go to retrieve the FASTA-formatted file of these sequences? Describe the steps.
    • 1. Go to http://www.uniprot.org/
    • 2. Click on Retrieve
    • 3. Paste the IDs
    • 4. Save the file as FASTA from the options
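
The same retrieval can be scripted instead of clicking through the website. This is a sketch that assumes the current UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta`, which postdates these flashcards; the helper names `fasta_url` and `fetch_fasta` are hypothetical, and the download needs network access.

```python
from urllib.request import urlopen

# Assumed UniProt REST URL layout for a single entry in FASTA format
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the download URL for one UniProtKB accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip())


def fetch_fasta(accession, timeout=30):
    """Download one entry and return its FASTA record as text."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A possible use is `for acc in ("O95050", "Q12400"): print(fetch_fasta(acc))`, which prints both records so they can be saved to a single `.fasta` file.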
  57. I would like to know whether the 3D structure of my query protein has been determined. How would I go about it? Which database would I go to, and what tool would I use?
    • 1. Go to NCBI and run a BLAST search
    • 2. Go to PDB and run a BLAST search against PDB
  58. Where would you go to find the neighbours (other related proteins) for proteins with ids 1P91, 2ZDZ, 1BNL
    • VAST
    • DALI
    • CE
  59. A scientist would like to design a hybridization probe for a Type IIB bleeding disorder gene that he has the sequence for. Which tool would you recommend he uses for his probe design?
    Primer3Plus
  60. A scientist has just cloned a gene and would like to see if it has any known domains. How should he go about it? Which database do you recommend he uses?
    • Go to NCBI to get the sequence
    • Search it against Pfam, SMART, or CDD
  61. Name the database you would use to get information about diseases.
    OMIM: Online Mendelian inheritance in man
  62. Which of the following is a domain database?
    CDD
  63. Where would you go to find the neighbors for proteins with ids 1P91, 2ZDZ, 1BNL
    DALI
  64. Which of the following is a Uniprot database?
    • A) REFSEQ
    • B) UNIPARC
    • C) OMIM
    • D) ENTREZ

    B: UNIPARC
  65. Homologs produced by gene duplication are called what?
    Paralogs
  66. What is the typical word size used when using blastp?
    3
  67. Primary Databases
    • Text: PubMed
    • DNA seq: GenBank, DDBJ, EMBL
    • Protein seq: Entrez Proteins, TrEMBL, RefSeq
    • Protein structures: PDB
  68. Curated databases
    • DNA seq: RefSeq, OMIM
    • Protein seq: Swiss-Prot, PIR, RefSeq
    • Genomes: Entrez Genomes, COGs
  69. Protein sequence databases
    • UniProt
    • PIR
    • NCBI
  70. UniProt databases
    • UniprotKB: SWISS-PROT/TrEMBL
    • UniRef
    • UniParc
  71. Protein motif databases
    • PROSITE
    • PRINTS
    • BLOCKS
  72. Protein domain databases
    • Pfam
    • CDD
    • SMART
  73. Structural databases
    PDB
  74. Secondary processed structural databases
    • VAST
    • DALI
    • CE
  75. Structural classification
    • SCOP
    • CATH
  76. Protein full-length classification
    • PIRSF
    • COGS
    • PANTHER
  77. Homology modeling
    • MODBASE
    • SWISS-MODEL 
  78. SNP databases
    dbSNP
  79. Projects
    • HapMap
    • HGP
    • ENCODE
  80. Tools/Programs
    • Chromosome location: UCSC Genome Browser, NCBI MapViewer
    • Primer/probe design: Primer3Plus
    • Pair-wise alignments: BLAST align (bl2seq)
    • ID conversion: PIR ID mapping
    • Multiple sequence alignments: CLUSTALW (progressive method); MUSCLE (iterative method); T-Coffee, Expresso (structure-based method); HMMER (statistical method)
    • Homology modeling: Modeller, SWISS-MODEL
    • Model validation: Ramachandran plot
  81. KB: structure
    • PDB
    • SCOP
    • CATH
  82. KB: family full length
    • PIRSF
    • PANTHER
    • COGS
  83. KB protein seq
    • UNIPROTKB (SP & TrEMBL)
    • NCBI (REFSEQ)
  84. KB: MOTIFS
    • PROSITE
    • BLOCKS
    • PRINTS
  85. KB: DOMAIN
    • PFAM
    • SMART
    • CDD
  86. KB: 3D-MODELS
    • MODBASE
    • SWISS-MODEL
  87. KB: DISEASE/SNP
    • OMIM
    • HAPMAP
  88. KB: NUCLEOTIDE
    • GENBANK
    • EMBL
    • DDBJ
  89. STRUCTURE SPECIFIC: UNIVERSAL HUB
    PDB (PRIMARY)
  90. STRUCTURE SPECIFIC: NEIGHBORS
    • VAST
    • CE
    • DALI
  91. STRUCTURE SPECIFIC: STRU. CLASS
    • SCOP
    • CATH
  92. STRUCTURE SPECIFIC: 3D MODELS
    • MODBASE
    • SWISS MODEL
  93. PRIMARY DB: STRUC
    PDB
  94. PRIMARY DB: PROT SEQ
    • UNIPROTKB (TrEMBL)
    • NCBI (REFSEQ)
  95. PRIMARY DB: NUCL
    • GENBANK
    • EMBL
    • DDBJ
  96. PRIMARY DB: TXT
    PUBMED
  97. CURATED DB: STRUC
    • SCOP
    • CATH
    • MMDB
  98. CURATED DB: PROT SEQ
    • SWISS-PROT
    • PIR
    • REFSEQ
  99. CURATED DB: NUCLEOTIDE
    • REFSEQ
    • OMIM
  100. CURATED DB: GENOMES
    • ENTREZ GENOMES
    • COGS
  101. SEARCH FOR COMPLETE GENOMES
    GOLD
  102. WHAT IS A DOMAIN
    PART OF A PROTEIN THAT CARRIES STRUCTURE AND FUNCTION
  103. WHAT IS A CONSERVED DOMAIN
    PIECE OF PROTEIN THAT IS CONSERVED ACROSS A FAMILY
  104. HOW MANY GENES HAVE BEEN IDENTIFIED THAT ARE INVOLVED IN DISEASE
    < 2500
  105. SEQ SIMILARITY
    • EXTENT TO WHICH NUCLEOTIDE OR PROTEIN SEQUENCES ARE RELATED
    • BASED ON IDENTITY PLUS CONSERVATION
  106. SEQ CONSERVATION
    CHANGES AT A SPECIFIC POSITION OF AN AMINO ACID (OR, LESS COMMONLY, DNA) SEQUENCE THAT PRESERVE THE PHYSICO-CHEMICAL PROPERTIES OF THE ORIGINAL RESIDUE
  107. PHI VS. PSI BLAST
    • PHI: PATTERN-HIT INITIATED SEARCH
    • PSI: POSITION-SPECIFIC ITERATED SEARCH
  108. PAM
    • POINT ACCEPTED MUTATION
    • M.O. DAYHOFF
    • LOW PAM: SHORT, STRONG LOCAL SIMILARITIES
    • HIGH PAM: WEAK SIMILARITIES
  109. phylogeny
    • evolutionary history of an organism
    • cornerstone of systematic taxonomy
  110. systematics
    study of the evolution of biological diversity
  111. root
    common ancestor of all taxa
  112. branch
    reflects the relationship b/w taxa according to descent and ancestry
  113. node
    a taxonomic unit identifying either an existing or extinct species
  114. distance scale
    scale that represents the number of differences b/w organisms or seq
  115. topology
    defines the branching pattern of the tree
  116. What are the two types of computational methods used in phylogenetic analysis?
    Clustering algorithms & Optimality approaches
  117. Name four methods that use an optimality criterion.
    Parsimony; Maximum Likelihood; Minimum Evolution & Least Squares
  118. Name a few methods that use clustering algorithms.
    UPGMA & Neighbor joining
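
As an illustration of a clustering algorithm, here is a compact UPGMA sketch: it repeatedly merges the two closest clusters and averages their distances to every other cluster, weighted by cluster size. The function name `upgma`, the input format (a dict keyed by pairs of names), and the example distances are assumptions made for this sketch; branch lengths are not computed, only the tree topology as nested tuples.

```python
def upgma(names, distances):
    """Build a UPGMA tree topology as nested tuples.

    names: list of taxon labels
    distances: dict mapping (name_a, name_b) pairs to a distance
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters
        pair = min(d, key=d.get)
        a, b = sorted(pair, key=str)  # fixed order for reproducible output
        merged = (a, b)
        n_a, n_b = sizes.pop(a), sizes.pop(b)
        del d[pair]
        # Distance from the new cluster to each remaining cluster:
        # size-weighted average of the two old distances
        for c in sizes:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((merged, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        sizes[merged] = n_a + n_b
    return next(iter(sizes))


tree = upgma(["A", "B", "C"], {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6})
print(tree)
```

UPGMA assumes a constant rate of evolution across lineages; Neighbor Joining relaxes that assumption but follows the same merge-the-closest-pair pattern.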
  119. How is the decision made on when to use what method?
    Based on the levels of similarity
  120. Define Taxonomy and Cladistics.
  121. What are the three types of trees?
    • cladogram
    • phylogram
    • ultrametric tree
  122. How are phylogenetic analyses depicted?
  123. What is phylogeny?
  124. What is evolution at the molecular level?
  125. What is functional annotation?
  126. Why is manual annotation important and absolutely essential?
  127. What are the advantages of using PIRSFs?
  128. What are the two types of rules? What is the difference between them?
  129. What are site rules and what do you absolutely need to create a site rule for propagation?
  130. What is the clustering tool used to create PIRSFs?
