Ch 10 Big Data - Exam IV

2013-11-09 07:15:13
Hadoop Foundation Ecosystem

Hadoop Ecosystem

  1. Hadoop
    provides the most comprehensive collection of tools and technologies available today to target big data challenges
  2. Yet Another Resource Negotiator (YARN)
    a core Hadoop service that provides global resource management (ResourceManager) & per-application management (ApplicationMaster)
  3. ResourceManager
    a master service that controls the NodeManager on each node of a Hadoop cluster - includes a scheduler that dynamically allocates resources according to the preset needs of each application
  4. ApplicationMaster
    includes a notifier that activates when the application requires additional resources
  5. HBase
    a columnar (non-relational) database that can hold billions of rows spread across Hadoop clusters - provides real-time access to data - highly configurable - tracks changes by versioning data - organized like a taxonomy (high-level categories that are broken down into subcategories)
  6. Hive
    a relational data warehouse layer that lets SQL-savvy users interact directly with structured data (via HiveQL) while retaining the ability to implement analysis with MapReduce - not fast, but extensible & scalable - allows data to be partitioned
  7. Sqoop (SQL to Hadoop)
    the ETL tool of the Hadoop system - works on non-Hadoop data sets (e.g., relational databases) so they can be manipulated in the Hadoop environment
  8. Zookeeper
    allows the distributed environment to work smoothly with few faults - synchronizes processes so they occur in the proper order by starting & stopping nodes as needed - assigns a node to be the leader - supports effective messaging among nodes - ensures proper configuration of resources - facilitates communication
  9. tables
    partitions
    buckets
    three mechanisms for data organization in Hive
  10. Pig
    designed to make Hadoop more usable by non-developers - generates the map & reduce processes so the user does not need to know how to write them - easy for less technical end users
  11. Pig Latin
    a language used to express data flows that supports the loading & processing of input data with a series of operators that transform the input data and produce the desired output
  12. script - a file containing Pig Latin commands
    Grunt - an interactive command interpreter
    embedded - Pig programs executed as part of a Java program
    three ways Pig can be run
  13. bulk import
    direct input
    data interaction
    data export
    four key features of Sqoop
  14. process synchronization
    configuration mgmt
    self-election - can assign a leader role
    reliable messaging
    capabilities of Zookeeper
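A toy sketch of the YARN split described in cards 2-4: a ResourceManager tracks free capacity on each NodeManager, and an application's ApplicationMaster asks for more when it needs it. This is not the YARN API. ToyResourceManager, request(), the first-fit placement policy and the memory-in-MB units are all assumptions for illustration; real YARN schedulers (e.g., the Capacity or Fair Scheduler) are far more involved.

```python
class ToyResourceManager:
    """Hypothetical model of YARN-style allocation (not the real YARN API).

    node_capacity_mb maps each NodeManager's node name to its free memory
    in MB (assumed unit). Containers are placed first-fit.
    """

    def __init__(self, node_capacity_mb):
        self.free = dict(node_capacity_mb)  # node -> free MB
        self.allocations = []               # (app, node, mb) grants so far

    def request(self, app, mb):
        """Called on behalf of an app's ApplicationMaster when it needs more resources.

        Returns the node that received the container, or None if no node
        currently has room (a real scheduler would queue the request).
        """
        for node, free in self.free.items():
            if free >= mb:
                self.free[node] = free - mb
                self.allocations.append((app, node, mb))
                return node
        return None


if __name__ == "__main__":
    rm = ToyResourceManager({"n1": 4096, "n2": 2048})
    print(rm.request("app1", 3072))  # n1 (1024 MB left there)
    print(rm.request("app2", 2048))  # n2, since n1 is now too small
    print(rm.request("app1", 1024))  # n1 again
    print(rm.request("app2", 512))   # None: cluster is full
```

First-fit is used only because it keeps the sketch short; the point is that allocation happens centrally and on demand, per application.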
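A minimal in-memory sketch of how HBase "tracks changes by versioning data" (card 5): each cell is addressed by a row key plus a "family:qualifier" column, the column family being the high-level category and the qualifier the finer breakdown. VersionedTable, put(), get() and max_versions are hypothetical names, not the HBase client API; the timestamps are supplied by the caller here for clarity.

```python
from collections import defaultdict


class VersionedTable:
    """Hypothetical model of HBase-style versioned cells (not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row key, "family:qualifier") -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, column, value, ts):
        versions = self.cells[(row, column)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # discard versions beyond the limit

    def get(self, row, column, versions=1):
        """Return up to `versions` values for a cell, newest first."""
        return [value for _, value in self.cells[(row, column)][:versions]]


if __name__ == "__main__":
    t = VersionedTable(max_versions=2)
    t.put("user1", "info:city", "Boston", ts=1)
    t.put("user1", "info:city", "Denver", ts=2)
    t.put("user1", "info:city", "Austin", ts=3)
    print(t.get("user1", "info:city"))              # ['Austin']
    print(t.get("user1", "info:city", versions=5))  # ['Austin', 'Denver']
```

Reading a cell returns the newest version by default, while older versions stay available up to the configured limit.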
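Cards 10 and 11 say Pig turns a Pig Latin data flow into map & reduce processes for the user. The sketch below mirrors a small hypothetical Pig Latin script (shown in the comments; the file name and field names are assumptions) with plain Python functions for the map, shuffle and reduce steps. It only illustrates the idea; real Pig execution plans are more involved.

```python
from collections import defaultdict

# Hypothetical Pig Latin script this sketch mirrors:
#   logs    = LOAD 'logs.txt' USING PigStorage('\t') AS (user:chararray, bytes:int);
#   big     = FILTER logs BY bytes > 1000;
#   grouped = GROUP big BY user;
#   totals  = FOREACH grouped GENERATE group, SUM(big.bytes);


def map_phase(records, min_bytes=1000):
    """LOAD + FILTER: emit (key, value) pairs, as a map task would."""
    for user, nbytes in records:
        if nbytes > min_bytes:
            yield user, nbytes


def shuffle(pairs):
    """GROUP BY: collect every value for each key (the shuffle/sort step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """FOREACH ... GENERATE group, SUM(...): one output row per key."""
    return {key: sum(values) for key, values in groups.items()}


def run_pipeline(records):
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    records = [("ann", 500), ("ann", 2000), ("bob", 1500), ("bob", 3000), ("cy", 10)]
    print(run_pipeline(records))  # {'ann': 2000, 'bob': 4500}
```

The user writes only the four Pig Latin statements; splitting the work into map, shuffle and reduce is what Pig does on the user's behalf.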
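Card 14 lists self-election among Zookeeper's capabilities. A common Zookeeper recipe has each candidate create an ephemeral, sequential node; the candidate holding the lowest sequence number leads, and when its session ends its node disappears, so the next-lowest takes over. The sketch below models that recipe in memory only. ToyElection, join(), leave() and leader() are hypothetical names, not a Zookeeper client API.

```python
import itertools


class ToyElection:
    """Hypothetical in-memory model of Zookeeper-style leader election."""

    def __init__(self):
        self._seq = itertools.count()  # stands in for sequential node numbering
        self.members = {}              # candidate name -> sequence number

    def join(self, node):
        """Register a candidate, like creating an ephemeral sequential node."""
        self.members[node] = next(self._seq)

    def leave(self, node):
        """Model an ephemeral node vanishing when its session ends."""
        self.members.pop(node, None)

    def leader(self):
        """The live candidate with the lowest sequence number leads."""
        if not self.members:
            return None
        return min(self.members, key=self.members.get)


if __name__ == "__main__":
    e = ToyElection()
    for name in ("a", "b", "c"):
        e.join(name)
    print(e.leader())  # a
    e.leave("a")       # leader fails
    print(e.leader())  # b takes over
    e.join("a")        # a rejoins with a newer, higher number
    print(e.leader())  # still b
```

Because sequence numbers only increase, a node that rejoins goes to the back of the line instead of reclaiming leadership.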