Ch 10 Big Data - Exam IV

Card Set Information

Author: mjweston
Updated: 2013-11-09
Tags: Hadoop Foundation Ecosystem
Description: Hadoop Ecosystem


  1. Hadoop
    provides the most comprehensive collection of tools and technologies available today to target big data challenges
  2. Yet Another Resource Negotiator (YARN)
    a core Hadoop service that provides global resource management (ResourceManager) and per-application management (ApplicationMaster)
  3. ResourceManager
    a master service that controls the NodeManager in each node of a Hadoop cluster - includes a scheduler that dynamically allocates resources according to the pre-set needs of the application
  4. ApplicationMaster
    includes a notifier that activates when additional resources are required by the application
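
    A minimal sketch, in Java, of how a client hands an application to YARN through the stock YarnClient API (hadoop-yarn-client); the application name and resource sizes are placeholders, and the ContainerLaunchContext for the ApplicationMaster is omitted:

      import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
      import org.apache.hadoop.yarn.api.records.Resource;
      import org.apache.hadoop.yarn.client.api.YarnClient;
      import org.apache.hadoop.yarn.client.api.YarnClientApplication;
      import org.apache.hadoop.yarn.conf.YarnConfiguration;

      public class YarnSubmitSketch {
          public static void main(String[] args) throws Exception {
              // Connect to the ResourceManager configured in yarn-site.xml.
              YarnClient yarnClient = YarnClient.createYarnClient();
              yarnClient.init(new YarnConfiguration());
              yarnClient.start();

              // Ask the ResourceManager for a new application id and submission context.
              YarnClientApplication app = yarnClient.createApplication();
              ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
              ctx.setApplicationName("demo-app");              // placeholder name
              ctx.setResource(Resource.newInstance(1024, 1));  // 1 GB, 1 vcore for the ApplicationMaster

              // A real submission would also set a ContainerLaunchContext here.
              yarnClient.submitApplication(ctx);
              yarnClient.stop();
          }
      }
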
  5. HBase
    a columnar (non-relational) database that can hold billions of rows layered across Hadoop clusters - provides real-time access to data - highly configurable - tracks changes by versioning data - organized like a taxonomy (high-level categories that get broken down)
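
    A minimal sketch of writing and reading back a versioned cell through the HBase 1.x Java client; the table, row key, and column names are invented for illustration, and the table is assumed to already exist with versioning enabled on its column family:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.*;
      import org.apache.hadoop.hbase.util.Bytes;

      public class HBaseVersionSketch {
          public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              try (Connection conn = ConnectionFactory.createConnection(conf);
                   Table table = conn.getTable(TableName.valueOf("web_metrics"))) {
                  // Write a cell; HBase keeps older versions per the column family settings.
                  Put put = new Put(Bytes.toBytes("page#home"));
                  put.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("hits"), Bytes.toBytes("1024"));
                  table.put(put);

                  // Read back up to three versions of the same cell (readVersions in HBase 2.x).
                  Get get = new Get(Bytes.toBytes("page#home"));
                  get.setMaxVersions(3);
                  Result result = table.get(get);
                  System.out.println(result);
              }
          }
      }
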
  6. Hive
    a relational data warehouse layer that allows SQL-savvy users to interact directly with structured data through HiveQL while retaining the ability to implement analysis with MapReduce - not fast, but extensible & scalable - allows data to be partitioned
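
    A minimal sketch of a SQL-savvy user querying Hive over JDBC (HiveServer2); the connection URL, credentials, and the sales table are assumptions, and the Hive JDBC driver jar must be on the classpath:

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;

      public class HiveQuerySketch {
          public static void main(String[] args) throws Exception {
              try (Connection conn = DriverManager.getConnection(
                       "jdbc:hive2://localhost:10000/default", "hive", "");
                   Statement stmt = conn.createStatement()) {
                  // HiveQL looks like SQL but is compiled down to MapReduce jobs.
                  ResultSet rs = stmt.executeQuery(
                      "SELECT country, COUNT(*) AS orders FROM sales GROUP BY country");
                  while (rs.next()) {
                      System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                  }
              }
          }
      }
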
  7. Sqoop (SQL to Hadoop)
    the ETL tool in the Hadoop system - able to work on non-Hadoop data sets (such as relational databases) so they can be manipulated in the Hadoop environment
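
    A minimal sketch of driving a Sqoop bulk import from Java through the Sqoop 1 entry point (org.apache.sqoop.Sqoop.runTool); the connection string, credentials, table, and target directory are all placeholders, and the same arguments work on the sqoop command line:

      import org.apache.sqoop.Sqoop;

      public class SqoopImportSketch {
          public static void main(String[] args) {
              // Pull a relational table into HDFS so Hadoop tools can work on it.
              String[] sqoopArgs = {
                  "import",
                  "--connect", "jdbc:mysql://dbhost/retail",
                  "--username", "etl_user",
                  "--password", "secret",
                  "--table", "orders",
                  "--target-dir", "/data/retail/orders",
                  "--num-mappers", "4"
              };
              System.exit(Sqoop.runTool(sqoopArgs));
          }
      }
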
  8. Zookeeper
    allows the distributed environment to work smoothly with few faults - synchronizes processes so they occur in the proper order by starting & stopping nodes as needed - assigns a node to be a leader - supports effective messaging among nodes - ensures proper configuration of resources - facilitates communication
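
    A minimal sketch of Zookeeper's self-election pattern using ephemeral sequential znodes; the connection string and the /election parent znode are assumptions (the parent must already exist), and the watch that reacts when the current leader disappears is omitted:

      import java.util.Collections;
      import java.util.List;
      import org.apache.zookeeper.CreateMode;
      import org.apache.zookeeper.ZooDefs;
      import org.apache.zookeeper.ZooKeeper;

      public class LeaderElectionSketch {
          public static void main(String[] args) throws Exception {
              ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 3000, event -> { });

              // Each candidate creates an ephemeral, sequential znode under the election path.
              String me = zk.create("/election/candidate-", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

              // The candidate with the lowest sequence number is the leader; if it dies,
              // its ephemeral znode vanishes and the next candidate takes over.
              List<String> children = zk.getChildren("/election", false);
              Collections.sort(children);
              System.out.println(me.endsWith(children.get(0)) ? "leader" : "follower");
              zk.close();
          }
      }
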
  9. tables
    partitions
    buckets
    three mechanisms for data organization in Hive
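
    A minimal sketch of all three mechanisms in one HiveQL DDL statement, issued over the same (assumed) JDBC connection as above; the table, columns, partition key, and bucket count are illustrative:

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.Statement;

      public class HivePartitionSketch {
          public static void main(String[] args) throws Exception {
              try (Connection conn = DriverManager.getConnection(
                       "jdbc:hive2://localhost:10000/default", "hive", "");
                   Statement stmt = conn.createStatement()) {
                  // Table + partitions (one directory per view_date) + buckets (hashed on user_id).
                  stmt.execute(
                      "CREATE TABLE page_views (user_id STRING, url STRING) " +
                      "PARTITIONED BY (view_date STRING) " +
                      "CLUSTERED BY (user_id) INTO 32 BUCKETS");
              }
          }
      }
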
  10. Pig
    was designed to make Hadoop more usable by non-developers - is capable of producing map & reduce processes so that the user is not required to know how to write them - easy for less technical end users
  11. Pig Latin
    a language used to express data flows that supports the loading & processing of input data with a series of operators that transform the input data and produce the desired output
  12. script - file containing Pig Latin commands
    grunt - a command interpreter
    embedded - programs executed as part of a Java program
    three ways Pig can be run
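
    A minimal sketch of the embedded way of running Pig, driving a short Pig Latin data flow from Java through PigServer; the input path, field names, and output path are placeholders:

      import org.apache.pig.ExecType;
      import org.apache.pig.PigServer;

      public class EmbeddedPigSketch {
          public static void main(String[] args) throws Exception {
              // PigServer compiles Pig Latin into MapReduce jobs on the user's behalf.
              PigServer pig = new PigServer(ExecType.MAPREDUCE);

              // Load, filter, store: a data flow expressed as a series of operators.
              pig.registerQuery("logs = LOAD '/data/weblogs' USING PigStorage('\\t') " +
                      "AS (ip:chararray, url:chararray, bytes:long);");
              pig.registerQuery("big = FILTER logs BY bytes > 10000;");
              pig.store("big", "/data/weblogs_big");
          }
      }
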
  13. bulk import
    direct input
    data interaction
    data export
    four key features of Sqoop
  14. process synchronization
    configuration management
    self-election - can assign a leader role
    reliable messaging
    capabilities of Zookeeper
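
    A minimal sketch of the configuration-management capability: a process reads a shared znode and leaves a watch so it is notified when the configuration changes; the connection string and the /app/config znode are assumptions:

      import org.apache.zookeeper.WatchedEvent;
      import org.apache.zookeeper.Watcher;
      import org.apache.zookeeper.ZooKeeper;
      import org.apache.zookeeper.data.Stat;

      public class ConfigWatchSketch {
          public static void main(String[] args) throws Exception {
              ZooKeeper zk = new ZooKeeper("zk1:2181", 3000, event -> { });

              // The watch fires once when /app/config is updated or deleted.
              Watcher onChange = (WatchedEvent event) ->
                      System.out.println("Configuration changed: " + event.getPath());
              Stat stat = new Stat();
              byte[] config = zk.getData("/app/config", onChange, stat);
              System.out.println("config v" + stat.getVersion() + ": " + new String(config));
              zk.close();
          }
      }
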
