Big Data Vocabulary

The flashcards below were created by user mjweston on FreezingBlue Flashcards.

  1. Algorithm
    • a process or set of rules to be followed during problem solving
    • often self-improving
    • ex: GPS route calculation - shortest or fastest routes, taking traffic and speed limits into account
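The GPS example above is classically solved with a shortest-path algorithm such as Dijkstra's. A minimal sketch, using a made-up road network whose edge weights stand in for travel minutes (which could reflect traffic and speed limits):

```python
import heapq

def shortest_path_cost(graph, start, goal):
    """Dijkstra's algorithm: cheapest total cost from start to goal.

    graph maps each node to {neighbor: edge_cost}."""
    costs = {start: 0}
    queue = [(0, start)]            # (cost so far, node), ordered by cost
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > costs.get(node, float("inf")):
            continue                # stale queue entry; a cheaper path was found
        for neighbor, edge in graph[node].items():
            new_cost = cost + edge
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None                     # no route exists

# Hypothetical road network; edge weights are travel minutes.
roads = {
    "home":    {"main_st": 5, "highway": 2},
    "main_st": {"office": 4},
    "highway": {"office": 10},
    "office":  {},
}
```

Here `shortest_path_cost(roads, "home", "office")` picks the 9-minute route via main_st over the 12-minute highway route, even though the highway starts out cheaper.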
  2. Asynchronous
    • occurring at different times
    • ex: email communication (act on it at your leisure) & discussion forums
  3. Business Analytics
    the application of models to business data to support decision making (not simply statistics)
  4. Business Intelligence
    • a framework for decision support
    • includes architecture for the data management cycle
    • includes tools for mining, analysis, and sometimes interpretation
  5. Clickstream Analysis
    • analysis of data generated in the web environment
    • everything that you do on the web is captured
    • examples of what is captured:
    •    How you came in (ex: mobile)
    •    Where you go
    •    How long you stay
    •    What you look at
    •    Whether you buy (as applicable)
  6. Cloud Computing
    • an infrastructure that is available as a service
    • generally virtualized and accessible anytime, anywhere
    • ex: Gmail
  7. Dashboard
    • a visual representation of data
    • used to quickly demonstrate metrics
    • originally developed for busy executives
    • Facebook provides a visualization of metrics with its Facebook Insights (where "likes" are coming from - demographics, etc.)
  8. Data
    • facts without meaning
    • ex: a list of names
  9. Data Integrity
    • a data quality measure
    • refers to accuracy being maintained during manipulation
    • ex: your AU permanent record (name, permanent address, gender, previous educational accomplishments, etc.) should not change because you have moved from undergraduate to graduate student
  10. Data Mart
    • a small data warehouse
    • usually regional, departmental, etc.
    • may store a subset of organizational data - or specialized data
    • ex: financial aid maintains different data than your AU permanent record
  11. Data Mining
    the act of moving through data with the intent of finding patterns & trends to interpret, and on which to act
  12. Supervised (predictive)
    Unsupervised (descriptive)
    two types of data mining
  13. Supervised (predictive)
    • type of data mining where the algorithm is given specific guidelines for the purpose of testing a hypothesis
    • it must be trained, then tested, then applied
  14. Unsupervised (descriptive)
    type of data mining where the algorithm is not given guidelines and there is no preconceived notion of what will be found
  15. 1. Clustering - looks at the data you have & determines if there are multiple attributes that "hang" together (ex: customers who buy diapers also buy wipes)
    2. Summarizing
    two common techniques of Unsupervised (descriptive) data mining
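The "hang together" example above can be sketched as simple pair co-occurrence counting — an unsupervised, simplified stand-in for full association or clustering analysis. The basket data below is made up for illustration:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_count=2):
    """Count how often each pair of items appears in the same basket -
    an unsupervised way to see which attributes "hang" together."""
    counts = Counter()
    for basket in baskets:
        # Count each unordered pair of distinct items once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical transaction data.
baskets = [
    ["diapers", "wipes", "milk"],
    ["diapers", "wipes"],
    ["milk", "bread"],
    ["diapers", "wipes", "bread"],
]
```

No hypothesis is supplied up front; `frequent_pairs(baskets)` simply surfaces whatever pairs recur (here, diapers & wipes), matching the "no preconceived notion" character of descriptive mining.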
  16. Data Quality
    • critical attributes of data including:
    •    Accuracy - is it correct?
    •    Precision - is it correctly measured? ex: GPA 3.5 vs. 3.45
    •    Completeness - are all of the expected (not necessarily all) attributes provided?
    •    Relevance - does it matter? ex: # of pets on a scholarship form
    •    Temporality - is it current?
  17. Accuracy (is it correct?); Precision (is it correctly measured? ex: GPA 3.5 vs. 3.45); Completeness (are all of the expected, not necessarily all, attributes provided?); Relevance (does it matter? ex: # of pets on a scholarship form); Temporality (is it current?)
    critical attributes of Data Quality
  18. Data Visualization
    • representation of data in a non-text way
    • often graphically
    • generally considered to be the ideal method of information transfer
    • caution should be used when interpreting - can easily be skewed with scales of graphs or types of charts
  19. Data Warehouse
    • a repository of organizational data in a cleansed and organized way
    • generally separate from transactional data
    • represents historical data
    • designed for mining
  20. Database
    • a self-describing collection of related records
    • stores information about itself as well as the data
    • tables are integrated through keys
  21. Database Management System
    • a computer program that facilitates creation, processing, and administration of a database
    • designed as an interface between the data and the user
    • SQL is the American National Standards Institute (ANSI) designated standard language
  22. Drill Down
    • the process of investigating information through levels of granularity
    • high granularity (small grains) means the data is detailed
    • low granularity (large grains) means the data is summarized 
    • usually goes from low to high
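Drilling from low to high granularity can be sketched as rolling detailed rows up into summaries and then expanding one summary back into its detail. The sales rows below are hypothetical:

```python
from collections import defaultdict

# High-granularity data (small grains): one row per individual sale.
sales = [
    {"region": "East", "store": "A", "amount": 100},
    {"region": "East", "store": "B", "amount": 250},
    {"region": "West", "store": "C", "amount": 300},
    {"region": "East", "store": "A", "amount": 50},
]

def roll_up(rows, key):
    """Summarize detailed rows by a key, producing a low-granularity view."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)

# Low granularity (large grains): totals per region.
by_region = roll_up(sales, "region")          # {"East": 400, "West": 300}

# Drill down: investigate East at a higher granularity, per store.
east_stores = roll_up([r for r in sales if r["region"] == "East"], "store")
# {"A": 150, "B": 250}
```

The drill-down goes low to high, as the card says: from region totals down to the per-store (and ultimately per-sale) detail behind them.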
  23. High Granularity (small grains)
    refers to the data being detailed
  24. Low Granularity (large grains)
    refers to the data being summarized
  25. Effectiveness
    the degree to which a goal is attained
  26. Efficiency
    the ratio of output to input in relation to goal attainment (not quickness)
  27. Explicit Knowledge
    • objective, often technical material that is easily transferred (learned)
    • ex: Java Language
  28. Extraction, Transformation, Load (ETL)
    a process that prepares data for use in a new storage facility (ex: data warehouse)
  29. Extraction
    the process of removing data from a source
  30. Transformation
    the process of reformatting data
  31. Load
    the process of putting data into a new storage facility
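The three steps above (cards 29-31) can be sketched end to end. The CSV source, column names, and warehouse schema below are invented for illustration; the warehouse is an in-memory SQLite database:

```python
import csv
import io
import sqlite3

def extract(source_csv):
    """Extraction: pull raw records out of the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(source_csv)))

def transform(rows):
    """Transformation: reformat for the warehouse (trim names, fix types)."""
    return [(row["name"].strip().title(), int(row["gpa_x100"]) / 100)
            for row in rows]

def load(rows, connection):
    """Load: put the cleaned rows into the new storage facility."""
    connection.execute("CREATE TABLE students (name TEXT, gpa REAL)")
    connection.executemany("INSERT INTO students VALUES (?, ?)", rows)

# Messy source data: stray whitespace, inconsistent case, GPA stored x100.
raw = "name,gpa_x100\n alice ,345\n BOB ,350\n"

warehouse = sqlite3.connect(":memory:")
load(transform(extract(raw)), warehouse)
```

After the run, the warehouse holds cleansed, organized rows ("Alice", 3.45) and ("Bob", 3.5) — separate from the messy transactional source, as the Data Warehouse card describes.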
  32. Information
    • organized data that has meaning
    • ex: a list of names with a heading of "ISMN Students"
  33. Knowledge
    actionable information in context
  34. Metadata
    • generally regards form and structure of data
    • used by data repositories (storage) to support data quality
    • used by data repositories to read and recall data
    • data about data
  35. Parallel Processing
    the ability of a computer to perform multiple instructions at one time
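A minimal sketch of dispatching multiple tasks at once with Python's `concurrent.futures`. (In CPython, threads overlap best on I/O-bound work; CPU-bound work would typically use `ProcessPoolExecutor` instead — the dispatch pattern is the same.)

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    """A stand-in task; real workloads would be I/O- or CPU-heavy."""
    return n * n

# Multiple instructions in flight at one time: tasks are spread
# across a pool of 4 workers, and results come back in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))
```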
  36. Predictive Analytics
    use of tools to predict future outcomes
  37. Structured Query Language (SQL)
    standard language (ANSI) used by databases
  38. Data Definition Language (DDL) & Data Manipulation Language (DML)
    two types of SQL
  39. Data Definition Language (DDL)
    type of SQL language used for creating database structure
  40. Data Manipulation Language (DML)
    type of SQL language used for querying, inserting, modifying, or removing (deleting) data
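The DDL/DML split in cards 39-40 can be shown with SQLite's SQL dialect (close to, but not identical to, ANSI SQL); the `students` table is invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# DDL: define the database structure.
db.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")

# DML: insert, modify, remove, and query data.
db.execute("INSERT INTO students (name) VALUES (?)", ("Ada",))
db.execute("INSERT INTO students (name) VALUES (?)", ("Grace",))
db.execute("UPDATE students SET name = ? WHERE name = ?", ("Ada L.", "Ada"))
db.execute("DELETE FROM students WHERE name = ?", ("Grace",))
names = [row[0] for row in db.execute("SELECT name FROM students")]
```

`CREATE TABLE` (and `ALTER`/`DROP`) shape the structure; `INSERT`, `UPDATE`, `DELETE`, and `SELECT` operate on the data inside it.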
  41. Tacit Knowledge
    • subjective, experiential knowledge that is difficult to transfer
    • ex: driving a car (we usually don't think about it, just do it)
  42. Text Mining
    • a data mining process used on data that is largely unstructured
    • usually uses the text itself to determine indices for subsequent analysis
    • often unsupervised
  43. Web 2.0
    • the "Social" web - ex: Twitter, Facebook
    • produces the majority of the unstructured data obtained by organizations
  44. Web Mining
    consists of: content mining, structure mining, and usage mining
  45. Content Mining
    text mining of web content
  46. Structure Mining
    • mining of web links and their relationships
    • ex: 3 click limit - if I haven't found what I need in 3 clicks, I'm out
  47. Usage Mining
    • mining of web navigation
    • where are you going on the web
    • ex: Do you always go to mail?
Card Set: Big Data Vocabulary (2013-09-05)