geog 461 midterm 1 review

  1. How different is urban GIS?
    • A bit of an artificial separation
    • Physical sciences / processes – raster?
    • SocSci / human processes – vector?
    • Not exactly, and besides, urban processes intersect both.
    • But still, there are some unique issues / challenges to urban GIS
  2. Commonly used data sources
    • Census
    • Land records
    • Cadastral data
    • Property records
    • Utilities / infrastructure
    • Image data
  3. Parcels are used for:
    • Researching title
    • Assessing historical patterns in land administration, land use, etc.
    • Public service delivery and record keeping (taxation, schooling, zoning, public health, etc).
    • Emergency services
  4. How are parcels established?
    • 1. Approval by various gov’t agencies
    • 2. Record with Register of Deeds, County Recorder, etc.
    • 3. Add identifying information for tax assessor
    • PIN, address, map sheet #, etc.
    • 4. Update all related hard copy and digital records in local gov’t system
  5. What attribute information is associated with parcel boundaries?
    Dimensions

    • Administrative classifications (ID, zoning, land use, etc)
    • Taxable value, estimated resale value

    Characteristics of structures on it (size, condition, type, address)

    People who live there or are linked to it (i.e. registered taxpayer)

    Ownership history, transfers, prices
  6. Challenges associated with parcel data for GIS applications
    Format (may not always be digital)

    Availability (varies widely, as do rules/costs for obtaining)

    • Maintenance (hard to keep current in large urban areas!)
    • Quality (see above – highly variable)

    Augmentation (adding our own additional information to parcel boundaries?)

    Confidentiality (lots of potential issues!)
  7. Parcels: Special challenges
    Condos & PINs

    Separated rights (i.e. mining, transportation authorities, air or subsurface water rights)

    Encumbrances (roads, utilities, easement between sidewalk and street)

    Zoning vs. Land use

    Re-platting & other changes
  8. Why are addresses so complicated?
    Associated with many types of features in real world

    Associated with different spatial objects (with different geometries) in GIS

    Relationships between addresses and real world phenomena may be one-to-one or many-to-one
  9. Complex associations with addresses:
    • Addresses (might)
    • Have sub-address
    • Have zone
    • Be assigned to a building/parcel (or more than one)
    • Be assigned to a landmark

    All of these associations describe relationships between different entities and attributes

    Solution: Relational Databases!
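
    A minimal sketch of the relational idea, using Python's built-in sqlite3. The table names, column names, and sample rows are hypothetical and only illustrate how a link table lets several addresses (e.g. condo units) point at one parcel.

```python
import sqlite3

# In-memory database; schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    address_id   INTEGER PRIMARY KEY,
    full_address TEXT NOT NULL,
    sub_address  TEXT,            -- unit / suite, optional
    zone         TEXT             -- e.g. ZIP code
);
CREATE TABLE parcel (
    pin TEXT PRIMARY KEY          -- parcel identification number
);
-- Link table: one address may map to several parcels and vice versa
CREATE TABLE address_parcel (
    address_id INTEGER REFERENCES address(address_id),
    pin        TEXT REFERENCES parcel(pin),
    PRIMARY KEY (address_id, pin)
);
""")
conn.executemany("INSERT INTO address VALUES (?, ?, ?, ?)", [
    (1, "100 MAIN ST", "UNIT 1", "98101"),
    (2, "100 MAIN ST", "UNIT 2", "98101"),
    (3, "200 OAK AVE", None, "98101"),
])
conn.executemany("INSERT INTO parcel VALUES (?)", [("P-001",), ("P-002",)])
conn.executemany("INSERT INTO address_parcel VALUES (?, ?)", [
    (1, "P-001"), (2, "P-001"), (3, "P-002"),
])

def addresses_on_parcel(pin):
    """Return (full_address, sub_address) rows linked to one parcel."""
    rows = conn.execute(
        """SELECT a.full_address, a.sub_address
           FROM address a
           JOIN address_parcel ap ON a.address_id = ap.address_id
           WHERE ap.pin = ?
           ORDER BY a.address_id""",
        (pin,),
    )
    return rows.fetchall()

condo_units = addresses_on_parcel("P-001")
```

    Here two sub-addresses share one PIN, which is the many-to-one case from the previous card.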
  10. Elements of addressing systems
    House or building number

    Street name

    Street type

    Directional component

    • Zone information
    • City, state, ZIP, postal code, province, etc.
  11. Why do we need zones, anyway?
    Multiple cities may have same address

    Sort/Index large data set

    Search/Processing efficiency

    (Sometimes) used to geocode or join

    Tip: Zones are a great place to set a domain range in an ArcGIS geodatabase!
  12. Structuring address ranges on streets:
    To/from node

    Or, in U.S. Census & TIGER files:

    From Address Right

    To Address Right

    From Address Left

    To Address Left
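
    A sketch of how the four range attributes can be used to place a house number along a street segment by linear interpolation. The Segment class, its field names, and the odd/even side rule are assumptions for illustration; real reference data may not follow strict parity.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    from_xy: tuple        # coordinates of the from-node
    to_xy: tuple          # coordinates of the to-node
    from_addr_left: int
    to_addr_left: int
    from_addr_right: int
    to_addr_right: int

def locate(segment, house_number):
    """Return (side, (x, y)), or None if the number fits neither range."""
    ranges = (
        ("left", segment.from_addr_left, segment.to_addr_left),
        ("right", segment.from_addr_right, segment.to_addr_right),
    )
    for side, start, end in ranges:
        # Assumption: each side holds only odd or only even numbers.
        same_parity = house_number % 2 == start % 2
        in_range = min(start, end) <= house_number <= max(start, end)
        if same_parity and in_range:
            # Fraction of the way from the from-node to the to-node.
            fraction = 0.0 if end == start else (house_number - start) / (end - start)
            x = segment.from_xy[0] + fraction * (segment.to_xy[0] - segment.from_xy[0])
            y = segment.from_xy[1] + fraction * (segment.to_xy[1] - segment.from_xy[1])
            return side, (x, y)
    return None

# Hypothetical segment: odd numbers on the left, even on the right.
seg = Segment((0.0, 0.0), (100.0, 0.0), 101, 199, 100, 200)
match = locate(seg, 150)
```

    The result is a point on the centerline, which is one reason interpolated locations can differ from the true position of a building.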
  13. Geocoding
    Descriptive location (i.e. address) -> exact geographic reference

    • The process involves:
    • Input (the location you are geocoding)
    • Reference data set (‘base map’)
    • Processing algorithm (math / geometry)
    • Output (usually an x/y pair, not always)
  14. Geocoding process:
    Normalizes & standardizes input

    • Searches reference data set
    • User-determined tolerances

    Returns geocoded output OR user notification of failure to match

    [If no match: adjust tolerance, try again]
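
    The steps above can be sketched roughly as follows. This is a toy matcher, not how any particular GIS package geocodes: the abbreviation table, the reference dictionary, and the use of difflib string similarity as the "tolerance" are all assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical standardization table for street types.
STREET_TYPES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}

def normalize(address):
    """Uppercase, drop punctuation, and abbreviate street types."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(STREET_TYPES.get(token, token) for token in tokens)

def geocode(address, reference, min_score=0.9):
    """Return (x, y) for the best reference match, or None if below tolerance.

    `reference` maps already-normalized addresses to coordinates.
    """
    target = normalize(address)
    best_key, best_score = None, 0.0
    for key in reference:
        score = SequenceMatcher(None, target, key).ratio()
        if score > best_score:
            best_key, best_score = key, score
    if best_key is not None and best_score >= min_score:
        return reference[best_key]
    return None  # caller is notified of the failure to match

# Hypothetical reference data set.
REFERENCE = {"100 MAIN ST": (10.0, 20.0), "250 OAK AVE": (30.0, 40.0)}

exact = geocode("100 Main Street.", REFERENCE)
strict = geocode("100 Mian St", REFERENCE, min_score=0.95)   # typo, strict tolerance
relaxed = geocode("100 Mian St", REFERENCE, min_score=0.8)   # retry, relaxed tolerance
```

    Relaxing the tolerance recovers the misspelled input, but the same relaxation is what lets wrong matches through.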
  15. Sources of Error- Geocoding
    • Tolerance is set too low/relaxed
    • Input is matched to the wrong location

    • Reference dataset is inaccurate
    • Output is not in the correct ‘real world’ location, even though it is correctly located on the map.

    • Parcels not same size / address range is wrong
    • Output is in the wrong place on the map & the assigned geographic coordinate is not the ‘real world’ location
  16. How much error is too much?- Geocoding
    What is your application? How will you use the output data set?

    Are you combining output data set with other data?

    What is the scale of your analysis and/or map?
  17. Special challenges- Geocoding
    Informal settlements

    PO boxes & other ‘non-standard’ addresses

    Multi-family dwellings (apts, condos)

    Industrial / commercial areas
  18. Geocoding in U.S. cities: Reference data options
    • TIGER/Line files (street centerlines)
    • Full US coverage, free
    • Lower accuracy, esp for local applications

    • Street centerlines from municipal gov’t
    • More current than TIGER
    • May not cover whole study area
    • Metadata & accuracy vary widely

    • Parcel address points
    • Highly accurate, but not available everywhere
    • Expensive
  19. New geo-coding capabilities
    At first, addresses only.

    • Now, ZIP codes, parcels, ‘natural language’ descriptions
    • But, parcels are less accurate / reliable!

    • Why?
    • More (& better) digital reference data sets
    • More robust interpolation algorithms

    But… declining gov’t funding for development of reference data sets
  20. Practical tips and common problems- Geocoding
    • Spelling errors or typos
    • May be in input or in reference layer

    • Missing segments
    • Your address not in existing ranges

    • Street is named differently along a portion
    • “Cicero” vs. “Hwy 50”

    Try a ‘select by attribute’ to highlight the entire street

    Have a close look at your attribute table
  21. Why study administrative aspects of spatial data?
    Most everyday work of gov’t uses spatial data

    • Practices of creating, sharing, using spatial data vary widely by:
    • Scale of gov’t (local, regional, state, national, etc)
    • National context

    The social and political construction of data affects how we perceive the world to be, and how we try to act to change it.
  22. The ‘social construction’ of data
    Spatial data are not purely ‘technical’

    The data themselves, as well as their access, use, and implications, are determined by more than just the technologies

    Data <-> Society
  23. For ex: data development, use, impacts affected by:
    • Political contexts
    • e.g. access to gov’t data, U.S. vs E.U.

    • Cultural contexts
    • Rules, beliefs, preferences, accepted behaviors
    • e.g. norms about data privacy

    • Organizational / institutional contexts
    • Budgets, methods, staffing, technologies, ‘political capital’
    • GIS adoption in Milwaukee WI, vs. Cochise Cty, AZ
  24. Understanding data sharing practices:
    Context matters – type or size of organization, its attitude toward sharing, its national or regional situation

    Social and political connectivity and relationships matter

    Formal codes and rules matter, but so do informal rules, unwritten expectations, and individual quirks

    Policies often have unexpected consequences
  25. Spatial data infrastructures (SDIs)
    The human, technological, and informational resources used to manage and share large collections of spatial data

    Includes data, metadata, networks, regulations, people, and organizations

    SDIs have been the predominant model for governmental spatial data handling in the US

    Most SDIs are national level – local SDIs have proved harder to implement
  26. Technical aspects of data sharing: Interoperability
    Compatibility between different data or computing environments that allow us to integrate or move between them.

    • Data, systems, network
    • In GIS, data interop is biggest challenge

    Also called ‘semantic interoperability’

    • Difficult to achieve because of high levels of semantic heterogeneity
    • ‘green’, ‘green’, ‘green’.
  27. Sources of semantic heterogeneity
    Data collection methods

    • Purposes of data collection
    • Institutional differences

    • Classification schemes
    • Even the same classification scheme may be applied differently…
  28. Socio-economic analysis with GIS
    Commonly uses census data

    Thematic mapping, spatial statistics

    • Used for
    • Defining political districts
    • Allocating public funds
    • Policy making / programming decisions
  29. Censuses change over time…
    Questions asked, not asked

    • Changing terminologies / definitions
    • Race; ‘head of HH’ vs ‘householder’

    • Change in how you can answer the Qs
    • 2000 – multiple race identifiers accepted

    • Enumeration units
    • Watch out if you are doing historical analysis
  30. Contemporary census data issues
    • 2000 was the first fully digital/GIS-able census (but only in some countries!)
    • But early digital spatial data: DIME, 1970s

    Devolution of gov’t = devolution of data

    • Growing concern about the data
    • Mistrust, privacy
    • Undercounts
    • Concern about attributes and their definitions (i.e. who is a household?)
  31. What you need to know to find & use US Census data with GIS:
    Data collection & aggregation

    Census geographies

    • Data organization / structure
    • Tables & variables
  32. Data collection – pre 2010
    • Short form (all)
    • Race, Hispanic origin, age, sex, housing tenure, household relationships

    • Long form (1 in 6)
    • Income, language, work status, home values, telephone, etc etc…
  33. Data collection – 2010 and on:
    • Short form (all)
    • Race, Hispanic origin, age, sex, housing tenure, household relationships

    • No more long form!
    • American Community Survey to obtain detailed data
  34. Data aggregation
    Original data aggregated at many different summary levels

    • Governmental units
    • “Seattle”, “Guam”, etc.

    • Statistical units
    • “Tract”, “Block Group”, “MSA”

    • At block level, short form data only
    • Why??
  35. Census geographies: Statistical units
    US->Region->Division

    State->County->County subdivision

    Place->Census tract->Block group->Block
  36. Finding specific census areas & linking census data: FIPS Codes
    • “Federal Information Processing Standard”
    • Example: 060710036021003
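
    A short sketch of how the 15-digit example splits into its nested parts (state 2 digits, county 3, tract 6, block 4, with the block group being the first digit of the block). The function name is hypothetical.

```python
def parse_block_fips(code):
    """Split a 15-digit census block code into its nested components."""
    if len(code) != 15 or not code.isdigit():
        raise ValueError("expected a 15-digit census block code")
    return {
        "state": code[0:2],
        "county": code[2:5],
        "tract": code[5:11],
        "block_group": code[11],   # first digit of the block number
        "block": code[11:15],
    }

parts = parse_block_fips("060710036021003")

# Shorter prefixes identify the larger units, which is what makes
# the code useful as a join key at different summary levels.
tract_key = "060710036021003"[:11]
```

    Keep these codes as text, not numbers, or the leading zero in the state code is lost and joins fail.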
  37. Other census geographies you might encounter
    MSAs

    But in New England = NECTA

    TAZs

    Voting districts

    Tribal lands

    [PUMA – public use microdata areas]
  38. Census geographies are provided as TIGER/Line Files
    These are the ‘spatial data’ for the census

    Roads, census administrative units, street addresses, points of interest, other administrative units, hydrography, physical features.

    Organized as line, landmark, polygon features

    Key feature: TOPOLOGY!
  39. Topological rules in TIGER system
    Must not overlap

    Must be covered by (class of)

    Must cover each other

    Can be used to describe ALL relationships between census geographies!

    Features are defined as 0-cell (point), 1-cell (line), 2-cell (polygon)
  40. Elements of TIGER/Line topology
    • General case
    • From-Node
    • To-Node
    • Left-Polygon
    • Right-Polygon

    ->

    Specific TIGER terms

    • From Address Right
    • To Address Right
    • From Address Left
    • To Address Left
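
    A rough sketch of why the general from/to node and left/right polygon attributes are useful: adjacency and connectivity can be read straight off the edge table without any geometry. The edge list is made-up toy data, not actual TIGER records.

```python
from collections import defaultdict

# Each edge: (edge_id, from_node, to_node, left_polygon, right_polygon)
# None means the edge borders the outside of the study area.
EDGES = [
    ("e1", "n1", "n2", "A", "B"),
    ("e2", "n2", "n3", "A", "C"),
    ("e3", "n3", "n1", "A", None),
    ("e4", "n2", "n4", "B", "C"),
]

def polygon_neighbors(edges):
    """Polygons that share an edge are neighbors (left/right attributes)."""
    neighbors = defaultdict(set)
    for _, _, _, left, right in edges:
        if left is not None and right is not None and left != right:
            neighbors[left].add(right)
            neighbors[right].add(left)
    return dict(neighbors)

def edges_at_node(edges, node):
    """Edges that meet at a node (from/to node attributes)."""
    return sorted(edge[0] for edge in edges if node in (edge[1], edge[2]))

adjacency = polygon_neighbors(EDGES)
meeting_at_n2 = edges_at_node(EDGES, "n2")
```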
  41. TIGER/Line “features”
    • Line
    • Landmark
    • Polygon
    • But: these are not ‘features’ in the usual GIS sense! Loosely, a feature here is something that has a real-world identity.
  42. Locating the right data, Step 1: Look in the correct Summary File
    A summary file is a collection of data from the Census

    SF1 = all short form data

    SF2 = all short form data, by race

    SF3 = all long form data

    SF4 = all long form data, by race

    • PUMS = raw data, long form
    • Requires special permission to access
  43. Locating the right data, Step 2: Find the right column heading
    Each attribute has a numerical code (i.e. “H903633”), not a logical name.

    Pxxxxxx is always population

    Hxxxxxx is always housing

    Pxxxxx1 is almost always total pop

    Hxxxxx1 is almost always total number of households

    Census metadata tells you what the codes stand for!
  44. Census 2010 – Breaking news…
    Short form only - no long form

    • American Community Survey (ACS) to collect info formerly on long form
    • Continuously, rather than every 10 yrs

    2 new questions to try to get at short-term household members (e.g. ‘anyone who sometimes lives somewhere else?’)

    This will have implications for how Summary Files are organized/released!
  45. Data quality issues & solutions
    • 10-year gap between censuses
    • American Community Survey
    • Long form, annually, to sample of HHs
    • More follow-up with respondents

    • Undercounts
    • Demographic data (births, deaths, estimates of undoc’d immigrants, etc.)
    • Post census surveys (Post-Enumeration Survey)
  46. Can we do better than Census data? High resolution socio-economic data
    • Integrate other administrative data
    • IRS, County Registrar, Postal Service, etc.

    Harvest & assemble online personal data

    • Barriers / Challenges
    • Legal / societal
    • Technical
  47. Census is supposed to be a full count, but not everyone responds…
    Response rate typically about 66%, based on master address file (MAF)

    In-person enumeration brings this to 97%

    So how do we come up with the other 3%?

    Were all items completed by the other 97%? Probably not…
  48. Imputation
    Use of statistical methods to adjust for missing and inconsistent responses

    • “Hot deck imputation” - sampling a given set of responses from a particular geographic area over and over and using those responses to do several things:
    • Correct for non-response
    • Perform edits
    • Ensure confidentiality

    Depends on assumptions about homogeneity (near things are more similar than far…)

    • Two major types:
    • Allocation (some missing values entered based on other reported information for the person or household, or similar others)
    • Substitution (all information for a person or household is created from others with similar characteristics)
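
    A heavily simplified sketch of the hot deck idea in its allocation form: a missing value is filled with a value drawn from a responding household in the same area. This is not the Census Bureau's actual procedure; the function, field names, and records are hypothetical.

```python
import random

def hot_deck_impute(records, field, group_field, seed=0):
    """Fill missing `field` values from random donors in the same group."""
    rng = random.Random(seed)
    # Build the donor pool: reported values, grouped by area.
    donors = {}
    for rec in records:
        if rec[field] is not None:
            donors.setdefault(rec[group_field], []).append(rec[field])
    completed = []
    for rec in records:
        new = dict(rec)
        new["imputed"] = False
        pool = donors.get(rec[group_field])
        if rec[field] is None and pool:
            new[field] = rng.choice(pool)   # relies on local homogeneity
            new["imputed"] = True
        completed.append(new)
    return completed

# Hypothetical household records; None marks a non-response.
HOUSEHOLDS = [
    {"id": 1, "tract": "T1", "income": 40000},
    {"id": 2, "tract": "T1", "income": None},
    {"id": 3, "tract": "T1", "income": 55000},
    {"id": 4, "tract": "T2", "income": None},
]
completed = hot_deck_impute(HOUSEHOLDS, "income", "tract")
```

    Household 4 stays missing because its tract has no donors, and flagging imputed values lets a user see where the data were filled in.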
  49. Imputation rates are uneven, socially and spatially:
    Less likely to respond to Census: highly transitory HHs, non-English speakers, undocumented individuals.

    Not all parts of US have similar rates of imputation (pop and race more likely to have been allocated in SW)

    What does this mean for you as a user of US Census data?
  50. Simple network-based applications
    • Shortest path
    • Shortest/fastest way to get home from school?

    • Closest facility
    • Which post office is closest to my house?

    • Service area
    • If people who can get to our store within 30 minutes will shop here, what is our service area?
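
    All three questions can be answered from one table of least travel times. The sketch below uses Dijkstra's algorithm on a tiny made-up road graph; node names and minutes are hypothetical.

```python
import heapq

def undirected(links):
    """Build an adjacency dict from (node, node, minutes) two-way links."""
    graph = {}
    for u, v, minutes in links:
        graph.setdefault(u, {})[v] = minutes
        graph.setdefault(v, {})[u] = minutes
    return graph

def travel_times(graph, source):
    """Dijkstra: least travel time from source to every reachable node."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best

ROADS = undirected([
    ("home", "a", 5), ("home", "b", 12), ("a", "b", 4),
    ("a", "school", 20), ("b", "school", 10),
    ("b", "post_office_2", 3), ("school", "post_office_1", 6),
])
times = travel_times(ROADS, "home")

# Shortest path: fastest time from home to school.
home_to_school = times["school"]
# Closest facility: the post office with the smallest travel time.
closest = min(["post_office_1", "post_office_2"], key=times.get)
# Service area: every node reachable within 15 minutes.
service_area = {node for node, t in times.items() if t <= 15}
```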
  51. Network analysis allows us to:
    • Analyze transit "cost"
    • How long does it take to navigate through this network?

    • Select optimal routes
    • What is the least cost path through this network (time, distance, traffic…).

    • Solve resource allocation problems
    • What is the portion of the network that should be considered the ‘territory’ from a particular starting point (e.g. each fire station…)?
  52. Case #1: Do neighborhoods have different levels of access to parks?
    Context: Gov’t sets benchmark for green space per capita, urban and regional planners must meet this standard

    But this may not reflect ability of population to travel to available sites

    Buffering the sites or calculating straight-line distance doesn’t represent how people actually travel to the sites

    A network-based solution is required
  53. Do different n’hoods have different access? How to solve this problem in GIS:
    Define “access points” for the green spaces

    Establish centroids for the census polygons (or whatever areas you are using as the ‘source areas’ that people are coming from)

    For all points in demographics layer, calculate distance to nearest point in parks layer, following the network.

    Assess how the "nearest park" distances may differ for different population groups
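
    One possible way to sketch these four steps: start a single search from all park access points at once, read off the network distance at each centroid, then compare a population-weighted mean by group. The network, centroids, populations, and group labels are invented, and centroids are assumed to be already snapped to network nodes.

```python
import heapq

def distance_to_nearest(graph, sources):
    """Multi-source Dijkstra: network distance to the nearest source."""
    best = {s: 0.0 for s in sources}
    queue = [(0.0, s) for s in sources]
    heapq.heapify(queue)
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length in graph.get(node, {}).items():
            new_dist = dist + length
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return best

# Two-way links in meters: c* = tract centroids, p* = park access points.
LINKS = [("c1", "p1", 200), ("c1", "j", 300), ("j", "c2", 400),
         ("j", "p2", 500), ("c2", "c3", 600)]
GRAPH = {}
for u, v, meters in LINKS:
    GRAPH.setdefault(u, {})[v] = meters
    GRAPH.setdefault(v, {})[u] = meters

nearest = distance_to_nearest(GRAPH, ["p1", "p2"])

# Hypothetical demographics: centroid -> (group label, population)
CENTROIDS = {"c1": ("north", 1000), "c2": ("south", 500), "c3": ("south", 1500)}

totals = {}
for node, (group, pop) in CENTROIDS.items():
    weighted, people = totals.get(group, (0.0, 0))
    totals[group] = (weighted + nearest[node] * pop, people + pop)
mean_distance = {g: weighted / people for g, (weighted, people) in totals.items()}
```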
  54. Representing and Analyzing Networks in GIS
    Network: a system of linear features connected at intersections and interchanges

    A network is composed of a set of nodes and the links that connect them

    Networks might be used to represent: roads, streams, airline flight paths, railroads and so on.
  55. A square is a rectangle, but not all rectangles…
    Not all sets of lines in a GIS constitute a network

    • Defining a network involves adding additional information that tells us:
    • "Cost" to travel each link
    • Connectivity between links
    • Direction that may be traveled on each
    • Limits on transfer from one link to another
  56. All network applications start with the network itself:
    • If it is a road network you need systems for:
    • Representing overpass/underpass
    • Accounting for time taken in a turn
    • All possible turns at an intersection
    • One-way and other street direction limitations
  57. To create network, you need to create:
    • Overpass & underpass
    • a) Can simply cross the arcs with no node (easy)
    • b) Can insert two nodes with elevation values to show which is the overpass and which is the underpass (harder)
  58. For more realistic modeling, you also need
    • Link impedance – "cost" (time) of traversing a "link" (a segment separated by 2 nodes)
    • To a certain degree, length of link is used
    • May depend on speed limits, traffic conditions, etc

    Turn impedance – ‘cost’ of transitioning from one arc to another in the network (a turn). Requires a ‘turn table’, because different possible turns may have different impedance (right vs. unprotected left, for instance).
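
    A small sketch of how the two impedances combine: the cost of a route is the sum of its link impedances plus the turn-table entry for each transition between consecutive links. Link names, minutes, and turn penalties are hypothetical.

```python
# Link impedance: minutes to traverse each directed link.
LINK_MINUTES = {("A", "B"): 2.0, ("B", "C"): 3.0, ("B", "D"): 1.5}

# Turn table: (incoming link, outgoing link) -> minutes spent in the turn.
TURN_MINUTES = {
    (("A", "B"), ("B", "C")): 0.1,   # right turn
    (("A", "B"), ("B", "D")): 0.5,   # unprotected left
}

def path_cost(nodes):
    """Total minutes for a route given as a list of nodes."""
    links = list(zip(nodes, nodes[1:]))
    total = sum(LINK_MINUTES[link] for link in links)
    # Add a turn penalty for every pair of consecutive links;
    # turns missing from the table are assumed to cost nothing.
    for turn in zip(links, links[1:]):
        total += TURN_MINUTES.get(turn, 0.0)
    return total

right_route = path_cost(["A", "B", "C"])
left_route = path_cost(["A", "B", "D"])
```

    A routing algorithm would minimize this combined cost, so an expensive left turn can make a longer link the better choice.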
  59. More complicated network applications
    • Modeling events that happen along the network
    • Clusters of traffic accidents, flight delays

    • Multiple modes of travel on the network
    • Bike to the train station, train to work

    • Attributes of the network
    • Risk of a toxic spill, traffic volumes
  60. More complicated impedance models:
    • Distance and speed are not the only factors influencing route selection
    • Road, amenities, terrain/view, weather, elevation, etc.

    Conventional impedance models limited for transportation planning

    • But, additional attributes can be handled as impedance factors, combined
    • Sadeghi-Niaraki et al. 2011.
  61. Dynamic segmentation
    A data model built on lines of a network

    Uses relational database to store network geometry (intersections, streets) and info about traversing the network (turn tables, link impedance tables, etc)

    You can add other attribute data to your model (pavement conditions - dry, wet, icy; risk of an accident; frequency of monsoon flooding, etc.)
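
    One common way to picture this is an event table keyed by distance along a route, so attributes can change mid-line without splitting the geometry. The sketch below assumes that reading; the route coordinates and pavement events are made up.

```python
import math

# Route geometry as a polyline; segment lengths are 5 and 6 (total 11).
ROUTE = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# Event table: (from_measure, to_measure, pavement condition)
EVENTS = [(0.0, 4.0, "dry"), (4.0, 8.0, "icy"), (8.0, 11.0, "wet")]

def point_at_measure(line, measure):
    """Coordinates of the point `measure` units along the polyline."""
    if measure < 0:
        raise ValueError("measure must be non-negative")
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        if length > 0 and measure <= travelled + length:
            t = (measure - travelled) / length
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += length
    raise ValueError("measure is beyond the end of the route")

def condition_at(measure):
    """Look up the event whose measure range contains this location."""
    for start, end, value in EVENTS:
        if start <= measure < end:
            return value
    return None

location = point_at_measure(ROUTE, 8.0)
pavement = condition_at(8.0)
```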
  62. Using more complicated network applications for urban research & policy
    Modeling multi-mode travel

    Dispatching ‘toxic trucks’ along less risky routes

    Evacuation planning
  63. Existing trip-based models are limited
    •  One kind of transportation
    •  Single trip only
    •  Different start/stop points
    •  Many travel behaviors not covered by this kind of model…
  64. One solution: Super-networks
    Multiple copies of networks, one for each mode of travel (driving, pub transit, walking etc.)

    The networks intersect @ nodes where you can switch from one mode to another (transition links)

    Locations on the network have attributes based on mode of travel and type of trip.
  65. Application issues and limits: Super Networks
    Size of network model is untenable

    Simplify with user input selecting parts of the network?

    • Place-based differences in actual travel options and patterns
    • Areas w/o public transit
  66. Another case example: Routing ‘toxic trucks’
    How do we model risks along a road network so we can route vehicles on the least risky path?

    Involves multi-criteria modeling

    • Assessing ‘risk’ is tricky, esp in GIS
    • How much is too much? Costs?
    • Does risk depend on toxicity, likelihood of event, # of people affected, something else?
  67. Assessing risk along the network
    • Created measures for these variables
    • Population density, traffic flow, traffic speed, emergency response time, sparse population…

    Each variable given an ordinal rank

    Scores are weighted to calculate composite risk; risk score assigned to each segment

    Routing algorithm uses criteria scores to determine optimal route
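
    The weighting step can be sketched as a weighted sum of ordinal ranks per road segment. The weights, variable names, and ranks below are invented for illustration and are not taken from the case study.

```python
# Hypothetical weights (sum to 1) and ordinal ranks (1 = low, 3 = high).
WEIGHTS = {"population_density": 0.5, "traffic_flow": 0.3, "response_time": 0.2}
SEGMENT_RANKS = {
    "seg_1": {"population_density": 3, "traffic_flow": 2, "response_time": 1},
    "seg_2": {"population_density": 1, "traffic_flow": 3, "response_time": 2},
}

def composite_risk(ranks, weights):
    """Weighted sum of the ordinal ranks for one segment."""
    return sum(weights[name] * ranks[name] for name in weights)

scores = {seg: composite_risk(ranks, WEIGHTS) for seg, ranks in SEGMENT_RANKS.items()}
riskiest = max(scores, key=scores.get)
```

    Each segment's score could then serve as its impedance, so a least-cost route becomes a least-risk route. The choice of weights is a judgment call and changes the result.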
  68. Assessing impacts/harm along the network
    Buffer the ‘at risk’ road segments

    Identify vulnerable facilities w/in the buffers (kids, elderly, high pop density facilities)

    Clip / interpolate socio-economic data using the buffers
  69. Example 3: Evacuation planning
    Networks, as part of mathematical and spatial models.

    • Challenges:
    • Real-time data? [sensors]
    • Uneven info access / unpredictable behavior
    • Damage to network
    • Multi-institutional data collection and sharing
  70. Related applications: Can we deliver information to people based on location?
    Location-based services

    GPS-enabled devices (mostly cell phones, also cars)

    • Where is the device in the network, where is it relative to other locations in the network?
    • Location-based services
  71. All LBS need to be able to model movements / locations in space
    • Most use street networks as a basis:
    • Where is person or object X in relation to street network? How near or far?
    • What is the shorter route between two places on this network? Fastest?
    • How many people/objects can move along this network under certain conditions?
Author
calaedw
ID
131621