MODULE 2: ATTRIBUTE DATA
Attribute Data Management
There are two types of GIS data used to describe features:
1. Spatial Data
a. What things are: Represented by symbols
b. Where things are: Georeferenced to locate the feature spatially
2. Attribute Data
a. What things are: Describe the real-world feature
b. What types of things are found in a particular location: Linked to spatial information about a feature
By themselves, attribute data have little meaning
--the analysis or organization of data creates meaning
--databases allow us to order, re-order, summarize, and combine data
With a database, we can
--sort temperature values, from lowest to highest
--calculate minimum, maximum, or average temperature
--convert from one unit to another
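The three operations listed above can be sketched in a few lines of Python. The temperature values are hypothetical and only serve as illustration; a real database would run the same operations on a stored attribute field.

```python
from statistics import mean

# Hypothetical temperature readings in degrees Celsius (illustrative values only)
temperatures_c = [12.5, -3.0, 7.2, 21.4, 0.0]

# Sort temperature values from lowest to highest
sorted_temps = sorted(temperatures_c)

# Calculate minimum, maximum, and average temperature
lowest = min(temperatures_c)
highest = max(temperatures_c)
average = mean(temperatures_c)

# Convert from one unit to another (Celsius to Fahrenheit)
temperatures_f = [c * 9 / 5 + 32 for c in temperatures_c]
```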
Hotel ID   Name            Address              Number of Rooms   Standard
001        Mountain View   23 High Street       15                Budget
002        Palace Deluxe   Pine Avenue          12                Luxury
003        Ski Lodge       10 Ski School Road   40                Standard
Databases Before Computers
A library is essentially a database that organizes books according to one of two basic systems:
1. Dewey Decimal System
2. Library of Congress System
A card catalog was used to query the system by:
1. Subject
2. Author
3. Title
Problems with paper-based databases:
1. Data tended to be duplicated between agencies
2. They were difficult to manipulate
3. Errors were common and difficult to clean up
4. They could not be sorted (at least not easily)
5. A subset of the data could not be selected
6. Updating required a re-print
Example Database Application:
Happy Valley Ski Resort
--Booking and Customer Service Database
(Heywood et al.)
A DBMS is a computer program
--controls the storage, retrieval, and modification of data stored in a database
A DBMS is used to:
1. Handle and manage files
2. Add, update, and delete records
3. Extract information from data through data sorts, summary tables, and data queries
4. Maintain data security and integrity
5. Build applications
Database Management Systems manage data
--organized using a database data model
--analogous to how spatial data are organized in a GIS using spatial data models (raster and vector)
Database Models
Four of the most widely used database models for handling GIS attribute data:
1. Hierarchical
2. Network
3. Relational
4. Object-oriented
Database Management Systems (DBMS)
The database approach to data handling
Hierarchical Database Model
Object-Oriented Database Approach
Relational Database Model
The relational database model is the most widely used database model
--ArcMap uses a relational database model
How it works:
Data are organized in a series of two-dimensional tables --each table contains records for one entity
Tables are linked by common sets of characteristic data (i.e. data columns) known as keys
In relational tables --data are organized into rows and columns
Columns contain attributes and each has a distinct name --each column represents an attribute field
Rows contain all of the attribute fields for features --each row represents a feature record
The table structure is very flexible --allows a variety of queries to be performed on the same data
Queries can be performed on individual tables --or on multiple tables simultaneously by linking separate tables on a key field
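As a rough illustration of tables linked on a key field, the sketch below uses Python's built-in sqlite3 module (not ArcMap). The hotel rows come from the hotel table in these notes; the visitor table, its column names, and its rows are hypothetical and exist only to show a query across two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel (
    hotel_id TEXT PRIMARY KEY,   -- key field
    name TEXT,
    rooms INTEGER,
    standard TEXT
);
CREATE TABLE visitor (
    visitor_id INTEGER PRIMARY KEY,
    visitor_name TEXT,
    hotel_id TEXT REFERENCES hotel(hotel_id)  -- links each visitor to a hotel
);
INSERT INTO hotel VALUES
    ('001', 'Mountain View', 15, 'Budget'),
    ('002', 'Palace Deluxe', 12, 'Luxury'),
    ('003', 'Ski Lodge', 40, 'Standard');
-- Hypothetical visitor records
INSERT INTO visitor VALUES
    (1, 'Visitor A', '001'),
    (2, 'Visitor B', '003'),
    (3, 'Visitor C', '003');
""")

# Query on an individual table: hotels with more than 12 rooms
single_table = conn.execute(
    "SELECT name FROM hotel WHERE rooms > 12 ORDER BY name"
).fetchall()

# Query on two tables at once, linked on the key field hotel_id
multi_table = conn.execute("""
    SELECT v.visitor_name, h.name
    FROM visitor AS v
    JOIN hotel AS h ON v.hotel_id = h.hotel_id
    ORDER BY v.visitor_id
""").fetchall()
```

Each row of hotel is one feature record and each column is one attribute field; the second query only works because both tables carry hotel_id.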
Object   State                       Behavior
Hotel    Name                        Can be plotted on map
         Address                     Can be added to database
         Number of bedrooms          Can be deleted from database
         Standard of Accommodation   Standard can be upgraded or downgraded
                                     Number of bedrooms can be increased or decreased
Creating a Database
Database design and implementation
--guided by relationships between data that will be stored
Design process concerned with
--expressing relationships between data
Implementation
--means setting up new structures to express these relationships
--usually within existing database software
Data investigation produces a lot of unstructured information, which requires data modeling to determine:
--information flows
--relationships between data
--potential entities to be stored in database
The process of modeling relationships between entities is known as:
--Entity Relationship Modeling (ERM), or Entity Attribute Modeling (EAM)
Step 1: Identifying Entities
In the Happy Valley example, there are four entities for which attribute data have been created:
1. Hotels
2. Ski Schools
3. Tour Companies
4. Visitors
Each entity has distinctive characteristics and can be described as a noun
1. An entity's characteristics are its attributes
2. The domain is the set of possible values
Step 2: Identifying Relationships Between Entities
These relationships can be described with verbs
--A hotel is located in a resort
--A visitor stays at a hotel
--A ski school teaches students
Three types of relationship are possible:
1. 1:1 (one to one) = each visitor stays at one hotel
2. 1:M (one to many) = one ski school teaches many visitors
3. M:N (many to many) = many tour companies use many hotels
Creating an EAM diagram helps the modeler to decide on the appropriate structure of the database to be built
1:1 RELATIONSHIP --means tables for each entity can either be joined together or kept separate
1:M RELATIONSHIP --means two tables are needed with a key field to allow a relational join
M:N RELATIONSHIP --means tables should be separated (if tables repeat information, the relationships need to be broken down further to avoid redundancy)
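One common way to break down an M:N relationship is a separate link table that holds one row per pairing. The sketch below applies that idea to the "many tour companies use many hotels" example with Python's sqlite3 module. The table names, column names, and tour company rows are hypothetical and are not taken from the Happy Valley design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE travelco (travelco_id INTEGER PRIMARY KEY, travelco_name TEXT);
CREATE TABLE hotel (hotel_id TEXT PRIMARY KEY, hotel_name TEXT);
-- Link table: one row per (tour company, hotel) pairing, so neither
-- entity table has to repeat information about the other
CREATE TABLE travelco_hotel (
    travelco_id INTEGER REFERENCES travelco(travelco_id),
    hotel_id TEXT REFERENCES hotel(hotel_id),
    PRIMARY KEY (travelco_id, hotel_id)
);
-- Hypothetical tour companies
INSERT INTO travelco VALUES (1, 'Tour Co A'), (2, 'Tour Co B');
INSERT INTO hotel VALUES ('001', 'Mountain View'), ('003', 'Ski Lodge');
INSERT INTO travelco_hotel VALUES (1, '001'), (1, '003'), (2, '003');
""")

# One tour company uses many hotels...
hotels_used_by_a = [row[0] for row in conn.execute("""
    SELECT h.hotel_name
    FROM travelco_hotel AS l
    JOIN hotel AS h ON h.hotel_id = l.hotel_id
    WHERE l.travelco_id = 1
    ORDER BY h.hotel_name
""")]

# ...and one hotel is used by many tour companies
companies_using_ski_lodge = [row[0] for row in conn.execute("""
    SELECT t.travelco_name
    FROM travelco_hotel AS l
    JOIN travelco AS t ON t.travelco_id = l.travelco_id
    WHERE l.hotel_id = '003'
    ORDER BY t.travelco_name
""")]
```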
Step 3: Identifying Entity Attributes
1) HOTEL: Hotel_id, Hotel Name, etc.
2) TRAVELCO: TravelCo_id,
3) SKISCHOOL: Skischool_id, SkiSchool_name, etc.
4) VISITOR: Visitor_id, Visitor_name, Hotel_id, TravelCo_id, SkiSchool_id, etc.
5) Link Table:
i. TravelCo_id
ii. SkiSchool_id
Now you have --an Entity Attribute model diagram --a set of table definitions --the details of the attributes
The database is now ready to implement
Happy Valley EAM (Entity Attribute Modeling) diagram
Linking Attribute Data to Spatial Data
In a simple raster GIS, a database is not necessary
1. One cell in a layer of data contains a single value
2. Attribute data values are often held in the same file as the data layer itself
Few actual GIS are like that
1. Most GIS (esp. vector-based) offer a hybrid approach (Table 4.8)
2. Spatial data are stored as part of the GIS data structure and attribute data are stored in a relational database
A hybrid approach is used by most vector GIS data structures:
1. SPATIAL DATA: Stored as part of the GIS data structure
2. ATTRIBUTE DATA: Stored in a relational DBMS
Existing databases can then be integrated with graphics by allocating a unique identifier to each feature in the GIS
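The idea of a shared unique identifier can be sketched with two Python dictionaries, one standing in for the spatial data structure and one for the attribute database. The coordinates are made-up placeholders and the function names are hypothetical; this is a simplified picture, not how any particular GIS stores its data.

```python
# Spatial side: feature geometries keyed by a unique identifier.
# Coordinates are made-up placeholders, not real locations.
feature_geometry = {
    "001": (2.0, 5.0),
    "002": (4.5, 1.0),
    "003": (7.0, 3.5),
}

# Attribute side: records keyed by the same identifier
# (values taken from the hotel table in these notes).
hotel_attributes = {
    "001": {"name": "Mountain View", "rooms": 15, "standard": "Budget"},
    "002": {"name": "Palace Deluxe", "rooms": 12, "standard": "Luxury"},
    "003": {"name": "Ski Lodge", "rooms": 40, "standard": "Standard"},
}

def describe_feature(feature_id):
    """Combine geometry and attributes for one feature via its identifier."""
    return {"id": feature_id,
            "xy": feature_geometry[feature_id],
            **hotel_attributes[feature_id]}

def features_where(predicate):
    """Return geometries of features whose attribute record passes the test."""
    return {fid: feature_geometry[fid]
            for fid, record in hotel_attributes.items()
            if predicate(record)}

# Attribute query that returns something mappable: hotels with 15+ rooms
large_hotels = features_where(lambda record: record["rooms"] >= 15)
```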
Figure 4.8: Linking spatial and attribute data in GIS
Working With Tables in ArcGIS
A table is a structure for storing multiple attributes (descriptors) about an entity --can be a location or an object in a relational database
Rows in a table represent records, which contain attributes that describe a particular entity (i.e. objects or features)
Columns in a table represent (attribute) fields --attributes are used to describe the feature
Examples of attributes used to describe a person: --eye or hair color --height --weight --race/ethnicity --political or religious affiliation, etc.
Tabular data used by geographic information systems fall into two categories:
1. Attribute tables contain information about features in a geographic data set
a. There is always exactly one row of information per feature
b. In a shapefile, the row is linked to the spatial feature in a separate file using a unique ID number (Feature ID, or FID)
--geodatabase files store both attributes and x-y coordinates in the same file and use an Object ID (OID) instead of an FID
2. A standalone table contains information about one or more objects in tabular format
a. May or may not describe map features
b. Can come from an Excel spreadsheet, a GPS data file, or a database
QUERIES ON TABLES
Querying tables allows one to derive information about subsets of records
--the longest river or largest lake in the United States, for instance
Tabular queries in a GIS use logical expressions to perform queries by specifying certain criteria
The software searches the table to find the records that match the criteria, and returns a selected set
--these selected records can become the input to another action such as printing them, exporting them to a new file, or executing a GIS function on them
Most databases use Structured Query Language (SQL) to perform a query
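A minimal sketch of an SQL query that returns a selected set, using Python's sqlite3 module rather than ArcGIS. The lakes table, its field names, and all of its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lakes (name TEXT, area_km2 REAL, state TEXT)")
# Hypothetical records, for illustration only
conn.executemany(
    "INSERT INTO lakes VALUES (?, ?, ?)",
    [
        ("Lake A", 120.0, "X"),
        ("Lake B", 950.5, "Y"),
        ("Lake C", 430.0, "X"),
        ("Lake D", 15.2, "Y"),
    ],
)

# Logical expression with two criteria; the result is the selected set
selected = conn.execute(
    "SELECT name FROM lakes WHERE state = 'X' AND area_km2 > 100 ORDER BY name"
).fetchall()

# "The largest lake": order by area and keep the first record
largest = conn.execute(
    "SELECT name, area_km2 FROM lakes ORDER BY area_km2 DESC LIMIT 1"
).fetchone()
```

The selected rows could then be printed, exported, or passed to another operation, as described above.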
Relational database management systems (RDBMS) and GIS can combine information from two or more tables through one of two functions:
1. A join
2. A relate
Both functions use a common attribute field called a key field
Example of using SQL to perform a query in ArcGIS
Join
When a join is performed, information from both tables is combined to form one table
--it is a temporary relationship that can be removed
Relate
The two tables are associated by a common field, but the records are not joined together
--the two tables remain separate
--however, if one or more records are selected in one table, the associated records can be simultaneously selected in the related table
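The difference between the two functions can be sketched in plain Python. The functions join and related_records are hypothetical helpers written for this illustration, not ArcGIS tools, and the bookings table is made up.

```python
# Destination table: one row per hotel (from the hotel table in these notes)
hotels = [
    {"hotel_id": "001", "name": "Mountain View"},
    {"hotel_id": "002", "name": "Palace Deluxe"},
    {"hotel_id": "003", "name": "Ski Lodge"},
]
# Source table for the join: exactly one row per hotel_id
standards = [
    {"hotel_id": "001", "standard": "Budget"},
    {"hotel_id": "002", "standard": "Luxury"},
    {"hotel_id": "003", "standard": "Standard"},
]
# Related table: several rows may share a hotel_id (hypothetical bookings)
bookings = [
    {"booking_id": 1, "hotel_id": "003"},
    {"booking_id": 2, "hotel_id": "001"},
    {"booking_id": 3, "hotel_id": "003"},
]

def join(destination, source, key):
    """Return one combined table; the input tables are left unchanged."""
    lookup = {row[key]: row for row in source}
    return [{**row, **lookup.get(row[key], {})} for row in destination]

def related_records(selected, related, key):
    """Tables stay separate; a selection in one selects matches in the other."""
    selected_keys = {row[key] for row in selected}
    return [row for row in related if row[key] in selected_keys]

joined = join(hotels, standards, "hotel_id")

selection = [h for h in hotels if h["name"] == "Ski Lodge"]
ski_lodge_bookings = related_records(selection, bookings, "hotel_id")
```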
Cardinality
Cardinality refers to the relationship between the records in the two tables on which a join or relate is performed
The two tables are designated according to the relationship between them:
1. The source table contains the information that will be added (appended) to another table
2. The destination table is the table that receives the appended information
The relationship between the source table and the destination table is called the cardinality --this is literally the “direction” in which the information will flow
Joins and Cardinality
Since a join literally means that two tables are joined to create one table
--each record in the destination table must match exactly one record in the source table
Four types of cardinality can exist between two tables:
1. One-to-one cardinality
a. One record in the destination table matches exactly one record in the source table
b. Thus, a join can work with this relationship
2. Many-to-one cardinality
a. Many records in the destination table match exactly one record in the source table
b. A join can also work with this relationship
3. One-to-many cardinality
a. One record in the destination table matches many records in the source table
b. A join cannot work with this relationship
c. A relate will be needed to link the information in the two tables
4. Many-to-many cardinality
a. Many records in the destination table match many records in the source table
b. A relate will also be needed to link the information in the two tables
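The four cases can be checked mechanically by counting how often each key value occurs in the two tables. The sketch below is one possible way to do that in Python; the function names are hypothetical, and the naming follows the convention used above (destination first, source second).

```python
from collections import Counter

def cardinality(destination_keys, source_keys):
    """Classify the destination-to-source relationship on a key field."""
    dest_counts = Counter(destination_keys)
    src_counts = Counter(source_keys)
    shared = set(dest_counts) & set(src_counts)
    # "Many" on a side means some shared key value repeats on that side
    many_dest = any(dest_counts[k] > 1 for k in shared)
    many_src = any(src_counts[k] > 1 for k in shared)
    if many_dest and many_src:
        return "many-to-many"
    if many_src:
        return "one-to-many"
    if many_dest:
        return "many-to-one"
    return "one-to-one"

def join_or_relate(destination_keys, source_keys):
    """A join needs at most one source record per destination record."""
    kind = cardinality(destination_keys, source_keys)
    return "join" if kind in ("one-to-one", "many-to-one") else "relate"
```

For example, join_or_relate(["a", "b"], ["a", "a", "b"]) returns "relate", because destination record "a" would have two source records competing for one row.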
Additional Functions for Analyzing Table Data
Statistics Function
Allows the user to explore the data before performing analyses
--can also be used to perform simple queries
Provides basic statistics on an attribute
--minimum, maximum, mean, median, and standard deviation
--also provides a histogram to evaluate whether the attribute potentially follows a normal distribution
A careful reading of the data can provide important insights to a professional (see the statistics window to the right)
--Price provides a detailed discussion of why this appears this way, and how a geologist would interpret this data
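The statistics listed above can be reproduced for any numeric attribute with Python's statistics module. The values below are hypothetical, and the simple binning function is only a stand-in for the histogram a GIS would draw.

```python
import statistics

# Hypothetical attribute values (illustrative only)
magnitudes = [2.1, 3.4, 2.8, 4.0, 3.1, 2.8, 5.2]

summary = {
    "count": len(magnitudes),
    "minimum": min(magnitudes),
    "maximum": max(magnitudes),
    "mean": statistics.mean(magnitudes),
    "median": statistics.median(magnitudes),
    "std_dev": statistics.stdev(magnitudes),  # sample standard deviation
}

def histogram(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    counts = {}
    for v in values:
        lower = (v // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

bins = histogram(magnitudes, 1.0)
```

A mean well above the median, or a histogram with a long tail on one side, is a hint that the attribute is skewed rather than normally distributed.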
Summarizing Tables
The Summarize function combines records into groups based on a categorical attribute field
--it then calculates statistics separately for each group
Since a summarize performs several sets of statistics simultaneously (one for each group created by the summarize selection)
--the Summarize command produces a new table
--it contains a Count field that indicates the number of records represented by each group
--it also contains each of the statistics that the user requested
The user could then join the table created using the Summarize command to another layer
--then create a map based on the statistics
--see the example of summarizing earthquakes by state found on pp. 180-181 in MGIS
--the user can thus use the Summarize command to collapse a one-to-many relationship to create a one-to-one relationship between two tables
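A rough Python equivalent of summarizing and then joining is sketched below. The earthquake records, field names, and the summarize helper are all hypothetical; they are not the MGIS example data or an ArcGIS function.

```python
from collections import defaultdict

# Hypothetical earthquake records: many records per state
earthquakes = [
    {"state": "CA", "magnitude": 4.1},
    {"state": "CA", "magnitude": 5.3},
    {"state": "AK", "magnitude": 6.0},
    {"state": "CA", "magnitude": 3.2},
    {"state": "AK", "magnitude": 4.4},
]

def summarize(records, group_field, value_field):
    """Group records on a categorical field and compute statistics per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_field]].append(record[value_field])
    return [
        {
            group_field: key,
            "Count": len(values),
            "Max": max(values),
            "Mean": sum(values) / len(values),
        }
        for key, values in sorted(groups.items())
    ]

# The new table has exactly one row per state
summary_table = summarize(earthquakes, "state", "magnitude")

# One row per state on both sides, so a join now works
states = [{"state": "AK", "name": "Alaska"}, {"state": "CA", "name": "California"}]
lookup = {row["state"]: row for row in summary_table}
states_with_stats = [{**s, **lookup[s["state"]]} for s in states]
```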
Field Types
An important consideration when creating a database
--a database field must contain only one type of data
--either text or integers, but never both
Important Considerations
Field Definition
Once a field definition is set, it cannot be changed
Most common field types:
--numbers, (text) strings, and dates
Field Length
Defines number of text characters stored
--if field length is exceeded, characters are truncated to fit the defined field length
--thus it is important to determine the longest possible field length in advance
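The effect of field length can be mimicked with string slicing. The store_text helper is hypothetical and only imitates a fixed-length text field; the names come from the hotel table in these notes.

```python
names = ["Mountain View", "Palace Deluxe", "Ski Lodge"]

def store_text(value, field_length):
    """Mimic a fixed-length text field: extra characters are cut off."""
    return value[:field_length]

# A field that is too short silently loses characters
truncated = store_text("Mountain View", 8)

# Checking the longest value in advance avoids the problem
required_length = max(len(name) for name in names)
stored = [store_text(name, required_length) for name in names]
```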
ASCII Versus Binary
Byte
A byte is the basic unit of storage for a computer
--composed of a string of eight digits (bits), which may be zeros or ones
--represents a number in base 2 (binary numbers)
--a single byte can store a binary value from 0 (00000000) to 255 (11111111)
ASCII
All text is stored as sequences of characters using ASCII code
--each digit, letter, and symbol is assigned a single-byte code between 0 and 255
--cat would require 3 bytes, and horse 5 bytes
--147 requires 3 bytes, and 147.5 requires 5 bytes
Binary
Numbers can be stored either in ASCII or in binary
--stored in base 2 directly
--16 stored as 00010000 in a single byte
Thus, binary is more efficient for storing numbers
--14456 requires 5 bytes in ASCII, but only 2 in binary
Computation is also faster using binary
--computer functions are based on base 2
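The byte counts quoted above can be checked with Python's built-in encode method and the struct module. The choice of an unsigned 16-bit integer ("H") for the binary case is an assumption made for this sketch; it is the smallest standard integer size that can hold 14456.

```python
import struct

# ASCII storage: one byte per character
ascii_lengths = {
    text: len(text.encode("ascii"))
    for text in ["cat", "horse", "147", "147.5", "14456"]
}

# Binary storage: 14456 packed as an unsigned 16-bit integer (2 bytes)
binary_14456 = struct.pack("<H", 14456)

# A single byte written out as eight bits
sixteen_as_bits = format(16, "08b")
largest_byte_value = int("11111111", 2)
```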
Precision
Very large numbers require many digits to store
Scientists use scientific notation when dealing with very large or very small numbers
--computers also use scientific notation
Thus, 123456789 would consist of a mantissa (the significant digits of the number) and an exponent
--1.23456789 plus exponent 8
--1.23456789e08
--a computer with limited precision may truncate this to 1.2345e08, leaving the less significant digits out
In ArcGIS, there are several options for defining precision, as noted in the table to the right
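The loss of less significant digits can be observed by storing 123456789 in a 4-byte and an 8-byte floating-point value. The sketch assumes the IEEE 754 single- and double-precision formats that Python's struct module exposes as "f" and "d"; it does not reproduce the ArcGIS field options themselves.

```python
import struct

value = 123456789

# Mantissa and exponent written in scientific notation
scientific = f"{value:.8e}"

# Round-trip through a 4-byte (single-precision) float:
# only about 7 significant decimal digits survive
as_float32 = struct.unpack("<f", struct.pack("<f", float(value)))[0]

# Round-trip through an 8-byte (double-precision) float:
# about 15 to 16 significant digits, so the value is preserved
as_float64 = struct.unpack("<d", struct.pack("<d", float(value)))[0]

lost_in_single = as_float32 - value
```

The single-precision result differs from the original in its last digit, which is why the field precision has to be chosen with the expected range of values in mind.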