MODULE 2: ATTRIBUTE DATA

Attribute Data Management

There are two types of GIS data used to describe features:

1. Spatial Data

a. What things are: Represented by symbols.

b. Where things are: Georeferenced to locate the feature spatially

2. Attribute Data

a. What things are: Describe the real-world feature

b. What types of things are found in a particular location: Linked to spatial information about a feature

By themselves, attribute data have little meaning

--the analysis or organization of data creates meaning

--databases allow us to order, re-order, summarize, and combine data

With a database, we can

--sort temperature values, from lowest to highest

--calculate minimum, maximum, or average temperature

--convert from one unit to another
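
The three operations listed above can be sketched in a few lines of Python. This is only an illustration: the temperature values are hypothetical, and the helper name `c_to_f` is an assumption made for this example, not something taken from the lecture.

```python
from statistics import mean

# Hypothetical temperature readings in degrees Celsius (illustrative only).
temps_c = [4.5, -2.0, 10.0, 0.0, 7.5]

# Sort the values from lowest to highest.
sorted_temps = sorted(temps_c)

# Calculate the minimum, maximum, and average temperature.
lowest = min(temps_c)
highest = max(temps_c)
average = mean(temps_c)


def c_to_f(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Convert every reading from one unit to another.
temps_f = [c_to_f(t) for t in temps_c]
```

A database performs the same kinds of operations, but on stored records rather than on a list held in memory.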

Hotel ID   Name            Address              Number of Rooms   Standard
001        Mountain View   23 High Street       15                Budget
002        Palace Deluxe   Pine Avenue          12                Luxury
003        Ski Lodge       10 Ski School Road   40                Standard

Databases Before Computers

A library is essentially a database that organizes books according to two basic systems:

1. Dewey Decimal System

2. Library of Congress System

A card catalog was used to query the system by:

1. Subject

2. Author

3. Title

Problems with paper-based databases:

1. Data tended to be duplicated between agencies

2. They were difficult to manipulate

3. Errors were common and difficult to clean up

4. They could not be sorted (at least not easily)

5. A subset of the data could not be selected

6. Updating required a reprint

Example Database Application:

Happy Valley Ski Resort

--Booking and Customer Service Database

(Heywood et al.)

Database Management Systems (DBMS)

A DBMS is a computer program

--controls the storage, retrieval and modification of data stored in a database

A DBMS is used to:

1. Handle and manage files

2. Add, update, and delete records

3. Extract information from data through data sorts, creating summary tables, and data queries

4. Maintain data security and integrity

5. Build applications

Database Management Systems manage data

--organized using a database data model

Analogous to how spatial data are organized in a GIS using spatial data models (raster and vector)

Database Models

Four of the most widely used database models for handling GIS attribute data:

1) Hierarchical

2) Network

3) Relational

4) Object-oriented

The database approach to data handling

Hierarchical Database Model

Object-Oriented Database Approach

Relational Database Model

The relational database model is the most widely used database model

--ArcMap uses a relational database model

How it works:

Data are organized in a series of two-dimensional tables --each table contains records for one entity

Tables are linked by common sets of characteristic data (i.e. data columns) known as keys

In relational tables --data are organized into rows and columns

Columns contain attributes and each has a distinct name --each column represents an attribute field

Rows contain all of the attribute fields for features --each row represents a feature record

The table structure is very flexible --allows a variety of queries to be performed on the same data

Queries can be performed on individual tables --or on multiple tables simultaneously by linking separate tables on a key field
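
As a rough illustration of tables linked on a key field, the sketch below uses Python's built-in `sqlite3` module rather than ArcMap. The `hotel` rows come from the Happy Valley table; the `visitor` table, its column names, and its three rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel (
    hotel_id TEXT PRIMARY KEY,   -- key field
    name     TEXT,
    rooms    INTEGER,
    standard TEXT
);
CREATE TABLE visitor (           -- hypothetical table for illustration
    visitor_id   INTEGER PRIMARY KEY,
    visitor_name TEXT,
    hotel_id     TEXT REFERENCES hotel(hotel_id)
);
INSERT INTO hotel VALUES
    ('001', 'Mountain View', 15, 'Budget'),
    ('002', 'Palace Deluxe', 12, 'Luxury'),
    ('003', 'Ski Lodge',     40, 'Standard');
INSERT INTO visitor VALUES
    (1, 'Visitor A', '001'),
    (2, 'Visitor B', '003'),
    (3, 'Visitor C', '003');
""")

# Query on an individual table: each row is a record, each column a field.
budget_hotels = conn.execute(
    "SELECT name FROM hotel WHERE standard = 'Budget'"
).fetchall()

# Query on two tables at once, linked on the shared key field hotel_id.
stays = conn.execute("""
    SELECT visitor.visitor_name, hotel.name
    FROM visitor
    JOIN hotel ON visitor.hotel_id = hotel.hotel_id
    ORDER BY visitor.visitor_id
""").fetchall()
```

The same two tables answer both questions; only the query changes, which is the flexibility the table structure provides.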

Object   State                       Behavior
Hotel    Name                        Can be plotted on map
         Address                     Can be added to database
         Number of bedrooms          Can be deleted from database
         Standard of Accommodation   Standard can be upgraded or downgraded
                                     Number of bedrooms can be increased or decreased
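
The pairing of state and behavior can be sketched as a small Python class. This is a minimal illustration, not how any particular object-oriented DBMS stores data. The method names and the ordering of the standards (Budget, Standard, Luxury) are assumptions made for this example.

```python
class Hotel:
    """An object bundles state (attributes) with behavior (methods)."""

    # Assumed ordering of accommodation standards, lowest to highest.
    STANDARDS = ["Budget", "Standard", "Luxury"]

    def __init__(self, name, address, bedrooms, standard):
        # State: the values that describe this particular hotel.
        self.name = name
        self.address = address
        self.bedrooms = bedrooms
        self.standard = standard

    def upgrade(self):
        """Behavior: move one step up the standards list, if possible."""
        i = self.STANDARDS.index(self.standard)
        self.standard = self.STANDARDS[min(i + 1, len(self.STANDARDS) - 1)]

    def downgrade(self):
        """Behavior: move one step down the standards list, if possible."""
        i = self.STANDARDS.index(self.standard)
        self.standard = self.STANDARDS[max(i - 1, 0)]

    def change_bedrooms(self, delta):
        """Behavior: increase or decrease the number of bedrooms."""
        if self.bedrooms + delta < 0:
            raise ValueError("number of bedrooms cannot be negative")
        self.bedrooms += delta


ski_lodge = Hotel("Ski Lodge", "10 Ski School Road", 40, "Standard")
ski_lodge.upgrade()            # Standard -> Luxury
ski_lodge.change_bedrooms(-5)  # 40 -> 35
```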

Creating a Database

Database design and implementation

--guided by relationships between data that will be stored

Design process concerned with

--expressing relationships between data

Implementation

--means setting up new structures to express these relationships

--usually within existing database software

Data investigation produces a lot of unstructured information, which requires data modeling to determine:

--information flows

--relationships between data

--potential entities to be stored in database

The process of modeling relationships between entities is known as:

--Entity Relationship Modeling (ERM), or Entity Attribute Modeling (EAM)

Step 1: Identifying Entities

In the Happy Valley example, there are four entities for which attribute data have been created:

1. Hotels

2. Ski Schools

3. Tour Companies

4. Visitors

Each entity has distinctive characteristics, and can be described as a noun

1. The entities’ characteristics are the attributes

2. The domain is the set of possible values

Step 2: Identifying Relationships Between Entities

These relationships can be described with verbs

--A hotel is located in a resort

--A visitor stays at a hotel

--A ski school teaches students

Three types of relationship are possible:

1. 1:1 (one to one) = each visitor stays at one hotel

2. 1:M (one to many) = one ski school teaches many visitors

3. M:N (many to many) = many tour companies use many hotels

Creating an EAM diagram helps the modeler to decide on the appropriate structure of the database to be built

1:1 RELATIONSHIP --means tables for each entity can either be joined together, or kept separate

1:M RELATIONSHIP --means two tables are needed with a key field to allow a relational join

M:N RELATIONSHIP --means tables should be separated (if tables repeat information the relationships need to be broken down further to avoid redundancy)

Step 3: Identifying Entity Attributes

1) HOTEL: Hotel_id, Hotel Name, etc.

2) TRAVELCO: TravelCo_id,

3) SKISCHOOL: Skischool_id, SkiSchool_name, etc.

4) VISITOR: Visitor_id, Visitor_name, Hotel_id, TravelCo_id, SkiSchool_id, etc.

5) Link Table:

i. TravelCo_id

ii. SkiSchool_id

Now you have --an Entity Attribute model diagram --a set of table definitions --the details of the attributes

The database is now ready to implement
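
One possible way to turn the table definitions above into an actual database is sketched below with Python's built-in `sqlite3` module. The field names follow the Step 3 list, but the data types, the lowercase spelling of the names, the name `link_table`, and the choice of SQLite are all assumptions for illustration. Attributes that the list abbreviates with "etc." are left out rather than guessed.

```python
import sqlite3

# Table definitions derived from the entity attribute list.
# Data types are assumed; the lecture does not specify them.
SCHEMA = """
CREATE TABLE hotel (
    hotel_id   INTEGER PRIMARY KEY,
    hotel_name TEXT
);
CREATE TABLE travelco (
    travelco_id INTEGER PRIMARY KEY
);
CREATE TABLE skischool (
    skischool_id   INTEGER PRIMARY KEY,
    skischool_name TEXT
);
CREATE TABLE visitor (
    visitor_id   INTEGER PRIMARY KEY,
    visitor_name TEXT,
    hotel_id     INTEGER REFERENCES hotel(hotel_id),
    travelco_id  INTEGER REFERENCES travelco(travelco_id),
    skischool_id INTEGER REFERENCES skischool(skischool_id)
);
CREATE TABLE link_table (
    travelco_id  INTEGER REFERENCES travelco(travelco_id),
    skischool_id INTEGER REFERENCES skischool(skischool_id),
    PRIMARY KEY (travelco_id, skischool_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# List the fields of the visitor table (column 1 of table_info is the name).
visitor_columns = [row[1] for row in conn.execute("PRAGMA table_info(visitor)")]
```

The key fields in `visitor` (`hotel_id`, `travelco_id`, `skischool_id`) are what allow a relational join to the other tables.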

Happy Valley EAM (Entity Attribute Modeling) diagram

Linking Attribute Data to Spatial Data

In a simple raster GIS, a database is not necessary

1. One cell in a layer of data contains a single value

2. Attribute data values are often held in the same file as the data layer itself

Few actual GIS are like that

1. Most GIS (esp. vector-based) offer a hybrid approach (Table 4.8)

2. Spatial data are stored as part of the GIS data structure and attribute data are stored in a relational database

A hybrid approach is used by most vector GIS data structures:

1. SPATIAL DATA: Stored as part of the GIS data structure

2. ATTRIBUTE DATA: Stored in a relational DBMS

Existing databases can then be integrated with graphics by allocating a unique identifier to each feature in the GIS
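
The idea of a unique identifier tying a spatial feature to its attribute record can be sketched with two Python dictionaries. The coordinates are hypothetical, and the names `geometries`, `attributes`, `describe_feature`, and `features_where` are assumptions made for this example. A real GIS manages this link internally.

```python
# Spatial side: feature ID -> geometry (hypothetical point coordinates).
geometries = {
    1: (10.0, 20.0),
    2: (12.5, 18.0),
    3: (15.0, 25.0),
}

# Attribute side: the same IDs identify the matching attribute records.
attributes = {
    1: {"name": "Mountain View", "rooms": 15},
    2: {"name": "Palace Deluxe", "rooms": 12},
    3: {"name": "Ski Lodge", "rooms": 40},
}


def describe_feature(fid):
    """Combine the geometry and the attribute record that share one ID."""
    return {"id": fid, "xy": geometries[fid], **attributes[fid]}


def features_where(predicate):
    """Return the IDs of features whose attribute record passes a test."""
    return sorted(fid for fid, record in attributes.items() if predicate(record))


# Attribute query first, then the IDs can be used to find the map features.
large_hotels = features_where(lambda record: record["rooms"] >= 15)
large_hotel_locations = [geometries[fid] for fid in large_hotels]
```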

Figure 4.8: Linking spatial and attribute data in GIS

Working With Tables in ArcGIS

A table is a structure for storing multiple attributes (descriptors) about an entity --can be a location or an object in a relational database

Rows in a table represent records, which contain attributes that describe a particular entity (i.e. objects or features)

Columns in a table represent (attribute) fields --attributes are used to describe the feature

Examples of attributes used to describe a person: --eye or hair color --height --weight --race/ethnicity --political or religious affiliation, etc.

Tabular data used by geographic information systems fall into two categories:

1. Attribute tables contain information about features in a geographic data set

a. Always only one row of information per feature

b. The row is linked to the spatial feature in a separate file using a unique ID number (Feature ID, or FID) in a shapefile

--geodatabase files store both attributes and x-y coordinates in the same file and use an Object ID (OID) instead of an FID

2. A standalone table contains information about one or more objects in tabular format

a. May or may not describe map features

b. Can come from an Excel spreadsheet, a GPS data file, or a database

QUERIES ON TABLES

Querying tables allows one to derive information about subsets of records

--the longest river or largest lake in the United States, for instance

Tabular queries in a GIS use logical expressions to perform queries by specifying certain criteria

The software searches the table to find the records that match the criteria, and returns a selected set

--these selected records can become the input to another action such as printing them, exporting them to a new file, or executing a GIS function on them

Most databases use Structured Query Language (SQL) to perform a query
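
A tabular query built from a logical expression might look like the following sketch, which uses Python's built-in `sqlite3` module and the Happy Valley hotel table. The table and column names are assumptions, and the SQL dialect accepted by a given GIS may differ in its details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotel (hotel_id TEXT, name TEXT, rooms INTEGER, standard TEXT)"
)
conn.executemany(
    "INSERT INTO hotel VALUES (?, ?, ?, ?)",
    [
        ("001", "Mountain View", 15, "Budget"),
        ("002", "Palace Deluxe", 12, "Luxury"),
        ("003", "Ski Lodge", 40, "Standard"),
    ],
)

# Logical expression with two criteria combined by AND.
# The records that match form the selected set.
selected = conn.execute(
    "SELECT name FROM hotel "
    "WHERE rooms >= 15 AND standard <> 'Luxury' "
    "ORDER BY name"
).fetchall()

# A "largest" style query: sort descending and keep the first record.
largest = conn.execute(
    "SELECT name, rooms FROM hotel ORDER BY rooms DESC LIMIT 1"
).fetchone()

# The selected set can feed another action, here a simple export to text.
export_lines = [name for (name,) in selected]
```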

Relational database management systems (RDBMS) and GIS can combine information from two or more tables through one of two functions:

1. A join

2. A relate

Both functions use a common attribute field called a key field

Example of using SQL to perform a query in ArcGIS

Join

When a join is performed, information from both tables is combined to form one table

--it is a temporary relationship that can be removed

Relate

The two tables are associated by a common field, but the records are not joined together

--the two tables remain separate

--however, if one or more records are selected in one table, the associated records can be simultaneously selected in the related table
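
The difference between the two functions can be sketched in plain Python. This is a conceptual illustration only, not how ArcGIS implements joins or relates. The function names `join` and `relate` and the small `visitors` table are assumptions for this example.

```python
hotels = [
    {"hotel_id": "001", "name": "Mountain View"},
    {"hotel_id": "003", "name": "Ski Lodge"},
]

# Hypothetical visitor records that share the key field hotel_id.
visitors = [
    {"visitor_id": 1, "hotel_id": "001"},
    {"visitor_id": 2, "hotel_id": "003"},
    {"visitor_id": 3, "hotel_id": "003"},
]


def join(destination, source, key):
    """Build one combined table: source fields are appended to each
    destination row that has a matching key value."""
    lookup = {row[key]: row for row in source}
    return [{**dest, **lookup.get(dest[key], {})} for dest in destination]


def relate(selected_rows, related_table, key):
    """Keep the tables separate: return the rows of related_table that are
    associated with the currently selected rows."""
    keys = {row[key] for row in selected_rows}
    return [row for row in related_table if row[key] in keys]


# Join: every visitor row gains the fields of its hotel.
joined = join(visitors, hotels, "hotel_id")

# Relate: selecting the Ski Lodge record selects its visitors as well.
related = relate([hotels[1]], visitors, "hotel_id")
```

Because `join` returns a new list, the original tables are left unchanged, which loosely mirrors the temporary nature of a join.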

Cardinality

Cardinality refers to the relationship between the records in the two tables on which a join or relate is performed

The two tables are designated according to the relationship between them:

1. The source table contains the information that will be added (appended) to another table

2. The destination table is the table that receives the appended information

The relationship between the source table and the destination table is called the cardinality

--this is literally the “direction” in which the information will flow

Joins and Cardinality

Since a join literally means that two tables are joined to create one table

--each record in the destination table must match no more than one record in the source table

Four types of cardinality can exist between two tables:

1. One-to-one cardinality

a. One record in the destination table matches exactly one record in the source table

b. Thus, a join can work with this relationship

2. Many-to-one cardinality

a. Many records in the destination table match exactly one record in the source table

b. A join can also work with this relationship

3. One-to-many cardinality

a. One record in the destination table matches many records in the source table

b. A join cannot work with this relationship

c. A relate will be needed to link the information in the two tables

4. Many-to-many cardinality

a. Many records in the destination table match many records in the source table

b. A relate will also be needed to link the information in the two tables
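
One way to check the cardinality before choosing between a join and a relate is to count how often each key value occurs in each table. The sketch below is an assumption-laden illustration: the function names are made up for this example, the cardinality is read in the destination-to-source direction, and only key values present in both tables are considered.

```python
from collections import Counter


def cardinality(destination_keys, source_keys):
    """Classify the destination-to-source relationship from the key values."""
    dest_counts = Counter(destination_keys)
    src_counts = Counter(source_keys)
    shared = set(dest_counts) & set(src_counts)

    # "many" on a side means some shared key value repeats in that table.
    many_dest = any(dest_counts[k] > 1 for k in shared)
    many_src = any(src_counts[k] > 1 for k in shared)

    left = "many" if many_dest else "one"
    right = "many" if many_src else "one"
    return f"{left}-to-{right}"


def join_is_possible(destination_keys, source_keys):
    """A join works only when each destination record has one source match."""
    return cardinality(destination_keys, source_keys) in ("one-to-one", "many-to-one")


# Visitors (destination) joined to hotels (source): many visitors per hotel.
visitor_hotel_ids = ["001", "003", "003"]
hotel_ids = ["001", "002", "003"]
visitors_to_hotels = cardinality(visitor_hotel_ids, hotel_ids)

# Reversing the direction turns it into one-to-many, which needs a relate.
hotels_to_visitors = cardinality(hotel_ids, visitor_hotel_ids)
```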

Additional Functions for Analyzing Table Data

Statistics Function

Allows the user to explore the data before performing analyses

--can also be used to perform simple queries

Provides basic statistics on an attribute

--minimum, maximum, mean, median, and standard deviation

--also provides a histogram to evaluate whether the attribute potentially follows a normal distribution
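
The same basic statistics can be computed with Python's standard `statistics` module, as in the sketch below. The values are the room counts from the Happy Valley hotel table. Using the sample standard deviation and a bin width of 10 are assumptions for this example and are not claims about what the GIS software does.

```python
import statistics
from collections import Counter

# Number of Rooms field from the hotel table.
rooms = [15, 12, 40]

stats = {
    "minimum": min(rooms),
    "maximum": max(rooms),
    "mean": statistics.mean(rooms),
    "median": statistics.median(rooms),
    # Sample standard deviation (n - 1 in the denominator); an assumption.
    "std_dev": statistics.stdev(rooms),
}

# A very simple histogram: count the values falling in bins of width 10,
# keyed by the lower edge of each bin.
BIN_WIDTH = 10
histogram = Counter((value // BIN_WIDTH) * BIN_WIDTH for value in rooms)
```

With only three records the histogram says little, but on a full attribute table the same counts show whether the values cluster symmetrically around the mean.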

A careful reading of the data can provide important insights to a professional (see the statistics window to the right) --Price provides a detailed discussion of why this appears this way, and how a geologist would interpret this data

Summarizing Tables

The Summarize function combines records into groups based on a categorical attribute field

--it then calculates statistics separately for each group

Since a summarize performs several sets of statistics simultaneously (one for each group created by the summarize selection)

--the Summarize command produces a new table

--it contains a Count field that indicates the number of records represented by each group

--it also contains each of the statistics that the user requested

The user could then join the table created using the Summarize command to another layer

--then create a map based on the statistics

--see the example of summarizing earthquakes by state found on pp. 180-181 in MGIS

--the user can thus use the Summarize command to collapse a one-to-many relationship into a one-to-one relationship between two tables
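
The grouping step and the follow-up join can be sketched in plain Python. The earthquake records below are hypothetical values invented for illustration and are not the data from the MGIS example. The function name `summarize` and the field names are likewise assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical earthquake records: (state, magnitude).
quakes = [("CA", 4.0), ("CA", 5.0), ("AK", 6.0), ("CA", 3.0), ("AK", 4.0)]


def summarize(records):
    """Group records by the categorical field and compute statistics
    separately for each group, including a Count field."""
    groups = defaultdict(list)
    for state, magnitude in records:
        groups[state].append(magnitude)
    return {
        state: {"Count": len(mags), "Max": max(mags), "Mean": mean(mags)}
        for state, mags in groups.items()
    }


summary = summarize(quakes)

# The states layer has one record per state, so after summarizing the
# relationship is one-to-one and the summary can be joined to it.
states = [{"state": "AK"}, {"state": "CA"}]
states_with_stats = [{**row, **summary[row["state"]]} for row in states]
```

Before summarizing, one state matched many earthquake records. Afterwards each state matches exactly one summary row, which is why the join becomes possible.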

Field Types

An important consideration when creating a database

--a database field must contain only one type of data

--either text or integers, but never both

Important Considerations

Field Definition

Once a field definition is set, it cannot be changed

Most common field types:

--numbers, (text) strings, and dates

Field Length

Defines number of text characters stored

--if field length exceeded, characters truncated to fit defined field length

--thus important to determine longest possible field length in advance
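
The effect of a field length that is too short can be sketched as follows. The helper `store_text` is a hypothetical stand-in for a fixed-length text field and the length of 10 is an arbitrary choice for this example. The hotel names come from the Happy Valley table.

```python
def store_text(value, field_length):
    """Mimic a fixed-length text field: characters beyond the defined
    field length are dropped."""
    return value[:field_length]


names = ["Mountain View", "Palace Deluxe", "Ski Lodge"]

# With a field length of 10, the two longer names lose their endings.
stored = [store_text(name, 10) for name in names]

# Checking the longest value in advance avoids the truncation.
needed_length = max(len(name) for name in names)
stored_safely = [store_text(name, needed_length) for name in names]
```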

ASCII Versus Binary

Byte

A byte is the basic storage space for a computer

--composed of a string of eight digits (bits), which may be zeros or ones

--represent a number in base 2 (binary numbers)

--a single byte can store a binary value from 0 (00000000) to 255 (11111111)

ASCII

All text is stored as sequences of characters using ASCII code

--each number, letter, and symbol is assigned a single-byte code between 0 and 255

--cat would require 3 bytes, and horse 5 bytes

--147 requires 3 bytes, and 147.5 requires 5 bytes

Binary

Numbers can be stored either in ASCII or in binary

--stored in base 2 directly

--16 stored as 00010000 in a single byte

Thus, binary is more efficient for storing numbers

--14456 requires 5 bytes in ASCII, but only 2 in binary

Computation also faster using binary

--computer functions are based on base 2
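
The byte counts quoted above can be checked with a few lines of Python. This sketch uses the standard `struct` module to pack the number as an unsigned 16-bit integer. That storage format is an assumption chosen to match the 2-byte figure, and actual file formats may store integers differently.

```python
import struct

# Text stored as ASCII: one byte per character.
cat_bytes = len("cat".encode("ascii"))
horse_bytes = len("horse".encode("ascii"))
text_147_bytes = len("147".encode("ascii"))
text_147_5_bytes = len("147.5".encode("ascii"))

# The number 14456 stored as ASCII text versus as a binary integer.
ascii_form = "14456".encode("ascii")
binary_form = struct.pack("<H", 14456)  # unsigned 16-bit, little-endian

# A single byte holds values from 00000000 to 11111111.
bits_of_16 = format(16, "08b")
largest_byte_value = int("11111111", 2)

# The binary form converts back to the original number without loss.
round_trip = struct.unpack("<H", binary_form)[0]
```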

Precision

Very large numbers require many bytes to store

Scientists use scientific notation when dealing with very large or very small numbers

--computers also use scientific notation

Thus, 123456789 would consist of a mantissa (decimal part of the number) and an exponent

--1.23456789 plus exponent 8

--1.23456789e08

--computers truncate this to 1.2345e08, leaving the less significant digits out
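
The loss of the less significant digits can be demonstrated in Python. This sketch uses a 32-bit floating-point value as one example of limited precision. That choice is an assumption and does not describe the specific precision options offered by ArcGIS. Note also that Python's formatting rounds the mantissa rather than truncating it.

```python
import struct

value = 123456789

# Split the number into a mantissa and a base-10 exponent.
exponent = len(str(value)) - 1      # 8
mantissa = value / 10 ** exponent   # approximately 1.23456789

# Scientific notation with Python's default of six digits after the point.
default_notation = f"{value:e}"

# Keeping only five significant digits (the mantissa is rounded).
five_digits = f"{value:.4e}"

# Storing the value as a 32-bit float and reading it back shows that the
# last digits are not preserved.
as_float32 = struct.unpack("<f", struct.pack("<f", float(value)))[0]

# A 64-bit float has enough precision to hold this value exactly.
as_float64 = struct.unpack("<d", struct.pack("<d", float(value)))[0]
```

The 32-bit result differs from the original in its final digits, while the 64-bit result does not, which is why the choice of precision matters when defining numeric fields.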

In ArcGIS, there are several options for defining precision, as noted in the table to the right