Data base concepts and queries
Competency 6 Study Guide
Competency 6: Select and analyze data from a database to support problem solving.
The link below will display the tutorial for this competency. It also includes the instructions for Required Competency 6 Assessment which is presented in 2 parts. For more information about the Required Assessment including the grading rubric and submission instructions, see the Competency 6 Assessments section. The Advanced Competency assignment after the Big Data lecture notes will NOT be used in this course. Please see the Competency 6 Assessments section for instructions for the Advanced Competency Assignment.
Competency 6 Tutorial: Data - Databases -Big Data
Study Guide
For each sub-competency, I have listed some additional Internet websites and videos for you to study to become proficient in the subject matter. Then I have listed some key concepts that you should know about each one. Based on your performance in the readiness self-assessment, you can follow-up by studying the specific topics that you are not already familiar with. Once you are comfortable with the material, then you can proceed on to the Competency 6 Required Assessment. You can retake the self-assessment to see if you think you are ready to proceed.
Required Competencies
a. Recognize and use database terminology
i. Study websites: Introduction to Databases : https://en.wikiversity.org/wiki/Introduction_to_Databases
ii. Videos: What is Database and SQL : https://www.youtube.com/watch?v=FR4QIeZaPeM
iii. Key Concepts:
3. A Database is an organized collection of data that is stored for use by software applications
3. A DBMS or DataBase Management System is a software system designed to manage databases including the storage, organization and retrieval of data.
3. A Relational database stores data in table format with links between the tables to show relationships between the data.
3. A table is a collection of columns (data fields) organized in rows which represent the values for a single record or item in the table.
3. SQL or Structured Query Language is a standard language used to query a relational database.
3. A query retrieves specific information from a database to answer a question.
. Enter queries to extract/ select data meeting specified criteria.
. Formulate and create queries to fulfill data specifications/requirements.
·
. Study websites: SELECT (SQL) : https://en.wikipedia.org/wiki/Select_(SQL)
. Videos: What is Database and SQL : https://www.youtube.com/watch?v=FR4QIeZaPeM
. Key Concepts:
·
· Retrieving, or viewing, as opposed to changing, data in a table is accomplished by using a SELECT statement.
· You tell the DBMS what to retrieve (the columns) by using SELECT and FROM clauses within the SELECT statement.
· The basic syntax of the SELECT statement
SELECT select_list FROM table_source [WHERE search_condition] [ORDER BY order_by_list] [LIMIT [offset,] row_limit]
· The five clauses of the SELECT statement
. SELECT - the columns (fields or data values) to be retrieved from the table
. FROM - designates the table to retrieve data from
. WHERE – optional – used to limit the rows selected based on some selection criteria. If not specified, all rows from the table will be retrieved.
. ORDER BY – optional – specifies the columns to be used to sort the display of the data If not specified, the data will be displayed in ascending order by the primary key.
. LIMIT – optional - used to specify a numeric limit of the number of rows returned.
· Examples:
SELECT last_name, first_name FROM roster; returns only two columns from all rows of the ROSTER table.
To select all columns from a table use the * wildcard: SELECT * FROM roster
· Limiting Result Sets by using a WHERE clause.
Tests the data against one or more conditions.
|
Operator |
Explanation |
|
= |
Equal |
|
< |
Less than |
|
> |
Greater than |
|
<= |
Less than or equal to |
|
>= |
Greater than or equal to |
|
<> |
Not equal to |
|
!= |
Not equal to |
For example if you wanted to see a list of customers that are located in Sheboygan:
SELECT * FROM customers WHERE city = ‘Sheboygan’;
More than one comparison is allowed. An AND operator means that both criteria must be met to return a row.
SELECT * FROM customers WHERE city = ‘Sheboygan’ AND credit_limit < 5000.00; This would return only customers located in Sheboygan that has a credit limit less than $5000.00.
An OR operator means that either criteria will return a row.
SELECT * FROM customers WHERE city = ‘Sheboygan’ OR credit_limit < 5000.00; This would return any customer who lives in Sheboygan as well as any customer that has a credit limit less than $5000.00.
· Sorting a result set is accomplished using an ORDER BY clause.
SELECT * FROM ROSTER ORDER BY last_name; sorts in ascending order, A-Z order by last name. (numeric values sort smallest to largest)
SELECT * FROM ROSTER ORDER BY last_name DESC; sorts in descending order, Z-A by last name. (numeric values sort largest to smalles)
A nested sort simply means that more than two columns are specified in the ORDER BY clause.
SELECT * FROM ROSTER ORDER BY course_name, last_name; Means “give me all rows but sort the rows first by the name of the course and within each course sort the rows by last name”
· Limiting the number of rows returned in a result set.
SELECT * FROM invoices ORDER BY balance_due DESC LIMIT 5 Means “show me the biggest five invoices in terms of balance due”.
Desired Competencies
· Identify the characteristics of Big Data
ix. Study websites: Big Data : https://en.wikipedia.org/wiki/Big_data
ix. Videos: What is Big Data : https://www.youtube.com/watch?v=tkOwlXUaGMM
ix. Key Concepts:
. Big Data includes data sets with sizes beyond the ability of commonly used software tools to capture , curate , manage, and process data
. Characteristics:
Volume
The quantity of generated and stored data. The size of the data determines the value and potential insight- and whether it can actually be considered big data or not.
Variety
The type and nature of the data. This helps people who analyze it to effectively use the resulting insight.
Velocity
In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
Variability
Inconsistency of the data set can hamper processes to handle and manage it.
Veracity
The quality of captured data can vary greatly, affecting accurate analysis.