Discussion

profileGoldie
ITS836_07c_InDB.pdf

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Advanced Analytics - Technology and Tools

1Module 5: Advanced Analytics - Technology and Tools

1Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Advanced Analytics - Technology and Tools

During this lesson the following topics are covered:

• SQL Essentials • SET Operations

• Online analytical processing (OLAP) features

• GROUPING SETS, ROLLUP,CUBE

• GROUPING, GROUP_ID functions

• Text processing, Pattern matching

In-database Analytics SQL essentials

2Module 5: Advanced Analytics - Technology and Tools

These topics are covered in this lesson.

2Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations

Greenplum supports the following set operations as part of a SELECT statement:

• INTERSECT – Returns rows that appear in all answer sets

• EXCEPT – Returns rows from the first answer set and excludes those from the second

• UNION ALL – Returns a combination of rows from multiple SELECT statements with repeating rows

• UNION – Returns a combination of rows from multiple SELECT statements with no repeating rows

3Module 5: Advanced Analytics - Technology and Tools

Set Operations

Set operators:

• Manipulate the results sets of two or more queries by combining the results of individual queries into a single results set.

• Do not perform row level filtering.

Set operations supported by Greenplum are:

• INTERSECT which returns rows that appear in all answer sets generated by individual SELECT statements.

• EXCEPT returns all rows from the first SELECT except for those which also selected by the second SELECT. This operation is the same as the MINUS operation.

• UNION ALL combines all the results of two or more SELECT statements. There may be repeating rows.

• UNION combines the results of two or more SELECT statements. There will be no repeating rows.

3Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations – INTERSECT

INTERSECT:

• Returns only the rows that appear in both SQL queries

• Removes duplicate rows

Table A

Table B

Intersect

SELECT t.transid,

c.custname

FROM facts.transaction t

JOIN dimensions.customer c

ON c.customerid = t.customerid

INTERSECT

SELECT t1.transid,

c1.custname

FROM facts.transaction t1

JOIN dimensions.customer c1

ON c1.customerid = t1.customerid

WHERE t1.transdate BETWEEN

‘2008-01-01’ AND ‘2008-01-21’

4Module 5: Advanced Analytics - Technology and Tools

Set Operations – INTERSECT

A set operation takes the results of two queries and returns only the results that appear in both result sets. Duplicate rows are removed from the final set returned.

4Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations – EXCEPT

EXCEPT:

• Returns all rows from the first SELECT statement

• Omits all rows that appear in the second SELECT statement

SELECT t.transid,

c.custname

FROM facts.transaction t

JOIN dimensions.customer c

ON c.customerid = t.customerid

EXCEPT

SELECT t1.transid,

c1.custname

FROM facts.transaction t1

JOIN dimensions.customer c1

ON c1.customerid = t1.customerid

WHERE t1.transdate BETWEEN

‘2008-01-01’ AND ‘2008-01-21’

Table A

Results of first query

Table B

Results of second query

Table A minus Table B

5Module 5: Advanced Analytics - Technology and Tools

Set Operations – EXCEPT

The EXCEPT set operation takes the distinct rows of the first query and returns all of the rows that do not appear in the result set of the second query.

5Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations – UNION ALL

UNION ALL:

• Combines rows from the first query with rows from the second query

• Does not remove duplicates rows

SELECT t.transid,

c.custname

FROM facts.transaction t

JOIN dimensions.customer c

ON c.customerid = t.customerid

WHERE t.transdate BETWEEN

‘2008-01-01’ AND ‘2008-05-17’

UNION ALL

SELECT t1.transid,

c1.custname

FROM facts.transaction t1

JOIN dimensions.customer c1

ON c1.customerid = t1.customerid

WHERE t1.transdate BETWEEN

‘2008-01-01’ AND ‘2008-01-21’

Table A

Results of first query

Table B

Results of second query

Table A plus Table B

6Module 5: Advanced Analytics - Technology and Tools

Set Operations – UNION ALL

The UNION ALL set operation is like the UNION operation but it does not remove duplicate or repeating rows.

6Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Set Operations – UNION

UNION:

• Combines rows from the first query with rows from the second query

• Removes duplicates or repeating rows

SELECT t.transid,

c.custname

FROM facts.transaction t

JOIN dimensions.customer c

ON c.customerid = t.customerid

WHERE t.transdate BETWEEN

‘2008-01-01’ AND ‘2008-05-17’

UNION

SELECT t1.transid,

c1.custname

FROM facts.transaction t1

JOIN dimensions.customer c1

ON c1.customerid = t1.customerid

WHERE t1.transdate BETWEEN

‘2008-01-01’ AND ‘2008-01-21’

Table A

Results of first query

Table B

Results of second query

Table A plus Table B minus duplicates

Duplicate

7Module 5: Advanced Analytics - Technology and Tools

Set Operations – UNION

A union operation combines the results of the SELECT statement from the first table with the results from the query on the second table. The result set does not contain any repeating rows.

7Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

• Types of Join • Inner

• Left outer

• Right outer

• Full outer

• Cross

SET Operations

Inner Join Left Outer Join Right Outer Join

Full Outer Join Cross Join

8Module 5: Advanced Analytics - Technology and Tools

The type of SET operations you can perform are:

• Inner join – The inner join is possibly the most common type of join. The resulting data set is obtained by combining two tables on a common column. Each row of the left table is compared against each row of the right table. All matching rows are returned as part of the result set. An equijoin is an inner join that uses only equality comparisons in the join predicate.

• Left outer join – Left outer join returns all of the rows from the left table even if there is no matching row in the right table. It also returns matching rows from the right table. Rows in the right table that do not match are not included as part of the result set.

• Right outer join – Right outer join returns all of the rows from the right table even if there is no matching row in the left table. It also returns the matching rows from the left table.

• Full outer join – Full outer join returns all rows from both tables where there is a match and returns NULL for rows that do not have a match.

• Cross join – Cross join returns the Cartesian product of rows from tables in the join. The resulting data set consists of a combination of each row in the left table with each row in the right table. Two tables, each with five rows, will produce a resulting data set that contains twenty-five rows.

.

8Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Left Outer Join

• Correlated sub-queries do not run efficiently in Greenplum though support has been introduced in Version 4.2

 SELECT * FROM transaction t WHERE NOT EXISTS (

SELECT 1 FROM customer c WHERE c.customerid = t.customerid)

• Use LEFT OUTER JOIN  SELECT t.* FROM transaction t

LEFT OUTER JOIN customer c ON t.customerid=c.customerid

WHERE c.customerid IS NULL

9Module 5: Advanced Analytics - Technology and Tools

Correlated sub queries are supported in a Greenplum environment 4.2 onwards. The first example shown in the slide is a correlated sub query in which we have a nested correlated sub query using the first variable from the first SELECT statement (on table “transaction” ) used for selection with table “Customer” in the query nested with a WHERE clause. In the context of MPP architecture of Greenplum supporting correlated sub queries are not efficient.

The code example shown at the bottom accomplishes the same with a LEFT OUTER JOIN. It is recommended that the multidimensional queries that are traditionally done with sub queries and correlated sub queries be optimally coded with the proper use of the MPP architecture of Greenplum.

9Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Sub-query vs. Inner Join

• IN clause is fully supported …  SELECT *

FROM transaction t WHERE t.customerid IN

(SELECT customerid FROM customer)

• However, generally better idea as long as c.customerid is unique:  SELECT t.*

FROM transaction t INNER JOIN customer c ON c.customerid = t.customerid

10Module 5: Advanced Analytics - Technology and Tools

Sub queries and nested queries are commonly used for multi-dimensional queries in which we use IN clause with another SELECT statement. Sub-queries such as the one shown above are supported in a Greenplum environment.

They are supported and valid but it is generally a good idea to use a INNER JOIN to accomplish the same result. The performance query with INNER JOIN is far superior to the nested query in the first example.

10Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Greenplum SQL OLAP Grouping Extensions

Greenplum supports the following grouping extensions:

• Standard GROUP BY

• ROLLUP

• GROUPING SETS

• CUBE

• grouping(column [, ...]) function

• group_id() function

11Module 5: Advanced Analytics - Technology and Tools

Greenplum SQL OLAP Grouping Extensions

Greenplum introduced support for extensions to the standard GROUP BY clause, which is fully supported. These clauses can simplify the expression of complex groupings:

• ROLLUP – This extension provides hierarchical grouping.

• CUBE – Complete cross-tabular grouping, or all possible grouping combinations, is provided with this extension.

• GROUPING SETS – Generalized grouping is provided with the GROUPING SETS clause.

• grouping function – This clause helps identify super-aggregated rows from regular grouped rows.

• group_id function – This clause is used to identify duplicate rows in grouped output.

11Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Standard GROUP BY Example

GROUP BY:

• Group results based on one or more specified columns

• Is used with aggregate statements

The following example summarizes product sales by vendor:

SELECT pn, vn, sum(prc*qty)

FROM sale

GROUP BY pn, vn

ORDER BY 1,2,3;

pn | vn | sum

-----+----+-------

100 | 20 | 0

100 | 40 | 2640000

200 | 10 | 0

200 | 40 | 0

300 | 30 | 0

400 | 50 | 0

500 | 30 | 120

600 | 30 | 60

700 | 40 | 1

800 | 40 | 1

(10 rows)

12Module 5: Advanced Analytics - Technology and Tools

Standard GROUP BY Example

The standard GROUP BY clause groups results based on one or more columns specified. It is used in conjunction with aggregate statements, such as SUM, MIN, or MAX. This helps to make the resulting data set more readable.

The slide shows an example of a standard GROUP BY clause used to summarize product sales by vendor.

12Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Standard GROUP BY Example with UNION ALL

This example extends the previous example by adding sub-totals and a grand total :

SELECT pn, vn, sum(prc*qty)

FROM sale

GROUP BY pn, vn

UNION ALL

SELECT pn, null, sum(prc*qty)

FROM sale

GROUP BY pn

UNION ALL

SELECT null, null,

sum(prc*qty)

FROM SALE

ORDER BY 1,2,3;

pn | vn | sum

-----+----+---------

100 | 20 | 0

100 | 40 | 2640000

100 | | 2640000

200 | 10 | 0

200 | 40 | 0

200 | | 0

300 | 30 | 0

300 | | 0

400 | 50 | 0

400 | | 0

500 | 30 | 120

500 | | 120

600 | 30 | 60

600 | | 60

700 | 40 | 1

700 | | 1

800 | 40 | 1

800 | | 1

| | 2640182

(19 rows)

13Module 5: Advanced Analytics - Technology and Tools

Standard GROUP BY Example with UNION ALL

In this follow-on example, the requirements for the query have been extended to include sub- totals and a grand total. You would need to use a UNION ALL to continue the grouping and provide for the additional requirements.

13Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

ROLLUP Example

The following example meets the requirement where the sub-total and grand totals are to be included:

SELECT pn, vn, sum(prc*qty)

FROM sale

GROUP BY ROLLUP(pn, vn)

ORDER BY 1,2,3;

pn | vn | sum

-----+----+---------

100 | 20 | 0

100 | 40 | 2640000

100 | | 2640000

200 | 10 | 0

200 | 40 | 0

200 | | 0

300 | 30 | 0

300 | | 0

400 | 50 | 0

400 | | 0

500 | 30 | 120

500 | | 120

600 | 30 | 60

600 | | 60

700 | 40 | 1

700 | | 1

800 | 40 | 1

800 | | 1

| | 2640182

(19 rows)

14Module 5: Advanced Analytics - Technology and Tools

ROLLUP Example

This slide meets the requirements provided in the previous slide, but uses the ROLLUP grouping extension. ROLLUP allows you to perform hierarchical grouping and helps to reduce the code.

14Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

GROUPING SETS Example

The following example shows how to achieve the same results with the GROUPING SETS clause:

SELECT pn, vn, sum(prc*qty)

FROM sale

GROUP BY GROUPING SETS

( (pn, vn), (pn), () )

ORDER BY 1,2,3;

pn | vn | sum

-----+----+---------

100 | 20 | 0

100 | 40 | 2640000

100 | | 2640000

200 | 10 | 0

200 | 40 | 0

200 | | 0

300 | 30 | 0

300 | | 0

400 | 50 | 0

400 | | 0

500 | 30 | 120

500 | | 120

600 | 30 | 60

600 | | 60

700 | 40 | 1

700 | | 1

800 | 40 | 1

800 | | 1

| | 2640182

(19 rows)

Summarize sales

by product & vendor

Subtotals for

each product

Grand total

15Module 5: Advanced Analytics - Technology and Tools

GROUPING SETS Example

The GROUPING SETS extension allows you to specify grouping sets. If you use the GROUPING SETS clause to meet the earlier requirements so that it produced the same output as ROLLUP, it would use the following groups:

• (pn,vn) – This grouping summarizes product sales by vendor.

• (pn) – This grouping provides subtotal sales for each product.

• () – This grouping provides the grand total for all sales for all vendors and products.

15Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

CUBE Example

CUBE creates subtotals for all possible combinations of grouping columns.

The following example

is the same as

SELECT pn, vn, sum(prc*qty)

FROM sale

GROUP BY CUBE(pn, vn)

ORDER BY 1,2,3;

SELECT pn, vn, sum(prc*qty)

FROM sale

GROUP BY GROUPING SETS

( (pn, vn), (pn),

(vn), () )

ORDER BY 1,2,3;

pn | vn | sum

-----+----+---------

100 | 20 | 0

100 | 40 | 2640000

100 | | 2640000

200 | 10 | 0

200 | 40 | 0

200 | | 0

300 | 30 | 0

300 | | 0

400 | 50 | 0

400 | | 0

500 | 30 | 120

500 | | 120

600 | 30 | 60

600 | | 60

700 | 40 | 1

700 | | 1

800 | 40 | 1

800 | | 1

| 10 | 0

| 20 | 0

| 30 | 180

| 40 | 2640002

| 50 | 0

| | 2640182

(24 rows)

16Module 5: Advanced Analytics - Technology and Tools

CUBE Example

A CUBE grouping creates subtotals for all of the possible combinations of the given list of grouping columns, or expressions.

In terms of multidimensional analysis, CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.

In the example shown on the slide, the additional grouping set of (vn) - subtotaling the sales by vendor, is included as part of the cube.

Note that n elements of a CUBE translate to 2n grouping sets. Consider using CUBE in any situation requiring cross-tabular reports. CUBE is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension. For instance, a commonly requested cross-tabulation might need subtotals for all the combinations of month, state, and product.

16Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

GROUPING Function Example

Grouping distinguishes NULL from summary markers.

SELECT * FROM dsales_null;store | customer | product | price -------+----------+---------+-------

s2 | c1 | p1 | 90

s2 | c1 | p2 | 50

s2 | | p1 | 44

s1 | c2 | p2 | 70

s1 | c3 | p1 | 40

(5 rows)

SELECT

store,customer,product,

sum(price),

grouping(customer)

FROM dsales_null

GROUP BY

ROLLUP(store,customer,

product);

store | customer | product | sum | grouping

-------+----------+---------+----+----------

s1 | c2 | p2 | 70 | 0

s1 | c2 | | 70 | 0

s1 | c3 | p1 | 40 | 0

s1 | c3 | | 40 | 0

s1 | | | 110 | 1

s2 | c1 | p1 | 90 | 0

s2 | c1 | p2 | 50 | 0

s2 | c1 | | 140 | 0

s2 | | p1 | 44 | 0

s2 | | | 44 | 0

s2 | | | 184 | 1

| | | 294 | 1

(12 rows)

17Module 5: Advanced Analytics - Technology and Tools

GROUPING Function Example

When you use grouping extensions to calculate summary rows, such as sub-totals and grand totals, the output can become confusing if the data in a grouping column contains NULL values. It is hard to tell if a row is supposed to be a subtotal row or a regular row containing a NULL.

In the example shown on the slide, one of the rows shown where the customer field is NULL. Without the grouping id, you could misinterpret the sum of 44 as a subtotal row for store 2.

The GROUPING function returns a result for each output row, where:

• 1 represents a summary row

• 0 represents grouped rows

17Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

GROUP_ID Function

GROUP_ID:

• Returns 0 for each output row in a unique grouping set

• Assigns a serial number >0 to each duplicate grouping set found

• Can be used to filter output rows of duplicate grouping sets, such as in the following example:

SELECT a, b, c, sum(p*q), group_id()

FROM sales

GROUP BY ROLLUP(a,b), CUBE(b,c)

HAVING group_id()<1

ORDER BY a,b,c;

18Module 5: Advanced Analytics - Technology and Tools

GROUP_ID Function Is useful when combining grouping extension clauses.

In this example query, the combination of ROLLUP and CUBE produces:

• 12 grouping sets

• 8 DISTINCT grouping sets

The group_id function can be used to filter out or identify duplicate grouping sets in the output.

GROUP BY ROLLUP (a,b), CUBE (b,c) is the same as

• GROUP BY GROUPING SETS ( (a,b), (a), () ), GROUPING SETS ( (b,c), (b), (c), () )

• GROUP BY GROUPING SETS ((a,b,b,c), (a,b,b), (a,b,c), (a,b), (a,b,c), (a,b), (a,c), (a), (b,c), (b), (c), () )

Where there are 12 total grouping sets but only 8 distinct grouping sets, where the groups are:

• (a,b,b,c) = (a,b,c) = (a,b,c)

• (a,b,b) = (a,b) = (a,b)

• (a,c)

• (b,c)

• (a)

• (b)

• (c)

• ()

18Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

In-database Text Analysis

• SQL features for  Text handling functions

 Pattern matching with regular expressions

• Example Use-cases  Filter emails with spam tag in subject

 Extract domains from a URL

 Extract all URLs from a HTML file

 Check for Syntactically correct email addresses

 Convert 10 digits into format “(123) 456-7890”

19Module 5: Advanced Analytics - Technology and Tools

In Lesson 8 of Module 4 we described the techniques for text analysis. It is common practice to store the parsed data from an unstructured source in a database for down stream analysis. With the advent of Hadoop and its ecosystem products, unstructured data is also stored in external tables and accessed by traditional relational databases. We present a brief introduction to text processing in-database.

The topics covered in the next few slides are the in-database methods for handling text and the use of expressions in SQL.

We also briefly discussed Regular expressions in Module 4 lesson 8. A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression.

Regular expressions provide the means for matching strings of text and building the functionalities for string manipulation in SQL. The theoretical basis for regular expressions can be explained with Finite State Machines (out of scope for this course). It should be noted that regular expression cannot match subscripts and superscripts or well formed nested parentheses.

We will illustrate some of the SQL syntax with regular expressions.

19Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Pattern Matching - Regular Expressions (Regex)

Regular Expression match Operators

SQL Functions substring(string, from,pattern [for escape]) regexp_matches(string, pattern, [flags]) regexp_replace(string, pattern, repl, [flags]) regexp_split_to_{array|table}

Operator Description Example

~ Case sensitive substring ‘Greenplum’ ~ ‘^Green’

~* Case-insensitive substring ‘Greenplum’ ~*’ee+’

20Module 5: Advanced Analytics - Technology and Tools

The “substring” function is primarily used for string pattern matching.

The operator ~ is specified for case sensitive match of the substring and ~* for case insensitive match.

In the first example we are trying to find records with substring “Green” (case sensitive) starting at the beginning specified with character ^

In the second example we are finding match for a pattern “ee” (case in-sensitive) as a preceding term one or more times (Specified with a +)

Refer to http://www.postgresql.org/docs/8.3/static/functions-matching.html for details of the syntax.

20Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Regular Expression Quantifiers

• Quantifier

Expression Matches

. Arbitrary character

^ And $ Virtual characters for beginning and end

* Preceding item zero or more times

+ Preceding item one or more times

? Preceding item is optional

{n} Preceding item n times

a|b Item a or b

… …

21Module 5: Advanced Analytics - Technology and Tools

Quantifiers specify how often the preceding Regular Expression should match.

• Try to match the preceding regular expression zero or more times.

• Example: "(ab)c*" matches "ab" followed by zero or more "c"s, i.e., "ab", "abc", "abcc", "abccc" ...

+ Try to match the preceding regular expression one or more times.

• Example: "(ab)c+" matches "ab" followed by one or more "c"s, i.e., "abc", "abcc", "abccc" ...

Examples:

1) All mail with at least two + in x_spam_level:

SELECT * FROM mail WHERE x_spam_level ~ '\\+\\+*‘

2) All top-level domains of sender’s addresses:

SELECT substring("from" FROM '\\.[[:alnum:]]+$') FROM mail

3) Remove [Spam] tag at beginning of subjects:

SELECT regex_replace(subject, '^((?:Re:[[:space:]]*)*)\\[Spam\\]' || '(.*)', '\\1\\2') FROM mail

21Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Check Your Knowledge

1. How would you use GROUPING SETS to produce the same results as the following GROUP BY CUBE?

SELECT state, productID, SUM(volume) FROM sales GROUP BY CUBE (state, productID) ORDER BY state, productID

2. How would you show the sub-totals for each week, for each state, and for each product? (No other totals or grand totals are required.) Suppose the table structure is

TABLE sales (productID VARCHAR, state CHAR(2), week DATE, volume INT)

3. Discuss the utility of grouping and group_id functions

Your Thoughts?

22Module 5: Advanced Analytics - Technology and Tools

Note your answer/references below:

1. SELECT state, productID, SUM(volume) FROM sales GROUP BY GROUPING SETS ((state, productID), (state), (productID), ()) ORDER BY state, productID

2. SELECT state, productID, week, SUM(volume) FROM sales GROUP BY GROUPING SETS ((state), (productID), (week)) ORDER BY state, productID, week

22Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Check Your Knowledge (Continued)

4. Give regular expressions for the following:  A regex that, given a URL, captures the domain name

 A regex that captures PostgreSQL Dollar-quoted String literals Examples:

 $test$This is a string$test$

Your Thoughts?

23Module 5: Advanced Analytics - Technology and Tools

Note your answer/references below:

SELECT substring(c9 from '(?:(?:(?:[^/]|/[^/])*//)|(?=^([^/]|/[^/]|/$)*$))([^/]*)') AS domain

SELECT regexp_matches( '$test$This is a string$test$', E'(\\$[a-z]*\\$).*?\\1');

23Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Advanced Analytics - Technology and Tools

During this lesson the following SQL Essentials topics were covered:

• Online analytical processing (OLAP) features

• GROUPING SETS, ROLLUP,CUBE

• GROUPING, GROUP_ID functions

• Text processing, Pattern matching

Summary

24Module 5: Advanced Analytics - Technology and Tools

SQL essentials were covered in this lesson.

24Module 5: Advanced Analytics - Technology and Tools