Assignment - Zero plagiarism


Assignment 1

Deadline: Thursday 24/09/2020 @ 23:59

[Total Mark for this Assignment is 6]

Data Mining and Data Warehousing

IT446


College of Computing and Informatics

Question One

1.5 Marks

Learning Outcome(s):

Apply and evaluate data mining algorithms with respect to problems they are specifically designed for

Using equi-depth partition, create 3 bins to smooth the given data input by:

· Boundaries

· Means

Data: 23 10 5 14 20 16 11 6 20 1 14 27 2 25 1

Answer:

Step one is sorting the data

1 1 2 5 6 10 11 14 14 16 20 20 23 25 27

Step two is creation of bins

B1: 1 1 2 5 6

B2: 10 11 14 14 16

B3: 20 20 23 25 27

Smoothing by bin boundaries:

B1: 1 1 1 6 6

B2: 10 10 16 16 16

B3: 20 20 20 27 27

Smoothing by bin means:

B1: 3 3 3 3 3

B2: 13 13 13 13 13

B3: 23 23 23 23 23
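The steps above can be checked with a short Python sketch. The function names are illustrative, the split assumes the number of values is divisible by the number of bins, and ties in boundary smoothing go to the lower boundary (an assumption; no tie occurs in this data).

```python
def equal_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size."""
    ordered = sorted(values)
    size = len(ordered) // n_bins  # assumes len(values) is divisible by n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]


def smooth_by_boundaries(bin_values):
    """Replace each value with the closer of the bin's minimum and maximum."""
    low, high = bin_values[0], bin_values[-1]
    # Ties go to the lower boundary (assumption; not needed for this data).
    return [low if v - low <= high - v else high for v in bin_values]


def smooth_by_means(bin_values):
    """Replace every value in the bin with the bin mean."""
    mean = sum(bin_values) / len(bin_values)
    return [mean] * len(bin_values)


data = [23, 10, 5, 14, 20, 16, 11, 6, 20, 1, 14, 27, 2, 25, 1]
bins = equal_depth_bins(data, 3)
by_boundaries = [smooth_by_boundaries(b) for b in bins]
by_means = [smooth_by_means(b) for b in bins]

for name, result in [("bins", bins), ("boundaries", by_boundaries), ("means", by_means)]:
    print(name, result)
```

The means come out as floats (3.0, 13.0, 23.0), which match the integer values given in the answer.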

References: chapter 3 of the book, page 30

Lecture 3, slide 57

Question Two

1.5 Marks

Learning Outcome(s):

Apply and evaluate data mining algorithms with respect to problems they are specifically designed for

1. Given the following dataset, fill in the missing values with the attribute mean for all samples belonging to the same class

Answer:

Class A: 10 + 12 = 22, 22 / 2 = 11 (mean)

Class B: 20 + 22 + 24 = 66, 66 / 3 = 22 (mean)

          Attribute 1   Attribute 2   Attribute 3   Class
Object 1  10            8             - 4           A
Object 2  12            - 8           5             A
Object 3  11            9             5             A
Object 4  20            10            10            B
Object 5  22            12            11            B
Object 6  24            15            18            B
Object 7  22            14            19            B
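A minimal sketch of class-wise mean imputation for Attribute 1 follows. It assumes the missing cells were Attribute 1 of Object 3 and Object 7, which is inferred from the sums in the answer rather than stated in the question; the function name is illustrative.

```python
from collections import defaultdict

# (object name, Attribute 1, class); None marks a missing value.
# Which cells were missing is an assumption inferred from the answer's sums.
rows = [
    ("Object 1", 10, "A"),
    ("Object 2", 12, "A"),
    ("Object 3", None, "A"),
    ("Object 4", 20, "B"),
    ("Object 5", 22, "B"),
    ("Object 6", 24, "B"),
    ("Object 7", None, "B"),
]


def fill_with_class_mean(rows):
    """Fill each missing value with the mean of the known values in its class."""
    known = defaultdict(list)
    for _, value, label in rows:
        if value is not None:
            known[label].append(value)
    means = {label: sum(vals) / len(vals) for label, vals in known.items()}
    return [
        (name, means[label] if value is None else value, label)
        for name, value, label in rows
    ]


filled = fill_with_class_mean(rows)
for row in filled:
    print(row)
```

Under that assumption, Object 3 receives 11.0 and Object 7 receives 22.0, matching the class means computed in the answer.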

2. Using Attribute 1, Attribute 2 and Attribute 3, calculate the Manhattan distance between Object 1 and Object 2.

Answer:

3. Using Attribute 1, Attribute 2 and Attribute 3, calculate the Euclidean distance between Object 4 and Object 5.

Answer:
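As a hedged illustration of parts 2 and 3, the sketch below implements both distance measures and applies them to Object 4 and Object 5, whose attribute values read unambiguously from the table. The function names are illustrative. The same functions apply to Object 1 and Object 2 once their attribute values are read off the table.

```python
import math


def manhattan(a, b):
    """Sum of absolute attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the sum of squared attribute differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# (Attribute 1, Attribute 2, Attribute 3) as listed in the table.
object_4 = (20, 10, 10)
object_5 = (22, 12, 11)

print(manhattan(object_4, object_5))  # 2 + 2 + 1 = 5
print(euclidean(object_4, object_5))  # sqrt(4 + 4 + 1) = 3.0
```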

References

Chapter 2 of the book, page 73.

Lecture 2, slide 37

Question Three

1 Mark

Learning Outcome(s):

Explain the basic principles of programming, concept of language. Universal constructs of programming languages.(LO1)

What is the significance of OLAP (online analytical processing) in data mining?

Answer:

1. Data warehouse systems serve users or knowledge workers in the role of data analysis and decision making. Such systems can organize and present data in various formats to accommodate the diverse needs of different users. These systems are known as online analytical processing (OLAP) systems.

2. An OLAP query often needs read-only access of data records for summarization and aggregation.

3. Because OLAP queries are mostly read-only, they do not need the concurrency control and recovery mechanisms that transaction processing requires.

4. Concept hierarchies are useful in OLAP. In the multidimensional model, data are organized into multiple dimensions, and each dimension contains multiple levels of abstraction defined by concept hierarchies. This organization gives users the flexibility to view data from different perspectives. A number of OLAP data cube operations exist to materialize these different views, allowing interactive querying and analysis of the data at hand. Hence, OLAP provides a user-friendly environment for interactive data analysis.

5. OLAP supports operations such as roll-up, drill-down, slice and dice, and pivot. Exploring the data from these different points of view makes it possible to extract more useful knowledge.
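To make two of these operations concrete, here is a minimal Python sketch of a roll-up and a slice on a tiny cube. The fact records, the city-to-country hierarchy, and the function names are all hypothetical and only illustrate the idea. A real OLAP server would run these operations on a data cube, not on a list of tuples.

```python
from collections import defaultdict

# Hypothetical fact records: (city, quarter, item, sales). Illustrative only.
facts = [
    ("Riyadh", "Q1", "phone", 100),
    ("Riyadh", "Q2", "phone", 120),
    ("Jeddah", "Q1", "phone", 80),
    ("Jeddah", "Q1", "laptop", 60),
    ("Dubai", "Q1", "laptop", 90),
]

# Hypothetical concept hierarchy for the location dimension: city -> country.
city_to_country = {"Riyadh": "Saudi Arabia", "Jeddah": "Saudi Arabia", "Dubai": "UAE"}


def roll_up_to_country(facts):
    """Climb the location hierarchy (city -> country) and sum sales over items."""
    totals = defaultdict(int)
    for city, quarter, _item, sales in facts:
        totals[(city_to_country[city], quarter)] += sales
    return dict(totals)


def slice_by_quarter(facts, quarter):
    """Select the sub-cube for a single value of the time dimension."""
    return [fact for fact in facts if fact[1] == quarter]


rolled = roll_up_to_country(facts)
q1_slice = slice_by_quarter(facts, "Q1")
print(rolled)
print(q1_slice)
```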

References

Chapter 4 of the book, pages 128, 129, 130, 146, 148

Question Four

2 Marks

Learning Outcome(s):

Explain the basic principles of programming, concept of language. Universal constructs of programming languages.(LO1)

Write about the following terms:

Similarity, Dissimilarity, Data matrix and Dissimilarity matrix (give an example of a data matrix and a dissimilarity matrix)

Answer:

A similarity measure compares two objects, i and j, and returns the value 0 if the objects are unalike. The higher the similarity value, the greater the similarity between the objects. (Typically, a value of 1 indicates complete similarity, that is, the objects are identical.) In other words:

· Numerical measure of how alike two data objects are

· Value is higher when objects are more alike

· Often falls in the range [0,1]

A dissimilarity measure (a distance) compares two objects, i and j, and returns the value 0 if the objects are the same. The higher the dissimilarity value, the more the two objects differ, so the higher the similarity between two objects, the lower their dissimilarity. In other words:

· Numerical measure of how different two data objects are

· Lower when objects are more alike

· Minimum dissimilarity is often 0

· Upper limit varies

Data matrix: a data structure used to store the data objects. It is also known as the object-by-attribute structure. It stores the n data objects in the form of a relational table, with one row per object and one column per attribute. Example:

Dissimilarity matrix: a data structure used to store the dissimilarity values for pairs of objects. It is also known as the object-by-object structure. It stores the collection of values available for all pairs of the n objects, so it records only the distances, not the attribute values. Example with Euclidean distance:
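The relationship between the two structures can be sketched in Python. The example takes the four points x1 to x4 from this document's data matrix example and derives the lower-triangular dissimilarity matrix with Euclidean distance. The function name is illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math

# Data matrix (object-by-attribute): one row per object, (attribute1, attribute2).
data_matrix = {"x1": (1, 2), "x2": (3, 5), "x3": (2, 0), "x4": (4, 5)}


def dissimilarity_matrix(points):
    """Build the lower-triangular object-by-object matrix of Euclidean distances."""
    names = list(points)
    rows = []
    for i, a in enumerate(names):
        # Keep only the lower triangle, since d(i, j) = d(j, i) and d(i, i) = 0.
        rows.append([round(math.dist(points[a], points[b]), 2) for b in names[: i + 1]])
    return rows


matrix = dissimilarity_matrix(data_matrix)
for name, row in zip(data_matrix, matrix):
    print(name, row)
```

Only the lower triangle is stored because the matrix is symmetric with zeros on the diagonal.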

References:

Chapter 2 of the book, pages 67-68

Lecture 2, slide 30

Manhattan distance:

$$d(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|$$

Euclidean distance:

$$d(i,j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}$$

Data matrix example (Question Four):

point  attribute1  attribute2
x1     1           2
x2     3           5
x3     2           0
x4     4           5

Dissimilarity matrix example (Question Four, Euclidean distance):

      x1     x2     x3     x4
x1    0
x2    3.61   0
x3    2.24   5.1    0
x4    4.24   1      5.39   0