RBR Week 2

BIAM510_W1_Lab_Example.docx

Rattle Screenshot

Data Mining Process Comparison

Process 1: Course Diagram (source: Week 1 Lab description)

Process 2: Rattle Tabs (source: Rattle GUI)

Process Comparison

The Rattle data mining process is designed specifically to support the mechanical steps an analyst performs when exploring a data set, whereas the course diagram gives a broader overview of the steps involved in conducting a data mining expedition. A key indication of this is that the course process begins with understanding the business need or problem, while Rattle begins with loading a data set.

Rattle is a tool used in executing a business process rather than the process itself. It allows the analyst to load and explore a data set, test the distributions of the loaded data, transform the data as necessary, and construct and evaluate models based on the data. Because it covers only these hands-on steps, Rattle can be incorporated into the course process between Steps 2 (“gathering and preparing data”) and 4 (“building and testing a model of the business process/problem”).
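To make the sequence of hands-on steps described above concrete, the following is a minimal sketch of the same load, explore, test, transform, model, and evaluate flow. It is written in plain Python rather than with Rattle itself, so it does not reflect Rattle's interface or functions. The data set, the function name run_workflow, and the choice of a Pearson correlation and a one-variable linear model are all assumptions made purely for illustration.

```python
import math
import statistics

# Hypothetical toy data set (hours studied, exam score), invented for illustration.
ROWS = [(1.0, 52.0), (2.0, 55.0), (3.0, 61.0), (4.0, 64.0), (5.0, 70.0), (6.0, 73.0)]


def run_workflow(rows):
    """Walk through steps analogous to the Data, Explore, Test, Transform, Model, and Evaluate tabs."""
    # Data: load the rows into two columns.
    x = [r[0] for r in rows]
    y = [r[1] for r in rows]

    # Explore: basic summary statistics for each column.
    summary = {
        "x_mean": statistics.mean(x),
        "y_mean": statistics.mean(y),
        "x_stdev": statistics.stdev(x),
        "y_stdev": statistics.stdev(y),
    }

    # Test: Pearson correlation between the two columns.
    mx, my = summary["x_mean"], summary["y_mean"]
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    correlation = sxy / math.sqrt(sxx * syy)

    # Transform: rescale x to the range [0, 1].
    lo, hi = min(x), max(x)
    x_scaled = [(a - lo) / (hi - lo) for a in x]

    # Model: least-squares line fitted to the transformed column.
    ms = statistics.mean(x_scaled)
    s_sy = sum((a - ms) * (b - my) for a, b in zip(x_scaled, y))
    s_ss = sum((a - ms) ** 2 for a in x_scaled)
    slope = s_sy / s_ss
    intercept = my - slope * ms

    # Evaluate: mean absolute error of the fitted line on the same data.
    predictions = [slope * a + intercept for a in x_scaled]
    mae = statistics.mean([abs(p - b) for p, b in zip(predictions, y)])

    return {
        "summary": summary,
        "correlation": correlation,
        "slope": slope,
        "intercept": intercept,
        "mae": mae,
    }


if __name__ == "__main__":
    for key, value in run_workflow(ROWS).items():
        print(key, value)
```

Note that nothing in the sketch addresses the business need or the interpretation of results. Those remain part of the surrounding course process, which is the point of the comparison.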

The course process and Rattle both involve exploring and testing a data set, and both allow time to interpret and develop an understanding of the data based on the analyst’s exploratory steps. This interpretation is more explicit in the course process, although the Model and Evaluate tabs in Rattle allow the analyst to develop a deeper understanding of the implications of the data set being explored.

Ultimately, Rattle and the course process are not mutually exclusive: Rattle is a complementary tool that supports the successful execution of the course process.

1. Data

2. Explore

3. Test

4. Transform

5. Cluster

6. Associate

7. Model

8. Evaluate