RBR Week 2
Rattle Screenshot
Data Mining Process Comparison
Process 1: Course Diagram (source: Week 1 Lab description)
Process 2: Rattle Tabs (source: Rattle GUI)
Process Comparison
The Rattle process is built around the mechanical steps an analyst performs when exploring a data set, whereas the course diagram is a broader overview of the steps involved in a data mining project. A key indication of this difference is that the course process begins with understanding the business need or problem, while Rattle begins with loading a data set.
Rattle is a tool used to execute part of a business process rather than the process itself. It allows the analyst to load and explore a data set, test the distributions of the loaded data, transform the data as necessary, and construct and evaluate models based on the data. These functions cover only part of the course process, so Rattle can be incorporated into it in Steps 2 (“gathering and preparing data”) through 4 (“building and testing a model of the business process/problem”).
Ultimately, the two processes are not mutually exclusive: Rattle is a complementary tool that supports the successful execution of the course process.
1. Data
2. Explore
3. Test
4. Transform
5. Cluster
6. Associate
7. Model
8. Evaluate
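
To make the tab sequence above concrete, the following is a minimal sketch of a comparable load, explore, transform, model, and evaluate workflow in plain Python. It is not Rattle's own code or API (Rattle is an R-based GUI); the data set, the column names `hours` and `score`, and every function name are hypothetical and chosen only for illustration. The Test, Cluster, and Associate tabs are left out to keep the sketch short.

```python
import csv
import io
import statistics

# Hypothetical toy data set, invented purely for illustration.
RAW_CSV = """hours,score
1,52
2,55
3,61
4,64
5,70
6,73
"""

def load(text):
    """Data: read the CSV text into a list of (x, y) float pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["hours"]), float(r["score"])) for r in reader]

def explore(rows):
    """Explore: basic summary statistics for each column."""
    xs, ys = zip(*rows)
    return {
        "n": len(rows),
        "mean_x": statistics.mean(xs),
        "mean_y": statistics.mean(ys),
    }

def transform(rows):
    """Transform: rescale x to the range [0, 1] (min-max scaling)."""
    xs = [x for x, _ in rows]
    lo, hi = min(xs), max(xs)
    return [((x - lo) / (hi - lo), y) for x, y in rows]

def fit(rows):
    """Model: ordinary least-squares line y = a + b * x."""
    xs, ys = zip(*rows)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def evaluate(model, rows):
    """Evaluate: mean absolute error of the fitted line on the given rows."""
    a, b = model
    return statistics.mean(abs(y - (a + b * x)) for x, y in rows)

# For simplicity the scaling uses the full data set before the split.
rows = transform(load(RAW_CSV))
train, test = rows[:4], rows[4:]   # simple hold-out split
summary = explore(rows)
model = fit(train)
error = evaluate(model, test)
print(summary, model, error)
```

Note that the sketch, like Rattle, starts at the point where a data set already exists. The business-understanding step that opens the course process has no counterpart in the code.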