Zeek only

LogisticRegressionExercise1.docx


(Watch the Video Lecture and repeat my steps)

1- Create a folder on your computer and name it as Your_Last_name_Logistic Regression.

2- Download the two data sets labeled Training and Scoring from Blackboard, and put them inside the same folder, Your_Last_name_Logistic Regression.

3- Open RapidMiner and create a new repository by clicking on the arrow next to “import Data.”

4- You should see a window similar to the one below. Make sure to uncheck the selection next to “Use Standard Location”, then click the folder icon (in yellow) and keep browsing until you find the folder you created in Step 1 (above). Select it and finish the process.

5- Although the process window is empty now, go ahead and save the process as Your_Last_name_Logistic Regression by right-clicking on the folder (as shown below) and selecting “Store Process Here”.

6- Save it with your name, as shown below. YOU WILL HAVE to SAVE this Process again when you are done with this Exercise.

7- Use the Read CSV operator, as shown below, to import the two data sets, Training.csv and Scoring.csv, into the RapidMiner Process. Your folder should look like the picture below. Then select the first Read CSV operator and click on “Import Configuration Wizard” on the top right-hand side.

8- Then complete the steps below:

1. Begin by importing the training data set. This can be done by importing the data set into a RapidMiner repository or via the Read CSV operator. For the most part, the process will be the same as what you have done in past exercises, but for logistic regression there are a few subtle differences. Be sure to set the first row as the attribute names, as we have always done. On the final step of the Import Wizard, when setting data types and attribute roles, you will need to make at least one change. Be sure to set the 2nd_Heart_Attack data type to "binominal" rather than "polynominal." It is a yes/no field, and the Logistic Regression operator we'll be using in our modeling phase expects the label to be binominal. RapidMiner does not offer polynominal-to-binominal or integer-to-binominal operators, so we need to set this target attribute to the needed data type of binominal as we import it. Use the little black down arrow next to the gear icon to change the data type of this attribute. This is shown in Figure 1:


Figure 1: Setting the 2nd_Heart_Attack attribute's data type to "binominal" during import.
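For readers who want to see what "binominal" means outside the RapidMiner wizard, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample rows are made up, the "Age" column name is hypothetical (only 2nd_Heart_Attack is named in this exercise), and the helper functions are not part of RapidMiner. It reads the first row as attribute names and checks that the label attribute has exactly two distinct values.

```python
import csv
import io

# Made-up sample rows standing in for Training.csv.
# "Age" is a hypothetical attribute name used only for illustration.
SAMPLE_TRAINING = """Age,2nd_Heart_Attack
60,Yes
42,No
75,Yes
"""


def read_with_header(csv_text):
    """Parse CSV text, treating the first row as the attribute names."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def is_binominal(rows, column):
    """Return True if the column holds exactly two distinct values."""
    return len({row[column] for row in rows}) == 2


rows = read_with_header(SAMPLE_TRAINING)
print(is_binominal(rows, "2nd_Heart_Attack"))
```

A yes/no field passes this check, while an attribute with three or more distinct values would not.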

2. At this time, you can also change the 2nd_Heart_Attack attribute's role to "label" if you wish. We have not done this in Figure 1, and subsequently we will be adding a Set Role operator to our stream as we continue our data preparation.

3. Complete the data import process for the training data, ensuring it is included in a new blank Process. Rename the data set's Retrieve operator as Training.

4. Import the scoring data set now. Be sure the data type for all attributes is "integer." This should be the default, but double-check to make sure. Since the 2nd_Heart_Attack attribute is not included in the scoring data set, you don't need to worry about changing it as you did in step 1. Complete the import process, and include the scoring data set in your Process. Rename this data set's operator as "Scoring." Your model should now appear similar to Figure 2. Note that we have used Read CSV operators.


Figure 2: The training and scoring data sets in a new Process window in RapidMiner.

5. Run the model and compare the ranges for all attributes between the scoring and training result set tabs (Figure 3 and Figure 4, respectively). You should find that the scoring ranges fall within the training ranges. As was the case with linear regression, the scoring values must all fall within the lower and upper bounds set by the corresponding values in the training data set. We can see in Figure 3 and Figure 4 that this is the case, so our data are very clean. They were prepared during extraction from Sonia's source database, so we will not need to do further data preparation to filter out observations with inconsistent values or to handle missing values.


Figure 3: Statistics metadata for the scoring data set (note absence of 2nd_Heart_Attack attribute).


Figure 4: Metadata for the training data set (2nd_Heart_Attack attribute is present with the binominal data type). Note that scoring range values (Min/Max) fall within training range values for all attributes.
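The range comparison described in step 5 is done by eye in the RapidMiner Statistics view. As a rough illustration of the same rule, the Python sketch below flags any scoring attribute whose minimum or maximum falls outside the training range. The sample values and the attribute names "Age" and "Weight_Category" are assumptions made for this example, and the helper functions are hypothetical, not RapidMiner features.

```python
import csv
import io

# Made-up samples standing in for Training.csv and Scoring.csv.
# "Age" and "Weight_Category" are hypothetical attribute names.
TRAINING_CSV = """Age,Weight_Category,2nd_Heart_Attack
60,2,Yes
42,0,No
75,1,Yes
"""
SCORING_CSV = """Age,Weight_Category
61,1
50,2
"""


def column_ranges(csv_text, skip=()):
    """Return {attribute: (min, max)} for every numeric attribute not in skip."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ranges = {}
    for name in rows[0]:
        if name in skip:
            continue
        values = [float(row[name]) for row in rows]
        ranges[name] = (min(values), max(values))
    return ranges


def out_of_range_columns(training_text, scoring_text, label):
    """List scoring attributes whose Min/Max fall outside the training Min/Max."""
    train = column_ranges(training_text, skip=(label,))
    score = column_ranges(scoring_text)
    return [
        name
        for name, (low, high) in score.items()
        if low < train[name][0] or high > train[name][1]
    ]


problems = out_of_range_columns(TRAINING_CSV, SCORING_CSV, "2nd_Heart_Attack")
print(problems)
```

An empty list corresponds to the situation in Figure 3 and Figure 4, where no further filtering is needed.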

6. Switch back to Design perspective and add a Set Role operator to your training stream. Remember that if you designated 2nd_Heart_Attack to have a "label" role during the data import process, you won't need to add a Set Role operator at this time. We did not do this in the book example, so we need the operator to designate 2nd_Heart_Attack as our label, our target attribute:


Figure 5: Configuring the 2nd_Heart_Attack attribute's role in preparation for logistic regression mining.
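Conceptually, giving an attribute the "label" role tells the learner which column is the target and which columns are predictors. The Python sketch below shows that separation on made-up rows. It is an analogy only: the "Age" attribute name and the split_label helper are hypothetical, and this is not how the Set Role operator is implemented.

```python
# Made-up training rows; "Age" is a hypothetical attribute name.
training_rows = [
    {"Age": "60", "2nd_Heart_Attack": "Yes"},
    {"Age": "42", "2nd_Heart_Attack": "No"},
    {"Age": "75", "2nd_Heart_Attack": "Yes"},
]


def split_label(rows, label):
    """Separate the target attribute (the label) from the predictor attributes."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return features, labels


features, labels = split_label(training_rows, "2nd_Heart_Attack")
print(features)
print(labels)
```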