Text Mining and Word Cloud Assignment



BGDA 501 PROJECT ASSIGNMENT (25%)

November 2021

Project-1: Text Mining and Word Cloud Assignment (10%)

Step 1: Create a text file

• Find a transcript of a presidential speech (or any other speech at least one page, about 800 words, long) and save it as a text file.

For example: Donald Trump Rally Speeches from Kaggle: https://www.kaggle.com/christianlillelund/donald-trumps-rallies

Step 2: Install and load the required packages in R

• tm # for text mining
• SnowballC # for text stemming
• wordcloud # word-cloud generator
• RColorBrewer # color palettes
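As a minimal sketch of Step 2, the packages can be installed only when missing and then loaded. The CRAN mirror URL and the variable names `pkgs` and `to_install` are assumptions made for illustration, not part of the assignment.

```r
# Packages named in Step 2. CRAN names are case-sensitive ("wordcloud" is lowercase).
pkgs <- c("tm", "SnowballC", "wordcloud", "RColorBrewer")

# Install only the packages that are not already present.
# The mirror below is an assumption; any CRAN mirror works.
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Attach each package to the session.
for (p in pkgs) {
  library(p, character.only = TRUE)
}
```

Checking `installed.packages()` first avoids reinstalling packages every time the script runs.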

Step 3: Text mining

• Load the text in R
• Load the data as a corpus
• Clean the text
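The three sub-steps above might look like the following sketch. To keep it self-contained, it writes a two-line sample to a temporary file. In the assignment you would point `readLines()` at the speech file from Step 1 instead. The sample text and the names `speech_file`, `speech`, and `docs` are hypothetical, and stemming is shown as an optional last step.

```r
library(tm)
library(SnowballC)

# Stand-in for the speech file from Step 1 (hypothetical sample text).
speech_file <- tempfile(fileext = ".txt")
writeLines(c("Thank you very much. We are going to WIN, and we are winning!",
             "The economy is growing; 2,000 new jobs were created."),
           speech_file)

# Load the text in R: one character string per line of the file.
speech <- readLines(speech_file)

# Load the data as a corpus: each line becomes one document.
docs <- VCorpus(VectorSource(speech))

# Clean the text.
docs <- tm_map(docs, content_transformer(tolower))       # lower-case everything
docs <- tm_map(docs, removeNumbers)                      # drop digits
docs <- tm_map(docs, removeWords, stopwords("english"))  # drop common English words
docs <- tm_map(docs, removePunctuation)                  # drop punctuation
docs <- tm_map(docs, stripWhitespace)                    # collapse repeated spaces
docs <- tm_map(docs, stemDocument)                       # optional: reduce words to stems

# Inspect the cleaned first document.
content(docs[[1]])
```

Stop words are removed before punctuation here so that contractions in the stop-word list (such as "we're") still match.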

Step 4: Build a term-document matrix
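A term-document matrix has one row per term and one column per document, with counts in the cells. The sketch below builds one from a tiny stand-in corpus (hypothetical text) and turns it into a sorted frequency table. The names `tdm`, `freq`, and `freq_df` are illustrative.

```r
library(tm)

# Tiny stand-in corpus; in the assignment this would be the cleaned corpus from Step 3.
docs <- VCorpus(VectorSource(c("jobs jobs economy", "economy jobs win")))

# Rows are terms, columns are documents.
tdm <- TermDocumentMatrix(docs)
m <- as.matrix(tdm)

# Total count of each term across all documents, most frequent first.
freq <- sort(rowSums(m), decreasing = TRUE)
freq_df <- data.frame(word = names(freq), freq = freq, row.names = NULL)

head(freq_df, 10)
```

Note that `TermDocumentMatrix()` ignores terms shorter than three characters by default.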

Step 5: Generate the word cloud
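One way to draw the cloud is with `wordcloud()` and an RColorBrewer palette, as in this sketch. The frequency table is hypothetical sample data standing in for the result of Step 4, and the parameter values are illustrative choices rather than requirements.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical frequency table; in the assignment this comes from Step 4.
freq_df <- data.frame(
  word = c("jobs", "economy", "win", "people", "country"),
  freq = c(12, 9, 7, 5, 3)
)

# Fix the random seed so the layout is the same on every run.
set.seed(1234)

wordcloud(words = freq_df$word, freq = freq_df$freq,
          min.freq = 1,                    # include words seen at least once
          max.words = 200,                 # cap on the number of words drawn
          random.order = FALSE,            # most frequent words in the center
          rot.per = 0.35,                  # share of words drawn vertically
          colors = brewer.pal(8, "Dark2")) # color palette
```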

Step 6: Plot word frequencies (use a bar plot)
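A bar plot of the most frequent words needs only base R. The sketch below uses a hypothetical frequency table in place of the one from Step 4. The cutoff of 10 words, the color, and the labels are illustrative choices.

```r
# Hypothetical frequency table; in the assignment this comes from Step 4.
freq_df <- data.frame(
  word = c("win", "jobs", "country", "economy", "people"),
  freq = c(7, 12, 3, 9, 5)
)

# Keep the (up to) 10 most frequent words, largest first.
top <- head(freq_df[order(-freq_df$freq), ], 10)

barplot(top$freq,
        names.arg = top$word,  # label each bar with its word
        las = 2,               # rotate axis labels so words stay readable
        col = "lightblue",
        main = "Most frequent words",
        ylab = "Word frequency")
```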

Step 7: Save and submit your work as follows:

A. Save each of the scripts you have used in R as a text file (Project_Assignment_Scripts.txt)


B. Take screenshots of all steps in R and include them in your MS Word report (Word_Crowd_Report.docx).

C. Summarize what you have learned from the resulting word cloud and the bar chart.

D. Format your Word document professionally (include a cover page).

E. Submit your work on itslearning using the Project_Assignment_Part1 link (2 files: 1 text file containing your scripts and 1 MS Word file showing your word cloud and bar chart with your analysis of the speech). You can also submit the text file of the speech you saved in Step 1.

Project-2: Popular (Most Frequently Used) R Packages (15%)

A. Create a PowerPoint presentation discussing some of the most popular R Packages (at least 5 different R packages) and explain what each one is used for.

B. Show how to install each of those packages and provide related examples of how you used them in R (take screenshots and make them part of your slide presentation).

C. Make sure your PowerPoint presentation has a title page.

D. Submit your PowerPoint presentation file on itslearning using the Project_Assignment_Part2 link.

E. You will present your work to the class via screen share on Microsoft Teams (you will be assigned 10 minutes maximum).

*Use itslearning only to submit your work, DO NOT use e-mail.
