Assignment

Problem 1. Essay writing (20 points)

In class, we discussed why data science is an interdisciplinary field. Briefly list and describe the disciplines that are relevant to data science. In addition, please read the article “50 Years of Data Science” by David Donoho. After reading it, write a short essay (half a page to one page, font size 11) that supports or refutes the statement “Data science is just glorified statistics.”

Problem 2. Get ready with Kaggle (15 points)

In class, we discussed what Kaggle.com is. We also talked about how, as a data scientist-to-be, you should learn to use the wide variety of learning resources on Kaggle.com. For this problem, please complete the following tasks:

· Visit Kaggle.com and register for an account. State your Kaggle ID.

· Browse through the competitions available on Kaggle, pick one that interests you, and briefly introduce it (about half a page). Make sure your introduction includes the following:

· Competition title

· Nature, source, and characteristics of the data

· What interesting insights may be derived from the competition results

· Who (individuals and/or businesses/organizations) could benefit from the competition?

Problem 3. Learning useful Python libraries (15 points)

There are many useful Python libraries for data scientists. To answer this problem, please conduct your own search and fill in the information required in the table below.

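Library function             | Library name  | What is it? (what this library is best at) | Library documentation (location)
-----------------------------+---------------+--------------------------------------------+---------------------------------
Data Mining                  | Scrapy        |                                            |
                             | BeautifulSoup |                                            |
Data Processing and Modeling | Pandas        |                                            |
                             | NumPy         |                                            |
                             | SciPy         |                                            |
Machine Learning             | TensorFlow    |                                            |
                             | scikit-learn  |                                            |
Data Visualization           | Matplotlib    |                                            |
                             | Seaborn       |                                            |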
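
As an optional starting point for this problem, the sketch below checks which of the libraries listed above are installed in your Python environment. It is only an illustration, not a required part of the answer. The helper name check_installed and the LIBRARIES dictionary are hypothetical, and the import names are stated as assumptions in the comments because some differ from the library's display name.

```python
from importlib import util

# Display name -> import name. These import names are assumptions made for
# this sketch; note that BeautifulSoup is imported as "bs4" and
# scikit-learn as "sklearn".
LIBRARIES = {
    "Scrapy": "scrapy",
    "BeautifulSoup": "bs4",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "TensorFlow": "tensorflow",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
}


def check_installed(libraries):
    """Return {display name: True/False} without importing the libraries."""
    return {
        name: util.find_spec(module) is not None
        for name, module in libraries.items()
    }


if __name__ == "__main__":
    for name, installed in check_installed(LIBRARIES).items():
        status = "installed" if installed else "missing"
        print(f"{name:15s} {status}")
```

Any library reported as missing can usually be installed with pip before you explore its documentation.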


Problem 4. COVID-19 and Data Science (20 points)

In class, we talked about a recent article from Harvard Data Science Review—“COVID-19: A Massive Stress Test with Many Unexpected Opportunities (for Data Science)”. Please read the article and answer the following questions:

· List three of the “Unexpected Opportunities” presented by the COVID-19 pandemic and briefly describe what value each of these opportunities could bring.

· In your own words, explain why the author calls the pandemic “a massive stress test.”

· Briefly discuss three of the challenges in executing data science projects utilizing the opportunities presented by the pandemic.

Problem 5. The CRISP-DM Cycle (30 points)

In class, we discussed the CRISP-DM cycle, explained each of its phases, and discussed how those phases relate to each other. Please read this case about utilizing data science for online travel agencies. This short article lists four items under the so-called “The WNS Solution”. Please pick one of the four as a potential data science project, then discuss how the CRISP-DM cycle would apply to it. Conduct your own research as necessary. Your focus should be on:

· Major activities of each phase

· How all the phases could take place, especially the interconnections between all the phases

· Difficulties/challenges you may run into in each phase