Project Development
Project: Develop an application to collect Twitter data on COVID-19 posted since January 1, 2020.
Main Steps
1. Besides Twitter, identify other online sources of information we could use to collect COVID-19 data.
2. Develop a web scraper and a database schema for storing content from Twitter.
3. Cluster the content to identify the most discussed topics.
4. Within each cluster, find opposing viewpoints.
You will implement only steps 1 and 2 above. Here are the details for steps 1 and 2:
1. To start, find online resources we could use that contain opposing views on COVID-19.
Maybe Fox vs. CNN? Maybe Twitter?
After you have found and listed some usable online sources, move on to the Twitter data.
2. Develop a Twitter scraper. You can find plenty of Python examples online that use Twitter and MySQL to
collect data from Twitter. Here is an online example you can follow:
https://towardsdatascience.com/storing-tweets-in-a-relational-database-d2e4e76465b2
3. Using Python, collect Twitter data on COVID-19 from January 1, 2020 to May 12, 2020.
Identify different sub-topics (e.g., "threat of an outbreak in the US", "testing", "how the
virus is transmitted", "drugs to cure it", "availability of a vaccine", "Trump's performance"); other sub-topics are
still to be identified.
You can create a database called COVID-19, then create a table in it that holds the sub-topics and all the Twitter user features.
We want to collect all the information relating to a tweet, for example:
1. who tweeted it
2. the date it was tweeted
3. the author's number of followers
4. whether the tweet was retweeted
5. etc.
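The collection window in item 3 (January 1 to May 12, 2020) can be enforced when filtering tweets. The sketch below makes several assumptions:
- Tweets arrive as Twitter API v1.1 JSON, whose `created_at` field looks like "Wed Jan 01 12:00:00 +0000 2020".
- The constant and function names are hypothetical.
- May 12 is treated as inclusive by using midnight UTC on May 13 as an exclusive upper bound.

```python
from datetime import datetime, timezone

# Collection window from the brief (assumption: interpreted in UTC, May 12 inclusive).
WINDOW_START = datetime(2020, 1, 1, tzinfo=timezone.utc)
WINDOW_END = datetime(2020, 5, 13, tzinfo=timezone.utc)  # exclusive: end of May 12

# Format of `created_at` in Twitter API v1.1 tweet JSON (assumed input format).
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(created_at: str) -> datetime:
    """Parse a v1.1 `created_at` string into a timezone-aware datetime."""
    return datetime.strptime(created_at, TWITTER_DATE_FORMAT)


def in_collection_window(created_at: str) -> bool:
    """Return True if the tweet was posted between Jan 1 and May 12, 2020 (UTC)."""
    posted = parse_created_at(created_at)
    # Aware datetimes compare correctly even when their UTC offsets differ.
    return WINDOW_START <= posted < WINDOW_END
```

Because the timestamps are timezone-aware, a tweet posted at 8 p.m. on December 31, 2019 at UTC-5 is counted as falling on January 1, 2020 UTC. Note that `%a` and `%b` are locale-dependent, so this assumes an English (C) locale.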
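The per-tweet information listed above (who tweeted, when, followers, retweets) maps onto attributes of the tweet object. Below is a minimal sketch that assumes Twitter API v1.1 JSON; API v2 returns a differently shaped object. The output key names are hypothetical.

v1.1's own `retweeted` flag only says whether the authenticating user retweeted the tweet. The sketch therefore uses two other signals:
- `retweet_count` answers "was it retweeted".
- The presence of `retweeted_status` marks tweets that are themselves retweets.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Format of `created_at` in Twitter API v1.1 tweet JSON (assumed input format).
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_fields(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one v1.1 tweet object into a row (hypothetical column names)."""
    user = tweet["user"]
    posted = datetime.strptime(tweet["created_at"], TWITTER_DATE_FORMAT)
    return {
        "tweet_id": tweet["id_str"],
        # Date tweeted, normalized to an ISO-8601 string in UTC.
        "created_at": posted.astimezone(timezone.utc).isoformat(),
        # Extended-mode responses carry `full_text`; otherwise fall back to `text`.
        "tweet_text": tweet.get("full_text", tweet.get("text", "")),
        # Who tweeted.
        "user_id": user["id_str"],
        "user_screen_name": user["screen_name"],
        # The author's follower count as of when the tweet was fetched.
        "user_followers_count": user.get("followers_count"),
        # How many times this tweet has been retweeted.
        "retweet_count": tweet.get("retweet_count", 0),
        # 1 if this tweet is itself a retweet of another tweet, else 0.
        "is_retweet": int("retweeted_status" in tweet),
    }
```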
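A simple way to start labeling tweets with the sub-topics from item 3 is keyword matching, as a baseline before the clustering of step 3. In the sketch below, the sub-topic labels and keyword lists are hypothetical placeholders to be refined. Matching uses word boundaries so that, for example, "test" does not fire on "latest" or "protest".

```python
import re
from typing import Dict, List

# Hypothetical sub-topic labels and keyword lists; refine after reading real tweets.
SUBTOPIC_KEYWORDS: Dict[str, List[str]] = {
    "outbreak_threat_us": ["outbreak", "community spread"],
    "testing": ["test", "tests", "testing", "tested"],
    "transmission": ["transmission", "transmitted", "airborne", "droplets"],
    "drugs_to_cure": ["hydroxychloroquine", "remdesivir", "treatment", "cure"],
    "vaccine_availability": ["vaccine", "vaccines", "vaccination"],
    "trump_performance": ["trump"],
}

# One case-insensitive regex per sub-topic; \b anchors each keyword at word boundaries.
_PATTERNS = {
    topic: re.compile(
        r"\b(?:" + "|".join(re.escape(kw) for kw in keywords) + r")\b",
        re.IGNORECASE,
    )
    for topic, keywords in SUBTOPIC_KEYWORDS.items()
}


def tag_subtopics(text: str) -> List[str]:
    """Return every sub-topic whose keywords appear in the tweet text."""
    return [topic for topic, pattern in _PATTERNS.items() if pattern.search(text)]
```

Keyword matching misses paraphrases and sarcasm, but it cheaply produces labeled examples. Discovering sub-topics rather than predefining them is the job of the clustering in step 3.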
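Below is one possible layout for the table that holds tweet fields, Twitter user features, and sub-topics. It is sketched with Python's built-in sqlite3 module as a stand-in for MySQL, so it runs without a server, and the table and column names are assumptions. In MySQL, a database literally named COVID-19 must be quoted with backticks (`COVID-19`) because of the hyphen; a name such as covid19 avoids that.

```python
import sqlite3
from typing import Any, Dict

# Hypothetical table layout: tweet fields, author ("user") features, and a sub-topic label.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id             TEXT PRIMARY KEY,  -- the tweet's id_str
    created_at           TEXT NOT NULL,     -- ISO-8601 timestamp in UTC
    tweet_text           TEXT NOT NULL,
    user_id              TEXT NOT NULL,     -- who tweeted
    user_screen_name     TEXT NOT NULL,
    user_followers_count INTEGER,           -- author's followers at collection time
    retweet_count        INTEGER,
    is_retweet           INTEGER NOT NULL,  -- 0/1 flag; SQLite has no separate BOOLEAN type
    subtopic             TEXT               -- e.g. 'testing'; NULL if untagged
)
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tweets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_tweet(conn: sqlite3.Connection, row: Dict[str, Any]) -> None:
    """Store one tweet; named placeholders keep tweet text out of the SQL string."""
    conn.execute(
        """
        INSERT OR IGNORE INTO tweets VALUES (
            :tweet_id, :created_at, :tweet_text, :user_id, :user_screen_name,
            :user_followers_count, :retweet_count, :is_retweet, :subtopic
        )
        """,
        row,
    )
    conn.commit()
```

Because tweet_id is the primary key, INSERT OR IGNORE lets a re-run of the collector skip tweets that are already stored. A single subtopic column keeps the sketch small. If a tweet can belong to several sub-topics, the usual relational design is a separate table keyed by (tweet_id, subtopic).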