Big Data Architecture
(FALL 2019) Term Project
Version 28 September 2019
1 Objectives
Develop a Big Data application using a microservice architecture and deploy it on a Platform as a Service (PaaS). Your job is not only to build microservice and serverless applications but also to integrate them into a single distributed application that is useful to your users.
2 Prerequisites
Your application must use the following big data technologies:
• Hadoop Distributed File System (HDFS)
• MongoDB
Your application must use the following datasets:
Go to http://stat-computing.org/dataexpo/2009/the-data.html and download the airline on-time performance data from 1996 through 2006 (11 files, one per year). Hint: Make sure your application works with data from 2000 through 2002 first (3 years). Then try 5 years, 7 years, 9 years, and finally all 11 years, because your machine may not be able to process all 11 years in one go.
Store these files in your Hadoop system (HDFS).
Go here http://stat-computing.org/dataexpo/2009/supplemental-data.html to download the following datasets:
• Airports
• Carriers
• Planes
Process and store the content of these files in MongoDB.
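Before the supplemental CSV files can be stored in MongoDB, each row has to be turned into a document keyed by column name. The following is a minimal sketch of that step using only the JDK. The class name CsvRowParser, the header layout, and the state and coordinate values in the sample row are assumptions for illustration; check them against the files you actually download. The resulting map would then be handed to whichever MongoDB client you choose, which is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns one CSV row into a field map ready to become a document.
class CsvRowParser {

    /** Splits one CSV line into fields, keeping commas that appear inside double quotes. */
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());
        return fields;
    }

    /** Pairs the header names with the values of one data row, in column order. */
    static Map<String, String> toDocument(String headerLine, String dataLine) {
        List<String> names = splitLine(headerLine);
        List<String> values = splitLine(dataLine);
        Map<String, String> doc = new LinkedHashMap<>();
        for (int i = 0; i < names.size() && i < values.size(); i++) {
            doc.put(names.get(i), values.get(i));
        }
        return doc;
    }

    public static void main(String[] args) {
        // Assumed header and an illustrative row (state and coordinates are made up).
        String header = "\"iata\",\"airport\",\"city\",\"state\",\"country\",\"lat\",\"long\"";
        String row = "\"00M\",\"Thigpen\",\"Bay Springs\",\"MS\",\"USA\",31.95,-89.23";
        System.out.println(toDocument(header, row));
    }
}
```

Keeping the parsing separate from the database call makes it easy to unit test the parser without a running MongoDB instance.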
You are to develop the following software components, then integrate them using RESTful APIs:
1. Airport Locating Service
2. Carrier Identification Service
3. Plane Information Service
4. Airline On-time Performance Service
5. Weather Serverless Service
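As one possible starting point for the services listed above, here is a minimal sketch of a Carrier Identification Service exposed over HTTP. It uses the HTTP server bundled with the JDK so that it runs without extra dependencies; a real submission would more likely use a framework such as Spring Boot or Jersey and read from MongoDB. The class name CarrierServiceSketch, the /carriers/{code} path, the port, and the two in-memory sample entries are all assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of one microservice; not a prescribed design.
class CarrierServiceSketch {

    // In-memory stand-in for the MongoDB carriers collection (sample entries only).
    private static final Map<String, String> CARRIERS = new HashMap<>();
    static {
        CARRIERS.put("AA", "American Airlines Inc.");
        CARRIERS.put("DL", "Delta Air Lines Inc.");
    }

    /** Returns the JSON document for a carrier code, or null when the code is unknown. */
    static String carrierJson(String code) {
        String description = CARRIERS.get(code);
        if (description == null) {
            return null;
        }
        // Naive string building: fine for these samples, but real data needs JSON escaping.
        return "{\"code\":\"" + code + "\",\"description\":\"" + description + "\"}";
    }

    /** Starts the service; requests to /carriers/{code} get 200 with JSON, or 404. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/carriers/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            String json = carrierJson(path.substring("/carriers/".length()));
            byte[] body = (json == null ? "{\"error\":\"unknown carrier\"}" : json)
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(json == null ? 404 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("Try: curl http://localhost:8080/carriers/AA");
    }
}
```

Because the lookup logic lives in carrierJson, it can be unit tested without starting the server, and the same function could later be backed by a MongoDB query.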
You are to provide a website or a simple console application that allows the user to use your services. For example, if I want to find the city identified by the IATA code ‘00M’, I should be able to enter ‘00M’ in a text field and then see ‘Thigpen, Bay Springs, USA’. In other words, develop a client application that integrates all the services.
Your application should be deployed on OKD, the open source community distribution of Red Hat OpenShift. Hint: Make sure your application, which should include multiple microservices, works outside of OKD first. For example, start a set of microservices manually and verify through integration testing that they work together properly. Deploying your application onto OKD should be your last activity, once all your microservices have been tested properly.
Data exchange and service requests between services must use RESTful web services. You must document your application programming interface (API) using Swagger (https://swagger.io/).
Your application must be developed using Java 1.8.
You must provide a comprehensive architecture drawing of your microservice design.
3 Requirements
1. Given an airport code, return a JSON formatted document with the following information:
· iata: the international airport abbreviation code
· name: the name of the airport
· city and country in which the airport is located
· lat and long: the latitude and longitude of the airport
· weather report (see notes)
Notes: You are to develop a serverless function that returns a weather report for a given city. To write a serverless function that calls a RESTful API, see this link:
https://aws.amazon.com/blogs/opensource/java-apis-aws-lambda/
It is OK to return the exact content you receive from the weather website. You can try this HTTP GET:
GET /data/2.5/weather?q=london,uk&APPID=*** Your own APPID *** HTTP/1.1
Host: api.openweathermap.org
User-Agent: PostmanRuntime/7.15.0
Accept: */*
Cache-Control: no-cache
Postman-Token: 2d5e155a-6aa9-4127-8544-5b1c2f42a317,4be61a28-3a2f-47d8-a5c8-4006542e843b
accept-encoding: gzip, deflate
Connection: keep-alive
A sample response (this one is for Los Angeles) looks like this:
{
  "coord": {
    "lon": -118.24,
    "lat": 34.05
  },
  "weather": [
    {
      "id": 804,
      "main": "Clouds",
      "description": "overcast clouds",
      "icon": "04d"
    }
  ],
  "base": "stations",
  "main": {
    "temp": 293.27,
    "pressure": 1013,
    "humidity": 68,
    "temp_min": 290.93,
    "temp_max": 295.37
  },
  "visibility": 16093,
  "wind": {
    "speed": 3.6,
    "deg": 230
  },
  "clouds": {
    "all": 90
  },
  "dt": 1561157740,
  "sys": {
    "type": 1,
    "id": 4361,
    "message": 0.0106,
    "country": "US",
    "sunrise": 1561120918,
    "sunset": 1561172846
  },
  "timezone": -25200,
  "id": 5368361,
  "name": "Los Angeles",
  "cod": 200
}
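The GET request shown above can be assembled and sent from Java 1.8 roughly as follows. This is a sketch under assumptions: the class name WeatherRequestSketch, the use of https, and the placeholder key are illustrative, and error handling for non-200 responses is left out. You must supply your own APPID.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for calling the weather endpoint; not an official client.
class WeatherRequestSketch {

    // Host and path taken from the sample request; the https scheme is an assumption.
    static final String BASE_URL = "https://api.openweathermap.org/data/2.5/weather";

    /** Builds the request URL, percent-encoding the query values. */
    static String buildUrl(String city, String countryCode, String appId) {
        try {
            String query = URLEncoder.encode(city + "," + countryCode, "UTF-8");
            return BASE_URL + "?q=" + query + "&APPID=" + URLEncoder.encode(appId, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Sends the GET request and returns the response body unchanged. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Replace the placeholder with your own APPID before running.
        String url = buildUrl("london", "uk", "YOUR_APPID");
        System.out.println(url);
        // System.out.println(fetch(url)); // uncomment once a real key is in place
    }
}
```

The same two methods could be called from inside a serverless handler; only the entry point would change.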
You need to sign up for a free account with OpenWeatherMap:
https://home.openweathermap.org/
The temperature is reported in Kelvin; convert it to Fahrenheit.
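The conversion uses the standard formula F = (K - 273.15) × 9/5 + 32. A minimal sketch follows; the class and method names are assumptions for illustration. With the "temp" value from the sample response, 293.27 K comes out at roughly 68.22 °F.

```java
// Hypothetical helper for the Kelvin-to-Fahrenheit step.
class TemperatureConverter {

    /** Converts a temperature from Kelvin to degrees Fahrenheit. */
    static double kelvinToFahrenheit(double kelvin) {
        return (kelvin - 273.15) * 9.0 / 5.0 + 32.0;
    }

    public static void main(String[] args) {
        double kelvin = 293.27; // "temp" from the sample response
        System.out.println(kelvin + " K = " + kelvinToFahrenheit(kelvin) + " F");
    }
}
```

The fields temp_min and temp_max in the response are also in Kelvin and would need the same conversion if you report them.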
2. Given a unique carrier code, return a JSON formatted document with the carrier code and a description.
3. Given a unique tail number, return a JSON formatted document with the following information about the plane:
· Tail number
· Type
· Manufacturer
· Issue date
· Model
· Status
· Aircraft type
· Engine type
· Year
4. Find the airport(s) that experience the most departure delay.
5. Find the airport(s) that experience the least departure delay.
6. Find the airport(s) that experience the most arrival delay.
7. Find the airport(s) that experience the least arrival delay.
8. Find the flight(s) that experience the most arrival delay.
9. Find the flight(s) that experience the least arrival delay.
10. Find the flight(s) that experience the most departure delay.
11. Find the flight(s) that experience the least departure delay.
12. Find the average arrival delay per carrier.
13. Find the average departure delay per carrier.
14. Find the average arrival delay before 2001 (1996, 1997, 1998, 1999, 2000).
15. Find the average arrival delay after 2001 (2002, 2003, 2004, 2005, 2006).
16. Find the average departure delay before 2001 (1996, 1997, 1998, 1999, 2000).
17. Find the average departure delay after 2001 (2002, 2003, 2004, 2005, 2006).
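The delay requirements above all reduce to grouping rows by a key (airport, carrier, or year range) and aggregating a delay column. The sketch below shows that logic in memory with Java 8 streams; it illustrates the computation only and is not a substitute for running it at scale on Hadoop. Several things are assumptions: the class name DelayStatsSketch, the zero-based column positions (verify them against the header row of your files), the reading of "most delay" as the highest average delay, and the sample rows, which are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration of the grouping logic.
class DelayStatsSketch {

    // Zero-based column positions, assumed from the header row of the on-time files.
    static final int UNIQUE_CARRIER = 8;
    static final int ARR_DELAY = 14;
    static final int DEP_DELAY = 15;
    static final int ORIGIN = 16;

    /** Averages delayColumn per keyColumn, skipping rows whose delay is not an integer (e.g. "NA"). */
    static Map<String, Double> averageDelay(List<String> rows, int keyColumn, int delayColumn) {
        return rows.stream()
                .map(row -> row.split(",", -1))
                .filter(f -> f.length > Math.max(keyColumn, delayColumn))
                .filter(f -> f[delayColumn].matches("-?\\d+"))
                .collect(Collectors.groupingBy(
                        f -> f[keyColumn],
                        TreeMap::new,
                        Collectors.averagingInt(f -> Integer.parseInt(f[delayColumn]))));
    }

    /** Returns every key whose average equals the maximum, so ties are all reported. */
    static List<String> keysWithMax(Map<String, Double> averages) {
        if (averages.isEmpty()) {
            return Collections.emptyList();
        }
        double max = Collections.max(averages.values());
        return averages.entrySet().stream()
                .filter(e -> e.getValue().doubleValue() == max)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Made-up rows in the assumed column layout; the third row models a cancelled flight. */
    static List<String> sampleRows() {
        return Arrays.asList(
                "2000,1,1,6,900,855,1100,1050,AA,100,N1,120,115,100,10,5,ORD,LAX,1745",
                "2000,1,2,7,905,855,1110,1050,AA,100,N1,125,115,100,20,10,ORD,LAX,1745",
                "2000,1,2,7,NA,700,NA,900,DL,200,N2,NA,120,NA,NA,NA,ATL,JFK,760",
                "2000,1,3,1,700,700,856,900,DL,200,N2,116,120,100,-4,0,ATL,JFK,760");
    }

    public static void main(String[] args) {
        List<String> rows = sampleRows();
        System.out.println(averageDelay(rows, UNIQUE_CARRIER, ARR_DELAY)); // {AA=15.0, DL=-4.0}
        System.out.println(keysWithMax(averageDelay(rows, ORIGIN, DEP_DELAY))); // [ORD]
    }
}
```

The "least delay" variants follow by swapping Collections.max for Collections.min, and the before/after 2001 averages follow by filtering on the year column (position 0 in this layout) before averaging. If the product owner defines "most delay" as total rather than average delay, replace averagingInt with summingInt.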
Note: The requirements are fluid, and it is your responsibility to ask me (the product owner) for more information. You must do that through the dedicated discussion forum. I will not answer questions about the project via email.
*** Do not wait *** until the last few days before the due date to start your project; if you do, I can almost guarantee you a lower grade. If you communicate with me weekly, I may be able to guide you to a successful completion. Hint: Do things in small chunks and test them regularly.
4 Testing
Your microservices must include a set of unit tests that I can run with ‘mvn clean test’.
• You must provide JUnit tests for your code.
5 Submission
As always, I do not allow late submissions. Partial credit is always better than *** no credit ***. Therefore, if you have something decent, submit it before the due date. Do not wait until 11:59 PM on Sunday to submit; that would be too risky, and I will not accept submissions via ‘email’.
· Project Report (see the report format toward the end of this document)
· Everyone must submit the project report individually even if you work in a team.
If you work individually, you do not submit the teammate evaluation part. If you work in a team, your project report ‘must’ be identical to your teammates’ reports. In other words, you must work together not only in engineering the application but also in writing the project report. One team has one project report. If the reports from team members differ from one another, the entire team will receive ‘zero’ points. In short, if you decide to work as a team, you are required to behave like a team. If you work in a team, you are ‘required’ to include the teammate evaluation. Each teammate ‘must’ complete this part individually, without any collaboration with other teammates.
6 Weekly Status (Optional)
Although it is `optional`, I recommend that you send me a weekly report with the following information:
· What did you do last week?
· What will you be doing this week?
· What problems do you have?
7 References
1. AWS Lambda using Spring Boot and/or Jersey:
https://aws.amazon.com/blogs/opensource/java-apis-aws-lambda/
2. Open Weather API:
https://home.openweathermap.org/
CISC-525 BIG DATA ARCHITECTURE
(FALL 2019) Term Project
*** Everyone must submit this report ‘individually’ ***
8 Introduction
9 Test Cases
10 Architecture & Design
11 Build Instruction or Script
12 Deployment Instruction or Script
13 User Guide
14 Repository
URL of your code repository (use a private repository and share it with me)
15 Lessons Learned
16 Conclusion