General data processing and using big data

Instructions for using the sample code for Assignment 1

(All code was developed by Mr Borui Cai)

Note: Do not use the same data sources/websites when you use the following code to complete your Assignment 1; otherwise, you will receive a mark of zero.

(1) weather_interval_a.py:

Collects 10 weather readings at 30-minute intervals from the website http://m.weatherzone.com.au/vic/melbourne/melbourne. Each reading includes “time”, “temperature”, “humidity” and “pressure”.

The main function calls getWeather() to fetch each reading and uses sleep(1800) as the interval before the next one. The 10 readings are stored in weather (a list) and then written to the file “weather_interval_a.csv”.
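The collection loop described above can be sketched roughly as follows. This is not the original weather_interval_a.py: the helper names collect_readings and write_csv are hypothetical, and get_weather is passed in as a parameter (assumed to return one reading as a dict) so that the sketch does not depend on any particular website.

```python
import csv
import time


def collect_readings(get_weather, count=10, interval_seconds=1800, sleep=time.sleep):
    """Call get_weather() `count` times, pausing between consecutive calls.

    get_weather is assumed to return a dict with the keys
    "time", "temperature", "humidity" and "pressure".
    """
    weather = []
    for i in range(count):
        weather.append(get_weather())
        if i < count - 1:  # no need to wait after the final reading
            sleep(interval_seconds)
    return weather


def write_csv(path, rows, fields=("time", "temperature", "humidity", "pressure")):
    """Write a list of dict readings to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Passing sleep as a parameter is only a convenience for testing the loop without waiting 30 minutes per reading; a real run would leave it at its default.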

(2) weather_interval_b.py:

Collects 10 weather readings at 30-minute intervals from the website http://www.eldersweather.com.au/vic/melbourne/melbourne. Each reading includes “time”, “temperature”, “humidity” and “pressure”.

The main function calls getWeather() to fetch each reading and uses sleep(1800) as the interval before the next one. The 10 readings are stored in weather (a list) and then written to the file “weather_interval_b.csv”.

(3) weather_change.py:

Collects 10 weather readings from the website http://m.weatherzone.com.au/vic/melbourne/melbourne, recording a new reading whenever the temperature changes by more than 1 degree. Each reading includes “time”, “temperature”, “humidity” and “pressure”.

The main function calls getWeather() to fetch readings and uses the condition abs(currtemperature - oldtemperature) > 1 to decide when to record the next one. The 10 readings are stored in weather (a list) and then written to the file “weather_change.csv”.
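A minimal sketch of this change-triggered collection is shown below. It is not the original weather_change.py. It assumes that the first reading is always recorded, that each new temperature is compared with the last recorded one, and that the site is polled every 60 seconds; the name collect_on_change and the poll_seconds parameter are hypothetical.

```python
import time


def collect_on_change(get_weather, count=10, threshold=1.0,
                      poll_seconds=60, sleep=time.sleep):
    """Record a reading whenever the temperature moves by more than `threshold`.

    get_weather is assumed to return a dict with a numeric "temperature" key.
    """
    weather = []
    old_temperature = None  # temperature of the last recorded reading
    while len(weather) < count:
        reading = get_weather()
        curr_temperature = reading["temperature"]
        # Assumption: the very first reading is always recorded.
        if old_temperature is None or abs(curr_temperature - old_temperature) > threshold:
            weather.append(reading)
            old_temperature = curr_temperature
        if len(weather) < count:
            sleep(poll_seconds)  # wait before polling again
    return weather
```

Comparing against the last recorded temperature, rather than the last polled one, means a slow drift is still captured once it adds up to more than 1 degree.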

(4) weather_fusion.py:

Fuses the two datasets collected by weather_interval_a.py and weather_interval_b.py.

The main function first reads the two datasets into data1 and data2, then calls weatherfuse() to fuse them. Because website1 is considered more reliable than website2, confidence1 is set higher than confidence2 (0.8 versus 0.6).
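The text does not say how weatherfuse() combines the two datasets. One plausible approach, sketched here purely as an assumption, is a confidence-weighted average of readings paired by position. The function name weather_fuse, the choice to keep the first source's timestamp, and the assumption that all measured fields are numeric are illustrative, not taken from the original script.

```python
def weather_fuse(data1, data2, confidence1=0.8, confidence2=0.6,
                 fields=("temperature", "humidity", "pressure")):
    """Fuse two lists of readings with a confidence-weighted average.

    Assumptions: readings are dicts with numeric values for `fields`,
    and the i-th reading in data1 corresponds to the i-th in data2.
    """
    total = confidence1 + confidence2
    fused = []
    for r1, r2 in zip(data1, data2):
        row = {"time": r1["time"]}  # keep the timestamp of the more trusted source
        for field in fields:
            row[field] = (confidence1 * r1[field] + confidence2 * r2[field]) / total
        fused.append(row)
    return fused
```

With confidences of 0.8 and 0.6, a temperature of 20 from the first source and 27 from the second would fuse to (0.8 × 20 + 0.6 × 27) / 1.4 = 23, which sits closer to the more trusted source.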

(5) weather_query.py:

In the main function, the client queries the weather by entering a time. The returned temperature is determined as follows:

If the query time is outside the time range of the dataset:

1. Return the earliest temperature if the query time is earlier than the first reading.

2. Return the latest temperature if the query time is later than the last reading.

If the query time is within the time range of the dataset:

Return the mean of the temperatures at its lower bound and upper bound, that is, the readings immediately before and after the query time.
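The query rules above can be sketched as follows. This is an illustrative version, not the original weather_query.py: it assumes the reading times are sorted in ascending order and comparable with the query time (for example, minutes since midnight or datetime objects), and that a query that exactly matches a recorded time returns that reading's temperature. The name query_temperature is hypothetical.

```python
from bisect import bisect_left, bisect_right


def query_temperature(times, temperatures, query_time):
    """Return a temperature for query_time from time-sorted readings."""
    if not times:
        raise ValueError("dataset is empty")
    if query_time <= times[0]:
        return temperatures[0]   # at or before the first reading
    if query_time >= times[-1]:
        return temperatures[-1]  # at or after the last reading
    lower = bisect_right(times, query_time) - 1  # last reading at or before the query
    upper = bisect_left(times, query_time)       # first reading at or after the query
    # For an exact match lower == upper, so the mean is that reading itself.
    return (temperatures[lower] + temperatures[upper]) / 2
```

For example, with readings at minutes 0, 30 and 60 and temperatures 20.0, 22.0 and 21.0, a query at minute 45 returns the mean of 22.0 and 21.0.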