Pig programming
PROBLEM 1
Select the frequent words (those whose count is greater than or equal to 50,000).
Display the frequent words in descending order of count.
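The logic of Problem 1 can be sketched outside Pig to check what the result should look like. This is a minimal Python sketch, not a Pig solution. It assumes the input is a list of (word, count) pairs; the sample data and the names word_counts, frequent, and frequent_desc are hypothetical, since the problem does not name the input. Sorting by count is also an assumption about what "descending order" means here.

```python
# Hypothetical (word, count) pairs; the real input for Problem 1 is not named in the text.
word_counts = [
    ("the", 1200000),
    ("pig", 48000),
    ("data", 75000),
    ("of", 900000),
    ("hadoop", 50000),
]

THRESHOLD = 50_000

# Keep only words whose count is >= 50,000 (the role a FILTER step plays in Pig).
frequent = [(word, count) for word, count in word_counts if count >= THRESHOLD]

# Sort by count, largest first (the role an ORDER ... DESC step plays in Pig).
frequent_desc = sorted(frequent, key=lambda wc: wc[1], reverse=True)

for word, count in frequent_desc:
    print(word, count)
```

Note that a word with a count of exactly 50,000 is kept, because the condition is "greater than or equal to".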
PROBLEM 2
Group the words by their length (hint: use the built-in function SIZE) and count each group.
For example,
(2,1096049) means that there are 1096049 occurrences of words that have two characters.
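The grouping in Problem 2 can be illustrated with a small Python sketch; it is not a Pig solution. It assumes the input holds one entry per word occurrence. The words list is made up, and len() stands in for Pig's SIZE function applied to a string.

```python
from collections import Counter

# Hypothetical word occurrences, one entry per occurrence (made-up sample).
words = ["to", "be", "or", "not", "to", "be", "that", "is"]

# len(word) plays the role of SIZE; Counter groups by length and counts each group.
length_counts = Counter(len(word) for word in words)

# Print (length, count) tuples in the same shape as the example in the problem.
for length, n in sorted(length_counts.items()):
    print((length, n))
```

If the input instead holds (word, count) pairs, the per-word counts would have to be summed within each length group to get occurrences rather than distinct words.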
Problem 3 is based on the dataset nyc_taxi_data_2014.csv.gz.
PROBLEM 3
Find the effect of passenger_count on trip_distance, fare_amount, and tip_rate.
a) Create a new data set records2 that has passenger_count, trip_distance, fare_amount, and
tip_rate (tip_amount/total_amount).
b) Filter records2 by passenger_count (0 < passenger_count < 10) and name the resulting data set
records3.
c) Group records3 by passenger_count.
d) Display the average trip_distance, average fare_amount, and average tip_rate for each
group of passenger_count.
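Steps a) to d) can be traced on a few rows with a minimal Python sketch, which is not a Pig solution. The field names come from the problem statement, but the row values are made up. Skipping rows whose total_amount is zero is an assumption added here to avoid division by zero; the problem does not say how to handle them.

```python
from collections import defaultdict

# Hypothetical rows; field names follow the problem statement, values are made up.
records = [
    {"passenger_count": 1, "trip_distance": 2.0, "fare_amount": 8.0, "tip_amount": 2.0, "total_amount": 10.0},
    {"passenger_count": 1, "trip_distance": 4.0, "fare_amount": 12.0, "tip_amount": 3.0, "total_amount": 15.0},
    {"passenger_count": 2, "trip_distance": 3.0, "fare_amount": 9.0, "tip_amount": 1.0, "total_amount": 10.0},
    {"passenger_count": 0, "trip_distance": 1.0, "fare_amount": 5.0, "tip_amount": 0.0, "total_amount": 5.0},
    {"passenger_count": 12, "trip_distance": 6.0, "fare_amount": 20.0, "tip_amount": 4.0, "total_amount": 24.0},
]

# a) Project to (passenger_count, trip_distance, fare_amount, tip_rate).
#    Assumption: rows with total_amount == 0 are skipped to avoid division by zero.
records2 = [
    (r["passenger_count"], r["trip_distance"], r["fare_amount"], r["tip_amount"] / r["total_amount"])
    for r in records
    if r["total_amount"] != 0
]

# b) Keep rows with 0 < passenger_count < 10.
records3 = [r for r in records2 if 0 < r[0] < 10]

# c) Group by passenger_count.
groups = defaultdict(list)
for row in records3:
    groups[row[0]].append(row)

# d) Average trip_distance, fare_amount, and tip_rate per group.
averages = {}
for passenger_count, rows in sorted(groups.items()):
    n = len(rows)
    averages[passenger_count] = (
        sum(r[1] for r in rows) / n,
        sum(r[2] for r in rows) / n,
        sum(r[3] for r in rows) / n,
    )
    print(passenger_count, averages[passenger_count])
```

Both bounds of the filter are strict, so rows with a passenger_count of 0 or of 10 and above are dropped before grouping.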