project 1

profilesofibarboza
AEB3550_Project1_Spring20252.docx

AEB 3550 Project 1

Global Agricultural Data Analysis (25 points)

A neat, professional looking version of your report and excel file is due on Monday, February 3rd by midnight on eLearning. Late submissions will be penalized 5 points per day late. This is an individual assignment, and each person is expected to do their own work. For this assignment, each student is assigned a specific country and no two people should download the data for the same country (see your assigned country on this google sheet). Please check the rubric on e-Learning before submitting your assignment as you might have missed a required component. You will be graded based on completeness, accuracy, and quality of presentation. If your Turnitin score for your report (not your excel file) exceeds 35%, you will be given a grade of zero. Your lowest-scoring application project will be dropped at the end of the semester.

Project Summary: You will download data from the Food and Agriculture Organization of the United National (FAO) website and create descriptive statistics for this data using excel.

Technology Requirements: You will need a full-powered version of excel within Office 365 and a full-powered computer with an ability to use applications (i.e. not a cromebook). You will also need to install the data analysis toolpak for this project.

Learning Objectives:

1. Access and download data from web-based services to create a useable dataset.

2. Clean data to attain usable datasets (cross-sectional, time series, and panel)

3. Identify various components of a dataset and scales of measure of variables.

4. Create and interpret descriptive statistics (numerical and graphical).

5. Differentiate between the mean and median.

6. Create and interpret pivot tables (cross tabulation) for comparisons.

7. Assess the trend of data and effectively generalize time variant data.

8. Describe agriculture in your country using summary statistics.

Step 1: Download the data

Navigate to the FAOSTAT data portal: https://www.fao.org/faostat/en/#data/QCL

Using the tool, filter the data for your country, include all elements, items, and years to create your time series dataset. An example for the US is given to the right. Next click the “Download Data” button.

Immediately save your file as an .xlsx file rather than a .csv file. Rename the tab containing your data “RawData”, note the number of elements (rows, excluding the header) and the number of variables (columns excluding non-variate columns such as Domain code, Area Code, etc).

Screens screenshot of a computer  Description automatically generated

Step 2: Convert your data into a table and create one cross-sectional dataset, one time series dataset, and one panel dataset.

To do this, highlight all of your data (ctrl + a) Home Format as Table ensure the “My table has headers” box is checked as shown below (your data will likely have a different number of rows):

Part 1: Create a cross-sectional dataset

You can create a cross-sectional dataset by filtering down to one year and one element (like yield, production, etc). To do this, go into “Year”, deselect everything except the final year in your dataset then go into “Element” and deselect everything but one element item. Copy and paste this filtered dataset into a new tab that you can rename xsec. Now, we should delete all non-variant columns so that your data is more succinct and meaningful. You might also consider removing all missing values. Rename your “Value” column to whatever your element is. In my case, it is “Agricultural Production (ha)”.

Part 2: Create a time-series dataset

To do this, adjust the year column to include all years. Next, filter down your item set to be only one product (like apples, cattle, sugar beet, etc) and again, filter down to only one element (like yield, production, etc). Copy this data into a new worksheet and rename it tseries. Now, we should delete all non-variant columns so that your data is more succinct and meaningful. You might also consider removing all missing values. Rename your “Value” column to whatever your element is. In my case, it is “Area Harvested for Avocados in the US (ha)”.

Part 3: Create a panel dataset

For panel data, you need variables for multiple elements over time. In this case, we can filter our data to include more than just one product (let’s say 3) and at least two variables that are common among your elements. In my case, I chose Avocados, Figs, and Kiwi Fruit for my “Item” and “Area harvested”, “yield”, and “Production” as my elements. Then copy and paste this into a new worksheet which you can rename panel. In reality, your original data was panel data (it should be easy to see by now that it was. We will be using pivot charts (cross-tabulation) to simplify this dataset so go ahead and format this dataset as a table like you did for your original data.

For each of these datasets you created, what how many elements, variables, and data points are there? What is the scale of measure for each of the variables you looked at? What about in the original dataset (i.e. determine the scale of measure for domain, area, element, item, etc)? What sorts of things might you be able to learn from these datasets? How many potential datasets do you think there are in the data you downloaded? Comment on what you think of the values provided by FAO. Do you trust this information? Why or why not?

Step 2: Summarize your data with descriptive statistics

1) Using your cross-sectional dataset (xsec), generate descriptive statistics using the data analysis toolpak (It will show up in the Data tab of Excel. Data Analysis Toolpak Descriptive Statistics and then input all data, check the labels box, put the table in your existing sheet, and check the box “Summary statistics”.

2) Based on the descriptive statistics generated, what can you say about your data? Is the mean or median the more appropriate measure of central tendency? Justify your response with an analysis of outliers using the empirical rule and noting the skewness of your dataset.

3) Using the time series dataset, generate a line graph of your variable over time (Insert Charts 2d line). Make sure that time is on your horizontal axis and you only have one variable being charted over time. To edit the data, go to Chart design Select data and make sure your horizontal axis labels are years and your legend entries (series) only includes your one variable. Make sure to edit your chart title and include a vertical axis label that describes your units. Describe what is happening to your variable over time.

4) Next, generate a histogram for this data using the data analysis toolpak (Data Analysis Histogram). Be sure to include the lables and check the label box, put your output in the existing sheet, and be sure to include the chart output so you can see the chart and graphical histograms. How is your data distributed? What does the histogram tell us compared to the line chart? What are each of the statistics useful for in describing a set of data?

5) Using the panel data, we want to simplify this data using cross tabulation. To do this, highlight all your data for your panel dataset (ctrl+A is a nice shortcut to do this). When you end up in the new sheet, make an easy pivot table with item as your column variable and value as your Sum value (see image to the right). It is now summing all of our variables for avocados, figs, and kiwi for all years in our dataset.

6) You can also use this tool to perform row and column percentage calculations. Copy your pivot table twice into the same document. Then, go to your Sum of Value calculation, and click the carrot symbol value field settings Show value as % of column total (and % of row total for the third table). This is useful for comparing the relative “weight” of each item category and value category (area, production, and yield in my case). Remain this sheet “CrossTabs”.

7) Next, let’s create a dataset that has more meaning than summations, let’s again generate a pivot table from our Panel data, but this time, have rows be the year, and Element + Item be the column, while value is still the summation value. You can simplify this by going into the column labels and eliminating all but one of your variables (in my case, I will look at yield only). Finally, you can go into the PivotTable Analyze tab and choose PivotChart, then line graph. This will allow you to compare the three products you chose over time. You can format your table to remove all buttons and add a vertical axis label and a meaningful chart title. How do the three products compare over time for your variable of choice?

Step 3: Write a professional report

Your professional report should include the following:

1) A cover page with a meaningful title, your name, and course information

2) An executive summary that describes your data, what you did in your analysis, and a few key findings

3) Sections with the following:

a. A description of your cleaning process for each of your three datasets and information about the resulting three datasets (elements, variables, potential research questions, etc—i.e. answer questions posed in step 2)

b. A summary of your xsec data including your summary statistics and outlier analysis (i.e. step 2, parts 1 & 2)

c. A summary of your tseries data including line chart, histogram, and discussion of the data distribution (i.e. step 2, parts 3 & 4).

d. A summary of your panel data including the 3 pivot tables, pivot chart, and comparison of the three products and variables investigated (i.e. step 2, parts 5, 6, & 7).

This should be written in your own words based on what you have learned so far in this class. This report will be checked for plagiarism as will all analysis. It will be graded based on the following rubric:

Part

Points

Excel spreadsheet (original data, xsec, tseries, panel, cross tabs, comparison)

6

Cover page and professional formatting of the report

1

Executive summary (data description, what you did, key findings)

3

Data cleaning process summary

3

Cross-sectional analysis (summary statistics and outlier analysis)

4

Time-series analysis (line chart, histogram, and distribution analysis)

4

Panel data analysis (3 pivot tables, pivot chart, and comparisons analysis)

4

Total

25

2

image1.png

image2.png

image3.png

image4.png

image5.png

image6.png

image7.png

image8.png

image9.png