data analysis about case
2022/4/28 00:31 Individual Assignment 1: BUSS6002 Data Science in Business
https://canvas.sydney.edu.au/courses/39595/pages/individual-assignment-1 1/3
Individual Assignment 1
General Notices
1. All plots, analyses and technical work must be completed using Python
2. The late penalty is 5% of the assignment mark per day starting at the due date.
3. The assignment is marked anonymously.
4. Collusion and plagiarism are obvious to markers and will not be tolerated.
Academic Integrity
Please be aware of the University’s academic integrity policies. Issues of academic integrity are taken seriously by the University and the BUSS6002 team. If you are suspected of dishonest behaviour, you will be referred to the Academic Integrity Office, which will process your case. This may result in delayed results, mark reduction, failure of the unit or expulsion.
Dishonest behaviour includes but is not limited to:
using contract cheating services
plagiarism such as copying phrases, paragraphs etc
not appropriately referencing
You are encouraged to refer to the full policy and guidelines on the University of Sydney website (https://canvas.sydney.edu.au/courses/39595/assignments/372346): https://www.sydney.edu.au/students/academic-integrity.html
You can access the Academic Honesty module on Canvas at any time: https://canvas.sydney.edu.au/courses/29833
Allowed Packages
You must use only the Python packages covered in tutorials e.g. Pandas, NumPy, SciPy, Matplotlib. Use of automatic EDA packages will result in a Fail grade for this component.
Assignment Overview
Due Date: 23:59 Sunday May 1st, 2022
Weight: 20%
Max Length: 1000 words (excluding code and tables); we will stop reading at 1100 words.
Type: Individual
Submission Type: Jupyter Notebook (.ipynb) via Canvas
Problem Background
Bellingen Riverwatch was created to provide consistent water quality data in the Bellinger and Kalang catchments following a disease outbreak that caused a mass death event of the Critically Endangered Bellinger River Snapping Turtle (BRST) in early 2015. A lack of water quality data was identified by scientists and community alike as a priority focus area.
Bellingen Riverwatch engages 43 local community volunteers and 5 schools to collect water quality data every month at 30 sites across the Bellinger, Never Never, and Kalang Rivers.
River health and water quality can change due to a wide range of factors, such as geology, rainfall, vegetation cover, gradient/steepness and size of the catchment, human impacts through land use, natural disasters, climate, and much more. To help build a picture of a catchment’s health, ongoing and regular monitoring of water quality is required to build baseline data: a picture of the conditions for that particular waterway.
Submit Here (https://canvas.sydney.edu.au/courses/39595/assignments)
Question
The Bellingen Riverwatch citizen scientists have asked you to help answer the following question:
What were the water quality grades and scores for Bellingen Riverwatch sites annually and each summer?
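As a starting point for the annual and summer breakdown, here is a minimal pandas sketch. The column names `site`, `date` and `score` and the sample rows are hypothetical placeholders, not the real dataset schema. It assumes "summer" means the austral summer (December to February), labelled by the year in which it starts; check that assumption against the Riverwatch documentation.

```python
import pandas as pd

def add_period_columns(df, date_col="date"):
    """Add calendar-year and austral-summer columns (hypothetical schema)."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    years, months = dates.dt.year, dates.dt.month
    out["year"] = years
    # Assumption: summer is December to February.
    out["is_summer"] = months.isin([12, 1, 2])
    # Label a summer by the year it starts in: Dec 2019 to Feb 2020 -> 2019.
    # This value is only meaningful on rows where is_summer is True.
    out["summer_start"] = years.where(months == 12, years - 1)
    return out

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "site": ["A", "A", "A", "B"],
    "date": ["2019-12-15", "2020-01-20", "2020-07-05", "2020-02-10"],
    "score": [80.0, 90.0, 70.0, 60.0],
})

labelled = add_period_columns(sample)
annual = labelled.groupby(["site", "year"])["score"].mean()
summer = (
    labelled[labelled["is_summer"]]
    .groupby(["site", "summer_start"])["score"]
    .mean()
)
```

Labelling each summer by its starting year keeps December in the same group as the following January and February, which a plain calendar-year grouping would split.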
Requirements
To help answer the question you must perform an exploratory analysis of the data.
Your exploratory analysis must contain the following:
1. A description of the dataset including a complete data dictionary
2. Identification and categorisation of data quality issues
3. Outline of any modifications or data processing accompanied by context driven justification
4. Exploration of relationships and trends, including identification, analysis and any conclusions that can be drawn
Throughout your analysis you must relate your findings to the context of the problem. For example, when investigating data quality issues you will need to consider how the data has been collected and by whom. Such information can be found in the Volunteer Manual.
Your EDA will consist of a Jupyter Notebook, in which you will use Markdown cells to provide commentary on your EDA process.
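A first pass at the data dictionary and the data quality requirements might look like the sketch below. It uses only pandas; the columns `site` and `ph` and their values are invented for illustration, and the checks shown (type, missing values, distinct values, duplicate rows) are one possible starting set rather than a complete audit.

```python
import pandas as pd

def quality_summary(df):
    """Per-column dtype, missing-value count and distinct-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),  # NaN is not counted as a value
    })

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "site": ["A", "A", "B", "B"],
    "ph": [7.1, None, 6.8, 6.8],
})

summary = quality_summary(sample)
# Rows that exactly repeat an earlier row.
n_duplicate_rows = int(sample.duplicated().sum())
```

Each issue the summary surfaces still needs a context-driven explanation, for example whether a missing reading reflects a skipped site visit or a recording error by a volunteer.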
Suggested Structure
1. Data description (100 words)
2. Data dictionary (100 words)
3. Data quality analysis (200 words)
4. Exploratory analysis (500 words)
5. Conclusion (100 words)
Tables are not included in the word count.
Submission Items
You must submit a Jupyter Notebook (.ipynb) file with the following filename format, replacing STUDENTID with your own student ID:
BUSS6002_STUDENTID.ipynb
Data
Download files 1-4 here (ZIP 5.4MB) (https://canvas.sydney.edu.au/courses/39595/files/23011548?wrap=1) (https://canvas.sydney.edu.au/courses/39595/files/23011548/download?download_frd=1)
1. Community water quality data 2017 to current
2. Riverwatch Sites
3. Volunteer Manual
4. Data Quality Statement
5. About Our Program - Bellingen Riverwatch (PDF 48.9 MB) (https://canvas.sydney.edu.au/courses/39595/files/23395173?wrap=1) (https://canvas.sydney.edu.au/courses/39595/files/23395173/download?download_frd=1)
Data Sources
Bellingen Riverwatch - community water quality data 2017 to current
Copyright © The University of Sydney. Unless otherwise indicated, 3rd party material has been reproduced and communicated to you by or on behalf of the University of Sydney in accordance with section 113P of the Copyright Act 1968 (Act). The material in this communication may be subject to copyright under the Act. Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act. Do not remove this notice. Live streamed classes in this unit may be recorded to enable students to review the content. If you have concerns about this, please visit our student guide (https://canvas.sydney.edu.au/courses/4901/pages/zoom) and contact the unit coordinator.
https://datasets.seed.nsw.gov.au/dataset/bellingen-riverwatch
Bellinger River Health Program 2017-2020: DPIE Science Water quality and Macroinvertebrate data
https://datasets.seed.nsw.gov.au/dataset/bellinger-river-health-program-2017-2020-dpie-science-water-quality-and-macroinvertebrate-data
BR Website
https://sites.google.com/ozgreen.org.au/bellingenriverwatch
Marking Criteria
Analysis (5 marks; LO1, LO2, LO3, LO7)
FA: the data is extremely limited or missing. Data dictionary is not provided or is mostly incomplete. Data quality issues are weakly or not considered. A plan is not outlined to manage data quality issues.
PS: complete and correct description of the data is provided. The description is accompanied by a mostly complete data dictionary. A brief discussion of data quality issues is provided and accompanied by an outline of data quality issues. Plan is executed mostly as outlined.
CR: comprehensive, contextually and technically correct description of the data is provided. The description is accompanied by a complete data dictionary. A discussion of data quality issues is provided and accompanied by an outline of data quality issues. Plan is executed mostly as outlined.
DI: comprehensive, contextually and technically correct description of the data is provided. The description is accompanied by a complete data dictionary. An evaluation of data quality issues is provided and accompanied by a plan to manage any data quality issues. Plan is executed as outlined.
HD: comprehensive, contextually and technically correct description of the data is provided. The description is accompanied by a complete data dictionary. An evaluation and analysis of data quality issues is provided and accompanied by a contextually justified plan to manage any data quality issues. Plan is executed as outlined.
Exploratory Description or A description A clear and A clear and A detailed and