ASSESSMENT 3
Reporting and Factsheets with Pandas
Overview
Before we begin, here is a high level comparison of the libraries presented in this post:
|
Library |
Technology |
Summary |
|
Pandas + HTML |
HTML |
You can generate beautiful reports in the form of static web pages if you know your way around HTML + CSS. The HTML report can also be turned into a PDF for printing. |
|
Pandas + Excel |
Excel |
This is a great option if the report has to be in Excel. It can be run on a server where Excel is not installed, i.e. it’s an ideal candidate for a “download to Excel” button in a web app. The Excel file can be exported to PDF. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Pandas
It’s incredibly easy to create Pandas DataFrames with data from databases, Excel and csv files or json responses from a web API. Once you have the raw data in a DataFrame, it only requires a few lines of code to clean the data and slice & dice it into a digestible form for reporting. Accordingly, Pandas will be used in all sections of this blog post, but we’ll start by leveraging the built-in capabilities that Pandas offers for reports in Excel and HTML format.
Pandas + Excel
Required libraries: pandas, xlsxwriter
If you want to do something slightly more sophisticated than just dumping a DataFrame into an Excel spreadsheet, I found that Pandas and XlsxWriter is the easiest combination, but others may prefer OpenPyXL. In that case you should be able to easily adopt this snippet by replacing engine='xlsxwriter' with engine='openpyxl' and changing the book/sheet syntax so it works with OpenPyXL:
import pandas as pd
import numpy as np
# Sample DataFrame
df = pd.DataFrame(np.random.randn(5, 4), columns=['one', 'two', 'three', 'four'],
index=['a', 'b', 'c', 'd', 'e'])
# Dump Pandas DataFrame to Excel sheet
writer = pd.ExcelWriter('myreport.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', startrow=2)
# Get book and sheet objects for futher manipulation below
book = writer.book
sheet = writer.sheets['Sheet1']
# Title
bold = book.add_format({'bold': True, 'size': 24})
sheet.write('A1', 'My Report', bold)
# Color negative values in the DataFrame in red
format1 = book.add_format({'font_color': '#E93423'})
sheet.conditional_format('B4:E8', {'type': 'cell', 'criteria': '<=', 'value': 0, 'format': format1})
# Chart
chart = book.add_chart({'type': 'column'})
chart.add_series({'values': '=Sheet1!B4:B8', 'name': '=Sheet1!B3', 'categories': '=Sheet1!$A$4:$A$8'})
chart.add_series({'values': '=Sheet1!C4:C8', 'name': '=Sheet1!C3'})
chart.add_series({'values': '=Sheet1!D4:D8', 'name': '=Sheet1!D3'})
chart.add_series({'values': '=Sheet1!E4:E8', 'name': '=Sheet1!E3'})
sheet.insert_chart('A10', chart)
writer.save()
Create the report
Pandas + HTML
Required libraries: pandas, jinja2
Creating an HTML report with pandas works similar to what’ve just done with Excel: If you want a tiny bit more than just dumping a DataFrame as a raw HTML table, then you’re best off by combining Pandas with a templating engine like Jinja :
First, let’s create a file called template.html:
<html>
<head>
<style>
* {
font-family: sans-serif;
}
body {
padding: 20px;
}
table {
border-collapse: collapse;
text-align: right;
}
table tr {
border-bottom: 1px solid
}
table th, table td {
padding: 10px 20px;
}
</style>
</head>
<body>
<h1>My Report</h1>
{{ my_table }}
<img src='plot.svg' width="600">
</body>
</html>
Then, in the same directory, let’s run the following Python script that will create our HTML report:
import pandas as pd
import numpy as np
import jinja2
# Sample DataFrame
df = pd.DataFrame(np.random.randn(5, 4), columns=['one', 'two', 'three', 'four'],
index=['a', 'b', 'c', 'd', 'e'])
# See: https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html#Building-styles
def color_negative_red(val):
color = 'red' if val < 0 else 'black'
return f'color: {color}'
styler = df.style.applymap(color_negative_red)
# Template handling
env = jinja2.Environment(loader=jinja2.FileSystemLoader(searchpath=''))
template = env.get_template('template.html')
html = template.render(my_table=styler.render())
# Plot
ax = df.plot.bar()
fig = ax.get_figure()
fig.savefig('plot.svg')
# Write the HTML file
with open('report.html', 'w') as f:
f.write(html)