ASSESSMENT 3

profileNIYATI
CreatingdatareportwithPandas.docx

Reporting and Factsheets with Pandas

Overview

Before we begin, here is a high level comparison of the libraries presented in this post:

Library

Technology

Summary

Pandas  + HTML

HTML

You can generate beautiful reports in the form of static web pages if you know your way around HTML + CSS. The HTML report can also be turned into a PDF for printing.

Pandas  + Excel

Excel

This is a great option if the report has to be in Excel. It can be run on a server where Excel is not installed, i.e. it’s an ideal candidate for a “download to Excel” button in a web app. The Excel file can be exported to PDF.

Pandas

It’s incredibly easy to create Pandas DataFrames with data from databases, Excel and csv files or json responses from a web API. Once you have the raw data in a DataFrame, it only requires a few lines of code to clean the data and slice & dice it into a digestible form for reporting. Accordingly, Pandas will be used in all sections of this blog post, but we’ll start by leveraging the built-in capabilities that Pandas offers for reports in Excel and HTML format.

Pandas + Excel

Required libraries: pandas, xlsxwriter

If you want to do something slightly more sophisticated than just dumping a DataFrame into an Excel spreadsheet, I found that Pandas and XlsxWriter is the easiest combination, but others may prefer OpenPyXL. In that case you should be able to easily adopt this snippet by replacing engine='xlsxwriter' with engine='openpyxl' and changing the book/sheet syntax so it works with OpenPyXL:

import pandas as pd

import numpy as np

# Sample DataFrame

df = pd.DataFrame(np.random.randn(5, 4), columns=['one', 'two', 'three', 'four'],

index=['a', 'b', 'c', 'd', 'e'])

# Dump Pandas DataFrame to Excel sheet

writer = pd.ExcelWriter('myreport.xlsx', engine='xlsxwriter')

df.to_excel(writer, sheet_name='Sheet1', startrow=2)

# Get book and sheet objects for futher manipulation below

book = writer.book

sheet = writer.sheets['Sheet1']

# Title

bold = book.add_format({'bold': True, 'size': 24})

sheet.write('A1', 'My Report', bold)

# Color negative values in the DataFrame in red

format1 = book.add_format({'font_color': '#E93423'})

sheet.conditional_format('B4:E8', {'type': 'cell', 'criteria': '<=', 'value': 0, 'format': format1})

# Chart

chart = book.add_chart({'type': 'column'})

chart.add_series({'values': '=Sheet1!B4:B8', 'name': '=Sheet1!B3', 'categories': '=Sheet1!$A$4:$A$8'})

chart.add_series({'values': '=Sheet1!C4:C8', 'name': '=Sheet1!C3'})

chart.add_series({'values': '=Sheet1!D4:D8', 'name': '=Sheet1!D3'})

chart.add_series({'values': '=Sheet1!E4:E8', 'name': '=Sheet1!E3'})

sheet.insert_chart('A10', chart)

writer.save()

Create the report

Pandas + HTML

Required libraries: pandas, jinja2

Creating an HTML report with pandas works similar to what’ve just done with Excel: If you want a tiny bit more than just dumping a DataFrame as a raw HTML table, then you’re best off by combining Pandas with a templating engine like  Jinja :

First, let’s create a file called template.html:

<html>

<head>

<style>

* {

font-family: sans-serif;

}

body {

padding: 20px;

}

table {

border-collapse: collapse;

text-align: right;

}

table tr {

border-bottom: 1px solid

}

table th, table td {

padding: 10px 20px;

}

</style>

</head>

<body>

<h1>My Report</h1>

{{ my_table }}

<img src='plot.svg' width="600">

</body>

</html>

Then, in the same directory, let’s run the following Python script that will create our HTML report:

import pandas as pd

import numpy as np

import jinja2

# Sample DataFrame

df = pd.DataFrame(np.random.randn(5, 4), columns=['one', 'two', 'three', 'four'],

index=['a', 'b', 'c', 'd', 'e'])

# See: https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html#Building-styles

def color_negative_red(val):

color = 'red' if val < 0 else 'black'

return f'color: {color}'

styler = df.style.applymap(color_negative_red)

# Template handling

env = jinja2.Environment(loader=jinja2.FileSystemLoader(searchpath=''))

template = env.get_template('template.html')

html = template.render(my_table=styler.render())

# Plot

ax = df.plot.bar()

fig = ax.get_figure()

fig.savefig('plot.svg')

# Write the HTML file

with open('report.html', 'w') as f:

f.write(html)