Calculating public burden using OIRA data
An experiment in using open data to make government better
Recently, the new Administration issued an Executive Order on Reducing Regulation and Controlling Regulatory Costs. As part of this effort, the Administration is supposed to offset the costs of new regulations.
So, that got me thinking. The Office of Information and Regulatory Affairs (OIRA) is charged with reviewing not only regulations but also agencies' information-collection requests under the Paperwork Reduction Act. As part of that review, OIRA and the agencies are supposed to track the public burden associated with each information collection.
As a thought experiment, I decided to see whether we could find some low-hanging fruit, namely paper-based information requests. And the results were interesting...
The analysis
First, we need to find the data. Fortunately, that data is already available in bulk from OIRA. Well done, OIRA.
From here, it's simple. First we use lxml to parse the XML file.
from lxml import etree
tree = etree.parse('CurrentInventoryReport.xml')
root = tree.getroot()
So, now that we have the data and it's parsed, where to begin? Let's see what this data looks like by checking out the first Information Collection Request in the data.
print(etree.tostring(root[0], pretty_print=True).decode('UTF-8'))
Well, would you look at that?! There's an AvailableElectronically element.
So, how about we try to find all the agencies that have information collection requests that are not available electronically?
To do this, we use xpath to find all the Information Collection Requests that have an "AvailableElectronically" element with the value "No". Then, we simply pick the fields we want and collect them into a list of Python dicts, one per request.
results = []

def getInfoRequests(element):
    # Pull the fields of interest from each InformationCollection under a request.
    res = []
    collections = element.xpath('./InformationCollections/InformationCollection')
    for collection in collections:
        res.append({
            'title': str(collection.xpath('./Title/text()')[0]).strip(),
            'obligation': str(collection.xpath('./ObligationCode/text()')[0]).strip(),
            'affected': str(collection.xpath('./AffectedPublicCode/PublicCode/text()')[0]).strip(),
            'number_responses': str(collection.xpath('./NumberResponses/AnnualQuantity/text()')[0]).strip(),
            'burden_hour': str(collection.xpath('./BurdenHour/TotalQuantity/text()')[0]).strip(),
            # A collection may list several reporting frequencies, so keep the whole list.
            'frequency': collection.xpath('.//ReportingFrequency/text()'),
        })
    return res
# Select every request that contains at least one AvailableElectronically element equal to "No".
requests = root.xpath('//InformationCollectionRequest[.//AvailableElectronically/text()[. = "No"]]')
for request in requests:
    results.append({
        "agency_code": request.xpath('./AgencyCode//text()')[0],
        "omb_control_number": request.xpath('./OMBControlNumber//text()')[0],
        "icr_reference_number": request.xpath('./ICRReferenceNumber//text()')[0],
        "title": str(request.xpath('./Title//text()')[0]).strip(),
        "abstract": str(request.xpath('./Abstract//text()')[0]).strip(),
        "expiration_date": str(request.xpath('./Expiration/ExpirationDate//text()')[0]).strip(),
        "burden": int(request.xpath('./Burden/BurdenHour/TotalQuantity/text()')[0]),
        "requests": getInfoRequests(request),
        "cost": int(request.xpath('./Burden/BurdenCost/TotalAmount/text()')[0]),
    })
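A quick aside on that XPath predicate, since it does the heavy lifting. Here's a minimal sketch of how it behaves on a toy document. Only the element names come from the code above; the SAMPLE document, its nesting, and the control numbers are made up for illustration and are not real OIRA data.

```python
from lxml import etree

# Toy stand-in for the OIRA report. The element names mirror the ones used
# in this post; the structure and values are invented for illustration.
SAMPLE = """
<Report>
  <InformationCollectionRequest>
    <OMBControlNumber>0000-0001</OMBControlNumber>
    <InformationCollections>
      <InformationCollection>
        <AvailableElectronically>No</AvailableElectronically>
      </InformationCollection>
    </InformationCollections>
  </InformationCollectionRequest>
  <InformationCollectionRequest>
    <OMBControlNumber>0000-0002</OMBControlNumber>
    <InformationCollections>
      <InformationCollection>
        <AvailableElectronically>Yes</AvailableElectronically>
      </InformationCollection>
    </InformationCollections>
  </InformationCollectionRequest>
  <InformationCollectionRequest>
    <OMBControlNumber>0000-0003</OMBControlNumber>
    <InformationCollections>
      <InformationCollection>
        <AvailableElectronically>Yes</AvailableElectronically>
      </InformationCollection>
      <InformationCollection>
        <AvailableElectronically>No</AvailableElectronically>
      </InformationCollection>
    </InformationCollections>
  </InformationCollectionRequest>
</Report>
""".strip()

sample_root = etree.fromstring(SAMPLE)

# Same predicate as above: keep a request if ANY descendant
# AvailableElectronically element has the text "No".
matches = sample_root.xpath(
    '//InformationCollectionRequest[.//AvailableElectronically/text()[. = "No"]]'
)
matched_numbers = [m.findtext('OMBControlNumber') for m in matches]
print(matched_numbers)  # ['0000-0001', '0000-0003']
```

Notice that the third request matches even though one of its two collections is available electronically. If the real data contains mixed requests like that, the request-level burden summed below would include hours for collections that can already be filed online, so the total is best read as an upper bound on the paper-only burden.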
Now that we have a list of Python dicts, time for the payoff. We sum the burden across all of the matching Information Collection Requests and print the results.
burden = sum([result["burden"] for result in results])
agencies = set([result["agency_code"] for result in results])
print("There are %s information requests that cannot be filed electronically from %s different agencies with a total public burden of %s hours." % (len(results), len(agencies), "{:,}".format(burden)))
That's not a typo. That's a total of 3.3 billion hours of public burden associated with paper-based information requests. Seems like a target-rich environment.
Unfortunately, I've run out of time to really get in there and visualize where to begin. So, for now, I'll simply save the results in a JSON file for later.
import json
with open('results.json', 'w') as fp:
    json.dump(results, fp, indent=2)
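For whenever I pick this back up, one plausible first step is ranking agencies by total burden. Here's a rough sketch of that idea, not something I ran against the real data. The burden_by_agency helper and the sample_results records are hypothetical; the agency codes and hour counts are invented, and only the "agency_code" and "burden" keys come from the dicts built above.

```python
from collections import defaultdict

# Hypothetical records shaped like the entries in `results`; the agency
# codes and hour counts are invented for illustration only.
sample_results = [
    {"agency_code": "AAAA", "burden": 1200},
    {"agency_code": "BBBB", "burden": 300},
    {"agency_code": "AAAA", "burden": 800},
]

def burden_by_agency(records):
    """Return (agency_code, total_hours) pairs, largest total first."""
    totals = defaultdict(int)
    for record in records:
        totals[record["agency_code"]] += record["burden"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = burden_by_agency(sample_results)
for agency, hours in ranking:
    print("{}: {:,} hours".format(agency, hours))
```

With the real data, the same function should work on the list you get back from json.load on results.json, since each saved record carries both keys.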
Hope you enjoyed this little exploration of how open government data can be used to make government work better. It's important to note that none of this would be possible if OIRA did not publish the data in bulk... Again, nice work, OIRA.