Recovery dot gov Congressional Testimony

Page history last edited by greg@fotonotes.net 12 years, 10 months ago

I have been asked to testify at a hearing of the House Committee on Oversight and Government Reform on Thursday, March 19, 2009. It is entitled "Preventing Stimulus Waste and Fraud: Who Are the Watchdogs?" and it will focus on accountability for stimulus spending. I will talk about how third parties can build interesting tools to help citizens find and sort spending, jobs, and performance information if only government provides the data in a complete, timely, and standardized manner.


Below is a draft of my written testimony. As a way to illustrate the concept of crowdsourcing to the Committee (and to make myself seem smarter than I am) I thought I would ask you all to help me edit the testimony. Please feel free to add anything I may have missed and to make any changes you see fit. Thanks for your help!


To contribute, you will need to click the "Edit" button and then ask for permission to edit the wiki (it doesn't let me give automatic access). I will grant you permission immediately. My testimony is due by C.O.B. tomorrow, and I will incorporate all changes that I would feel comfortable testifying to. -JB



Testimony of

Jerry Brito, J.D.

Senior Research Fellow

Mercatus Center at George Mason University


Before the

House Committee on Oversight and Government Reform


March 19, 2009


Mr. Chairman and Members of the Committee:


Thank you for inviting me to testify on “Preventing Stimulus Waste and Fraud.” Over the last few years my research has focused on how Internet technologies can be leveraged by government and citizens to increase transparency and thereby ensure accountability. I’m happy to share with you some of the things I have learned.


I don’t have to tell this Committee why it is so important to keep close tabs on the nearly $800 billion of spending that will take place as a result of the American Recovery and Reinvestment Act. You are one of the most important institutional organs of oversight. But you cannot do it alone, and the public is eager to help.  Perhaps most importantly, oversight is not accomplished at a single point in time.  It is best accomplished through continuous, multifaceted analysis.


We have a constitutional structure in which the Congress, both the House and the Senate, is invited to oversee the Executive Branch, and the Judiciary oversees both.  For our separation of powers to work well, the public needs to oversee all three.  The formal mechanisms of oversight require informal oversight. And we are moving into a networked media environment where direct access to data will allow a wide variety of people and organizations to conduct essentially direct oversight of you in the government and of programs like the recent economic stimulus bill.  We in the transparency community want access to data so that we can do this public oversight.


The question is: how do we do it?



Crowdsourcing Accountability


The dozens of inspectors general and official auditors around the country who will follow this money do commendable work, but they can’t possibly look at every payment and every transaction. While we might want to, we can’t hire an army of auditors charged with tracking every single dollar. However, we can supplement the very small number of professional auditors with a very large number of small contributions from citizens.  This is an approach sometimes called crowdsourcing, in which complex tasks are distributed among a wide community of interest.  Almost any short phrase entered into Google, for instance, will retrieve a top-ranked result from Wikipedia, an entirely crowdsourced encyclopedia.[1]


If the government requires clear, timely, and profound reporting of how every dollar is spent, everyone—not just government auditors—could keep track of the money. This would mean that millions of citizens around the country would be able to look at the transactions related to recovery-funded projects in their neighborhoods. Thousands of journalists could also keep an eye on the spending and the work being done in the communities they serve. Contractors would be able to keep an eye on their competitors, and academics and watchdog groups could sift through the spending data to find interesting patterns.


The point of this would not be to foster “gotcha” games. Sure, we want to suss out fraud, waste, and abuse where it can be found, but more importantly, we want to make sure that money is being spent wisely and that projects are being run efficiently.  Crowdsourcing is one way to overcome the temporal problems associated with traditional oversight.  Local passions, ignited by the spark of local projects, are likely to increase with the passage of time and keep all participants in the economic recovery honest and on track even after traditional watchdogs have turned their attention to the next problem.


How does government go about enlisting the help of citizens around the country to help keep recovery spending accountable? It doesn’t have to. If the government makes available the raw spending data, third parties will build tools that allow citizens to sift, sort and report it. In fact, there is a strong community of transparency activists and enthusiasts eager to do so.


Earlier this year I launched the website StimulusWatch.org with the help of two very talented volunteer software developers, Peter Snyder and Kevin Dwyer. The site presents the nearly 20,000 “shovel-ready” projects that the U.S. Conference of Mayors has reported as candidates for stimulus funding. Citizens can easily find a list of projects in their hometown and then rate, discuss, and add factual context to each project. The site has received 2 million unique visits in its first month.


Within hours of launching the site, projects such as golf courses and dog parks had been found by users and voted to the top of the least critical projects list. On the other hand, the web pages of projects that at first blush seemed unworthy of funding were heavily annotated by visitors with factual information and explanations of a project’s merits.


Now that the Recovery Act has passed, we want to expand the capabilities of our site to allow citizens to track projects in their communities that are indeed funded. Among other things, we would like to build a tool that allows citizens to discuss a project, track payments related to a project and annotate them, and rate a project’s performance.


I know that other web developers would like to make similar tools, including applications to track job creation and to plot stimulus dollars on a map coded with unemployment and other statistics. There is no limit to the number of innovative presentations that public-minded netizens can create.


But before we can build any of these tools of accountability, we need the raw spending data. As I said before, citizen participation in the accountability process requires clear, timely, and profound reporting of how every dollar is spent. Without a question, the most effective way that exists today to make available such a large dataset is by making it available online in useful formats. The key there is that last phrase: useful formats.


What does this mean? It means that the data is presented in a standard, Web-friendly, machine-readable format that can be aggregated, parsed, and sorted. Although my techie friends will give me grief for simplifying it this way, think of it as rows in a spreadsheet with standardized column headings.


For example, one could conceive of a full and thorough disclosure of spending made in prose, or even in haiku. While such a report could fully account for every dollar, a computer could not analyze it. In contrast, the same disclosure in spreadsheet form allows one to sort by any column that is made available: from low to high dollar amount, by state or city, or by contractor.


When the data is made available in a nonproprietary structured format such as Extensible Markup Language (XML), using a common standard for the expression of required information, a citizen could sort in much more complicated ways. For example, you could easily look up the top ten payments to contractors with names that begin with the letter “R” in a particular congressional district. More importantly, information made available in useful formats allows third parties to build interesting tools such as StimulusWatch.org. On their website, the Sunlight Foundation offers a great list of such third-party tools.[2]
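To make the "top ten payments" query concrete, here is a minimal sketch in Python. It assumes a purely hypothetical XML layout: the element name `payment`, the attributes `district`, `contractor`, and `amount`, and every sample value are invented for illustration and do not reflect any published Recovery.gov schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. Element names, attribute names, and values are
# assumptions for illustration only, not a real Recovery.gov format.
SAMPLE_XML = """
<payments>
  <payment district="NJ-10" contractor="Riverside Paving" amount="250000"/>
  <payment district="NJ-10" contractor="Acme Sanitation" amount="900000"/>
  <payment district="NJ-10" contractor="Roberts Electric" amount="410000"/>
  <payment district="VA-08" contractor="Redline Concrete" amount="780000"/>
</payments>
"""


def top_payments(xml_text, district, initial, limit=10):
    """Return the largest payments in one district to contractors whose
    names start with the given letter, as (contractor, amount) pairs."""
    root = ET.fromstring(xml_text.strip())
    matches = [
        (p.get("contractor"), float(p.get("amount")))
        for p in root.iter("payment")
        if p.get("district") == district
        and p.get("contractor", "").upper().startswith(initial.upper())
    ]
    # Largest payments first, then keep only the requested number.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:limit]


result = top_payments(SAMPLE_XML, "NJ-10", "R")
for contractor, amount in result:
    print(f"{contractor}: ${amount:,.0f}")
```

The same few lines would work on any number of reports, provided they all share one agreed set of field names, which is the practical reason a common standard matters.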


Clarifications Needed


The American Recovery and Reinvestment Act calls for the disclosure of spending information online. However, its provisions are vague and do not require structured machine-readable formats. The Office of Management and Budget has issued guidance to federal agencies on how they should comply with Recovery Act reporting requirements, but that document also leaves many questions unanswered.


There are four key issues that the Administration and the Recovery Accountability and Transparency (RAT) Board should address soon. These are the depth of disclosure, standardization, aggregation, and centralized search.


First is the question of how deeply disclosure will go. While the Recovery Act requires that recipients of federal stimulus funds report, to awarding agencies, how the funds are spent, there is no clear instruction that every level of subcontract or subgrant must be disclosed.[3] The OMB Guidance interpreting the Act for agencies states that,


Reporting requirements only apply to the prime non-Federal recipients of Federal funding, and the subawards (i.e., subgrants, subcontracts, etc.) made by these prime recipients.  They do not require each subsequent subrecipient to also report.  For instance, a grant could be given from the Federal government to State A, which then gives a subgrant to City B (within State A), which hires a contractor to construct a bridge, which then hires a subcontractor to supply the concrete. In this case, State A is the prime recipient, and would be required to report the subgrant to City B. However, City B does not have any specific reporting obligations, nor does the contractor or subcontractor for the purposes of reporting for the Recovery.gov website.[4]


This is very troubling. If we want to ensure meaningful accountability, then we must have transparency at every level of transaction. It is not enough for citizens to know that EPA made a grant to New Jersey, which in turn made a sub-grant to Newark. We also need to know that Newark made a payment to “Acme Sanitation,” which a citizen with local knowledge could recognize as a firm owned by a councilmember’s son-in-law.
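The difference between two levels of reporting and reporting at every level can be sketched with a small nested data structure. The following Python example is illustrative only: the field names (`payer`, `payee`, `amount`, `subawards`) and all dollar amounts are hypothetical, and the chain simply mirrors the EPA, New Jersey, Newark, and “Acme Sanitation” example above.

```python
# Hypothetical chain of awards. Field names and dollar amounts are invented
# for illustration; only the parties come from the example in the text.
AWARD_CHAIN = {
    "payer": "EPA",
    "payee": "State of New Jersey",
    "amount": 5000000,
    "subawards": [
        {
            "payer": "State of New Jersey",
            "payee": "City of Newark",
            "amount": 2000000,
            "subawards": [
                {
                    "payer": "City of Newark",
                    "payee": "Acme Sanitation",
                    "amount": 750000,
                    "subawards": [],
                }
            ],
        }
    ],
}


def disclosed(award, max_depth=None, depth=1):
    """Yield (payer, payee, amount) for each transaction down to max_depth.
    A max_depth of None means every level of the chain is disclosed."""
    if max_depth is not None and depth > max_depth:
        return
    yield (award["payer"], award["payee"], award["amount"])
    for sub in award["subawards"]:
        yield from disclosed(sub, max_depth, depth + 1)


# Prime award plus the prime recipient's subaward only.
two_levels = list(disclosed(AWARD_CHAIN, max_depth=2))
# Every transaction, all the way down the chain.
every_level = list(disclosed(AWARD_CHAIN))

print("Two levels:", [payee for _, payee, _ in two_levels])
print("Every level:", [payee for _, payee, _ in every_level])
```

Under the two-level reading, the payment to the local contractor never appears in the output, so a citizen with local knowledge has nothing to recognize.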


Congress and the Administration should make it clear that in fact every dollar will be accounted for, all the way down the chain. They should also make it clear that the full reports will be published online in useful formats. Right now, despite the Act’s mandate for a transparency website, there is nothing in the Act or the Guidance guaranteeing that the complete dataset of recipient reports will be made available online.[5]


The second key issue that should be clarified is standardization. At this point, the OMB Guidance does not explain what fields we should expect to see published on Recovery.gov, if and when spending reporting becomes available. That is, we don’t know what the columns of our metaphorical spreadsheet will be; we don’t know by what data fields we will be able to sort.


The Act requires that initial recipients report spending using “data elements required to comply with the Federal Funding Accountability and Transparency Act[.]”[6] These include such elements as the name of the entity receiving the award, the amount of the award, program source, description, city and state, etc. But what data elements will actually be published has not been addressed. Nor do we know in what format we can expect it.


Those of us who plan to make use of Recovery.gov data for the public’s benefit would like to know as soon as possible what exactly Recovery.gov will offer so that we can begin working on our applications. Additionally, knowing ahead of time what standards are in the works will allow us to give feedback to the team building the government’s transparency site about our needs.


Closely related to standardization is the third issue of aggregation. When information sharing is standardized along critical dimensions of who, what, where, and when, it becomes much easier for computers to aggregate, or roll up, information automatically. The Recovery.gov website is already nicely aggregating public relations announcements from respective agencies. We next need more information about how the financial and performance data will be aggregated.
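As a rough illustration of why shared field names make roll-ups nearly automatic, here is a minimal Python sketch. The two agency reports, their field names (`recipient`, `state`, `quarter`, `amount`), and every figure are hypothetical; the only point is that one short function can total spending along any common dimension.

```python
from collections import defaultdict

# Hypothetical reports from two agencies that use the same field names.
# All recipients and amounts are invented for illustration.
EPA_REPORT = [
    {"recipient": "State of New Jersey", "state": "NJ", "quarter": "2009-Q2", "amount": 1200000},
    {"recipient": "State of Ohio", "state": "OH", "quarter": "2009-Q2", "amount": 500000},
]
DOT_REPORT = [
    {"recipient": "State of New Jersey", "state": "NJ", "quarter": "2009-Q3", "amount": 800000},
    {"recipient": "State of Ohio", "state": "OH", "quarter": "2009-Q2", "amount": 300000},
]


def roll_up(reports, field):
    """Total the 'amount' column across all reports, grouped by one field."""
    totals = defaultdict(int)
    for report in reports:
        for row in report:
            totals[row[field]] += row["amount"]
    return dict(totals)


by_state = roll_up([EPA_REPORT, DOT_REPORT], "state")
by_quarter = roll_up([EPA_REPORT, DOT_REPORT], "quarter")
print(by_state)
print(by_quarter)
```

If each agency chose its own column names, the same roll-up would need a hand-written translation step per agency before any totals could be computed.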


The fourth and final issue that should be clarified is centralized search. The distributed nature of the projects means information will ultimately come from many sources, just as information on the Web comes from many sources. But for the information to be user friendly, it must be searchable from central locations by both humans and computers, just as search engines provide one-stop search for the Web. Citizens and application developers should be able to go to at least one central location (presumably Recovery.gov) to find every single reporting dataset. Providing centralized search does not imply a monopoly. The SEC's EDGAR database centralizes SEC filings, but third parties provide alternative, value-added centralized search, too.


The Act requires agencies to publish quarterly spending reports on “a website,”[7] but does not specify which. Reading the Act, one assumes that it is on the agencies’ own websites. The Guidance seems to confirm this, directing agencies to publish reports on a /recovery subdirectory of their main sites. This means that reports will be scattered in dozens of websites around the web.


This approach is not necessarily a bad thing, and in fact might be a good way to ensure scalability. What matters more than where the reports are housed is that the data be in a common structured format, ideally expressed in XML. This does not need to be the final, perfect, national standard, but a common open standard needs to be applied to all Recovery.gov datasets. That said, if all datasets are not housed in a central library at Recovery.gov, then there must at least be one central and easy-to-use card catalog with references to all datasets. Again, it would be useful if we knew ahead of time what we might expect.




On his first day in office, the President signed a “Memorandum on Transparency and Open Government.” The three central themes of the memorandum, to which the President committed the Administration, are transparency, participation, and collaboration.


About public participation, the memorandum states that “Knowledge is widely dispersed in society, and public officials benefit from having access to that dispersed knowledge.” About collaboration, the memorandum states, “Executive departments and agencies should use innovative tools, methods, and systems to cooperate among themselves, across all levels of Government, and with nonprofit organizations, businesses, and individuals in the private sector.”


There is indeed a community of interested and knowledgeable parties who want to participate and collaborate to make the online disclosure of recovery spending data succeed. For example, a wide range of groups and individuals from all parts of the political spectrum have formed a Coalition for an Accountable Recovery, and I commend to you and the Administration its vision statement and proposed online transparency architecture, which are attached.[8]


At this time, I’m happy to report that the Administration has been quite good at listening and taking suggestions from those of us who are interested in recovery data. Unfortunately, it has not been as good at sharing information in return, something necessary to true collaboration.


There are many of us who would like to begin preparing, as soon as possible, to build the accountability tools that the American people will use. But we need to know what we can expect from Recovery.gov, and that has not been forthcoming. We need to know sooner rather than later how deeply the disclosures will go. Anything short of every level is not acceptable. We also need to know what data fields and formats we can expect the data to be published in, and where and how we will be able to access it. Ideally, there would be a dialog with the folks building Recovery.gov so that we can learn what they are planning and we can tell them what we’d like to see included.


Thank you.

[1] Jerry Brito, Hack, Mash & Peer: Crowdsourcing Government Transparency, 9 Columbia Science & Technology Law Review 119 (2008), available at http://www.stlr.org/html/volume9/brito.pdf


[2] http://sunlightfoundation.com/resources/


[3] ARRA § 1512


[4] OMB Guidance page 14-15


[5] explain what’s troubling with § 1512 (d)


[6] § 1512 (c)(4)


[7] § 1512 (d)


[8] Also available at http://www.ombwatch.org/car

Comments (7)

astone said

at 12:36 pm on Mar 16, 2009


Would it be worthwhile to remind the committee that the SEC's use of XBRL is/has been reasonably successful?

astone said

at 12:40 pm on Mar 16, 2009

A couple more thoughts Jerry.

-You may want to give the briefest definition of "crowdsourcing."
-You may want to mention that your testimony has been crowdsourced - with great success.

Ted Smith said

at 12:48 pm on Mar 16, 2009

Consider whether it would make sense to organize the testimony slightly differently so that these leaders can unpack all that you have to say into discrete chunks: 1) background/context (we have made promises about transparency), 2) initial steps have been taken with success at places like StimulusWatch, 3) interested citizens deserve to be handed a schedule of which data will be available when (having no answers here ultimately disables or greatly slows the benefits of these initiatives), and 4) actually delivering the data in standard formats. In your current flow, you do an excellent job with the setup, but the clarification section would benefit from breaking up the assignment into getting a draft plan ASAP and then data on whatever schedule is practical.

Eric Kansa said

at 8:44 am on Mar 17, 2009

Added some stuff about the sharing of structured data. The main thing is that we want to emphasize the need for data in OPEN (nonproprietary) formats that allow people to use these data with a variety of tools. I also said that it should be done in a "Web-friendly" manner, to show how the information architecture side of this is important. For example, you can have disclosures done in a way that only enterprise computing geeks could understand, and that wouldn't be very transparent.

Last point: aggregation is essential and is a key value of machine-readable data. I think this may be missing from the testimony draft. It is not feasible to simply examine each individual disclosure report (even if presented in a machine-readable format). A key for this to work is that the data need to be easy to aggregate together (XML is great for this, and you do get it for free when using a common XML-expressed document model across all of these different agencies). When you can aggregate data, you can look at general patterns and how specific instances fit or diverge from the norm. In my view, that is an essential requirement for oversight.

Good luck!

Eric Kansa said

at 8:56 am on Mar 17, 2009

One more thing: we have a demonstration of what one could do with Recovery data (and a report discussing some of the technical / architecture issues) at http://isd.ischool.berkeley.edu/stimulus/2009-029/. Here is an example visualization of simulated recovery data using Google Maps and a timeline: http://isd.ischool.berkeley.edu/stimulus/2009-029/timemap/ (it's a demo only, not the prettiest web page you'll ever see).

astone said

at 10:27 am on Mar 17, 2009

"...in any structured format, whether XML or Excel." Oh, I get it. A pun! Sorry to be so thick.

Eric Kansa said

at 10:35 am on Mar 17, 2009

Regarding astone's last comment. It is an excellent pun, but I entirely disagree with the statement "This approach is not necessarily a bad thing, and in fact might be a good way to ensure scalability. It is also more important to have data in any structured format, whether XML or Excel, than to wait for or require the use of a specific, national standard."

Excel is only acceptable if accompanied by XML. It is not at all acceptable by itself, since it is too difficult to aggregate multiple Excel files (which are in a proprietary format). Easy aggregation is a really, really important requirement for transparency, and you won't get that with Excel. Data should be required first to be delivered in XML, according to a common schema, and Excel should be an optional second.
