A Book Review (and how it helped me rethink "data quality")
There are stories in the data. You just have to know how to read them.
So opens Dan Bouk’s “Democracy’s Data: The Hidden Stories about the U.S. Census and How to Read Them.” The book has received incredibly positive reviews and for good reason: it’s beautifully written and insightful scholarship.
Bouk claims to specialize in studying “modern things shrouded in cloaks of boringness”. And he’s clearly very good at it.
And yet, convincing you to read Bouk’s book is not the point of this post. Instead, I wanted to share that Bouk’s book caused me to reconsider a concept that is pretty familiar in the world of data: “data quality”.
Data quality is the sort of thing that people who deal with data think about a lot and talk a lot about. We’ll observe that, when making decisions based on data, we need to make sure that we have “good data”. Bad data, on the other hand, causes people to knowingly observe “garbage in, garbage out”.
In the preface of his book, Bouk makes a statement that (at first blush) runs in the same direction about data quality: “what is our democracy, if this is its data?” As I started on the book, I imagined: “yes, surely the book will reveal how an overconfident census leads to all kinds of political problems. Garbage in, garbage out, right?” And, sure, I guess there’s some of that in there.
But reading Democracy’s Data offers an alternative framing: behind any data are stories. When you read data, if you read it deeply and with dignity, you can learn greater truths about the motivations and goals across the data lifecycle.
Bouk’s effort largely returns to the “doorstep interactions” of the decennial census, and he describes all of the challenges of capturing, classifying, and reporting the number of people in our country and facts about them. Issues of race, gender, politics, and more suffuse the “simple” counting of people on a given day. But by reading in the data (and what was left out of the data), we better understand the complexities and richness of our nation. We see how people imagined and contributed to a better world, lived through fear and struggle, created beauty and art. We see how the “mistakes” of the counts – and the omissions from the counts – revealed and reflected people more accurately.
This book taught me, therefore, to remove the phrase “garbage in, garbage out” from my vocabulary when talking about data. Instead, when confronted with the inevitable “data quality” issues that affect any organization, I will try to approach the discussion with greater curiosity about the question: “what is this organization, if this is its data?”
Data, as it happens, is never really garbage. Rather, looking for the stories behind the data allows questions of data quality to be an entry point to study the motivations and actions of the collectors, the processors, the managers, the publishers. It may be messy (just like democracy), but only by better understanding it can we truly make progress.