Book Review: The Rise of Big Data Policing

Book Review by Frank Cerwin:

The Rise of Big Data Policing

By Andrew Guthrie Ferguson, New York University Press, 2017

How do you feel about being watched, surveilled, and tracked? How about knowing that you’re targeted as a potential victim? Or as a potential perpetrator of a criminal act? This is a very interesting subject in light of the violence we’re currently experiencing in our society. The title piqued my interest for this reason and one other: I began my IT career in county government in the 70s, and one of my first projects was to design and develop the initial automation and analysis of the RAP (Report of Arrest and Prosecution) sheets for the sheriff and municipal police departments.

This book explores the subjects of whom we police, where we police, and when we police. Traditional policing reacts to calls for service, observations of police on patrol, and responses to community complaints. Big Data provides a new basis for policing. Big Data that is searchable by police includes 13 million records maintained by the National Crime Information Center (NCIC) that contain arrests, warrants, gang affiliations, terrorism ties, fugitive status, gun ownership, and even car and boat licenses. It also includes where a person works and goes to school, their biometric data (e.g., DNA, fingerprints, and photos), and appearance characteristics such as tattoos, facial hair, and scars. Law enforcement also acquires data from data brokers that may include a person’s social network, views, “likes”, “links”, and “loves”. The Domain Awareness System (DAS) in NYC uses 9,000 closed-circuit cameras to monitor for suspicious behavior. Facial recognition can be used to compare mug shots and driver’s license photos to identify a person at a distance of up to 600 feet. Some police departments employ a system called ShotSpotter that uses audio sensors to pick up gunshots. Automated License Plate Readers (ALPRs) can scan license plates and compare them against a database of active warrants, stolen vehicles, and unpaid tickets. Stingray devices can be used to mimic a cell tower and capture phone numbers, location, date, time, and call duration. Of course, criminals don’t stand still; they seek methods to counter these law enforcement tools, such as disposable phones, audio jamming devices, and even anti-drone hoodies.

This book addresses the question of how Big Data technologies may distort traditional Fourth Amendment rules that protect citizens against unreasonable searches and seizures by law enforcement. What constitutes reasonable suspicion when police lack visibility into, and understanding of, the opaque proprietary vendor analytical models used with Big Data? Analytical model algorithms are designed by humans, leaving open the possibility of unrecognized human biases that may taint the process. Machine Learning (ML) algorithms are ever-evolving, never static. Can the results of a model be validated when it continually changes in real time? Additionally, algorithms are no better than the data they process, which may be unreliable, outdated, or biased. Moving to a Big Data analytics approach further creates data quality risks since crime statistics, tips, cooperating witnesses, nicknames, and detective notes are not uniform and not always accurate. The geographic area from which the data is collected is a factor and impacts police focus. Once a crime is solved and perpetrators are caught, is the data adjusted to realign the targeting of an area for future surveillance?

The author highlights four root data problems. The first is a definitional bias related to the labeling of crimes. Does breaking and entering consistently get coded as a “burglary” even when nothing is reported as stolen? What constitutes a “violent” crime across multiple jurisdictions? The second is an algorithm training bias based on historical data where the “why” is not known, because Big Data only provides a perspective of correlation, not cause. The third is a feature selection bias related to the size of the data collection area, the defined collection time frame, and the specific features chosen to be collected. The last is proxy bias, whereby data encodings can be redundant when the membership of a criminal act is encoded in other data.

The author concludes with the following questions that can relate to Big Data analysis beyond law enforcement:

Can you identify the risk that Big Data is trying to address?
Can you defend the inputs into the system? (Inclusive of data quality and methodology)
Can you defend outputs and the impact of the outputs?
Can you test the technology to provide a measure of transparency?
Is the use of technology respectful of the autonomy of the people it will impact?

All in all, a fascinating book with a lot of food for thought about the balance among the use of data, privacy, and public safety.