Research Brief

Fairness in Law Enforcement Facial Recognition

This research examines fairness, subgroup performance, and deployment risk in facial recognition systems used in law enforcement and high-impact biometric decision environments.

Why This Matters

Facial recognition systems can achieve strong aggregate performance while still producing uneven error rates across demographic or operational groups. In law enforcement contexts, these errors carry serious consequences: a false positive can implicate the wrong person, while a false negative can let a genuine match go undetected.

Evaluation must therefore move beyond overall accuracy and examine false positive rates, false negative rates, subgroup reliability, and deployment suitability.
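As a minimal sketch of what subgroup-level evaluation involves, the Python below computes the false positive rate and false negative rate per group and compares them with overall accuracy. The function names, group labels, and toy data are hypothetical and are not taken from the paper; they only illustrate how a single accuracy figure can sit alongside very different subgroup error rates.

```python
from collections import defaultdict


def error_rates(y_true, y_pred):
    """Return (FPR, FNR) for binary labels where 1 = match and 0 = non-match."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    # A rate is undefined when a group has no negatives (or no positives).
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr


def rates_by_group(y_true, y_pred, groups):
    """Split the samples by group label and compute (FPR, FNR) for each group."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: error_rates(ts, ps) for g, (ts, ps) in buckets.items()}


# Hypothetical toy data: two groups, four samples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = rates_by_group(y_true, y_pred, groups)

print(f"overall accuracy: {overall_accuracy:.2f}")
for group, (fpr, fnr) in sorted(per_group.items()):
    print(f"group {group}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

In this toy data the overall accuracy is 0.75, yet every error falls on group B, which the aggregate figure alone does not show.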

Key Findings

Aggregate accuracy can conceal subgroup-level performance failures.
False positive and false negative disparities carry different deployment risks.
Threshold choices shift the balance between false positives and false negatives, and so affect both fairness and operational reliability.
Post-deployment auditing is essential for high-impact facial recognition systems.
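The threshold finding can be illustrated with a small sweep. The sketch below assumes a system that outputs a similarity score and declares a match when the score meets a threshold; the scores, labels, group names, and threshold values are hypothetical and are not drawn from the paper.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (FPR, FNR) when a score >= threshold is declared a match.

    Assumes the group contains at least one true match (label 1)
    and at least one true non-match (label 0).
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    on_negatives = [p for p, y in zip(preds, labels) if y == 0]
    on_positives = [p for p, y in zip(preds, labels) if y == 1]
    fpr = sum(on_negatives) / len(on_negatives)
    fnr = sum(1 - p for p in on_positives) / len(on_positives)
    return fpr, fnr


# Hypothetical similarity scores and ground-truth labels per group.
# Group B's match and non-match scores overlap; group A's do not.
group_scores = {
    "A": ([0.91, 0.85, 0.78, 0.30, 0.22, 0.15], [1, 1, 1, 0, 0, 0]),
    "B": ([0.88, 0.66, 0.58, 0.62, 0.41, 0.20], [1, 1, 1, 0, 0, 0]),
}

sweep = {}
for threshold in (0.5, 0.6, 0.7):
    sweep[threshold] = {
        group: rates_at_threshold(scores, labels, threshold)
        for group, (scores, labels) in group_scores.items()
    }

for threshold, by_group in sweep.items():
    for group, (fpr, fnr) in by_group.items():
        print(f"threshold={threshold}: group {group} FPR={fpr:.2f}, FNR={fnr:.2f}")
```

With these made-up scores, group A is unaffected by the threshold, while raising it from 0.5 to 0.7 moves group B from false positives to false negatives. A post-deployment audit could repeat the same per-group calculation on logged decisions to check whether a chosen threshold still behaves as expected.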

Subgroup Error Disparities

Aggregate accuracy alone does not reveal how facial recognition systems perform across demographic groups. Subgroup-level false positive and false negative rates can vary substantially despite a single overall accuracy score.

Demographic Attribute	Metric	Minimum	Maximum
Race	FPR	0.2015	0.3518
Race	FNR	0.1357	0.3117
Age	FPR	0.2453	0.4601
Age	FNR	0.1117	0.4958

Table: minimum and maximum subgroup false positive rate (FPR) and false negative rate (FNR) for each demographic attribute, showing substantial variation in error rates despite a single aggregate accuracy value.

For example, the false negative rate across age subgroups ranges from 0.1117 to 0.4958, a gap of 0.3841, under one aggregate accuracy value. Deployment evaluation should therefore include subgroup-level auditing rather than relying solely on overall performance.
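The ranges in the table can be summarized as an absolute gap and a max-to-min ratio per row. The sketch below copies the four rows from the table and assumes that Minimum and Maximum are the lowest and highest rates observed across subgroups of each attribute; the `disparity` helper is a hypothetical name used for illustration, not a metric defined in the paper.

```python
# Minimum and maximum subgroup error rates, copied from the table above.
ranges = {
    ("Race", "FPR"): (0.2015, 0.3518),
    ("Race", "FNR"): (0.1357, 0.3117),
    ("Age", "FPR"): (0.2453, 0.4601),
    ("Age", "FNR"): (0.1117, 0.4958),
}


def disparity(min_rate, max_rate):
    """Return (absolute gap, max/min ratio) for one row of the table."""
    return round(max_rate - min_rate, 4), round(max_rate / min_rate, 2)


summary = {key: disparity(low, high) for key, (low, high) in ranges.items()}

for (attribute, metric), (gap, ratio) in summary.items():
    print(f"{attribute} {metric}: gap={gap:.4f}, ratio={ratio:.2f}")
```

The gap and the ratio answer different questions: the gap shows how many more errors per decision the worst-served subgroup sees, while the ratio shows how that compares with the best-served subgroup.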

Deployment Relevance

In law enforcement applications, model errors are not only technical misclassifications. They can influence operational decisions, public trust, accountability, and institutional legitimacy.

This work supports the broader Ducaltus focus on evaluating whether AI systems are fair, reliable, governable, and suitable for high-stakes deployment.

View paper on arXiv