Research Brief

Fairness in Law Enforcement Facial Recognition

This research examines fairness, subgroup performance, and deployment risk in facial recognition systems used in law enforcement and high-impact biometric decision environments.

Why This Matters

Facial recognition systems can achieve strong aggregate performance while still producing uneven error rates across demographic or operational groups. In law enforcement contexts, these errors can carry serious consequences: a false positive can wrongly flag a person as a match, while a false negative can miss a genuine one.

Evaluation must therefore move beyond overall accuracy and examine false positive rates, false negative rates, subgroup reliability, and deployment suitability.

Key Findings

Subgroup Error Disparities

Aggregate accuracy alone does not reveal how facial recognition systems perform across demographic groups. Subgroup-level false positive and false negative rates can vary substantially despite a single overall accuracy score.

Demographic Attribute   Metric   Minimum   Maximum
Race                    FPR      0.2015    0.3518
Race                    FNR      0.1357    0.3117
Age                     FPR      0.2453    0.4601
Age                     FNR      0.1117    0.4958

Table: Minimum and maximum false positive rate (FPR) and false negative rate (FNR) observed across the subgroups of each demographic attribute.

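The spread is widest for age: the false negative rate ranges from 0.1117 to 0.4958, a gap of roughly 0.38 between the best-served and worst-served subgroups. Gaps of this size are invisible in a single aggregate score, which is why deployment evaluation must include subgroup-level auditing rather than relying solely on overall performance.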
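A subgroup audit of this kind can be sketched in a few lines of Python. This is an illustrative sketch only, not the method used in the paper: the function names `subgroup_error_rates` and `rate_range`, the binary label convention (1 = match, 0 = non-match), and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Return {group: (fpr, fnr)} for binary labels (1 = match, 0 = non-match).

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        # A rate is undefined (NaN) when a subgroup has no examples of that class.
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


def rate_range(rates, index):
    """Minimum and maximum of one rate (0 = FPR, 1 = FNR), skipping NaN values."""
    values = [r[index] for r in rates.values() if r[index] == r[index]]
    return min(values), max(values)


# Toy data (made up): two subgroups with very different error profiles.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
rates = subgroup_error_rates(y_true, y_pred, groups)

print("overall accuracy:", accuracy)
print("per-group (FPR, FNR):", rates)
print("FPR min/max:", rate_range(rates, 0))
print("FNR min/max:", rate_range(rates, 1))
```

In this toy case the overall accuracy is 0.75, yet every error falls on subgroup A, whose FPR and FNR are both 0.5 while subgroup B has none. Reporting the minimum and maximum per metric, as in the table above, is one simple way to surface that kind of gap.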

Deployment Relevance

In law enforcement applications, model errors are not only technical misclassifications. They can influence operational decisions, public trust, accountability, and institutional legitimacy.

This work supports the broader Ducaltus focus on evaluating whether AI systems are fair, reliable, governable, and suitable for high-stakes deployment.

View paper on arXiv