Research Brief
Fairness in Law Enforcement Facial Recognition
This research examines fairness, subgroup performance, and deployment risk in facial recognition systems used in law enforcement and high-impact biometric decision environments.
Why This Matters
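Facial recognition systems can achieve strong aggregate performance while still producing uneven error rates across demographic or operational groups. In law enforcement contexts, these disparities can carry serious consequences: a false match can direct suspicion toward the wrong person, while a missed match can cause a genuine lead to be overlooked.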
Evaluation must therefore move beyond overall accuracy and examine false positive rates, false negative rates, subgroup reliability, and deployment suitability.
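As an illustration of what moving beyond overall accuracy can look like, the following minimal Python sketch computes false positive and false negative rates per subgroup. The function name `subgroup_error_rates`, the group labels, and the toy data are hypothetical and are not drawn from the evaluation behind this brief.

```python
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Return {group: (fpr, fnr)} for binary match / no-match decisions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["fn"] + c["tp"]
        # A rate is undefined when a group has no examples of that class.
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Toy data (assumed for illustration): 1 = match, 0 = non-match.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
rates = subgroup_error_rates(y_true, y_pred, groups)
print(accuracy)  # 0.75
print(rates)     # {'A': (0.0, 0.0), 'B': (0.5, 0.5)}
```

In this toy case the aggregate accuracy of 0.75 hides the fact that every error falls on group B.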
Key Findings
- Aggregate accuracy can conceal subgroup-level performance failures.
- False positive and false negative disparities carry different deployment risks.
- The match threshold trades false positives against false negatives, so threshold choices can affect both fairness and operational reliability.
- Post-deployment auditing is essential for high-impact facial recognition systems.
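The finding about threshold choices can be sketched with a small threshold sweep. The example below is a hedged illustration only: the similarity scores, the cohort names `A` and `B`, and the three thresholds are invented, and the code assumes each cohort contains at least one genuine pair and one impostor pair.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (FPR, FNR) when a score >= threshold is declared a match."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    fpr = sum(s >= threshold for s in negatives) / len(negatives)
    fnr = sum(s < threshold for s in positives) / len(positives)
    return fpr, fnr


# Hypothetical similarity scores; label 1 = same person, 0 = different people.
cohorts = {
    "A": ([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [1, 1, 1, 0, 0, 0]),
    "B": ([0.9, 0.6, 0.5, 0.6, 0.4, 0.2], [1, 1, 1, 0, 0, 0]),
}

results = {}
for threshold in (0.35, 0.55, 0.75):
    results[threshold] = {
        name: rates_at_threshold(scores, labels, threshold)
        for name, (scores, labels) in cohorts.items()
    }
    print(threshold, results[threshold])
```

With these invented scores, a threshold of 0.35 gives cohort B a false positive rate of 2/3 while cohort A stays at 0. Raising the threshold to 0.75 removes false positives for both cohorts but leaves cohort B with a false negative rate of 2/3 against 1/3 for cohort A. A single global threshold can therefore shift which group bears which kind of error.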
Subgroup Error Disparities
Aggregate accuracy alone does not reveal how facial recognition systems perform across demographic groups. Subgroup-level false positive rates (FPR) and false negative rates (FNR) can vary substantially even when the system reports a single overall accuracy score.
| Demographic Attribute | Metric | Minimum Across Subgroups | Maximum Across Subgroups |
|---|---|---|---|
| Race | FPR | 0.2015 | 0.3518 |
| Race | FNR | 0.1357 | 0.3117 |
| Age | FPR | 0.2453 | 0.4601 |
| Age | FNR | 0.1117 | 0.4958 |
Table: Subgroup-level evaluation showing substantial variation in error rates across demographic groups despite a single aggregate accuracy value. FPR is the false positive rate and FNR is the false negative rate.
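One way to read the table is to summarize each row as an absolute gap (maximum minus minimum) and a ratio (maximum divided by minimum). The sketch below applies that arithmetic to the reported values. The gap and ratio are illustrative summaries chosen for this example, not metrics defined by the underlying evaluation, and the `disparity` helper is hypothetical.

```python
# (attribute, metric) -> (minimum, maximum), copied from the table above.
reported = {
    ("Race", "FPR"): (0.2015, 0.3518),
    ("Race", "FNR"): (0.1357, 0.3117),
    ("Age", "FPR"): (0.2453, 0.4601),
    ("Age", "FNR"): (0.1117, 0.4958),
}


def disparity(minimum, maximum):
    """Return (absolute gap, max/min ratio) for one row of the table."""
    return maximum - minimum, maximum / minimum


summary = {key: disparity(lo, hi) for key, (lo, hi) in reported.items()}

for (attribute, metric), (gap, ratio) in summary.items():
    print(f"{attribute} {metric}: gap={gap:.4f}, ratio={ratio:.2f}")
# Race FPR: gap=0.1503, ratio=1.75
# Race FNR: gap=0.1760, ratio=2.30
# Age FPR: gap=0.2148, ratio=1.88
# Age FNR: gap=0.3841, ratio=4.44
```

On these figures the widest spread is in the false negative rate across age groups, where the highest subgroup rate is roughly 4.4 times the lowest.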
Deployment Relevance
In law enforcement applications, model errors are more than technical misclassifications. They can influence operational decisions, public trust, accountability, and institutional legitimacy.
This work supports the broader Ducaltus focus on evaluating whether AI systems are fair, reliable, governable, and suitable for high-stakes deployment.