By Matt Fleischer-Black
Cybersecurity Law Report
New York City’s new law mandating bias audits for employment uses of AI went into force July 5, 2023. Since then, however, this pioneering AI regulation has prompted very few companies to publicly post audit results, as the law requires. Tips from lawyers and auditors, and the Cybersecurity Law Report’s own searches, uncovered five companies’ public results.
The NYC automated employment decision tools (AEDT) law (Local Law 144 of 2021) covers the array of resume scanners and candidate scorers that companies use to help winnow all the applications they receive for NYC office jobs. The law regulates use of such tools to “substantially assist or replace discretionary decision making” about hiring or promotion in the city. The employer must disclose AEDT use to the candidate and annually publish results from a third-party audit of it for bias.
Studies have shown that approximately 75 percent of large companies use AEDTs. They range from tools that help make an initial cut among resumes based on qualifications to others that score candidates for “cultural fit,” personal traits or other subjective factors. Given the prevalence of AEDTs, “I’m surprised we haven’t seen more of these audits published,” said Davis Wright Tremaine partner K.C. Halm.
This article, the first in a two-part series about this landmark AI disclosure law, breaks down the divergent approaches of the five audits and the justifications that companies invoke to avoid publishing an audit. Part two will provide top issues and enforcement risks with the law going forward and best practices to comply with it.
See “First Independent Certification of Responsible AI Launches” (Apr. 12, 2023).
A Narrow Law With New Complexities
In a required AEDT audit, a company must publish ratios comparing how the AI’s decisions on candidates correlated with race, sex and ethnicity categories. Collecting data about hiring choices is a longstanding obligation under fair employment laws, noted Resolution Economics partner Paul White, who is an auditor of workforce practices. “Companies need to check for bias no matter whether AI is involved or not,” he said.
Large employers “are the organizations most likely to use an AEDT in the manner contemplated by the law,” but they also have the most experience reporting detailed workforce statistics, Husch Blackwell attorney Keith Ybanez pointed out.
BakerHostetler partner James Sherer reported that he has not heard “a lot of organizations be truly worried about these requirements. Most are looking at it and saying, ‘we know about these issues.’” Still, beneath the law’s straightforward tasks, “it is deceptively complex,” he qualified. The law includes a novel and untested measure of bias in candidate scoring. Calculating results from algorithms involves methodological choices. The law’s ambiguities make it difficult to conclude whether it applies.
Halm likewise noted that “the disclosure requirements have been more challenging than many believe.”
Companies also are aware that once results are published, individuals might misunderstand the statistics’ significance or litigators might publicly exploit the results. Litigators, competitors or any other party could scrape up the details of public audit results, said Sherer. “This is one more entry into practices that organizations traditionally have not had to disclose,” he noted.
See our three-part series on new AI rules: “NYC First to Mandate Audit” (Jun. 15, 2022), “States Require Notice and Records, Feds Urge Monitoring and Vetting” (Jun. 22, 2022), and “Five Compliance Takeaways” (Jul. 13, 2022).
Surprisingly Few Published Audits for a City of 8 Million
The Cybersecurity Law Report informally searched for audits from the city’s 20 largest employers, including in their career sections, as well as for companies listed on AI auditors’ sites as clients. “There are some companies where I’d be fascinated to hear the reasoning why the audit is not out there,” Sherer said.
Consumer advocates and the business community both vigorously commented on two sets of proposed rules (first and second), at one point crashing the Zoom hearing held by the city’s Department of Consumer and Worker Protection (DCWP), which enforces the law. After a year of hubbub, the lack of disclosures so far in 2023 feels to some as incongruous as tumbleweed drifting down Fifth Avenue. “I’d love to talk to someone in the DCWP to know if they looked at how few public reports might be available,” White mused.
A first violation of the law carries a $500 penalty; each subsequent violation is $1,500 per day per tool.
Companies’ Three Justifications for Not Publishing an Audit
Companies have been discussing a trio of reasons that may let them avoid posting an audit on their websites.
Their Tool Use Falls Outside the Law’s Scope
Sufficient Human Input in the Sorting Decisions
Several of White’s clients put aside audits once the April 2023 final rules changed a definition, he reported. The DCWP clarified that a trigger for the audit is when an AEDT’s “simplified output” outweighs other factors used to make the employment decision.
“The companies’ explanation was they had enough substantial human involvement. Their AEDT use didn’t rise to the level of replacing human decision making,” White said. Sherer similarly heard from “organizations who took the position that the tool was not sufficiently determinative” to require an audit.
“Many companies are reasonably taking the position that they use AI tools to help scale their processes a lot, but that help does not” outweigh other factors used to make the employment decision, Luminos Law attorney Jey Kumarasamy reported.
See our AI Compliance Playbook series: “Traditional Risk Controls for Cutting-Edge Algorithms” (Apr. 14, 2021), “Seven Questions to Ask Before Regulators or Reporters Do” (Apr. 21, 2021), “Understanding Algorithm Audits” (Apr. 28, 2021), and “Adapting the Three Lines Framework for AI Innovations” (Jun. 2, 2021).
There remains “room for debate” about what counts as “substantial” assistance or discretion, Ybanez noted.
A top ambiguity is how much winnowing an AEDT may do before its choices outweigh other factors, Halm suggested. When companies receive hundreds of resumes for positions, what is the magic number of applications the AEDT may discard? “If the tool excludes sets of people and leaves only 30 resumes for review by humans, arguably that is a determination relying solely on the simplified output of these systems – which brings you back in scope,” he said.
“The reality is, in many cases, human recruiters are not going to manually review all of the lowest ranked applications,” explained Kumarasamy. Yet, employers often assert that they do not blindly trust scores. Companies diverge from scores for a few reasons. The scoring algorithms sometimes return only a few highly ranked people, or the keywords that the recruiter inserted into the AEDT may not deliver candidates with the desired skill set, he elaborated.
A second ambiguity is the number of knockout questions an AEDT may combine before its output is the lead factor, Sherer said. Knockout questions eliminate candidates that do not satisfy key job requirements, like a certification, location or lifting capabilities. If a company stacks enough knockout questions together and the algorithm is sufficiently sophisticated, a published audit may be required, but it is unclear where the tipping point lies, he added.
A third ambiguity in the rule is whether the law covers only employment decisions that are final selections that eliminate candidates, Halm highlighted.
They Exclude NYC Candidates From Automated Evaluation
Cutting out NYC jobs from AI tool evaluation is another way to avoid an audit. As the law covers a clearly defined pool, “we saw internal discussions whether the employer should stop using AI tools in New York City,” Kumarasamy observed. He clarified that he did not know whether the companies with whom he spoke followed through with that approach.
Geofencing candidate searches could burden the company’s recruitment and culture. For many, “NYC is too big of an economic center to eliminate,” Ybanez said.
A broader risk, Sherer noted, is that “New York City tends to be remarkably diverse. Removing this pool of candidates from the organization’s overall pool” could hurt the company’s results on AEDT bias. With AI regulation proposals bubbling in several other jurisdictions, “there could be the cobra effect – you don’t know if it’s going to come back and bite you,” he cautioned.
See “Takeaways From the New Push for a Federal AI Law” (Oct. 26, 2022).
They “Publish” Results Only for Active Candidates
Some companies may have applied a narrow interpretation of “publish,” reasoning that they must provide the audit summary to candidates, but not to the public, Sherer said.
The logic is that the law focuses on employment matters – and innovates in that realm. While traditional employment law has held that the company has a limited relationship with the applicant, “New York has opened that up to say, the applicant is in some ways a consumer of this recruiting process. Let’s give you more information about what is happening.” Thus, candidates must be given notice of AEDT use and of the availability of audit results.
See our two-part series on the practicalities of AI governance: “AI Governance Gets Real: Tips From a Chat Platform on Building a Program,” (Feb. 1, 2023), and “AI Governance Gets Real: Core Compliance Strategies” (Feb. 8, 2023).
Comparing Five Companies’ Audits
The five companies’ public audits we explored vary greatly in their approaches to the law’s obligations and the potential risks of publication. Four different auditors prepared them.
Results With Large Pools Include Fewer Red Flag Categories
Perhaps the most striking difference between the five published audits is the number of screened applicants that the companies reported. Those assessing larger populations had better results.
The law requires a company to report the adverse impact ratios (AIR) for the categories that the U.S. Equal Employment Opportunity Commission uses. The AIR is calculated by identifying the category with the highest selection or scoring rate, then dividing each category’s rate by that benchmark. The top category is written as 1.0, and each other category as a decimal fraction of it.
Under decades of discrimination law, a “four-fifths” rule of thumb says that if a less chosen category’s ratio is below .8 (8 out of 10), it may indicate an adverse impact, and unlawful discrimination may have occurred.
The law addresses two distinct AEDT practices: selection and scoring. An algorithm that sorts candidates into different bands, e.g., pass/fail, uses the “selection rate” formula to calculate AIRs. An AEDT that scores candidates on a continuum uses the “scoring rate” formula.
Small numbers in a category can produce unrepresentative results. The law allows auditors to exclude any category that comprises less than 2 percent of the pool. Applicants lacking an ethnicity or gender designation are also excluded.
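The selection-rate arithmetic described above – benchmark against the best-performing category, exclude categories under 2 percent of the pool, and flag ratios below the four-fifths threshold – can be sketched as follows. The category names and counts here are purely hypothetical, not drawn from any published audit:

```python
def adverse_impact_ratios(selected, total, min_share=0.02):
    """Compute adverse impact ratios (AIRs) from per-category counts.

    selected/total: dicts mapping category -> number selected / number assessed.
    Categories comprising less than min_share of the overall pool are
    excluded, as the NYC rules permit.
    """
    pool = sum(total.values())
    # Selection rate per category, skipping categories under the 2% floor
    rates = {
        cat: selected[cat] / total[cat]
        for cat in total
        if total[cat] / pool >= min_share
    }
    best = max(rates.values())
    # Ratio of each category's rate to the best category's rate
    return {cat: rate / best for cat, rate in rates.items()}


# Illustrative counts only (hypothetical):
selected = {"A": 50, "B": 35, "C": 2}
total = {"A": 100, "B": 100, "C": 3}  # category C is under 2% of the pool
airs = adverse_impact_ratios(selected, total)
for cat, air in sorted(airs.items()):
    flag = " (below four-fifths threshold)" if air < 0.8 else ""
    print(f"{cat}: {air:.2f}{flag}")
```

Here category C drops out under the 2-percent floor, category A becomes the 1.0 benchmark, and category B’s ratio of 0.70 would fall below the four-fifths rule of thumb.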
The numerical results for the three companies assessing large pools of candidates were as follows:
Bloomberg’s audit evaluated 7,930 applicants with its AEDT, PLUM Assessment. No category had an AIR under .83. The audit excluded five ethnicity categories with less than 2 percent of total applicants, although a footnote listed the number of applications received.
Morgan Stanley used Eightfold, an AEDT, to assess 263,064 applications, and listed no categories worse than .903 (Black males). None were excluded.
NBC Universal presented an AEDT audit of SmartAssistant, evaluating cumulative results from multiple employers’ use of it, which the rules allow. The audit considered the tool’s scores for 4.1 million applications. All categories recorded an AIR of at least .82, with none excluded.
Smaller applicant pools produced more variable results, a known issue:
The Hartford audited applications for two jobs, with 140 and 136 applications each.
Some categories had at least 15 candidates, and the audit reported a mix of AIRs for those, with three borderline ones of .65, .77 and .79.
Several categories had fewer than 10 candidates, showing ratios between .33 and .79. This audit excluded five ethnicity categories with even fewer candidates.
Frito-Lay, a PepsiCo food division, reported no application numbers. It excluded 13 ethnic categories, each with under 10 applications, and reported results for only six. The lowest two AIRs were .62 and .65.
Historical Data Versus Last Year’s Results
Bloomberg, The Hartford and Frito-Lay used 2022 data for the audit, seemingly to be ready for the original enforcement date of January 1, 2023.
Morgan Stanley and NBC Universal looked at data from mid‑2022 to mid‑2023.
See “Key Legal and Business Issues in AI-Related Contracts” (Aug. 9, 2023).
AEDT Transparency Need Not Include the Tool’s Name
In an era of concern about the explainability and transparency of AI, the law nonetheless does not require naming the AI tool used. Neither audit that consultant DCI prepared (The Hartford and Frito-Lay) identified the AEDT involved. DCI declined to comment on its recommendations on this transparency question, a spokesperson said.
Vendors like SmartRecruiters, of course, want companies like NBC Universal to use their audits to aid their products’ visibility. Several other tool vendors have published audits, so other companies might follow suit in 2023 – while they still can. The vendor’s historical data “lets the company use the tool in year one,” but maybe not past that, Sherer said.
See “Innovation and Accountability: Asking Better Questions in Implementing Generative AI” (Aug. 2, 2023).
Scoring Formula Versus Selection Formula
Bloomberg, The Hartford and Frito-Lay used the AEDT to do a binary sort of candidates (either pass/fail or recommended/not designations). They applied the selection rate formula, which parallels the longstanding approach to assess algorithmic discrimination in employment, lending and insurance offers.
Morgan Stanley and NBC Universal both used AEDTs that scored candidates. The DCWP’s scoring rate formula gives the auditor four steps. The scoring rate formula is novel for legal use, and its accuracy across different scoring patterns is largely untested, Kumarasamy pointed out.
Context and Caveats
The audits diverged on how much context they supplied. Bloomberg and Frito-Lay simply provided the results and a summary of methodology.
The auditors for Morgan Stanley and NBC Universal revealed that they took extra steps with the scoring rate formula. Each stated that, in their professional view, what should be assessed differed from what the law prescribed, but they still followed the law.
The Hartford provided the most vivid concern about the law’s requirements. Its audit warned that the law mandated use of overly small sample sizes, which meant that the resulting impact ratios “may be volatile or meaningless.”
None of these AEDT audits included the top caveat about bias statistics for any non-professional reader: a company’s ultimate hiring of humans will differ – perhaps a lot – from its use of a tool during the process.
See our two-part series on managing legal issues arising from use of ChatGPT and Generative AI: “E.U. and U.S. Privacy Law Considerations” (Mar. 15, 2023), and “Industry Considerations and Practical Compliance Measures” (Mar. 22, 2023).