Think Twice: Leveraging AI to Improve Government Efficiency in HK
The public may find Chief Executive John Lee’s fourth policy address the most reformist in years. AI has become not only a major component of the new industrial policies, but also a revolutionary force within the government, as Lee aims to improve civil service efficiency by establishing an AI Efficacy Enhancement Team.
The idea of applying AI in government to reduce staffing demand gained traction during the fiscal deficit debate early this year. Regina Ip, the convenor of the Executive Council of Hong Kong, voiced her unequivocal belief in AI’s potential to replace civil servants. HKGAI V1 and other AI products developed by the local research center HKGAI provide trustworthy tools for trial applications at over 70 government agencies. John Lee has cited initial successes and efficiency improvements in the Census and Statistics Department, as well as the 1823 hotline.
There is little doubt that AI will eventually be embedded into every facet of society, including the civil service. However, AI may not be the short-term band-aid for the civil service’s recent reputational crises, and it is likely to compromise the service’s integrity and quality unless the AI Efficacy Enhancement Team can look past deployment KPIs and exercise caution.
First, AI can amplify administrative evil due to bounded rationality, optimism bias, and social pressure. A 2011 study found that judges were more lenient at the start of shifts and after meal breaks, but harsher in between, evidence that even expert decisions degrade under cognitive load. Now imagine civil servants deciding applicants’ eligibility for social welfare or benefits: with AI, they are making not tens but hundreds or even thousands of decisions per day. It is not alarmist or fear-mongering to predict a growing number of careless mistakes that enlarge administrative evil and erode public trust, as precedents in Canada, Finland, and Poland highlighted by the OECD have shown.
As early as 2014, Poland piloted automated decision systems to profile the unemployed into three categories that determined which types of programs a person was eligible for. The system was meant to serve an advisory role. However, a study found that fewer than 1 in 100 of the algorithm’s decisions were questioned by the responsible clerks, even though such a high level of accuracy was considered unlikely. In interviews, one reason given for not challenging the algorithm’s decisions was a “lack of time to ponder its details”. Others cited a “belief in the objectivity of the process” and “fear of repercussions from the supervisors”, revealing salient behavioral frailties, such as optimism and social pressure, under which civil servants can turn a blind eye to AI’s limitations. The application was ultimately ruled unconstitutional by Poland’s Constitutional Tribunal.
These problems are not unique to earlier forms of AI. Generative AI has likewise been misused in academia, legal practice, and politics, and similar diagnoses apply.
Second, AI can degrade the quality of public service and neglect humanitarian concerns when paired with inappropriately set performance measures. As the city’s leader places ever more emphasis on performance, departments are pressed to demonstrate results such as the number of public housing units reclaimed. It remains to be seen whether, and how, AI can take humanitarian factors into account if applied to reclaiming public housing. In addition, peer or inter-departmental competition risks overriding concerns for decision fairness and implementation quality. Lastly, as AI grows more capable, it will only amplify this deterioration in both service and integrity.
The third reason is the difficulty of delineating accountability. Accountability should not be a post-crisis blame game; it should shape social expectations of, and oversight over, civil servants’ behavior. When accountability rests squarely with users (i.e. civil servants), risk aversion and reputational concerns are likely to rein in reckless decisions and careless implementation to some extent. When accountability is divided among bureaucrats, AI developers, and providers, the sense of responsibility and the mental burden are diluted, increasing the risks of blame avoidance and trust erosion.
Safeguarding public values and civil service integrity
Existing guidelines on developing and applying Artificial Intelligence remain focused on the ethical, privacy, and safety risks of the technology itself, leaving the risks arising from workflow integration or function replacement unchecked. The government should commence a risk-assessment study of existing AI applications to identify both exposed and potential risks from integration and replacement before marching into widespread adoption across departments and application scenarios. It should also comprehensively examine the sufficiency of existing AI guidelines and regulation before incorporating revisions or drawing up AI user guidelines for the civil service to safeguard public value delivery and government integrity.
If Hong Kong is to revise its AI guidelines, the revisions should go beyond principle and into design. Behavioral insights can and should be embedded in technical and UI details to slow down error, surface doubt, and encourage challenge. High-stakes decisions should trigger “slow mode” interfaces that deliberately introduce friction: require officers to provide a short justification to “accept with reasons” or “override with reasons”, and include a brief timed pause and checklist before confirmation or submission. Visually separate AI suggestions from human inputs to reduce automation bias, and hide the AI answer until the officer records an initial view in defined scenarios to promote independent judgment. Build one‑click “raise a flag” reporting that routes concerns to an independent team with protection from retaliation.
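To make the idea concrete, the Python sketch below shows what such a “slow mode” gate might look like in practice; the names, thresholds, and flow are illustrative assumptions, not a description of any existing government system.

```python
# Illustrative sketch only: names, thresholds, and flow are assumptions,
# not an existing government interface.
import time
from dataclasses import dataclass

MIN_JUSTIFICATION_CHARS = 40   # force a substantive written reason
PAUSE_SECONDS = 10             # brief timed pause before final confirmation


@dataclass
class Decision:
    case_id: str
    ai_recommendation: str   # e.g. "approve" or "reject"
    officer_choice: str      # "accept" or "override"
    justification: str       # officer's written reasons


def confirm_high_stakes(decision: Decision) -> bool:
    """Gate a high-stakes decision behind a written justification and a pause."""
    if len(decision.justification.strip()) < MIN_JUSTIFICATION_CHARS:
        # Block confirmation until the officer records substantive reasons.
        raise ValueError("A written justification is required to accept or override.")
    action = "Accepting" if decision.officer_choice == "accept" else "Overriding"
    print(f"{action} AI recommendation for case {decision.case_id} with reasons on record.")
    time.sleep(PAUSE_SECONDS)  # deliberate friction: time for the checklist and a second look
    return True
```

The point of the friction is not to slow every case, but to ensure that, for rights-affecting decisions, accepting the machine’s answer costs at least as much deliberate effort as questioning it.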
Beyond screens, workflows and institutional design must hedge against individual carelessness. Replace single‑point approvals with multi‑eyes review for rights‑related decisions. For high‑volume, low‑stakes cases, apply randomized audits and rapid post‑decision checks. Mandate escalation to specialist panels for complex or novel cases and allow provisional decisions with limited effect pending a secondary review. Institutionalize red‑teaming to test adversarial cases, probe failure modes and feed fixes into training and model updates.
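As one illustration of the randomized audits mentioned above, a minimal Python sketch follows; the 5% sampling rate, the fixed seed, and the case structure are assumptions chosen for illustration rather than recommended values.

```python
# Illustrative sketch only: the sampling rate and case structure are assumptions.
import random

AUDIT_RATE = 0.05  # audit roughly 1 in 20 routine, low-stakes decisions


def select_for_audit(decided_cases: list[dict], seed: int = 2024) -> list[dict]:
    """Randomly flag a fixed share of decided cases for rapid secondary review."""
    rng = random.Random(seed)  # fixed seed keeps the audit sample reproducible
    return [case for case in decided_cases if rng.random() < AUDIT_RATE]
```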
If the AI Efficacy Enhancement Team is to look past mere application counts, it must measure and disclose success by safety and quality, beyond deployments: audited error rates; appeal overturn rates; detection and correction time for model drift; and completion rates of slow‑mode justifications. Track near‑miss reports as a positive indicator of vigilance, and tie funding tranches to reductions in substantiated complaints and improvements from blind reviews.
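A sketch of how a few of these indicators could be computed from case records is shown below; the field names (“audited”, “appealed”, “overturned”, and so on) are hypothetical and would map onto whatever case data a department actually keeps.

```python
# Illustrative sketch only: the record fields are hypothetical and would map
# onto whatever case data a department actually keeps.
def oversight_metrics(records: list[dict]) -> dict:
    """Compute a few of the safety and quality indicators discussed above."""
    audited = [r for r in records if r.get("audited")]
    appealed = [r for r in records if r.get("appealed")]
    high_stakes = [r for r in records if r.get("high_stakes")]
    return {
        # Share of audited cases where the audit found an error.
        "audited_error_rate": sum(bool(r.get("error_found")) for r in audited) / max(len(audited), 1),
        # Share of appealed cases where the original decision was overturned.
        "appeal_overturn_rate": sum(bool(r.get("overturned")) for r in appealed) / max(len(appealed), 1),
        # Share of high-stakes cases with a completed slow-mode justification.
        "slow_mode_completion_rate": sum(bool(r.get("justification")) for r in high_stakes) / max(len(high_stakes), 1),
    }
```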
Finally, codify responsibility so no one can hide behind the algorithm. Users retain decision accountability for legality, fairness, and adequacy of reasons, attesting to compliance on each case. Department heads and the AI team hold oversight accountability for fitness‑for‑purpose, monitoring, audits, and training. Developers and providers hold design accountability for documentation, data provenance, robustness, and timely remediation. Embed these duties in procurement, approval memos, and civil service codes, with clear incident escalation and proportionate sanctions mapped to each domain of responsibility.