For the past year and a half, the CISO playbook for generative AI has centered on controlling the browser: tighter cloud access security broker (CASB) policies, monitoring of traffic to AI endpoints, and routing of usage through approved gateways. The goal was to observe, log, and prevent sensitive data from leaving the network through external API calls. A newer pattern, sometimes called Shadow AI 2.0 or the "bring your own model" (BYOM) era, is now undermining that approach.
Running large language models (LLMs) locally on endpoints is increasingly common. Employees can now run models entirely offline on their laptops, making no API calls and leaving no network signature. While the governance conversation has focused on data exfiltration to the cloud, the immediate enterprise risk has moved to unvetted inference happening locally on devices.
Local inference has become practical in the past few years thanks to consumer-grade accelerators, mainstream quantization techniques, and frictionless model distribution. Engineers can pull multi-gigabyte model artifacts and work offline, reviewing source code, summarizing documents, and running exploratory analysis with no outbound packets and no cloud audit trail.
As local inference spreads, security teams must shift their focus from preventing data from leaving the company to ensuring the integrity, provenance, and compliance of the models employees run locally. Three blind spots stand out:
– Code and decision contamination: Unvetted local models can inject flawed suggestions into code and business decisions without leaving an audit trail.
– Licensing and IP exposure: Locally downloaded models bypass normal procurement and legal review, exposing the company to restrictive model licenses and intellectual property claims.
– Model supply chain exposure: Model files are executable artifacts in practice; pickle-based checkpoint formats, for example, can run arbitrary code when loaded. Endpoints storing large model artifacts therefore need the same supply chain scrutiny as any other software dependency.
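One concrete control for the supply chain blind spot is to verify every model artifact against an internal allowlist of approved digests before it is loaded. The sketch below is a minimal illustration, not a full control: the `APPROVED_DIGESTS` entries are hypothetical placeholders for digests published by a security team.

```python
import hashlib
from pathlib import Path

# Hypothetical allowlist: SHA-256 digests of models vetted by the security team.
APPROVED_DIGESTS = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",  # example entry
}

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-gigabyte artifacts need not fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def is_approved(path: Path) -> bool:
    """Permit loading only if the artifact's digest is on the allowlist."""
    return sha256_of(path) in APPROVED_DIGESTS
```

A check like `is_approved` can run in a pre-load hook or an endpoint agent; the key design choice is streaming the hash, since model files are often too large to read into memory at once.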
To mitigate the risks associated with BYOM, organizations should treat model weights as software artifacts and implement endpoint-aware controls. This includes moving governance down to the endpoint, providing a curated internal model hub, and updating policy language to explicitly cover local inference usage. By focusing on controlling artifacts, provenance, and policy at the endpoint, CISOs can effectively manage the shift towards local inference without compromising productivity.
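Moving governance down to the endpoint starts with visibility: knowing which model artifacts already live on a machine. The sketch below is one illustrative approach, not a complete inventory tool; the extension list and scan root are assumptions. It walks a directory tree and flags likely model files by common extensions or by the `GGUF` magic bytes that llama.cpp-family model files begin with, which catches renamed artifacts.

```python
from pathlib import Path

# Common local-model file extensions (illustrative, not exhaustive).
MODEL_EXTENSIONS = {".gguf", ".safetensors", ".pt", ".onnx"}
GGUF_MAGIC = b"GGUF"  # llama.cpp-family model files start with these four bytes

def looks_like_model(path: Path) -> bool:
    """Heuristic: match by extension, or by the GGUF magic at byte offset 0."""
    if path.suffix.lower() in MODEL_EXTENSIONS:
        return True
    try:
        with path.open("rb") as f:
            return f.read(4) == GGUF_MAGIC
    except OSError:
        return False  # unreadable files are skipped, not flagged

def scan_for_models(root: Path) -> list[Path]:
    """Inventory candidate model artifacts under a directory tree."""
    return sorted(p for p in root.rglob("*") if p.is_file() and looks_like_model(p))
```

An endpoint agent could feed this inventory into the allowlist check described earlier, or simply report findings to a central console so the curated internal model hub reflects what is actually in use.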
In short, the perimeter of AI governance is moving back to the device. CISOs who extend artifact, provenance, and policy controls to the endpoints where inference actually happens can meet Shadow AI 2.0 head-on, without sacrificing the productivity gains that drew employees to local models in the first place.
