Why banning ChatGPT from your company is a mistake
Executive overview
Many security teams block employees from using AI tools such as ChatGPT for fear of sensitive data leakage. Under realistic conditions, however, the probability that a large language model memorises a piece of data and an attacker later extracts it is roughly one in a million.
The productivity lost by banning AI tools far outweighs this marginal data-leakage risk.
Why LLMs are not databases
- LLMs generate text from learned statistical patterns; they do not store inputs or retrieve them on request the way a database does
- Memorisation does occur, but it requires extreme repetition of the same data (e.g. one Social Security number, or SSN, repeated 5 million times in a 500-billion-word dataset)
- Estimated probability of leakage under those conditions: ~0.000001 (one in a million)
- Three factors govern memorisation likelihood: input frequency, dataset size, and model architecture
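The example above can be put in proportion with some simple arithmetic. The figures (5 million repetitions, 500 billion words, a one-in-a-million leakage estimate) come from the text. The helper names `occurrence_rate` and `one_in_n` are hypothetical. The code only converts ratios and does not model memorisation itself.

```python
def occurrence_rate(instances: int, dataset_words: int) -> float:
    """Share of the dataset taken up by one repeated item."""
    return instances / dataset_words


def one_in_n(probability: float) -> int:
    """Express a probability as 'one in N'."""
    return round(1 / probability)


# Figures from the text: one SSN repeated 5 million times in 500 billion words.
rate = occurrence_rate(5_000_000, 500_000_000_000)
print(f"Occurrence rate: {rate:.0e} (one in {one_in_n(rate):,} words)")

# Stated leakage estimate, converted the same way.
print(f"Leakage estimate: one in {one_in_n(0.000001):,}")
```

Even 5 million copies amount to only one word in every 100,000 in the dataset.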
Three additional barriers to successful data extraction
- Memory threshold — data must be repeated enough times to be memorised at all
- Formatting — the model must also reproduce the exact format of sensitive data (e.g. SSN or phone number structure)
- Accuracy validation — an attacker cannot easily verify whether extracted data is real or hallucinated
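A simple format check illustrates the formatting and accuracy-validation barriers. In this sketch, the `looks_like_ssn` helper and the candidate strings are hypothetical. The pattern is a simplified assumption about the SSN shape: three, two and four digits separated by hyphens.

```python
import re

# Simplified SSN shape: 3 digits, 2 digits, 4 digits, hyphen-separated.
# Real SSN issuance rules are stricter; this pattern is an assumption.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def looks_like_ssn(text: str) -> bool:
    """True if the whole string has SSN format; says nothing about authenticity."""
    return SSN_PATTERN.fullmatch(text) is not None


# Hypothetical model outputs an attacker might collect.
candidates = ["123-45-6789", "987-65-4321", "12-345-6789", "no number here"]
for candidate in candidates:
    print(f"{candidate!r}: {looks_like_ssn(candidate)}")
```

Both well-formed strings pass the check, but nothing in the output distinguishes a memorised number from an invented one. Confirming a hit would require an external source of truth.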
Built-in mitigations already available
- GPT-4's technical paper explicitly documents filtering of private information from training data
- Users can opt out of contributing data to model training
- Internal policies can instruct employees not to paste sensitive data into public tools
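A policy like the last one can be backed by a technical control that masks obvious identifiers before a prompt reaches a public tool. This is a minimal sketch that assumes a hypothetical `redact` helper and deliberately simplified regular expressions. It will miss many real-world formats and does not replace the policy itself.

```python
import re

# Hypothetical pre-submission filter. Each pattern is a simplified
# assumption and will not catch every real-world format.
REDACTION_RULES = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def redact(prompt: str) -> str:
    """Replace every match with a placeholder such as [SSN], rule by rule."""
    for label, pattern in REDACTION_RULES:
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


print(redact("Customer 123-45-6789 (jane@example.com, 555-867-5309) wants a refund."))
```

The rules run in order, so the SSN rule masks the SSN before the broader phone pattern is applied.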
The real risk: RAG architectures
- Retrieval-augmented generation (RAG) connects an LLM to a live vector database and agents
- An attacker who can manipulate prompts may be able to retrieve actual stored documents
- This risk is meaningfully higher than base LLM memorisation and warrants dedicated attention
- RAG deployments need prompt injection defences and access controls on the vector database
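Access control on the vector database can be sketched as a permission filter applied before similarity ranking. This means documents outside the user's groups never reach the model, however the prompt is worded. This minimal in-memory illustration assumes a hypothetical `Document` record with an `allowed_groups` set and toy two-dimensional embeddings. A real deployment would enforce the same rule inside its retrieval layer. Prompt injection defences are a separate layer and are not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    embedding: tuple[float, ...]
    allowed_groups: frozenset[str]  # hypothetical per-document ACL


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: tuple[float, ...], docs: list[Document],
             user_groups: frozenset[str], k: int = 2) -> list[Document]:
    """Return the k most similar documents the user is permitted to read.

    The permission filter runs before ranking, so no prompt wording can
    make this function return a document outside the user's groups.
    """
    visible = [d for d in docs if d.allowed_groups & user_groups]
    visible.sort(key=lambda d: cosine(query, d.embedding), reverse=True)
    return visible[:k]


# Toy corpus with two-dimensional embeddings (illustrative values only).
docs = [
    Document("handbook", "Holiday policy", (0.9, 0.1), frozenset({"all-staff"})),
    Document("salaries", "Payroll export", (0.95, 0.05), frozenset({"hr"})),
    Document("roadmap", "Product roadmap", (0.2, 0.8), frozenset({"all-staff"})),
]

hits = retrieve((1.0, 0.0), docs, user_groups=frozenset({"all-staff"}))
print([d.doc_id for d in hits])  # ['handbook', 'roadmap']
```

The `salaries` document is the closest match to the query, but it is never returned to a user outside the `hr` group.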
The cost-benefit case for allowing AI tools
- One study found a 14–34% productivity increase per hour for employees using AI versus those who do not
- Productivity gains compound into automation, revenue, market share, and customer satisfaction
- Blocking a tool with this upside requires a proportionate, evidence-based risk justification — not precaution alone
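The proportionality test above can be framed as a simple expected-value comparison. The sketch is illustrative only. The 14% gain and the one-in-a-million leakage estimate come from the text. The headcount, hours, hourly value, number of exposed items and cost per leak are hypothetical placeholders, to be replaced with an organisation's own figures.

```python
def annual_productivity_value(employees: int, hours_per_year: int,
                              value_per_hour: float, gain: float) -> float:
    """Extra output per year attributed to the productivity gain."""
    return employees * hours_per_year * value_per_hour * gain


def expected_leak_cost(exposed_items: int, p_leak: float,
                       cost_per_leak: float) -> float:
    """Expected cost if each exposed item independently leaks with probability p_leak."""
    return exposed_items * p_leak * cost_per_leak


# Hypothetical placeholder inputs (except gain and p_leak, taken from the text).
benefit = annual_productivity_value(employees=100, hours_per_year=1_600,
                                    value_per_hour=50.0, gain=0.14)
risk = expected_leak_cost(exposed_items=10_000, p_leak=0.000001,
                          cost_per_leak=1_000_000.0)
print(f"benefit ~ {benefit:,.0f}, expected leak cost ~ {risk:,.0f}")
```

With these placeholder inputs, the expected productivity value (about 1,120,000) far exceeds the expected leakage cost (about 10,000). Different inputs can change that conclusion, which is why the calculation matters more than precaution alone.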