What data should an AI privacy assessment cover?

Cover prompts, uploaded files, retrieval content, logs, feedback, memory, analytics, support records, and data shared with model or platform providers.

Can sensitive data be used with generative AI?

Sometimes. It depends on the workflow, lawful basis, controls, provider terms, and review requirements. Sensitive data needs stronger controls and a clearer purpose.

What is the fastest privacy improvement?

Reduce unnecessary data collection and logging, then add clear retention rules and access controls for remaining data.

AI Privacy Impact Assessment Guide for Generative AI

// 01

Follow the data, not the buzzwords

A privacy impact assessment for generative AI should trace data across the full workflow. Data may enter through prompts, uploaded files, and feedback forms, and it may persist in retrieval stores, application logs, memory, analytics, and support records.

The main privacy question is not whether a model is modern. It is whether the organisation can explain what personal or sensitive data enters the system, why it is needed, who can access it, how long it is retained, and how it is protected.

// 02

Map collection and purpose

Start by documenting each data category and the purpose it serves. If a data field does not improve the task, remove it. If sensitive data is required, explain why and add controls that match the sensitivity.
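One way to make this mapping reviewable is to keep it as structured data rather than prose. The sketch below is illustrative only: the `DataField` record, its attribute names, and the example fields are assumptions, not a standard schema. It flags fields that have no documented purpose and sensitive fields that have no controls listed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataField:
    """One row in a hypothetical data inventory."""
    name: str
    category: str                 # e.g. "transaction", "identity", "health"
    purpose: str                  # empty string means no documented purpose
    sensitive: bool = False
    controls: Tuple[str, ...] = ()


def review_inventory(fields):
    """Return (field name, finding) pairs for fields that need attention."""
    findings = []
    for f in fields:
        if not f.purpose.strip():
            findings.append((f.name, "no documented purpose: remove the field"))
        elif f.sensitive and not f.controls:
            findings.append((f.name, "sensitive field without controls"))
    return findings


# Example inventory (all entries are made up for illustration).
inventory = [
    DataField("order_id", "transaction", "look up the order being discussed"),
    DataField("date_of_birth", "identity", ""),
    DataField("diagnosis_note", "health", "triage medical support requests",
              sensitive=True),
]

for name, finding in review_inventory(inventory):
    print(f"{name}: {finding}")
```

A review like this does not decide whether a purpose is good enough. It only makes missing purposes and missing controls visible so that a person can make that call.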

Purpose should be narrow. A prompt collected to answer a support question should not silently become training material, analytics material, or future memory unless the user has been told and the organisation has a lawful basis.
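The same idea can be enforced in code as a purpose check before data is reused. This is a minimal sketch under stated assumptions: the `ALLOWED_USES` table, the category and use names, and the `is_use_allowed` helper are all hypothetical, and whether a lawful basis is actually valid remains a legal judgment that code cannot make.

```python
# Hypothetical mapping from data category to the uses it was collected for.
ALLOWED_USES = {
    "support_prompt": {"answer_support_question"},
    "feedback_rating": {"answer_support_question", "product_analytics"},
}


def is_use_allowed(data_category, proposed_use,
                   user_notified=False, lawful_basis=None):
    """Allow the original purpose; gate any secondary use.

    A secondary use passes only when the user has been told AND a lawful
    basis has been recorded. The basis is checked for presence only.
    """
    if proposed_use in ALLOWED_USES.get(data_category, set()):
        return True
    return bool(user_notified) and bool(lawful_basis)


# A support prompt cannot silently become training material.
print(is_use_allowed("support_prompt", "model_training"))
print(is_use_allowed("support_prompt", "model_training",
                     user_notified=True, lawful_basis="consent"))
```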

// 03

Review prompts, files, and retrieval stores

Generative AI systems often blur boundaries between user input and knowledge base content. A privacy assessment should separate prompt data, uploaded files, retrieved sources, tool outputs, and final responses.

Retrieval stores need special attention. Teams should check permissions, source age, deletion process, indexing rules, and whether restricted information could be surfaced to the wrong user.
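As an illustration of these checks, the sketch below filters retrieved chunks before they reach the model. It assumes each chunk carries its own permission groups, a last-updated date, and a deletion flag. The `Chunk` type, its fields, and the one-year age limit are hypothetical and would need to match how the real retrieval store records metadata.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    """A retrieved passage plus the metadata needed for privacy checks."""
    doc_id: str
    text: str
    allowed_groups: frozenset
    last_updated: date
    deleted: bool = False


def filter_for_user(chunks, user_groups, today, max_age_days=365):
    """Keep only chunks this user may see, that are current and not deleted."""
    visible = []
    for c in chunks:
        if c.deleted:
            continue                      # deletion requests must reach the index
        if not (c.allowed_groups & set(user_groups)):
            continue                      # user shares no group with the source
        if (today - c.last_updated).days > max_age_days:
            continue                      # source is older than the allowed age
        visible.append(c)
    return visible


chunks = [
    Chunk("a", "Refund policy", frozenset({"support"}), date(2024, 5, 1)),
    Chunk("b", "Salary bands", frozenset({"hr"}), date(2024, 5, 1)),
    Chunk("c", "Old refund policy", frozenset({"support"}), date(2021, 1, 1)),
    Chunk("d", "Removed article", frozenset({"support"}), date(2024, 5, 1),
          deleted=True),
]
result = filter_for_user(chunks, {"support"}, today=date(2024, 6, 1))
print([c.doc_id for c in result])
```

Filtering after retrieval is the simplest place to show the logic. In practice, applying the same permission filter inside the search query is usually safer, because restricted text then never leaves the store.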

// 04

Control logs and memory

Logs are useful for debugging and safety review, but they can also preserve sensitive data longer than necessary. Define what is logged, who can view logs, how long logs are retained, and how high-risk content is redacted.

Memory should be opt-in or tightly scoped for sensitive workflows. Users should know when memory is active and have a way to correct or remove stored facts.

Minimise prompt and response logging where possible.
Mask or redact sensitive fields before logs are stored.
Use role-based access for review logs.
Set retention periods for prompts, files, outputs, and feedback.
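Two of the practices above, redaction before storage and per-type retention, can be sketched in a few lines. The patterns and retention periods here are placeholder assumptions, not recommendations: simple regular expressions miss many forms of personal data, and real retention periods should come from legal and policy review.

```python
import re
from datetime import datetime, timedelta, timezone

# Deliberately simple patterns; production redaction needs broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")   # 13 to 16 digits

# Placeholder retention periods per record type.
RETENTION = {
    "prompt": timedelta(days=30),
    "file": timedelta(days=7),
    "output": timedelta(days=30),
    "feedback": timedelta(days=180),
}


def redact(text):
    """Mask email addresses and card-like numbers before a log is stored."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


def is_expired(kind, stored_at, now):
    """True when a record of this kind has outlived its retention period."""
    return now - stored_at > RETENTION[kind]


print(redact("Email jane.doe@example.com, card 4111 1111 1111 1111 please"))

stored = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(is_expired("file", stored, now), is_expired("prompt", stored, now))
```

Redacting at write time means reviewers with log access never see the raw values, which reduces how much weight role-based access has to carry on its own.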

// 05

Assess model and vendor handling

Where third-party model or platform providers are used, assess data processing terms, retention settings, sub-processors, region controls, security certifications, incident notification, and whether customer data may be used to improve services.

Configuration matters. Two teams can use the same provider with different privacy outcomes depending on logging, retention, training, and integration settings.
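To illustrate the point, the sketch below compares two deployments of the same provider against an internal baseline. The setting names (`prompt_logging`, `provider_training_opt_out`, `retention_days`) and the 30-day limit are invented for this example and do not correspond to any particular provider's configuration.

```python
MAX_RETENTION_DAYS = 30   # assumed internal limit, not a provider default


def config_gaps(config):
    """List privacy gaps in a deployment config.

    Missing settings are treated as the riskiest value, so an
    undocumented setting shows up as a gap instead of passing silently.
    """
    gaps = []
    if config.get("prompt_logging", True):
        gaps.append("prompt logging is enabled")
    if not config.get("provider_training_opt_out", False):
        gaps.append("provider may use data to improve services")
    if config.get("retention_days", float("inf")) > MAX_RETENTION_DAYS:
        gaps.append("retention exceeds the internal limit")
    return gaps


team_a = {"prompt_logging": False, "provider_training_opt_out": True,
          "retention_days": 30}
team_b = {"prompt_logging": True, "provider_training_opt_out": False,
          "retention_days": 365}

print(config_gaps(team_a))
print(config_gaps(team_b))
```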

// 06

Document user-facing controls

Privacy controls should show up in the product. Users should know what not to enter, when AI is being used, how sensitive data is handled, and how to request correction or deletion where applicable.

A good AI privacy assessment ends with product changes: copy, warnings, input constraints, upload restrictions, access controls, logging limits, and deletion workflows.