Challenge 03

GDPR Data Discovery

Build a prototype that helps organizations identify, classify, and responsibly delete personal and GDPR-relevant data at scale.

The problem

GDPR requires companies to manage personal data transparently and delete it once retention periods have expired. In large organizations, that obligation collides with massive, distributed data landscapes.

A company may need to reason across hundreds of thousands of OneDrive accounts, globally distributed shared drives, SharePoint sites, and other sources. Manual auditing is not feasible at that scale.

The goal is a proof of concept that can reliably identify sensitive data, categorize it, attribute it to a responsible person, and support deletion when required.

Focus 01

Find sensitive data

Detect personal and GDPR-relevant information across distributed corporate sources such as OneDrive, SharePoint, shared drives, and comparable repositories.
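
As a starting point, detection could combine a few pattern-based signals over extracted file text. The sketch below is only an illustration: the `PATTERNS` table, the `Match` type, and `find_personal_data` are hypothetical names, and the two regular expressions are deliberately simple (they do not validate IBAN checksums or cover every email format). A real scanner would need many more categories.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple patterns. They illustrate the idea only
# and would produce false positives and negatives on real data.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,3})?\b"),
}


@dataclass(frozen=True)
class Match:
    kind: str   # which pattern fired, e.g. "email"
    text: str   # the matched substring
    start: int  # character offset in the scanned text


def find_personal_data(text: str) -> list:
    """Return all pattern matches in the text, ordered by position."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append(Match(kind, m.group(0), m.start()))
    return sorted(hits, key=lambda hit: hit.start)
```

Keeping the match offset alongside the pattern name lets a reviewer jump to the exact place in the file that triggered the flag.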

Focus 02

Classify with context

Use AI-supported categorization together with classic search signals so reviewers understand why a file was flagged.
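
One way to keep flagged files explainable is to store the reasons next to the category. The sketch below assumes a hypothetical `ai_suggest` callable that returns a category and a confidence score; it stands in for whatever model a team chooses and is not a real library API. The `KEYWORDS` table and the `Finding` type are likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Tuple


@dataclass
class Finding:
    path: str
    category: str
    reasons: list = field(default_factory=list)  # human-readable explanations


# Hypothetical keyword signals mapped to the category they hint at.
KEYWORDS = {
    "date of birth": "HR / identity",
    "salary": "HR / payroll",
    "diagnosis": "health",
}


def classify(path: str, text: str,
             ai_suggest: Callable[[str], Tuple[str, float]]) -> Finding:
    """Combine classic keyword signals with an AI suggestion.

    The AI category is used as the proposed category; keyword hits are
    recorded as supporting reasons so a reviewer can see why it was flagged.
    """
    reasons = []
    lowered = text.lower()
    for keyword, hint in KEYWORDS.items():
        if keyword in lowered:
            reasons.append(f"keyword '{keyword}' suggests {hint}")
    category, confidence = ai_suggest(text)
    reasons.append(f"AI suggested '{category}' (confidence {confidence:.2f})")
    return Finding(path, category, reasons)
```

For experimentation, `ai_suggest` can be a stub such as `lambda text: ("HR / payroll", 0.8)`, which makes the pipeline testable before any model is connected.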

Focus 03

Keep humans accountable

Route findings to a direct owner or Master of Data for final review and deletion decisions before action is taken.
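
The human-in-the-loop rule can be enforced in the data model itself, so nothing becomes deletable without an explicit decision. This is a minimal sketch under assumptions: `Decision`, `ReviewItem`, `record_decision`, and `deletable` are hypothetical names, and a real prototype would also need audit logging and authentication.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    KEEP = "keep"
    DELETE = "delete"


@dataclass
class ReviewItem:
    path: str
    reviewer: str                       # owner or Master of Data assigned to the finding
    decision: Decision = Decision.PENDING


def record_decision(item: ReviewItem, reviewer: str, decision: Decision) -> None:
    """Store a decision, but only if it comes from the assigned reviewer."""
    if reviewer != item.reviewer:
        raise PermissionError(f"{reviewer} is not the assigned reviewer for {item.path}")
    item.decision = decision


def deletable(items: list) -> list:
    """Return only the paths a human has explicitly approved for deletion."""
    return [item.path for item in items if item.decision is Decision.DELETE]
```

Because items start as `PENDING`, an AI suggestion alone never puts a file on the deletion list.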

What you are building

An AI-assisted data protection scanner with human review.

Design and build an innovative software concept for automated identification and categorization of personal and GDPR-relevant data across corporate data sources. The result should be a prototype that can serve as a basis for future scaling.

The scan should support a full initial search and later delta scans that only review files modified since the last scan. AI can suggest categories, but final review and deletion should remain with a human reviewer to support compliance.
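
The difference between a full scan and a delta scan can be sketched with file modification times. This example assumes a locally mounted directory and uses the hypothetical helper `files_to_scan`; cloud sources such as OneDrive or SharePoint would more likely rely on whatever change tracking the source itself offers rather than on local timestamps.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def files_to_scan(root: Path, last_scan: Optional[datetime]) -> list:
    """Select files for a scan run.

    With last_scan=None this is a full initial scan. Otherwise only files
    modified after last_scan are returned (delta scan). last_scan should be
    timezone-aware so that its timestamp is unambiguous.
    """
    candidates = sorted(p for p in root.rglob("*") if p.is_file())
    if last_scan is None:
        return candidates
    cutoff = last_scan.timestamp()
    return [p for p in candidates if p.stat().st_mtime > cutoff]
```

A prototype would store the start time of each completed scan and pass it as `last_scan` on the next run, so files changed while a scan was in progress are picked up again.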

Every finding should be attributable to a responsible person: directly through an owner in sources such as OneDrive, or indirectly through a Master of Data for shared repositories.
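
Attribution could follow a simple fallback: use the direct owner when the source provides one, otherwise look up the Master of Data for the repository. The sketch below is an assumption-laden illustration; the `FileRecord` type, the `MASTERS_OF_DATA` mapping, its example paths and addresses, and the longest-prefix rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FileRecord:
    path: str
    owner: Optional[str] = None  # set for personal sources such as OneDrive


# Hypothetical mapping from repository path prefix to its Master of Data.
MASTERS_OF_DATA = {
    "sharepoint://finance": "finance-mod@example.com",
    "sharepoint://": "default-mod@example.com",
}


def responsible_person(record: FileRecord,
                       masters: dict = MASTERS_OF_DATA) -> Tuple[str, str]:
    """Return (person, how_resolved) for a file record."""
    if record.owner:
        return record.owner, "direct owner"
    prefixes = [p for p in masters if record.path.startswith(p)]
    if not prefixes:
        raise LookupError(f"no responsible person configured for {record.path}")
    best = max(prefixes, key=len)  # the most specific repository wins
    return masters[best], "master of data"
```

Raising an error for unmapped repositories, instead of silently skipping them, keeps unattributed findings visible.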

Provided resource

Starter material

The sample repository includes PDF documents such as expense reports, IT access requests, incident reports, supplier onboarding, and training evaluation examples for testing.