From Data Governance to AI Governance: A Necessary Shift
- Jun 8
At Gabriel Greenfield, we work with organizations that have invested years in their data governance. And today, many of them are asking the same question: “Does our data governance prepare us to govern AI?”
Yes, but not automatically. The foundations are there. The transition, however, must be built consciously. Here are the 10 pillars of this transition, along with what we’re seeing in practice.
① Data quality → Reliability of production models
In traditional data governance, you assess the completeness, consistency, and timeliness of your datasets. These same practices directly determine the reliability of your models in production.
A concrete example: a bank that had strict rules regarding the quality of its customer data (deduplication, address standardization, consistency of identifiers) was able to train its first credit scoring models with a near-zero error rate. Conversely, an organization that had “let its standards slip” saw its models produce inconsistent recommendations within the first few months.
The rule is simple: clean data produces stable models. Data quality debt becomes AI reliability debt.
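To make the three dimensions concrete, here is a minimal sketch of a quality report computed before a dataset is handed to a training job. The function name, the field names, and the 90-day freshness window are illustrative assumptions, not part of any specific tool.

```python
from datetime import date

def quality_report(rows, required, key, date_field, today, max_age_days):
    """Return completeness, duplicate-rate, and freshness ratios for dict records."""
    total = len(rows)
    if total == 0:
        return {"completeness": 0.0, "duplicate_rate": 0.0, "freshness": 0.0}
    # Completeness: share of rows where every required field is filled in.
    complete = sum(all(r.get(f) not in (None, "") for f in required) for r in rows)
    # Consistency of identifiers: share of rows that repeat an existing key.
    distinct_keys = len({r.get(key) for r in rows})
    # Timeliness: share of rows updated within the allowed window.
    fresh = sum((today - r[date_field]).days <= max_age_days for r in rows)
    return {
        "completeness": complete / total,
        "duplicate_rate": 1 - distinct_keys / total,
        "freshness": fresh / total,
    }

# Hypothetical customer records, for illustration only.
customers = [
    {"id": "C1", "address": "1 Main St", "updated": date(2024, 5, 1)},
    {"id": "C2", "address": "", "updated": date(2024, 5, 20)},
    {"id": "C2", "address": "9 Oak Ave", "updated": date(2023, 1, 15)},
    {"id": "C3", "address": "4 Elm Rd", "updated": date(2024, 4, 10)},
]
report = quality_report(customers, ["id", "address"], "id", "updated",
                        date(2024, 6, 1), max_age_days=90)
```

A training pipeline could refuse to start when any of these ratios falls outside the thresholds your data governance already defines.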
② Data traceability → Detection of biases & audit trails
Documenting where data originated, how it was processed, and who modified it: this is exactly what you need to trace the source of algorithmic bias.
A concrete example: an HR scoring model was systematically rejecting candidates from certain universities. The investigation revealed that the historical hiring data used for training reflected 15 years of human recruitment bias. Without data lineage, it would have been impossible to identify this source of contamination. With full traceability, the team was able to pinpoint the source, correct the dataset, and document the audit trail for compliance.
Data lineage isn’t a bureaucratic formality. It’s your AI forensics tool.
③ Access controls → Boundaries of ethical use
In data governance, you define who can read, modify, or export sensitive data. In AI governance, the question becomes: on what data is a model permitted to train?
Real-world example: At an insurance group, medical data and customer behavior data were stored in separate silos with strict access controls for humans. When deploying a pricing model, no one anticipated that the training pipeline would “see” both silos simultaneously. The result: a legally non-compliant model, trained on prohibited correlations. An RBAC (role-based access control) system extended to ML pipelines would have prevented the incident.
What your access controls protect for humans must also protect what your models learn.
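One way to picture RBAC extended to ML pipelines is a grant table that treats each training pipeline as an identity with its own permitted data domains. This is a simplified sketch; the pipeline names, domain names, and the `check_training_access` helper are hypothetical.

```python
# Hypothetical policy: the data domains each pipeline identity may train on.
TRAINING_GRANTS = {
    "pricing-model-training": {"customer_behavior"},
    "claims-triage-training": {"medical", "claims"},
}

def check_training_access(pipeline, requested_domains):
    """Raise PermissionError if the pipeline requests a domain it has no grant for."""
    allowed = TRAINING_GRANTS.get(pipeline, set())
    denied = sorted(set(requested_domains) - allowed)
    if denied:
        raise PermissionError(f"{pipeline} may not train on: {', '.join(denied)}")
    return True

# The pricing pipeline may read behavioral data...
check_training_access("pricing-model-training", ["customer_behavior"])

# ...but is stopped before it can join behavioral and medical data.
try:
    check_training_access("pricing-model-training", ["customer_behavior", "medical"])
except PermissionError as exc:
    blocked = str(exc)
```

The check runs before any data is loaded, so a forbidden combination of silos fails at pipeline start rather than surfacing in a compliance review.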
④ Data cataloging → Model registry & discoverability
You’ve centralized your dataset metadata so that teams can find the right data at the right time. The same logic applies to your AI assets.
Real-world example: During an AI audit at a major industrial group, 23 different models were running in production. For 7 of them, no one knew exactly who the owner was, what the training data was, or when the model had last been validated. Some dated back to 2021 and had never been retrained. Without a centralized model registry, the organization was flying blind. The implementation of a model catalog, modeled after the architecture of the existing data catalog, solved the problem.
What your data catalog is for your data, a model registry must be for your models.
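A registry entry does not need to be elaborate to answer the three audit questions above. The sketch below assumes a hypothetical `ModelRecord` structure and a one-year revalidation policy; both are illustrative choices.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class ModelRecord:
    name: str
    owner: Optional[str]             # accountable person or team
    training_dataset: Optional[str]  # reference to a data catalog entry
    last_validated: Optional[date]

def audit_findings(record: ModelRecord, today: date, max_age_days: int = 365) -> List[str]:
    """List the governance gaps for one production model."""
    findings = []
    if not record.owner:
        findings.append("no owner")
    if not record.training_dataset:
        findings.append("unknown training data")
    if record.last_validated is None:
        findings.append("never validated")
    elif (today - record.last_validated).days > max_age_days:
        findings.append("validation overdue")
    return findings

# Hypothetical registry content, for illustration only.
registry = [
    ModelRecord("demand-forecast", "supply-chain-ml", "sales_v12", date(2024, 3, 1)),
    ModelRecord("defect-detector", None, None, date(2021, 9, 30)),
]
today = date(2024, 6, 1)
flagged = {}
for record in registry:
    gaps = audit_findings(record, today)
    if gaps:
        flagged[record.name] = gaps
```

Running such a check on a schedule turns the audit from a one-off discovery exercise into a standing report.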
⑤ Compliance frameworks → Preparing for the EU AI Act
GDPR, CCPA, HIPAA: You’ve learned to map personal data flows, conduct impact assessments, and respond to data subjects’ rights. These skills are directly applicable to the EU AI Act.
Real-world example: A fintech company already well-versed in GDPR realized that 80% of its AI compliance documentation already existed in another form. Processing records became records of high-risk AI systems. Data Protection Impact Assessments (DPIAs) turned into algorithmic compliance assessments.
The EU AI Act largely extends familiar compliance practices to new areas. Organizations that understand this are 18 months ahead of those that treat the AI Act as a separate issue.
⑥ Data Stewardship → Responsibility for models
Data stewardship assigns clear owners to each data domain: someone responsible for quality, usage, and compliance. With no equivalent for AI models, organizations fall into the classic “not my problem” trap.
A concrete example: a model for detecting bank fraud was producing an abnormally high false positive rate on cross-border transactions. For six weeks, the problem bounced back and forth between the data team, the ML team, compliance, and IT. There was no designated “model owner.” Since the introduction of a Model Steward role, this type of uncertainty has disappeared. Every model in production has a designated owner with clear reporting obligations.
⑦ Schema Standards → Training Data Specifications
You define data types, acceptable formats, and validation rules. These standards have a direct, and often underestimated, impact on how models behave.
A concrete example: two teams within the same organization were training sales forecasting models on product sales data, one using amounts in euros excluding tax, the other in euros including tax, without documenting the difference. The models consistently produced divergent forecasts. A simple schema standard (requiring all monetary data to be expressed in base currency, excluding tax) would have saved six months of debugging. Standardizations that seem trivial in data engineering are critical in ML.
What you standardize in your schemas, you control in your models.
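The monetary standard from the example can be enforced with a few lines at ingestion time. This sketch assumes EUR as the base currency and uses a hypothetical `SalesAmount` type; the point is that the tax basis becomes an explicit, validated field rather than tribal knowledge.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List

BASE_CURRENCY = "EUR"  # assumption for this sketch

@dataclass(frozen=True)
class SalesAmount:
    value: Decimal
    currency: str
    tax_included: bool

def validate_amount(amount: SalesAmount) -> List[str]:
    """Return the schema violations for one monetary value (empty list if valid)."""
    errors = []
    if amount.currency != BASE_CURRENCY:
        errors.append(f"currency must be {BASE_CURRENCY}")
    if amount.tax_included:
        errors.append("amount must exclude tax")
    return errors

ok = validate_amount(SalesAmount(Decimal("100.00"), "EUR", tax_included=False))
bad = validate_amount(SalesAmount(Decimal("120.00"), "EUR", tax_included=True))
```

With this in place, the second team's tax-inclusive amounts would have been rejected on the first load instead of silently shaping a divergent model.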
⑧ Versioning & Change Management → Monitoring Model Drift
You track dataset versions, schema changes, and how your data evolves over time. This same practice forms the basis for detecting model drift.
Real-world example: A product recommendation model trained in 2023 on pre-inflation purchasing behavior began producing erroneous recommendations in 2024. Consumer behavior had changed radically, but the model had not been retrained. An organization with rigorous data versioning would have detected the divergence between the training data distributions and the production data long before the problem became visible in the outputs. Data versioning is the early warning system for drift.
A model doesn’t break down. It ages, and your versioning system can detect this aging.
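Comparing a versioned training snapshot with recent production data can be done with a simple distribution distance. The sketch below uses the Population Stability Index (PSI), one common choice among several; the bin edges, the sample values, and the 0.25 alert level (a widely quoted rule of thumb, not a standard) are assumptions for illustration.

```python
import math

def psi(train, prod, edges, eps=1e-6):
    """Population Stability Index between two samples, using shared bin edges."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value has reached (edges sorted ascending).
            counts[sum(v >= e for e in edges)] += 1
        # Floor empty bins at eps to keep the logarithm defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = shares(train), shares(prod)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical basket values: training snapshot vs. recent production traffic.
train_basket = [10, 20, 30, 40, 50, 60, 70, 80]
prod_basket = [60, 70, 80, 90, 95, 100, 110, 120]
edges = [25, 50, 75]

baseline = psi(train_basket, train_basket, edges)   # identical samples
drift_score = psi(train_basket, prod_basket, edges)
needs_review = drift_score > 0.25
```

This only works if the training snapshot is still retrievable, which is exactly what dataset versioning guarantees.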
⑨ Security & Encryption → Adversarial Defense
Encrypting data at rest and in transit, securing data transfer pipelines: you’re already doing this to protect confidentiality. In an AI context, these same measures protect against a new breed of attacks.
Real-world example: During a penetration test on a financial institution’s ML infrastructure, auditors successfully injected corrupted data into an unsecured training pipeline, a classic case of “data poisoning.” The resulting model produced outputs that were slightly biased toward certain risk profiles. Without the penetration test, no one would have noticed that such manipulation was possible. The pipeline protections you’ve implemented for GDPR compliance are the same ones that protect your models against data poisoning and model extraction. Prompt injection, by contrast, targets the model’s inputs at inference time and calls for additional controls at that layer.
The AI attack surface is an extension of your data’s attack surface.
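One basic pipeline safeguard is to verify that training data has not changed since it was approved. The sketch below compares SHA-256 fingerprints against a manifest; the file name and the manifest structure are hypothetical. Note the limit of the technique: it detects tampering after approval, not data that was already poisoned at the source.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a data batch."""
    return hashlib.sha256(data).hexdigest()

def tampered_batches(batches, manifest):
    """Return the names of batches whose hash differs from the approved manifest."""
    return sorted(
        name for name, data in batches.items()
        if manifest.get(name) != fingerprint(data)
    )

# Manifest recorded when the data was reviewed and approved (hypothetical content).
approved = {"2024-05.csv": b"id,risk\n1,low\n2,high\n"}
manifest = {name: fingerprint(data) for name, data in approved.items()}

# Later, the training job reads a batch that was altered in transit.
received = {"2024-05.csv": b"id,risk\n1,low\n2,low\n"}

clean_check = tampered_batches(approved, manifest)
attack_check = tampered_batches(received, manifest)
```

In practice the manifest itself must be stored where the pipeline cannot rewrite it, otherwise an attacker can simply update both.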
⑩ Quality metrics → Explainability requirements
You measure the quality of your data using KPIs: completeness rate, error rate, and data freshness. The EU AI Act and emerging AI governance standards require the same level of rigor for model outputs.
Real-world example: A consumer credit organization that had invested in highly detailed data quality dashboards was able, within a few weeks, to extend this measurement culture to its scoring models: cross-group fairness metrics, prediction confidence intervals, and the explainability rate of rejection decisions. Where other organizations took 18 months to build their explainability framework, they did it in two months, because the culture of measurement already existed.
Explainability is not a technical constraint. It is a measurement discipline applied to a new object.
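A cross-group fairness metric can sit on the same dashboard as a completeness rate. As a minimal sketch, the code below computes approval rates per group and the ratio between the lowest and highest rate; the group labels and decisions are made up, and the acceptable ratio is a policy decision this example does not settle.

```python
from collections import defaultdict

def approval_rates(decisions):
    """Approval rate per group from (group, approved) pairs."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}

def parity_ratio(rates):
    """Lowest approval rate divided by the highest (1.0 means equal rates)."""
    return min(rates.values()) / max(rates.values())

# Hypothetical scoring decisions for two applicant groups.
decisions = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 6 + [("B", False)] * 4
)
rates = approval_rates(decisions)
ratio = parity_ratio(rates)
```

Tracked over time like any other KPI, a falling ratio becomes a signal to investigate before a regulator or a customer raises the question.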
Key takeaways from these observations:
AI governance is a matter of organizational adaptation. Successful organizations do not reinvent their frameworks. Instead, they methodically extend them by clearly understanding what can be carried over and what requires adaptation, and they equip themselves accordingly.
This is exactly what Gabriel Greenfield’s Meridian methodology structures: mapping your current maturity in Data Governance, identifying the 10 critical translations, prioritizing initiatives based on your regulatory exposure and AI portfolio, and supporting the organization over the long term.
AI governance is not a fresh start. It is the logical continuation of work you have already begun. Would you like to map your maturity in AI governance and identify your priorities? Contact us.



