Why Data Cleansing Must Happen Before Governance

The Foundation-First Approach

By Raghu Vishwanath, Managing Partner | December 2025 | 9 min read

“We implemented SAP MDG last year. Why is our data quality still terrible?”

The VP of Operations stared at the dashboard showing duplicate parts, missing specifications, and classification chaos—all the problems their expensive Master Data Governance tool was supposed to solve.

The answer was uncomfortable: they built governance on top of garbage.

The Backwards Approach That Fails

Most organizations approach data governance in the wrong sequence:

Step 1: Buy enterprise MDG platform (SAP, Oracle, IBM)
Step 2: Configure complex approval workflows
Step 3: Deploy to organization
Step 4: Wonder why data quality doesn’t improve

This approach fails because governance tools control how new data enters your system—they don’t fix what’s already there.

It’s like installing an elaborate security system in a house with a crumbling foundation. The security works perfectly, but the house is still falling apart.

What Governance Actually Does

Let’s clarify what governance platforms actually accomplish:

Governance platforms excel at:

  • Preventing new bad data from entering the system
  • Enforcing validation rules on incoming requests
  • Managing approval workflows for future data
  • Maintaining quality standards going forward

Governance platforms do NOT:

  • Eliminate existing duplicates
  • Fix incomplete historical records
  • Standardize legacy descriptions
  • Correct classification errors
  • Repair structural data issues

In other words: governance protects against future pollution. It doesn’t clean up existing contamination.

Why Organizations Get This Wrong

The backwards approach persists for predictable reasons:

1. Software Vendors Push Governance First

MDG vendors sell expensive enterprise platforms. Their business model depends on selling governance tools, not cleansing services.

When you ask about data quality, they answer with governance features. They’ll demonstrate workflow engines, validation rules, and stewardship dashboards—all impressive technology that doesn’t address your actual problem.

What they won’t tell you: All these features assume you’re starting with clean data.

2. “We’ll Clean As We Go”

Organizations convince themselves they can govern and cleanse simultaneously:

“We’ll implement governance now, then gradually improve quality through the governance process.”

This sounds reasonable but fails in practice because:

  • Validation rules reject most requests when data is dirty
  • Data stewards spend all their time fixing requests instead of governing
  • Users bypass the system when approval takes too long
  • Duplicates and errors persist because governance doesn’t touch existing records
  • After 18 months, you have an expensive governance tool and unchanged data quality

3. Cleansing Looks Expensive

Baseline cleansing requires upfront investment:

  • Data profiling and assessment
  • Deduplication algorithms and analysis
  • Attribute standardization
  • Classification correction
  • Data steward time and effort

Executives see this cost and want to skip it. But implementing governance without cleansing wastes even more money—you just spend it over 3-5 years instead of 3-5 months.

4. Governance Sounds Strategic

“Data Governance Initiative” sounds more strategic than “Data Cleansing Project.”

Governance implies forward-thinking leadership. Cleansing implies you let things get messy.

But clean data is the strategic asset—governance is just the maintenance plan.

The Foundation-First Approach

The correct sequence is simple:

Phase 1: Baseline Cleansing (2-6 months)

  • Assess current data quality comprehensively
  • Eliminate duplicates across the entire catalog
  • Standardize descriptions and attributes
  • Correct classification errors
  • Fill critical data gaps
  • Establish clean baseline
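
The deduplication step can be sketched in miniature. The following Python sketch uses illustrative field names and a deliberately crude normalization; production cleansing typically adds fuzzy matching and attribute-level comparison on top of this idea. It groups records whose normalized descriptions collapse to the same key:

```python
import re
from collections import defaultdict

def normalize(description: str) -> str:
    """Crude normalization: lowercase, strip punctuation, and sort tokens
    so word-order variants of the same part collapse to one key."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", description.lower()).split()
    return " ".join(sorted(tokens))

def find_duplicate_candidates(records):
    """Group records by normalized description; return only groups
    with more than one member (the duplicate candidates)."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["description"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical catalog slice: two word-order variants of one bearing
catalog = [
    {"id": "P-001", "description": "Bearing, Ball, 6205-2RS SKF"},
    {"id": "P-114", "description": "SKF bearing ball 6205-2RS"},
    {"id": "P-207", "description": "Gasket, Flange, 150# RF"},
]
print(find_duplicate_candidates(catalog))  # [['P-001', 'P-114']]
```

Exact-key grouping like this only catches the easy cases; the point is that candidate surfacing is mechanical, while the merge decision stays with data stewards.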

Phase 2: Governance Implementation (3-6 months)

  • Deploy governance platform on clean foundation
  • Configure validation rules that actually work
  • Establish approval workflows
  • Train data stewards on maintenance, not firefighting
  • Monitor and optimize

Phase 3: Continuous Improvement (Ongoing)

  • Governance prevents new issues
  • Metrics show sustained quality
  • Stewards focus on policy and improvement
  • Technology investment delivers ROI

Notice: Phase 2 only works if Phase 1 is complete.

Real-World Example: Two Different Paths

Company A: Governance Without Cleansing

Their approach:

  • Implemented SAP MDG ($4M investment)
  • Skipped baseline cleansing (“too expensive”)

  • Created validation rules on dirty data

18 months later:

  • 67% of part requests rejected by validation rules
  • Data stewards spending 30 hours/week fixing requests
  • Technicians bypassing system via email requests
  • Management considering scrapping MDG entirely
  • Data quality unchanged from pre-MDG baseline

Total cost: $4M (implementation) + $800K/year (stewarding costs) + ongoing operational losses

Company B: Foundation-First Approach

Their approach:

  • 8 weeks of baseline cleansing (eliminated 50K duplicates)
  • Then implemented governance platform
  • Configured validation rules on clean data

12 months later:

  • 94% of requests pass validation first time
  • Data stewards focused on policy and improvement
  • High user adoption (system seen as helpful, not obstructive)
  • Data quality sustained and improving
  • Measurable procurement savings from better data

Total cost: $1.2M (cleansing) + $3M (governance) = $4.2M total, but ROI positive within 18 months

Roughly the same total investment. Dramatically different outcomes.

The difference? Company B built governance on a solid foundation.

How to Assess Your Foundation

Before implementing governance, assess your current data quality:

Key Metrics to Measure:

Duplicate rate:

  • What percentage of your records are duplicates?
  • Target: <2% before governance implementation

Attribute completeness:

  • What percentage have manufacturer part numbers?
  • What percentage have complete technical specifications?
  • Target: >90% completeness for critical attributes

Classification accuracy:

  • What percentage properly classified?
  • Target: >95% using industry-standard taxonomy

Naming consistency:

  • Do similar items have similar descriptions?
  • Target: Standardized naming conventions applied consistently

If any metric is below target, you need baseline cleansing first.

Quick Assessment Process:

  1. Sample your catalog (pull 1,000 random records)
  2. Manually review for duplicates, missing data, and classification errors
  3. Calculate percentages for each metric
  4. Extrapolate to full catalog to understand total scope

If you find significant issues in the sample, assume the full catalog is worse.
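
The four-step process above can be sketched as a short script. The field names (`is_duplicate`, `mfr_part_no`, `classified_ok`) are illustrative assumptions, and the per-record flags would come from the manual review in step 2:

```python
def assess_sample(sample, catalog_size):
    """Steps 3-4: compute quality metrics on the reviewed sample and
    extrapolate the duplicate count to the full catalog."""
    n = len(sample)
    duplicate_rate = sum(r["is_duplicate"] for r in sample) / n
    mpn_completeness = sum(bool(r.get("mfr_part_no")) for r in sample) / n
    classification_accuracy = sum(r["classified_ok"] for r in sample) / n
    return {
        "duplicate_rate": duplicate_rate,
        "mpn_completeness": mpn_completeness,
        "classification_accuracy": classification_accuracy,
        # Step 4: project the sampled rate onto the whole catalog
        "est_duplicates": round(duplicate_rate * catalog_size),
    }

# Tiny stand-in for a 1,000-record sample: 1 duplicate flagged in 4 records
sample = [
    {"is_duplicate": True,  "mfr_part_no": "6205-2RS", "classified_ok": True},
    {"is_duplicate": False, "mfr_part_no": "",         "classified_ok": True},
    {"is_duplicate": False, "mfr_part_no": "WR-17",    "classified_ok": False},
    {"is_duplicate": False, "mfr_part_no": "V-220",    "classified_ok": True},
]
print(assess_sample(sample, catalog_size=100_000))
```

With a 25% duplicate rate in the sample, a 100K-record catalog projects to roughly 25,000 duplicates, which is exactly the kind of number that makes the case for baseline cleansing.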

The Economics of Sequence

Let’s examine the cost difference:

Governance-First Approach (The Expensive Path):

Year 1:

  • Governance platform: $4M
  • Failed implementation due to dirty data
  • Steward firefighting: $800K

Year 2:

  • Emergency cleansing project: $1.5M
  • Governance reconfiguration: $500K
  • Continued steward costs: $800K

Year 3:

  • Still fixing issues: $800K

Total 3-year cost: $8.4M
ROI: Negative

Foundation-First Approach (The Efficient Path):

Year 1:

  • Baseline cleansing: $1.2M (Months 1-2)
  • Governance implementation: $3M (Months 3-8)
  • Normal steward operations: $400K

Year 2:

  • Sustained operations: $400K
  • Measurable savings: $2M+

Year 3:

  • Sustained operations: $400K
  • Cumulative savings: $4M+

Total 3-year cost: $5.4M
ROI: Strongly positive by Year 3

The foundation-first approach costs $3M less over three years while delivering actual results.
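
The three-year totals above can be reproduced directly (figures in $K, taken from the two scenarios):

```python
governance_first = sum([
    4000, 800,        # Year 1: platform, steward firefighting
    1500, 500, 800,   # Year 2: emergency cleansing, reconfiguration, stewards
    800,              # Year 3: still fixing issues
])
foundation_first = sum([
    1200, 3000, 400,  # Year 1: cleansing, governance, normal operations
    400,              # Year 2: sustained operations
    400,              # Year 3: sustained operations
])
print(governance_first, foundation_first)            # 8400 5400
print((governance_first - foundation_first) / 1000)  # 3.0 ($M saved)
```

Note this comparison counts spend only; the foundation-first path also books $4M+ in cumulative savings by Year 3, which the governance-first path never reaches.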

What Happens Without Cleansing

When you implement governance on dirty data, you get:

1. Validation Rule Chaos

Your validation rules must handle all existing data variations:

  • 47 different ways to describe the same bearing
  • Missing manufacturer part numbers in 60% of records
  • 12 different classification categories for identical items

You have two choices:

  • Strict rules that reject 70% of requests (users hate the system)
  • Loose rules that allow bad data (governance doesn’t work)

With clean data, validation rules can be strict AND users stay happy.
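
A minimal sketch of what “strict” can look like, assuming a hypothetical request schema and a three-class taxonomy slice (neither is from any specific MDG product):

```python
REQUIRED_FIELDS = ("description", "mfr_part_no", "classification")
VALID_CLASSES = {"BEARING", "GASKET", "VALVE"}  # illustrative taxonomy slice

def validate_request(req):
    """Return a list of validation errors; an empty list means the
    request passes on the first try."""
    errors = [f"missing {f}" for f in REQUIRED_FIELDS if not req.get(f)]
    cls = req.get("classification")
    if cls and cls not in VALID_CLASSES:
        errors.append(f"unknown classification: {cls}")
    return errors

clean = {"description": "Bearing, Ball, 6205-2RS",
         "mfr_part_no": "6205-2RS", "classification": "BEARING"}
dirty = {"description": "brg"}
print(validate_request(clean))  # []
print(validate_request(dirty))  # ['missing mfr_part_no', 'missing classification']
```

The same rules that pass 94% of requests against a clean baseline would reject most requests against a catalog where 60% of records lack manufacturer part numbers, which is why strictness only becomes affordable after cleansing.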

2. Steward Burnout

Data stewards become full-time firefighters:

  • Spending 80% of time fixing malformed requests
  • No time for actual governance activities
  • Constant pressure to “just approve it”
  • High turnover as stewards get frustrated

With clean data, stewards govern instead of firefight.

3. Governance Bypass

When governance creates friction, users find workarounds:

  • Email requests directly to procurement
  • Phone calls to maintenance planners
  • Spreadsheet-based shadow systems
  • Anything to avoid “that terrible approval system”

With clean data, governance helps users instead of blocking them.

4. Ongoing Contamination

Governance can’t fix existing duplicates, so:

  • Users keep selecting wrong parts
  • Procurement keeps ordering duplicates
  • Inventory keeps growing
  • Costs keep climbing

Governance alone cannot solve quality problems that already exist.

The Integration Imperative

Baseline cleansing and governance aren’t separate initiatives—they’re two phases of one transformation:

Phase 1: Foundation (Cleansing)

  • Eliminate accumulated pollution
  • Establish quality baseline
  • Create clean slate for governance

Phase 2: Protection (Governance)

  • Prevent new pollution
  • Maintain baseline quality
  • Enable continuous improvement

Skipping Phase 1 makes Phase 2 impossible.
Stopping after Phase 1 makes quality temporary.

You need both. In sequence.

When to Start Phase 2

How do you know you’re ready for governance?

Start Phase 2 when you achieve:

  • Duplicate rate below 2% across critical catalog segments
  • Attribute completeness above 90% for mandatory fields
  • Classification accuracy above 95% using standard taxonomy
  • Naming standards applied consistently across similar items
  • Data steward agreement that quality is sustainable

Until you hit these targets, keep cleansing.
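
The quantitative targets lend themselves to a simple readiness gate. A minimal sketch (the qualitative targets, naming consistency and steward sign-off, still need human judgment and are left out here):

```python
# Readiness thresholds taken from the targets above
TARGETS = {
    "duplicate_rate":          ("max", 0.02),
    "attribute_completeness":  ("min", 0.90),
    "classification_accuracy": ("min", 0.95),
}

def ready_for_governance(metrics):
    """True only when every measured metric clears its threshold."""
    for name, (direction, limit) in TARGETS.items():
        value = metrics[name]
        if direction == "max" and value > limit:
            return False
        if direction == "min" and value < limit:
            return False
    return True

after_cleansing = {"duplicate_rate": 0.012,
                   "attribute_completeness": 0.94,
                   "classification_accuracy": 0.97}
print(ready_for_governance(after_cleansing))  # True
```

Running this gate on each critical catalog segment, rather than on a blended average, keeps one clean segment from masking a dirty one.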

Implementing governance too early wastes the governance investment. Better to delay governance and do both phases properly.

The Bottom Line

Data governance is essential—but only on a clean foundation.

Implementing governance without first cleansing your data is like:

  • Painting a rusty car (looks better briefly, then rust returns)
  • Organizing a cluttered room without throwing anything away (organized chaos is still chaos)
  • Installing advanced security in a contaminated building (securing pollution)

The sequence matters:

Clean first. Govern second. Maintain forever.

This is the only path to sustainable data quality and governance ROI.

What To Do Next

If you’re planning a data governance initiative:

Step 1: Assess Your Foundation

Get a comprehensive data quality assessment:

  • Duplicate rate across catalog
  • Attribute completeness gaps
  • Classification accuracy issues
  • Standardization level

Don’t guess. Measure.

Step 2: Budget for Both Phases

Include baseline cleansing in your business case:

  • Phase 1: Foundation cleansing (20-30% of budget)
  • Phase 2: Governance implementation (70-80% of budget)

Don’t try to skip Phase 1. You’ll pay more later.

Step 3: Execute in Sequence

Resist pressure to “go live with governance quickly”:

  • Clean foundation first (2-6 months)
  • Deploy governance second (3-6 months)
  • Results will justify the sequence

Foundation-first always wins.

Client Example: Foundation-First Success

Major Industrial Conglomerate
Multi-sector Manufacturing | 100K+ parts across business units

The Challenge

Years of decentralized procurement and inconsistent data practices created a fragmented MRO catalog across business units. With plans to consolidate onto a unified EAM platform, they needed clean, standardized data as the foundation—not just governance tools to control existing chaos.

The Ark Approach

We started where traditional MDG vendors don’t: comprehensive baseline cleansing to eliminate years of accumulated pollution. Only after establishing a clean foundation did we implement Ark’s prevention-first governance to maintain that quality permanently.

Results Delivered:

  • 50,000+ duplicate parts eliminated before EAM migration
  • 1,200+ new categories defined with industry-standard classification
  • 270,000+ master parts delivered with complete technical specifications
  • Zero data quality degradation post-implementation through automated governance
  • Millions in procurement savings through improved findability and duplicate elimination
  • 60-day implementation from start to full deployment

Critical Success Factor

Early stakeholder alignment on data governance policies and adequate resource allocation for baseline cleansing were essential. Organizations planning similar initiatives should secure executive commitment and dedicated funding upfront—attempting governance without first cleansing the foundation leads to governing pollution.

Want to See What Foundation Cleansing Reveals?

We offer complimentary data quality assessments that show you:

  • Exact duplicate count and types
  • Missing critical attributes
  • Classification gaps and errors
  • Estimated cost of current quality issues
  • Foundation cleansing scope and timeline

About the Author

Raghu Vishwanath

Raghu Vishwanath is Managing Partner at Bluemind Solutions, providing technical and business leadership across Data Engineering and Software Product Engineering.

With over 30 years in software engineering, technical leadership, and strategic account management, Raghu has built expertise solving complex problems across retail, manufacturing, energy, utilities, financial services, hi-tech, and industrial operations. His broad domain coverage and deep expertise in enterprise architecture, platform modernization, and data management provide unique insights into universal organizational challenges.

Raghu’s journey from Software Engineer to Managing Partner reflects evolution from technical leadership to strategic business development and product innovation. He has led complex programs at global technology organizations, managing strategic relationships and building high-performing teams.

At Bluemind, Raghu has transformed the organization from a data services company to a comprehensive Data Engineering and Software Product Engineering firm with two major initiatives: developing Ark—the SaaS platform challenging legacy MRO Master Data Governance products with prevention-first architecture—and building the Software Product Engineering practice that partners with clients on multi-year engagements to develop world-class, market-defining products.

Raghu is recognized for bridging business and IT perspectives, making complex problems solvable. He focuses on genuine partnerships and understanding what clients truly need. His approach combines analytical thinking with pragmatic engineering—addressing root causes rather than symptoms.

Raghu continues advancing technical expertise with recent certifications in AI, machine learning, and graph databases—staying at the forefront of technologies powering modern software solutions and driving innovation in enterprise platforms.