Why Data Cleansing Must Happen Before Governance

The Foundation-First Approach

By Raghu Vishwanath, Managing Partner | December 2025 | 9 min read

“We implemented SAP MDG last year. Why is our data quality still terrible?”

The VP of Operations stared at the dashboard showing duplicate parts, missing specifications, and classification chaos—all the problems their expensive Master Data Governance tool was supposed to solve.

The answer was uncomfortable: they built governance on top of garbage.

The Backwards Approach That Fails

Most organizations approach data governance in the wrong sequence:

Step 1: Buy enterprise MDG platform (SAP, Oracle, IBM)
Step 2: Configure complex approval workflows
Step 3: Deploy to organization
Step 4: Wonder why data quality doesn’t improve

This approach fails because governance tools control how new data enters your system—they don’t fix what’s already there.

It’s like installing an elaborate security system in a house with a crumbling foundation. The security works perfectly, but the house is still falling apart.

What Governance Actually Does

Let’s clarify what governance platforms actually accomplish:

Governance platforms excel at:

  • Preventing new bad data from entering the system
  • Enforcing validation rules on incoming requests
  • Managing approval workflows for future data
  • Maintaining quality standards going forward

Governance platforms do NOT:

  • Eliminate existing duplicates
  • Fix incomplete historical records
  • Standardize legacy descriptions
  • Correct classification errors
  • Repair structural data issues

In other words: governance protects against future pollution. It doesn’t clean up existing contamination.

Why Organizations Get This Wrong

The backwards approach persists for predictable reasons:

1. Software Vendors Push Governance First

MDG vendors sell expensive enterprise platforms. Their business model depends on selling governance tools, not cleansing services.

When you ask about data quality, they answer with governance features. They’ll demonstrate workflow engines, validation rules, and stewardship dashboards—all impressive technology that doesn’t address your actual problem.

What they won’t tell you: All these features assume you’re starting with clean data.

2. “We’ll Clean As We Go”

Organizations convince themselves they can govern and cleanse simultaneously:

“We’ll implement governance now, then gradually improve quality through the governance process.”

This sounds reasonable but fails in practice because:

  • Validation rules reject most requests when data is dirty
  • Data stewards spend all their time fixing requests instead of governing
  • Users bypass the system when approval takes too long
  • Duplicates and errors persist because governance doesn’t touch existing records
  • After 18 months, you have an expensive governance tool and unchanged data quality

3. Cleansing Looks Expensive

Baseline cleansing requires upfront investment:

  • Data profiling and assessment
  • Deduplication algorithms and analysis
  • Attribute standardization
  • Classification correction
  • Data steward time and effort

Executives see this cost and want to skip it. But implementing governance without cleansing wastes even more money—you just spend it over 3-5 years instead of 3-5 months.

4. Governance Sounds Strategic

“Data Governance Initiative” sounds more strategic than “Data Cleansing Project.”

Governance implies forward-thinking leadership. Cleansing implies you let things get messy.

But clean data is the strategic asset—governance is just the maintenance plan.

The Foundation-First Approach

The correct sequence is simple:

Phase 1: Baseline Cleansing (2-6 months)

  • Assess current data quality comprehensively
  • Eliminate duplicates across the entire catalog
  • Standardize descriptions and attributes
  • Correct classification errors
  • Fill critical data gaps
  • Establish clean baseline
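
The deduplication step can be sketched in miniature. The following Python sketch uses illustrative field names and a deliberately crude normalization; production cleansing typically adds fuzzy matching and attribute-level comparison on top of this idea. It groups records whose normalized descriptions collapse to the same key:

```python
import re
from collections import defaultdict

def normalize(description: str) -> str:
    """Crude normalization: lowercase, strip punctuation, and sort tokens
    so word-order variants of the same part collapse to one key."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", description.lower()).split()
    return " ".join(sorted(tokens))

def find_duplicate_candidates(records):
    """Group records by normalized description; return only groups
    with more than one member (the duplicate candidates)."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["description"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical catalog slice: two word-order variants of one bearing
catalog = [
    {"id": "P-001", "description": "Bearing, Ball, 6205-2RS SKF"},
    {"id": "P-114", "description": "SKF bearing ball 6205-2RS"},
    {"id": "P-207", "description": "Gasket, Flange, 150# RF"},
]
print(find_duplicate_candidates(catalog))  # [['P-001', 'P-114']]
```

Exact-key grouping like this only catches the easy cases; the point is that candidate surfacing is mechanical, while the merge decision stays with data stewards.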

Phase 2: Governance Implementation (3-6 months)

  • Deploy governance platform on clean foundation
  • Configure validation rules that actually work
  • Establish approval workflows
  • Train data stewards on maintenance, not firefighting
  • Monitor and optimize

Phase 3: Continuous Improvement (Ongoing)

  • Governance prevents new issues
  • Metrics show sustained quality
  • Stewards focus on policy and improvement
  • Technology investment delivers ROI

Notice: Phase 2 only works if Phase 1 is complete.

Real-World Example: Two Different Paths

Company A: Governance Without Cleansing

Their approach:

  • Implemented SAP MDG ($4M investment)
  • Skipped baseline cleansing (“too expensive”)

  • Created validation rules on dirty data

18 months later:

  • 67% of part requests rejected by validation rules
  • Data stewards spending 30 hours/week fixing requests
  • Technicians bypassing system via email requests
  • Management considering scrapping MDG entirely
  • Data quality unchanged from pre-MDG baseline

Total cost: $4M (implementation) + $800K/year (stewarding costs) + ongoing operational losses

Company B: Foundation-First Approach

Their approach:

  • 8 weeks of baseline cleansing (eliminated 50K duplicates)
  • Then implemented governance platform
  • Configured validation rules on clean data

12 months later:

  • 94% of requests pass validation first time
  • Data stewards focused on policy and improvement
  • High user adoption (system seen as helpful, not obstructive)
  • Data quality sustained and improving
  • Measurable procurement savings from better data

Total cost: $1.2M (cleansing) + $3M (governance) = $4.2M total, but ROI positive within 18 months

Roughly the same total investment. Dramatically different outcomes.

The difference? Company B built governance on a solid foundation.

How to Assess Your Foundation

Before implementing governance, assess your current data quality:

Key Metrics to Measure:

Duplicate rate:

  • What percentage of your records are duplicates?
  • Target: <2% before governance implementation

Attribute completeness:

  • What percentage have manufacturer part numbers?
  • What percentage have complete technical specifications?
  • Target: >90% completeness for critical attributes

Classification accuracy:

  • What percentage properly classified?
  • Target: >95% using industry-standard taxonomy

Naming consistency:

  • Do similar items have similar descriptions?
  • Target: Standardized naming conventions applied consistently

If any metric is below target, you need baseline cleansing first.

Quick Assessment Process:

  1. Sample your catalog (pull 1,000 random records)
  2. Manually review for duplicates, missing data, and classification errors
  3. Calculate percentages for each metric
  4. Extrapolate to full catalog to understand total scope

If you find significant issues in the sample, assume the full catalog is worse.
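
The four-step process above can be sketched as a short script. The field names (`is_duplicate`, `mfr_part_no`, `classified_ok`) are illustrative assumptions, and the per-record flags would come from the manual review in step 2:

```python
def assess_sample(sample, catalog_size):
    """Steps 3-4: compute quality metrics on the reviewed sample and
    extrapolate the duplicate count to the full catalog."""
    n = len(sample)
    duplicate_rate = sum(r["is_duplicate"] for r in sample) / n
    mpn_completeness = sum(bool(r.get("mfr_part_no")) for r in sample) / n
    classification_accuracy = sum(r["classified_ok"] for r in sample) / n
    return {
        "duplicate_rate": duplicate_rate,
        "mpn_completeness": mpn_completeness,
        "classification_accuracy": classification_accuracy,
        # Step 4: project the sampled rate onto the whole catalog
        "est_duplicates": round(duplicate_rate * catalog_size),
    }

# Tiny stand-in for a 1,000-record sample: 1 duplicate flagged in 4 records
sample = [
    {"is_duplicate": True,  "mfr_part_no": "6205-2RS", "classified_ok": True},
    {"is_duplicate": False, "mfr_part_no": "",         "classified_ok": True},
    {"is_duplicate": False, "mfr_part_no": "WR-17",    "classified_ok": False},
    {"is_duplicate": False, "mfr_part_no": "V-220",    "classified_ok": True},
]
print(assess_sample(sample, catalog_size=100_000))
```

With a 25% duplicate rate in the sample, a 100K-record catalog projects to roughly 25,000 duplicates, which is exactly the kind of number that makes the case for baseline cleansing.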

The Economics of Sequence

Let’s examine the cost difference:

Governance-First Approach (The Expensive Path):

Year 1:

  • Governance platform: $4M
  • Failed implementation due to dirty data
  • Steward firefighting: $800K

Year 2:

  • Emergency cleansing project: $1.5M
  • Governance reconfiguration: $500K
  • Continued steward costs: $800K

Year 3:

  • Still fixing issues: $800K

Total 3-year cost: $8.4M
ROI: Negative

Foundation-First Approach (The Efficient Path):

Year 1:

  • Baseline cleansing: $1.2M (Months 1-2)
  • Governance implementation: $3M (Months 3-8)
  • Normal steward operations: $400K

Year 2:

  • Sustained operations: $400K
  • Measurable savings: $2M+

Year 3:

  • Sustained operations: $400K
  • Cumulative savings: $4M+

Total 3-year cost: $5.4M
ROI: Strongly positive by Year 3

The foundation-first approach costs $3M less over three years while delivering actual results.
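
The three-year totals above can be reproduced directly (figures in $K, taken from the two scenarios):

```python
governance_first = sum([
    4000, 800,        # Year 1: platform, steward firefighting
    1500, 500, 800,   # Year 2: emergency cleansing, reconfiguration, stewards
    800,              # Year 3: still fixing issues
])
foundation_first = sum([
    1200, 3000, 400,  # Year 1: cleansing, governance, normal operations
    400,              # Year 2: sustained operations
    400,              # Year 3: sustained operations
])
print(governance_first, foundation_first)            # 8400 5400
print((governance_first - foundation_first) / 1000)  # 3.0 ($M saved)
```

Note this comparison counts spend only; the foundation-first path also books $4M+ in cumulative savings by Year 3, which the governance-first path never reaches.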

What Happens Without Cleansing

When you implement governance on dirty data, you get:

1. Validation Rule Chaos

Your validation rules must handle all existing data variations:

  • 47 different ways to describe the same bearing
  • Missing manufacturer part numbers in 60% of records
  • 12 different classification categories for identical items

You have two choices:

  • Strict rules that reject 70% of requests (users hate the system)
  • Loose rules that allow bad data (governance doesn’t work)

With clean data, validation rules can be strict AND users stay happy.
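
A minimal sketch of what “strict” can look like, assuming a hypothetical request schema and a three-class taxonomy slice (neither is from any specific MDG product):

```python
REQUIRED_FIELDS = ("description", "mfr_part_no", "classification")
VALID_CLASSES = {"BEARING", "GASKET", "VALVE"}  # illustrative taxonomy slice

def validate_request(req):
    """Return a list of validation errors; an empty list means the
    request passes on the first try."""
    errors = [f"missing {f}" for f in REQUIRED_FIELDS if not req.get(f)]
    cls = req.get("classification")
    if cls and cls not in VALID_CLASSES:
        errors.append(f"unknown classification: {cls}")
    return errors

clean = {"description": "Bearing, Ball, 6205-2RS",
         "mfr_part_no": "6205-2RS", "classification": "BEARING"}
dirty = {"description": "brg"}
print(validate_request(clean))  # []
print(validate_request(dirty))  # ['missing mfr_part_no', 'missing classification']
```

The same rules that pass 94% of requests against a clean baseline would reject most requests against a catalog where 60% of records lack manufacturer part numbers, which is why strictness only becomes affordable after cleansing.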

2. Steward Burnout

Data stewards become full-time firefighters:

  • Spending 80% of time fixing malformed requests
  • No time for actual governance activities
  • Constant pressure to “just approve it”
  • High turnover as stewards get frustrated

With clean data, stewards govern instead of firefight.

3. Governance Bypass

When governance creates friction, users find workarounds:

  • Email requests directly to procurement
  • Phone calls to maintenance planners
  • Spreadsheet-based shadow systems
  • Anything to avoid “that terrible approval system”

With clean data, governance helps users instead of blocking them.

4. Ongoing Contamination

Governance can’t fix existing duplicates, so:

  • Users keep selecting wrong parts
  • Procurement keeps ordering duplicates
  • Inventory keeps growing
  • Costs keep climbing

Governance alone cannot solve quality problems that already exist.

The Integration Imperative

Baseline cleansing and governance aren’t separate initiatives—they’re two phases of one transformation:

Phase 1: Foundation (Cleansing)

  • Eliminate accumulated pollution
  • Establish quality baseline
  • Create clean slate for governance

Phase 2: Protection (Governance)

  • Prevent new pollution
  • Maintain baseline quality
  • Enable continuous improvement

Skipping Phase 1 makes Phase 2 impossible.
Stopping after Phase 1 makes quality temporary.

You need both. In sequence.

When to Start Phase 2

How do you know you’re ready for governance?

Start Phase 2 when you achieve:

  • Duplicate rate below 2% across critical catalog segments
  • Attribute completeness above 90% for mandatory fields
  • Classification accuracy above 95% using standard taxonomy
  • Naming standards applied consistently across similar items
  • Data steward agreement that quality is sustainable

Until you hit these targets, keep cleansing.
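
The quantitative targets lend themselves to a simple readiness gate. A minimal sketch (the qualitative targets, naming consistency and steward sign-off, still need human judgment and are left out here):

```python
# Readiness thresholds taken from the targets above
TARGETS = {
    "duplicate_rate":          ("max", 0.02),
    "attribute_completeness":  ("min", 0.90),
    "classification_accuracy": ("min", 0.95),
}

def ready_for_governance(metrics):
    """True only when every measured metric clears its threshold."""
    for name, (direction, limit) in TARGETS.items():
        value = metrics[name]
        if direction == "max" and value > limit:
            return False
        if direction == "min" and value < limit:
            return False
    return True

after_cleansing = {"duplicate_rate": 0.012,
                   "attribute_completeness": 0.94,
                   "classification_accuracy": 0.97}
print(ready_for_governance(after_cleansing))  # True
```

Running this gate on each critical catalog segment, rather than on a blended average, keeps one clean segment from masking a dirty one.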

Implementing governance too early wastes the governance investment. Better to delay governance and do both phases properly.

The Bottom Line

Data governance is essential—but only on a clean foundation.

Implementing governance without first cleansing your data is like:

  • Painting a rusty car (looks better briefly, then rust returns)
  • Organizing a cluttered room without throwing anything away (organized chaos is still chaos)
  • Installing advanced security in a contaminated building (securing pollution)

The sequence matters:

Clean first. Govern second. Maintain forever.

This is the only path to sustainable data quality and governance ROI.

What To Do Next

If you’re planning a data governance initiative:

Step 1: Assess Your Foundation

Get a comprehensive data quality assessment:

  • Duplicate rate across catalog
  • Attribute completeness gaps
  • Classification accuracy issues
  • Standardization level

Don’t guess. Measure.

Step 2: Budget for Both Phases

Include baseline cleansing in your business case:

  • Phase 1: Foundation cleansing (20-30% of budget)
  • Phase 2: Governance implementation (70-80% of budget)

Don’t try to skip Phase 1. You’ll pay more later.

Step 3: Execute in Sequence

Resist pressure to “go live with governance quickly”:

  • Clean foundation first (2-6 months)
  • Deploy governance second (3-6 months)
  • Results will justify the sequence

Foundation-first always wins.

Client Example: Foundation-First Success

Major Industrial Conglomerate
Multi-sector Manufacturing | 100K+ parts across business units

The Challenge

Years of decentralized procurement and inconsistent data practices created a fragmented MRO catalog across business units. With plans to consolidate onto a unified EAM platform, they needed clean, standardized data as the foundation—not just governance tools to control existing chaos.

The Ark Approach

We started where traditional MDG vendors don’t: comprehensive baseline cleansing to eliminate years of accumulated pollution. Only after establishing a clean foundation did we implement Ark’s prevention-first governance to maintain that quality permanently.

Results Delivered:

  • 50,000+ duplicate parts eliminated before EAM migration
  • 1,200+ new categories defined with industry-standard classification
  • 270,000+ master parts delivered with complete technical specifications
  • Zero data quality degradation post-implementation through automated governance
  • Millions in procurement savings through improved findability and duplicate elimination
  • 60-day implementation from start to full deployment

Critical Success Factor

Early stakeholder alignment on data governance policies and adequate resource allocation for baseline cleansing were essential. Organizations planning similar initiatives should secure executive commitment and dedicated funding upfront—attempting governance without first cleansing the foundation leads to governing pollution.

Want to See What Foundation Cleansing Reveals?

We offer complimentary data quality assessments that show you:

  • Exact duplicate count and types
  • Missing critical attributes
  • Classification gaps and errors
  • Estimated cost of current quality issues
  • Foundation cleansing scope and timeline

About the Author

Raghu Vishwanath

Raghu Vishwanath is Managing Partner at Bluemind Solutions, providing technical and business leadership across Data Engineering and Software Product Engineering.

With over 30 years in software engineering, technical leadership, and strategic account management, Raghu has built expertise solving complex problems across retail, manufacturing, energy, utilities, financial services, hi-tech, and industrial operations. His broad domain coverage and deep expertise in enterprise architecture, platform modernization, and data management provide unique insights into universal organizational challenges.

Raghu’s journey from Software Engineer to Managing Partner reflects evolution from technical leadership to strategic business development and product innovation. He has led complex programs at global technology organizations, managing strategic relationships and building high-performing teams.

At Bluemind, Raghu has transformed the organization from a data services company to a comprehensive Data Engineering and Software Product Engineering firm with two major initiatives: developing Ark—the SaaS platform challenging legacy MRO Master Data Governance products with prevention-first architecture—and building the Software Product Engineering practice that partners with clients on multi-year engagements to develop world-class, market-defining products.

Raghu is recognized for bridging business and IT perspectives, making complex problems solvable. He focuses on genuine partnerships and understanding what clients truly need. His approach combines analytical thinking with pragmatic engineering—addressing root causes rather than symptoms.

Raghu continues advancing technical expertise with recent certifications in AI, machine learning, and graph databases—staying at the forefront of technologies powering modern software solutions and driving innovation in enterprise platforms.