Why Data Cleansing Must Happen Before Governance
The Foundation-First Approach
By Raghu Vishwanath, Managing Partner | December 2025 | 9 min read
“We implemented SAP MDG last year. Why is our data quality still terrible?”
The VP of Operations stared at the dashboard showing duplicate parts, missing specifications, and classification chaos—all the problems their expensive Master Data Governance tool was supposed to solve.
The answer was uncomfortable: they built governance on top of garbage.
The Backwards Approach That Fails
Most organizations approach data governance in the wrong sequence:
Step 1: Buy enterprise MDG platform (SAP, Oracle, IBM)
Step 2: Configure complex approval workflows
Step 3: Deploy to organization
Step 4: Wonder why data quality doesn’t improve
This approach fails because governance tools control how new data enters your system—they don’t fix what’s already there.
It’s like installing an elaborate security system in a house with a crumbling foundation. The security works perfectly, but the house is still falling apart.
What Governance Actually Does
Let’s clarify what governance platforms actually accomplish:
Governance platforms excel at:
- Preventing new bad data from entering the system
- Enforcing validation rules on incoming requests
- Managing approval workflows for future data
- Maintaining quality standards going forward
Governance platforms do NOT:
- Eliminate existing duplicates
- Fix incomplete historical records
- Standardize legacy descriptions
- Correct classification errors
- Repair structural data issues
In other words: governance protects against future pollution. It doesn’t clean up existing contamination.
Why Organizations Get This Wrong
The backwards approach persists for predictable reasons:
1. Software Vendors Push Governance First
MDG vendors sell expensive enterprise platforms. Their business model depends on selling governance tools, not cleansing services.
When you ask about data quality, they answer with governance features. They’ll demonstrate workflow engines, validation rules, and stewardship dashboards—all impressive technology that doesn’t address your actual problem.
What they won’t tell you: All these features assume you’re starting with clean data.
2. “We’ll Clean As We Go”
Organizations convince themselves they can govern and cleanse simultaneously:
“We’ll implement governance now, then gradually improve quality through the governance process.”
This sounds reasonable but fails in practice because:
- Validation rules reject most requests when data is dirty
- Data stewards spend all their time fixing requests instead of governing
- Users bypass the system when approval takes too long
- Duplicates and errors persist because governance doesn’t touch existing records
- After 18 months, you have an expensive governance tool and unchanged data quality
3. Cleansing Looks Expensive
Baseline cleansing requires upfront investment:
- Data profiling and assessment
- Deduplication algorithms and analysis
- Attribute standardization
- Classification correction
- Data steward time and effort
Executives see this cost and want to skip it. But implementing governance without cleansing wastes even more money—you just spend it over 3-5 years instead of 3-5 months.
4. Governance Sounds Strategic
“Data Governance Initiative” sounds more strategic than “Data Cleansing Project.”
Governance implies forward-thinking leadership. Cleansing implies you let things get messy.
But clean data is the strategic asset—governance is just the maintenance plan.
The Foundation-First Approach
The correct sequence is simple:
Phase 1: Baseline Cleansing (2-6 months)
- Assess current data quality comprehensively
- Eliminate duplicates across the entire catalog
- Standardize descriptions and attributes
- Correct classification errors
- Fill critical data gaps
- Establish clean baseline
Phase 2: Governance Implementation (3-6 months)
- Deploy governance platform on clean foundation
- Configure validation rules that actually work
- Establish approval workflows
- Train data stewards on maintenance, not firefighting
- Monitor and optimize
Phase 3: Continuous Improvement (Ongoing)
- Governance prevents new issues
- Metrics show sustained quality
- Stewards focus on policy and improvement
- Technology investment delivers ROI
Notice: Phase 2 only works if Phase 1 is complete.
Real-World Example: Two Different Paths
Company A: Governance Without Cleansing
Their approach:
- Implemented SAP MDG ($4M investment)
- Skipped baseline cleansing (“too expensive”)
- Created validation rules on dirty data
18 months later:
- 67% of part requests rejected by validation rules
- Data stewards spending 30 hours/week fixing requests
- Technicians bypassing system via email requests
- Management considering scrapping MDG entirely
- Data quality unchanged from pre-MDG baseline
Total cost: $4M (implementation) + $800K/year (stewardship costs) + ongoing operational losses
Company B: Foundation-First Approach
Their approach:
- 8 weeks of baseline cleansing (eliminated 50K duplicates)
- Then implemented governance platform
- Configured validation rules on clean data
12 months later:
- 94% of requests pass validation first time
- Data stewards focused on policy and improvement
- High user adoption (system seen as helpful, not obstructive)
- Data quality sustained and improving
- Measurable procurement savings from better data
Total cost: $1.2M (cleansing) + $3M (governance) = $4.2M total, but ROI positive within 18 months
Roughly the same total investment. Dramatically different outcomes.
The difference? Company B built governance on a solid foundation.
How to Assess Your Foundation
Before implementing governance, assess your current data quality:
Key Metrics to Measure:
Duplicate rate:
- What percentage of your records are duplicates?
- Target: <2% before governance implementation
Attribute completeness:
- What percentage of records have manufacturer part numbers?
- What percentage have complete technical specifications?
- Target: >90% completeness for critical attributes
Classification accuracy:
- What percentage of records are properly classified?
- Target: >95% using industry-standard taxonomy
Naming consistency:
- Do similar items have similar descriptions?
- Target: Standardized naming conventions applied consistently
If any metric is below target, you need baseline cleansing first.
Quick Assessment Process:
- Sample your catalog (pull 1,000 random records)
- Manually review for duplicates, missing data, classification errors
- Calculate percentages for each metric
- Extrapolate to full catalog to understand total scope
If you find significant issues in the sample, assume the full catalog is worse.
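The sampling step above can be sketched in code. This is an illustrative sketch only: the record fields (`description`, `mpn`, `category`) are hypothetical stand-ins for your catalog schema, and the duplicate check is a crude exact-match proxy where a real assessment would use fuzzy matching.

```python
import random

def assess_sample(records, sample_size=1000, seed=42):
    """Estimate catalog quality metrics from a random sample of records."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    n = len(sample)

    # Crude duplicate proxy: identical case/whitespace-normalized descriptions.
    seen, dupes = set(), 0
    for rec in sample:
        key = " ".join(rec.get("description", "").lower().split())
        if key in seen:
            dupes += 1
        else:
            seen.add(key)

    return {
        "duplicate_rate": dupes / n,
        "mpn_completeness": sum(1 for r in sample if r.get("mpn")) / n,
        "classification_rate": sum(1 for r in sample if r.get("category")) / n,
    }
```

To extrapolate, multiply each rate by the full catalog size: a 5% duplicate rate in the sample suggests roughly 5,000 duplicates per 100,000 records, likely more given that samples tend to understate the problem.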
The Economics of Sequence
Let’s examine the cost difference:
Governance-First Approach (The Expensive Path):
Year 1:
- Governance platform: $4M
- Failed implementation due to dirty data
- Steward firefighting: $800K
Year 2:
- Emergency cleansing project: $1.5M
- Governance reconfiguration: $500K
- Continued steward costs: $800K
Year 3:
- Still fixing issues: $800K
Total 3-year cost: $8.4M
ROI: Negative
Foundation-First Approach (The Efficient Path):
Year 1:
- Baseline cleansing: $1.2M (Months 1-2)
- Governance implementation: $3M (Months 3-8)
- Normal steward operations: $400K
Year 2:
- Sustained operations: $400K
- Measurable savings: $2M+
Year 3:
- Sustained operations: $400K
- Cumulative savings: $4M+
Total 3-year cost: $5.4M
ROI: Strongly positive by Year 3
The foundation-first approach costs $3M less over three years while delivering actual results.
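The arithmetic behind that comparison is simple enough to verify directly. The figures below are the article's own (all in $M):

```python
# Three-year cost comparison using the figures from the two scenarios above.
governance_first = {
    "year1": 4.0 + 0.8,        # platform + steward firefighting
    "year2": 1.5 + 0.5 + 0.8,  # emergency cleansing + reconfiguration + stewards
    "year3": 0.8,              # still fixing issues
}
foundation_first = {
    "year1": 1.2 + 3.0 + 0.4,  # cleansing + governance + normal operations
    "year2": 0.4,              # sustained operations
    "year3": 0.4,              # sustained operations
}

total_gf = sum(governance_first.values())  # 8.4
total_ff = sum(foundation_first.values())  # 5.4
savings = total_gf - total_ff              # 3.0
```

Note that this counts costs only; the foundation-first path also books $4M+ in cumulative savings by Year 3, widening the gap further.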
What Happens Without Cleansing
When you implement governance on dirty data, you get:
1. Validation Rule Chaos
Your validation rules must handle all existing data variations:
- 47 different ways to describe the same bearing
- Missing manufacturer part numbers in 60% of records
- 12 different classification categories for identical items
You have two choices:
- Strict rules that reject 70% of requests (users hate the system)
- Loose rules that allow bad data (governance doesn’t work)
With clean data, validation rules can be strict AND users stay happy.
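To make the tradeoff concrete, here is a minimal sketch of what a strict rule set might look like. The specific rules and field names are illustrative assumptions, not any vendor's actual validation engine; the point is that the same strict rules reject almost everything against a dirty catalog and almost nothing against a clean one.

```python
def validate_strict(record):
    """Return a list of rule violations; empty list means the request passes."""
    errors = []
    if not record.get("mpn"):
        errors.append("missing manufacturer part number")
    if not record.get("category"):
        errors.append("missing classification")
    if len(record.get("description", "")) < 10:
        errors.append("description too short to be meaningful")
    return errors

def rejection_rate(records, rule):
    """Fraction of requests that fail at least one rule."""
    return sum(1 for r in records if rule(r)) / len(records)
```

Run against typical dirty records (no part number, no category, terse descriptions) the rejection rate approaches 100%; run against cleansed records it approaches zero, with no loosening of the rules.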
2. Steward Burnout
Data stewards become full-time firefighters:
- Spending 80% of time fixing malformed requests
- No time for actual governance activities
- Constant pressure to “just approve it”
- High turnover as stewards get frustrated
With clean data, stewards govern instead of firefight.
3. Governance Bypass
When governance creates friction, users find workarounds:
- Email requests directly to procurement
- Phone calls to maintenance planners
- Spreadsheet-based shadow systems
- Anything to avoid “that terrible approval system”
With clean data, governance helps users instead of blocking them.
4. Ongoing Contamination
Governance can’t fix existing duplicates, so:
- Users keep selecting wrong parts
- Procurement keeps ordering duplicates
- Inventory keeps growing
- Costs keep climbing
Governance alone cannot solve quality problems that already exist.
The Integration Imperative
Baseline cleansing and governance aren’t separate initiatives—they’re two phases of one transformation:
Phase 1: Foundation (Cleansing)
- Eliminate accumulated pollution
- Establish quality baseline
- Create clean slate for governance
Phase 2: Protection (Governance)
- Prevent new pollution
- Maintain baseline quality
- Enable continuous improvement
Skipping Phase 1 makes Phase 2 impossible.
Stopping after Phase 1 makes quality temporary.
You need both. In sequence.
When to Start Phase 2
How do you know you’re ready for governance?
Start Phase 2 when you achieve:
✓ Duplicate rate below 2% across critical catalog segments
✓ Attribute completeness above 90% for mandatory fields
✓ Classification accuracy above 95% using standard taxonomy
✓ Naming standards applied consistently across similar items
✓ Data steward agreement that quality is sustainable
Until you hit these targets, keep cleansing.
Implementing governance too early wastes the governance investment. Better to delay governance and do both phases properly.
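The four measurable targets above can be expressed as a simple readiness gate. This is a hypothetical sketch: the metric names are placeholders, and the naming-consistency threshold of 95% is an assumed number since the article states the target qualitatively.

```python
# Thresholds for the first three gates match the article's targets; the
# naming-consistency figure (95%) is an illustrative assumption.
PHASE2_TARGETS = {
    "duplicate_rate":          ("max", 0.02),  # below 2%
    "attribute_completeness":  ("min", 0.90),  # above 90% for mandatory fields
    "classification_accuracy": ("min", 0.95),  # above 95%, standard taxonomy
    "naming_consistency":      ("min", 0.95),  # assumed threshold
}

def phase2_ready(metrics):
    """Return (ready, failures): which targets still block governance rollout."""
    failures = []
    for name, (kind, threshold) in PHASE2_TARGETS.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: not measured")
        elif kind == "max" and value > threshold:
            failures.append(f"{name}: {value:.1%} exceeds {threshold:.0%} cap")
        elif kind == "min" and value < threshold:
            failures.append(f"{name}: {value:.1%} below {threshold:.0%} floor")
    return (not failures), failures
```

The fifth criterion, data steward agreement that quality is sustainable, is deliberately left out of the code: it is a judgment call, not a metric.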
The Bottom Line
Data governance is essential—but only on a clean foundation.
Implementing governance without first cleansing your data is like:
- Painting a rusty car (looks better briefly, then rust returns)
- Organizing a cluttered room without throwing anything away (organized chaos is still chaos)
- Installing advanced security in a contaminated building (securing pollution)
The sequence matters:
Clean first. Govern second. Maintain forever.
This is the only path to sustainable data quality and governance ROI.
What To Do Next
If you’re planning a data governance initiative:
Step 1: Assess Your Foundation
Get a comprehensive data quality assessment:
- Duplicate rate across catalog
- Attribute completeness gaps
- Classification accuracy issues
- Standardization level
Don’t guess. Measure.
Step 2: Budget for Both Phases
Include baseline cleansing in your business case:
- Phase 1: Foundation cleansing (20-30% of budget)
- Phase 2: Governance implementation (70-80% of budget)
Don’t try to skip Phase 1. You’ll pay more later.
Step 3: Execute in Sequence
Resist pressure to “go live with governance quickly”:
- Clean foundation first (2-6 months)
- Deploy governance second (3-6 months)
- Results will justify the sequence
Foundation-first always wins.
Client Example: Foundation-First Success
Major Industrial Conglomerate
Multi-sector Manufacturing | 100K+ parts across business units
The Challenge
Years of decentralized procurement and inconsistent data practices created a fragmented MRO catalog across business units. With plans to consolidate onto a unified EAM platform, they needed clean, standardized data as the foundation—not just governance tools to control existing chaos.
The Ark Approach
We started where traditional MDG vendors don’t: comprehensive baseline cleansing to eliminate years of accumulated pollution. Only after establishing a clean foundation did we implement Ark’s prevention-first governance to maintain that quality permanently.
Results Delivered:
✓ 50,000+ duplicate parts eliminated before EAM migration
✓ 1,200+ new categories defined with industry-standard classification
✓ 270,000+ master parts delivered with complete technical specifications
✓ Zero data quality degradation post-implementation through automated governance
✓ Millions in procurement savings through improved findability and duplicate elimination
✓ 60-day implementation from start to full deployment
Critical Success Factor
Early stakeholder alignment on data governance policies and adequate resource allocation for baseline cleansing were essential. Organizations planning similar initiatives should secure executive commitment and dedicated funding upfront—attempting governance without first cleansing the foundation leads to governing pollution.
Want to See What Foundation Cleansing Reveals?
We offer complimentary data quality assessments that show you:
- Exact duplicate count and types
- Missing critical attributes
- Classification gaps and errors
- Estimated cost of current quality issues
- Foundation cleansing scope and timeline
About the Author
Raghu Vishwanath is Managing Partner at Bluemind Solutions, providing technical and business leadership across Data Engineering and Software Product Engineering.
With over 30 years in software engineering, technical leadership, and strategic account management, Raghu has built expertise solving complex problems across retail, manufacturing, energy, utilities, financial services, hi-tech, and industrial operations. His broad domain coverage and deep expertise in enterprise architecture, platform modernization, and data management provide unique insights into universal organizational challenges.
Raghu’s journey from Software Engineer to Managing Partner reflects evolution from technical leadership to strategic business development and product innovation. He has led complex programs at global technology organizations, managing strategic relationships and building high-performing teams.
At Bluemind, Raghu has transformed the organization from a data services company to a comprehensive Data Engineering and Software Product Engineering firm with two major initiatives: developing Ark—the SaaS platform challenging legacy MRO Master Data Governance products with prevention-first architecture—and building the Software Product Engineering practice that partners with clients on multi-year engagements to develop world-class, market-defining products.
Raghu is recognized for bridging business and IT perspectives, making complex problems solvable. He focuses on genuine partnerships and understanding what clients truly need. His approach combines analytical thinking with pragmatic engineering—addressing root causes rather than symptoms.
Raghu continues advancing technical expertise with recent certifications in AI, machine learning, and graph databases—staying at the forefront of technologies powering modern software solutions and driving innovation in enterprise platforms.

