cleansing

What is cleansing?

Data cleansing (also called data cleaning) is the process of finding and fixing errors, inconsistencies, and duplicate information in a dataset so that the data is accurate, complete, and ready to be used.

Let's break it down

First, you look at the raw data and spot problems such as misspelled words, missing values, or duplicate rows. Next, you decide how to fix each issue, whether by correcting, filling in, or removing the bad data. Finally, you apply those fixes and verify that the cleaned data looks correct.
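The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a standard method: the records, the field names, the CITY_FIXES table, and the clean function are all made up for the example, and a real project would need its own rules.

```python
# Hypothetical raw records; the field names and values are invented for illustration.
raw_rows = [
    {"name": "Alice", "city": "Londn", "age": "34"},   # misspelled city
    {"name": "Bob", "city": "Paris", "age": ""},       # missing age
    {"name": "Alice", "city": "Londn", "age": "34"},   # duplicate row
]

# Assumed correction rule: a small lookup table of known misspellings.
CITY_FIXES = {"Londn": "London"}


def clean(rows):
    """Correct known misspellings, mark missing ages, and drop duplicate rows."""
    cleaned = []
    seen = set()
    for row in rows:
        fixed = {
            "name": row["name"].strip(),
            # Correct: replace a known misspelling, otherwise keep the value.
            "city": CITY_FIXES.get(row["city"], row["city"]),
            # Fill in: a blank age becomes an explicit None instead of a guess.
            "age": int(row["age"]) if row["age"].strip() else None,
        }
        # Remove: skip a row that is identical to one already kept.
        key = (fixed["name"], fixed["city"], fixed["age"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


cleaned_rows = clean(raw_rows)

# Verify: the duplicate is gone and the misspelling no longer appears.
assert len(cleaned_rows) == 2
assert all(row["city"] != "Londn" for row in cleaned_rows)
```

Here a missing age is marked as None rather than replaced with a guessed number. Whether to fill in, mark, or remove a missing value depends on how the data will be used.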

Why does it matter?

If you use dirty data, the results of analysis, reports, or automated decisions can be wrong. Clean data leads to better insights, more reliable software, and fewer costly mistakes later on.

Where is it used?

Business intelligence dashboards
Machine‑learning model training
Customer relationship management (CRM) systems
Financial reporting and compliance
Any application that stores or processes large amounts of information

Good things about it

Improves accuracy of decisions and predictions
Reduces wasted time fixing errors later
Increases trust in data among users and stakeholders
Helps meet regulatory and quality standards

Not-so-good things

Can be time‑consuming, especially with very large datasets
Requires clear rules; wrong rules can delete good data
May need specialized tools or expertise to automate effectively
Ongoing maintenance is needed because new data can become dirty again
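
Because new data can become dirty again, one possible approach is a check that can be re-run on every new batch. The sketch below is an assumption-laden example rather than a recommended tool: find_problems, required_fields, and the sample batch are hypothetical names and data. It only reports problems and does not delete anything, which avoids the risk of a wrong rule removing good data.

```python
def find_problems(rows, required_fields):
    """Return (row_index, description) pairs for rows that look dirty."""
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        # Flag required fields that are absent or blank.
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                problems.append((index, f"missing {field}"))
        # Flag rows whose contents exactly match an earlier row.
        key = tuple(sorted((k, str(v)) for k, v in row.items()))
        if key in seen:
            problems.append((index, "duplicate row"))
        seen.add(key)
    return problems


# Hypothetical new batch of records arriving after the first cleanup.
new_batch = [
    {"name": "Carol", "city": "Berlin"},
    {"name": "", "city": "Rome"},
    {"name": "Carol", "city": "Berlin"},
]
report = find_problems(new_batch, required_fields=["name", "city"])
for index, description in report:
    print(f"row {index}: {description}")
```

A person can then review the report and decide which fix applies to each flagged row.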