All about SQL Data Cleaning

A follow-up to Data Profiling: once you have identified the data errors that need correcting for accurate analysis, the next step is data cleaning

Chi Nguyen
7 min read · Apr 3, 2023

Photo by Katie Smith on Unsplash

Data Cleaning is Indispensable

When you first receive a data set to explore, the first thing you should check is whether the data is ready for analysis. Often, the answer is no. Raw data can be messy for a variety of reasons, and it commonly contains mistakes, typos, duplicates, missing values, and other problems that make analysis more challenging.

That’s where data cleaning comes in. The objective is to ensure that the data you draw your conclusions from is consistent, trustworthy, and correct.

In this article, I will walk you through different cases of data cleaning in SQL, along with approaches to dealing with them quickly. Most of my examples use PostgreSQL.

Now, let’s start to see what we’ve got!

Handling Missing or NULL values

There are a few common ways to deal with missing values: removing the rows that contain them, filling them in with a fixed value, or imputing a calculated value.
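To make these three options concrete, here is a minimal PostgreSQL sketch. The `orders` table, its columns, and the sample rows are hypothetical and exist only for illustration; which option is appropriate depends on your data and on what the analysis needs.

```sql
-- Hypothetical table and sample rows, used only for illustration.
CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    customer text,
    amount   numeric
);

INSERT INTO orders (order_id, customer, amount) VALUES
    (1, 'Ann',  120.00),
    (2, NULL,    80.00),   -- missing customer
    (3, 'Bob',   NULL),    -- missing amount
    (4, 'Cara', 100.00),
    (5, NULL,    NULL);    -- nothing usable in this row

-- Option 1: remove rows with missing values.
-- Here only rows missing both fields are dropped; adjust the condition
-- to whichever columns your analysis cannot do without.
DELETE FROM orders
WHERE customer IS NULL
  AND amount IS NULL;

-- Option 2: fill in a fixed value.
-- COALESCE returns its first non-NULL argument, so the stored data
-- stays untouched and the placeholder appears only in the query result.
SELECT order_id,
       COALESCE(customer, 'Unknown') AS customer,
       amount
FROM orders;

-- Option 3: impute a calculated value.
-- AVG ignores NULLs, so the subquery is the mean of the known amounts.
UPDATE orders
SET amount = (SELECT AVG(amount) FROM orders)
WHERE amount IS NULL;
```

Using `COALESCE` in a `SELECT` is the least invasive choice because nothing is overwritten. The `DELETE` and `UPDATE` statements change the table permanently, so it is safer to run them inside a transaction or on a copy of the data first.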

Chi Nguyen

MSc in Statistics. Sharing my learning tips on the journey of becoming a better data analyst. LinkedIn: https://www.linkedin.com/in/chinguyenphamhai/