Enqurious logo
Liked the masterclass! Give it a shot for free
Get access

Robert, the Customer Success Head at GlobalMart had requested two summary reports : 

  • Count of customers by State
  • Count of orders by State

The format suggested is : 

(Numbers shown are for representation purposes only)

 

Rajiv, the newbie Business Analyst prepared the report within minutes and sent across to Robert. 

 

However, the next day Robert came back to Rajiv, sounding a bit upset about the quality of report received. 

 

Comments from Robert :

  • The number of customers in California is incorrect
  • The orders by state report is entirely wrong
  • Why are blanks in Profit and Sales columns?
  • Why is there no customers for the Order ID : CA-2017-114413 ?
  • The number formatting of Sales and Profit columns is not as expected
  • I wanted to create an year wise product addition report. I was expecting something like this :

But, what I am getting is this :

 

 

Robert highlighted the above issues with Rajiv's manager and has issued a stern statement : 

 

All the reports going to customers should be of top notch quality. Take whatever steps necessary to fix the issues and ensure they don't repeat in future.

 

Rajiv, visibly frightened and mostly clueless about what could be all possible quality issues and what are those necessary steps went to Sharon, the Lead Business Analyst for help. 

 

Let's check out the below conversation between the two of them and figure out the problems and how to tackle them.

Rajiv : Hi Sharon, Hope you are doing good. I'm in a very bad situation right now. Need urgent help.

 

Sharon : Hi Rajiv. You look visibly tensed. Please calm down. I can help you.

 

Rajiv : Robert had asked me some simple reports. I am pretty confident about my Excel skills. So, I prepared them quickly and sent across. However, it turns out my reports have so called quality issues and he has given a stern warning to ensure all necessary steps are taken to ensure no further bad reports. 

I don't understand what does he mean by quality issues and what should be the necessary steps? He seems pretty upset about our reports quality and wants a permanent fix.

 

Sharon : Hmm. I get it. You are facing the classic data quality issues faced by many organizations in their reporting processes.

 

Rajiv : What do I do about it? Here is my worked out Excel sheet. Please help.

 

Sharon : Thanks for the notebook. Let's dive right into it, learn what do we mean by quality issues, how to find and fix them. 

 

Rajiv : I'm ready. Let's begin.

 

Sharon : First things first, get a first hand idea about what do we mean by data quality issues. Check out the hint to learn more.

Some records in a dataset have missing values for crucial fields, such as customer address or product price. What data quality issue does this represent?
Duplication
Completeness
Accuracy
Integrity
The orders data of customers has order purchase date stored which looks like : 27-07-2019, Ship date stored like 07-19-2020, and order recieved date stored like 27-Aug-2021. What data quality issue does this represent?
Accuracy
Conformity
Consistency
Integrity
While checking the orders data, you found two different customers mapped to the same order. This would be labelled as _______________ issue
Completeness
Conformity
Integrity
Consistency
In customers dataset, more than one entries of the same customer exists with identical fields. This would be a duplication issue
True
False
In a school database, you found that there are some students are mapped to classrooms which don't exist in that school. The quality issue described here is ______________
Duplication
Integrity
Completeness
Consistency
In orders data, you discovered that the order delivery date column is blank for many orders. However, the delivery status is "In Transit". Mark the statement which is true in this case
This is a valid case of missing value
This isn't a case of missing value

Rajiv : Woah! I've been so ignorant about such important concepts all this while. This changes my entire perspective of looking at data. I've been always like jumping into doing things with data without even realizing it can have so many kinds of issues!

Sharon : Glad that you realized this. Data Quality causes millions of dollars of losses to companies every year. Planning and executing effective data quality treatment strategies is critical to building trust with management and most importantly customers.

 

Rajiv : Without doubt. I think I should revisit my report and try again with this knowledge.

There are at least ________ records which are getting duplicated in Customers table.
0
1
2
3
In orders table, the profit column has 2 instances of missing values.
True
False
______ instances of California has wrong spelling.
0
1
2
3
______instances of Texas has leading or trailing whitespace issues.
1
2
3
4
How many orders in the orders data aren't mapped to any customer?
0
1
2
3
How many orders in the orders data are mapped to customers which don't exist?
0
1
2
3
Once done with all quality checks and fixes, upload the corrected Excel workbook here

Sharon: Great job Rajiv! 

Before we conclude, a short note on one more type of data quality issue which is important but is little subjective to decide : 

  • Analyst friendliness

Rajiv: What does this mean? 

 

Sharon: Let's take an example : Check the products table snapshot here : 

You as Data Analyst have been asked to find out the count of products, by size. How will you approach this question?

 

Rajiv: Well I have no clue how to approach this question, Sharon!

 

Sharon: Checking the dataset carefully, you will find the sizes column having values as comma separated values. You would ideally like to use the sizes column in rows and drop product_id column in Values in a pivot. But, you can't do it directly because the data is not in a format friendly to your needs as a Data Analyst. 

 

What if somehow we got the data unpivoted on the sizes as shown below (only for 2nd record)?

 

 

Keeping data in this way would surely make your life easy. That's what we call Analyst Friendliness of data. 

 

Rajiv: How do you look for this kind of issue?

 

Sharon: The only way of doing this would be to check with business for the kind of insights they are looking for data and then checking with the datasets to see if there are column(s) which are making it difficult to perform analyst friendly analysis.

 

Rajiv: That's a wonderful learning Sharon. Thank you for all your help.