FIVE THINGS I LEARNT FROM DATACAMP’S SQL ASSOCIATE CERTIFICATION EXAM.
You’re most likely reading this article for one of two reasons. You’re either here because you saw the link on my page or because you’re stuck with the practical exam and you searched online for some help. I hope it’s the latter because you definitely need this article more.
For me, obtaining DataCamp’s SQL Associate Certification was not just about acquiring a badge of expertise; it was a journey into putting to practice the knowledge I’ve acquired and testing myself under the pressure of time with an actual prize at stake.
I practically ran away from SQL for years because of how complicated joins looked to me every time I tried to learn them. Well, NOT ANYMORE.
Indeed, I’m glad I took the certification (especially the practical exam) and stayed through till the end, even when I faced obstacles. The four-task experience not only honed my technical skills but also instilled valuable lessons that extend beyond just SQL but data science as a whole. In this article, I will share five key insights gained from the certification exam, shedding light on my experience working with SQL and how I’ll approach data science projects from now on. Here we go!
1. Data Cleaning and Data Validation is Everything.
Data cleaning means replacing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
This reminds me of a quote by Scott Nicholson:
I kind of have to be a master of cleaning, extracting and trusting my data before I do anything with it.”
Task 1 required me to do a lot of data validation and data cleaning. Actually, this makes a lot of sense because when the data is incorrect, every result generated isn’t reliable, regardless of the amount of time spent or the number of lines of code.
Some of the operations I carried out included:
i. Verifying that the data type of the columns corresponded with the information on the picture of the database schema. A sample code of how I did that can be seen below.
SELECT column_name, data_type
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name='table';
ii. Replacing null and incorrect values. Here, I used the COALESCE and CASE functions. The COALESCE function when I only had to replace null values and the CASE function when there are multiple corrections to be done. A sample code can be seen below:
COALESCE(column_name, 'value_to_be_replaced_with') AS column_name
-- THIS REPLACES NULL VALUES WITH THE SECOND PARAMETER ACCEPTED BY THE FUNCTION.
CASE WHEN column_name IS NULL THEN 'value' WHEN another_condition THEN 'value2' ELSE column_name END AS column_name
-- THIS REPLACES NULL VALUES AS WELL AS OTHER INCORRECT VALUES ACCORDING TO THE SPECIFIED CRITERIA.
Perfect data only exists in classes and tutorials. The real world is messy.💯
2. Always Pay Attention to Details.
This cannot be overemphasized. As data scientists, the reliability of our results depends a lot on this. In the case of this practical exam, I was forced to learn this. To pass the exam, your result has to be completely identical to the “marking scheme,” which means even the slightest of slip-ups means you failed. Here are five tips that worked for me and ensured I didn’t miss any important details:
- Thoroughly understand the task objectives and requirements.
- Explicitly list any assumptions and validate them.
- Address missing, inaccurate, or inconsistent data promptly. Take note of the additional criteria that each column must satisfy.
- L. isn’t the same as Laurel, and D. is not the same as Dora.
- Check and re-check your code syntax. Run after each correction before the final submission.
3. Validate ALL Assumptions.
location | Nominal. The location of the particular hotel. One of four possible values, ‘EMEA’, ‘NA’, ‘LATAM’ and ‘APAC’. Missing values should be replaced with “Unknown”.
I can’t believe I assumed that “NA” meant “null” here. Don’t blame me; I’ve seen that a couple of times. It took a couple of failed attempts before it clicked that I should research if “NA” could be anything other than a null value. Turned out it stood for “North America”; shouldn’t that have been pretty obvious?😖
What you might make as an assumption could be different; make a comprehensive list of everything and do research to validate each. You never know what you are misrepresenting.
4. SQL Needs to Do More with Pointing Out Where the Bug is.
I don’t know if other editors are more specific with pointing out where the error is, but the editor on Datacamp’s platform isn’t doing a really good job.😖
Excuse me! What do you mean by “Error NEAR ,”? I have about 20 commas in this code. 😒 Hopefully, there is improvement in this aspect, but for now, I guess we have to still manage it that way.
5. Coding is like learning a new language; you get better as you speak.
Like learning a new language, you start small and slowly, and as you progress, you build on the basics and make complex sentences until you become a very skilled speaker. Coding with SQL (and other programming languages) shares this. Like in the English language where “she” and “her” are for the female gender but context determines which is used, SQL’s “WHERE” and “HAVING” work with context too. This amongst other similarities that can be drawn.
I learned sooooooo much from the certification exam. The exam helped me learn a few new things and reinforced some of the things I already knew. With every project I start, line of code I write, and error I encounter, I feel like it leaves me one step better than before. It also opens my eyes to the sea of knowledge I have yet to explore. We keep going!
Check out my GitHub for my solutions to the tasks here. See you next time if I decide to write again.😅