I have been writing unit test cases for a while now, so I will share some of my experiences and challenges with them. Hopefully, this will help someone somewhere on the planet, or maybe the future me. :)
For starters, what exactly is unit testing, and why is it needed?
A unit test is a way of testing a unit: the smallest piece of code that can be logically isolated in a system. Essentially, a unit test is a method that instantiates a small portion of our application and verifies its behavior independently of the other parts. Because each test exercises one small piece in isolation, a failing test points directly at the code that broke.
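To make the definition concrete, here is a minimal sketch of a unit and its test. The function `add` is hypothetical and exists only for illustration. pytest collects functions whose names start with `test_` and treats a failing `assert` as a failed test.

```python
# Hypothetical unit: the smallest piece of logic we want to check
def add(a, b):
    return a + b


# A unit test exercises only that unit and checks its result
def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
```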
Setting up the environment:
First, run the following commands to install pytest and pandas, which are required:
pip install pytest
pip install pandas
The folder structure is simple: extract_birth_year.py and test_extract_birth_year.py sit side by side in the same folder.
Heading towards the practical part now, let's see how exactly we should write unit tests.
We have simple data where the name and age of each person are available, but for some reason we need a new column holding their birth year. Here is a function to achieve that:
extract_birth_year.py
```python
import datetime


def get_birth_year(df):
    '''
    This function is used to calculate birth year and create a new column called birth_year in dataframe.
    parameters:
        df: dataframe having Age column
    returns:
        dataframe with birth_year column.
    '''
    year = datetime.date.today().year
    replace_boolean_values = [True, False]
    if 'Age' in df.columns:
        df['Age'] = df['Age'].replace(replace_boolean_values, 0)
        df['birth_year'] = year - df['Age']
    else:
        raise NotImplementedError('unsupported dataframe')
    return df
```
You might be wondering why we are replacing the boolean values. Spot on, here's the scenario: if a boolean value is present in the Age column, the function gives the wrong birth_year, because Python treats False and True as 0 and 1 in arithmetic. For a current year of, say, 2022, the calculation becomes 2022 - 1 = 2021 or 2022 - 0 = 2022, and both results are wrong. So we replace booleans with 0: the birth year then equals the current year, and we can build the next steps around that.
Now we are going to write test cases for the above function. We will mainly learn a framework for writing test cases, as well as how to test exceptions using pytest, one of the most widely used Python testing libraries.
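The boolean problem comes from Python itself: `bool` is a subclass of `int`, so `True` and `False` behave like 1 and 0 in arithmetic. The sketch below uses two hypothetical helpers, `naive_birth_year` and `clean_age`, to show the wrong result and the effect of mapping booleans to 0. `clean_age` uses an `isinstance` check, which is just one way to express the cleaning step; it is not the pandas `replace` call that `get_birth_year` uses.

```python
import datetime


# Hypothetical helper: birth year without any cleaning step
def naive_birth_year(age, current_year):
    return current_year - age


# Hypothetical helper: map booleans to 0, mirroring the intent
# of the replace() call inside get_birth_year
def clean_age(age):
    return 0 if isinstance(age, bool) else age


current_year = datetime.date.today().year

# bool is a subclass of int, so True acts as 1 and False as 0
print(naive_birth_year(True, current_year))   # current year minus 1
print(naive_birth_year(False, current_year))  # current year

# After cleaning, a boolean age always yields the current year
print(naive_birth_year(clean_age(True), current_year))  # current year
print(naive_birth_year(clean_age(15), current_year))    # current year minus 15
```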
Things we need to consider:
- We are writing code to check whether the function behaves as expected.
- Does our function provide user-friendly error messages that are easy to understand and debug?
- Last but not least, test cases are one of the key elements for understanding what the code does, alongside the usual docstrings and type hints.
By convention, we will create a new Python file named test_extract_birth_year.py (pytest automatically collects files named test_*.py or *_test.py).
Inside the file we just created, we will import our function from extract_birth_year.py. For now, please keep both files in the same folder; we will discuss ways to import functions from other folders separately (I'll add a link for that here).
from extract_birth_year import get_birth_year
This imports the function under test. Let's start with the test cases now.
The pytest documentation recommends a simple four-step structure for writing test cases: Arrange, Act, Assert, and Cleanup.
You can explore more about it on the "Anatomy of a test" page of the pytest documentation.
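The birth-year tests in this post only need Arrange, Act, and Assert. As a sketch of where the fourth step, Cleanup, fits, here is a hypothetical test (not part of the birth-year example) that writes a DataFrame to a temporary CSV file, reads it back, and removes the file afterwards. In real pytest code, a fixture that `yield`s is the more idiomatic place for cleanup; a `try`/`finally` block is used here only to keep the sketch self-contained.

```python
import os
import tempfile

import pandas as pd


def test_csv_round_trip():
    # Arrange: input data and a temporary file path
    df = pd.DataFrame({'Name': ['tom', 'nick'], 'Age': [10, 15]})
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    try:
        # Act: write the DataFrame out and read it back
        df.to_csv(path, index=False)
        df_read = pd.read_csv(path)

        # Assert: the round trip keeps the data unchanged
        assert df_read.equals(df)
    finally:
        # Cleanup: remove the file even if the assertion above fails
        os.remove(path)
```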
```python
import datetime

import pandas as pd

from extract_birth_year import get_birth_year


def test_get_birth_year():
    # Arrange
    # Expected birth years are computed from today's date, so the test
    # keeps passing when the calendar year changes
    year = datetime.date.today().year
    expected_data = [['tom', 10, year - 10], ['nick', 15, year - 15], ['juli', 14, year - 14]]
    df_expected_output = pd.DataFrame(expected_data, columns=['Name', 'Age', 'birth_year'])

    # Act
    data = [['tom', 10], ['nick', 15], ['juli', 14]]
    df_actual = pd.DataFrame(data, columns=['Name', 'Age'])
    df_actual_output = get_birth_year(df_actual)

    # Assert
    assert df_actual_output.equals(df_expected_output)
```
To run the test cases, use the following command:
pytest test_extract_birth_year.py
In Arrange, I have kept the values we expect from the function's output; Act calls the function under test and saves its output in a variable; and finally, Assert checks whether the actual values match the expected values.
Here we are using DataFrame.equals, which pandas provides to check whether two DataFrames are identical: same shape, same elements, and same column dtypes (NaNs in the same location count as equal).
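Because `.equals` also compares dtypes, two frames holding the same numbers can still be reported as different, and a failing `assert df.equals(...)` only tells you `False`. A sketch of the alternative in `pandas.testing`: `assert_frame_equal` raises an `AssertionError` describing the mismatch, and `check_dtype=False` relaxes the dtype comparison when that is intended. The small frames here are made up for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df_int = pd.DataFrame({'Age': [10, 15]})        # int64 column
df_float = pd.DataFrame({'Age': [10.0, 15.0]})  # float64 column, same values

# .equals treats different dtypes as different frames
print(df_int.equals(df_float))

# assert_frame_equal can ignore the dtype difference when asked to
assert_frame_equal(df_int, df_float, check_dtype=False)

# With the default check_dtype=True it raises an AssertionError
# whose message describes the dtype mismatch
try:
    assert_frame_equal(df_int, df_float)
except AssertionError as err:
    print(err)
```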
If you run the above test, you will see a screen somewhat like this:
Green means your test passed!
If everything is red, the test failed.
If you just add -v to the command, like this:
pytest test_extract_birth_year.py -v
it will show a verbose summary listing each test, kind of like this:
If you want to run a specific test, there is a way for that too.
This will select only the specified test functions and execute them; in this case, the output will be somewhat like this:
Cool, enough with the pytest commands; let's get back to our function and test cases.
The function has a second part as well: if the provided DataFrame is not the correct one, our function should raise an exception. Let's write a test for that now.
Before writing the test, we need to be precise about what to check: that the function raises NotImplementedError specifically. We cannot just check whether a general Exception is raised, because our Age column may contain string data, in which case the function would throw a different error, like:
TypeError: unsupported operand type(s) for -: 'int' and 'str'
That is a different case from ours. If we checked for Exception in the test, the test would still pass, even though this error should get noticed.
Coming back to exception testing now: pytest provides a pretty useful syntax, pytest.raises, to check whether the expected exception occurred.
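A small sketch of this pitfall. `compute_birth_year` is a hypothetical stand-in that subtracts an age from a fixed year, so a string age raises the `TypeError` quoted above. `pytest.raises(Exception)` happily accepts that `TypeError`, while `pytest.raises(NotImplementedError)` lets it propagate, so the unexpected error gets noticed.

```python
import pytest


# Hypothetical stand-in for the subtraction inside get_birth_year
def compute_birth_year(age, year=2022):
    return year - age


def test_broad_exception_hides_the_real_error():
    # Passes, but only because TypeError is a subclass of Exception
    with pytest.raises(Exception):
        compute_birth_year('ten')


def test_specific_exception_lets_other_errors_through():
    # pytest.raises(NotImplementedError) does not swallow the TypeError;
    # the outer block exists only to show that the TypeError escapes
    with pytest.raises(TypeError, match='unsupported operand'):
        with pytest.raises(NotImplementedError):
            compute_birth_year('ten')
```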
test_extract_birth_year.py
```python
import pandas as pd
import pytest

from extract_birth_year import get_birth_year


def test_get_birth_year_unsupported_exception():
    # Arrange & Act
    data = [['krish', 1], ['jack', 50], ['elon', 100]]
    df_input = pd.DataFrame(data, columns=['Name', 'Amount'])
    with pytest.raises(NotImplementedError) as exc_info:
        get_birth_year(df_input)

    # Assert
    assert exc_info.type is NotImplementedError
    # args is a tuple, so compare its first element with the message
    assert exc_info.value.args[0] == "unsupported dataframe"
```
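pytest.raises also accepts a `match` argument: a regular expression that is searched (with `re.search`) against the string form of the raised exception, which folds the message check into the `with` statement. A sketch, with a copy of `get_birth_year` from above included so the block runs on its own; the test name is hypothetical.

```python
import datetime

import pandas as pd
import pytest


# Copy of get_birth_year from extract_birth_year.py, so this block runs standalone
def get_birth_year(df):
    year = datetime.date.today().year
    replace_boolean_values = [True, False]
    if 'Age' in df.columns:
        df['Age'] = df['Age'].replace(replace_boolean_values, 0)
        df['birth_year'] = year - df['Age']
    else:
        raise NotImplementedError('unsupported dataframe')
    return df


def test_get_birth_year_unsupported_exception_match():
    data = [['krish', 1], ['jack', 50], ['elon', 100]]
    df_input = pd.DataFrame(data, columns=['Name', 'Amount'])
    # ^ and $ anchor the pattern, so the whole message must match
    with pytest.raises(NotImplementedError, match=r'^unsupported dataframe$'):
        get_birth_year(df_input)
```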
Here you will notice that Arrange and Act are clubbed together; we can combine them when doing so keeps the code readable.
Finally, there is one edge case: if a boolean value is present in the Age column, the function should return the current year as the birth year.
test_extract_birth_year.py
```python
import datetime

import pandas as pd

from extract_birth_year import get_birth_year


def test_get_birth_year_unclean_data():
    # Arrange
    year = datetime.date.today().year
    expected_data = [['tom', 0, year], ['nick', 15, year - 15], ['juli', 14, year - 14]]
    df_expected_output = pd.DataFrame(expected_data, columns=['Name', 'Age', 'birth_year'])

    # Act
    data = [['tom', True], ['nick', 15], ['juli', 14]]
    df_actual = pd.DataFrame(data, columns=['Name', 'Age'])
    df_actual_output = get_birth_year(df_actual)

    # Assert
    assert df_actual_output.equals(df_expected_output)
```
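The clean-data and unclean-data tests share the same shape, so one option is `pytest.mark.parametrize`, which runs one test body once per set of arguments. A sketch, again with a copy of `get_birth_year` so it runs standalone. Comparing only the `birth_year` values via `.tolist()` is a deliberate choice here to sidestep dtype differences between a clean and a mixed (boolean plus integer) Age column; the original tests compare whole DataFrames instead.

```python
import datetime

import pandas as pd
import pytest


# Copy of get_birth_year from extract_birth_year.py, so this block runs standalone
def get_birth_year(df):
    year = datetime.date.today().year
    replace_boolean_values = [True, False]
    if 'Age' in df.columns:
        df['Age'] = df['Age'].replace(replace_boolean_values, 0)
        df['birth_year'] = year - df['Age']
    else:
        raise NotImplementedError('unsupported dataframe')
    return df


YEAR = datetime.date.today().year


@pytest.mark.parametrize(
    'ages, expected_birth_years',
    [
        ([10, 15, 14], [YEAR - 10, YEAR - 15, YEAR - 14]),  # clean data
        ([True, 15, 14], [YEAR, YEAR - 15, YEAR - 14]),     # boolean in Age
    ],
)
def test_get_birth_year_parametrized(ages, expected_birth_years):
    # Arrange
    df_input = pd.DataFrame({'Name': ['tom', 'nick', 'juli'], 'Age': ages})
    # Act
    df_output = get_birth_year(df_input)
    # Assert: compare plain Python values to ignore dtype differences
    assert df_output['birth_year'].tolist() == expected_birth_years
```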
Conclusion: we learned how to write test cases for:
- Checking the behavior of the function when the data provided is correct.
- The behavior of the function when the data provided is not clean.
- Whether exception handling works as intended when the data provided is unsupported.
Cheers till the next one!!!
Feel free to contact me. Happy Learning :)