Cleaning Splits and Bonus Data Using Python and Calculating Their Factors!

Welcome, readers 🤩, to the sixth article in the series "NSE EOD Stock Prices." This article discusses how to use the pandas library to clean the Bonus and Splits data we downloaded from the Bombay Stock Exchange in the article here. In the last article, we covered how to calculate the Dividend Factors, which is considerably more complex than what we will do in this one. This article follows the same lines; you can access the previous one here.

If you were not able to download the files due to technical issues with the BSE website, you can download the sample Bonus and Splits files by clicking on their respective names. This will help you follow along with this article.

This article is split into two parts:

Cleaning Bonus Data and Calculating Bonus Factor: We use basic pandas string manipulation to calculate the bonus_factor
Cleaning Splits Data and Calculating Split Factor: Similar to Bonuses, we use basic pandas string manipulation and regex operations to calculate the split_factor

Cleaning Bonus Data and Calculating Bonus Factor

The data which I am using contains all bonus information from Jan 2016 - March 2021. I will be posting screenshots of the Jupyter notebook so that it's easier to follow, but if you need just the code, please check out Github here.

Loading the Data and Checking the first 5 rows. You will notice we have too many unnecessary columns; realistically, we only need Security Name, Ex-Date, and Purpose.

Stripping all the column headers of any whitespace, i.e., converting " Security Code" to "Security Code".

Dropping all unnecessary columns and keeping the essentials
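
The three steps above (load, strip headers, keep the essentials) can be sketched roughly as follows. The inline CSV text, company names, and the "Record Date" column are made up for illustration; with the real file you would pass its path to pd.read_csv, and the actual header names may differ slightly from the ones assumed here.

```python
import io

import pandas as pd

# Hypothetical sample that mimics the leading spaces in the exchange's headers.
raw_csv = io.StringIO(
    "Security Code, Security Name, Ex-Date, Purpose, Record Date\n"
    "500001,ABC LTD,05 Jan 2021,Bonus issue 1:1,06 Jan 2021\n"
    "500002,XYZ LTD,10 Feb 2021,Bonus issue 5:8,11 Feb 2021\n"
)

bonus = pd.read_csv(raw_csv)
print(bonus.head())  # check the first 5 rows

# Strip whitespace from every column header: " Security Name" -> "Security Name"
bonus.columns = bonus.columns.str.strip()

# Keep only the columns we actually need
bonus = bonus[["Security Name", "Ex-Date", "Purpose"]]
```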

Now we extract the numeric bonus ratio from the Purpose column. Every row in the Purpose column is a string, and there is a common pattern: the ratio always comes after a space, in the form a:b. So we split the string on the space, i.e. (' '), and take the last element, which is the ratio. We then assign this value to a new column, ratio.
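
A minimal sketch of this extraction, assuming Purpose strings shaped like "Bonus issue 1:1" (the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical Purpose values in the shape described above
bonus = pd.DataFrame({"Purpose": ["Bonus issue 1:1", "Bonus issue 5:8"]})

# Split each string on a space and keep the last piece, which is the a:b ratio
bonus["ratio"] = bonus["Purpose"].str.split(" ").str[-1]
```

If a Purpose value has trailing whitespace, the last piece would be an empty string, so stripping the column first may be worth considering.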

Splitting the ratio into two different columns using a simple function, where we split the a:b string on the colon and assign the first value, a, to the column num and the second value, b, to the column denom. Keep in mind that some strings might not contain a colon due to incorrect data from the exchange; for those cases, we need a try and except block so that our program does not fail.

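def split_ratio_cols(series):
    """Split 'a:b' ratio strings into a numerator list and a denominator list."""

    num_list = []
    denom_list = []

    for ratio in series:
        # IndexError: no colon in the string, so there is no second part.
        # AttributeError: the value is not a string (e.g. NaN).
        try:
            num_list.append(ratio.split(':')[0])
        except (IndexError, AttributeError):
            num_list.append(None)
        try:
            denom_list.append(ratio.split(':')[1])
        except (IndexError, AttributeError):
            denom_list.append(None)

    return num_list, denom_list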

In the above, we pass the function a pandas Series, i.e., the ratio column, and it returns two lists, one of numerators and one of denominators. We then assign these lists to two new columns in the dataframe, num and denom.
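
For comparison, here is a vectorised variation that uses pandas' own str.split with expand=True instead of a Python loop. This is only a sketch with hypothetical sample rows, not the notebook's code; it assumes at least one row contains a colon, otherwise only one column comes back.

```python
import pandas as pd

# Hypothetical ratio values, including one malformed row without a colon
bonus = pd.DataFrame({"ratio": ["1:1", "5:8", "Bonus"]})

# expand=True returns one column per piece; missing pieces become null
parts = bonus["ratio"].str.split(":", n=1, expand=True)

bonus["num"] = parts[0]
bonus["denom"] = parts[1]
```

The malformed row ends up with a null denom, so it is filtered out later when we drop the NaN values.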

Converting all numeric values to float values using the convert_float function we created in the dividend article. As a sanity check, we will also see how many NaN values we have and will drop them as we will have to adjust them manually.

def convert_float(x):
    try:
        return float(x)
    except ValueError:
        return None
    except TypeError:
        return None
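
Applying the function and running the sanity check might look like the sketch below. The sample values are hypothetical, and convert_float is repeated in a compact form only so that the snippet runs on its own.

```python
import pandas as pd


def convert_float(x):
    # Compact form of the helper above: anything that is not a number becomes None
    try:
        return float(x)
    except (ValueError, TypeError):
        return None


# Hypothetical num/denom strings, including one bad row
bonus = pd.DataFrame({"num": ["1", "5", "Bonus"], "denom": ["1", "8", None]})

for col in ["num", "denom"]:
    bonus[col] = bonus[col].apply(convert_float)

# Sanity check: how many values could not be converted?
print(bonus[["num", "denom"]].isna().sum())

# Drop the rows that will need manual adjustment
bonus = bonus.dropna(subset=["num", "denom"]).reset_index(drop=True)
```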

Finally calculating the Bonus Factor by applying a lambda function on the numerator and denominator columns. But before that, how does the calculation work?

For example, a 1:1 bonus issue means that for every 1 share you hold, you get 1 more share, so this means the total outstanding shares in the market will now double, effectively making the price half. So the calculation is 1 / (1 + 1) which is 0.5

Similarly, a 5:8 bonus means that for every 8 shares you hold, you get 5 more shares. This gives a bonus_factor of 8 / (5 + 8), which is approximately 0.615. All prices before the Ex-Date need to be multiplied by this factor to arrive at adjusted numbers.
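
Put into code, the two examples above work out as follows. This is a small sketch that assumes the num and denom columns already hold floats; the sample rows are the 1:1 and 5:8 cases from the text.

```python
import pandas as pd

# num:denom ratios from the two examples above (1:1 and 5:8)
bonus = pd.DataFrame({"num": [1.0, 5.0], "denom": [1.0, 8.0]})

# bonus_factor = denom / (num + denom)
bonus["bonus_factor"] = bonus.apply(
    lambda row: row["denom"] / (row["num"] + row["denom"]), axis=1
)

print(bonus["bonus_factor"].round(3).tolist())  # [0.5, 0.615]
```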

If you are confused, take a minute and do the math manually. The below table from Zerodha might also help.

Cleaning Splits Data and Calculating Split Factor

As above, the data I am using for Splits contains all information from Jan 2016 - March 2021. I am posting screenshots of the Jupyter notebook, but if you prefer to see the code directly, you can access it here.

Loading the Data and Checking the first 5 rows. You will notice we have too many unnecessary columns; realistically, we only need Security Name, Ex-Date, and Purpose.

Stripping all the column headers of any whitespace, i.e., converting " Security Code" to "Security Code".

Dropping all unnecessary columns and keeping the essentials

Extracting Split Info from Purpose Column using regex operations and creating two new columns numerator and denominator to store that information. Finally, we will also convert these numbers to float using the convert_float function we created above.

import re

def split_columns(series):

    num = []
    denom = []

    for ratio in series:
        ratio_list = re.findall(r'[0-9]+', ratio)
        # regex r'[0-9]+' searches for all numbers 
        # in a string and adds it into a list.

        if len(ratio_list) == 2:
            #if two numbers are found, append them to num
            # and denom list.
            num.append(ratio_list[0])
            denom.append(ratio_list[1])
        else:
            num.append(None)
            denom.append(None)

    return num, denom
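
The same idea can also be written with pandas' built-in str.findall, shown here as a vectorised variation rather than the notebook's code. The Purpose strings are hypothetical examples of the "From Rs.X to Rs.Y" wording.

```python
import pandas as pd

# Hypothetical Purpose values; the last one has no numbers at all
splits = pd.DataFrame(
    {
        "Purpose": [
            "Stock Split From Rs.10/- to Rs.5/-",
            "Stock Split From Rs.10/- to Rs.2/-",
            "Stock Split",
        ]
    }
)

# Every run of digits in each string, as a list per row
numbers = splits["Purpose"].str.findall(r"[0-9]+")

# Only trust rows where exactly two numbers were found
valid = numbers.str.len() == 2

splits["numerator"] = numbers.str[0].where(valid)
splits["denominator"] = numbers.str[1].where(valid)
```

With this wording, numerator holds the old face value and denominator the new one. Note that a face value with decimals, such as Rs.2.50, would produce three matches and be treated as invalid.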

Dropping all NaN values (if any) and stripping all values in the dataframe of any whitespace.
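
One way to sketch this cleanup step, using hypothetical rows and the column names from the earlier steps:

```python
import pandas as pd

# Hypothetical cleaned-up splits data with one unusable row
splits = pd.DataFrame(
    {
        "Security Name": [" ABC LTD ", "XYZ LTD", "PQR LTD"],
        "numerator": [10.0, 10.0, None],
        "denominator": [5.0, 2.0, None],
    }
)

# Drop every row that still has a missing value
splits = splits.dropna().reset_index(drop=True)

# Strip whitespace from the values of every text column
for col in splits.columns:
    if pd.api.types.is_string_dtype(splits[col]):
        splits[col] = splits[col].str.strip()
```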

Finally, we calculate the Split Factor, which is probably the easiest calculation out of all corporate actions. For example, a stock split from Rs. 10/- to Rs. 5/- means a halving of the stock price, which can be calculated as 5/10 = 0.5. We will use a lambda function similar to the one we used for Bonus.
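
As a short sketch, assuming numerator holds the old face value and denominator the new one (the order in which they appear in the Purpose string):

```python
import pandas as pd

# Rs.10 -> Rs.5 and Rs.10 -> Rs.2 splits (hypothetical rows)
splits = pd.DataFrame({"numerator": [10.0, 10.0], "denominator": [5.0, 2.0]})

# split_factor = new face value / old face value
splits["split_factor"] = splits.apply(
    lambda row: row["denominator"] / row["numerator"], axis=1
)

print(splits["split_factor"].tolist())  # [0.5, 0.2]
```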

And that's it, folks. I doubt you will find many blogs out there that give such a detailed implementation of each and every step of what needs to be done. I have realized over time that coding is not just writing code; in fact, it's 5 hours of thinking about a problem and 5-50 minutes of coding it.

I hope I have done some justice in explaining the concept at hand; if not, please drop a line with feedback. I would highly appreciate it. If you are stuck at any stage, reach out to me on LinkedIn or Twitter.
