Skip to main content

Command Palette

Search for a command to run...

Cleaning Splits and Bonus Data Using Python and Calculating Their Factors!

The most detailed implementation that's available on the Internet!

Updated
6 min read
Cleaning Splits and Bonus Data Using Python and Calculating Their Factors!
Y

I am a CFA and FRM Charterholder. I used to work as a Portfolio Implementation Manager for Aviva Investors managing £3bn+in Assets Under Management in the UK. I am very passionate about educating people on how they can make more money with their existing investments sustainably.

Welcome Readers 🤩 to the sixth article in the series "NSE EOD Stock Prices." This article will discuss how to use the pandas library to clean the Bonus and Splits Data we downloaded from the Bombay Stock Exchange in the article here. In the last article, we also covered how to calculate the Dividend Factors, which is much more complex than what we will be doing in this one. This article will be very much on the same lines; you can access the previous one here.

For readers who were not able to download the files due to technical issues with the BSE website, you can download the sample Bonus and Splits file by clicking on their respective names. This will help you to follow me on this article.

This article is split into two parts:

  1. Cleaning Bonus Data and Calculating Bonus Factor: We use basic pandas string manipulation to calculate the bonus_factor

  2. Cleaning Splits Data and Calculating Split Factor: Similar to Bonuses, we use basic pandas string manipulation and regex operations to calculate the split_factor

Cleaning Bonus Data and Calculating Bonus Factor

The data which I am using contains all bonus information from Jan 2016 - March 2021. I will be posting screenshots of the Jupyter notebook so that it's easier to follow, but if you need just the code, please check out Github here.

  • Loading the Data and Checking the first 5 rows. You will notice we have too many unnecessary columns; realistically, we only need Security Name, Ex-Date, and Purpose.

image.png

  • Stripping all the column headers of any whitespaces, i.e., converting " Security Code" to "Security Code."

image.png

  • Dropping all unnecessary columns and keeping the essentials

image.png

  • Now Extracting the numeric bonus ratio from purpose column, since every row in the purpose column is a string, noticing a common theme, every ratio is after a space in the form of a:b, so we split the string on space i.e. (' ') and then take the last value of that string in the form of a ratio. We then assign this value to a new column ratio

image.png

  • Splitting the ratio into two different columns using a simple function where we split the string of a:b on the semicolon and assign the first value a to column num and second value b to column denom, keep in mind, that we might have strings which do not have a semicolon due to incorrect data from the exchange, in that case, we need to put in a try and except code block so that our program does not fail.
def split_ratio_cols(series):

    num_list = []
    denom_list = []

    for ratio in series:
        try:
            num_list.append(ratio.split(':')[0])
        except IndexError:
            num_list.append(None)
        try:
            denom_list.append(ratio.split(':')[1])
        except IndexError:
            denom_list.append(None)

    return num_list, denom_list

image.png

In the above, we gave the function a pandas series, i.e., ratio column, and it split that series into two unique series, i.e., num and denom. We just then assigned these new series to two new columns in the dataframe.

  • Converting all numeric values to float values using the convert_float function we created in the dividend article. As a sanity check, we will also see how many NaN values we have and will drop them as we will have to adjust them manually.
def convert_float(x):
    try:
        return float(x)
    except ValueError:
        return None
    except TypeError:
        return None

image.png

image.png

  • Finally calculating the Bonus Factor by applying a lambda function on the numerator and denominator columns. But before that, how does the calculation work?

For example, a 1:1 bonus issue means that for every 1 share you hold, you get 1 more share, so this means the total outstanding shares in the market will now double, effectively making the price half. So the calculation is 1 / (1 + 1) which is 0.5

Similarly, a 5:8 bonus will mean for every 8 shares you hold; you will have 5 more shares, this will have a bonus_factor of 8 / (5+8) which is 0.615, all prices before Ex-Date needs to be multiplied by this factor to arrive at adjusted numbers.

If you are confused, take a minute and do the math manually. The below table from Zerodha might also help.

image.png

image.png

Cleaning Splits Data and Calculating Split Factor

As above, the data I am using for Splits contains all information from Jan 2016 - March 2021. I am posting screenshots of the Jupyter notebook, but if you directly prefer seeing the code, you can access it here.

  • Loading the Data and Checking the first 5 rows. You will notice we have too many unnecessary columns; realistically, we only need Security Name, Ex-Date, and Purpose.

image.png

  • Stripping all the column headers of any whitespaces, i.e., converting " Security Code" to "Security Code."

image.png

  • Dropping all unnecessary columns and keeping the essentials

image.png

  • Extracting Split Info from Purpose Column using regex operations and creating two new columns numerator and denominator to store that information. Finally, we will also convert these numbers to float using the convert_float function we created above.
import re

def split_columns(series):

    num = []
    denom = []

    for ratio in series:
        ratio_list = re.findall(r'[0-9]+', ratio)
        # regex r'[0-9]+' searches for all numbers 
        # in a string and adds it into a list.

        if len(ratio_list) == 2:
            #if two numbers are found, append them to num
            # and denom list.
            num.append(ratio_list[0])
            denom.append(ratio_list[1])
        else:
            num.append(None)
            denom.append(None)

    return num, denom

image.png

  • Dropping all NaN values (if any) and stripping all values in the dataframe of any whitespace.

image.png

  • Finally Calculating the Split Factor which is probably the easiest calculation out of all corporate actions. For example, for a stock split from Rs. 10/- to Rs. 5/- would mean a halving of stock price, which can be calculated by 5/10, and we will use a similar lambda function as we did for Bonus.

image.png

And that's it, folks, I can guarantee you, you will find no blog out there which can give you such a detailed implementation of each and every step of what needs to be done. I have realized over time that coding is not just writing code; in fact, it's 5 hours thinking about a problem and spending 5-50 mins to code it.

I hope I have done some justice in explaining the concept at hand; if not, please drop a line with feedback; I would highly appreciate it. If you are stuck at any stage, reach out to me on Linkedin or Twitter.

💡
Please note we haven't made any new posts since Nov 2021 on this blog, you are free to subscribe to the mailing list, however, you will be auto-added to the new blog's (thealtinvestor.in) mailing list as well.

I will soon begin building my Youtube channel, which will have all the content present on this blog in the form of videos. To support me, you can buy me a coffee by clicking here or on the button below.

A

I have tried all what you implemented, thankyou very much. I have tried to implement further, i.e. applying Dividend/Split/Bonus to stocks, but failed, because of series changes from EQ to BE and BE to EQ frequently. Is there any OHCL Adjusted (Dividend/Split/Bonus etc.) Price Data Provider from year 2000.

S

Thanks Yash...excellent stuff, this very useful. I was not aware of the split/bonus info available as a file in BSE....that's saves a lot of time and effort (have experienced pain) of collecting it from other places in past. One question: If we have say 200 stocks that we have applied splits and bonuses adjustment...we want to check the smoothness price data (or correct application of splits/bonuses). Is there a way to do it with pandas? or would we have to check visually in plots

S

Example: I am using nsepy lib and for some reason it did not fetch any data from Sep 16th 2021 to 31st march 2022.

image.png

S

I enjoyed all the 7 articles, learnt a lot. Finish it off with making the adjustments to the price data.

You also mentioned that there is a better way to store this data than saving it as a csv.

Am eagerly waiting for the next article of this series.

Keep rocking!

1
Y

Hi SURAJSINGH RAJPUT glad to know you finished all the articles, very few people complete them as per the analytics I have, so thank you! Google parquet files - they are better than csv. Any suggestions on what our next series should be based on?

1
S

Hey Yash Roongta , thanks for the response, I am exploring the adjusting of prices part and it is quite a brain tickler since am so new to pandas.

If you ask me I would wanna build further on this same project and delve into Tkinter for creating a GUI where in all these 7 articles can be used in a much simpler manner. And further explore doing some charting with the EOD database that we have with us now. There are different libraries for GUI and plotting charts in Python, you can create some good content which will be a continuation of this series itself.

S

What a series! It covers a lot of topics. I hope you create more amazing content.

More from this blog

P

Python For Traders

32 posts

A place for traders to learn more on how to use Python to do Algorithmic Trading and a place for programmers to learn more about financial markets. Use Contact Us above to reach out to the team :)