Modeling Stock Portfolios with Python


Looking at the illustration, the top table represents some fake buy/sell data in a portfolio. The section below that shows what a per-day snapshot looks like at several points across the selected time frame. As you can see, trades before the start date become extremely important in calculating the active balance at the start of the time frame. Just because I bought 100 shares of Apple before my start date does not mean it should be excluded.

You will also notice in the bottom-middle table that the cost per share becomes $11.53 when selling AAPL. Why is that? Because the AAPL shares were bought two days apart at different prices per share, we need to aggregate and average the cost together. This means the order of what you sell becomes important! For this article, we’ll assume everything adheres to FIFO standards, meaning the oldest shares bought are the first shares sold. Finally, we see that the buy for MSFT stock will be excluded since it falls outside the time frame, so we’ll need to account for that in our program as well.
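To make the averaging concrete, here is a minimal sketch of a quantity-weighted cost per share. The two lots below are hypothetical numbers chosen for easy arithmetic; they are not the figures behind the $11.53 in the illustration.

```python
# Hypothetical buy lots (illustrative only, not the article's data)
lots = [
    {"qty": 100, "cost_per_share": 10.00},
    {"qty": 50, "cost_per_share": 13.00},
]

# Weight each lot's price by the number of shares bought at that price
total_qty = sum(lot["qty"] for lot in lots)
total_cost = sum(lot["qty"] * lot["cost_per_share"] for lot in lots)
avg_cost_per_share = total_cost / total_qty  # 1650 / 150
```

A plain average of the two prices would give 11.50, while the weighted version gives 11.00, because more shares were bought at the lower price.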

After considering the challenges mentioned above, I decided that creating a fresh ‘daily’ calculation of holdings and stock prices would be necessary to generate an accurate analysis. As an added benefit, everything mentioned in Kevin’s article is also feasible, since almost all of the calculations were based on taking a snapshot and comparing it to a later snapshot. Keeping this all in mind, we’re going to take the following approach to build this:

  • Read the portfolio file with dates of all buy/sell transactions, then pull by-day data for all tickers before the specified end date (remember we need to worry about buys/sells before the start date)

Now that we have a loose structure of what we want to accomplish, let’s get to typing!

Doggo doing a snake boi program

The first thing we need to do is grab daily data for both the stocks we have in our portfolio and the benchmark we’re evaluating our portfolio against. Stock data is generally not available in an open source or free form, and although there are plenty of great services like Quandl or IEXFinance for hardcore analysis, they’re a Ferrari to my Prius needs. Luckily, there’s a great library called yfinance that scrapes Yahoo Finance stock data and returns it in a structured form. So let’s start by importing the libraries we’ll need and writing the first few functions for getting the data:

The first function we’re writing is called create_market_cal and uses the pandas_market_calendars library to find all relevant trading days within a specified time frame. This library automatically filters out non-trading days based on the market, so I don’t need to worry about trying to join data to invalid dates by using something like pandas.date_range. Since my stocks are all US-based, I’ll select NYSE as my calendar, and then standardize the timestamps to make them easy to join on later.

The get_data function takes an array of stock tickers along with a start and end date, and then grabs the data using the yfinance library mentioned above. You’ll notice the end date parameter includes a timedelta shift. This is because yfinance is exclusive of the end date you provide. Since we don’t want to have to remember this caveat when setting our parameters, we’ll shift the end date forward by one day here using timedelta.
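The function body is not reproduced in this text, so here is a minimal sketch of what a get_data along these lines could look like. The download parameter and the fake_download helper are my own assumptions, added so the end-date shift can be tried without network access; by default the sketch falls back to yfinance’s download function.

```python
import datetime
import pandas as pd

def get_data(stocks, start, end, download=None):
    """Fetch daily prices per ticker, stacked under a (Ticker, Date) index.

    `download` is a hypothetical injection point (not part of the article);
    when omitted, yfinance.download is used.
    """
    if download is None:
        import yfinance as yf  # imported lazily so the sketch loads without it
        download = yf.download

    frames = []
    for ticker in stocks:
        # The end date is treated as exclusive, so shift it forward one day
        df = download(ticker, start=start, end=end + datetime.timedelta(days=1))
        df = df.copy()
        df.index = pd.to_datetime(df.index)
        frames.append(df)
    return pd.concat(frames, keys=list(stocks), names=['Ticker', 'Date'])

def fake_download(ticker, start, end):
    # Stand-in for the real downloader: one 'Close' per day in [start, end)
    dates = pd.date_range(start, end - datetime.timedelta(days=1), freq='D')
    return pd.DataFrame({'Close': list(range(len(dates)))}, index=dates)

prices = get_data(['AAPL', 'MSFT'],
                  datetime.datetime(2020, 4, 1),
                  datetime.datetime(2020, 4, 3),
                  download=fake_download)
```

With the shift in place, the requested end date (April 3 here) is included in the result. Depending on the yfinance version, the real download may return multi-level columns, so the column handling may need adjusting.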

Finally, the get_benchmark function just feeds into get_data and then drops the ticker symbol. Now that we have our initial functions, let’s run the following to assign everything to variables:

portfolio_df = pd.read_csv('stock_transactions.csv')
portfolio_df['Open date'] = pd.to_datetime(portfolio_df['Open date'])
symbols = portfolio_df.Symbol.unique()
stocks_start = datetime.datetime(2017, 9, 1)
stocks_end = datetime.datetime(2020, 4, 7)
daily_adj_close = get_data(symbols, stocks_start, stocks_end)
daily_adj_close = daily_adj_close[['Close']].reset_index()
daily_benchmark = get_benchmark(['SPY'], stocks_start, stocks_end)
daily_benchmark = daily_benchmark[['Date', 'Close']]
market_cal = create_market_cal(stocks_start, stocks_end)

As a point of reference, my CSV file contains the following columns, and you’ll want to make sure your CSV contains the same columns if you’re trying to replicate this:

Transaction CSV Structure

We have four crucial datasets in order to proceed:

  1. portfolio_df with our buy/sell transaction history

Using these we can move forward to the next step, onward to glory!

Now that we have these four datasets, we need to figure out how many shares we actively held as of the specified start date. To do that, we’re going to create two functions, portfolio_start_balance and position_adjust.

Assigning the output to a variable should give you the active positions within your portfolio:

active_portfolio = portfolio_start_balance(portfolio_df, stocks_start)

So now that we can see the code, let’s walk through the inner workings and show what’s happening in layman’s terms.


First, we bring our CSV data and start date to the portfolio_start_balance function and create a dataframe of all trades that happened before our start date. We’ll then check to see if there are future sales after the start_date, since we will reconstruct a snapshot of this dataframe at the end:

positions_before_start = portfolio[portfolio['Open date'] <= start_date]
future_sales = portfolio[(portfolio['Open date'] >= start_date) & (portfolio['Type'] == 'Sell.FIFO')]

We’ll then create a dataframe of sales that occurred before the start_date. We need to make sure that these are all factored out of our active portfolio on the specified start_date:

sales = positions_before_start[positions_before_start['Type'] == 'Sell.FIFO'].groupby(['Symbol'])['Qty'].sum()
sales = sales.reset_index()

Next, we’ll make a final dataframe of positions that didn’t have any sales occur over the specified time period:

positions_no_change = positions_before_start[~positions_before_start['Symbol'].isin(sales['Symbol'].unique())]

Now we’ll loop through every sale in our sales dataframe, call our position_adjust function, and then append the output of that into our empty adj_positions_df:

adj_positions_df = pd.DataFrame()
for sale in sales.iterrows():
    adj_positions = position_adjust(positions_before_start, sale)
    adj_positions_df = adj_positions_df.append(adj_positions)

Let’s now take a look at how the position_adjust function works so we can fully understand what’s going on here.


First, we’ll create an empty dataframe called stocks_with_sales where we’ll later add adjusted positions, and another dataframe holding all of the transactions labeled as ‘buys’.

Remember that we already filtered out ‘buys in the future’ in the portfolio_start_balance function, so there’s no need to do it again here. You’ll also notice that we’re sorting by ‘Open date’, which will be important given we want to subtract positions using the FIFO method. By sorting the dates, we know we can move iteratively through a list of old-to-new positions:

stocks_with_sales = pd.DataFrame()
buys_before_start = daily_positions[daily_positions['Type'] == 'Buy'].sort_values(by='Open date')

Now that we have all buys in a single dataframe, we’re going to filter for all buys where the stock symbol matches the stock symbol of the sold position:

for position in buys_before_start[buys_before_start['Symbol'] == sale[1]['Symbol']].iterrows():

You’ll notice that we’re using indexing to access the ‘Symbol’ column in our data. That’s because using iterrows() creates a tuple from the index [0] and the series of data [1]. This is the same reason we’ll use indexing when we loop through buys_before_start:

for position in buys_before_start[buys_before_start['Symbol'] == sale[1]['Symbol']].iterrows():
    if position[1]['Qty'] <= sale[1]['Qty']:
        sale[1]['Qty'] -= position[1]['Qty']
        position[1]['Qty'] = 0
    else:
        position[1]['Qty'] -= sale[1]['Qty']
        sale[1]['Qty'] -= sale[1]['Qty']
    stocks_with_sales = stocks_with_sales.append(position[1])

So what’s happening in the loop here is that for every buy in buys_before_start:

  • If the quantity of the oldest buy is ≤ the sold quantity (i.e. you sold at least as many shares as that purchase), subtract the quantity of the buy position from the sale, then set the buy quantity to 0

  • Otherwise (the buy quantity is greater than what is left of the sale), subtract the remaining sale quantity from the buy position, then set the sale quantity to 0
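The same FIFO logic can be sketched without pandas, which makes it easier to trace by hand. The fifo_adjust function and the sample lots below are hypothetical stand-ins for illustration, not the article’s position_adjust.

```python
def fifo_adjust(buys, sale_qty):
    """Return copies of the buy lots with `sale_qty` shares removed oldest-first."""
    remaining = sale_qty
    adjusted = []
    # ISO-formatted date strings sort chronologically
    for lot in sorted(buys, key=lambda b: b["open_date"]):
        lot = dict(lot)  # copy so the caller's data is left untouched
        if lot["qty"] <= remaining:
            # The whole lot is consumed by the sale
            remaining -= lot["qty"]
            lot["qty"] = 0
        else:
            # Only part of the lot is sold; the sale is now fully matched
            lot["qty"] -= remaining
            remaining = 0
        adjusted.append(lot)
    return adjusted

buys = [
    {"symbol": "AAPL", "open_date": "2020-01-03", "qty": 50},
    {"symbol": "AAPL", "open_date": "2020-01-01", "qty": 100},
]
after_sale = fifo_adjust(buys, sale_qty=120)
```

Selling 120 shares wipes out the older 100-share lot and takes the remaining 20 from the newer lot, leaving 30 shares there. Lots reduced to zero stay in the result, which mirrors why the calling code filters on Qty > 0 afterwards.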

Once that loops through every sales position, your code will execute the final lines of portfolio_start_balance:

adj_positions_df = adj_positions_df.append(positions_no_change)
adj_positions_df = adj_positions_df.append(future_sales)
adj_positions_df = adj_positions_df[adj_positions_df['Qty'] > 0]

So we’re taking our adjusted positions in adj_positions_df, adding back positions that never had sales, adding back sales that occur in the future, and finally filtering out any rows that position_adjust zeroed out. You should now have an accurate record of your active holdings as of the start date!

So now that we have an accurate statement of positions held at the start date, let’s create daily performance data! Our strategy is similar to what we did in step 2. In fact, we’ll re-use the position_adjust method, since we’ll need to account for potential sales within our date range. We’ll go ahead and create two new functions, time_fill and fifo, and I’ll explain what each does in more detail:


Similar to portfolio_start_balance, our goal is to provide our dataframe of active positions, find the sales, and zero-out sales against buy positions. The main difference here is that we’re going to loop through using our market_cal list of valid trading days:

sales = portfolio[portfolio['Type'] == 'Sell.FIFO'].groupby(['Symbol','Open date'])['Qty'].sum()
sales = sales.reset_index()
per_day_balance = []
for date in market_cal:
    if (sales['Open date'] == date).any():
        portfolio = fifo(portfolio, sales, date)

This way we can go day-by-day and see if any sales occurred, adjust positions accordingly, and then return a correct snapshot of the daily data. In addition, we’ll also filter to positions that were opened before or on the current date and make sure there are only buys. We’ll then add a Date Snapshot column with the current date in the market_cal loop, and append the result to our per_day_balance list:

daily_positions = portfolio[portfolio['Open date'] <= date]
daily_positions = daily_positions[daily_positions['Type'] == 'Buy']
daily_positions['Date Snapshot'] = date
per_day_balance.append(daily_positions)
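Here is a small self-contained sketch of the snapshot loop on its own, leaving out the sales handling. The two positions and the three trading days are made-up sample data, and trading_days stands in for market_cal.

```python
import pandas as pd

# Hypothetical positions: one opened before the window, one inside it
portfolio = pd.DataFrame({
    'Symbol': ['AAPL', 'MSFT'],
    'Type': ['Buy', 'Buy'],
    'Qty': [100, 10],
    'Open date': pd.to_datetime(['2020-03-30', '2020-04-02']),
})
trading_days = pd.to_datetime(['2020-04-01', '2020-04-02', '2020-04-03'])

per_day_balance = []
for date in trading_days:
    # Keep only buys opened on or before this trading day
    daily_positions = portfolio[(portfolio['Open date'] <= date) &
                                (portfolio['Type'] == 'Buy')].copy()
    daily_positions['Date Snapshot'] = date
    per_day_balance.append(daily_positions)

snapshots = pd.concat(per_day_balance, ignore_index=True)
```

The AAPL position shows up in all three snapshots, while MSFT only appears from April 2 onward. The .copy() call avoids pandas warning about assigning a column on a filtered slice.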


Our fifo function takes your active portfolio positions, the sales dataframe created in time_fill, and the current date in the market_cal list. It then filters sales to find any that occurred on the current date, and creates a dataframe of positions not affected by sales:

sales = sales[sales['Open date'] == date]
daily_positions = daily_positions[daily_positions['Open date'] <= date]
positions_no_change = daily_positions[~daily_positions['Symbol'].isin(sales['Symbol'].unique())]

We’ll then use our trusty position_adjust function to zero-out any positions with active sales. If there were no sales for the specific date, our function will simply append positions_no_change onto the empty adj_positions dataframe, leaving you with an accurate daily snapshot of positions:

adj_positions = pd.DataFrame()
for sale in sales.iterrows():
    adj_positions = adj_positions.append(position_adjust(daily_positions, sale))
adj_positions = adj_positions.append(positions_no_change)
adj_positions = adj_positions[adj_positions['Qty'] > 0]

Running this line of code should return a list of all trading days within the specified time range, along with an accurate count of positions per day:

positions_per_day = time_fill(active_portfolio, market_cal)

If you’re still following along, we’re in the home stretch! Now that we have an accurate by-day ledger of our active holdings, we can go ahead and create the final calculations needed to generate graphs! We’ll be adding an additional six functions to our code to accomplish this:

Let’s start with the last function, per_day_portfolio_calcs, since it uses all the other functions.


Now that we have our positions_per_day from step 3, our goal is to pass that along with daily_benchmark, daily_adj_close, and stocks_start to this new function:

combined_df = per_day_portfolio_calcs(positions_per_day, daily_benchmark, daily_adj_close, stocks_start)

We’ll then concatenate our list of dataframes into a single dataframe using pd.concat:

df = pd.concat(per_day_holdings, sort=True)

Now that we have a single large dataframe, we’ll pass it to the remaining functions in per_day_portfolio_calcs.


If we want to track daily performance, we’ll need to know the theoretical value of our holdings per day. This requires taking the quantity of securities currently owned and multiplying it by the daily close for each security owned.

mcps = modified_cost_per_share(df, daily_adj_close, stocks_start)

To do this, we provide our new single df along with the per-day data we pulled using yfinance, as well as our start date. We’ll then merge our portfolio to the daily close data by joining the date of the portfolio snapshot to the date of the daily data, as well as joining on the ticker. For people more familiar with SQL, this is essentially a left join:

df = pd.merge(portfolio, adj_close, left_on=['Date Snapshot', 'Symbol'], right_on=['Date', 'Ticker'], how='left')

Once we have our merged df, we’ll rename the daily close to ‘Symbol Adj Close’, and then multiply the daily close by the quantity of shares owned. Dropping the extra columns will return the dataframe we need to proceed:

df.rename(columns={'Close': 'Symbol Adj Close'}, inplace=True)
df['Adj cost daily'] = df['Symbol Adj Close'] * df['Qty']
df = df.drop(['Ticker', 'Date'], axis=1)
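As a quick illustration of the merge, here is a minimal sketch on two days of data. The quantities and closing prices are made up; only the column names follow the ones used in the walkthrough.

```python
import pandas as pd

# Hypothetical per-day positions and daily closes
positions = pd.DataFrame({
    'Symbol': ['AAPL', 'AAPL'],
    'Qty': [100, 100],
    'Date Snapshot': pd.to_datetime(['2020-04-01', '2020-04-02']),
})
adj_close = pd.DataFrame({
    'Ticker': ['AAPL', 'AAPL'],
    'Date': pd.to_datetime(['2020-04-01', '2020-04-02']),
    'Close': [60.0, 61.0],
})

# Left join: every snapshot row is kept and picks up that day's close
merged = pd.merge(positions, adj_close,
                  left_on=['Date Snapshot', 'Symbol'],
                  right_on=['Date', 'Ticker'], how='left')
merged = merged.rename(columns={'Close': 'Symbol Adj Close'})
merged['Adj cost daily'] = merged['Symbol Adj Close'] * merged['Qty']
merged = merged.drop(['Ticker', 'Date'], axis=1)
```

Because it is a left join, a snapshot date with no matching price row would still be kept, just with a missing close, which is worth checking for if your calendar and price data ever disagree.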


Now that we have an accurate daily cost of our securities, we’ll want to add our benchmark to the dataset in order to make comparisons against our portfolio:

bpc = benchmark_portfolio_calcs(mcps, daily_benchmark)

We start by merging our daily benchmark data to the correct snapshots, using a merge similar to the one in modified_cost_per_share:

portfolio = pd.merge(portfolio, benchmark, left_on=['Date Snapshot'], right_on=['Date'], how='left')
portfolio = portfolio.drop(['Date'], axis=1)
portfolio.rename(columns={'Close': 'Benchmark Close'}, inplace=True)

Now that we have daily closes for our benchmark merged to our portfolio dataset, we’ll filter our daily_benchmark data based on its max and min dates. It’s important to use max and min rather than your start and end dates, because the max/min will only reflect days where the market was open:

benchmark_max = benchmark[benchmark['Date'] == benchmark['Date'].max()]
portfolio['Benchmark End Date Close'] = portfolio.apply(lambda x: benchmark_max['Close'], axis=1)
benchmark_min = benchmark[benchmark['Date'] == benchmark['Date'].min()]
portfolio['Benchmark Start Date Close'] = portfolio.apply(lambda x: benchmark_min['Close'], axis=1)

Great! So now we have absolute start and end closes for our benchmark in the portfolio dataset as well, which will be important when calculating returns on a daily basis.
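An alternative way to get the same two columns is to pull each close out as a single value and let pandas broadcast it to every row. This is a sketch of that variation, not the approach used above, and the benchmark prices are invented.

```python
import pandas as pd

# Hypothetical benchmark closes and a two-row portfolio
benchmark = pd.DataFrame({
    'Date': pd.to_datetime(['2020-04-01', '2020-04-02', '2020-04-03']),
    'Close': [250.0, 252.0, 248.0],
})
portfolio = pd.DataFrame({'Symbol': ['AAPL', 'MSFT'], 'Qty': [100, 10]})

# Look up the close on the earliest and latest dates actually present
start_close = benchmark.loc[benchmark['Date'].idxmin(), 'Close']
end_close = benchmark.loc[benchmark['Date'].idxmax(), 'Close']

# Assigning a scalar fills the whole column with that value
portfolio['Benchmark Start Date Close'] = start_close
portfolio['Benchmark End Date Close'] = end_close
```

Since the lookup is based on the dates present in the benchmark data, it picks up the first and last trading days even if the requested start or end fell on a weekend or holiday.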


So now that our benchmark data is added, let’s move on to the next step:

pes = portfolio_end_of_year_stats(bpc, daily_adj_close)

Our goal here is to take the output of benchmark_portfolio_calcs, find the last day of close for all the stocks in the portfolio, and then add a Ticker End Date Close column to our portfolio dataset. We’ll do this by once again merging to the daily stock data, filtering for the max date, and then joining based on the ticker symbol:

adj_close_end = adj_close_end[adj_close_end['Date'] == adj_close_end['Date'].max()]
portfolio_end_data = pd.merge(portfolio, adj_close_end, left_on='Symbol', right_on='Ticker')
portfolio_end_data.rename(columns={'Close': 'Ticker End Date Close'}, inplace=True)
portfolio_end_data = portfolio_end_data.drop(['Ticker', 'Date'], axis=1)

Now only one extra step till we generate our calculations!


This final step takes the updated portfolio dataframe and the daily stock data from yfinance, and assigns start-of-year equivalent positions for the benchmark:

pss = portfolio_start_of_year_stats(pes, daily_adj_close)

We’ll first filter the daily close data to its beginning date, then merge our portfolio data to it using the ticker symbol. We’ll then call this close Ticker Start Date Close for convenience:

adj_close_start = adj_close_start[adj_close_start['Date'] == adj_close_start['Date'].min()]
portfolio_start = pd.merge(portfolio, adj_close_start[['Ticker', 'Close', 'Date']], left_on='Symbol', right_on='Ticker')
portfolio_start.rename(columns={'Close': 'Ticker Start Date Close'}, inplace=True)

Then we need to ‘true up’ our adjusted cost per share, but why? Imagine you bought Google a long time ago at $500/share, but now you want to calculate YTD returns on your position in 2020. If you use $500 as your cost basis for the beginning of 2020, you’re not going to have an accurate comparison, because the cost basis is from years ago. To fix this, we’re going to use NumPy’s where function:

portfolio_start['Adj cost per share'] = np.where(portfolio_start['Open date'] <= portfolio_start['Date'],
                                                 portfolio_start['Ticker Start Date Close'],
                                                 portfolio_start['Adj cost per share'])

Simply put, this is saying ‘if the open date is ≤ the start date, then Adj cost per share is equal to Ticker Start Date Close’ (the closing price of the stock on the min date in the yfinance data). If not, then keep the existing Adj cost per share.
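Here is a tiny sketch of the true-up on two hypothetical GOOG lots, one bought years before the start date and one bought after it. All prices are invented for illustration.

```python
import numpy as np
import pandas as pd

portfolio_start = pd.DataFrame({
    'Symbol': ['GOOG', 'GOOG'],
    'Open date': pd.to_datetime(['2015-06-01', '2020-02-03']),
    'Date': pd.to_datetime(['2020-01-02', '2020-01-02']),  # first day of price data
    'Adj cost per share': [500.0, 1480.0],
    'Ticker Start Date Close': [1360.0, 1360.0],
})

# Lots opened on or before the start date are re-based to the start-date close;
# lots opened later keep their actual purchase price
portfolio_start['Adj cost per share'] = np.where(
    portfolio_start['Open date'] <= portfolio_start['Date'],
    portfolio_start['Ticker Start Date Close'],
    portfolio_start['Adj cost per share'])
```

The old $500 lot is now measured from 1360.0, so its return reflects only what happened inside the window, while the lot bought in February keeps its 1480.0 cost basis.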

The remaining part modifies the adjusted cost based on the modified cost per share, drops unneeded columns from the merge, and then calculates the equivalent number of benchmark shares you would have owned based on your newly calculated adjusted cost:

portfolio_start['Adj cost'] = portfolio_start['Adj cost per share'] * portfolio_start['Qty']
portfolio_start = portfolio_start.drop(['Ticker', 'Date'], axis=1)
portfolio_start['Equiv Benchmark Shares'] = portfolio_start['Adj cost'] / portfolio_start['Benchmark Start Date Close']
portfolio_start['Benchmark Start Date Cost'] = portfolio_start['Equiv Benchmark Shares'] * portfolio_start['Benchmark Start Date Close']

Congratulations, we now have all the necessary data to calculate returns properly! Let’s knock out this last section and then dive into visualizing this!


The final step here simply takes the aggregated dataframe from all the other functions, applies a bunch of calculations against the data we’ve been modifying, and returns a final dataframe:

returns = calc_returns(pss)

The first set, Benchmark Return and Ticker Return, both use a current close price divided by their beginning cost basis to calculate a return:

portfolio['Benchmark Return'] = portfolio['Benchmark Close'] / portfolio['Benchmark Start Date Close'] - 1
portfolio['Ticker Return'] = portfolio['Symbol Adj Close'] / portfolio['Adj cost per share'] - 1

Share value for each is calculated the same way, using the modified per-day quantities and the equivalent benchmark shares we calculated earlier:

portfolio['Ticker Share Value'] = portfolio['Qty'] * portfolio['Symbol Adj Close']
portfolio['Benchmark Share Value'] = portfolio['Equiv Benchmark Shares'] * portfolio['Benchmark Close']

We’ll do the same thing again to calculate monetary gain/loss, subtracting the modified adjusted cost we calculated in the portfolio_start_of_year_stats function from the share value columns:

portfolio['Stock Gain / (Loss)'] = portfolio['Ticker Share Value'] - portfolio['Adj cost']
portfolio['Benchmark Gain / (Loss)'] = portfolio['Benchmark Share Value'] - portfolio['Adj cost']

Finally, we’ll calculate absolute return values using the benchmark metrics we calculated earlier:

portfolio['Abs Value Compare'] = portfolio['Ticker Share Value'] - portfolio['Benchmark Start Date Cost']
portfolio['Abs Value Return'] = portfolio['Abs Value Compare'] / portfolio['Benchmark Start Date Cost']
portfolio['Abs. Return Compare'] = portfolio['Ticker Return'] - portfolio['Benchmark Return']
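To sanity-check the formulas, here is the same set of calculations worked through for a single position using plain numbers. The inputs are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
# Hypothetical inputs for one position on one day
qty = 10
adj_cost_per_share = 100.0
symbol_adj_close = 110.0
benchmark_start_close = 250.0
benchmark_close = 260.0

adj_cost = adj_cost_per_share * qty                         # 1000.0
equiv_benchmark_shares = adj_cost / benchmark_start_close   # 4.0
benchmark_start_cost = equiv_benchmark_shares * benchmark_start_close

# Returns: current close over the starting cost basis, minus 1
ticker_return = symbol_adj_close / adj_cost_per_share - 1
benchmark_return = benchmark_close / benchmark_start_close - 1

# Values and gains in currency terms
ticker_share_value = qty * symbol_adj_close
benchmark_share_value = equiv_benchmark_shares * benchmark_close
stock_gain = ticker_share_value - adj_cost
benchmark_gain = benchmark_share_value - adj_cost

# Comparison metrics against the benchmark
abs_value_compare = ticker_share_value - benchmark_start_cost
abs_value_return = abs_value_compare / benchmark_start_cost
abs_return_compare = ticker_return - benchmark_return
```

In this example the stock returned about 10% against about 4% for the benchmark, so the same money gained 100 in the stock versus 40 in the benchmark, a difference in return of roughly 6 percentage points.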

Boom! Now let’s figure out how to graph our new data and finish this up.

Step 4: Visualize the Data

So now that we went through all of that to get our daily performance data, how should we best display it? The biggest benefit of this daily data is seeing how your positions perform over time, so let’s try looking at our data on an aggregated basis first.

I’ve been using Plotly a lot for recent side projects, so for this I’m going to keep it simple and go with the Plotly Express library. Since we’ll need to aggregate each day’s stocks into a single metric per day, I’m going to write this as a function that takes your completed dataframe and two metrics you want to plot against each other:
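The plotting function itself is not reproduced in this text, so below is a minimal sketch of one way to write it. The names aggregate_metrics and line are my own assumptions, and the sample dataframe is invented; the aggregation is split out from the plotting so it can be checked without Plotly installed.

```python
import pandas as pd

def aggregate_metrics(df, val_1, val_2):
    """Sum two metrics per snapshot date and reshape to long form for plotting."""
    grouped = df.groupby('Date Snapshot')[[val_1, val_2]].sum().reset_index()
    return pd.melt(grouped, id_vars=['Date Snapshot'], value_vars=[val_1, val_2])

def line(df, val_1, val_2):
    import plotly.express as px  # imported lazily; only needed when plotting
    fig = px.line(aggregate_metrics(df, val_1, val_2),
                  x='Date Snapshot', y='value', color='variable')
    fig.show()

# Hypothetical two-ticker, two-day sample
sample = pd.DataFrame({
    'Symbol': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'Date Snapshot': pd.to_datetime(['2020-04-01', '2020-04-01',
                                     '2020-04-02', '2020-04-02']),
    'Stock Gain / (Loss)': [10, 5, 20, -5],
    'Benchmark Gain / (Loss)': [8, 2, 12, 3],
})
long_df = aggregate_metrics(sample, 'Stock Gain / (Loss)', 'Benchmark Gain / (Loss)')
```

Calling line(sample, 'Stock Gain / (Loss)', 'Benchmark Gain / (Loss)') would then draw one line per metric. Melting to long form is what lets the color argument split the two metrics into separate lines.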

As you can see, we’ll supply ticker and benchmark gain/loss as the metrics, then use a groupby to aggregate the daily performance to the portfolio level. Plotting it out should return something similar to this!

Aggregated by-day gain/loss vs. benchmark

You can also aggregate using different metrics like Abs Value Compare to see this as a single line:

Displaying using the Absolute Value Compare metric

This is great, but the most useful view, in my opinion, can be generated by using the facet_col option in Plotly Express to generate a chart per ticker that compares the benchmark against each ticker’s performance:

We’ll also use the facet_col_wrap parameter in order to limit the number of graphs per row. Running this code should generate something similar to the output below!

Example data of benchmark return comparison per ticker

We covered a lot of ground here, and hopefully this has been helpful for learning more about populating and analyzing financial data! There’s a lot more that can be explored in the future, including:

  • Factoring in dividends and splits: yfinance has this data available, but I wanted to publish this as a first step before adding more features

Hope this was helpful to anyone out there, and feel free to reach out or comment below with questions/feedback. Thanks for reading!
