Introduction to QuantConnect

Ryan

This is written for those who have some knowledge of Python and basic finance terms. If anything is unclear, ask Claude 3.5 Sonnet or ask in #general.

QuantConnect is the platform where we run our algorithms. You can use C#, but we'll be using Python for the sake of simplicity. Opening an account is pretty easy, but you'll have to link a credit card. Don't worry about it too much, you won't be charged unless you explicitly sign up for paid tiers.

Diving straight in, you will be greeted with this code when you start a blank QuantConnect file:

# region imports
from AlgorithmImports import *
# endregion

class HipsterApricotHyena(QCAlgorithm):

    def initialize(self):
        self.set_start_date(2023, 7, 13)
        self.set_cash(100000)
        self.add_equity("SPY", Resolution.MINUTE)
        self.add_equity("BND", Resolution.MINUTE)
        self.add_equity("AAPL", Resolution.MINUTE)

    def on_data(self, data: Slice):
        if not self.portfolio.invested:
            self.set_holdings("SPY", 0.33)
            self.set_holdings("BND", 0.33)
            self.set_holdings("AAPL", 0.33)

There are three functions you need to care about: initialize(), on_data(), and train().

initialize() is where you declare all your details: starting cash, symbols, test period, training data, etc. More settings available here.
on_data() is called every time a new bar of data arrives during the test period (e.g. once per day at daily resolution). This is where you apply your strategy.
train() is not a QuantConnect hook. It's a helper we write ourselves and call from initialize(), so the model is trained before on_data() starts running. For larger models (e.g. RNNs) you might want to pretrain them somewhere else and call them directly in on_data(), but that comes later.
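To make the call order concrete, here is a toy, plain-Python imitation of that lifecycle. It is not QuantConnect's engine. ToyEngine, ToyAlgorithm and the bar dictionaries are hypothetical names, and the real engine also handles data feeds, orders and fills. The sketch only shows three things: initialize() runs once, our own train() runs only because initialize() calls it, and on_data() runs once per bar.

```python
# A toy stand-in for QuantConnect's engine, for illustration only.
# ToyEngine, ToyAlgorithm and the bar dicts are hypothetical names.

class ToyAlgorithm:
    def initialize(self):
        # Runs once, before any data arrives (cash, symbols, training).
        self.cash = 100_000
        self.seen = []
        self.model = self.train([])  # train() is our own helper, called explicitly

    def train(self, history):
        # Placeholder "model": records how much history it was given.
        return {"trained_on": len(history)}

    def on_data(self, bar):
        # Runs once per bar (e.g. once per day at daily resolution).
        self.seen.append(bar["date"])


class ToyEngine:
    def run(self, algorithm, bars):
        algorithm.initialize()          # called exactly once
        for bar in bars:
            algorithm.on_data(bar)      # called for every bar, in order
        return algorithm


bars = [{"date": "2023-07-13"}, {"date": "2023-07-14"}, {"date": "2023-07-17"}]
algo = ToyEngine().run(ToyAlgorithm(), bars)
print(algo.seen)  # ['2023-07-13', '2023-07-14', '2023-07-17']
```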

We'll be using SPY in this tutorial. We'll also store its Symbol in self.symbol inside initialize(), which makes it easier to reference later.

# region imports
from AlgorithmImports import *
# endregion

class HipsterApricotHyena(QCAlgorithm):

    def initialize(self):
        self.set_start_date(2023, 7, 13)
        self.set_end_date(2024, 12, 31)
        self.set_cash(100000)

        # store SPY's Symbol; the benchmark lets us compare against buy-and-hold later
        self.symbol = self.AddEquity("SPY", Resolution.Daily).Symbol
        self.set_benchmark(self.symbol)

    def on_data(self, data: Slice):
        pass  # the strategy goes here

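AddEquity() adds the security (SPY in this case) to your algorithm and returns an Equity object. We keep its .Symbol for later calls.
The Resolution parameter sets the frequency of the data. Each bar summarizes its period with an open, high, low, close, and volume, so at daily resolution you get one bar per trading day.
self.set_benchmark() is built into QuantConnect. It sets SPY as the benchmark that the backtest's statistics are compared against. We'll also plot our own buy-and-hold curve later.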
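To see what a bar at a given resolution contains, here is a pandas sketch that rolls hypothetical minute bars up into one daily bar. The prices are made up. This only illustrates OHLCV aggregation in general, not how QuantConnect builds its bars internally. The open is the first price, the high and low are the extremes, the close is the last price, and the volumes are summed. Nothing is averaged.

```python
import pandas as pd

# Made-up minute bars for a single trading day.
idx = pd.date_range("2023-07-13 09:31", periods=4, freq="min")
minute = pd.DataFrame(
    {
        "open":   [100.0, 101.0, 102.0, 101.5],
        "high":   [101.0, 102.5, 102.2, 101.8],
        "low":    [99.5, 100.8, 101.0, 100.9],
        "close":  [101.0, 102.0, 101.5, 101.2],
        "volume": [1000, 1500, 800, 1200],
    },
    index=idx,
)

# Aggregate to one daily OHLCV bar.
daily = minute.resample("1D").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)
print(daily)  # one row: open 100.0, high 102.5, low 99.5, close 101.2, volume 4500
```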

Broadly speaking, the algorithm can be broken down into two components:

Prediction
Strategy

Prediction refers to obtaining as accurate a value as possible for a specific metric: price, volatility, etc. The better your prediction, the more confidence you can place in your model, and the larger the sum you can entrust to your algorithm.

Strategy, on the other hand, refers to what you do with that value. For instance, suppose your model predicts that SPY (selling for 582.19 at the time of writing) will reach 600 tomorrow, then drop to 200 the day after. The most straightforward response would be to buy it immediately, sell tomorrow, and short it for the day after. If you're not certain of your model's prediction, you might hedge with call/put options to cap your potential losses. If you have supreme confidence in your prediction, you might sell a ton of put options the first day and call options the second. The number of combinations and permutations only grows from here, and that's for you to study in your own time. Resources are linked at the bottom.
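Here is the arithmetic behind the SPY example, per share. It ignores fees, slippage and short-borrow costs. plan_pnl_per_share is a hypothetical helper written only for this illustration.

```python
def plan_pnl_per_share(entry, day1, day2):
    """Per-share P&L of: buy at `entry`, sell at `day1`, short at `day1`, cover at `day2`.
    Ignores fees, slippage and borrow costs."""
    long_leg = day1 - entry     # profit from the long position
    short_leg = day1 - day2     # profit from the short position
    return long_leg, short_leg, long_leg + short_leg

long_leg, short_leg, total = plan_pnl_per_share(582.19, 600.0, 200.0)
print(round(long_leg, 2), round(short_leg, 2), round(total, 2))  # 17.81 400.0 417.81
```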

In this tutorial, we will be implementing logistic regression with momentum on SPY. Add these imports at the top of your file:

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression

Modeling SPY

A model, in essence, is a learned function that maps input 'X' to prediction 'Y'. In this tutorial, we'll predict the next day's direction from the relative price changes over the past three days. To do that, we first fetch training data with self.History() in initialize() and pass it to train():

def initialize(self):
    self.set_start_date(2023, 7, 13)
    self.set_end_date(2024, 12, 31)
    self.set_cash(100000)
    self.symbol = self.AddEquity("SPY", Resolution.Daily).Symbol 
    self.set_benchmark(self.symbol)

    history = self.History(self.symbol, 1800, Resolution.Daily)
    self.model = self.train(history)

Looking at the three parameters we pass (self.symbol, 1800, and Resolution.Daily):

self.symbol and Resolution.Daily are self-explanatory. While you can use other symbols and resolutions, we'll keep them the same as the test data.
1800 is the number of daily bars to fetch, counting back from the algorithm's start date. This is our training window (roughly seven years of trading days), not the test period. You can also request history by time span or by date range if you want more control.
self.model = self.train(history) stores the trained model for use at test time. This is what we'll be working on next.

It's important to note that data is king. Andrej Karpathy states this very well:

The first step to training a neural net is to not touch any neural net code at all and instead begin by thoroughly inspecting your data. This step is critical. I like to spend copious amount of time (measured in units of hours) scanning through thousands of examples, understanding their distribution and looking for patterns.

Therefore, we'll always start by looking at the data. To do this, open the research.ipynb notebook in the same project directory.

Here's the initialization code. We'll skip the details for now and look directly at the data:

# Add this to the top of your research.ipynb
qb = QuantBook()
spy = qb.AddEquity("SPY")
df = qb.History(qb.Securities.Keys, 1800, Resolution.Daily)
df
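QuantConnect returns History() results as a pandas DataFrame indexed by (symbol, time), which is why SPY appears as an outer index level in df. The sketch below builds a synthetic frame with that shape as a stand-in for real data. It then shows one way to drop the symbol level when you only have a single symbol.

```python
import pandas as pd

# Synthetic stand-in for a History() result: rows indexed by (symbol, time).
times = pd.date_range("2023-07-10", periods=3, freq="D")
index = pd.MultiIndex.from_product([["SPY"], times], names=["symbol", "time"])
df = pd.DataFrame(
    {
        "open":  [440.0, 441.0, 443.0],
        "high":  [442.0, 444.0, 446.0],
        "low":   [439.0, 440.5, 442.0],
        "close": [441.5, 443.5, 445.0],
    },
    index=index,
)

# With one symbol, dropping the outer level leaves a plain time-indexed frame.
spy = df.droplevel("symbol")
print(spy.index.name, len(spy))  # time 3
```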

Next comes playing around with pandas. We highly encourage you to experiment in your own time. Here, we compute the relative change in price between consecutive days, then normalize it.

# daily midpoint price, and its values on each of the previous three days
df['average'] = (df['high'] + df['low']) / 2
df['lag_1'] = df['average'].shift(1)
df['lag_2'] = df['average'].shift(2)
df['lag_3'] = df['average'].shift(3)

# interday differences
df['diff_1'] = df['average'] - df['lag_1']
df['diff_2'] = df['lag_1'] - df['lag_2']
df['diff_3'] = df['lag_2'] - df['lag_3']

# interday gradients
df['grad_1'] = df['diff_1'] / df['lag_1'] # relative to the previous day
df['grad_2'] = df['diff_2'] / df['lag_2']
df['grad_3'] = df['diff_3'] / df['lag_3']

def normalize(series):
    mean = series.mean()
    std = series.std()
    return (series - mean) / std

df['grad_1_norm'] = normalize(df['grad_1'])
df['grad_2_norm'] = normalize(df['grad_2'])
df['grad_3_norm'] = normalize(df['grad_3'])
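Here is a quick check of what these columns mean, using a made-up price series. grad_1 is exactly pandas' percentage change, grad_2 is the same series shifted back one day, and the normalized values have mean 0 and standard deviation 1 over the window they were computed on.

```python
import numpy as np
import pandas as pd

# Made-up daily midpoints, i.e. (high + low) / 2.
average = pd.Series([100.0, 102.0, 101.0, 103.0, 104.0, 102.5])

lag_1, lag_2 = average.shift(1), average.shift(2)
grad_1 = (average - lag_1) / lag_1    # same formula as df['grad_1']
grad_2 = (lag_1 - lag_2) / lag_2      # same formula as df['grad_2']

# grad_1 is pandas' percentage change; grad_2 is grad_1 one day earlier.
assert np.allclose(grad_1.dropna(), average.pct_change().dropna())
assert np.allclose(grad_2.dropna(), grad_1.shift(1).dropna())

# z-score over this window: subtract the mean, divide by the sample std (ddof=1).
z = (grad_1 - grad_1.mean()) / grad_1.std()
print(z.mean(), z.std())  # approximately 0 and 1
```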

The normalized gradients constitute the 'X' element of the function. For Y, we make another column describing whether the stock moves up or down the next day.

# 1 if it moves up, 0 if it stays the same/moves down
df['target'] = np.where(df['average'].shift(-1) > df['average'], 1, 0)
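There is one subtlety here. The last row has no next day, so shift(-1) gives NaN, the comparison is False, and np.where labels that row 0 ("down") even though the answer is unknown. The sketch below uses made-up prices to show the effect, plus one possible alternative that keeps the label missing. If you adopt the alternative, use it for training data only, because at prediction time you still need the latest row.

```python
import numpy as np
import pandas as pd

average = pd.Series([100.0, 102.0, 101.0, 103.0])
nxt = average.shift(-1)                  # next day's value; NaN on the last row

target = np.where(nxt > average, 1, 0)
print(target)  # [1 0 1 0]  <- the last 0 is not a real "down" day

# Alternative: mark rows with no next day as missing, so dropna() can remove them.
target_safe = pd.Series(np.where(nxt > average, 1.0, 0.0), index=average.index).where(nxt.notna())
print(target_safe.tolist())  # [1.0, 0.0, 1.0, nan]
```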

We can leave the research DataFrame as is for now. Next, we go back to the main algorithm and look at the train() function:

def train(self, history):
    # an empty history has no 'high'/'low' columns, so check before building features
    if history.empty:
        self.Log("Training data is empty. Cannot train model.")
        return None

    df = pd.DataFrame(history)

    # daily midpoint price, and its values on each of the previous three days
    df['average'] = (df['high'] + df['low']) / 2
    df['lag_1'] = df['average'].shift(1)
    df['lag_2'] = df['average'].shift(2)
    df['lag_3'] = df['average'].shift(3)

    # interday differences
    df['diff_1'] = df['average'] - df['lag_1']
    df['diff_2'] = df['lag_1'] - df['lag_2']
    df['diff_3'] = df['lag_2'] - df['lag_3']

    # interday gradients
    df['grad_1'] = df['diff_1'] / df['lag_1'] # relative to the previous day
    df['grad_2'] = df['diff_2'] / df['lag_2']
    df['grad_3'] = df['diff_3'] / df['lag_3']

    def normalize(series):
        mean = series.mean()
        std = series.std()
        return (series - mean) / std

    df['grad_1_norm'] = normalize(df['grad_1'])
    df['grad_2_norm'] = normalize(df['grad_2'])
    df['grad_3_norm'] = normalize(df['grad_3'])

    # target
    df['target'] = np.where(df['average'].shift(-1) > df['average'], 1, 0)

For the final step of this section, we create X (the features) and y (the target), initialize the model, and train it. The libraries abstract almost all of this away, so it's worthwhile to write a model out by hand every now and then.

# drop the rows left empty by shift() once, so X and y keep the same rows
df = df.dropna(subset=['grad_1', 'grad_2', 'grad_3', 'target'])
X = df[['grad_1_norm', 'grad_2_norm', 'grad_3_norm']]
y = df['target']  # 1-D target, as scikit-learn expects
model = LogisticRegression()
model.fit(X, y)
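X and y must contain the same rows. If you drop the NaN rows from X alone, y ends up longer than X, and scikit-learn's fit() raises a ValueError about inconsistent numbers of samples. Here is a sketch on random synthetic data (not SPY) that drops the incomplete rows once, before splitting into X and y.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["grad_1_norm", "grad_2_norm", "grad_3_norm"]

# Synthetic features; the first 3 rows mimic the NaNs created by shift(1..3).
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=features)
df.iloc[:3] = np.nan
df["target"] = rng.integers(0, 2, size=50)   # synthetic 0/1 labels

clean = df.dropna(subset=features + ["target"])  # drop once, for X and y together
X = clean[features]
y = clean["target"]                              # 1-D Series

model = LogisticRegression().fit(X, y)
print(X.shape, y.shape)  # (47, 3) (47,)
```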

This is how your function should look altogether now:

def train(self, history):
    # an empty history has no 'high'/'low' columns, so check before building features
    if history.empty:
        self.Log("Training data is empty. Cannot train model.")
        return None

    df = pd.DataFrame(history)

    # daily midpoint price, and its values on each of the previous three days
    df['average'] = (df['high'] + df['low']) / 2
    df['lag_1'] = df['average'].shift(1)
    df['lag_2'] = df['average'].shift(2)
    df['lag_3'] = df['average'].shift(3)

    # interday differences
    df['diff_1'] = df['average'] - df['lag_1']
    df['diff_2'] = df['lag_1'] - df['lag_2']
    df['diff_3'] = df['lag_2'] - df['lag_3']

    # interday gradients
    df['grad_1'] = df['diff_1'] / df['lag_1'] # relative to the previous day
    df['grad_2'] = df['diff_2'] / df['lag_2']
    df['grad_3'] = df['diff_3'] / df['lag_3']

    def normalize(series):
        mean = series.mean()
        std = series.std()
        return (series - mean) / std

    df['grad_1_norm'] = normalize(df['grad_1'])
    df['grad_2_norm'] = normalize(df['grad_2'])
    df['grad_3_norm'] = normalize(df['grad_3'])

    # target
    df['target'] = np.where(df['average'].shift(-1) > df['average'], 1, 0)

    # drop the rows left empty by shift() once, so X and y keep the same rows
    df = df.dropna(subset=['grad_1', 'grad_2', 'grad_3', 'target'])
    X = df[['grad_1_norm', 'grad_2_norm', 'grad_3_norm']]
    y = df['target']  # 1-D target, as scikit-learn expects
    model = LogisticRegression()
    model.fit(X, y)

    return model
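normalize() uses the mean and standard deviation of whatever window it receives. That is about 1800 days during training, but only the handful of rows left from a 10-day history in on_data(). The same raw move can therefore map to different normalized values at training and prediction time. A common fix is to compute the statistics once on the training data and reuse them. Below is a minimal sketch using a hypothetical ZScore helper and synthetic returns. scikit-learn's StandardScaler does the same job, though it divides by the population standard deviation rather than pandas' sample standard deviation.

```python
import numpy as np
import pandas as pd

class ZScore:
    """Hypothetical helper: learn mean/std on training data, then reuse them."""

    def fit(self, frame):
        self.mean_ = frame.mean()
        self.std_ = frame.std()  # sample std, matching normalize()
        return self

    def transform(self, frame):
        return (frame - self.mean_) / self.std_

rng = np.random.default_rng(1)
cols = ["grad_1", "grad_2", "grad_3"]
train = pd.DataFrame(rng.normal(0.0005, 0.01, size=(1800, 3)), columns=cols)  # synthetic returns
recent = train.tail(7)  # a short window, like what survives from a 10-day history

scaler = ZScore().fit(train)
recent_fixed = scaler.transform(recent)                    # uses training statistics
recent_window = (recent - recent.mean()) / recent.std()    # re-normalizes the window itself

print(recent_fixed.iloc[-1].round(3).tolist())
print(recent_window.iloc[-1].round(3).tolist())  # generally different from the line above
```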

Using the Prediction

Now we move on to the on_data() function, which QuantConnect calls each time new data arrives, whether you're backtesting or trading live.

Let's first add error checking - just paste it at the top of the function:

if not data.ContainsKey(self.symbol):
    return

if self.model is None:
    self.Log("Model is not trained. Skipping prediction.")
    return

Predicting the Movement

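Again, we want to predict the movement based on the previous days' movements. So we take the previous days' data, preprocess it, and apply the model. The preprocessing lives in prepare_data(), which is the feature code from train() moved into its own method so both functions can share it (it appears in the full code at the end). This is pretty standard Python, so I won't bore you with the details: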

history = self.History(self.symbol, 10, Resolution.Daily) # past 10 days' worth of data
df = self.prepare_data(history)

if df.empty: 
    self.Log("DataFrame is empty. Skipping this OnData call.")
    return

# obtaining input data, predicting the probability that SPY moves up
latest_data = df.iloc[[-1]][['grad_1_norm', 'grad_2_norm', 'grad_3_norm']]
pred = self.model.predict_proba(latest_data)[0, 1]  # column 1 = P(target == 1)
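For LogisticRegression, predict() returns a hard class label (0 or 1), while predict_proba() returns one probability per class, in the order given by model.classes_. Probability thresholds such as 0.6 and 0.4 only make sense on predict_proba() output. Here is a sketch on synthetic data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # synthetic up/down labels

model = LogisticRegression().fit(X, y)
latest = X[-1:]                              # one row, shape (1, 3)

label = model.predict(latest)[0]             # hard class: 0 or 1
p_up = model.predict_proba(latest)[0, 1]     # probability of class 1 ("up")
print(label, round(p_up, 3))
```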

Making a Decision

And finally, making a decision. I'll show you the code first, then explain it:

buy_threshold = 0.6
sell_threshold = 0.4
portfolio_weight = 0.7

if pred >= buy_threshold:
    self.SetHoldings(self.symbol, portfolio_weight)
    purchase_price = self.Portfolio[self.symbol].Price

elif pred <= sell_threshold:
    self.liquidate()

else:
    pass

In essence, every strategy can be boiled down to choosing:

When to buy
When to sell

There are many ways to make this decision. You can use price, volatility, momentum, P/B, or essentially any financial metric, as well as any other information you deem important and high-signal enough to support your decision. Here, we use momentum: a stock's tendency to keep moving in its current direction.
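For comparison, the same momentum idea can be written as a simple rule with no model at all. This is a hypothetical illustration (momentum_signal and the 3-day lookback are assumptions), not part of the tutorial's algorithm.

```python
def momentum_signal(closes, lookback=3):
    """Hypothetical rule-based momentum: 1 if the price rose over `lookback` days, else 0."""
    if len(closes) <= lookback:
        raise ValueError("need more than `lookback` prices")
    return 1 if closes[-1] > closes[-1 - lookback] else 0

print(momentum_signal([100, 101, 103, 104]))  # 1 (104 > 100)
print(momentum_signal([104, 103, 101, 100]))  # 0 (100 < 104)
```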

We'll skip a line-by-line discussion of the decision code and focus on the QuantConnect functions it uses:

self.SetHoldings() calculates how many units of the asset to buy or sell so that the position makes up the given fraction of your total portfolio value (a weight of 0.5 on a $100 portfolio targets a $50 position).
self.Portfolio[self.symbol].Price gets the symbol's current market price, not your fill price. The average fill price of your position is self.Portfolio[self.symbol].AveragePrice.
self.liquidate() closes positions. Passing a symbol closes only that position. Here, with no arguments, we dump everything whenever pred falls to 0.4 or below.
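Here is a rough version of the sizing arithmetic behind SetHoldings(). It assumes no fees, no cash buffer, and whole shares only. QuantConnect's actual sizing accounts for more than this, so treat approx_target_shares as a hypothetical approximation.

```python
import math

def approx_target_shares(total_portfolio_value, weight, price):
    """Hypothetical approximation: whole shares worth `weight` of the portfolio.
    Ignores fees, cash buffers and any other brokerage rules."""
    return math.floor(total_portfolio_value * weight / price)

print(approx_target_shares(100_000, 0.7, 582.19))  # 120
print(approx_target_shares(100, 0.5, 10))          # 5
```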

The more sophisticated your strategies become, the more functions and features you'll use. QuantConnect lacks good documentation, so reading other people's code can give you better insight into the action space.

Putting Everything Together

And now, putting everything together:

# region imports
from AlgorithmImports import *
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
# endregion

class HipsterApricotHyena(QCAlgorithm):

    def initialize(self):
        self.set_start_date(2023, 7, 13)
        self.set_end_date(2024, 12, 31)
        self.set_cash(100000)
        self.symbol = self.AddEquity("SPY", Resolution.DAILY).Symbol

        # benchmark graph. very useful for visualizations
        self.set_benchmark(self.symbol)
        self.cap = 100000
        self.benchmark_chart = []

        history = self.History(self.symbol, 1800, Resolution.DAILY)
        self.model = self.train(history)

    def on_data(self, data: Slice):
        # some checking to prevent errors
        if not data.ContainsKey(self.symbol):
            return

        if self.model is None:
            self.Log("Model is not trained. Skipping prediction.")
            return

        self.plot_market()
        # ================================================================
        # predicting the movement
        # ================================================================
        # obtaining history
        history = self.History(self.symbol, 10, Resolution.DAILY)  # past 10 days' worth of data
        df = self.prepare_data(history)

        if df.empty:
            self.Log("DataFrame is empty. Skipping this OnData call.")
            return

        # obtaining input data, predicting the probability that SPY moves up
        latest_data = df.iloc[[-1]][['grad_1_norm', 'grad_2_norm', 'grad_3_norm']]
        pred = self.model.predict_proba(latest_data)[0, 1]  # column 1 = P(target == 1)

        self.Log(f"Date: {self.Time.date()}, P(up): {pred:.3f}")

        # ================================================================
        # using the prediction
        # ================================================================
        buy_threshold = 0.6
        sell_threshold = 0.4
        portfolio_weight = 0.7

        if pred >= buy_threshold:
            self.SetHoldings(self.symbol, portfolio_weight)
            purchase_price = self.Portfolio[self.symbol].Price
            # self.PlaceStopMarketOrder(purchase_price, direction="long")  # optional stop-loss

        elif pred <= sell_threshold:
            self.liquidate()

        # between the two thresholds: keep the current position

    def train(self, history):
        df = self.prepare_data(history)
        if df.empty:
            return None  # on_data() skips predictions while the model is None

        X = df[['grad_1_norm', 'grad_2_norm', 'grad_3_norm']]
        y = df['target']  # 1-D target, as scikit-learn expects

        model = LogisticRegression()
        model.fit(X, y)

        return model

    def prepare_data(self, history):
        # an empty history has no 'high'/'low' columns, so check before building features
        if history.empty:
            self.Log("History is empty. Cannot prepare data.")
            return pd.DataFrame()

        df = pd.DataFrame(history)

        # daily midpoint price, and its values on each of the previous three days
        df['average'] = (df['high'] + df['low']) / 2
        df['lag_1'] = df['average'].shift(1)
        df['lag_2'] = df['average'].shift(2)
        df['lag_3'] = df['average'].shift(3)

        # interday differences
        df['diff_1'] = df['average'] - df['lag_1']
        df['diff_2'] = df['lag_1'] - df['lag_2']
        df['diff_3'] = df['lag_2'] - df['lag_3']

        # interday gradients
        df['grad_1'] = df['diff_1'] / df['lag_1']  # relative to the previous day
        df['grad_2'] = df['diff_2'] / df['lag_2']
        df['grad_3'] = df['diff_3'] / df['lag_3']

        def normalize(series):
            mean = series.mean()
            std = series.std()
            return (series - mean) / std

        df['grad_1_norm'] = normalize(df['grad_1'])
        df['grad_2_norm'] = normalize(df['grad_2'])
        df['grad_3_norm'] = normalize(df['grad_3'])

        # target
        df['target'] = np.where(df['average'].shift(-1) > df['average'], 1, 0)

        df = df.dropna(subset=['grad_1', 'grad_2', 'grad_3', 'target'])

        if df.empty:
            self.Log("Training data is empty. Cannot train model.")
            return pd.DataFrame()

        return df

    def PlaceStopMarketOrder(self, purchase_price, direction):
        # stop-loss: exit the whole position if the price moves 1% against us
        if direction == "long":
            stop_price = purchase_price * 0.99  # 1% below purchase price
            self.stopMarketTicket = self.StopMarketOrder(self.symbol, -self.Portfolio[self.symbol].Quantity, stop_price)
            self.Log(f"Placed stop market order to sell at ${stop_price:.2f} (1% below purchase price).")
        elif direction == "short":
            stop_price = purchase_price * 1.01  # 1% above purchase price
            self.stopMarketTicket = self.StopMarketOrder(self.symbol, -self.Portfolio[self.symbol].Quantity, stop_price)
            self.Log(f"Placed stop market order to cover at ${stop_price:.2f} (1% above purchase price).")

    def plot_market(self):  # plot buy-and-hold SPY on the Strategy Equity chart next to your portfolio
        hist = self.History([self.symbol], 252, Resolution.DAILY)['close'].unstack(level=0).dropna()
        self.benchmark_chart.append(hist[self.symbol].iloc[-1])
        benchmark_perf = self.benchmark_chart[-1] / self.benchmark_chart[0] * self.cap
        self.Plot("Strategy Equity", "Buy & Hold", benchmark_perf)

And there you have it: a working (albeit benchmark-underperforming) trading algorithm! We've also included an optional stop-loss order (commented out) and the plot_market() function so you can see how you perform compared to just buying and holding. For any more questions, ask in Discord.
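The buy-and-hold line drawn by plot_market() rescales SPY's price so that it starts at the initial capital. Every later value is price / first price * starting cash. Here is the same arithmetic on made-up prices (buy_and_hold_curve is a hypothetical helper).

```python
def buy_and_hold_curve(prices, starting_cash):
    """Scale a price series so it starts at `starting_cash`, as plot_market() does."""
    first = prices[0]
    return [p / first * starting_cash for p in prices]

curve = buy_and_hold_curve([450.0, 459.0, 441.0], 100_000)
print([round(v, 2) for v in curve])  # [100000.0, 102000.0, 98000.0]
```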

Acknowledgements

Claude 3.5 Sonnet
@michaelbol