Integrates financial market data provided by Norgate Data with Zipline, a Pythonic algorithmic trading library for backtesting.
Key features of this extension
- Simple bundle creation
- Survivorship bias-free bundles
- Incorporates time series data such as historical index membership and dividend yield into Zipline's Pipeline mechanism
- No modifications to the Zipline code base (except to fix problems with installation and obsolete calls that crash Zipline)
Requirements
- Zipline 3.0 and above, based upon the Zipline Reloaded fork led by Stefan Jansen, which originates from the Quantopian-developed Zipline (which has become abandonware). We recommend the latest release of Zipline Reloaded (currently v3.0.4) and associated packages (such as exchange-calendars). There are too many quirks and workarounds for issues with older versions of Zipline for us to continue maintaining backwards compatibility.
- Python 3.8+ (Python 3.12 recommended)
- Microsoft Windows
- An active Norgate Data subscription
- Norgate Data Updater software installed and running
- Writable local user folder named .norgatedata (or defined in environment variable NORGATEDATA_ROOT) - defaults to C:\Users\Your username\.norgatedata
- Python packages: Pandas, Numpy, Logbook
Note: The "Norgate Data Updater" application (NDU) is a Windows-only application. NDU must be running for this Python package to work.
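The data folder requirement above can be checked before running an ingest. The following is a minimal sketch only: the function names are hypothetical, and it assumes that NORGATEDATA_ROOT, when set, takes precedence over the default .norgatedata folder in the user's home folder.

```python
import os
from pathlib import Path


def resolve_norgatedata_root(env=None, home=None):
    """Return the folder this sketch assumes the package will use."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    override = env.get("NORGATEDATA_ROOT")
    # Assumption: the environment variable wins; otherwise use <home>/.norgatedata
    return Path(override) if override else home / ".norgatedata"


def is_writable_folder(folder):
    """True if the folder exists and the current user can write to it."""
    folder = Path(folder)
    return folder.is_dir() and os.access(folder, os.W_OK)
```

Calling is_writable_folder(resolve_norgatedata_root()) in the environment you use for Zipline gives a quick yes/no answer before a long ingest is started.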
How to install Zipline using Anaconda/Miniconda
Most people have problems installing Zipline because they attempt to install it into their base environment. The solution is simple: create a separate virtual environment that only has the Python packages you require. If you want to experiment, just create a new environment.
Firstly, install either Anaconda (graphical environment) or Miniconda (cut-down command-line-based). These instructions relate to Windows only.
How to install Zipline Reloaded and PyFolio, and Zipline-NorgateData
Here's how we installed it here at Norgate:
Note: We use Mamba instead of conda for the majority of the install, as it seems to be much quicker at resolving everything (i.e. seconds instead of minutes) and at parallelizing the downloads/installs.
Install the latest 64 bit MiniConda or Anaconda Distribution.
If you have ANY other running instances of Anaconda prompt/Jupyter etc., ensure they are all shut down.
Start an Anaconda (base) prompt, create an environment and install the appropriate versions of packages:
conda create -y -n zip312 python=3.12
conda activate zip312
conda install -y -c conda-forge ta-lib
pip install zipline-reloaded pyfolio-reloaded
conda install -y -c conda-forge jupyter
pip install norgatedata zipline-norgatedata
if not exist %HOMEPATH%\.zipline mkdir %HOMEPATH%\.zipline
if not exist %HOMEPATH%\.zipline\extension.py copy /b NUL %HOMEPATH%\.zipline\extension.py
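The last two commands create an empty extension.py in the .zipline folder if one does not already exist. A rough Python equivalent is sketched below; the function name is hypothetical, and it assumes that Path.home() resolves to the same folder as %HOMEPATH% on your machine.

```python
from pathlib import Path


def ensure_zipline_extension(home=None):
    """Create <home>/.zipline/extension.py if missing; never overwrite it."""
    home = Path.home() if home is None else Path(home)
    zipline_dir = home / ".zipline"
    # Same effect as "if not exist ... mkdir"
    zipline_dir.mkdir(parents=True, exist_ok=True)
    extension = zipline_dir / "extension.py"
    if not extension.exists():
        # Same effect as copying NUL to the file: an empty file is created
        extension.touch()
    return extension
```

Because an existing file is left untouched, running it again after you have added bundle definitions is harmless.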
Upgrades of Zipline-NorgateData
To receive upgrades/updates:
pip install zipline-norgatedata --upgrade
Exchange Calendar Issues that require patching
Norgate Data has developed the following patches. Please make sure you implement the ones applicable to you.
Patch to allow backtesting before 20 years ago
Unfortunately, a 20-year limit is hardcoded into exchange_calendars. To backtest further back than 20 years from today:
Navigate to the exchange_calendars folder within site packages. This is typically located at C:\Users\<your username>\miniconda3\envs\zip312\Lib\site-packages\exchange_calendars
Edit exchange_calendar.py
Go to line 58 and change:
GLOBAL_DEFAULT_START = pd.Timestamp.now().floor("D") - pd.DateOffset(years=20)
to the following:
GLOBAL_DEFAULT_START = pd.Timestamp('1970-01-01')
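If your environment is not in the typical location, the folders to patch can be found from Python itself. This is a minimal sketch using only the standard library; package_folder is a hypothetical helper name, and it needs to be run inside the activated environment (e.g. zip312).

```python
import importlib.util
from pathlib import Path


def package_folder(package_name):
    """Return the folder a top-level package is installed in, or None."""
    spec = importlib.util.find_spec(package_name)
    # Plain modules and missing packages have no search locations
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


# Prints None for any package that is not installed in this environment
for name in ("exchange_calendars", "zipline"):
    print(name, "->", package_folder(name))
```

The same helper can be used to locate the zipline folder referred to in the patches further down.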
Additional patch to allow backtesting before 1990
If you want to do backtesting prior to 1990 you will need to patch Zipline to handle that too since it is hard-coded to 1990.
Navigate to the zipline folder within site packages. This is typically located at C:\Users\<your username>\miniconda3\envs\zip312\Lib\site-packages\zipline
Navigate to the subfolder utils.
Edit calendar_utils.py
Go to line 31 and change:
return ec_get_calendar(*args, side="right", start=pd.Timestamp("1990-01-01"))
to the following:
return ec_get_calendar(*args, side="right", start=pd.Timestamp("1970-01-01"))
Patch to allow calendars other than US calendars for backtesting
If you see the message "AssertionError: All readers must share target trading_calendar." then you probably need this patch. Our testing shows that AU and CA stocks users need this.
Navigate to the zipline folder within site packages. This is typically located at C:\Users\<your username>\miniconda3\envs\zip312\Lib\site-packages\zipline
Navigate to the data subfolder and edit the file dispatch_bar_reader.py
Locate the code (around line 50)
assert trading_calendar == r.trading_calendar, (
    "All readers must share target trading_calendar. "
    "Reader={0} for type={1} uses calendar={2} which does not "
    "match the desired shared calendar={3} ".format(
        r, t, r.trading_calendar, trading_calendar
    )
)
Change it to:
assert isinstance(trading_calendar, type(r.trading_calendar)), (
    "All readers must share target trading_calendar. "
    "Reader={0} for type={1} uses calendar={2} which does not "
    "match the desired shared calendar={3} ".format(
        r, t, r.trading_calendar, trading_calendar)
)
(For further details, see https://github.com/quantopian/zipline/issues/2684 - this has been an issue for some time and the original solution doesn't address the issue since there are actually two instances of the calendar - one from run_algorithm and one from the register_bundle within extension.py)
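The difference between the two checks can be illustrated without Zipline. In this sketch StandInCalendar is a hypothetical stand-in, and it assumes the real calendar class does not define its own equality, so that == compares object identity.

```python
class StandInCalendar:
    """Hypothetical stand-in for a calendar class without a custom __eq__."""

    def __init__(self, name):
        self.name = name


# Two separately created calendars for the same exchange
from_run_algorithm = StandInCalendar("XASX")
from_register_bundle = StandInCalendar("XASX")

# Original check: identity-based equality fails for two separate instances
original_check = from_run_algorithm == from_register_bundle
# Patched check: only asks whether both are the same kind of calendar
patched_check = isinstance(from_run_algorithm, type(from_register_bundle))

print(original_check, patched_check)
```

Note that the patched check is looser than the original: two calendars of the same class but with different start dates would also pass it.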
Patches for Australian equities
If you are an Australian or Canadian Stocks user, in certain tests you will receive an error message during a backtest such as:
ValueError: `minute` '1996-01-02 05:00:00+00:00' is not a trading minute. Consider passing `direction` as 'next' or 'previous'.
This can be solved by patching Zipline's events.py file.
Navigate to the zipline folder within site packages. This is typically located at C:\Users\<your username>\miniconda3\envs\zip312\Lib\site-packages\zipline
Navigate to the utils subfolder and edit the file events.py
Locate the code (around line 519)
def should_trigger(self, dt):
    value = self.cal.minute_to_session(dt, direction="previous").value
    return value in self.execution_period_values
Change it to:
def should_trigger(self, dt):
    # is this market minute's period in the list of execution periods?
    # For some reason, ASX requires direction to be "previous" or else this fails
    if self.cal.name == "XASX":
        value = self.cal.minute_to_session(dt, direction="previous").value
    else:
        value = self.cal.minute_to_session(dt, direction="none").value
    return value in self.execution_period_values
Patches for Canadian equities
If you are a Canadian Stocks user, you probably want to add this as a holiday:
On 17 Dec 2008, TSX had a major outage and was halted not long after the open, and never reopened. In general, the financial industry has written off this day as a bust for the purposes of data analysis.
The New Years observance shift to Monday only started in 2000.
Navigate to the exchange_calendars folder within site packages. This is typically located at C:\Users\<your username>\miniconda3\envs\zip312\Lib\site-packages\exchange_calendars
Edit exchange_calendar_xtse.py
Add the following at line 95:
# Significant failures where TSX was, for practical purposes, closed for the entire day
TSXFailure20081217 = pd.Timestamp("2008-12-17")
In the same file (exchange_calendar_xtse.py), change the following at line 164:
return list(chain(September11ClosingsCanada))
to:
return list(chain(September11ClosingsCanada.tolist(),[TSXFailure20081217,]))
Backtest Assumptions
- Stocks are automatically set an auto_close_date of the last quoted date
- Futures are automatically set an auto_close_date of the earlier of the following: 2 days prior to the last trading date (for cash-settled futures, and physically delivered futures that only allow delivery after the last trading date), or 2 trading days prior to the first notice date (for futures that have a first notice date prior to the last trading date).
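The futures rule above can be sketched as follows. This is an illustration of the stated rule, not the extension's actual code: the function names are hypothetical, "2 days" is read as calendar days, and weekdays (Monday to Friday) stand in for trading days, so exchange holidays are ignored.

```python
from datetime import date, timedelta


def previous_weekday(day, count):
    """Step back `count` weekdays (Mon-Fri) from `day`."""
    while count > 0:
        day -= timedelta(days=1)
        if day.weekday() < 5:
            count -= 1
    return day


def futures_auto_close_date(last_trading_date, first_notice_date=None):
    """Earlier of: 2 days before last trade, 2 weekdays before first notice."""
    by_last_trade = last_trading_date - timedelta(days=2)
    # No first notice date, or one that is not before the last trading date
    if first_notice_date is None or first_notice_date >= last_trading_date:
        return by_last_trade
    by_first_notice = previous_weekday(first_notice_date, 2)
    return min(by_last_trade, by_first_notice)


# Cash-settled style contract: only the last trading date matters
print(futures_auto_close_date(date(2024, 3, 15)))
# Contract with a first notice date ahead of its last trading date
print(futures_auto_close_date(date(2024, 3, 19), date(2024, 2, 29)))
```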
Bundle Creation
Navigate to your Zipline local settings folder. This is typically located at c:\users\\.zipline
Add the following lines at the top of your Zipline local settings file, i.e. extension.py:
Note: This is NOT the extension.py file inside the Anaconda3\envs\\lib\site-packages\zipline
from norgatedata import StockPriceAdjustmentType
from zipline_norgatedata import (
register_norgatedata_equities_bundle,
register_norgatedata_futures_bundle )
Then create as many bundle definitions as you desire. These bundles can use a given symbol list, one or more watchlists from your Norgate Data Watchlist Library and (for futures markets) all contracts belonging to a given set of futures market session symbols.
Here are some examples with varying parameters. You should adapt these to your requirements.
register_norgatedata_equities_bundle has the following default parameters:
stock_price_adjustment_setting = StockPriceAdjustmentType.TOTALRETURN,
end_session = 'now',
calendar_name = 'NYSE',
excluded_symbol_list = None,
register_norgatedata_futures_bundle has the following default parameters:
end_session = 'now',
calendar_name = 'us_futures',
excluded_symbol_list = None,
register_norgatedata_equities_bundle(
bundlename = 'norgatedata-aapl',
symbol_list = ['AAPL','$SPXTR',],
start_session = '1990-01-01',
end_session = '2020-12-01'
)
register_norgatedata_equities_bundle(
bundlename = 'norgatedata-fang',
symbol_list = ['META','AMZN','NFLX','GOOGL','$SPXTR',],
start_session = '2012-05-18',
)
register_norgatedata_equities_bundle(
bundlename = 'norgatedata-selected-etfs',
symbol_list = ['SPY','GLD','USO','$SPXTR',],
start_session = '2006-04-10',
)
register_norgatedata_equities_bundle(
bundlename = 'norgatedata-sp500',
symbol_list = ['$SPXTR'],
watchlists = ['S&P 500 Current & Past'],
start_session = '1970-01-01',
)
register_norgatedata_equities_bundle(
bundlename = 'norgatedata-russell3000',
watchlists = ['Russell 3000 Current & Past'],
symbol_list = ['$RUATR'],
start_session = '1990-01-01' ,
)
register_norgatedata_equities_bundle(
bundlename = 'norgatedata-russell3000-exfroth',
watchlists = ['Russell 3000 Current & Past'],
symbol_list = ['$RUATR'],
start_session = '1990-01-01' ,
excluded_symbol_list = ['TSLA','AMZN','META','NFLX','GOOGL',]
)
register_norgatedata_futures_bundle(
bundlename = 'norgatedata-selected-index-futures',
session_symbols = ['ES','NQ','RTY'],
symbol_list = ['$SPXTR'],
start_session = '2000-01-01',
)
For more bundle examples, scroll down to "Books/publications that use Zipline, adapted for Norgate Data use" below and download the Trading Evolved examples.
To ingest a bundle:
zipline ingest -b <bundlename>
Benchmark against a symbol
To benchmark against an index, call set_benchmark within the initialize function:
from zipline.api import set_benchmark, symbol
def initialize(context):
    set_benchmark(symbol('$SPXTR'))
Pipelines - accessing timeseries data
Timeseries data has been exposed into Zipline's Pipeline interface. During a backtest, the Pipelines will be calculated against all securities in the bundle.
The following Filter (i.e. boolean) pipelines are available:
The following Factor (i.e. float) pipelines are available:
To incorporate these into your trading model, you need to import the relevant packages/methods:
from zipline.pipeline import Pipeline
from zipline_norgatedata.pipelines import (
NorgateDataIndexConstituent, NorgateDataDividendYield )
from zipline.api import order_target_percent, set_benchmark, symbol, attach_pipeline, pipeline_output
Then create a pipeline in your initialize method:
It is recommended you put your pipeline construction in its own function:
def make_pipeline():
    indexconstituent = NorgateDataIndexConstituent('S&P 1500')
    divyield = NorgateDataDividendYield()
    return Pipeline(
        columns={
            'NorgateDataIndexConstituent':indexconstituent,
            'NorgateDividendYield':divyield },
        screen = indexconstituent)
Incorporate this into your trading system by attaching it to your initialize method. Note, for better efficiency, use chunks=9999 or however many bars you are likely to need.
This will save unnecessary access to the Norgate Data database.
def initialize(context):
    set_benchmark(symbol('$SPXTR'))
    attach_pipeline(make_pipeline(), 'norgatedata_pipeline', chunks=9999,eager=True)
Now you can access the contents of the pipeline in before_trading_start and/or handle_data by using Zipline's pipeline_output method. For example, you can exit positions in securities that are no longer in the pipeline output:
def before_trading_start(context, data):
    context.pipeline_data = pipeline_output('norgatedata_pipeline')

def handle_data(context, data):
    context.pipeline_data = pipeline_output('norgatedata_pipeline')
    current_constituents = context.pipeline_data.index
    for asset in context.portfolio.positions:
        if asset not in current_constituents:
            order_target_percent(asset,0.0)
Note: Access to historical index constituents requires a Norgate Data Stocks subscription at the Platinum or Diamond level.
Worked example backtesting S&P 500 Constituents back to 1990
This example comprises a backtest on the S&P 500, with a basic trend filter that is applied on the S&P 500 index ($SPX). The total return version of the index is also ingested ($SPXTR) for comparison purposes.
Note: This requires a Norgate Data US Stocks subscription at the Platinum or Diamond level.
Create a bundle definition in extension.py as follows:
from zipline_norgatedata import register_norgatedata_equities_bundle
register_norgatedata_equities_bundle(
bundlename = 'norgatedata-sp500-backtest',
symbol_list = ['$SPX','$SPXTR',],
watchlists = ['S&P 500 Current & Past',],
start_session = '1990-01-01',
)
Now, ingest that bundle into zipline:
zipline ingest -b norgatedata-sp500-backtest
Inside your trading system file, you'd incorporate the following code snippets:
from zipline.pipeline import Pipeline
from zipline_norgatedata.pipelines import (
NorgateDataIndexConstituent,
NorgateDataDividendYield)
...
def make_pipeline():
    indexconstituent = NorgateDataIndexConstituent('S&P 500')
    return Pipeline(
        columns={
            'NorgateDataIndexConstituent':indexconstituent,
        },
        screen = indexconstituent)

def initialize(context):
    set_benchmark(symbol('$SPXTR'))
    attach_pipeline(make_pipeline(), 'norgatedata_pipeline', chunks=9999,eager=True)

def before_trading_start(context, data):
    context.pipeline_data = pipeline_output('norgatedata_pipeline')

def handle_data(context, data):
    context.pipeline_data = pipeline_output('norgatedata_pipeline')
    current_constituents = context.pipeline_data.index
    for asset in context.portfolio.positions:
        # Exit any position that is no longer an index constituent
        if asset not in current_constituents:
            order_target_percent(asset,0.0)
Worked example backtesting E-Mini S&P 500 futures
This example creates a continuous contract of the E-Mini S&P 500 futures that trade on CME, rolling on volume.
Create a bundle definition in extension.py as follows:
from zipline_norgatedata import register_norgatedata_futures_bundle
register_norgatedata_futures_bundle(
    bundlename = 'norgatedata-es-futures',
    session_symbols = ['ES',],
    symbol_list = ['$SPXTR',],
    start_session = '2000-01-01',
)
Now, ingest that bundle into zipline:
zipline ingest -b norgatedata-es-futures
Inside your trading system file, you'd incorporate the following code snippets:
def initialize(context):
    set_benchmark(symbol('$SPXTR'))
    af = context.asset_finder
    # Collect the root symbol of every futures market in the bundle
    markets = set([])
    allcontracts = af.retrieve_futures_contracts(af.futures_sids)
    for contract in allcontracts:
        markets.add(allcontracts[contract].root_symbol)
    markets = list(markets)
    markets.sort()
    # One volume-rolled continuous contract per market
    context.universe = [
        continuous_future(market, offset=0, roll='volume', adjustment='mul')
        for market in markets
    ]

def handle_data(context, data):
    hist = data.history(
        context.universe,
        fields=['close','volume'],
        frequency='1d',
        bar_count=250,
    )
    open_pos = {
        pos.root_symbol: pos
        for pos in context.portfolio.positions
    }
    contracts_to_trade = 5
    for continuation in context.universe:
        contract = data.current(continuation, 'contract')
        # Placeholder orders: substitute your own long/short entry logic here
        order_target(contract, contracts_to_trade)
        order_target(contract, -1 * contracts_to_trade)
    if len(open_pos) > 0:
        # roll_futures is not defined in this snippet; supply your own roll logic
        roll_futures(context, data)
Metadata
The following fields are available in the metadata dataframe: start_date, end_date, ac_date, symbol, asset_name, exchange, exchange_full, asset_type, norgate_data_symbol, norgate_data_assetid.
Norgate Data Futures Market Session symbols
To obtain just the futures market sessions symbols, you can use the norgatedata package and adapt the following code:
import norgatedata
for session_symbol in norgatedata.futures_market_session_symbols():
    print(session_symbol + " " + norgatedata.futures_market_session_name(session_symbol))
Zipline Limitations/Quirks
- Zipline 2.4 and v3+ are hardcoded to ignore dates prior to 1990-01-01. This can be patched to 1970-01-01, but it cannot go any further back since Zipline uses the Unix Epoch (1970-01-01) as the underlying time storage mechanism.
- Zipline doesn't define all futures markets and doesn't provide any runtime extensibility in this area - you will need to add them to <your_environment>\lib\site-packages\zipline\finance\constants.py if they are not defined. Be sure to backup this file as it will be overwritten any time you update zipline.
- Zipline assumes that there are bars for every day of trading. If a security doesn't trade for a given day (e.g. it was halted/suspended, or simply nobody wanted to trade it), it will be padded with the previous close repeated in the OHLC fields, with volume set to zero. Consider how this might affect your trading calculations.
- Index volumes cannot be accurately ingested due to Zipline trying to convert large volumes to UINTs which are out-of-bounds for UINT32. Index volumes will be divided by 1000.
- Any stock whose adjusted volume exceeds the upper bound of UINT32 will be set to the maximum UINT32 value (4294967295). This only occurs for stocks with a lot of splits and/or very large special distributions.
- Some stocks have adjusted volume values that fall below the boundaries used by winsorize_uint32 (e.g. volume of 8.225255e-05). You'll see a warning when those stocks are ingested: "UserWarning: Ignoring 12911 values because they are out of bounds for uint32". There's not much we can do here. For now, just ignore those warnings.
- Ingestion times could be improved significantly with multiprocessing (this would require Zipline enhancements)
- Zipline cannot handle negative prices (eg. Crude Oil in 2020) - any such prices will be set to zero. Most systems would have rolled prior to this strange event anyway.
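To see how these limitations can affect your calculations, the rules for volumes, negative prices and untraded days are sketched below in plain Python. This only mirrors the behaviour described in the list above: the function names are hypothetical, it is not the code used during ingestion, and it does not model the below-bounds warning.

```python
UINT32_MAX = 4294967295


def storable_volume(volume, is_index=False):
    """Divide index volumes by 1000, then cap at the uint32 maximum."""
    if is_index:
        volume = volume / 1000
    return min(int(volume), UINT32_MAX)


def storable_price(price):
    """Negative prices (e.g. Crude Oil in 2020) are stored as zero."""
    return max(price, 0.0)


def pad_missing_bar(previous_close):
    """Bar for a day with no trades: previous close repeated, zero volume."""
    return {
        "open": previous_close,
        "high": previous_close,
        "low": previous_close,
        "close": previous_close,
        "volume": 0,
    }


print(storable_volume(5_000_000_000))                 # capped
print(storable_volume(5_000_000_000, is_index=True))  # scaled instead
print(pad_missing_bar(10.5))
```

A volume filter or a range-based indicator (such as ATR) will see padded days as zero-volume, zero-range bars, which is worth keeping in mind for thinly traded securities.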
Testing on Australian ASX data
By default, run_algorithm uses the 'NYSE' trading calendar. To backtest other markets, you need to specify the calendar. For the ASX, the calendar name is XASX.
At the top of your algorithm:
from exchange_calendars import get_calendar
In the run_algorithm call, add a trading_calendar= line, for example:
results = run_algorithm(
start=start, end=end,
initialize=initialize, analyze=analyze,
handle_data=handle_data,
capital_base=10000,
trading_calendar=get_calendar('XASX'),
data_frequency = 'daily',
bundle='norgatedata-spasx200',
)
Be sure to implement the patches shown above too
Testing on Canadian TSX data
By default, run_algorithm uses the 'NYSE' trading calendar. To backtest other markets, you need to specify the calendar. For the TSX, the calendar name is XTSE.
At the top of your algorithm:
from exchange_calendars import get_calendar
In the run_algorithm call, add a trading_calendar= line, for example:
results = run_algorithm(
start=start, end=end,
initialize=initialize, analyze=analyze,
handle_data=handle_data,
capital_base=10000,
trading_calendar=get_calendar('XTSE'),
data_frequency = 'daily',
bundle='norgatedata-sptsx60',
)
Be sure to implement the patches shown above too
Books/publications that use Zipline, adapted for Norgate Data use
We have adapted the Python code in the following books to use Norgate Data.
Trading Evolved: Anyone can Build Killer Trading Strategies in Python.
Source code compatible with Zipline (Reloaded) in Jupyter notebook format, updated and refreshed to handle newer Zipline/Pandas/Numpy versions can be downloaded here:
https://norgatedata.com/book-examples/trading-evolved/NorgateDataTradingEvolvedExamples.zipline.310.zip
If there are other books/publications that use Zipline and are worth adding here, let us know.
FAQs
During a backtest I receive an error ValueError: 'Time Period' is not in list. How do I fix this?
This can occur when the items in the bundle do not match the latest data in the Norgate Data database. For stocks, if there are symbol changes within the database then the bundle will have the old symbol but the Norgate database will have the new symbol. For Futures, there may have been additional futures contracts listed since your previous ingestion and the roll-over algorithm is trying to roll into them.
The solution is simple: Ingest the bundle with fresh data.
Also consider putting Norgate Data Updater into manual mode for updating and using the NDU Trigger app to explicitly start NDU and obtain updates. More information on this can be found here:
https://norgatedata.com/ndu-usage.php
During a backtest, an error message is shown for index_constituent_timeseries
For example, an error such as this is shown:
[2023-06-05 07:39:25.720989] INFO: Norgate Data: Populating NorgateDataIndexConstituent pipeline populating with $DJI on 3638 securities from 2000-01-03 to 2023-06-02....
[2023-06-05 07:39:34.734116] ERROR: Norgate Data: index_constituent_timeseries: DBD not found
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
(lots of irrelevant trace messages thereafter)
The Norgate Data database has been updated since you last performed an ingest, and there have been some symbol changes. In the above example, since the ingest occurred, DBD has been demoted to OTC and has a new symbol of DBDQQ.
The solution is simple: Ingest the bundle with fresh data.
Also consider putting Norgate Data Updater into manual mode for updating and using the NDU Trigger app to explicitly start NDU and obtain updates. More information on this can be found here:
https://norgatedata.com/ndu-usage.php
Change log
Released versions and release dates can be seen here:
https://pypi.org/project/zipline-norgatedata/#history
The CHANGES.TXT within the package details the changes. A summary is also shown below.
Installing older versions
Older versions of Zipline-NorgateData can be installed easily using pip. For example, to install v2.0.17:
pip install zipline-norgatedata==2.0.17
Note that prior versions may be only suited to older versions of Zipline. However, due to the constantly evolving nature of Zipline and supporting modules (Pandas, Numpy etc.) we can only really support the current version.
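Before pinning or upgrading, it can help to see which versions are currently installed in the active environment. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints None next to any distribution missing from this environment
for dist in ("zipline-norgatedata", "zipline-reloaded", "exchange-calendars"):
    print(dist, installed_version(dist))
```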
Support
For support on Norgate Data or usage of the zipline-norgatedata extension:
Norgate Data support
Please put separate issues in separate emails, as this ensures each issue is separately ticketed and tracked.
For bug reports on Zipline Reloaded, report them on Stefan Jansen's Zipline Reloaded Github
There is also a Google Group, which isn't used much these days: Zipline Google Group.
Thanks
Thanks to:
- Andreas Clenow for his pioneering work in documenting Zipline bundles in his latest book Trading Evolved: Anyone can Build Killer Trading Strategies in Python. We used many of the techniques described in the book to build our bundle code. There are many excellent examples of how to implement various trading systems including trend following, counter trend following, momentum, curve trading and combining multiple trading systems together.
- Norgate Data alpha and beta testers. Without your persistence we wouldn't have implemented half of the features.
- The team that were formerly employed by Quantopian for developing and open sourcing Zipline
- Continued development efforts on Zipline and associated packages since Quantopian ceased, by Stefan Jansen, Mehdi Bounouar, Allan Coppola and Shlomi Kushchi and many more.
Recent Version History
v2.3.0 20220728 Bump to v2.3.0 due to version checking issue
v2.3.1 20221027 Added Mac M1/M2 workaround to docs
v2.3.2 20221114 Notes on Juneteenth holiday patch
v2.3.3 20230122 Notes on TSXFailures patch
v2.3.4 20230222 Update documentation to specify sqlalchemy<2
v2.4.0 20230506 Change to zipline-reloaded and pyfolio-reloaded via conda-forge
v2.4.1 20230507 Minor documentation fixes on installation method
v2.4.1 20230507 Documentation fixes
v2.4.2 20230507 Documentation fixes, revised Clenow scripts
v2.4.3 20230606 Revised information on extending backtests to 1970 by patching Zipline, better start/end session handling so that you don't have to be on an actual trading date
v2.4.4 20230709 Prevent TypeError: Cannot compare tz-naive and tz-aware timestamps on bundle ingest when normalizing start and end session dates
v3.0.0 20231005 Fix a few typos in the instructions, convert the deprecated Pandas fillna to ffill and bfill. Fully tested against Zipline v3, Added BlankCheckCompany pipeline
v3.0.1 20231005 Updated Trading Evolved scripts
v3.0.2 20240111 Added h5py to install instructions due to incorrect package dependency in conda-forge
v3.0.3 20240111 Docs
v3.1.0 20240927 Tested against Zipline v3.1.0
Testing against Python v3.12
Changed delisted Lumber contract to be LM so it doesn't overlap the new Lumber
Docs on pipeline for import statements
Patch for events.py for ASX/CA users
Change installation to use pip for zipline-reloaded as Conda is not being updated as frequently.
Added Cboe CA as an exchange for Canadian users