Skip to content

Potential Bug in Signal Generation Logic in 02_vectorized_backtest.ipynb (Chapter 8) #333

@lansetaowa

Description

@lansetaowa

Hi,

I’d like to report a potential logic issue in the "Generate Signals" section of 02_vectorized_backtest.ipynb, which is part of Chapter 8: The ML4T Workflow - From Model to Strategy Backtesting.

Location
Notebook: 08_ml4t_workflow/02_vectorized_backtest.ipynb
Section: Generate Signals

Current Code
long_signals = ((predictions .where(predictions > 0) .rank(axis=1, ascending=False) > N_LONG) .astype(int))
short_signals = ((predictions .where(predictions < 0) .rank(axis=1) > N_SHORT) .astype(int))

Issue
The current logic generates a signal of 1 for assets ranked below the top N, rather than for the top N predictions. However, based on the context and subsequent backtest logic, a 1 in the signal matrix should indicate a trade signal (i.e., to long/short that asset).

In descending ranking (ascending=False), smaller rank values correspond to higher predictions. Therefore, the condition > N_LONG incorrectly selects the lower-ranked (i.e., less bullish) assets.

Suggested Fix
To select the top N long signals (i.e., assets with highest predicted returns), the logic should be:

long_signals = ((predictions .where(predictions > 0) .rank(axis=1, ascending=False) <= N_LONG) .astype(int))

Similarly, for short signals (lowest predictions):
short_signals = ((predictions .where(predictions < 0) .rank(axis=1, ascending=True) <= N_SHORT) .astype(int))

Why It Matters
Downstream code such as:
long_returns = long_signals.mul(fwd_returns).mean(axis=1)
short_returns = short_signals.mul(-fwd_returns).mean(axis=1)
assumes that a 1 in the signal matrix corresponds to an active long/short position. The current logic would instead assign 1 to lower-ranked predictions, which may significantly degrade the backtest result or produce misleading performance statistics.

The same logic appears in the paperback book as well.

Thanks again for this excellent open-source resource! This clarification could help future readers better align signal generation with the intended trading logic.

Please let me know your thoughts on this.

Best,
Saining Zhang

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions