quant-stream
Factor research and backtesting framework built on Pathway, with 50+ indicators, an expression language, ML integration, and an LLM agent that mines alpha factors autonomously. Built for Inter IIT Tech Meet 14.
quant-stream is a factor research and backtesting framework built on Pathway, a stream processing engine for Python. Pathway handles the temporal semantics of financial data cleanly, so you get rolling windows, cross-sectional ops, and forward-bias prevention without writing the plumbing by hand.
It was the alpha research layer of a larger agentic trading platform we built for Inter IIT Tech Meet 14: a full-stack system with real-time NSE filing signals, HMM-based regime detection, automated trade execution via Angel One, and a five-service Kafka-backed monorepo deployed to production at agentinvest.space. quant-stream sat inside that platform as the AlphaCopilot backend, but it works as a standalone library too.
quant-stream has five layers: a function library (50+ indicators), an expression language, a backtesting engine, ML model integration, and an LLM agent (AlphaCopilot) that generates and tests factor hypotheses in a loop.
the function library
All functions accept a pw.Table and return a pw.Table with a new column added. They're composable by chaining. The library is organized by operation type:
Cross-sectional (group by timestamp, operate across instruments):
RANK, ZSCORE, SCALE, MEAN, STD, SKEW, MAX, MIN, MEDIAN
Rolling window (per instrument, over time):
TS_MAX, TS_MIN, TS_MEAN, TS_STD, TS_CORR, TS_RANK, TS_MAD, PERCENTILE, DECAYLINEAR, and more
Technical indicators:
SMA, EMA, MACD, RSI, BB_MIDDLE, BB_UPPER, BB_LOWER
Math / elementwise:
DELTA (differences), DELAY (lag/shift), ABS, LOG, EXP, SQRT, POW
Rolling window operations use Pathway's UDF system. The pattern is: group by instrument, sort by timestamp, apply a sliding window via pw.apply_with_type. Cross-sectional ops use table.groupby(by_time).reduce(...). This maps cleanly to Pathway's model without needing pandas.
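To make the two operation families concrete, here is a plain-Python sketch of their semantics. This is illustrative only: the library implements these on Pathway tables, not on lists and dicts as shown here, and the helper names are mine.

```python
# Conceptual sketch of the two operation families (illustrative only --
# the library runs these on Pathway tables, not Python lists).
from collections import defaultdict

# rows: (timestamp, instrument, close)
rows = [
    (1, "TCS", 100.0), (1, "INFY", 50.0), (1, "HDFC", 80.0),
    (2, "TCS", 102.0), (2, "INFY", 49.0), (2, "HDFC", 84.0),
]

def cross_sectional_rank(rows):
    """RANK: group by timestamp, rank each instrument against its peers."""
    by_time = defaultdict(list)
    for ts, inst, val in rows:
        by_time[ts].append((inst, val))
    out = {}
    for ts, group in by_time.items():
        ordered = sorted(group, key=lambda p: p[1])
        for pos, (inst, _) in enumerate(ordered):
            out[(ts, inst)] = pos / (len(ordered) - 1)  # scaled to [0, 1]
    return out

def ts_mean(rows, instrument, window):
    """TS_MEAN: per instrument, rolling mean over the last `window` steps."""
    series = sorted((ts, v) for ts, inst, v in rows if inst == instrument)
    values = [v for _, v in series]
    return [
        sum(values[max(0, i - window + 1) : i + 1]) / min(i + 1, window)
        for i in range(len(values))
    ]
```

The split matters because the two families need different groupings: cross-sectional ops group by timestamp and look across instruments, rolling ops group by instrument and look back through time.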
the expression language
Rather than writing Python to compose functions, you can write string expressions:
```python
from quant_stream.factors import AlphaEvaluator

evaluator = AlphaEvaluator(table)
result = evaluator.evaluate("RANK(DELTA($close, 1))")
```

The expression parser uses pyparsing and handles operator precedence, nested expressions, and variable references ($close, $volume, etc.). A few translation rules:
- $close + $open → ADD($close, $open) (variable + variable becomes a function call)
- $close + 1 → kept inline (variable + constant stays arithmetic)
- && / || → AND() / OR()
This is primarily useful for the AlphaCopilot integration: the LLM writes factor expressions as strings, not Python code.
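The variable-vs-constant rule can be sketched with a single regex. This is not the library's pyparsing grammar, just a minimal stand-in that demonstrates the rule on flat, non-nested expressions:

```python
# Minimal sketch of the "variable + variable becomes a function call" rule.
# The real parser is a pyparsing grammar with precedence and nesting;
# this regex version only handles the flat case.
import re

VAR = r"\$[a-z_]+"

def translate(expr: str) -> str:
    # $close + $open -> ADD($close, $open)
    # $close + 1 is left untouched: "1" does not match the variable pattern
    return re.sub(rf"({VAR})\s*\+\s*({VAR})", r"ADD(\1, \2)", expr)
```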
YAML config
The whole pipeline can be declarative:
```yaml
data:
  source: indian_stock_market_nifty500.csv
  date_range: [2020-01-01, 2024-01-01]
factors:
  - name: momentum_5d
    expression: "RANK(DELTA($close, 5))"
  - name: volume_zscore
    expression: "ZSCORE(TS_MEAN($volume, 20))"
strategy:
  type: topk_dropout
  topk: 20
  n_drop: 5
  method: equal
backtest:
  initial_capital: 1000000
  transaction_cost: 0.001
  short_funding_rate: 0.0002
```

quant-stream run --config strategy.yaml runs the full pipeline.
the backtesting engine
The core design decision: build the backtester on Pathway's streaming engine rather than pandas. This solves the "dual codebase" problem, where teams that backtest in batch and trade live in streaming end up with two implementations of the same logic that inevitably diverge. With Pathway, the same DELTA function runs identically on pw.demo.replay_csv (historical) and pw.io.kafka.read (live). Swapping from backtest to live trading is one line: the input connector.
A few other design decisions worth noting:
Forward bias prevention: Signals at time t use only data available at the close of t. Trades execute at t+1 close price. The engine maintains an execution_to_signal mapping to handle weekends and holidays; a signal on Friday maps to Monday's execution price.
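The weekend/holiday mapping amounts to "first trading day strictly after the signal day". A sketch with a hypothetical helper (the engine's actual execution_to_signal structure may differ in detail):

```python
# Sketch of mapping signal dates to next-trading-day execution dates.
# build_execution_map is a hypothetical helper, not the engine's API.
from bisect import bisect_right
from datetime import date

def build_execution_map(trading_days, signal_days):
    """Map each signal day to the first trading day strictly after it,
    so a Friday signal executes at Monday's close."""
    days = sorted(trading_days)
    mapping = {}
    for s in signal_days:
        i = bisect_right(days, s)
        if i < len(days):
            mapping[s] = days[i]
    return mapping

trading = [date(2024, 1, 5), date(2024, 1, 8), date(2024, 1, 9)]  # Fri, Mon, Tue
build_execution_map(trading, [date(2024, 1, 5)])
# Friday's signal maps to Monday, 2024-01-08
```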
Missing price handling: last_known_prices tracks the most recent price for each instrument. When price data is missing for a timestep (halted stocks, illiquid instruments), it uses the last known price rather than valuing the position at zero.
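The forward-fill behavior can be sketched in a few lines (illustrative, not the engine's code):

```python
# Sketch of the last-known-price fallback for missing quotes.
def fill_prices(price_stream):
    """Yield (timestep, instrument, price), substituting the last known
    price when a timestep has no quote for an instrument."""
    last_known = {}
    for ts, quotes, universe in price_stream:
        for inst in universe:
            if inst in quotes:
                last_known[inst] = quotes[inst]
            if inst in last_known:  # skip instruments that were never priced
                yield ts, inst, last_known[inst]
```

An instrument that has never traded yields nothing rather than a zero price, which is the point: valuing a halted position at zero would distort portfolio value and drawdown numbers.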
Short selling: The intraday_short_only mode squares off all short positions at the end of each day, as required for intraday short selling in the Indian market. Short positions accrue a short_funding_rate (default 0.02%/day).
Capital reserves: A cost_reserve (default 2%) keeps a fraction of capital undeployed to ensure there's always enough to cover transaction costs.
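The arithmetic behind these two adjustments is simple; a back-of-envelope sketch using the defaults above (the helper names are hypothetical):

```python
# Back-of-envelope sketch of the two capital adjustments.
# Helper names are hypothetical; defaults match the ones stated above.
def deployable_capital(capital: float, cost_reserve: float = 0.02) -> float:
    """Keep a fraction of capital undeployed to cover transaction costs."""
    return capital * (1 - cost_reserve)

def short_funding_cost(notional: float, days: int,
                       rate: float = 0.0002) -> float:
    """Funding accrued on short notional at 0.02%/day."""
    return notional * rate * days

deployable_capital(1_000_000)   # roughly 980,000 deployable
short_funding_cost(100_000, 5)  # roughly 100 in funding over a week of sessions
```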
Performance metrics include the usual suspects: total return, annualized return, Sharpe, Sortino, max drawdown, Calmar ratio, win rate, profit factor. For factor evaluation: IC (Pearson correlation between factor and forward returns), Rank IC (Spearman), ICIR, Rank ICIR.
ML integration
The run_ml_workflow function handles the full pipeline: factor computation, train/test split, model fitting, signal generation, backtesting. Supported models: LightGBM, XGBoost, RandomForest, Linear. The model_type=None path skips ML entirely and uses factor values directly as signals.
The data used is NIFTY 500 Indian stocks. The Alpha158 factor set (158 factors derived from OHLCV data) is used as a reference benchmark in the existing CSV outputs.
AlphaCopilot
This is the most interesting part. AlphaCopilot is a LangGraph agent that generates alpha factor hypotheses, constructs factor expressions, runs backtests, evaluates results, and iterates.
The graph:
```
factor_propose
      ↓
factor_construct
      ↓
factor_validate ←─────────── (syntax error? loop back)
      ↓
factor_workflow (runs backtest)
      ↓
feedback
      ↓
should_continue? ── yes → factor_propose
        └── no → END
```

Each node is an LLM call:
- factor_propose: Given a market hypothesis (e.g., "stocks with recent volume spikes tend to mean-revert"), generate 2-3 factor expressions.
- factor_construct: Turn the hypothesis into concrete RANK(DELTA($close, 5))-style expressions.
- factor_validate: Call the MCP server's expression parser. If the syntax is invalid, loop back to factor_construct with the error message.
- factor_workflow: Submit a backtest job via MCP, poll until complete, get metrics.
- feedback: LLM evaluates the Sharpe ratio, IC, and drawdown against prior experiments, then decides whether to continue and what to try next.
The agent maintains a Trace history of (hypothesis, experiment, feedback) tuples across iterations. The LLM can see what it already tried and why it failed.
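The control flow of the loop, with the trace threaded through each iteration, can be sketched in plain Python. The LLM methods (propose, repair, evaluate) and the backtest/validate callables are hypothetical stand-ins for the LangGraph nodes, not AlphaCopilot's actual interfaces:

```python
# Plain-Python sketch of the AlphaCopilot control flow. propose/repair/
# evaluate are hypothetical stand-ins for the LLM-backed LangGraph nodes.
def run_copilot(hypothesis, llm, backtest, validate, max_iters=5):
    trace = []  # (hypothesis, experiment, feedback) tuples across iterations
    for _ in range(max_iters):
        expressions = llm.propose(hypothesis, trace)
        valid = []
        for expr in expressions:
            err = validate(expr)          # factor_validate via MCP
            if err:                       # syntax error: loop back with message
                expr = llm.repair(expr, err)
            valid.append(expr)
        metrics = backtest(valid)         # factor_workflow: submit job, poll
        feedback = llm.evaluate(metrics, trace)
        trace.append((hypothesis, (valid, metrics), feedback))
        if not feedback.should_continue:
            break
        hypothesis = feedback.next_hypothesis
    return trace
```

Because the trace is passed back into propose and evaluate, the LLM sees its own prior attempts and their results rather than starting each iteration cold.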
The agent generates only factor expressions. Data source, strategy (topk=20, n_drop=5), model type, and backtest parameters are pre-configured. This scopes the search space to what the LLM is actually good at: generating market hypotheses and translating them into expressions, without asking it to also tune hyperparameters.
In backtests on NIFTY 500 data, AlphaCopilot with Gemini 2.5 Pro + LightGBM achieved 65.06% total return, 87.44% annualized return, and Sharpe 4.57 over the test period, vs. NIFTY 500 buy-and-hold at 15.0% TR and Sharpe 3.08. The best manually constructed baseline (Alpha158 factors + LGBM) came in at 21.74% TR, Sharpe 1.46.
the MCP server
AlphaCopilot communicates with the backtesting engine via an MCP (Model Context Protocol) server. This lets any MCP-compatible client (Claude, other agents, custom tooling) run backtests as tools.
The server uses FastMCP + Celery + Redis. Tools: validate_factors (syntax check only), run_ml_workflow (submit a backtest job, returns a job_id), check_job_status, cancel_background_job. Resources expose documentation: available functions, alpha construction patterns, strategy configs.
The async job pattern matters here because backtests take seconds to minutes. The agent submits a job, gets a job_id, polls check_job_status, and processes results when ready. If Celery isn't available, the server falls back to synchronous execution with job_id="mcp_sync_fallback".
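The submit/poll loop is a standard pattern; a generic sketch using the tool names above (the client object and its call method are hypothetical, not the actual MCP client API):

```python
# Generic sketch of the submit/poll pattern against the MCP server.
# `client` and its `call` method are hypothetical; tool names are the
# ones listed above.
import time

def run_backtest(client, config, poll_interval=2.0, timeout=600.0):
    job_id = client.call("run_ml_workflow", config)["job_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = client.call("check_job_status", {"job_id": job_id})
        if status["state"] == "done":
            return status["result"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "backtest failed"))
        time.sleep(poll_interval)
    client.call("cancel_background_job", {"job_id": job_id})
    raise TimeoutError(f"backtest {job_id} exceeded {timeout}s")
```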
quant-stream serve starts both the MCP server and the Celery worker together.
experiment tracking
All runs log to MLflow. The Recorder class wraps the MLflow API: step-by-step metrics during the backtest loop, daily holdings as a CSV artifact, summary metrics at the end. Backend is SQLite (mlruns.db), no external tracking server needed.
current state
The core pipeline works. The get_metrics() method in Backtester returns placeholder values; only run() produces actual metrics. That's a known gap.
The broader platform's NSE filing pipeline (which ingests filings via Pathway with 50ms latency and scores them with an LLM) achieved 59.09% total return and Sharpe 4.94 over a 3-month paper trading window during Inter IIT Tech Meet 14, vs. NIFTY50 at 4.93% TR and BANKNIFTY at 8.94%. One finding from that pipeline: feeding the filing PDF directly to Gemini (multimodal) outperformed extracting text first: win rate went from 42% to 55%, Sharpe from 4.96 to 7.07.
The data is NIFTY 500 Indian stocks. Adapting to other markets requires swapping the data loader and adjusting for local conventions (settlement periods, short-selling rules, etc.).