Stephen Lieberman
Through 1023AI

Applied Operations

Live Market Operations

Eight years. Live capital. Real consequence. A validated proof of concept for the core ideas underlying AI safety at scale.

The central argument of this work, that complex adaptive systems produce emergent behavior that static evaluation frameworks cannot anticipate, is easy to state and difficult to prove. Most AI safety discourse operates at the level of theory, simulation, or controlled experiment. What is rare is a live, multi-year, financially consequential test of the same thesis.

From 2015 to 2023, that test existed.

The CASS agent-based simulation framework, developed over more than a decade for defense and national security applications, was extended to live US equity index derivatives markets. The specific target: the emergent cascading failure patterns produced when human traders and automated execution systems interact under volatility stress. The nonlinear amplification dynamics that make these markets dangerous in dislocating conditions are structurally identical to the dynamics that make capable AI systems dangerous at scale.

The system identified those dynamics in advance. It traded them. And over eight years, including some of the most volatile market conditions in recent history, it worked.

Methodology

Simulation Before Capital

The operation began not in the markets but in a completely simulated environment. Three years of simulation-only work, from 2012 to 2015, preceded any live capital deployment. During that period, the agent-based framework was used to model the interaction dynamics of human traders and automated execution systems under a range of market conditions, including high-volatility regimes, liquidity dislocations, and cascading price amplification events.

The simulation framework tested not just whether the system was profitable in historical conditions, but whether the underlying behavioral model (how market participants, human and automated, interact during stress) was structurally sound. Only when the simulation results were consistent, robust across regimes, and theoretically grounded in the complex systems literature did live capital deployment begin.

This sequence matters. It is precisely the pre-deployment validation discipline that AI safety governance demands and that most AI deployments skip.

Deployment Sequence

Step 1

Simulation (2012 to 2015)

Agent-based modeling of human-automated market dynamics. No live capital. Validation of behavioral model across regimes.

Step 2

Live Deployment (2015)

Capital deployed only after simulation results met consistency and robustness thresholds.

Step 3

Continuous Validation (2015 to 2023)

Live results continuously tested against simulation predictions. System refined iteratively.

Operating Environment

A Formally Structured Operation

The operation was run with the institutional discipline its complexity required. Exchange access was obtained at the institutional level, with multi-platform infrastructure across futures and options on ES (S&P 500) and NQ (Nasdaq 100) contracts, among the most liquid and competitive derivatives markets in the world. Legal structure was established at the outset with specialized counsel. Tax architecture was managed annually given the technical complexity of Section 1256 treatment, mixed straddle positions, and multi-platform reporting across instrument types.

Custom execution and forecasting systems were coded in Java and Python, coordinating with professional-grade platforms. Data inputs included price and volume, order flow, market internals, volatility surfaces, derived technical analyses, and sentiment analysis using text and language processing, both live and historical.

Operating Parameters

Markets traded

ES (S&P 500) and NQ (Nasdaq 100) futures and futures options

Active period

2015 to 2023

Simulation period preceding live deployment

3 years (2012 to 2015)

Regulatory engagement

CFTC, SEC, CME, and CBOT sessions on cryptocurrency futures and derivatives framework development

Performance

Results Across Market Regimes

The most meaningful test of a systematic trading system is not its performance in favorable conditions. It is whether it holds up in conditions capable of destroying it. 2020 was that test. The COVID crash of March 2020 was one of the fastest and most severe market dislocations in modern history. Many systematic strategies that had performed well in stable conditions failed or were severely damaged during this period. The results below are for 2020. They are verified.

Verified 2020 Performance

Verified across three independent institutional platforms · Tax-record verifiable

41.52%

Annualized return, futures book

ES and NQ futures and futures options only

49.65%

Annualized return, integrated portfolio

Futures book plus long equity options positions

Buy-and-hold return for the same period: 3.01% · Futures book outperformed passive exposure by 38.51 percentage points (41.52% versus 3.01%) while deployed for less than 2% of total trading hours.

Return Retracement Ratio

10.32

Combined

Sharpe Ratio

0.88 / 1.05

Futures / Combined

Sortino Ratio

1.17 / 1.35

Futures / Combined

Time in market

1.72% of total trading hours

Directional bias

Profitable long and short across every measured period

1.72% time in market is not an incidental characteristic. It is the risk control mechanism.

Capital was deployed in live futures and options positions for 1.72% of total trading hours in 2020. The remaining 98.28% of the time, the system held no positions. A passive long investor in the S&P 500 was exposed to every minute of the March 2020 crash, a 34% decline in 33 days. The system was not in the market for the overwhelming majority of that period, because the behavioral patterns the simulation framework identified were not present. When they were, the system entered. When they resolved, it exited. That selectivity, not position sizing or stop-losses, is what produced a Return Retracement Ratio of 10.32 against a backdrop of one of the worst dislocations in modern market history.

Metric Definitions

Reward to Risk Ratio (RRR)

Average winning trade size divided by average losing trade size, with losses taken in absolute terms. Above 1.0 means the average winner is larger than the average loser. Above 1.5 is considered strong for systematic strategies.
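The definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the operation's actual code: the function name `reward_to_risk` and the sample profit-and-loss figures are hypothetical.

```python
from statistics import mean


def reward_to_risk(trade_pnls):
    """Average winning trade divided by average losing trade (absolute value).

    trade_pnls: per-trade profit or loss in any consistent unit (hypothetical input).
    """
    wins = [p for p in trade_pnls if p > 0]
    losses = [-p for p in trade_pnls if p < 0]  # flip sign so losses are positive magnitudes
    if not wins or not losses:
        raise ValueError("need at least one winning and one losing trade")
    return mean(wins) / mean(losses)


# Made-up trades for illustration: three winners, two losers.
example_pnls = [300.0, -100.0, 500.0, -200.0, 100.0]
rrr = reward_to_risk(example_pnls)
print(f"RRR = {rrr:.2f}")
```

Break-even trades are ignored in this sketch. Whether they should count toward either average is a design choice the definition leaves open.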

Gain to Pain Ratio (GPR)

Sum of all returns divided by the absolute sum of all losses. Above 1.0 is good; above 2.0 is exceptional. A GPR of 48, as recorded for NQ in 2019, reflects a strategy that was almost never in the market and, when it was, won by a very large margin.
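As a minimal sketch of that definition, assuming returns are supplied as a list of per-period fractions (the function name `gain_to_pain` and the sample data are hypothetical):

```python
def gain_to_pain(period_returns):
    """Sum of all returns divided by the absolute sum of the losing returns."""
    total = sum(period_returns)
    pain = abs(sum(r for r in period_returns if r < 0))
    if pain == 0:
        raise ValueError("no losing periods; the ratio is undefined")
    return total / pain


# Made-up per-period returns for illustration.
example_returns = [0.04, -0.01, 0.03, -0.02, 0.05]
gpr = gain_to_pain(example_returns)
print(f"GPR = {gpr:.2f}")
```

Under this definition, a GPR between 0 and 1 still indicates a net gain, but one smaller than the total losses absorbed along the way.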

Return Retracement Ratio

Total return divided by the maximum retracement (peak-to-trough drawdown) during the period. A ratio of 10.32 means the system generated 10.32 times its worst drawdown as return. Values above 3.0 are considered strong; 10.32 is exceptional.
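A minimal sketch of this calculation, following the definition as stated here and assuming an equity curve given as a list of account values (the function names and the sample curve are hypothetical):

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def return_retracement_ratio(equity):
    """Total return over the period divided by the maximum drawdown."""
    total_return = equity[-1] / equity[0] - 1.0
    drawdown = max_drawdown(equity)
    if drawdown == 0:
        raise ValueError("no drawdown in the period; the ratio is undefined")
    return total_return / drawdown


# Made-up equity curve: rises to 110, retraces to 99, then finishes at 130.
example_equity = [100.0, 110.0, 99.0, 120.0, 130.0]
ratio = return_retracement_ratio(example_equity)
print(f"Return Retracement Ratio = {ratio:.2f}")
```

The result depends on how finely the equity curve is sampled: a daily series can miss intraday drawdowns that a tick-level series would capture.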

Sharpe / Sortino Ratio

Risk-adjusted return metrics. Sharpe divides excess return by total volatility; Sortino divides by downside volatility only. Values above 1.0 are generally considered good. The Sortino being higher than the Sharpe indicates the system's volatility was predominantly upside.
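Both ratios can be sketched as follows. The assumptions here are not from the source: a zero risk-free rate and target return, 252 periods per year for annualization, sample standard deviation for the Sharpe ratio, and root-mean-square shortfall for the Sortino ratio. The function names and sample data are hypothetical.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Mean excess return over total volatility, annualized."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def sortino_ratio(returns, target=0.0, periods_per_year=252):
    """Mean excess return over downside deviation, annualized."""
    excess = [r - target for r in returns]
    # Only shortfalls below the target contribute to downside deviation.
    downside = math.sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    if downside == 0:
        raise ValueError("no returns below target; the ratio is undefined")
    return mean(excess) / downside * math.sqrt(periods_per_year)


# Made-up per-period returns for illustration.
example_series = [0.01, -0.005, 0.02, -0.01, 0.015]
sharpe = sharpe_ratio(example_series)
sortino = sortino_ratio(example_series)
print(f"Sharpe = {sharpe:.2f}, Sortino = {sortino:.2f}")
```

In this made-up series the gains are larger than the losses, so the downside deviation is smaller than the total volatility and the Sortino comes out above the Sharpe.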

Futures RRR and GPR by Instrument (2019 and 2020)

Year | Instrument | Reward to Risk Ratio | Gain to Pain Ratio
2019 | NQ (Nasdaq 100 futures) | 4.69 | 48.11
2019 | ES (S&P 500 futures) | 1.33 | 16.10
2020 | NQ (Nasdaq 100 futures) | 1.32 | 0.49
2020 | ES (S&P 500 futures) | 1.19 | 1.35

All figures represent live futures and futures options trading results. 2020 includes the March COVID crash. RRR remained above 1.0 for both ES and NQ in both years. The GPR values for 2019 (48.11 for NQ, 16.10 for ES) reflect the combination of high selectivity and exceptional trade quality in a high-momentum environment.

The Through-Line

The Same Framework, Applied to Two Domains

The connection between eight years of live derivatives trading and AI safety governance is not an analogy. It is a methodological continuity.

The CASS framework was designed to model complex adaptive systems: environments where large numbers of heterogeneous agents interact, produce emergent collective behavior, and generate risks that no individual component was designed to create. It was built first for conflict dynamics and national security applications. Then it was extended to financial markets. The underlying structure of the problem is identical in both cases: human agents and automated systems interacting under stress, producing nonlinear amplification dynamics that static evaluation frameworks cannot anticipate.

AI systems at scale are the third domain. The dynamics are the same. A capable AI system interacting with human organizations, under deployment pressure, in messy institutional contexts, produces emergent behavior that was not present in controlled testing. The robustness gap between nominal safety and real-world resilience is the financial stability problem restated at the level of machine intelligence. Emergent misalignment is cascading failure with longer time horizons and more diffuse consequence.

What the derivatives work produced is not just evidence of profitable trading. It is an eight-year empirical record of governing a complex adaptive system under conditions where the feedback was immediate, the consequence was financial, and the theory had to be right before capital was risked. That discipline (simulation first, validation before deployment, and continuous testing of the behavioral model against live results) is the discipline AI safety governance requires and that most AI deployments currently lack.

The system worked. The framework is the same. The application is larger.

Partnership

Ongoing Research and Partnership

This research program did not end in 2023. The theoretical and methodological work that produced the trading system continues to develop in the context of AI safety and sociotechnical systems governance. The agent-based modeling of human-automated interaction dynamics, the simulation-first validation methodology, and the empirical understanding of emergent failure in complex systems under stress are directly applicable to the governance challenges of capable AI deployed at scale.

The most consequential next step is extending this work into AI-relevant domains with the resources and institutional context to do it rigorously. That requires the right organizational partner.

What Partnership Could Look Like

Research collaboration applying the CASS methodology to AI deployment dynamics and emergent misalignment

Corporate-funded research program with joint publication rights

Embedded research leadership within an AI safety or governance function

Advisory engagement with access to deployment data and institutional context

The work speaks for itself. If it is relevant to what your organization is navigating, let's talk.
