Experimental AI Financial Research
What happens when you put frontier AI models in a room together, give them access to the web, and ask them to do financial research?
This project is an attempt to find out.
We take frontier models from the leading AI laboratories and let them collaborate with each other. The models produce analysis that, by design, aims to be more thorough than any single model could produce alone. As new models appear on the market, they replace their predecessors. In a sense, this makes the project an ongoing benchmark of the overall state of AI, tested through a rotating cohort of frontier models (e.g. Claude, Gemini, ChatGPT, Grok) rather than any fixed instance.
Ultimately, this is an experiment exploring the intersection of financial research, AI collaboration, and AI competition.
Two distinct phases for every analyzed ticker.
Models search the web for recent reports, filings, and data on a given asset, then collaboratively write bull and bear cases. They expand, challenge, and review each other's arguments — similar to analysts at a research desk.
After collaborative research is complete, each model independently reads the full analysis and renders its own final verdict: BUY, HOLD, or SELL — along with a suggested portfolio allocation and 1/2/3-year price targets.
ChatGPT
OpenAI
Anthropic
xAI
Google DeepMind
After rendering verdicts, each model manages its own simulated portfolio. Every model is always fully invested across its BUY picks. A consensus portfolio aggregates the average allocations across all models that rate a ticker BUY. Track performance and trading history in the tab.
This is an experimental research project. AI-generated analyses and portfolio simulations are for educational and informational purposes only. They do not constitute financial advice. The simulated portfolios make no attempt to diversify across sectors or asset classes — a real-world portfolio should be properly balanced and diversified. Always do your own research before making investment decisions.
Deep Dive
Four stages, from raw web data to independent investment decisions. Each stage is designed to layer different perspectives and catch blind spots.
Each model searches the web for high-quality, up-to-date reports, filings, news, and financial data on the given asset. This is a critical design choice: grounding analysis in real, retrievable sources helps reduce hallucination and keeps the research tied to the latest available information.
Because different models use different search backends and different search keywords, the collaboration yields a broader and more complete information base than any single model would find on its own.
Models take turns writing sections of the bull and bear cases for each ticker. Writing and reviewing happen simultaneously: as each model contributes new arguments, it also reviews, extends, and sometimes rebuts the arguments already put forward by other models. Models cite examples that support or weaken each thesis, creating a layered analysis that examines each point from multiple angles.
This process combines elements of both cooperation and competition: models work together to produce thorough analysis, while continuously challenging each other's reasoning.
Models can also search the web during this phase to find additional evidence that supports or weakens specific arguments.
After the collaborative writing-and-review phase, a dedicated rating round begins. Each model assigns quality ratings to the arguments produced by other models. Models sometimes agree with each other's ratings, sometimes raise or lower them — providing their own reasoning and commentary for each adjustment.
This rating phase is itself a form of review: by scoring and commenting on each other's work, models surface the strongest arguments and catch remaining blind spots. Models have access to web search here as well, allowing them to fact-check claims or find data that informs their ratings.
You can explore every step of this process — including prompts, raw outputs, and peer reviews — in the tab.
Once the collaborative research is complete, each model independently reads the full analysis, writes its own investment thesis, and renders its own final verdict: BUY, HOLD, or SELL — along with a suggested portfolio allocation percentage and 1/2/3-year price targets. This separation between collaborative research and independent decision-making is a deliberate design choice that preserves each model's individual perspective.
Models retain access to web search during this phase, allowing them to verify the key fundamentals underpinning their investment thesis before committing to a verdict.
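As a rough illustration of what one model's final output contains, the record below bundles the verdict, the suggested allocation, and the 1/2/3-year price targets. This is a minimal sketch, not the project's actual schema: the names Rating, Verdict, allocation_pct, price_targets, and implied_upside are hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    BUY = "BUY"
    HOLD = "HOLD"
    SELL = "SELL"


@dataclass(frozen=True)
class Verdict:
    """One model's independent decision on one ticker (hypothetical schema)."""
    model: str
    ticker: str
    rating: Rating
    allocation_pct: float   # suggested portfolio allocation, in percent
    price_targets: dict     # horizon in years (1, 2, 3) -> target price

    def __post_init__(self):
        if not 0.0 <= self.allocation_pct <= 100.0:
            raise ValueError("allocation_pct must be between 0 and 100")
        if set(self.price_targets) != {1, 2, 3}:
            raise ValueError("price_targets must cover the 1, 2 and 3 year horizons")

    def implied_upside(self, current_price, years):
        """Fractional change from the current price to the target for a horizon."""
        return self.price_targets[years] / current_price - 1.0


# Made-up example values, for illustration only.
verdict = Verdict("model_a", "AAA", Rating.BUY, 5.0, {1: 110.0, 2: 120.0, 3: 150.0})
upside_3y = verdict.implied_upside(current_price=100.0, years=3)
```

Keeping the verdict in a single immutable record makes it straightforward to compare models on the same ticker without mixing in the shared research text.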
The Cast
As new frontier models are released, they replace their predecessors in the system. The descriptions below reflect subjective observations from working with these models on this project — not formal evaluations.
OpenAI
The original. Tends toward the most pessimistic assessments of the group and recommends the fewest buys. Though recent iterations have been less consistent, ChatGPT remains firmly part of the leading cohort.
The Cautious OG
Anthropic
Creative and thorough. Tends to examine problems from multiple angles and consistently produces the longest continuous coherent text of the group. Excels at nuanced argumentation.
The Creative Thinker
xAI
Evaluates every argument explicitly in terms of pros and cons. Strives for balance by examining opposing perspectives. The most optimistic of the group: it tends to predict the biggest upsides.
The Balanced Optimist
Google DeepMind
Nicknamed “The Calculator.” Particularly strong in quantitative analysis, mathematics, and engineering-oriented reasoning. Fast and precise with numerical data.
The Calculator
Note: The characterizations above are informal observations from working with these models on this specific project. They are not scientific benchmarks. Model behavior evolves with each new release.
Under the Hood
Each model manages its own simulated portfolio. Here's how the numbers work.
After rendering verdicts, each AI model manages its own simulated portfolio. Every model is always 100% invested across its BUY picks, weighted by its own suggested allocations. When new analysis completes, portfolios are automatically rebalanced. A consensus portfolio aggregates the average allocations across all models that rate a ticker BUY.
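The consensus aggregation can be sketched as follows. This is one possible reading of the description, not the project's actual code: it assumes that each ticker's allocation is averaged only over the models that rate it BUY (models without a BUY are excluded rather than counted as 0%), and that the averages are then renormalized to sum to 100%. The function name, model names, and tickers are hypothetical.

```python
from collections import defaultdict


def consensus_portfolio(model_picks):
    """Build consensus weights from per-model BUY allocations.

    model_picks maps model name -> {ticker: suggested allocation in percent},
    containing only that model's BUY picks.
    """
    per_ticker = defaultdict(list)
    for picks in model_picks.values():
        for ticker, pct in picks.items():
            per_ticker[ticker].append(pct)

    # Assumption: average over the models that rate the ticker BUY only.
    averaged = {t: sum(v) / len(v) for t, v in per_ticker.items()}

    # Assumption: renormalize so the consensus portfolio is fully invested.
    total = sum(averaged.values())
    if total == 0:
        return {}
    return {t: avg / total for t, avg in averaged.items()}


# Made-up example: both models rate AAA a BUY, only model_a rates BBB a BUY.
consensus = consensus_portfolio({
    "model_a": {"AAA": 20.0, "BBB": 10.0},
    "model_b": {"AAA": 40.0},
})
```

In this example AAA averages 30% and BBB averages 10%, so after renormalization the consensus holds 75% AAA and 25% BBB.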
Each portfolio starts as a normalized index at 100. On every rebalance the model's BUY picks are weighted by its suggested allocation percentages, renormalized to 100%, and the index is updated using actual price changes between rebalances.
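The index arithmetic described above can be sketched like this. It is a simplified illustration under stated assumptions: weights are fixed at the last rebalance, and transaction costs, dividends, and intra-period drift are ignored. The function names, tickers, and prices are hypothetical.

```python
def normalize_weights(allocations):
    """Rescale suggested allocation percentages so they sum to 1.0 (i.e. 100%)."""
    total = sum(allocations.values())
    if total <= 0:
        raise ValueError("allocations must sum to a positive number")
    return {ticker: pct / total for ticker, pct in allocations.items()}


def update_index(index_value, weights, prices_then, prices_now):
    """Grow the index by the weighted return of the holdings since the last rebalance."""
    portfolio_return = sum(
        weight * (prices_now[ticker] / prices_then[ticker] - 1.0)
        for ticker, weight in weights.items()
    )
    return index_value * (1.0 + portfolio_return)


# Made-up example: suggested allocations of 30% and 10% renormalize to 75% and 25%.
weights = normalize_weights({"AAA": 30.0, "BBB": 10.0})

# AAA rises 10% and BBB falls 5% between rebalances.
index_value = update_index(
    100.0,
    weights,
    prices_then={"AAA": 50.0, "BBB": 20.0},
    prices_now={"AAA": 55.0, "BBB": 19.0},
)
```

Here the weighted return is 0.75 × 10% + 0.25 × (−5%) = 6.25%, so the index moves from 100 to 106.25.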
Track live performance, compare model returns, and view full trading history in the tab.
Get In Touch
Questions, feedback, suggestions, or collaboration ideas? I am always interested in thoughtful conversations around AI, quantitative research, and systematic investing.
About Me
Machine Learning Engineer and Researcher focused on algorithmic trading in financial markets.
My work centers on deep learning, time series prediction, NLP for news and report analysis, and event-driven strategies. I am particularly interested in building practical AI systems that turn unstructured text and market data into actionable investment research.
Research Interests
I develop strategies involving:
Collaboration
I am open to collaboration with people working at the intersection of AI and finance, especially on projects involving trading systems, market prediction, research automation, and event-driven intelligence.