
Measuring Intelligence Summit at PyTorch Conference

October 1, 2025


The Measuring Intelligence Summit on October 21 in San Francisco, co-located with PyTorch Conference 2025, brings together experts in AI evaluation to discuss the critical question: how do we effectively measure intelligence in both foundation models and agentic systems? 

As AI systems become more capable and more widely deployed, evaluation methods must evolve just as rapidly. This half-day summit covers key topics such as evaluating reasoning models, superintelligence, and the evolution of AI benchmarks. Attendees will gain insight into state-of-the-art evaluation methods, explore the challenges of assessing AI capabilities, and join discussions, led by the experts driving this field, that will shape the future of AI evaluation.

Top 3 Reasons to Attend

  1. Engage with leading voices in AI evaluation – Hear directly from researchers at OpenAI, Stanford, Meta, and more as they share insights into the latest methods for evaluating reasoning, intelligence, and agentic behavior in advanced AI systems.
  2. Be part of shaping the future of benchmarks – From debates on whether benchmarks truly capture intelligence to discussions on practical, real-world evaluation, you’ll gain a front-row seat to conversations that will guide how our community measures progress in AI.
  3. Connect with the leaders driving innovation – The summit offers a unique opportunity to meet others working at the intersection of research and application, building networks that extend beyond the conference and into the broader AI ecosystem.

Program Highlights

Keynotes

  • Framing the Frontier of Machine Intelligence – Joe Spisak, Meta
  • A conversation on the state of the art in reasoning, planning, and inference-time scaling, and the new methods for measuring intelligence in this regime – Noam Brown, OpenAI, in conversation with Joe Spisak, Meta

Sessions

  • Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers – Jon Saad-Falcon, Stanford
  • Holistic Evaluation of Language Models (HELM) – Yifan Mai, Stanford University
  • Scaling Agentic Intelligence from Pre-Training to RL – Aakanksha Chowdhery, Reflection AI & Stanford University
  • LMArena: The Reliability Standard for AI – Anastasios Angelopoulos, LMArena

Panels

Are We Measuring Intelligence or Just Benchmarks?

  • Sara Hooker
  • Vivienne Zhang, NVIDIA
  • Baber Abbasi, EleutherAI
  • Nathan Habib, Hugging Face
  • Carlos Jimenez, Princeton University / SWE-bench

Beyond the Leaderboard: Practical Intelligence in the Wild

  • Shishir Patil, Meta
  • Haifeng Xu, ProphetArena / UChicago
  • Tatiana Shavrina, Meta
  • Lisa Dunlap, UC Berkeley / LMSYS
  • Rebecca Qian, Patronus AI

Register by adding the Measuring Intelligence Summit to your PyTorch Conference registration.