Round 18

District AI Tutoring Rollout

A mid-sized school district (22 schools, 14,000 students, grades 3-12) has a one-time $2.4M grant to improve math outcomes over three years, after which ongoing funding must come from the regular budget (~$400K/yr sustainable). Test scores have stagnated and teacher shortages have left many classrooms with underqualified substitutes in math. The district must decide how to deploy the grant. Options differ in how directly they intervene in instruction, how much they depend on teacher buy-in, how equitably benefits reach struggling vs. advanced students, and how sustainable they are once grant funding ends. Board members, principals, teachers' union reps, and parent advocates all have different priorities: some want measurable short-term score gains, some want to protect teacher autonomy and jobs, some want equity for under-resourced schools, and some worry about over-reliance on unproven technology or vendor lock-in.
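The sustainability constraint is easier to weigh once the budget figures are put side by side. The sketch below assumes the grant is spent evenly across the three years, which the scenario does not state. The constant names are illustrative only.

```python
# Figures from the scenario. Even spending across the three grant years is an
# ASSUMPTION for illustration; the scenario does not specify a spending schedule.
GRANT_TOTAL = 2_400_000          # one-time grant, USD
GRANT_YEARS = 3
SUSTAINABLE_PER_YEAR = 400_000   # ongoing regular-budget funding, USD/yr
STUDENTS = 14_000

grant_per_year = GRANT_TOTAL / GRANT_YEARS          # grant-era annual spend
cliff = grant_per_year - SUSTAINABLE_PER_YEAR       # annual spend that must be cut afterwards
per_student_grant = grant_per_year / STUDENTS       # USD per student per year, grant era
per_student_sustainable = SUSTAINABLE_PER_YEAR / STUDENTS  # USD per student per year, after

print(f"Grant-funded spend per year: ${grant_per_year:,.0f}")
print(f"Post-grant cut required:     ${cliff:,.0f}/yr ({cliff / grant_per_year:.0%})")
print(f"Per student per year: ${per_student_grant:,.2f} during grant, "
      f"${per_student_sustainable:,.2f} after")
```

Under that even-spending assumption, recurring costs have to fall to about half their grant-era level once the grant ends, which is the tension several options are weighed against.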

Dashboard

Status

DECIDED

Winner

F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort.

Majority

F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort.

Duration

10051 ms

Topic

Options

F Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort.
2.00 of 3 · PPV winner
A Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide.
1.00 of 3
B Deploy an adaptive AI tutoring platform to all students district-wide, integrated into daily math class time, with heavy vendor support and data dashboards for teachers.
0.00 of 3
C Invest primarily in extending the school day with mandatory after-school math labs staffed by paraprofessionals, using only a small AI component for practice problems.
0.00 of 3
D Split the grant evenly between teacher professional development in math pedagogy and a lighter-touch, opt-in AI practice tool students can use for homework support.
0.00 of 3
E Use most of the grant to hire and retain certified math teachers and instructional coaches instead of technology, betting on stronger classroom instruction over software.
0.00 of 3

Checks

Baselines

PPV: no gold answer
Majority: no gold answer
Comparison: PPV matched majority
k samples: 8
Agents: 3
Parsed samples: 24 / 24
Quality: split (agent disagreement)

PPV

Top Influences

GPT 5.4 mini · delegate 1.300
Gemini 3 Flash Preview · delegate 1.211
Claude Sonnet 5 · delegate 1.000
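The influence scores above are consistent with a simple reading: each agent's own unit weight plus whatever other agents delegate to it, taken from the agent-to-agent rows of the Constructed Delegation Matrix in this report. This is inferred from the numbers, not a documented PPV formula; `DELEGATION` and `influence` are hypothetical names used only for this check.

```python
# Agent-to-agent delegation weights copied from the Constructed Delegation Matrix.
# DELEGATION[to][from]: weight the "from" agent passes to the "to" agent.
# Treating influence as 1 + incoming weight is an inference, not the documented method.
AGENTS = ["Claude Sonnet 5", "GPT 5.4 mini", "Gemini 3 Flash Preview"]
DELEGATION = {
    "Claude Sonnet 5":        {"Claude Sonnet 5": 0.0, "GPT 5.4 mini": 0.0,   "Gemini 3 Flash Preview": 0.0},
    "GPT 5.4 mini":           {"Claude Sonnet 5": 0.0, "GPT 5.4 mini": 0.0,   "Gemini 3 Flash Preview": 0.3},
    "Gemini 3 Flash Preview": {"Claude Sonnet 5": 0.0, "GPT 5.4 mini": 0.211, "Gemini 3 Flash Preview": 0.0},
}

def influence(agent: str) -> float:
    """Own unit weight plus everything other agents delegate to `agent`."""
    return 1.0 + sum(DELEGATION[agent].values())

# Print agents from most to least influential.
for name in sorted(AGENTS, key=influence, reverse=True):
    print(f"{name:<24} {influence(name):.3f}")
```

This reproduces the ordering and values shown (1.300, 1.211, 1.000); Claude Sonnet 5 scores exactly 1.000 because no other agent delegates to it.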

Samples

Answer Distribution

Claude Sonnet 5 A: 8
GPT 5.4 mini F: 7 A: 1
Gemini 3 Flash Preview F: 6 B: 2
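The dashboard does not say exactly how its "Majority" baseline is computed, so the sketch below tries two plausible readings of the answer distribution: one vote per agent (its most frequent answer) and a plain count over all 24 pooled samples. Both readings are assumptions; `DISTRIBUTION` is an illustrative name.

```python
from collections import Counter

# Per-agent answer counts from the Answer Distribution (8 samples per agent).
DISTRIBUTION = {
    "Claude Sonnet 5":        Counter({"A": 8}),
    "GPT 5.4 mini":           Counter({"F": 7, "A": 1}),
    "Gemini 3 Flash Preview": Counter({"F": 6, "B": 2}),
}

# Reading 1: each agent casts a single vote for its most frequent answer.
agent_votes = Counter(c.most_common(1)[0][0] for c in DISTRIBUTION.values())

# Reading 2: pool all samples across agents and count answers directly.
pooled = sum(DISTRIBUTION.values(), Counter())

print("Agent-level votes:", dict(agent_votes))
print("Pooled samples:   ", dict(pooled))
print("Winner (agent-level):", agent_votes.most_common(1)[0][0])
print("Winner (pooled):     ", pooled.most_common(1)[0][0])
```

Both readings select F (2 of 3 agents; 13 of 24 samples), which fits the Comparison check reporting that PPV matched majority.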

PPV

Voting matrix

Constructed Delegation Matrix

To / From                 Claude Sonnet 5  GPT 5.4 mini  Gemini 3 Flash Preview
Claude Sonnet 5           0.000            0.000         0.000
GPT 5.4 mini              0.000            0.000         0.300
Gemini 3 Flash Preview    0.000            0.211         0.000
A                         1.000            0.000         0.000
B                         0.000            0.000         0.000
C                         0.000            0.000         0.000
D                         0.000            0.000         0.000
E                         0.000            0.000         0.000
F                         0.000            0.789         0.700
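Each column of the matrix sums to 1, so it can be read as where each agent sends its unit of weight. The sketch below assumes (not confirmed by the dashboard) that PPV lets weight keep flowing along these edges until it lands on an option, as in an absorbing Markov chain. `OUT` and `propagate` are hypothetical names for illustration.

```python
# Columns of the Constructed Delegation Matrix: where each agent sends its weight.
# Modeling PPV as weight flowing along these edges until it reaches an option is
# an ASSUMPTION about the method, checked only against the reported option totals.
OUT = {
    "Claude Sonnet 5":        {"A": 1.0},
    "GPT 5.4 mini":           {"Gemini 3 Flash Preview": 0.211, "F": 0.789},
    "Gemini 3 Flash Preview": {"GPT 5.4 mini": 0.3, "F": 0.7},
}
OPTIONS = "ABCDEF"

def propagate(out, tol=1e-12, max_steps=10_000):
    """Push one unit of weight per agent through the delegation graph."""
    held = {agent: 1.0 for agent in out}        # weight still sitting on agents
    totals = {opt: 0.0 for opt in OPTIONS}      # weight absorbed by options
    for _ in range(max_steps):
        if sum(held.values()) < tol:            # everything has been absorbed
            break
        nxt = {agent: 0.0 for agent in out}
        for agent, w in held.items():
            for target, share in out[agent].items():
                if target in totals:
                    totals[target] += w * share  # weight lands on an option
                else:
                    nxt[target] += w * share     # weight passes to another agent
        held = nxt
    return totals

totals = propagate(OUT)
for opt in OPTIONS:
    print(f"{opt}: {totals[opt]:.2f} of {len(OUT)}")
```

Under this reading the totals match the Options panel (F 2.00, A 1.00, others 0.00). GPT 5.4 mini and Gemini 3 Flash Preview delegate only to each other or to F, so all of their weight eventually reaches F even though neither column sends its full weight there directly.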

Embeddings

Sample Embeddings

[Embedding plot: 24 samples, 1536 dims each, grouped by agent: Claude Sonnet 5, GPT 5.4 mini, Gemini 3 Flash Preview]

Samples

Agent Responses

Claude Sonnet 5 openai_compatible · anthropic/claude-sonnet-5
#0 A · Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide. 1536 dims

A phased pilot allows the district to rigorously test AI tutoring's effectiveness, teacher buy-in, and equity impacts on a small scale before committing the full $2.4M district-wide, avoiding the risk of vendor lock-in or wasted funds on an unproven approach. ...

#1 A · Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide. 1536 dims

A phased pilot (A) directly addresses the stakeholders' core concern—unproven technology and vendor lock-in—by generating rigorous evidence before committing the full $2.4M district-wide, while still leaving room to redirect funds toward equity, PD, or staffin...

#2 A · Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide. 1536 dims

A phased pilot (A) lets the district gather rigorous evidence on effectiveness, teacher buy-in, and equity impact before committing the full $2.4M, avoiding vendor lock-in risk and unsustainable post-grant costs. It balances competing stakeholder concerns—auto...

#3 A · Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide. 1536 dims

A phased pilot (A) directly addresses the core tension: it generates rigorous evidence on effectiveness and equity impact before committing the full $2.4M to an unproven district-wide rollout, avoiding vendor lock-in and sustainability risk once grant funds ex...

#4 A · Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide. 1536 dims

A phased pilot (A) directly addresses the district's core uncertainties—unproven technology, vendor lock-in, and sustainability after grant funding ends—by generating rigorous evidence before committing to district-wide scale. It also balances competing stakeh...

#5 A · Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide. 1536 dims

A phased pilot (Option A) best balances competing priorities by generating rigorous local evidence on efficacy, equity impact, and teacher buy-in before committing the full $2.4M to an unproven district-wide rollout that risks vendor lock-in and unsustainable ...

#6 A · Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide. 1536 dims

A phased two-year pilot (A) allows the district to gather rigorous evidence on the AI tutoring platform's effectiveness, equity impact, and teacher buy-in before committing the full one-time grant to a district-wide rollout that must later be sustained on a mu...

#7 A · Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide. 1536 dims

A phased pilot (A) directly addresses the tension between wanting measurable gains and avoiding over-reliance on unproven, expensive technology by generating rigorous evidence before committing the full $2.4M district-wide, which matters since only ~$400K/yr i...

GPT 5.4 mini openai_compatible · openai/gpt-5.4-mini
#0 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

The best choice is the option that most directly improves math instruction for the students who need it most, while also being realistic about limited sustainable funding after the grant ends. Option F targets the lowest-performing students in the highest-pove...

#1 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

The best choice is the option that maximizes likely math gains while fitting the district’s long-term budget and equity goals. A districtwide, heavily vendor-dependent rollout is risky, and a purely staffing-based plan may be hard to sustain after the grant en...

#2 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

The most balanced choice is the option that directly addresses the shortage of qualified math instruction, targets the students with the greatest need, and is more likely to be sustainable after the grant ends. Option (F) focuses resources on the lowest-perfor...

#3 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

The best choice is the option that most directly improves math instruction for the students who need it most, while also being realistic about limited ongoing funding and teacher shortages. Targeting the lowest-performing students in the highest-poverty school...

#4 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

The district needs a plan that addresses the immediate instructional gaps, uses the grant where it can most directly raise math achievement, and remains financially sustainable after three years. Targeting the lowest-performing students in the highest-poverty ...

#5 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

The district needs a plan that can produce measurable math gains quickly, address shortages, and still be sustainable after the grant ends. Option F targets the students most in need in high-poverty schools, pairing AI with human tutoring for stronger support ...

#6 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

The district needs a plan that can produce meaningful math gains, especially for students most harmed by shortages and stagnation, while also being feasible to sustain after the grant ends. Option F is the strongest balance: it targets the students and schools...

#7 A · Run a phased two-year pilot in a handful of volunteer schools with rigorous evaluation before deciding whether to scale AI tutoring district-wide. 1536 dims

The best option is the one that balances evidence, equity, scalability, and sustainability under a temporary grant. A phased pilot in volunteer schools allows the district to test whether AI tutoring actually raises math scores before committing the full budge...

Gemini 3 Flash Preview openai_compatible · google/gemini-3-flash-preview
#0 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

Option (F) balances immediate intervention for teacher shortages with long-term fiscal sustainability by focusing resources on the students with the greatest need. This targeted approach addresses equity concerns and allows the district to maintain a high-impa...

#1 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

Option (F) balances immediate intervention for teacher shortages with long-term financial sustainability by targeting the students with the greatest need rather than thinning resources district-wide. This targeted approach addresses equity concerns, remains wi...

#2 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

The district faces a combination of acute teacher shortages, a need for equitable outcomes, and a long-term sustainability gap between grant funding and the regular budget. Option (F) addresses these by concentrating high-impact resources on the students with ...

#3 B · Deploy an adaptive AI tutoring platform to all students district-wide, integrated into daily math class time, with heavy vendor support and data dashboards for teachers. 1536 dims

To solve this district-scale challenge, the district must balance fiscal sustainability, the urgency of teacher shortages, and the need for equitable math improvement. Option (B) leverages the grant to address immediate instructional gaps caused by underqualif...

#4 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

This decision requires balancing immediate student needs, financial sustainability, and the reality of teacher shortages. Option (F) addresses the district's equity goals by targeting resources toward low-performing students in high-poverty schools while combi...

#5 B · Deploy an adaptive AI tutoring platform to all students district-wide, integrated into daily math class time, with heavy vendor support and data dashboards for teachers. 1536 dims

The district faces a combination of a one-time funding surge ($2.4M), a lower long-term budget ($400k), and chronic teacher shortages. Option (B) provides the most immediate, scalable solution to address the lack of qualified teachers across 22 schools while b...

#6 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

Option (F) balances immediate intervention for teacher-depleted schools with the fiscal reality of the $400K/yr long-term budget by narrowing the scope. It addresses equity by prioritizing high-poverty students and mitigates "unproven technology" concerns by p...

#7 F · Target the AI tutoring platform only at the lowest-performing quartile of students in the highest-poverty schools, paired with small-group human tutoring for the rest of that cohort. 1536 dims

Option (F) balances immediate impact on math outcomes with long-term fiscal sustainability by concentrating resources on students with the highest need rather than over-extending the $400K annual budget. This targeted approach addresses equity concerns, mitiga...

Timings

Stages

Sampling: 9384 ms
Embedding: 655 ms
Signals: 11 ms
PPV: 1 ms
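The four stage timings add up exactly to the 10051 ms Duration shown at the top of the dashboard, which suggests (though the dashboard does not state it) that Duration is simply their sum. A quick check, with illustrative constant names:

```python
# Stage timings from the Timings panel, in milliseconds.
STAGES_MS = {"Sampling": 9384, "Embedding": 655, "Signals": 11, "PPV": 1}
REPORTED_DURATION_MS = 10051  # "Duration" field on the dashboard

total = sum(STAGES_MS.values())
print(f"Sum of stages: {total} ms (reported: {REPORTED_DURATION_MS} ms)")
for stage, ms in STAGES_MS.items():
    # Share of the total run time spent in each stage.
    print(f"  {stage:<10} {ms:>5} ms  {ms / total:6.1%}")
```

Sampling the 24 model responses accounts for over 90% of the run; the PPV aggregation itself is negligible by comparison.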

Errors

Agent and Round Errors

No errors recorded.