Round 67

District AI Tutoring Rollout Strategy

A mid-sized school district has been piloting an AI-powered tutoring tool that supplements classroom instruction in math and reading. Pilot results show modest average score gains but uneven effects across student subgroups, some teacher pushback over reduced instructional autonomy, and rising costs for licenses and tech support. The district must decide how to proceed with a limited budget, a mandate to close achievement gaps, and pressure from parents both excited about personalized learning and worried about screen time and data privacy. The school board needs a clear policy before the next budget cycle, balancing equity, cost, teacher buy-in, and evidence quality.

Dashboard

Status

DECIDED

Winner

E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week.

Majority

E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week.

Duration

6264 ms

Topic

Options

E Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week.
1.99 of 3 PPV winner
C Pause expansion for a year to run a rigorous randomized evaluation before committing further funding.
1.01 of 3
A Expand the pilot only to schools with the lowest performance metrics, targeting resources at students furthest behind.
0.00 of 3
B Scale the AI tutoring tool district-wide immediately, reallocating funds from other enrichment programs to cover licensing costs.
0.00 of 3
D Shift funds instead toward hiring additional human tutors and teaching assistants, treating the AI tool as unproven and lower priority.
0.00 of 3
F Let individual schools opt in or out based on their own staff and parent votes, decentralizing the decision entirely.
0.00 of 3

Checks

Baselines

PPV no gold answer
Majority no gold answer
Comparison PPV matched majority
k samples 8
Agents 3
Parsed samples 24 / 24
Quality split agent disagreement

PPV

Top Influences

Gemini 3 Flash Preview · delegate 1.415
Claude Sonnet 5 · delegate 1.247
GPT 5.4 mini · delegate 1.106

Samples

Answer Distribution

Claude Sonnet 5 C: 7 E: 1
GPT 5.4 mini E: 5 A: 2 C: 1
Gemini 3 Flash Preview E: 8
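
The per-agent counts above can be pooled to check the Majority result. The following is a minimal sketch, assuming the majority baseline simply counts all 24 parsed samples with equal weight; the dashboard does not state its exact rule, and the variable names are illustrative only.

```python
from collections import Counter

# Per-agent answer counts, copied from the Answer Distribution panel.
answer_counts = {
    "Claude Sonnet 5": {"C": 7, "E": 1},
    "GPT 5.4 mini": {"E": 5, "A": 2, "C": 1},
    "Gemini 3 Flash Preview": {"E": 8},
}

# Assumption: every sample counts once, regardless of which agent produced it.
pooled = Counter()
for counts in answer_counts.values():
    pooled.update(counts)  # adds the counts from each agent's dict

total_samples = sum(pooled.values())
majority_option, majority_votes = pooled.most_common(1)[0]

print(f"{majority_option}: {majority_votes} of {total_samples} samples")
```

Under that assumption the pooled tally is E 14, C 8, A 2, which agrees with the reported Majority of E and with the 24 / 24 parsed samples.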

PPV

Influences

Label Role Score
Gemini 3 Flash Preview delegate 1.415
Claude Sonnet 5 delegate 1.247
GPT 5.4 mini delegate 1.106

Voting matrix

Constructed Delegation Matrix

To / From                 Claude Sonnet 5   GPT 5.4 mini   Gemini 3 Flash Preview
Claude Sonnet 5           0.000             0.247          0.000
GPT 5.4 mini              0.106             0.000          0.000
Gemini 3 Flash Preview    0.106             0.247          0.000
A                         0.000             0.000          0.000
B                         0.000             0.000          0.000
C                         0.789             0.000          0.000
D                         0.000             0.000          0.000
E                         0.000             0.507          1.000
F                         0.000             0.000          0.000
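
One way to read the matrix above: each column is a "From" agent that starts with one unit of voting weight, splits it between other agents (top three rows) and options (rows A to F), and any delegated weight is passed on again until it lands on an option. The sketch below follows that reading. It is an assumption about how the option totals arise, not the tool's documented method, and `resolve_delegations` is a hypothetical helper name.

```python
# Columns are "From" agents in the order:
# Claude Sonnet 5, GPT 5.4 mini, Gemini 3 Flash Preview.
options = ["A", "B", "C", "D", "E", "F"]

# delegation[to][frm]: share of agent `frm`'s weight handed to agent `to`.
delegation = [
    [0.000, 0.247, 0.000],  # to Claude Sonnet 5
    [0.106, 0.000, 0.000],  # to GPT 5.4 mini
    [0.106, 0.247, 0.000],  # to Gemini 3 Flash Preview
]

# direct[option][frm]: share of agent `frm`'s weight cast directly on an option.
direct = [
    [0.000, 0.000, 0.000],  # A
    [0.000, 0.000, 0.000],  # B
    [0.789, 0.000, 0.000],  # C
    [0.000, 0.000, 0.000],  # D
    [0.000, 0.507, 1.000],  # E
    [0.000, 0.000, 0.000],  # F
]


def resolve_delegations(delegation, direct, rounds=100):
    """Pass weight along delegations until (almost) all of it sits on options."""
    n = len(delegation)
    mass = [1.0] * n  # assumption: one unit of weight per agent
    totals = [0.0] * len(direct)
    for _ in range(rounds):
        # Weight each agent currently holds that goes straight to an option.
        for o, row in enumerate(direct):
            totals[o] += sum(row[j] * mass[j] for j in range(n))
        # Remaining weight moves to the agents it was delegated to.
        mass = [sum(delegation[i][j] * mass[j] for j in range(n)) for i in range(n)]
    return totals


totals = resolve_delegations(delegation, direct)
for name, value in zip(options, totals):
    print(f"{name}: {value:.2f} of 3")
```

With the displayed three-decimal entries this reading gives about 1.99 for E and 1.01 for C, in line with the Options panel. It does not attempt to reproduce the Influences scores.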

Embeddings

Sample Embeddings

24 samples
[Plot of sample embeddings, 1536 dims each, labeled by agent and chosen option. Legend: Claude Sonnet 5, GPT 5.4 mini, Gemini 3 Flash Preview.]

Samples

Agent Responses

Claude Sonnet 5 openai_compatible · anthropic/claude-sonnet-5
#0 C · Pause expansion for a year to run a rigorous randomized evaluation before committing further funding. 1536 dims

Given uneven subgroup effects, teacher pushback, and unresolved evidence quality, a full-scale rollout (B) or defunding (D) is premature, while decentralizing (F) risks equity gaps and inconsistent data privacy standards across schools. A one-year rigorous ran...

#1 C · Pause expansion for a year to run a rigorous randomized evaluation before committing further funding. 1536 dims

Given uneven subgroup effects, teacher pushback, and unresolved evidence quality, immediate district-wide scaling (B) or fully abandoning the tool (D) are premature; decentralized opt-in (F) risks inequitable access and inconsistent evidence gathering. A one-y...

#2 C · Pause expansion for a year to run a rigorous randomized evaluation before committing further funding. 1536 dims

Given uneven subgroup effects, teacher pushback, and cost/equity concerns, the district needs stronger evidence before scaling and should preserve teacher-led instruction rather than replacing it. A rigorous randomized evaluation (C) directly addresses "eviden...

#3 C · Pause expansion for a year to run a rigorous randomized evaluation before committing further funding. 1536 dims

Given uneven subgroup effects, teacher pushback, and unresolved cost/privacy concerns, immediate district-wide scaling (B) is premature and risky, while completely abandoning the tool (D) ignores its modest positive gains. A one-year pause for rigorous randomi...

#4 C · Pause expansion for a year to run a rigorous randomized evaluation before committing further funding. 1536 dims

Given uneven pilot effects, teacher pushback, and unresolved evidence quality, a district-wide rollout or full defunding would be premature; the responsible next step is to gather rigorous causal evidence before committing further budget. A one-year randomized...

#5 C · Pause expansion for a year to run a rigorous randomized evaluation before committing further funding. 1536 dims

Given uneven subgroup effects, teacher pushback, and unresolved cost/privacy concerns, immediately scaling district-wide (B) or fully deprioritizing the tool (D) both overreact to incomplete evidence. A one-size-fits-all mandate ignores the mandate to close ac...

#6 C · Pause expansion for a year to run a rigorous randomized evaluation before committing further funding. 1536 dims

Given uneven subgroup effects, teacher pushback, and unresolved data-privacy/cost concerns, immediate district-wide scaling (B) or fully abandoning the tool (D) both overreact to incomplete evidence. A rigorous year-long randomized evaluation (C) directly addr...

#7 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

Given uneven subgroup effects, teacher pushback, and unresolved cost/privacy concerns, immediate district-wide scaling (B) or fully deprioritizing the tool (D) are premature given the modest but real gains observed. A rigorous evaluation (C) is prudent but del...

GPT 5.4 mini openai_compatible · openai/gpt-5.4-mini
#0 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

The pilot shows only modest gains and uneven subgroup effects, so a full district-wide rollout would be premature, especially with budget pressure and teacher concerns. The best immediate policy is to protect teacher-led instruction while using the AI as a lim...

#1 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

The pilot shows some promise, but the effects are uneven and the costs, teacher concerns, and privacy worries are still unresolved. A full district-wide rollout is too risky, while a total pause or full rejection would ignore the evidence of modest gains; the ...

#2 A · Expand the pilot only to schools with the lowest performance metrics, targeting resources at students furthest behind. 1536 dims

The pilot shows only modest gains and uneven subgroup effects, so expanding immediately district-wide would be risky given the limited budget and teacher concerns. A targeted expansion to the lowest-performing schools best aligns with the mandate to close achi...

#3 C · Pause expansion for a year to run a rigorous randomized evaluation before committing further funding. 1536 dims

The pilot shows only modest gains and uneven subgroup effects, so a full district-wide rollout is too risky given the budget and parent concerns. A pause for a rigorous randomized evaluation would best address the evidence gap before committing more funds, whi...

#4 A · Expand the pilot only to schools with the lowest performance metrics, targeting resources at students furthest behind. 1536 dims

The pilot shows only modest benefits and uneven subgroup effects, so a full district-wide rollout is too risky given the budget and parent concerns. A targeted approach to the lowest-performing schools best aligns with the mandate to close achievement gaps whi...

#5 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

The pilot shows only modest gains and uneven subgroup effects, so scaling district-wide would be premature, but waiting a full year for a new evaluation may not meet the budget and equity pressures. A targeted hybrid approach can preserve teacher-led instructi...

#6 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

The pilot shows some promise, but the effects are uneven, the costs are rising, and teachers and parents have legitimate concerns about autonomy, screen time, and privacy. Since the district also has a mandate to close gaps and needs stronger evidence before l...

#7 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

The district needs a decision that respects the evidence so far: modest gains, uneven subgroup impacts, and real concerns about cost, autonomy, and privacy. A hybrid rollout preserves teacher-led instruction while using AI as a supplement, which addresses buy-...

Gemini 3 Flash Preview openai_compatible · google/gemini-3-flash-preview
#0 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

Option (E) addresses the core tensions by ensuring AI acts as a supplement rather than a replacement, thereby mitigating teacher pushback regarding instructional autonomy. By capping usage hours, it responds to parental concerns about screen time while maintai...

#1 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

Option (E) addresses the core tensions by ensuring AI remains a supplemental tool, thus preserving teacher autonomy and addressing concerns about excessive screen time. This hybrid approach balances the personalized learning benefits for students with the need...

#2 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

The district must address multiple conflicting factors, including uneven student outcomes, teacher concerns regarding autonomy, and parental anxiety over screen time. Option (E) provides a balanced approach by integrating the tool as a supplement rather than a...

#3 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

A hybrid model (E) addresses the mandate to close gaps while mitigating teacher pushback by preserving instructional autonomy and preventing AI from replacing human-led groups. It directly tackles parental concerns regarding screen time through a usage cap and...

#4 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

To select the optimal strategy, the district must balance the mandate to close achievement gaps and support personalized learning with concerns regarding costs, teacher autonomy, and excessive screen time. Option (E) addresses teacher pushback by framing the t...

#5 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

The district faces a need to balance student achievement gaps and teacher concerns with budget constraints and parent anxiety regarding screen time. Option (E) addresses the "uneven effects" and "teacher pushback" by ensuring the tool supports rather than repl...

#6 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

Option (E) addresses the core conflicts by ensuring AI remains a supplemental tool, protecting teacher autonomy and small-group instruction while limiting screen-time concerns through usage caps. This hybrid approach balances the personalized learning benefits...

#7 E · Adopt a hybrid model where AI tutoring supplements but does not replace any teacher-led small-group instruction, capping usage hours per week. 1536 dims

This strategy addresses the primary concerns regarding teacher autonomy and screen time by ensuring that the AI tool remains a supplement to, rather than a replacement for, human instruction. By capping usage hours, it limits screen-time exposure and mitigates...

Timings

Stages

Sampling 5210 ms
Embedding 1046 ms
Signals 7 ms
PPV 1 ms

Errors

Agent and Round Errors

No errors recorded.