Round 34

Assessments or projects?

A secondary school wants to redesign how it evaluates students in a year-long course. The goal is to improve learning, fairness, and motivation while keeping teacher workload manageable and preserving comparability across classrooms. Constraints include a diverse student body, limited grading time, pressure to prepare students for standardized exams, and concerns about grade inflation or bias. The school can change the assessment mix, but must still produce end-of-term grades that parents, administrators, and universities can interpret. The main tradeoff is between more authentic demonstrations of learning and simpler, more consistent measurement.

Dashboard

Status

DECIDED

Winner

A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility.

Majority

A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility.

Duration

5524 ms

Topic

Options

A Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility.
3.00 of 3 PPV winner
B Shift toward portfolio-based assessment with periodic teacher conferences and rubrics emphasizing growth, revision, and reflection.
0.00 of 3
C Keep mostly exams and quizzes, but add a small project component to capture applied understanding and reduce single-test pressure.
0.00 of 3
D Replace most summative tests with interdisciplinary performance tasks graded by shared rubrics across subjects and moderation meetings.
0.00 of 3
E Adopt standards-based grading with separate marks for mastery, habits of work, and late work policies, then convert to term grades at the end.
0.00 of 3

Checks

Baselines

PPV no gold answer
Majority no gold answer
Comparison PPV matched majority
k samples 8
Agents 3
Parsed samples 24 / 24
Quality near unanimous split

PPV

Top Influences

Claude Sonnet 5 · delegate 1.106
Gemini 3 Flash Preview · delegate 1.106
GPT 5.4 mini · delegate 1.000

Samples

Answer Distribution

Claude Sonnet 5 A: 8
GPT 5.4 mini A: 7 C: 1
Gemini 3 Flash Preview A: 8
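
The dashboard does not state how the Majority baseline is computed. A minimal sketch, assuming it is a plain plurality over all parsed samples pooled across agents (the function name `pooled_majority` is hypothetical), reproduces the reported winner from the counts above:

```python
from collections import Counter

# Per-agent answer counts copied from the Answer Distribution panel.
answer_counts = {
    "Claude Sonnet 5": {"A": 8},
    "GPT 5.4 mini": {"A": 7, "C": 1},
    "Gemini 3 Flash Preview": {"A": 8},
}

def pooled_majority(counts_by_agent):
    """ASSUMPTION: majority = most frequent answer over all pooled samples."""
    pooled = Counter()
    for counts in counts_by_agent.values():
        pooled.update(counts)  # adds each agent's counts into the pool
    winner, votes = pooled.most_common(1)[0]
    return winner, votes, sum(pooled.values())

winner, votes, total = pooled_majority(answer_counts)
print(f"{winner}: {votes} of {total} samples")
```

Under this assumption A takes 23 of the 24 parsed samples, with the single C sample from GPT 5.4 mini as the only dissent.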

PPV

Influences

Label                    Role      Score
Claude Sonnet 5          delegate  1.106
Gemini 3 Flash Preview   delegate  1.106
GPT 5.4 mini             delegate  1.000

Voting matrix

Constructed Delegation Matrix

To / From                Claude Sonnet 5   GPT 5.4 mini   Gemini 3 Flash Preview
Claude Sonnet 5          0.000             0.106          0.000
GPT 5.4 mini             0.000             0.000          0.000
Gemini 3 Flash Preview   0.000             0.106          0.000
A                        1.000             0.789          1.000
B                        0.000             0.000          0.000
C                        0.000             0.000          0.000
D                        0.000             0.000          0.000
E                        0.000             0.000          0.000
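
The report does not document how PPV turns this matrix into influence scores and option totals. The sketch below is one reading that happens to match the displayed numbers, not the tool's actual method. It assumes that rows are recipients and columns are sources, that an agent's influence is its own unit vote plus the shares delegated to it, and that delegation is resolved in a single hop. The names `influence` and `option_totals` are hypothetical.

```python
# Values copied from the Constructed Delegation Matrix (rows = To, columns = From).
AGENTS = ["Claude Sonnet 5", "GPT 5.4 mini", "Gemini 3 Flash Preview"]

# delegated[to_agent][from_agent]: share of from_agent's vote handed to to_agent.
delegated = {
    "Claude Sonnet 5": {"GPT 5.4 mini": 0.106},
    "GPT 5.4 mini": {},
    "Gemini 3 Flash Preview": {"GPT 5.4 mini": 0.106},
}

# direct[option][from_agent]: share of from_agent's vote cast directly on option.
direct = {
    "A": {"Claude Sonnet 5": 1.000, "GPT 5.4 mini": 0.789,
          "Gemini 3 Flash Preview": 1.000},
    "B": {}, "C": {}, "D": {}, "E": {},
}

def influence(agent):
    # ASSUMPTION: influence = own unit vote + incoming delegated shares.
    return 1.0 + sum(delegated[agent].values())

def option_totals():
    # ASSUMPTION: single-hop resolution; each agent applies its influence
    # to options in proportion to its direct shares.
    return {
        option: sum(influence(a) * shares.get(a, 0.0) for a in AGENTS)
        for option, shares in direct.items()
    }

for agent in AGENTS:
    print(agent, round(influence(agent), 3))
print({option: round(total, 3) for option, total in option_totals().items()})
```

With these assumptions the two agents that receive 0.106 from GPT 5.4 mini land on 1.106, GPT 5.4 mini stays at 1.000, and A comes out at about 3.001. The small excess over the reported 3.00 of 3 is consistent with the matrix entries being rounded to three decimals.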

Embeddings

Sample Embeddings

24 samples
Claude Sonnet 5: samples #0 to #7, all A (1536 dims each)
GPT 5.4 mini: samples #0 to #2 and #4 to #7 A, sample #3 C (1536 dims each)
Gemini 3 Flash Preview: samples #0 to #7, all A (1536 dims each)

Samples

Agent Responses

Claude Sonnet 5 openai_compatible · anthropic/claude-sonnet-5
#0 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The hybrid model directly addresses the stated constraints—manageable teacher workload, comparability across classrooms, exam preparation pressure, and fairness—by combining frequent low-stakes checks with major exams and a capped project weight. This balances...

#1 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The hybrid model directly addresses the stated constraints—manageable teacher workload, comparability across classrooms, and exam preparation—while still incorporating authentic project-based learning through a capped weight. It balances rigor and flexibility ...

#2 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The hybrid model directly addresses the stated constraints—manageable teacher workload, comparability across classrooms, and standardized exam prep—while still incorporating projects for authentic learning. It balances the tradeoff explicitly named in the prom...

#3 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

A hybrid model best addresses the school's competing needs: frequent low-stakes checks improve learning and motivation, one or two major exams preserve comparability and exam-prep alignment, and a capped project weight allows authentic demonstration without ov...

#4 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The hybrid model directly addresses the stated constraints: it preserves comparability across classrooms through standardized exams while still allowing authentic project-based learning under a capped weight, keeping grading time manageable. It balances the tr...

#5 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The hybrid model directly addresses the stated constraints—manageable workload, comparability, exam preparation—while still incorporating authentic project work in a capped, controlled way. It avoids the heavier infrastructure (moderation meetings, portfolio c...

#6 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The hybrid model balances the constraints most directly: frequent low-stakes checks support formative learning and fairness, major exams preserve comparability for administrators and universities, and a capped project weight allows authentic demonstration of l...

#7 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The hybrid approach directly addresses the stated priorities—manageable workload, cross-classroom comparability, and exam preparation—while still incorporating authentic project work, unlike options that overhaul the system (B, D, E) or only marginally adjust ...

GPT 5.4 mini openai_compatible · openai/gpt-5.4-mini
#0 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The school needs a system that improves learning and motivation without losing comparability or overloading teachers, so a balanced mix is best. Option (A) keeps frequent checks for feedback, preserves major exams for consistency and exam prep, and limits proj...

#1 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

A good solution needs to balance authentic learning with comparability, while also keeping workload and bias manageable. Option (A) does this best by combining frequent checks, a few major exams, and a limited project weight, which preserves consistency for te...

#2 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The best choice is the one that balances authentic learning with comparability and manageable workload. A hybrid system with frequent low-stakes checks, a few major exams, and a capped project weight preserves consistent measurement for grades and exams while ...

#3 C · Keep mostly exams and quizzes, but add a small project component to capture applied understanding and reduce single-test pressure. 1536 dims

The school needs a solution that keeps grading comparable and manageable while still improving learning and reducing one-test pressure. A mostly exam-based system with a modest project component does that best: it preserves consistency and exam preparation, ad...

#4 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The school needs an approach that improves learning and motivation without sacrificing comparability, workload, or interpretability of grades. A hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight keeps measuremen...

#5 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The school needs a model that improves learning and motivation without making grading too subjective or time-consuming, while still allowing comparable end-of-term grades across classrooms. A hybrid system with frequent low-stakes checks, one or two major exam...

#6 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

A hybrid system best fits the competing goals: it preserves comparability and keeps grading manageable with a few major exams and frequent low-stakes checks, while still allowing some authentic project work. It also reduces single-test pressure and helps limit...

#7 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The best choice is the option that balances validity, fairness, and workload while still keeping grades comparable across classrooms. A hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight provides multiple ways to...

Gemini 3 Flash Preview openai_compatible · google/gemini-3-flash-preview
#0 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

This situation requires balancing authentic learning with the practical constraints of teacher workload, standardized exam preparation, and grading consistency. Option (A) provides a comprehensive middle ground by utilizing low-stakes checks for continuous imp...

#1 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

Option (A) addresses the multifaceted goals by balancing the efficiency and rigor of exams with the authenticity of a project, ensuring a manageable workload for teachers. The frequent low-stakes checks promote learning and motivation, while the capped project...

#2 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

A hybrid system addresses all constraints by combining low-stakes checks to monitor progress, major exams to satisfy standardized testing requirements, and capped projects to ensure authentic learning without inflating grades. This approach balances the need f...

#3 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

Option (A) best addresses the constraints by balancing "authentic demonstrations" (projects) with "consistent measurement" (exams) while keeping the grading workload manageable. It preserves comparability and exam readiness through standardized tests while usi...

#4 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

This problem requires balancing authentic learning with the practicalities of standardized testing, manageable grading, and comparability. Option (A) provides the most balanced approach by combining frequent low-stakes feedback for growth, major exams for stan...

#5 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

Option (A) addresses all stated constraints by balancing standardized exam preparation with "authentic" projects, while capping project weight to ensure teacher workload remains manageable. The use of frequent low-stakes checks improves learning and motivation...

#6 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

The school must balance the need for authentic learning with the constraints of limited grading time and the necessity of preparing students for standardized exams. Option (A) provides a comprehensive solution by combining frequent low-stakes checks for learni...

#7 A · Use a hybrid system with frequent low-stakes checks, one or two major exams, and a capped project weight to balance rigor and flexibility. 1536 dims

To address the school's diverse goals, option (A) provides the most balanced approach by combining frequent low-stakes checks for ongoing feedback with traditional exams for standardized comparability. This hybrid model keeps teacher workload manageable throug...

Timings

Stages

Sampling 5273 ms
Embedding 239 ms
Signals 11 ms
PPV 1 ms
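
As a quick consistency check, the stage timings can be summed and compared with the Duration shown on the dashboard. This is a small sketch; the variable names are illustrative, and it assumes the stages run sequentially with no untracked overhead.

```python
# Stage timings in milliseconds, copied from the Stages panel.
stage_ms = {"Sampling": 5273, "Embedding": 239, "Signals": 11, "PPV": 1}
reported_duration_ms = 5524  # Duration from the dashboard header

total_ms = sum(stage_ms.values())
slowest_stage = max(stage_ms, key=stage_ms.get)

print(total_ms, total_ms == reported_duration_ms)
print(slowest_stage, f"{stage_ms[slowest_stage] / total_ms:.1%}")
```

The stages add up exactly to the reported duration, and sampling accounts for nearly all of it.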

Errors

Agent and Round Errors

No errors recorded.