Round 49

Choose an Outlier Detection Approach

A data analytics team needs to detect abnormal customer transactions in a large, fast-moving dataset. The goal is to reduce fraud losses without overwhelming investigators with false positives. The data include transaction amount, merchant category, device signals, geography, time patterns, and a small number of confirmed fraud labels. Constraints: labels are sparse and delayed, patterns change over time, explanations are required for each alert, and the system must run daily with moderate compute. Tradeoffs include precision versus recall, model interpretability versus adaptability, and whether to optimize for immediate operational load or broader fraud coverage.

Dashboard

Status

DECIDED

Winner

C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability.

Majority

C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability.

Duration

6355 ms

Topic

Options

C Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability.
3.00 of 3 PPV winner
A Deploy an unsupervised anomaly detection pipeline that flags deviations from peer behavior and recent baselines, minimizing dependence on scarce labels.
0.00 of 3
B Optimize the detection threshold for very high precision, accepting that many fraud cases will be missed to keep investigator workload tightly controlled.
0.00 of 3
D Train a supervised gradient-boosted model on historical labels, focusing on predictive accuracy and ranking suspicious transactions for investigators.
0.00 of 3
E Use a transparent rule-based scorecard built from domain heuristics and a few calibrated thresholds, prioritizing explainability and easy operations.
0.00 of 3
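
As a rough illustration of what option C describes, the sketch below routes each transaction through deterministic rules first and falls back to a risk score only when no rule fires. Everything in it is an assumption made for illustration: the `Transaction` fields, the two rule names and their thresholds, the `toy_risk_score` function (a hand-weighted stand-in for a trained model, not a real one), and the 0.6 alert threshold. None of these come from the round itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Transaction:
    amount: float
    merchant_category: str
    country: str
    home_country: str
    hour: int  # local hour of day, 0-23


@dataclass
class Alert:
    flagged: bool
    source: str  # "rule", "model", or "none"
    reasons: List[str] = field(default_factory=list)
    score: float = 0.0


# Hypothetical rules for "known fraud patterns": (name, predicate).
RULES = [
    ("large_amount_abroad",
     lambda t: t.amount > 5000 and t.country != t.home_country),
    ("night_gift_card",
     lambda t: t.merchant_category == "gift_cards" and t.hour < 5),
]


def toy_risk_score(t: Transaction) -> float:
    """Hand-weighted stand-in for a trained model's score in [0, 1]."""
    score = min(t.amount / 10000.0, 1.0) * 0.5
    score += 0.3 if t.country != t.home_country else 0.0
    score += 0.2 if t.hour < 5 else 0.0
    return score


def evaluate(t: Transaction,
             score_fn: Callable[[Transaction], float] = toy_risk_score,
             threshold: float = 0.6) -> Alert:
    # Rules run first, so known patterns always carry a named reason.
    hits = [name for name, predicate in RULES if predicate(t)]
    if hits:
        return Alert(True, "rule", hits, 1.0)
    # Ambiguous cases fall through to the score.
    s = score_fn(t)
    if s >= threshold:
        return Alert(True, "model", [f"risk score {s:.2f} >= {threshold:.2f}"], s)
    return Alert(False, "none", [], s)


if __name__ == "__main__":
    print(evaluate(Transaction(6000, "electronics", "FR", "US", 14)))
    print(evaluate(Transaction(4000, "electronics", "FR", "US", 3)))
    print(evaluate(Transaction(20, "grocery", "US", "US", 12)))
```

Each alert records which path produced it and why, which is one way to meet the per-alert explanation constraint. Raising `threshold` trades recall for a lighter investigator workload.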

Checks

Baselines

PPV no gold answer
Majority no gold answer
Comparison PPV matched majority
k samples 8
Agents 3
Parsed samples 24 / 24
Quality unanimous

PPV

Top Influences

Claude Sonnet 5 · delegate 1.000
GPT 5.4 mini · delegate 1.000
Gemini 3 Flash Preview · delegate 1.000

Samples

Answer Distribution

Claude Sonnet 5 C: 8
GPT 5.4 mini C: 8
Gemini 3 Flash Preview C: 8
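
The majority baseline for this distribution can be reproduced with a simple pooled count. This is a minimal sketch that assumes the majority answer is the most frequent label across all parsed samples; the dashboard does not state its tally method, and the function name `pooled_majority` is hypothetical.

```python
from collections import Counter

# Per-agent samples as reported in the answer distribution above.
samples = {
    "Claude Sonnet 5": ["C"] * 8,
    "GPT 5.4 mini": ["C"] * 8,
    "Gemini 3 Flash Preview": ["C"] * 8,
}


def pooled_majority(samples_by_agent):
    """Return (winning label, its vote count, total samples) over all agents."""
    counts = Counter(
        label for labels in samples_by_agent.values() for label in labels
    )
    winner, votes = counts.most_common(1)[0]
    return winner, votes, sum(counts.values())


if __name__ == "__main__":
    winner, votes, total = pooled_majority(samples)
    print(f"{winner}: {votes} / {total}")
```

With every sample on C, any reasonable tie-breaking or weighting scheme gives the same winner, which is consistent with the "unanimous" quality flag.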

PPV

Influences

Label Role Score
Claude Sonnet 5 delegate 1.000
GPT 5.4 mini delegate 1.000
Gemini 3 Flash Preview delegate 1.000

Voting matrix

Constructed Delegation Matrix

To / From | Claude Sonnet 5 | GPT 5.4 mini | Gemini 3 Flash Preview
Claude Sonnet 5 | 0.000 | 0.000 | 0.000
GPT 5.4 mini | 0.000 | 0.000 | 0.000
Gemini 3 Flash Preview | 0.000 | 0.000 | 0.000
A | 0.000 | 0.000 | 0.000
B | 0.000 | 0.000 | 0.000
C | 1.000 | 1.000 | 1.000
D | 0.000 | 0.000 | 0.000
E | 0.000 | 0.000 | 0.000

Embeddings

Sample Embeddings

24 samples, 1536 dims each: 8 per agent (Claude Sonnet 5, GPT 5.4 mini, Gemini 3 Flash Preview), all labeled C.

Samples

Agent Responses

Claude Sonnet 5 openai_compatible · anthropic/claude-sonnet-5
#0 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

A hybrid approach addresses the sparse/delayed labels by not relying solely on supervised learning, while rules provide explainability for known fraud patterns and the ML score adapts to emerging patterns for ambiguous cases. This balances precision/recall and...

#1 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The hybrid approach directly addresses the stated constraints: sparse/delayed labels favor unsupervised or rule-based detection for coverage, while explainability requirements favor rules, and adaptability to changing patterns favors ML scoring. Combining rule...

#2 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The system needs to handle sparse labels, evolving patterns, mandatory explanations, and moderate compute—no single approach (pure rules, pure ML, pure unsupervised) satisfies all constraints simultaneously. A hybrid design lets deterministic rules catch known...

#3 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The hybrid approach directly addresses the stated constraints: sparse/delayed labels favor combining rules with ML rather than pure supervised learning, evolving patterns require adaptability that pure rules lack, and explainability requirements are satisfied ...

#4 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The system requires handling sparse/delayed labels, adapting to changing patterns, providing explanations, and balancing precision/recall—no single approach satisfies all constraints alone. A hybrid system combining deterministic rules (for explainability and ...

#5 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The hybrid approach directly addresses the constraints: sparse/delayed labels (rules cover known patterns without needing extensive labels), evolving patterns (ML component adapts), required explanations (rules are inherently interpretable, and risk scores can...

#6 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The system requires handling sparse/delayed labels, evolving patterns, explainability, and balancing precision/recall—no single approach satisfies all constraints. A hybrid design lets clear-cut fraud be caught by transparent rules (satisfying explainability a...

#7 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The scenario emphasizes sparse/delayed labels, evolving patterns, and mandatory explanations—conditions that favor combining deterministic rules (for known, explainable fraud patterns) with an adaptive ML risk score (for novel or ambiguous cases). This hybrid ...

GPT 5.4 mini openai_compatible · openai/gpt-5.4-mini
#0 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

A hybrid system is the best fit because it can handle sparse, delayed labels by using rules for known fraud patterns while still adapting with a machine-learning risk score for ambiguous cases. It also supports explanations for each alert, offers a practical b...

#1 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

A hybrid system is best here because the team has sparse delayed labels, needs explanations, and must adapt to changing fraud patterns while still keeping workload manageable. Rules handle known fraud patterns clearly, and a machine-learning risk score can cat...

#2 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The team has sparse and delayed labels, changing fraud patterns, and a need for explanations with moderate daily compute. A hybrid system best fits these constraints because it can capture known fraud via rules, adapt with a machine-learning score for ambiguou...

#3 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The scenario has sparse and delayed labels, evolving fraud patterns, and a need for both explanations and manageable daily operations, so a purely supervised or purely unsupervised approach is less balanced. A hybrid system can capture known fraud with rules, ...

#4 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The team needs a method that works with sparse, delayed labels, adapts to changing fraud patterns, and still provides explanations for investigators. A hybrid system best fits these constraints because rules can cover known fraud patterns with clear explanatio...

#5 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The scenario needs sparse-label learning, adaptation to changing patterns, and explanations for alerts, while still keeping false positives manageable. A hybrid system is best because rules handle known fraud patterns clearly, and a machine-learning risk score...

#6 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The data have sparse, delayed labels and changing patterns, so a purely supervised model is risky and a purely unsupervised one may generate too many false positives. A hybrid approach can use rules for known fraud patterns, add a machine-learning score for am...

#7 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The best choice is the hybrid approach, because it can use the sparse delayed labels where available while still handling new or changing fraud patterns through rules and a learned risk score. It also supports explanations for alerts and can be tuned to balanc...

Gemini 3 Flash Preview openai_compatible · google/gemini-3-flash-preview
#0 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The scenario requires balancing sparse labels and evolving patterns with the need for explainability and operational efficiency. Option (C) addresses the lack of labels through a hybrid approach that uses rules for known threats and machine learning for emergi...

#1 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The scenario describes a need for a system that handles sparse labels, evolving patterns, and the requirement for explainability while managing investigator workload. Option (C) is the most robust choice because it integrates heuristic rules for known threats ...

#2 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

To address sparse labels and changing fraud patterns while providing required explanations, a hybrid approach is the most effective. Combining deterministic rules for known threats with a machine-learning risk score allows the system to remain adaptable and in...

#3 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The problem requires balancing sparse/delayed labels, evolving fraud patterns, and the need for explainable alerts. Option (C) provides a balanced framework by using rules for immediate, known threats and machine learning to flag complex anomalies, ensuring bo...

#4 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The scenario requires balancing sparse labels and evolving patterns with the need for explainability and operational efficiency. **Option (C)** addresses these constraints by combining rules for known threats with a machine-learning score for emerging anomalie...

#5 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

A hybrid approach is the most effective choice because it combines the immediate explainability of rules for known patterns with the adaptability of machine-learning scores for evolving, "sparse-label" fraud. This strategy balances the need for high coverage (...

#6 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

The problem requires balancing sparse/delayed labels, changing fraud patterns, and the need for explainable alerts. A hybrid approach (Option C) allows the team to capture known threats with rules while using machine learning to adapt to emerging patterns and ...

#7 C · Use a hybrid system that combines rules for known fraud patterns with a machine-learning risk score for ambiguous cases, balancing coverage and explainability. 1536 dims

This problem requires a solution that addresses sparse labels, its evolving nature, and the need for explainability while managing investigator workload. A hybrid approach (Option C) leverages rules to capture known patterns and explainability requirements, wh...

Timings

Stages

Sampling 5996 ms
Embedding 345 ms
Signals 13 ms
PPV 1 ms

Errors

Agent and Round Errors

No errors recorded.