Sample-size rules

mureo’s anomaly detector does not alert on every metric shift. It refuses to fire below a set of sample-size thresholds, because below those thresholds a single atypical day of traffic can move a rate metric enough to look like a genuine change.

This page documents the thresholds, the reasoning, and the operator override surface.

| Metric | Minimum sample | Alert direction |
| --- | --- | --- |
| CPA (cost per acquisition) | 30 conversions per day | Spike (up) |
| CTR (click-through rate) | 1000 impressions per day | Drop (down) |
| Zero spend | None | Flag on a previously-spending campaign |

Below the minimum, the detector does not return an anomaly. The metric is surfaced under a monitor flag in the /daily-check report, without a recommended action.

Above the minimum, the detector applies a severity tier (see below).
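The gate described above can be sketched as follows. This is a minimal illustration, not mureo's implementation: the function name `gate_status`, the metric keys, and the return strings are hypothetical, and the floors are treated as inclusive (a day with exactly 30 conversions or 1000 impressions is evaluated).

```python
# Floors from the table above. All names here are illustrative assumptions.
MIN_CONVERSIONS = 30
MIN_IMPRESSIONS = 1000


def gate_status(metric: str, conversions: int, impressions: int,
                min_conversions: int = MIN_CONVERSIONS,
                min_impressions: int = MIN_IMPRESSIONS) -> str:
    """Return "evaluate" when the day's sample clears the floor, else "monitor"."""
    if metric == "cpa":
        return "evaluate" if conversions >= min_conversions else "monitor"
    if metric == "ctr":
        return "evaluate" if impressions >= min_impressions else "monitor"
    if metric == "zero_spend":
        # Zero spend has no sample gate.
        return "evaluate"
    raise ValueError(f"unknown metric: {metric}")


print(gate_status("cpa", conversions=12, impressions=4000))  # monitor
print(gate_status("ctr", conversions=12, impressions=4000))  # evaluate
```

A metric that returns "monitor" here corresponds to the monitor flag in the report; only "evaluate" metrics go on to severity tiering.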

CPA is the ratio of spend to conversions. At low conversion counts, a single atypical transaction moves the ratio enough to produce a false signal:

  • At 5 conversions, one outlier shifts CPA by roughly 20%.
  • At 10 conversions, one outlier shifts CPA by roughly 10%.
  • At 30 conversions, one outlier shifts CPA by roughly 3%.
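One way to read these figures is that each conversion carries a 1/n share of the day's total, so a single atypical conversion can move the ratio by roughly that share. The sketch below reproduces the arithmetic under that reading; the function name is hypothetical.

```python
def single_conversion_share(conversions: int) -> float:
    """Share of the day's conversions represented by one conversion (1/n)."""
    if conversions <= 0:
        raise ValueError("conversions must be positive")
    return 1.0 / conversions


for n in (5, 10, 30):
    print(f"{n} conversions: one conversion is {single_conversion_share(n):.1%} of the day")
# 5 conversions: one conversion is 20.0% of the day
# 10 conversions: one conversion is 10.0% of the day
# 30 conversions: one conversion is 3.3% of the day
```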

Thirty is the point at which a single-day CPA reading no longer needs a story. Below that, “yesterday was weird” has to be the operator’s default assumption, not “something is wrong.”

Thirty is the floor used by the mureo-learning skill because it is conservative for most consumer and B2B accounts. A high-frequency ad account (large DTC) may set its local gate higher. A low-frequency account (enterprise B2B) may set it lower, accepting more noise in exchange for faster detection of real shifts.

CTR is a rate of clicks to impressions. Impressions are plentiful compared to conversions — an active campaign usually has thousands of impressions per day. The 1000 floor addresses a different failure mode: low-delivery days where the impression mix is dominated by a single audience segment, ad slot, or device tier, making the CTR number reflect the mix rather than creative fit.

Below 1000 daily impressions, CTR reads are suppressed because they are more likely a delivery artifact than a creative-quality signal. At 1000+, the mix evens out enough for CTR to mean what it usually means.

Zero spend is an absolute, not a rate. If yesterday the campaign spent non-zero, and today it spent zero, that is a signal regardless of sample size — something structural (budget cap, paused ad group, billing issue) has changed. The detector emits this as CRITICAL without any sample gate.
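As a rough sketch of this rule, assuming spend is available as a plain number for yesterday and today (the function name and signature are hypothetical, not taken from the mureo codebase):

```python
from typing import Optional


def zero_spend_alert(previous_spend: float, current_spend: float) -> Optional[str]:
    """Return "CRITICAL" when a previously-spending campaign spends nothing."""
    # No sample gate: the check depends only on the two spend values.
    if previous_spend > 0 and current_spend == 0:
        return "CRITICAL"
    return None


print(zero_spend_alert(previous_spend=182.40, current_spend=0.0))  # CRITICAL
print(zero_spend_alert(previous_spend=0.0, current_spend=0.0))     # None
```

A campaign that was already at zero does not trigger the alert in this sketch, since the rule applies to a previously-spending campaign.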

Severity tiers (for metrics that clear the gate)

Once the sample gate is cleared, the detector assigns one of two tiers:

| Metric | HIGH | CRITICAL |
| --- | --- | --- |
| CPA spike | ≥ 1.5× baseline | ≥ 2.0× baseline |
| CTR drop | ≤ 0.5× baseline | ≤ 0.3× baseline |

Two tiers, not five. A finer gradation would imply a precision the baseline math does not have.

  • HIGH — investigate before the next daily check; likely structural (bid change, new competitor, landing page break).
  • CRITICAL — pause-worthy without explanation; budget is actively burning against something that stopped working.
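The tier thresholds can be expressed as a small classifier. This is a sketch under the assumption that the value and baseline are compared as a simple ratio; `classify` and its metric keys are illustrative names, not the detector's API.

```python
from typing import Optional


def classify(metric: str, value: float, baseline: float) -> Optional[str]:
    """Map a metric reading to HIGH, CRITICAL, or None using the fixed tiers."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = value / baseline
    if metric == "cpa":  # spike: higher is worse
        if ratio >= 2.0:
            return "CRITICAL"
        if ratio >= 1.5:
            return "HIGH"
        return None
    if metric == "ctr":  # drop: lower is worse
        if ratio <= 0.3:
            return "CRITICAL"
        if ratio <= 0.5:
            return "HIGH"
        return None
    raise ValueError(f"unknown metric: {metric}")


print(classify("cpa", value=80.0, baseline=50.0))   # HIGH (1.6x baseline)
print(classify("ctr", value=0.005, baseline=0.02))  # CRITICAL (0.25x baseline)
```

The more severe tier is checked first so that a reading at 2.0× or beyond is never reported as HIGH.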

The comparison baseline is the median of the same metric over a recent window of snapshots for the same campaign. Median, not mean, because one bad day should not move the reference.

Baseline window and inclusion rules:

  • Window: last 14 daily snapshots for the same campaign.
  • Excluded: snapshots tagged with known-promotion, manual-intervention, or post-rollback (operator-flagged as not representative).
  • Minimum window size: 7 snapshots. Below this, no baseline is built; the metric is surfaced as building-baseline rather than evaluated.
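A minimal sketch of these baseline rules follows. It makes two assumptions the page does not settle: the 14-snapshot window is taken first and exclusions are applied inside it, and the 7-snapshot minimum counts only the snapshots that remain after exclusion. The snapshot shape (a dict with a `tags` list) and the function name are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence

EXCLUDED_TAGS = {"known-promotion", "manual-intervention", "post-rollback"}
WINDOW = 14          # last 14 daily snapshots
MIN_SNAPSHOTS = 7    # below this, no baseline is built


def build_baseline(snapshots: Sequence[dict], metric: str) -> Optional[float]:
    """Median of `metric` over the recent window, or None while still building.

    `snapshots` is ordered oldest to newest; each item holds the metric value
    and an optional "tags" list (an assumed shape, for illustration only).
    """
    window = snapshots[-WINDOW:]
    usable = [
        s[metric] for s in window
        if not (set(s.get("tags", ())) & EXCLUDED_TAGS)
    ]
    if len(usable) < MIN_SNAPSHOTS:
        return None  # surfaced as building-baseline rather than evaluated
    return median(usable)


history = [{"cpa": 50.0} for _ in range(13)] + [{"cpa": 500.0}]
print(build_baseline(history, "cpa"))  # 50.0
```

The usage line shows why the median is used: one 500.0 day among thirteen 50.0 days leaves the baseline at 50.0, where a mean would have moved to about 82.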

Two supported overrides:

  1. Per-invocation override — the analysis.anomalies.check MCP tool accepts optional min_conversions and min_impressions parameters. Passing smaller values lowers the gate for that run only. Useful for niche accounts where 30 conversions is an unreasonable floor.
  2. Strategy-level override — setting anomaly.sample_size_override in STRATEGY.md changes the defaults for every workflow run against that account. The detector surfaces the override in every alert so downstream reviewers can see it was in effect.

Overrides adjust the sample-size floor only. They do not change the severity tier thresholds (1.5× / 2.0× / 0.5× / 0.3×). Those are fixed because adjusting them would change the semantics of HIGH and CRITICAL across operators.
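The interaction between the two overrides might look like the sketch below. Everything here is an assumption made for illustration: the page does not specify the shape of `anomaly.sample_size_override`, so it is modelled as a dict with `min_conversions` and `min_impressions` keys, and per-invocation values are assumed to take precedence over the strategy-level ones for that run. The `SampleGate` class and `resolve_gate` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MIN_CONVERSIONS = 30
DEFAULT_MIN_IMPRESSIONS = 1000


@dataclass(frozen=True)
class SampleGate:
    min_conversions: int = DEFAULT_MIN_CONVERSIONS
    min_impressions: int = DEFAULT_MIN_IMPRESSIONS
    overridden: bool = False  # lets alerts show that an override was in effect


def resolve_gate(strategy_override: Optional[dict] = None,
                 min_conversions: Optional[int] = None,
                 min_impressions: Optional[int] = None) -> SampleGate:
    """Resolve floors: per-invocation value, then strategy value, then default."""
    strategy = strategy_override or {}
    conv = (min_conversions if min_conversions is not None
            else strategy.get("min_conversions", DEFAULT_MIN_CONVERSIONS))
    imp = (min_impressions if min_impressions is not None
           else strategy.get("min_impressions", DEFAULT_MIN_IMPRESSIONS))
    changed = (conv, imp) != (DEFAULT_MIN_CONVERSIONS, DEFAULT_MIN_IMPRESSIONS)
    return SampleGate(conv, imp, overridden=changed)


print(resolve_gate({"min_conversions": 20}, min_conversions=10))
```

Only the floors are resolved here; the severity multipliers are deliberately not parameters, matching the rule that overrides never touch the tier thresholds.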

The rules are tuned for the median account. They are wrong, and should be overridden, in at least these cases:

  • Known promotional pulse. A 48-hour flash sale that doubles CPA in hour two is the promotion working (high-CPC auction, high volume), not a spike. Tag the snapshot known-promotion or teach the framework with /learn.
  • Attribution lag. View-through, app-install, and offline conversion imports arrive 1–7 days late. Same-day CPA reads inflate because the numerator (spend) is complete but the denominator (conversions) is still partial. Apply the lookback-window suppression at the tool-invocation layer.
  • Sample-gate boundary. A genuinely-low-frequency account (enterprise B2B, high LTV, 5–10 conversions per day) needs a lower floor. Supply min_conversions=10 or similar and accept more noise in exchange for any signal at all.

Related:

  • mureo/analysis/anomaly_detector.py: implementation
  • mureo-learning skill: the statistical-thinking rule set the thresholds derive from
  • Blog: How AI agents misdiagnose CPA spikes (narrative introduction for the same material)

Thresholds are pinned to the OSS release. mureo 0.9.21 (current at time of writing) uses the values on this page. The diagnostic knowledge base may retune the defaults as evidence accumulates; when that happens, the release CHANGELOG will reference this page and the change will be announced on the blog.