
What we score wrong: scoring mistakes Panya has made and the fixes

The 11-signal rubric is public. The audits behind every score are run by humans plus agents. Both make mistakes. This post is the running errata · the scoring calls Panya has gotten wrong since launch, what we corrected, and what the editorial process changes in response.

We publish this for the same reason we publish the rubric · the alternative is to pretend our judgment is infallible, and that is the position every "best-of" affiliate site takes and the one we refuse to take.

What "wrong" means here

A score is wrong when one of three things is true:

1. The signal score does not reflect the documented evidence. We rated a vendor's COA at 70 when the public COA is from 2024 and the lots being sold are 2026. The data was there; we missed it.

2. The verdict does not match the score. A vendor scored 78 (clearly routable on the math) was tagged conditional because we had a stale read on a regulatory event that had since resolved.

3. The audit timing was off. A vendor's score was current in February but a March enforcement action shifted the picture; we did not refresh the scorecard for four weeks. The score was right when written and stale when read.

We do not consider it "wrong" when:

  • A vendor disputes a scoring methodology · the methodology page is the open conversation for that.
  • A vendor's score is low because their channel structurally cannot achieve a high score (research-chem vs Rx pharmacy on rx-legality). The score reflects the channel; the channel choice is the vendor's.
  • Reasonable people disagree on weighting between two signals.

The corrections

Listed in chronological order. We update this post as new corrections happen rather than burying them in commit messages.

2026-04-26 · Bangkok Peptides COA score (corrected up)

The mistake: we scored COA at 50 in the initial audit, citing "self-reported only." Subsequent verification confirmed Bangkok Peptides was using Janoshik third-party testing on most of its peptide product line; only the SARM line was first-party-tested.

The correction: COA bumped to 65, with the per-line distinction recorded in the signal note. Total score moved from 56 to 58.

Process change: when a vendor sells across multiple compound classes (peptides + SARMs + other), the audit now explicitly distinguishes per-line testing posture rather than averaging across the catalog.
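The per-line distinction can be captured as a small map instead of one averaged catalog-wide number. A minimal sketch of what that note-building could look like · the `LineAudit` shape and `coaNote` helper are illustrative, not the actual `vendor-scores.ts` types:

```typescript
// Hypothetical shape: per-product-line testing posture instead of a
// single COA flag averaged across the whole catalog.
type TestingPosture = "third-party" | "first-party" | "self-reported";

interface LineAudit {
  line: string;          // e.g. "peptides", "sarms"
  posture: TestingPosture;
  lab?: string;          // named lab when third-party
}

// Bangkok Peptides after the corrected audit (values from the post).
const bangkokLines: LineAudit[] = [
  { line: "peptides", posture: "third-party", lab: "Janoshik" },
  { line: "sarms", posture: "first-party" },
];

// The per-signal note can now state the split explicitly rather than
// collapsing it into one number.
function coaNote(lines: LineAudit[]): string {
  return lines
    .map((l) => `${l.line}: ${l.posture}${l.lab ? ` (${l.lab})` : ""}`)
    .join("; ");
}
```

For the catalog above, `coaNote(bangkokLines)` produces "peptides: third-party (Janoshik); sarms: first-party", which is the distinction the original audit flattened away.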

2026-04-29 · Vendor catalog rubric vs methodology page rubric (drift)

The mistake: the /methodology page documented one 11-signal taxonomy (Identity, Cold-chain, COA, Packaging, Shipping, Response time, Review sentiment, Returns, Price, Promo, Years in operation) while every actual scorecard at /vendor used a different one (coa, cold-chain, rx-legality, compound-id, dose-accuracy, endotoxin, refund, channel, support, pricing, retention). A user reading the methodology page and clicking through to a vendor saw two different rubrics.

The correction: aligned /methodology to the canonical vendor-scores taxonomy in PR #33. Then aligned the /blog/11-signals-vendor-rubric explainer post to the same taxonomy in PR #37. Three sources of rubric truth now share one taxonomy.

Process change: any "this surface explains the rubric" page must use the slugs from `vendor-scores.ts` directly, not paraphrased equivalents. The "deepen X" sprint pattern that surfaced the drift is now codified as a heuristic · check internal consistency before adding new content.
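The "check internal consistency" heuristic is mechanical enough to script. A sketch of a drift check, assuming the canonical slugs are exported from `vendor-scores.ts` · the export name `CANONICAL_SIGNALS` and the `rubricDrift` function are hypothetical, not the shipped code:

```typescript
// Canonical 11-signal slugs, as they appear on every /vendor scorecard.
const CANONICAL_SIGNALS = [
  "coa", "cold-chain", "rx-legality", "compound-id", "dose-accuracy",
  "endotoxin", "refund", "channel", "support", "pricing", "retention",
] as const;

// Compare a rubric-explaining surface (methodology page, explainer post)
// against the canonical taxonomy: report slugs it invents and slugs it omits.
function rubricDrift(surfaceSlugs: string[]) {
  const canonical = new Set<string>(CANONICAL_SIGNALS);
  const surface = new Set(surfaceSlugs);
  return {
    unknown: surfaceSlugs.filter((s) => !canonical.has(s)),
    missing: CANONICAL_SIGNALS.filter((s) => !surface.has(s)),
  };
}
```

A paraphrased slug like "identity" lands in `unknown`, and every canonical slug the surface forgot lands in `missing` · either list being non-empty fails the consistency check before new content ships.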

2026-04-30 · Hold-verdict vendor CTA pointing at dead site

The mistake: when a vendor's verdict was "hold" because the vendor was offline (Amino Asylum after the FDA enforcement action), the scorecard still rendered a "Visit Amino Asylum →" primary CTA. Clicks went to a 404. The page contradicted itself · the verdict pill said hold, the CTA said visit.

The correction: hold-verdict scorecards now render a "Not routable · on hold" pill in place of the visit CTA. Routable and conditional verdicts are unchanged. Shipped in PR #30.

Process change: any UI element that drives traffic to a vendor must be conditional on verdict. The "is this CTA appropriate for this verdict" check is part of every vendor-page UX review.
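The verdict-to-CTA rule reduces to a single pure function, which makes the "is this CTA appropriate for this verdict" check testable rather than a review-time eyeball. An illustrative sketch · the `ctaFor` function and `Cta` shape are assumptions, not the component shipped in PR #30:

```typescript
type Verdict = "routable" | "conditional" | "hold";

interface Cta {
  kind: "visit" | "pill";
  label: string;
}

// Hold must never produce a link to the vendor; routable and
// conditional verdicts keep the primary visit CTA.
function ctaFor(verdict: Verdict, vendorName: string): Cta {
  if (verdict === "hold") {
    return { kind: "pill", label: "Not routable · on hold" };
  }
  return { kind: "visit", label: `Visit ${vendorName} →` };
}
```

With the rule centralized, a unit test can assert that no hold-verdict scorecard ever renders `kind: "visit"`, which is exactly the contradiction the Amino Asylum page exhibited.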

2026-05-02 · Region detection silently broken since launch

The mistake: `detectRegion()` expected Fastly to forward `fastly-client-country-code` (or one of five fallback header keys) to the Railway runtime. The diagnostic shipped in PR #36 confirmed Fastly forwards none of them. Every visitor since launch was falling through to Accept-Language and getting tagged "other" instead of their actual region. The matchmaker treated a Thai resident as a generic visitor and routed accordingly.

The correction: server-side IP-geo lookup as fallback in PR #38, with 24h in-memory cache and 1.5s timeout. Verified live: Thai IPs now resolve to the Thailand cohort.
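The fallback chain described above · forwarded geo headers first, then a cached IP-geo lookup with a hard timeout, then the generic cohort · can be sketched as below. Apart from the header name and the 24h/1.5s figures, which are from the post, everything here (names, shapes, the injected `ipGeoLookup` provider) is an illustrative reconstruction, not the code in PR #38:

```typescript
// Header keys the edge was expected to forward (none arrive in production).
const GEO_HEADERS = ["fastly-client-country-code"];

const geoCache = new Map<string, { country: string; expires: number }>();
const TTL_MS = 24 * 60 * 60 * 1000; // 24h in-memory cache

async function detectRegion(
  headers: Record<string, string>,
  ip: string,
  ipGeoLookup: (ip: string) => Promise<string>, // injected geo provider
): Promise<string> {
  // 1. Forwarded geo headers, if any survive the edge.
  for (const key of GEO_HEADERS) {
    if (headers[key]) return headers[key];
  }
  // 2. Server-side IP-geo lookup, cached, with a 1.5s hard timeout.
  const hit = geoCache.get(ip);
  if (hit && hit.expires > Date.now()) return hit.country;
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const country = await Promise.race<string>([
      ipGeoLookup(ip),
      new Promise<string>((_, reject) => {
        timer = setTimeout(() => reject(new Error("geo timeout")), 1500);
      }),
    ]);
    geoCache.set(ip, { country, expires: Date.now() + TTL_MS });
    return country;
  } catch {
    // 3. Last resort: generic cohort, as before the fix.
    return "other";
  } finally {
    clearTimeout(timer);
  }
}
```

The cache keeps the per-request cost near zero after the first lookup per IP, and the timeout bounds the worst case so a slow geo provider cannot stall page rendering.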

Process change: production verification after every infrastructure-touching deploy. Typecheck + tests prove the code compiles; only a real production probe proves the infrastructure plumbing works.

2026-04-30 · Empower Pharmacy "April 2025 FDA warning letter" weighting

The mistake: the initial Empower Pharmacy scorecard treated the April 2025 FDA warning letter as a single-signal overhang affecting `cold-chain` only. The warning letter actually cited deficiencies across sterile-drug production (`cold-chain`), microbial-excursion investigation (`endotoxin`), and labeling omissions (`compound-id`). The scorecard underweighted the impact across the related signals.

The correction: rebalanced the per-signal weighting in the Empower scorecard so the warning letter's documented scope is reflected in coa (80 → 75), endotoxin (80 → 70), and compound-id (80 → 75), plus the existing cold-chain hit. Final score moved from 65 to 62.

Process change: regulatory enforcement events (warning letters, inspection findings, lawsuits) now get a structured per-signal-impact map at audit time rather than a free-form note attached to one signal.
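A structured per-signal-impact map might look like the sketch below. The types and field names are illustrative assumptions; the numeric deltas for `coa`, `endotoxin`, and `compound-id` are the ones from the Empower correction above (the pre-existing `cold-chain` hit is omitted because its numbers are not stated in this post):

```typescript
type Signal = "coa" | "cold-chain" | "endotoxin" | "compound-id"; // subset of the 11

// One enforcement event, mapped to every signal its documented findings touch,
// instead of a free-form note attached to a single signal.
interface EnforcementImpact {
  event: string;
  impacts: Partial<Record<Signal, { from: number; to: number; finding: string }>>;
}

const empowerWarningLetter: EnforcementImpact = {
  event: "FDA warning letter, April 2025",
  impacts: {
    coa: { from: 80, to: 75, finding: "documented deficiencies across scope" },
    endotoxin: { from: 80, to: 70, finding: "microbial-excursion investigation" },
    "compound-id": { from: 80, to: 75, finding: "labeling omissions" },
  },
};
```

The point of the structure is that an audit cannot record an enforcement event without enumerating which signals it hits · the single-signal underweighting becomes a shape error rather than a judgment lapse.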

What this means for users reading scorecards

If you spot a score that does not match what you have learned about a vendor from other sources:

1. Check the per-signal notes. The score itself is a summary; the notes carry the reasoning. If the reasoning is missing the data point you have, that is the conversation to have.

2. Check the audit date. Scorecards do drift between audits. The most recent audit date is on every scorecard.

3. Email partner@panya.health with the evidence. We re-score on documented evidence. We do not re-score on volume or pressure.

We update this post each time a correction lands. The git history of `apps/web/lib/vendor-scores.ts` is the underlying record; this page is the editorial summary.

Why publish this

Because the alternative is "trust us" · and we have established that we do not want to be trusted on our word. We want to be trusted on the rubric, on the audits, on the corrections when we get something wrong, and on the editorial process that catches the mistakes.

Other "best-of" affiliate sites do not publish their scoring mistakes. That is not because they do not make mistakes. It is because the affiliate-revenue model rewards looking infallible and punishes admitting fallibility. We have a different revenue model (flat-fee, no pay-to-rank) and a different bar.

If you are a vendor reading this and you think one of your signal scores is wrong, the methodology page has the dispute path. We have changed scores based on documented evidence before. We will again.

Tags: methodology · editorial-stance · rubric · operations · errata

We earn a small commission when you buy through recommended vendors. That is how this stays free. Vendors rank by quality signals, not paid placement.