
We scored 77 GLP-1 vendors on 11 signals: here's what 18 months of data showed

We've been running the same 11-signal rubric against every GLP-1 telehealth and compounded-pharmacy operator we could find since late 2024. Same 11 axes, same scoring methodology, same researcher hand-checking each entry. As of May 2026, the dataset is 77 vendors across 19 regions.

The pattern that emerged from sorting the data wasn't what we expected.

The rubric, in one paragraph

Each vendor gets scored 0-100 on 11 signals, then averaged into a final score. The signals split into two groups: clinical-quality signals (COA on every lot, cold-chain verified, Rx legality, compound identity, dose accuracy, endotoxin testing) and customer-relationship signals (refund posture, channel clarity, support quality, price transparency, longitudinal retention).
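As a sketch of the arithmetic, assuming one scorecard per vendor keyed by signal name (the snake_case names below are illustrative, not the rubric's official labels):

```python
# The 11 signals, grouped as described above.
# Names are illustrative stand-ins for the rubric's labels.
CLINICAL = ["coa_every_lot", "cold_chain_verified", "rx_legality",
            "compound_identity", "dose_accuracy", "endotoxin_testing"]
RELATIONSHIP = ["refund_posture", "channel_clarity", "support_quality",
                "price_transparency", "longitudinal_retention"]
SIGNALS = CLINICAL + RELATIONSHIP  # 11 signals total

def final_score(scores: dict) -> float:
    """Average the 11 per-signal scores (each 0-100) into one final score."""
    return round(sum(scores[s] for s in SIGNALS) / len(SIGNALS), 1)

vendor = {s: 70 for s in SIGNALS}
vendor["refund_posture"] = 48          # a typical weak spot in this dataset
print(final_score(vendor))             # → 68.0
```

One weak signal drags an otherwise uniform 70 down two points, which is why single-axis weaknesses stay visible in the averages.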

The full rubric is at /methodology with the per-signal definitions and scoring conventions. This post is about what 77 vendors looked like when run through it.

The headline numbers

| Metric | Value |
| --- | --- |
| Total vendors scored | 77 |
| Regions covered | 19 |
| Score range | 25 to 77 |
| Median score | 69 |
| Mean score | 66.8 |

Zero vendors scored above 80. Six scored below 50. The distribution is tight in the upper-middle band, not a long tail of gold-standard operators with everyone else trailing.

The verdict distribution:

| Verdict | Count | Threshold |
| --- | --- | --- |
| Routable | 59 | 60+ score with no fatal flag |
| Conditional | 12 | 50-59 or one fatal flag with mitigation |
| Hold | 6 | Below 50 or two+ fatal flags |
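The verdict thresholds can be sketched as a small decision function. One assumption: the thresholds only define the mitigated case for a single fatal flag, so this sketch treats an unmitigated single flag as a hold.

```python
def verdict(score: int, fatal_flags: int = 0, mitigated: bool = False) -> str:
    """Map a final score plus fatal-flag state to a verdict."""
    # Hold: below 50, or two or more fatal flags
    if score < 50 or fatal_flags >= 2:
        return "hold"
    # One fatal flag: conditional only if mitigated (assumption: else hold)
    if fatal_flags == 1:
        return "conditional" if mitigated else "hold"
    # No flags: routable at 60+, conditional at 50-59
    return "routable" if score >= 60 else "conditional"

print(verdict(65))                     # → routable
print(verdict(55))                     # → conditional
print(verdict(72, fatal_flags=2))      # → hold
```

This is also why verdict and score diverge: a 65 with two fatal flags holds while a clean 65 routes.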

77% of vendors we scored ended up routable. Most operators in this market are not actively dangerous; they are reasonable-quality operators with specific weaknesses.

The pattern that surprised us

When we ranked the 11 signals by average score across all vendors, we expected the medical-quality signals to be the weak spot. The compounded peptide world has a reputation for variable quality, sketchy COAs, and cold-chain failures. We expected those to score worst.

The actual ranking, lowest mean to highest:

| Rank | Signal | Mean score | Category |
| --- | --- | --- | --- |
| 1 (worst) | Refund posture | 59.8 | Customer relationship |
| 2 | Longitudinal retention | 62.0 | Customer relationship |
| 3 | Price transparency | 68.2 | Customer relationship |
| 4 | Endotoxin testing | 68.9 | Clinical quality |
| 5 | Support quality | 75.1 | Customer relationship |
| 6 | COA on every lot | 76.9 | Clinical quality |
| 7 | Compound identity | 77.1 | Clinical quality |
| 8 | Cold-chain verified | 78.2 | Clinical quality |
| 9 | Dose accuracy | 78.3 | Clinical quality |
| 10 | Channel clarity | 82.0 | Customer relationship |
| 11 (best) | Rx legality | 82.5 | Clinical quality |

Four of the five lowest-scoring signals are customer-relationship signals; the one exception, endotoxin testing, is a clinical-quality signal. The clinical-quality signals otherwise cluster in the upper half of the ranking.

The vendors are getting the medical part substantially right. Where they fall down is the trust infrastructure around the medical part: refunds, retention practices, price transparency, support response.
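A ranking like this can be reproduced from per-vendor scorecards by averaging each signal across vendors and sorting ascending. A minimal sketch, with illustrative field names and toy data:

```python
def rank_signals(scorecards: list[dict]) -> list[tuple[str, float]]:
    """Mean score per signal across all vendors, worst first."""
    signals = scorecards[0].keys()
    means = {
        s: round(sum(v[s] for v in scorecards) / len(scorecards), 1)
        for s in signals
    }
    return sorted(means.items(), key=lambda kv: kv[1])

# Two toy scorecards, not real vendor data
cards = [
    {"refund_posture": 55, "rx_legality": 80},
    {"refund_posture": 65, "rx_legality": 85},
]
print(rank_signals(cards))  # → [('refund_posture', 60.0), ('rx_legality', 82.5)]
```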

Why this finding holds up

The first reaction to "vendors are bad at refunds" is usually "well, of course, that's a margin question, not a quality question." The interpretation we landed on after sorting the data is more nuanced.

A vendor that has a clean refund posture is signaling several things at once. It signals that they expect their product to occasionally fail and they have a workflow for handling it. It signals that they trust their unit economics enough to absorb the cost. It signals that they have a customer service operation that can process refunds rather than ignore them.

A vendor that doesn't have refund clarity is often a vendor that hasn't built any of those things. The refund posture is a leading indicator of operational maturity broadly, not just an isolated policy choice.

The same logic applies to longitudinal retention. A vendor that tracks how long patients stay (and how many come back after a gap, and what their dose journey looks like) is a vendor that's running a business, not just executing a transaction. The vendors that score low on retention are the vendors that fundamentally don't care what happens after the first prescription is filled.

Price transparency is the tell that's most legible to patients before they buy. A vendor with clear, single-month, all-doses-listed pricing is a different kind of operation than a vendor with hero pricing for the 2.5mg starter dose and ambiguous escalation. We discussed this pattern in vendor pricing page red flags.

Where the regional differences land

We ran the same scoring against vendors in 19 regions:

| Region | Vendors | Median score |
| --- | --- | --- |
| United States | 13 | 67 |
| United Kingdom | 7 | 70 |
| Thailand | 6 | 71 |
| Singapore | 6 | 72 |
| United Arab Emirates | 6 | 68 |
| Indonesia (Bali) | 6 | 64 |
| Hong Kong | 5 | 71 |
| Vietnam | 4 | 60 |
| New Zealand | 3 | 73 |
| Australia | 3 | 73 |
| Mexico | 3 | 65 |
| India | 3 | 58 |
| Other (7 regions) | 12 | 64 |
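Regional medians like these fall out of grouping the scored vendors by region. A minimal sketch with toy data (field names illustrative):

```python
from collections import defaultdict
from statistics import median

def median_by_region(vendors: list[dict]) -> dict[str, float]:
    """Median final score per region, regions sorted alphabetically."""
    by_region = defaultdict(list)
    for v in vendors:
        by_region[v["region"]].append(v["score"])
    return {r: median(scores) for r, scores in sorted(by_region.items())}

# Toy entries, not the actual dataset
vendors = [
    {"region": "Thailand", "score": 68},
    {"region": "Thailand", "score": 71},
    {"region": "Thailand", "score": 74},
    {"region": "Singapore", "score": 72},
]
print(median_by_region(vendors))  # → {'Singapore': 72, 'Thailand': 71}
```

Medians rather than means keep one outlier vendor from distorting a small region's number, which matters when several regions have only 3-4 vendors.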

The regional spreads are smaller than we expected. Singapore (72), Thailand (71), Hong Kong (71), New Zealand (73), Australia (73), and the UK (70) all cluster in the same band. The US (67) lands lower than the median of these regions, mostly because US compounded-leaning operators score lower on transparency than US brand operators.

What this challenges is the implicit assumption that "Western regulated" automatically means "higher quality." For specific subdomains (refund posture, price transparency), the Western regulatory environment doesn't appear to produce meaningfully better outcomes. The Australian and Singaporean private-clinic markets actually score better on customer-relationship signals than the US market.

The score ceiling

Zero vendors scored 80+. The highest-scoring vendor in the dataset hit 77. We expect this ceiling to break in the next 12-18 months as the market matures, but it hasn't broken yet.

What's holding the ceiling: it takes excellence on multiple axes to score above 80, and most vendors are excellent on 6-7 axes and merely adequate on the other 4-5. The vendors hitting 75+ are running coherent operations across most signals; the rare 80+ vendor would need to close the gaps on every dimension simultaneously.

The gaps that show up most often even in 70+ scoring vendors:

  • Refund posture stays surprisingly weak even at the top of the distribution. A 75-scoring vendor often has 50-60 on refunds.
  • Longitudinal retention is similarly weak. Most vendors don't have data infrastructure to even track patient retention systematically, let alone optimize for it.
  • Price transparency improves at higher score bands but plateaus. The vendors at 75+ are clear about pricing but rarely as clear as they could be.

These are the dimensions where the market has room to mature. They're also the dimensions patients can't easily evaluate before committing.

What we got wrong in the early scoring

We're not perfect at this. Some adjustments we made over the 18 months:

Initial COA scoring was too lenient. Early scoring rewarded vendors that produced any COA. We tightened to require per-batch COA from a third-party lab; some operators previously scored 80+ on this signal moved to 60.

Endotoxin testing scoring was too strict. Early scoring penalized any vendor without explicit endotoxin testing documentation. The reality is that brand pharmaceutical operators don't typically expose this documentation publicly; we recalibrated to credit the regulatory framework rather than requiring per-vendor disclosure.

Cold-chain assessment was uneven. Different reviewers were using different definitions of "cold-chain verified." We standardized on a four-criterion definition (temperature monitoring during transit, documented receipt protocol, replacement policy on cold-chain failure, and clinical staff training) and re-scored.

Channel clarity scoring rewards transparency, not just legitimacy. A clinic operating in a regulatory gray zone but openly disclosing the operating arrangement scored higher than we initially gave them credit for. The signal is "clarity," not "compliance."

The current 77-vendor dataset reflects these calibrations. The methodology is at /methodology; the per-vendor scoring is on the vendor catalog.

What this implies for patients

If you're choosing among GLP-1 vendors:

Don't optimize on clinical quality alone. Most vendors are clinically reasonable. The differentiation is mostly on customer-relationship signals, which is where you'll feel the difference month-to-month even if the medical outcome is similar.

Refund clarity is a strong leading indicator. A vendor that can describe their refund policy in one sentence is meaningfully different from a vendor that hedges. Test this in the intake call.

Price transparency is verifiable in 2 minutes. Look at the pricing page. If single-month, per-dose, no-bundling, no-hidden-consultation-fees pricing isn't visible without scrolling, the vendor is doing something that benefits them more than you.

Regional choice matters less than within-region choice. The variance within a region is larger than the variance across regions. A 75-scoring Bangkok vendor is meaningfully better than a 60-scoring Bangkok vendor, more so than a 70-scoring Bangkok vendor versus a 70-scoring London vendor.

Verdict matters more than score. A "routable" 65 vendor is meaningfully different from a "conditional" 65 vendor. The verdict captures something the raw score doesn't (specifically, fatal flags that aren't fully reflected in the average).

What we're doing with the dataset

The 77-vendor dataset updates monthly as we re-score existing vendors and add new ones. The data is public at the vendor catalog with per-vendor scorecards showing the full signal breakdown.

We're also using the longitudinal data to track where the market is moving. The mean score has increased about 4 points over the 18 months we've been running this. The improvement has been almost entirely in price transparency (which jumped 8 points) and channel clarity (up 5 points). Refund posture and longitudinal retention have not improved.

The market is becoming more transparent without becoming more accountable to its customers. That's an interesting and probably temporary state. We expect it to shift as competition intensifies; the vendors that figure out refund posture and retention will eat the share of the vendors that don't.

Methodology + corrections

The full rubric is at /methodology. The 11 signals, scoring conventions, and verdict thresholds are all documented there.

If you spot a vendor that's miscategorized or a scoring decision that's wrong, editor@panya.health is the right channel. We've corrected a few dozen scores based on reader emails over the 18 months; corrections that come with a specific source link get acted on the same week.

The dataset is also exposed via the /api/acp/products.json feed for agent-era discovery, with cryptographic panya_proof signatures on each vendor entry to prevent score-tampering downstream.
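The post doesn't document how panya_proof is constructed, so purely as an illustration of the idea (tamper-evidence over each entry), here's a sketch using an HMAC over a canonicalized entry. The field names, the shared-key HMAC construction, and the helper functions are all assumptions; the real feed may well use public-key signatures instead.

```python
import hashlib
import hmac
import json

def canonical(entry: dict) -> bytes:
    """Deterministic serialization of everything except the proof field."""
    payload = {k: v for k, v in entry.items() if k != "panya_proof"}
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()

def sign_entry(entry: dict, key: bytes) -> dict:
    """Attach a proof computed over the canonical form (illustrative scheme)."""
    signed = dict(entry)
    signed["panya_proof"] = hmac.new(key, canonical(signed), hashlib.sha256).hexdigest()
    return signed

def verify_entry(entry: dict, key: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    expected = hmac.new(key, canonical(entry), hashlib.sha256).hexdigest()
    return hmac.compare_digest(entry.get("panya_proof", ""), expected)

key = b"demo-key"
signed = sign_entry({"vendor": "example", "score": 71}, key)
print(verify_entry(signed, key))              # → True
print(verify_entry(dict(signed, score=95), key))  # → False
```

The point of any such scheme is the second call: a downstream agent that re-serves the feed can't quietly bump a score without invalidating the proof.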


We earn a small commission when you buy through recommended vendors. That is how this stays free. Vendors rank by quality signals, not paid placement.

About the editor

Mira Tanaka is the editor at Panya, based in Bangkok. Covers peptide therapeutics with a focus on the routing decisions mainstream adults actually face. Corrections, tips, or push-back: editor@panya.health.