Why the seven numbers diverge
Read side by side, the vendor docs turn the mystery into a definition problem. The numbers vary on four axes.
Figure: the four axes the citation numbers diverge on (what counts as a citation, which LLMs are sampled, how often they sample and how they dedup, and which languages are queried).
1. What counts as a citation
This is the big one. Every tool counts a different thing and calls it the same word.
- Profound counts a citation only when the answer includes a clickable source link pointing at your domain. Strict, and useful for attribution, but it misses any mention where the model talks about your brand without linking.
- Peec AI counts any mention of your brand name in the answer text, link or no link. So if Perplexity says "Ken Imoto wrote a useful guide on voice AI," Peec counts that as a citation, even with no link. That is why its number is the biggest.
- Otterly AI counts a cited URL, like Profound, but de-duplicates per query per day, which crushes the number down.
- Bluefish AI is really running a share-of-voice calculation against competitors, so its "citations" number reads closer to a rank than a count.
- Scrunch counts both brand mentions and source links with no dedup, which lands it in the middle-high range.
- Semrush only counts when your domain shows up in the URL field of the structured answer, the strictest reading.
- My Python script counts whatever I tell it to, which today is "the brand string appears in the answer text, deduped per query, three samples averaged."
This split is not specific to me. The 2026 tooling guides now draw the same line: brand mentions are how often a model says your name, citations are when it links or attributes a source. Some platforms (Profound, Peec AI, AthenaHQ) break out explicit versus implicit citations at the URL level; others report brand-level visibility only. Pick any two definitions and they will not agree. That is the field not having a shared standard yet.
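To make the split concrete, here is a minimal sketch of the two ends of the spectrum: a loose "brand mention" check and a strict "linked citation" check. It is not any vendor's actual logic. The `Answer` class, both function names, and the `example.com` domain are assumptions made up for illustration.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Answer:
    """One LLM answer: the prose plus any source URLs attached to it (hypothetical shape)."""
    text: str
    source_urls: list[str] = field(default_factory=list)


def is_brand_mention(answer: Answer, brand: str) -> bool:
    """Loose definition: the brand string appears in the answer text, link or no link."""
    return re.search(re.escape(brand), answer.text, re.IGNORECASE) is not None


def is_linked_citation(answer: Answer, domain: str) -> bool:
    """Strict definition: at least one source URL points at the domain or a subdomain."""
    for url in answer.source_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


# The same answer counts under one definition and not under the other.
answer = Answer(text="Ken Imoto wrote a useful guide on voice AI.")
print(is_brand_mention(answer, "Ken Imoto"))      # True
print(is_linked_citation(answer, "example.com"))  # False
```

Run both checks over the same set of answers and you get two different "citation" totals from identical data, which is the whole disagreement in miniature.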
2. Which LLMs they sample
Not every tool covered all five engines I cared about. Peec AI samples all five, which gives it more surface area and is part of why its number is highest. Scrunch samples only ChatGPT and Perplexity, which makes its high number more interesting: more citations from fewer surfaces. If you only care about ChatGPT, your choice of tracker matters less. If you care about Gemini or Claude, you can cross half the list off immediately.
3. How often they sample
Most tools run each query daily. Some run weekly. Otterly runs daily but deduplicates inside a 24-hour window, so a brand mentioned five times in one day counts once. Peec AI runs daily and counts each mention on its own. Over fifteen days and twelve queries, that compounds fast.
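A small sketch of how the two counting rules pull apart on the same data. The observations are made up, and the per-(day, query) dedup is my approximation of a 24-hour window, not Otterly's documented implementation.

```python
from datetime import date

# Hypothetical observations: (day, query, number of brand mentions in that answer).
observations = [
    (date(2026, 1, 1), "best voice AI guide", 5),
    (date(2026, 1, 1), "voice AI tutorials", 1),
    (date(2026, 1, 2), "best voice AI guide", 2),
    (date(2026, 1, 2), "voice AI tutorials", 0),
]


def count_every_mention(rows):
    """Each mention counts on its own."""
    return sum(n for _, _, n in rows)


def count_deduped_per_query_per_day(rows):
    """A (day, query) pair counts once if it had at least one mention."""
    return len({(day, query) for day, query, n in rows if n > 0})


print(count_every_mention(observations))              # 8
print(count_deduped_per_query_per_day(observations))  # 3
```

Assuming one run per query per day, the deduped count over fifteen days and twelve queries can never exceed 15 × 12 = 180 per engine, while the per-mention count has no ceiling.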
4. Whether they sample in your languages at all
I publish in four languages. Most trackers default to English-only sampling unless you configure language sets by hand. Peec AI gave me the most useful multilingual number because it queries in 115+ languages by default. The rest mostly ignored my Portuguese and Spanish content, which is why their numbers undercount what is actually happening in Brazilian and LatAm search.
Pick the definition, then pick the tool
After two weeks staring at this, I think "which tracker is most accurate" is the wrong question. There is no ground truth for AI citations. Every LLM is a black box that returns slightly different answers to the same prompt depending on time, region, and which datacenter you hit. There is no Search Console for this.
The right question: which definition of "citation" maps to the business outcome you actually care about?
- Want attribution traffic (someone clicks a link)? Use Profound or Otterly. They count linked citations only. The numbers stay small, but they map to GA4 referrer events you can verify.
- Want brand presence (the model says your name, link or not)? Use Peec AI. The number looks generous, but it is the closest proxy to "ChatGPT says my name out loud."
- Want competitive positioning? Use Bluefish or Scrunch. Both run competitor sets natively.
- Want the truth on a budget? Write your own script. Mine is 200 lines of Python around the OpenAI, Anthropic, and Perplexity APIs, runs about $8 a month, and hands me raw answer text to grep through, which the commercial tools mostly hide behind charts.
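For the do-it-yourself route, here is a minimal sketch of the counting core under one reading of "the brand string appears in the answer text, deduped per query, three samples averaged." It is not the actual 200-line script. `brand_score`, `ask`, and `fake_ask` are hypothetical names, and `ask` stands in for whichever API client you wire up; no real API call is shown.

```python
from statistics import mean
from typing import Callable


def brand_score(
    queries: list[str],
    brand: str,
    ask: Callable[[str], str],
    samples: int = 3,
) -> float:
    """Sum over queries of the fraction of samples whose text contains the brand."""
    total = 0.0
    for query in queries:
        # A sample scores 1 if the brand appears at all, however many times
        # (that is the dedup); the per-query score is the average over samples.
        hits = [int(brand.lower() in ask(query).lower()) for _ in range(samples)]
        total += mean(hits)
    return total


def fake_ask(query: str) -> str:
    """Stand-in for a real API call; returns canned text so the sketch runs offline."""
    if "voice" in query:
        return "Ken Imoto wrote a useful guide on voice AI. See Ken Imoto's guide."
    return "No relevant sources found."


print(brand_score(["voice AI guide", "best CRM"], "Ken Imoto", fake_ask))  # 1.0
```

Because the definition lives in one line, switching from "brand string in text" to "linked URL only" means changing one expression, not changing vendors.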
Until the field agrees on a shared definition, every vendor keeps counting differently under the same word. A shared taxonomy would fix this: a standard for what "citation", "mention", and "source link" mean across tools, so the numbers become comparable. The Citation Signals work at llmoframework.com is one attempt at exactly that vocabulary.
What I actually run
Honest answer: two trackers, not seven.
I kept Otterly because it is cheap and its strict definition lines up with what I can verify in GA4. If Otterly says I got cited and GA4 shows a referrer click, I trust both. I kept my own Python script because it hands me raw text and I can change the definition tomorrow if I want.
I dropped the rest. Not because they are bad. Because paying $499 a month for a number I could not reconcile against a $29 tool was making me dumber, not smarter.
If you are about to spend money on an AI-citation tracker, do this first: write down what "citation" means to you, in one sentence. Then ask each vendor whether their definition matches yours. Most will not answer cleanly. That is your answer.
I wrote a book about exactly this measurement problem, including the Python script I use and the GA4 setup that pairs with it: LLMO: AI Search Optimization.