That’s a lot of questions! 🙂 I don’t know whether you saw the MuttMix project’s [FAQs](https://iaabcprojects.org/faq/) page–it actually does have answers to several of them, such as how were the reference panel breeds chosen and where did that data come from, how do the genotyping and low-coverage sequencing work, how accurate is the breed-calling algorithm and how was that determined, what happens if one of the breeds in a dog isn’t included in the reference panel, etc. Not as much detail as you’re looking for in some cases, but there are answers there.
I know your post wasn’t in response to me, but I’m maybe a little puzzled as to what your intended line of questioning is? Those are good questions about the diversity of the reference panel, but at some points it almost comes across as if you’re making a kind of all-or-nothing argument, where either the reference panel must include all possible genetic signature variants across all regional strains and all recognized subtypes of every known breed, or else the results will be gibberish. But there’s an awful lot of middle ground between those things. If a test dog happens to be, say, 50/50 McNab/Dutch Shepherd (highly unlikely since both are rare, but just as an example), and neither breed was in the reference panel, then the results won’t be right-on-the-money accurate, but that doesn’t mean they could come out as any old random hodgepodge. The dog might for example get called as BC/GSD (very close relatives of those two breeds, with extensive haplotype sharing), or as those two + some % “No Call,” or as those two + some % [other closely related breeds] + some % “No Call,” or simply as “No Call,” depending not only on the diversity of the reference panel, but also on the algorithm’s probabilistic modeling and the quantity and ancestry-informative value of the markers queried for the genotyping/sequencing. But the dog certainly wouldn’t get called as equal parts Poodle, Dal and Pug (no matter which currently available breed test you’re talking about); no way is there going to be sufficient matching for calls like that. If instead it were a megamutt, and Poodle, Dal and Pug were just calls on single-digit-percentages’ worth of its DNA, then in that case way-off-base false positives become significantly more possible, since with distant purebred ancestors like that you’re reduced to making inferences from tiny portions of DNA.
With Embark and Wisdom Panel results, off the top of my head, I’ve seen ABCA Border Collies test as purebred Border Collies, NSDR-registered ACDs test as purebred ACDs (no NSDR dogs were in the reference panels), UKC-registered APBTs test as AmStaffs (both tests have since added APBT to their reference panels), and an FCI-registered Saluki from Bahrain test as high-content Saluki with some “mixed”/indeterminate. (Embark also has the nice feature of including village/pariah dogs from several international locations in their reference panel, which in some cases has enabled them to ID imported street dogs from Asia and Europe that had come back 100% “mixed”/indeterminate with Wisdom Panel.) These are all just anecdotes, but the point is, while there’s no one-size-fits-all answer to “What would happen if…,” neither does any testing service’s algorithm default to eeny-meeny-miny-moe when slam-dunk matches are lacking.
Regarding using BYB dogs in a reference panel, I’d imagine that would be highly inadvisable in most breeds’ cases, due to increased risk that some of your samples won’t really be purebred.
Having been a shelter volunteer for 23 years, my own experience is that shelter mix guesses are more often than not based on the crudest, most seat-of-the-pants visual assessments–and we don’t, of course, make any attempt whatsoever to seriously consider the possibility that Happy might have 4+ breeds in him when formulating guesses; there’s no way the human brain can simultaneously evaluate the likelihood of that many theoretically possible combinations for producing a dog with the cumulative assortment of separately inherited physical and behavioral traits that you see in front of you. In reality, even just a basic knowledge of coat genetics alone will immediately make evident how many of the breed mixes you’ll see ascribed on e.g. Petfinder are at best incomplete, if even partially correct. (To turn your line of questioning around: Does the average shelter worker know what other patterns and colors BCs come in besides dominant black with Irish spotting–or even that those two are separately inherited traits, which are in turn masking other heritable colors? Does s/he know the difference between a field-bred Lab and a bench-bred Lab in conformation? Does s/he realize that 1st-gen Lab mixes are almost always black regardless of the Lab parent’s color, that 1st-gen Poodle mixes usually have wire coats that can make them dead ringers for terrier mixes, that the unique Dalmatian spotting pattern is recessive so a 1st-gen Dal mix can’t express it, that a 1st-gen Pug mix can’t be black-and-tan? No to every one of those, at least in every shelter I’ve ever volunteered at–and if s/he doesn’t know those things, what are the odds that s/he has any meaningful idea at all what other potential mixes of “Top 50” breeds might produce a dog that looks like Happy?)
I’d love to think that years of working with assorted shelter mutts and purebreds has made me an expert at ID’ing mixes by looks alone, but in reality it’s only driven home for me how hopeless that endeavor is, unless *maybe* we’re talking picking up on something with a really extreme phenotype being in the mix (sighthound, Chow, Pug, to some degree pit types)–and even then, at best that only tells me one breed or type in there. And mutts can only beget mutts, so yes, it’s entirely plausible that the statistically typical US shelter mutt would be a multimix, even when it does happen to have one purebred parent.
IMO, one really cool thing about this study is the wealth of new ancestry-informative data the purebreds in it will contribute to future reference panels, due to the low-coverage sequencing.