Supplementary information
Title: Canine transmissible venereal tumor genome reveals ancient introgression from coyotes to pre-contact dogs in North America

Xuan Wang1, Bo-Wen Zhou1,2, Melinda A. Yang3, Ting-Ting Yin1, Fang-Liang Chen4, Sheila C. Ommeh5, Ali Esmailizadeh6, Melissa M. Turner7, Andrei D. Poyarkov8, Peter Savolainen9, Guo-Dong Wang1,2,10, Qiaomei Fu3,11, Ya-Ping Zhang1,2,10

1State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
2Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650223, China
3Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, IVPP, CAS, Beijing, 100044, China
4Kunming Police Dog Base of the Ministry of Public Security, Kunming, 650204, China
5Animal Biotechnology Group, Institute of Biotechnology Research, Jomo Kenyatta University of Agriculture and Technology, Nairobi 00200, Kenya
6Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, PB 76169-133, Iran
7Department of Forestry and Environmental Resources, Fisheries, Wildlife, and Conservation Biology Program, North Carolina State University, Raleigh, NC 27695, USA
8Severtsov Institute of Ecology and Evolution, Russian Academy of Science, Leninskiy prospect, 33, Moscow, 119071, Russia
9Department of Gene Technology, KTH-Royal Institute of Technology, Science for Life Laboratory, Tomtebodavägen 23A, Solna, 17165, Sweden
10Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
11Center for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Beijing, 100044, China

These authors contributed equally: Xuan Wang, Bo-Wen Zhou, Melinda A. Yang

Correspondence: Guo-Dong Wang (wanggd@mail.kiz.ac.cn) or Qiaomei Fu (fuqiaomei@ivpp.ac.cn) or Ya-Ping Zhang (zhangyp@mail.kiz.ac.cn)

File Description: Supplementary Note, Supplementary Methods, Supplementary References and Supplementary Figures
Supplementary Note
Recent whole genome sequencing (WGS) studies of ancient and modern canids all
indicate an intricate evolutionary history with high population
turnover1-9. These findings are in large part due to increased
availability of ancient genomic sequences, which are a powerful resource for
elucidating past demographic history of different species. Ancient genomic studies
have extensively refined our understanding of genetic history and adaptive evolution
in humans10, 11 and the evolutionary history of domestication in livestock12 and crops13.
Thus, the sequencing of more ancient canid genomes will play a pivotal role in
clarifying the genetic history of canine evolution and dog domestication14. However,
ancient genomes need not always derive from past specimens.
The canine transmissible venereal tumor (CTVT) is the oldest known somatic cell line,
originating from cancer cells transmitted from a host to other canids during the mating
process15. Since it was shown ten years ago that living cells from an ancient host
could be transmitted among canids, the origin of CTVT has been studied
continuously16. Multi-locus genetic analyses indicate that all CTVT cells are derived
from a single neoplastic clone of an original founder individual that lived many
generations ago17. Thus, CTVT cells can be treated as “living fossils”, whose genetic
material can provide insight on the founder and its population. Studies first narrowed
the CTVT founder (the original canid infected with CTVT) to a spitz type dog or
wolf17, and later phylogenomic analyses further indicated that the CTVT founder is
potentially an Arctic sled dog15, 18. However, horizontal transfer of mitochondrial
DNA (mtDNA) from infected dogs into the CTVT cells has occurred at least five
times19, making the maternal genealogy of the CTVT founder untraceable. Recent
comparison of the CTVT genetic data with a more comprehensive canine reference
panel including pre-contact dogs (PCDs) from North America argued that the CTVT
founder is the closest detectable lineage to PCDs, and that this clade possessed
introgression from wild canids in North America1.
However, these previous studies may not have accounted for several potential biases in
the genotyping methods for CTVT samples and the strategy for selecting loci. First,
contamination by host cells dilutes the proportion of reads from tumor cells, and massive
copy number variation (CNV) resulting from chromosomal instability in early somatic
evolution15, 20, 21 can deplete (deletion) or enrich (duplication)
reads from tumor cells. These factors can lead to genotyping errors when using direct
germline calling methods or a rigid variant allele fraction (VAF) interval1, 15, 18
(Supplementary information, Methods). Second, each CTVT genome represents a
complex mixture of entities – systematic errors, alleles inherited by the founder,
lineage-specific somatic mutations, and earlier somatic mutations. Somatic mutations
resulting in polymorphic genotypes help to test whether tumor cells originated from a
single founder or multiple clonal origins, as observed for facial tumors found in
Tasmanian Devils22, but their inclusion biases evolutionary analyses. For instance, in
phylogenetic analyses, branch lengths18 are overestimated and the likelihood of long
branch attraction increases1. In population genetic analyses, somatic mutations add
extreme outliers in a principal component analysis (PCA)1 and potentially affect the
significance of several statistical tests due to increased sharing between somatic
mutant alleles and unrelated germline populations. Previous studies, while aware of
this complexity, still used multiple CTVT samples both to confirm that these samples
have a single origin and, in the same genetic analyses, to search for the origin of the
founder alongside germline samples1, 15, 18. Third, uneven sample sizes and
different levels of sequencing depth of the reference panel may also bias tests of
admixture23. Previously used genomic reference panels often contain unbalanced
sample sizes for different sub-populations of dogs and wild canids. Moreover, most village
dogs24, except those in East Asia, were sequenced to a low-to-moderate depth of less
than 10×.
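The first bias can be illustrated with a minimal sketch of how host contamination and local copy number shift the expected variant allele fraction at a site. The function name, its parameters, and the example values below are illustrative assumptions and are not taken from the study's pipeline.

```python
def expected_vaf(purity, tumor_alt_copies, tumor_total_copies,
                 host_alt_copies, host_total_copies=2):
    """Expected fraction of reads carrying the alternate allele.

    purity: fraction of cells in the biopsy that are tumor cells (0..1).
    Copy numbers are per cell; the host is assumed diploid by default.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    alt = purity * tumor_alt_copies + (1 - purity) * host_alt_copies
    total = purity * tumor_total_copies + (1 - purity) * host_total_copies
    if total == 0:
        raise ValueError("no DNA copies at this site")
    return alt / total


# A tumor-heterozygous site (1 of 2 copies) in a pure tumor sample.
pure = expected_vaf(1.0, 1, 2, host_alt_copies=0)
# The same site with 30% host cells that are homozygous reference.
contaminated = expected_vaf(0.7, 1, 2, host_alt_copies=0)
# The same site after the tumor lost its reference copy (1 of 1 copies).
deleted = expected_vaf(0.7, 1, 1, host_alt_copies=0)
```

Under these assumed values the same tumor genotype yields different read fractions depending on purity and copy number, which is why a single fixed VAF interval can misclassify sites.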
We collected new CTVT samples and modern canids, and then used a newly developed
tool and a refined strategy to address these biases. The two new CTVT samples used in the
present study, named KM1 and KM2, were obtained in Kunming, China
(Supplementary information, Fig. S1 and Table S2). WGS was performed on host
(20× depth) and tumor (40× depth) tissues of each sample. We also included WGS
data for three previously published CTVT samples from Australia, Brazil15, and
Gambia1. Together, these five CTVTs from four continents allow us to exclude
lineage-specific somatic mutations.
Along with the accumulation of genome-wide canine data, estimation of CTVT’s
origin has gradually improved1, 15, 17, 18. However, as mentioned in previous studies15,
18, the integrity of the reference panel may influence how accurately CTVT’s origin
can be estimated. As CTVT has evolved for thousands of years, comparison to ancient
canine samples is most useful for directly tracing its origin, but despite great advances,
ancient DNA is still difficult to sequence. Instead, village dogs can be alternative
genetic proxies of ancient populations from the last millennium, assuming no
dramatic population exchange happened. We collected village dogs from diverse
geographical regions to avoid the influence of population exchange and admixture in
some “ancient” breeds by European dogs during colonization. Closely related wild
canids, such as gray wolves, coyotes, and golden jackals, are also useful to test
whether CTVT originated from wild canids or possesses partial genetic ancestry from
them. Thus, we collected some wild canine samples to build a more integrated genetic
panel. In total, we additionally collected 22 canids from around the world
(Supplementary information, Table S1), including two golden jackals (Canis aureus)
from western Russia, one coyote (Canis latrans) from western North America, two
gray wolves (Canis lupus) from Iran, and village dogs (Canis lupus familiaris) from
East Asia, the Indian Peninsula, Central Asia, the Middle East, and Africa. Most of
these canids were sequenced to an average depth of 20× (Supplementary information,
Table S1). To maximize the spatial and temporal resolution of canine genetic
diversity, we selected another 81 canids from previously published samples (strategy
of selection is described in Supplementary information, Methods). Present-day
samples include coyotes from North America25, worldwide gray wolves4, 6, 8, 18, 25-27,
village dogs6, 28, European breed dogs18, 27, and breed dogs from other regions6, 8, 18, 24,
28-32. We also included seven ancient North American dogs1, three ancient European
dogs4, 5, and the ~34,900-year-old Taimyr wolf7 (Supplementary information, Table
S1, locations of dogs are depicted in Fig. 1f).
Using the filtering, mapping, and single nucleotide polymorphism (SNP) calling
procedures described in the Supplementary information, Methods, we jointly called
24.1M SNPs from 92 present-day canids. The eleven ancient canids were ascertained
and genotyped at these 24.1M SNPs using previously developed methods4.
Collectively, the 24.1M SNP panel for these 103 modern and ancient canids was
used as our reference panel to study the genetic ancestry of the CTVT founder. This
reference panel is much denser than those of previous studies, giving us the opportunity
to build a refined picture of the genetic ancestry making up CTVT’s genome.
Next, we performed ploidy, contamination (cellularity), and CNV analyses of the
CTVT samples using sequenza33, a grid-based Bayesian estimation method.
The ploidy of the five CTVTs ranged from 1.8 to 2, a very narrow interval
(Supplementary information, Fig. S2), confirming that no drastic chromosomal change
at the whole-genome level occurred during worldwide dispersal21. The CNV profiles
of the five CTVTs showed a conserved pattern similar to that reported
previously15, 21 (Fig. 1g), suggesting that the five CTVT samples had a single origin.
Methods for studying somatic mutations keep advancing as the fields of mutational
mechanisms34, 35, intratumor heterogeneity36-38 and clonal evolution39, 40 in human
tumors progress. As chromosomal instability is considered the predominant somatic
mutational type in the tumorigenesis of CTVT20, the CNV profile is necessary to
determine the genotype at local sites. Thus, we developed a method, the transmissible
tumor genotyper (ttgeno), which is the first genotyping tool designed specifically to
analyze whole genome sequencing data from paired transmissible tumors and their
hosts, to obtain per-site allelic copy number of the tumor (Supplementary information,
Methods). This tool simultaneously takes into account the ploidy, contamination,
local copy number state of both host and tumor, and small indels in the tumor. It does
not model sub-clonal populations, as previous studies have shown that CTVT is
almost homogeneous15, 20. We genotyped each CTVT using this tool,
obtaining successful genotyping rates from 95.5% to 97.4%.
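The idea of calling a per-site allelic copy number from paired tumor and host data can be sketched as follows. This is a toy maximum-likelihood caller written for illustration only; it is not ttgeno, and the function names, the binomial read model, and the example read counts are all assumptions.

```python
import math


def binom_loglik(k, n, p):
    """Log binomial likelihood of k alt reads out of n, with p clamped away from 0 and 1."""
    p = min(max(p, 1e-6), 1 - 1e-6)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def call_tumor_alt_copies(alt_reads, depth, purity, tumor_cn,
                          host_alt_copies, host_cn=2):
    """Return the tumor alt-allele copy number (0..tumor_cn) with the highest likelihood.

    purity is the tumor cell fraction, tumor_cn the local tumor copy number,
    and host_alt_copies the alt-allele count in the matched host genotype.
    """
    total = purity * tumor_cn + (1 - purity) * host_cn
    if total <= 0:
        raise ValueError("no DNA copies at this site")
    best, best_ll = None, -math.inf
    for alt_copies in range(tumor_cn + 1):
        alt = purity * alt_copies + (1 - purity) * host_alt_copies
        ll = binom_loglik(alt_reads, depth, alt / total)
        if ll > best_ll:
            best, best_ll = alt_copies, ll
    return best


# 14 of 40 reads are alt; the host is homozygous reference; 70% tumor cells.
het_call = call_tumor_alt_copies(14, 40, purity=0.7, tumor_cn=2, host_alt_copies=0)
# 6 of 40 reads are alt, but the host is heterozygous: the alt reads are
# explained by host contamination, so the tumor is called with 0 alt copies.
host_only = call_tumor_alt_copies(6, 40, purity=0.7, tumor_cn=2, host_alt_copies=1)
```

The second example shows why the host genotype matters: a direct germline caller would likely report a heterozygous tumor site, whereas modelling contamination attributes the alternate reads to the host.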
The genotyped CTVT genome is composed of a mixture of variant classes. These
include systematic errors, alleles inherited by the founder, lineage-specific somatic
mutations, and earlier somatic mutations. Assuming a single origin for CTVT,
lineage-specific somatic mutations can be identified as sites that are genotype-polymorphic
among multiple worldwide CTVT samples. In contrast, alleles inherited by the
founder and earlier somatic mutations should be genotype-monomorphic among
CTVT samples. We found ~1.7G genotype-monomorphic sites, allowing one missing
CTVT sample at each site. Another 2.9M sites were genotype-polymorphic loci
among the five CTVTs, allowing two missing CTVT samples at each site. We used
the genotype-polymorphic sites to assess the relationship between these five CTVTs
(Supplementary information, Fig. S3) and excluded these from subsequent analyses.
However, the remaining ~1.7G of genotype-monomorphic sites can either contain an
inherited germline allele that originated from the CTVT founder, or an early somatic
mutation that was not in the CTVT founder and arose before the CTVT became
widespread. We hypothesized that SNPs that are both genotype-monomorphic in the
CTVT samples and polymorphic in the reference panel (i.e. biallelic intersection) are
likely inherited germline polymorphisms, whereas private genotype-monomorphic
alleles found only in the CTVT samples are likely early somatic mutations. We found
that of the ~1.7G sites that were genotype-monomorphic in five CTVT samples,
17.4M sites (2M non-reference alleles) were biallelic and polymorphic in the reference panel,
while 1.5M sites were private to CTVT samples. However, although we restrict our study
to those loci that are genotype-monomorphic in five CTVT samples but polymorphic
in the reference panel, alternative possibilities remain: 1) Some genotype-
monomorphic loci private to CTVT may in fact be germline polymorphic loci
belonging to an ancient population that has undergone drift. 2) Some CTVT alleles
at the 17.4M biallelic intersected sites may be early somatic mutations. For instance,
some polymorphic loci in the reference panel may have mutated more recently than
the time the CTVT founder lived and by chance matched early somatic mutations in
the CTVT lineage. 3) Some genotype-monomorphic loci may contain early somatic
alleles that mutated to alleles observed in coyotes and golden jackals.
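The site-selection logic described above can be summarized in a short sketch. The genotype encoding (strings such as "A/G"), the category labels, and the function name are hypothetical choices made for illustration; only the missing-sample thresholds (one for monomorphic sites, two for polymorphic sites) come from the text.

```python
def classify_site(ctvt_genotypes, panel_alleles):
    """Classify one site from CTVT genotype calls and reference-panel alleles.

    ctvt_genotypes: one call per CTVT sample, e.g. "A/G", or None if missing.
    panel_alleles: set of alleles observed at this site in the reference panel.
    """
    called = [frozenset(g.split("/")) for g in ctvt_genotypes if g is not None]
    missing = len(ctvt_genotypes) - len(called)
    if len(set(called)) > 1:
        # Genotypes differ among CTVTs: candidate lineage-specific somatic site.
        return "polymorphic" if missing <= 2 else "unresolved"
    if not called or missing > 1:
        return "unresolved"
    ctvt_alleles = called[0]
    if ctvt_alleles - panel_alleles:
        # CTVT carries an allele never seen in the panel: candidate early somatic site.
        return "private"
    if len(panel_alleles) == 2:
        # Monomorphic in CTVT and biallelic in the panel: candidate inherited site.
        return "intersected"
    return "other"


inherited = classify_site(["A/G"] * 5, {"A", "G"})
private = classify_site(["A/T", "A/T", "A/T", "A/T", None], {"A"})
lineage = classify_site(["A/A", "A/G", "A/A", None, None], {"A", "G"})
```

Real CTVT genotypes carry allelic copy numbers rather than diploid calls, so this sketch simplifies the comparison to the set of alleles present.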
Different mutagenic processes often generate different characteristic imprints, which
are combinations of mutation types, termed “signatures”35, 41. We assessed the extent
of somatic and germline mutations contained in the 1.5M genotype-monomorphic loci
private to CTVT samples, and the 17.4M biallelic intersected sites, testing our
assumption that the latter are mostly germline mutations. To do so, we performed an
analysis of mutation signatures using signeR42, a method based on an empirical
Bayesian treatment of the non-negative matrix factorization model of mutational spectra.
We did not distinguish ancestral and derived alleles for the variants, so mutation
signatures were used to determine relative similarities or differences among samples,
rather than to represent past mutagenic mechanisms. Eight signatures were estimated
from the 96-channel trinucleotide mutational spectrum matrix at the 17.4M intersected sites for
both CTVT and the samples in the reference panel, and for the 1.5M CTVT-private
alleles (Supplementary information, Fig. S4 and Bayesian Information Criterion in
Supplementary information, Fig. S5). The contribution of each signature in all
samples reveals that Signature8, which is similar to the C>T signature found in
CTVT in previous studies15, 18, is enriched in the CTVT-private alleles (96.7%), but
contributes less than 11.3% in all germline samples (Fig. 1a). Among the other signatures, Signature1,
Signature2, and Signature7 are enriched in golden jackals, and Signature4 and
Signature5 are enriched in coyotes (Supplementary information, Figs. S6-S7). For the
set of 17.4M intersected loci in CTVT, the contribution of Signature8 is only 9.5%,
and the contributions of most other signatures do not differ from those found for other dogs (P>0.05,
Kruskal-Wallis rank sum test). The exception is Signature6, which is found in a
diverse set of dogs, and for which statistical power may be lacking because
only one sample represents the CTVT at these loci (Supplementary information, Figs.
S6-S7). Unsupervised clustering of samples based on the relationship of these eight
signatures reveals that different species and populations form general clades with a
few exceptions, and the CTVT of the 17.4M intersected loci clusters into a clade
composed of dogs (Fig. 1b). Recent studies about the mutation rate and mutational
mechanisms reflected by mutation signatures show that these signatures can
distinguish between somatic and germline mutations41, 43 among different species43
and between different germline populations44-47, a pattern we also find here (Fig. 1b).
These results indicate that most genotype-monomorphic sites in the CTVT genome that
are polymorphic in the reference panel are inherited germline SNPs. Thus, we treated
the alleles at the 17.4M sites as directly inherited from the CTVT founder and used these sites in
subsequent population genetic analyses. Only a small proportion (3.3%) of germline
signatures was estimated in the CTVT-private alleles, so the loss of these loci in
subsequent analyses is negligible, with 17.4M SNPs remaining for inferring the
genetic ancestry of CTVT. We describe the pipeline from genotyping to locus selection
in Fig. 1c. We emphasize that our pipeline improves on previous approaches by excluding bias from
genotyping errors and somatic mutations and by constructing a hypothetical ancient
canid, “the CTVT founder”.
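For readers unfamiliar with the 96-channel spectrum that signature methods take as input, the following minimal sketch tabulates single-nucleotide variants by substitution type and flanking bases. It covers only the construction of the spectrum, not the signeR factorization, and the function names and tuple layout are illustrative assumptions. As in the analysis above, the substitution is read as reference>alternate rather than ancestral>derived.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def channel(five_prime, ref, alt, three_prime):
    """Return the pyrimidine-centred channel label, e.g. 'A[C>T]G'."""
    bases = (five_prime, ref, alt, three_prime)
    if ref == alt or any(b not in COMPLEMENT for b in bases):
        raise ValueError("expected a single-base substitution over A, C, G, T")
    if ref in "AG":
        # Purine reference: describe the variant on the reverse-complement strand.
        five_prime, ref, alt, three_prime = (
            COMPLEMENT[three_prime], COMPLEMENT[ref],
            COMPLEMENT[alt], COMPLEMENT[five_prime])
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


def spectrum(variants):
    """Count variants per channel; variants are (5' base, ref, alt, 3' base) tuples."""
    return Counter(channel(*v) for v in variants)


# A C>T change in an ACG context and a G>A change in a CGT context
# describe the same event on opposite strands and share one channel.
counts = spectrum([("A", "C", "T", "G"), ("C", "G", "A", "T"), ("T", "T", "C", "A")])
```

Six substitution types times sixteen flanking contexts give the 96 channels; one such count vector per sample forms a row of the matrix that is then factorized into signatures.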
In previous analyses, CTVT grouped most closely with, but not within, pre-contact dogs
(PCDs) from North America1. In a phylogenetic analysis of the CTVT founder with
our comprehensive canine reference panel, the topology of the maximum-likelihood (ML)
tree (Fig. 1d) shows that the CTVT founder clusters among several PCDs. Specifically, the
CTVT founder is placed closest to the Baum Village (SAMEA104190271) and Uyak
(SAMEA104190274) dogs. The PCD clade then groups with modern Arctic sled dogs
(ASDs), confirming that PCDs and ASDs share a common ancestor1. However, our
results do not support the monophyly of all PCDs1. In particular, we
noted that the Koster dog (SAMEA104190270) is basal to all other dogs, indicating that
this PCD may possess ancestry from a wild canid.
Our whole genome sequencing data also support the PCD/ASD clade as the most
basal dog lineage, in agreement with Ní Leathlobhair et al.1 but not with other studies in which southern
East Asian dogs (EADs) are the most basal4, 6. Ní Leathlobhair et al. proposed that the
discrepancy may be due to post-divergence gene flow from European dogs (EUDs) to
specific EADs. However, exclusion or inclusion of the putative admixed EADs in
neighbor-joining (NJ) trees or ML trees did not result in any distinct changes
(Supplementary information, Figs. S8-S10). When we excluded PCDs and the CTVT
founder, EADs became the most basal clade in the NJ tree, rather than ASDs
(Supplementary information, Fig. S12). These results suggest that PCDs are the main
source of uncertainty in the phylogeny. If the PCD clade carries introgression from wild
canids, this clade, together with ASDs, could be pulled to a position basal to EADs. We test
this hypothesis in subsequent analyses.
We additionally included the published genomes of three ancient European dogs (NGD,
Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog) in our reference
panel4, 5. We found that HXH and NGD clustered together in the EUD clade, while
CTC is outside of the EUD clade. As previous studies have shown, HXH and NGD
are thought to be derived almost completely from ancient EUDs, while CTC has a
mixture of European and southern East Asian-related ancestry4, 6, 48. Three modern
“American native” breeds (chihuahua and two hairless dogs) also cluster into the
EUD clade (Fig. 1d), which suggests that these three breeds carry ancestry related to
EUDs. Our results are consistent with a population history in which EUDs almost
completely replaced Native American dog lineages1.
PCA excluding the golden jackals shows that PC1 (4.62% of variation) distinguishes
all coyotes and gray wolves from all dogs, and PC2 (3.35% of variation) distinguishes
coyotes from gray wolves, as well as different sub-populations within dogs (Fig. 1b).
We found that the CTVT founder is closest to the PCD cluster. The Taimyr wolf
(TMR) is located between PCDs and worldwide wolves, supporting placement of the
TMR at the split between the dog and wolf lineage7. Ancient European dogs (HXH
and NGD) and modern American dogs fall mostly within an EUD cluster. All
worldwide dogs cluster relatively tightly, but PCDs, including the CTVT founder,
cluster separately from the dog cluster and more closely to wild canids, suggesting
that PCDs may have admixed with wild canids in North America. Our PCA results
generally agree with the ML tree and previous results1, although the dog closest to
the CTVT founder in the PCA, the Port au Choix dog (SAMEA104190273;
Supplementary information, Fig. S13), differs from the closest dogs in the ML tree.
To validate the above phylogenetic relationships, we then examined the
genetic relatedness between the CTVT founder and each dog in the reference panel by
computing outgroup f3-statistics49, 50, using coyotes as the outgroup. Higher f3 values
indicate increased shared drift between the samples, and therefore higher genetic
similarities. The CTVT founder showed the most genetic similarity with the pre-
contact Port au Choix dog, followed by the pre-contact Uyak dog, and then by most
other PCDs and ASDs (Fig. 1f). In particular, the Koster dog is not very genetically
similar to the CTVT founder, but it does share high genetic similarity with coyotes.
Overall, our results further support our phylogenetic and principal component
analyses, where we found that the CTVT founder is genetically most similar to PCDs.
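As a minimal sketch of the quantity being compared, the outgroup f3-statistic f3(O; A, B) can be written as the average over sites of (o - a)(o - b), where o, a, and b are allele frequencies in the outgroup and the two test samples. The function name and the toy frequencies below are assumptions for illustration, and the sketch omits the finite-sample bias correction and the standard-error estimation that dedicated software applies.

```python
def outgroup_f3(freq_out, freq_a, freq_b):
    """Mean of (o - a) * (o - b) across sites: drift shared by A and B since the outgroup."""
    if not freq_out or not (len(freq_out) == len(freq_a) == len(freq_b)):
        raise ValueError("need equal-length, non-empty frequency lists")
    total = sum((o - a) * (o - b) for o, a, b in zip(freq_out, freq_a, freq_b))
    return total / len(freq_out)


# Toy allele frequencies at four sites (hypothetical values).
outgroup = [0, 0, 0, 0]
founder = [1, 1, 0, 0]
dog_x = [1, 1, 1, 0]   # shares two derived alleles with the founder
dog_y = [0, 0, 1, 1]   # shares none

f3_x = outgroup_f3(outgroup, founder, dog_x)
f3_y = outgroup_f3(outgroup, founder, dog_y)
```

In this toy case dog_x receives the higher value, matching the interpretation in the text that a larger f3 reflects more drift shared with the CTVT founder.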
To test whether admixture exists in PCDs and the CTVT founder, we
performed unsupervised ancestry clustering analyses with ADMIXTURE51 on
pruned loci of the reference panel with the CTVT founder, excluding two golden
jackals (K ranges from 2 to 7, Supplementary information, Fig. S14a, and coefficient
of variation in Supplementary information, Fig. S14b). At K=2, coyotes and gray
wolves separate from all dogs, while all PCDs and the CTVT founder possess
notable amounts of ancestry that is also present in modern coyotes and gray wolves.
At K=3, coyotes separate from gray wolves, with wolves from the Qinghai-Tibetan
Plateau possessing ancestry related to modern coyotes. PCDs and the CTVT founder
still possess ancestry from coyotes and gray wolves. At K=4, all dogs split into two
major clades representing the Eastern and Western Eurasian lineages for dogs. All
PCDs and the CTVT founder cluster into the Eastern Eurasian lineage, further
supporting an East Eurasian origin for PCDs. At K=5, dogs from the Indian Peninsula
separate from Western Eurasian dogs. Ancestry related to Indian dogs also appears in
some dogs from East Asia, the Middle East, and Central Asia, potentially due to
admixture with Indian dog populations. At K=6, the PCDs, the CTVT founder, and
the modern ASDs form a single clade, separating from southern East Asian dogs,
indicating that they share a close genetic relationship. At K=7, African dogs separate
from West Eurasian dogs. In summary, all results support the population structure
observed in our previous phylogenetic and principal component analyses. We find
that the main ancestral component found in the CTVT founder is associated primarily
with PCDs with minor components found predominantly in wild canine populations.
Specifically, we find that the Koster dog possesses a main ancestral component from
gray wolves, consistent with results suggesting that the Koster dog is the most basal
branch of all dogs on the phylogeny and the most related to wild canids in the PCA.
One possibility is that the Koster dog descends from recent backcrosses to dogs after
initial hybridization with American wild canids. We also find that the three modern
“American native” breeds all possess European-like genetic ancestry. Finally, we
find that the ancestral components of the three ancient European dogs are
consistent with the results of Botigué et al.4, indicating continuity of European-like genetic
ancestry from the Neolithic period to modern dogs.
To further investigate whether the CTVT founder and PCDs experienced
introgression from a population distantly related to dogs, we calculated D-statistics49
to test whether significant asymmetry (positive D value, Z>3) exists between Pop1
and Pop2 using the form D(Pop1, Pop2; Candidate Introgressor, Outgroup). As
extensive gene flow exists in the genus Canis52, we used an ~11× Andean fox
(Lycalopex culpaeus) genome24 as the outgroup, which we genotyped at the 17.4M
intersected SNPs, randomly calling alleles at heterozygous sites to account for low
depth. We tested every non-dog group as a candidate introgressor for the CTVT
founder using D(CTVT founder, Pop2; Introgressor, Andean Fox), where Pop2 was
each canid population in turn (Supplementary information, Fig. S15 and Table S3).
Only coyotes were found to be a robust candidate introgressor. Coyotes from
Monterey showed significantly positive D-statistics for most Pop2 populations except
the other coyotes, New World wolves, and PCDs (Z>3.7). Coyotes from California
and Alabama are also potential candidate introgressors, but modern coyotes from the
Midwest, Ohio, and Florida were not robust introgressors, with several D(CTVT
founder, Pop2; Introgressor, Andean Fox) ~0. We found that coyotes from the
Midwest, Ohio, and Florida are closer to wolves and dogs than coyotes from the
Monterey area, California, and Alabama (Supplementary information, Table S3),
indicating potential gene flow among American canids53-55 or ancestral population
structure among coyotes. Similar to previous analyses1, we found that two PCDs (i.e.
Port au Choix, Weyanoke Old Town) showed significantly positive D(CTVT founder,
Pop2; PCD, Andean Fox) statistics for all Pop2 populations (Z>46), indicating the
close relationship between the CTVT founder and PCDs in our panel. Taken together,
the CTVT founder is likely an ancient American dog with introgression from
populations carrying ancestry related to coyotes from the Monterey area, California,
and Alabama. We also tested whether other dogs (Pop1) possessed introgression from
coyotes by using D(Pop1, Pop2; Coyote, Andean Fox), where Pop2 was tested using
all other groups in turn (Supplementary information, Fig. S16). We found no evidence
of introgression from coyotes in any dog population except PCDs and the CTVT
founder. Due to the CTVT founder’s high coverage, we used it as a surrogate for
PCDs to test whether any other canids carry ancestry from PCDs (Supplementary
information, Fig. S17). Only Arctic sled dogs in North America show more similarity
to PCDs, followed by Siberian and Alaskan huskies. However, whether asymmetric
D-statistics indicate introgression from closely related populations or an inherited
relationship cannot be determined without high-density sampling of ancient and
modern PCDs and ASDs over a broad geographical region and time frame.
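The form D(Pop1, Pop2; Candidate Introgressor, Outgroup) can be sketched from allele frequencies as below, with a positive value indicating excess allele sharing between Pop1 and the candidate introgressor. This is an illustrative sketch only: the function name and toy data are assumptions, and the standard error uses a simple unweighted delete-one jackknife over equal-sized runs of sites, whereas published analyses typically use a weighted block jackknife over genomic blocks.

```python
import math


def d_statistic(p1, p2, p3, out, n_blocks=5):
    """Return (D, standard error) for D(P1, P2; P3, Outgroup) from allele frequencies."""
    n = len(p1)
    if not (n == len(p2) == len(p3) == len(out)) or n_blocks < 2 or n < n_blocks:
        raise ValueError("need equal-length inputs with at least n_blocks sites")
    # Per-site numerator (BABA minus ABBA) and normalizing denominator.
    num = [(a - b) * (c - o) for a, b, c, o in zip(p1, p2, p3, out)]
    den = [(a + b - 2 * a * b) * (c + o - 2 * c * o)
           for a, b, c, o in zip(p1, p2, p3, out)]
    total_num, total_den = sum(num), sum(den)
    if total_den == 0:
        raise ValueError("no informative sites")
    d = total_num / total_den
    size = n // n_blocks
    leave_one_out = []
    for i in range(n_blocks):
        lo = i * size
        hi = n if i == n_blocks - 1 else lo + size
        rest_den = total_den - sum(den[lo:hi])
        if rest_den == 0:
            raise ValueError("a jackknife replicate has no informative sites")
        leave_one_out.append((total_num - sum(num[lo:hi])) / rest_den)
    mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((x - mean) ** 2 for x in leave_one_out)
    return d, math.sqrt(variance)


# Toy data: four sites where P1 shares the P3 allele, two where P2 does.
pop1 = [1, 1, 1, 0, 1, 0]
pop2 = [0, 0, 0, 1, 0, 1]
introgressor = [1, 1, 1, 1, 1, 1]
outgroup = [0, 0, 0, 0, 0, 0]
d, se = d_statistic(pop1, pop2, introgressor, outgroup, n_blocks=3)
```

The Z-score reported in the text corresponds to D divided by its standard error; with only six toy sites the value here is not meaningful, and swapping Pop1 and Pop2 simply flips the sign of D.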
To confirm the introgression from coyotes into the CTVT founder indicated by the
D-statistics analyses, we used coyote-specific diagnostic alleles53, fd-statistics56,
and fdM-statistics57 in sliding windows, as well as RFMix58, to infer the local ancestry
in the genome of the CTVT founder (Fig. 1g). The results were consistent
across these methods, with several regions introgressed from coyotes. From the RFMix
estimation, the estimated proportion of introgression from coyotes is ~0.9%. We also
used the F4-ratio test49 to estimate the proportion of coyote-related ancestry in the
CTVT founder and dogs sampled from America. We found the introgressed
proportion is 2.6%±0.5% (Z=5.735) for the CTVT founder and 4.9%±0.5%
(Z=10.334) for the Port au Choix dog, whereas the proportion is negative for the
Greenland dog and close to zero for the Alaska malamute, Mexico naked, and
Peruvian naked dog. The proportion inferred by RFMix is smaller than the F4-ratio estimate,
likely because RFMix finds local ancestry segments using a smoothing algorithm suited to a
limited number of generations after initial hybridization. These results reveal introgression from
coyotes into PCDs, but none in later introduced American dogs. We also identified
introgressed regions from New World wolves (NWW, 1.9%, physical length
proportion) in the genome of the CTVT founder when three ancestral references were
used in RFMix, supported by an F4-ratio estimate of 6.2%±1.1% (Z=5.742). Similar
to coyote introgression, the Port au Choix dog also has a higher admixture proportion
from NWW (F4-ratio, 12.0%±1.1%, Z=10.205). The extensive gene flow between
wild canids and PCDs may reflect overlap in the ecological habitats of PCDs and
coyotes.
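The F4-ratio estimate of an admixture proportion can be sketched as a ratio of two f4-statistics. The population roles below (a relative of the source, an outgroup, the target, the source, and an unadmixed comparison population) follow the general form of the test and are assumptions for illustration; the text does not specify which populations filled each role, and the toy frequencies are invented. Standard errors such as those quoted above would come from a block jackknife, which is omitted here.

```python
def f4(pa, pb, pc, pd):
    """Mean of (a - b) * (c - d) across sites."""
    if not pa or not (len(pa) == len(pb) == len(pc) == len(pd)):
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)


def f4_ratio(relative, outgroup, target, source, unadmixed):
    """Estimate the source-related ancestry proportion in the target.

    alpha = f4(relative, outgroup; target, unadmixed)
            / f4(relative, outgroup; source, unadmixed)
    """
    denominator = f4(relative, outgroup, source, unadmixed)
    if denominator == 0:
        raise ValueError("relative and source share no drift relative to unadmixed")
    return f4(relative, outgroup, target, unadmixed) / denominator


# Toy frequencies: the target is built as 5% source plus 95% unadmixed ancestry.
relative = [0.9, 0.2, 0.7, 0.4]
outgroup = [0.0, 0.0, 0.0, 0.0]
source = [0.8, 0.1, 0.9, 0.3]
unadmixed = [0.1, 0.6, 0.2, 0.5]
target = [0.05 * s + 0.95 * u for s, u in zip(source, unadmixed)]

alpha = f4_ratio(relative, outgroup, target, source, unadmixed)
```

Because the toy target is an exact linear mixture, the ratio recovers the simulated 5% proportion; with real data the estimate also depends on how well the chosen populations match the assumed tree.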
TreeMix59 determines the graph structure of ancestral populations that allows for both
population splits and potential gene flow by using genome-wide allele frequency data
and a Gaussian approximation of genetic drift. We used TreeMix to investigate the
genetic relationship between the CTVT founder, PCDs, other ancient and present-day
canids (Supplementary information, Figs. S18-S21). The ML tree without admixture
(m=0, Supplementary information, Fig. S18a) showed that EADs form the basal clade
of all dogs. Other dogs split into two major clades: one is composed of ASDs and
PCDs in North America, while the other one is composed of Western Eurasian dogs
and African dogs. The topology is consistent with the NJ tree constructed without
PCDs. We observe that the PCD/CTVT founder clade clusters with the Greenland dog
and Alaskan malamute, and this super clade in turn clusters with two kinds of huskies
as a sister clade. This indicates that present-day Arctic dogs in North America may
possess high amounts of genetic ancestry inherited from the ancestral population of
the CTVT founder and sampled PCDs. This result is inconsistent with our
phylogenetic results in the text and other previous results1, but appeared in some
phylogenies when we adjusted samples used in the reference panel (not shown). In
view of this result, the detailed evolutionary history of American dogs from their initial
introduction until the present day is still uncertain, and resolving it may require increased
sampling of ancient and modern American dogs. We visualized the matrix of residuals
(Supplementary information, Fig. S18b) to determine how well the estimated genetic
relationship between each pair of canids fits the model. A high residual indicates that
the pair does not fit the graph model and may be a candidate for an admixture event.
We find three candidate admixture events: 1) between coyotes and the PCD/CTVT
founder, 2) between Siberian and Alaskan huskies, and 3) between Indian and African
village dogs. In a reticulate ML graph allowing three admixture events, a migration
event from the coyote lineage to the PCD/CTVT founder clade is included (Fig. 1h,
matrix of residuals in Supplementary information, Fig. S21b). The other two events
reflect the extensive admixture in Eurasian canids. The topology of the graph
remained unchanged when migration events were included. Thus, several methods
support the presence of gene flow from coyotes into the ancient native dog population
represented by the CTVT founder and PCDs. This reticulate graph is also
concordant with the Out of Southern East Asia hypothesis for living dogs
suggested in previous studies4, 6 (Fig. 1h), in which East Asian dogs are the basal clade
of all dogs, and two major superclades are found in the dog phylogeny, representing
two migration routes into the regions of Far East-America and Indian Peninsula-West
Eurasia6.
Due to the low sequencing depth of most PCDs, we included only one or two PCDs in
the statistical analyses used to determine introgression, but the ADMIXTURE results
suggest that introgression was extensive, over a long timeframe and across a broad
region (Supplementary information, Fig. S14a). However, previously published results
on ancient mtDNA of morphologically identified PCDs classified the vast majority as
haplotypes belonging to either dogs or wolves, but not coyotes1, 60-64. The
proportion of coyote introgression in PCDs therefore likely differed between mtDNA and
autosomes. A model-based survey demonstrated that sex-biased
introgression arises when asymmetries exist between the sexes in fitness or mating
behavior in the hybridizing migrant pool or in the source species65. Some studies have
claimed that male-biased introgression from dogs occurred in modern eastern
coyotes53, 66. The sex bias of transient coyotes has not been recorded in such detail67, 68,
which suggests that further work is needed to determine whether transient coyotes are
male-biased. Studies indicate a high level of diversity in bone morphology in early
PCDs69, but whether the coyote-PCD hybridization was found naturally or was
introduced by humans to develop new breeds is still unknown. Classic cases such as
the Tibetan Mastiff acquiring adaptation to hypoxia at high elevations due to
introgression from Tibetan wolves70, 71 also highlight the importance of denser
sampling in the future to study whether any introgressed regions from coyotes are
under selection in PCDs.
Although studies based on mtDNA provide a timeframe for the initial introduction of
PCDs with humans1, 60, the refined demography of American dogs is still
controversial, especially for Arctic dogs in North America1, 61-64, 72. Our results are
also insufficient to resolve this issue. Recent studies reveal a complex population
history of Native Americans73, 74, which suggests that American dogs have complex
histories associated not only with hybridization with wild canids but also with human
migration within the Americas. High quality genome data from PCDs are in high
demand to answer these questions. Thus, the CTVT founder, inferred from the
geographically dispersed CTVT samples, is a useful high-quality proxy for PCDs.
The CTVT-private genotype-monomorphic sites will greatly aid cancer evolution
studies75, and more importantly, the extraction of the CTVT founder genome from
genotype-monomorphic sites in CTVT samples is invaluable to canine population
studies. Thus, we provide the genotype-monomorphic diploidized sites of the five
geographically dispersed CTVTs in the DogGD database of the iDog76 platform for
researchers to conveniently use in future studies.
Supplementary Methods
Sample collection and data set aggregation
Canine transmissible venereal tumor sample collection The two CTVT samples
used in this study (both from male dogs), named KM1 and KM2, were obtained in
Kunming, China (Supplementary information, Fig. S1). Both the tumors and blood
from the host dogs were collected with the approval of their owners. After
cytodiagnosis, the tumors were surgically removed under anesthesia at two separate
animal hospitals and soaked in absolute ethanol. Blood from each host was
collected from a forelimb vein and stored in EDTA-anticoagulation tubes. WGS data
for three published CTVT samples (24T, 79T, 609T) and their corresponding hosts
(24H, 79H, 609H) were also downloaded for use1, 15.
Compiling the reference panel To investigate the genetics of the CTVT founder,
we collected all published canine WGS data to date. Then we selected samples using
criteria to balance the sample size of sub-populations and improve the quality of the panel:
1) If both high depth (>20×) and low depth samples (<10×) exist at one geographic
point, low depth samples were excluded. 2) If the distribution of a sample’s
sequencing depth along chromosomes is not uniform, the sample was excluded. 3) If
close relatives in two generations exist based on kinship coefficients77, only one of
them was retained. 4) If admixed village dogs not from Europe possess extreme EUD
ancestry and cluster with EUDs, photos of these dogs were checked to identify
admixed characters, and these samples were excluded. 5) All ancient samples were
retained. 6) EUDs were selected randomly, but geographically dispersed to a
proportionate sample size compared with other groups. Specifically, we sampled two
golden jackals from Western Russia, as it was previously shown that African and
Israeli golden jackals show admixture related to wolves and dogs, suggesting that they
may be more closely related to wolves and dogs than to coyotes4, 8, 78. Using these
conditions, we included WGS data from 103 canids in the reference panel, containing
worldwide gray wolves (Canis lupus), dogs (Canis lupus familiaris), coyotes (Canis
latrans), and golden jackals (Canis aureus) (Supplementary information, Table S1).
Genomic library construction and sequencing
CTVT samples All genomic DNA was extracted from the CTVT tumors and the
blood of their corresponding hosts using QIAGEN DNeasy Blood & Tissue kit. The
DNA extracts were then sent to Tianjin Novogene Bioinformatics Technology Co.,
Ltd after vacuum freeze-drying for sequencing. Four paired-end libraries (insert size:
250bp, 650bp, 800bp, 650bp) were constructed for DNA extracts from each CTVT
sample and one paired-end library (insert size: 300bp) was constructed for each host.
Sequencing was carried out for paired-end reads on the Illumina HiSeq2000 platform
according to the manufacturer’s instructions. For the raw reads, sequencing adapters
were removed. Contaminated reads (chloroplast, mitochondrial, bacterial and viral
sequences, etc.) were screened by alignment to the NCBI-NR database using
megablast (version 2.2.26)79, 80 with the parameters “-v 1 -b 1 -e 1e-5 -m 8 -a 13”. The
in-house script duplication_rm.v2 was used to remove the duplicated read pairs. The
low-quality reads were then filtered out if they met any of the following
conditions: 1) reads with ≥10% unidentified nucleotides (N), 2) reads with adapters, 3)
reads with >20% bases having Phred quality less than 5. Finally, the average
sequencing depth for both CTVT tumors was ~40×, and the average sequencing depth
for both of their hosts was ~20×.
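The three read-level conditions above can be sketched as a single predicate. This is a minimal illustration only, not the filtering code used in the study; the function name `passes_read_filter` and the `has_adapter` flag (assumed to come from an upstream adapter-detection step) are hypothetical.

```python
def passes_read_filter(seq, quals, has_adapter=False):
    """Return True if a read survives the three filters listed in the text.

    seq:         read sequence as a string
    quals:       per-base Phred quality scores (same length as seq)
    has_adapter: hypothetical flag set by an upstream adapter detector
    """
    if not seq or len(seq) != len(quals):
        return False
    # 1) discard reads with >=10% unidentified nucleotides (N)
    if seq.upper().count("N") / len(seq) >= 0.10:
        return False
    # 2) discard reads that still contain adapter sequence
    if has_adapter:
        return False
    # 3) discard reads with >20% of bases having Phred quality below 5
    if sum(q < 5 for q in quals) / len(quals) > 0.20:
        return False
    return True
```

For paired-end data, one plausible convention (an assumption here) is to drop the pair when either mate fails.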
Newly collected canids For newly collected canids, we sent Whatman® FTA®
Cards containing their DNA to Tianjin Novogene Bioinformatics Technology Co.,
Ltd for sequencing. Paired-end genomic sequence libraries were constructed with an
insert size of 250-400 bp, and sequencing was carried out on the Illumina HiSeq
XTen platform. The filtering scheme used is the same as that described above.
Sequencing depth and coverage information are shown in Supplementary information,
Table S1.
Sequence data pre-processing and variant calling
We mapped all clean reads to the CanFam3.1
(ftp://hgdownload.soe.ucsc.edu/goldenPath/canFam3) reference sequence using bwa
mem -M (version 0.7.5-r1140)81. Mapped reads were sorted using samtools sort
(version 1.5)82. We applied picard (version 2.9.0,
http://broadinstitute.github.io/picard/) to remove duplicated reads and merged BAM
files for multiple lanes. Indels were realigned using the GenomeAnalysisTK
(GATK, version 3.7.0)83 IndelRealigner. Base quality was recalibrated using GATK
BQSR to produce a final BAM file for each sample. The depth and coverage for all
samples were calculated using GATK DepthOfCoverage.
We applied the GATK HaplotypeCaller to simultaneously call variants (SNPs and
indels) from the final BAM files of 92 modern canids. We removed SNPs that are
within three base pairs of an indel using bcftools (version 1.5)84 SnpGap. Biallelic
SNPs were retained after a hard filter that removed sites with QD < 2.0, MQ < 40.0, FS > 60.0, SOR > 3.0,
MQRankSum < -12.5, or ReadPosRankSum < -8.0 in accordance with the GATK
Tutorials (https://software.broadinstitute.org/gatk/documentation/article?id=2806). To
retain private alleles belonging to golden jackals and coyotes, we filtered out SNPs
where the minor allele count was less than two. Then, we removed SNPs with a
genotype call rate (proportion of non-missing genotypes) below 0.9 using vcftools
(version 0.1.15)85. We ended up with a final
set of 24.2M SNPs. These sites were then genotyped for ancient canids individually
using the script aDNA_GenoCaller.py as described before4. 24.1M biallelic SNPs
were retained from the union of modern and ancient canine SNPs. For analyses
restricted to a subset of samples, the corresponding SNP subsets were extracted using
bcftools view -S samples_list -c 1:minor.
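As a compact restatement of the site-level filters, the sketch below applies the hard-filter thresholds, the minor allele count cutoff and the missingness cutoff to one site. It is illustrative only: the study used GATK, bcftools and vcftools. The function names are hypothetical. Skipping absent annotations and reading the 0.9 cutoff as a minimum proportion of non-missing genotypes are assumptions.

```python
# Thresholds copied from the text; a site failing any single test is removed.
HARD_FILTER = {
    "QD": ("<", 2.0),
    "MQ": ("<", 40.0),
    "FS": (">", 60.0),
    "SOR": (">", 3.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}


def fails_hard_filter(info):
    """info: dict of INFO annotations for one SNP, e.g. {"QD": 12.3, "MQ": 60.0}."""
    for key, (op, cutoff) in HARD_FILTER.items():
        value = info.get(key)
        if value is None:
            # Assumption: an absent annotation does not trigger the filter.
            continue
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            return True
    return False


def keep_site(info, minor_allele_count, call_rate):
    """Keep a SNP that passes the hard filter, has a minor allele count of at
    least two, and is genotyped in at least 90% of samples (assumed reading)."""
    return (not fails_hard_filter(info)
            and minor_allele_count >= 2
            and call_rate >= 0.9)
```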
Ploidy, cellularity and copy number variation analyses of CTVTs
The GC content of all chromosomes (removing unplaced contigs) was calculated in
50bp windows. Base content, sequencing depth and strand information at all sites
were extracted from the BAM files of every CTVT sample and its corresponding host
using sequenza-utils bam2seqz and binned using sequenza-utils seqz_binning in 50bp
windows. We used the median normalization method to determine the depth ratio, the fast
method for segmentation, and three alleles so that at each locus we could estimate the
tumor genotype. The gender parameter was assigned according to the host’s gender.
Finally, ploidy and cellularity were estimated using the sequenza (version 2.1.2)33
package in R. CNV profiles were then estimated using the estimated ploidy and
cellularity results.
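For orientation, GC content in fixed 50 bp windows can be computed as in the sketch below. This is not how sequenza-utils is implemented; the function name `gc_windows` and the choice to ignore non-ACGT bases in the denominator are assumptions made for illustration.

```python
def gc_windows(seq, window=50):
    """Return (window_start, gc_fraction) for consecutive fixed-size windows.

    Bases other than A, C, G, T (for example N) are ignored in the
    denominator; a window with no called bases gets None.
    """
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / called if called else None))
    return results
```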
Development of a transmissible tumor genotyper pipeline
In order to perform accurate population genetic analyses to study the CTVT founder
and its relationship to the reference panel, a comprehensive genotyped set of loci that
are not somatic single nucleotide variations is required. Thus, we first need to obtain the
per-site genotype of each CTVT sample, and then judge whether polymorphism
exists at a specific locus among CTVTs to classify it as a recent somatic mutation or
a potential early somatic/germline genotype-monomorphic site. Finally, through
mutation spectrum deconvolution, we can assess the contribution of different types of
mutations in the sets.
We first give two examples to demonstrate the weakness of previous methods using
VAF, and then we describe the first tool to obtain per-site genotypes specifically from
paired whole genome sequencing data of a transmissible tumor and its host, which we
name the transmissible tumor genotyper (ttgeno).
The first example:
Assume a site that is homozygous AA in the host, and that beta VAF>0.1 is the threshold
used to call a heterozygous genotype in the tumor. If 1) the real genotype of the tumor is CC
without copy number alteration, 2) the local read depth is the same as in the host sample, and
3) each haplotype has the same probability of being sequenced, then the observed VAF of
the host allele A increases as the contamination from host cells increases. Only when the
contamination is less than 10% is the tumor correctly genotyped as CC; otherwise,
it is falsely genotyped as AC.
The second example:
Assume a site that is homozygous AA in the host, and that beta VAF>0.1 is the threshold
used to call a heterozygous genotype in the tumor. If 1) the real genotype of the tumor is
“ACCCCCCCCCC” with a copy number alteration to 11; 2) each haplotype has the same
probability of being sequenced; and 3) there is no contamination from host cells, the biallelic
genotyping result is “CC”, but the true ancestral genotype may be “AC”, with the “C”
haplotype’s chromosomal segment amplified.
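The arithmetic behind both examples can be checked with a few lines of code. The sketch assumes, as the examples do, that every DNA copy is equally likely to be sequenced; the function names `allele_a_vaf` and `naive_call` are hypothetical and only reproduce the fixed-threshold logic being criticized.

```python
def allele_a_vaf(tumor_a, tumor_c, contamination=0.0, host_a=2, host_c=0):
    """Expected fraction of reads carrying allele A in the tumor sample.

    tumor_a, tumor_c: copies of A and C per tumor cell
    host_a, host_c:   copies of A and C per host cell (AA by default)
    contamination:    fraction of cells in the sample that are host cells
    """
    a = (1 - contamination) * tumor_a + contamination * host_a
    c = (1 - contamination) * tumor_c + contamination * host_c
    return a / (a + c)


def naive_call(vaf_a, threshold=0.1):
    """Biallelic call from a fixed VAF threshold, as in the two examples."""
    return "AC" if vaf_a > threshold else "CC"


# First example: the tumor is truly CC, the host is AA, no copy number change.
for contamination in (0.05, 0.20):
    vaf = allele_a_vaf(0, 2, contamination)
    print(f"contamination={contamination:.2f} VAF(A)={vaf:.3f} call={naive_call(vaf)}")

# Second example: the tumor carries 1 A copy and 10 C copies, no contamination.
vaf = allele_a_vaf(1, 10)
print(f"copy number 11: VAF(A)={vaf:.3f} call={naive_call(vaf)}")
```

With equal copy number in tumor and host, the expected VAF of A equals the contamination fraction, so 5% contamination gives CC and 20% gives a false AC. In the second case the VAF is 1/11, just under the threshold, so the site is called CC although an A copy is present.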
When these two factors exist at the same time, the genotyping result will be biased
depending on the level of contamination and on whether chromosomal instability exists. In
addition, read depth and the few CNVs in the host will also affect genotyping results. Thus, we
developed ttgeno to consider contamination, ploidy, the CNV of both tumor and host,
and read depth together:
1) We called SNPs and indels of the host using GATK HaplotypeCaller, filtering
using the same strategy described in Sequence data pre-processing and variant
calling section of Materials and Methods. SNPs within three base pairs of an indel
were annotated as ambiguous sites.
2) We then extracted the genotypes of small deletions in the host to recalibrate the
local depth ratio of the tumor and its corresponding host, as deletions in the host
do not meet the global assumption of diploidy for normal samples analyzed in
sequenza.
3) We called deletions in the tumor BAM file and in the corresponding host BAM
file using GATK Mutect2. Genotypes for the host’s deletions extracted from the
Mutect2 results were combined with the results of step 2), to recover deletions
missed by GATK HaplotypeCaller. Read counts covering local deletion regions in the tumor
were used to recalibrate the estimated local copy number, as the large segments of
copy number variation were broken under tolerance of small indels in sequenza.
4) We called deletions in the tumor BAM files using bcftools mpileup, with
parameters -Q 20, -d 500 and -L 500, to recover deletions missed by GATK
Mutect2. Read counts covering local deletion regions in the tumor were combined
with the results of GATK Mutect2 in step 3).
5) We determined the large CNV in the host using CNVnator (version 0.3.3)86, using
bins of 400 bp. The CNV regions were filtered with length>1000 bp, e-val1<0.01,
q0<0.5, an overlap ratio with gaps
(http://hgdownload.soe.ucsc.edu/goldenPath/canFam3/database/gap.txt.gz) <0.5,
and overlap ratio with repeatmask regions
(http://hgdownload.soe.ucsc.edu/goldenPath/canFam3/database/rmsk.txt.gz) <0.5.
The large CNV for host were also used to recalibrate the local depth ratio, as CNV
in the host do not meet the global assumption of diploidy for normal samples
analyzed in sequenza.
6) The CNV of the tumor was estimated using sequenza.
7) We generated the multi-informative seqz file for the tumor and host using the
results from samtools mpileup, with the command sequenza-utils bam2seqz. All
host’s SNPs with genotypes that differ from the GATK HaplotypeCaller&Mutect2
result were annotated as ambiguous sites.
8) We generated the per-site read count acgt file for the tumor using the results from
samtools mpileup, with the command sequenza-utils pileup2acgt. If the site is
ambiguous in the host, we excluded it from genotyping. For the main genotyping
process, we first recalibrated the local CNV status of the host. Second, we inferred the
host’s allelic read contamination ratio in the tumor data based on global
cellularity, the local copy number of the host and the allelic sequence depth of the tumor.
Third, we recalibrated the local CNV status of the tumor if small deletions were
found. Finally, we calculated the allelic copy number using calibrated allelic read
counts and the local CNV status of the tumor. These steps were implemented
in Perl and are available at https://github.com/xuan-wang/ttgeno. The ratio of sites
that have an unequal value between the sum of per-site allelic copy number and
recalibrated locus copy number is approximately 0.00001 at the whole genome
level.
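The core of the main genotyping process in step 8) can be illustrated with a simplified sketch. This is not the ttgeno implementation, which is written in Perl and also handles the deletion and CNV recalibration of steps 2) to 5). The function name, its arguments and the simple rounding rule are assumptions chosen to show how cellularity, local copy number and allelic read counts combine.

```python
def allelic_copy_numbers(tumor_counts, host_copies, cellularity, tumor_cn, host_cn=2):
    """Estimate per-allele copy number in the tumor at one site.

    tumor_counts: reads per base in the tumor sample, e.g. {"C": 20, "G": 70}
    host_copies:  copies per base in the host genotype, e.g. {"G": 2}
    cellularity:  fraction of tumor cells in the tumor sample
    tumor_cn:     local total copy number of the tumor
    host_cn:      local total copy number of the host
    """
    total = sum(tumor_counts.values())
    if total == 0 or tumor_cn <= 0 or host_cn <= 0:
        return None
    # Share of reads expected to originate from contaminating host cells.
    host_dna = (1.0 - cellularity) * host_cn
    tumor_dna = cellularity * tumor_cn
    host_share = host_dna / (host_dna + tumor_dna)
    # Subtract the expected host reads from each allele's count.
    corrected = {}
    for base in "ACGT":
        expected_host = total * host_share * host_copies.get(base, 0) / host_cn
        corrected[base] = max(tumor_counts.get(base, 0) - expected_host, 0.0)
    remaining = sum(corrected.values())
    if remaining == 0:
        return None
    # Distribute the local tumor copy number in proportion to corrected counts.
    return {base: round(tumor_cn * corrected[base] / remaining) for base in "ACGT"}
```

For a host that is GG, a tumor with local copy number 4 and cellularity 0.8, read counts of 20 C and 70 G give one C copy and three G copies. Because each allele is rounded independently, the allelic copy numbers may occasionally not sum to the local copy number.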
An example of ttgeno output is given below:
Chr  Pos   Ref  A  C  G  T  Total
1    1836  G    0  0  4  0  4
1    1838  G    0  1  3  0  4
1    1841  C    1  2  1  0  4
1    1842  T    0  0  0  1  1
Tumor sites that were ambiguous in the host and uncovered sites were excluded from the
output. Sites with copy number lower than 2 were masked as missing, because the
ancestral diploid genotype is unknown. We treated the final per-site diploidized
genotype of allelic copy number as the genotype of the tumor under the assumption of
maximum parsimony, which means that the genotype of site chr1:1838 is CG, chr1:1841
is ACG, chr1:1842 is NN, and chr1:1836 is homozygous GG. We
performed this conversion because the absolute copy number may differ among
samples, which would otherwise give inconsistent information when identifying the
genotype-monomorphic sites among transmissible tumors.
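The diploidization rule can be written as a small function. The sketch below follows the rule as stated in the text (presence or absence of each allele, with sites of total copy number below 2 masked); the function name `diploidize` is hypothetical and the code is not taken from ttgeno.

```python
def diploidize(copies):
    """Collapse allelic copy numbers at one site to a diploidized genotype.

    copies: dict of copy number per base, e.g. {"A": 0, "C": 1, "G": 3, "T": 0}
    """
    if sum(copies.values()) < 2:
        # Ancestral diploid genotype is unknown: mask as missing.
        return "NN"
    present = [base for base in "ACGT" if copies.get(base, 0) > 0]
    if len(present) == 1:
        return present[0] * 2  # a single allele present: homozygous
    return "".join(present)    # every allele present is reported once


demo = {
    1836: {"A": 0, "C": 0, "G": 4, "T": 0},
    1838: {"A": 0, "C": 1, "G": 3, "T": 0},
    1841: {"A": 1, "C": 2, "G": 1, "T": 0},
    1842: {"A": 0, "C": 0, "G": 0, "T": 1},
}
for pos, copies in demo.items():
    print(f"chr1:{pos} {diploidize(copies)}")
```

Applied to the four example rows, this returns GG, CG, ACG and NN, matching the genotypes given in the text.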
CTVT loci selection
We selected the genotype-monomorphic sites among five CTVTs, allowing one
missing sample, as any genotype-polymorphic site likely is a result of somatic
mutations. An unrooted neighbor-joining phylogeny was constructed by MEGA-CC
(version 7.0.25)87 based on the genotype-polymorphic sites (allowing two missing
samples) to show the diversity among CTVTs. By comparing these sites with the SNPs from the
reference panel, we further determined two categories: the first containing the biallelic
intersection between the genotype-monomorphic sites of CTVTs and the reference
panel’s SNPs, and the second containing CTVT-private genotype-monomorphic sites.
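The selection rule can be sketched as follows, assuming each CTVT contributes one diploidized genotype string per site and that missing genotypes are coded as NN. The function name `classify_site` and the three return labels are illustrative, not part of the published pipeline.

```python
def classify_site(genotypes, max_missing=1, missing="NN"):
    """Classify one site across CTVT samples.

    genotypes:   one diploidized genotype per sample, e.g. ["CG", "CG", "NN", "CG", "CG"]
    max_missing: number of missing samples tolerated (1 for the monomorphic
                 set, 2 for the polymorphic set used in the NJ phylogeny)
    """
    called = [g for g in genotypes if g != missing]
    if not called or len(genotypes) - len(called) > max_missing:
        return "excluded"
    return "monomorphic" if len(set(called)) == 1 else "polymorphic"
```

Monomorphic sites that are also biallelic SNPs in the reference panel would then form the intersected set, and the remaining monomorphic sites the CTVT-private set.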
Mutation signatures analyses
We utilized signeR42 to factorize the 96 trinucleotide mutational counts matrix of
mutation signatures to assess the extent of somatic and germline mutations contained
in the intersected SNPs and the CTVT-private alleles. The bar plot of contributions
from each signature is made with the medians of the evaluations of each signature
from the combined set of samples. Default distance and cluster methods were used to
group samples according to their levels of exposure to the signatures. Differentially
active signatures among previously defined groups of samples (GDJ, golden jackals;
CYT, coyotes; WOLF, worldwide wolves; DOG, worldwide dogs; CTVT intersected,
the biallelic intersected SNPs between genotype-monomorphic sites and the SNPs of
reference panel; CTVT private, CTVT-private genotype-monomorphic alleles) were
determined in signeR by Kruskal-Wallis Rank Sum Test. Ultra-low depth ancient
samples were excluded in mutation signature analyses.
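The input to a signature analysis of this kind is a count vector over 96 trinucleotide substitution classes per sample. The sketch below shows one conventional way to build it, folding each substitution onto the pyrimidine strand. The function names and the channel label format are assumptions; signeR's own input helpers are not used here.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 6 pyrimidine-centred substitutions x 4 bases upstream x 4 bases downstream = 96
CHANNELS = [
    f"{up}[{ref}>{alt}]{down}"
    for ref in "CT"
    for alt in "ACGT" if alt != ref
    for up in "ACGT"
    for down in "ACGT"
]


def channel(context, alt):
    """Map one SNV to its channel label.

    context: reference trinucleotide with the mutated base in the middle
    alt:     alternate base
    """
    if context[1] in "AG":
        # Purine reference base: report the event on the opposite strand.
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def count_channels(mutations):
    """mutations: iterable of (context, alt) pairs; returns 96 counts in CHANNELS order."""
    counts = Counter(channel(context, alt) for context, alt in mutations)
    return [counts[name] for name in CHANNELS]
```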
Population phylogeny analysis
We constructed approximately maximum-likelihood phylogenetic trees using
FastTreeMP (version 2.1)88. The bootstrap replicates were generated using the
defaults in FastTreeMP. We adopted the same methods to construct two other ML
trees using a subset of samples excluding specific admixed EADs, or the PCDs and
the CTVT founder (Supplementary information, Figs. S9, S11). In addition, we built
three NJ trees using MEGA-CC (version 7.0.25)87 to compare the effect of different
methods of phylogenetic estimation. The NJ trees were built using the same SNPs and
subsets of samples used for the ML trees (Supplementary information, Figs. S8, S10
and S12). 100 replicates were generated to calculate bootstrap values. All tree figures
were illustrated using GDJs as outgroup in FigTree (version 1.4.3,
http://tree.bio.ed.ac.uk/software/figtree) and colored according to geography.
Principal component analysis
PCA for the subset of samples excluding GDJs was performed using SNPRelate
(version 1.12.0)89. SNPs were pruned by linkage disequilibrium, resulting in 76K
SNPs in subsequent analyses. PCA figures were created using the R package ggplot2
(version 3.0.0)90. Colors were assigned according to geography similar to that found
in the phylogenies. Specifically, we colored each PCD separately to distinguish them
from each other (Supplementary information, Fig. S13).
Population structure analysis
Population structure was inferred using ADMIXTURE (version 1.3.0)51 on pruned loci
from PCA analyses, with the number of inferred ancestries (K) ranging from two to
seven. We used the lowest cross-validation error across all K to select the
best-supported number of inferred ancestral populations. The best value was found for two
ancestral populations (K=2), which had the overall minimum
cross-validation error.
Statistical analyses
The symmetry statistical tests were performed by programs within the Admixtools
software package49, with the setting numchrom=38. The genetic map for our SNPs
was inferred from Auton et al.24, using
https://github.com/armartin/ancestry_pipeline/blob/master/makeMap.py. Standard
errors were estimated by Admixtools using a weighted block jackknife based on the
genetic map as previously described in Patterson et al.49. We performed outgroup f3-
statistics analysis49, 50 of the form f3(CTVT founder, Pop2; Coyote), to assess the
relative genetic similarity of the CTVT founder to present-day dogs (Pop2). f3-
statistics were calculated using the qp3pop program (version 412). We created a heat
map of the outgroup f3-statistics using the R package ggplot290 and a public map in R
package ggmap91. We performed D-statistics (ABBA-BABA tests) analysis49 of the
form D(Pop1, Pop2; Pop3, Andean Fox) for all combinations of groups to assess
potential introgression between populations. The ~11× Andean fox (Lycalopex
culpaeus) genome24 was used as outgroup, and due to its low sequencing depth,
alleles were called randomly at heterozygous sites. Of the PCDs, only two were
used, SAMEA104190273 and SAMEA104190275, as the others had very low
sequencing depth. Labels are shown in Supplementary information, Table S1 and
results are shown in Supplementary information, Table S3. Some results not directly
relevant to this study were included for completeness. D-statistics were calculated
using the qpDstat program (version 712). The proportion of introgression from
western coyotes (Monterey area, California and Alabama) to the CTVT founder, Port
au Choix dog, Greenland dog and Alaskan Malamute (X in turn) was calculated using
F4(Andean Fox, SEAD; X, Siberian Huskies) / F4(Andean Fox, SEAD; Coyotes,
Siberian Huskies). The proportion of introgression from western coyotes to Mexico
and Peruvian naked dogs (X) was calculated using F4(Andean Fox, SEAD; X,
Newgrange dog) / F4(Andean Fox, SEAD; Coyotes, Newgrange dog). The proportion
of introgression from New World wolves to the CTVT founder and Port au Choix dog
was calculated using F4(Andean Fox, SEAD; X, Siberian Huskies) / F4(Andean Fox,
SEAD; New World wolves, Siberian Huskies). F4-ratio statistics were calculated
using the qpF4ratio program (version 300). Labels of samples used in F4-ratio are
recorded in Supplementary information, Table S1 and results can be found in
Supplementary information, Table S4.
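To make the quantities concrete, the sketch below computes a frequency-based D-statistic and an F4-ratio from per-SNP allele frequencies, following the definitions in Patterson et al.49. It is a simplified illustration, not a substitute for qpDstat or qpF4ratio: it omits the weighted block jackknife, so it yields no standard errors or Z-scores, and the function names are hypothetical.

```python
def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean of (pa - pb) * (pc - pd) over SNPs."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(products) / len(products)


def d_statistic(p1, p2, p3, pout):
    """D(Pop1, Pop2; Pop3, Outgroup) from per-SNP allele frequencies."""
    num = sum((a - b) * (c - o) for a, b, c, o in zip(p1, p2, p3, pout))
    den = sum((a + b - 2 * a * b) * (c + o - 2 * c * o)
              for a, b, c, o in zip(p1, p2, p3, pout))
    return num / den


def f4_ratio(p_outgroup, p_sead, p_x, p_reference, p_source):
    """Admixture proportion of p_source ancestry in X, in the form used above:
    F4(Outgroup, SEAD; X, Reference) / F4(Outgroup, SEAD; Source, Reference)."""
    return (f4(p_outgroup, p_sead, p_x, p_reference)
            / f4(p_outgroup, p_sead, p_source, p_reference))
```

If the allele frequencies of X are an exact mixture, a fraction alpha from the source and 1 - alpha from the reference, the ratio returns alpha, which is the logic behind the estimator.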
Local ancestry inference
According to D-statistic results, coyotes from the Midwest, Ohio and Florida are more
closely related to dogs and wolves than coyotes from Monterey area, California and
Alabama are. Thus, we used coyotes from the Monterey area, California and Alabama
as the coyote-ancestral reference in the following local ancestry inference. High
frequency coyote-private non-reference (non-ref) alleles can be used as diagnostic
alleles to test for admixture53. We extracted fixed non-ref alleles (positions are
colored as blue in the second inner circle of Fig. 1g) from the SNP set private to all
coyotes using bcftools view -S westerncoyotes.list -c 6:nref, and checked the allelic
state at these loci in the genome of the CTVT founder. If the CTVT founder shared the
coyote diagnostic allele, we plotted the position of the locus (red in the first inner circle
of Fig. 1g). We then calculated fd-statistics56, fdM-statistics57 and D-statistics in a 500
kb sliding window using a step size of 250 kb by
https://github.com/simonhmartin/genomics_general/blob/master/ABBABABAwindo
ws.py, setting the Andean fox as outgroup, the three aforementioned coyotes as
introgressers, the Greenland dog and Alaskan Malamute as Pop2, and the CTVT
founder as Pop1. The top 1% of negative windows for the fd-statistics and fdM-
statistics are highlighted, and windows of negative D-statistics are highlighted green
as a reference. Beagle5 (version 28Sep18.793)92 was used to phase the SNPs excluding
GDJs and ultra-low depth samples for RFMix58 local ancestry inference. The average
effective population size was set to 60,000 in accordance with previous studies4, 6. The
three mentioned CYT, four North NWW, Greenland dog and ASD were set as
ancestral references. We ran RFMix (version v2.03-r0) with additional
parameters: -e=40 --reanalyze-reference -G 25. The physical proportion of each
ancestry was the average value weighted by length of each chromosome segment, and
local ancestry of two haplotypes are recorded in Supplementary information, Table S5.
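The length-weighted proportion of each ancestry can be computed as sketched below. The input format, a list of (start, end, ancestry) segments pooled over chromosomes and haplotypes, and the function name are assumptions for illustration; parsing of RFMix output is not shown.

```python
def ancestry_proportions(segments):
    """Physical proportion of each ancestry, weighted by segment length.

    segments: iterable of (start_bp, end_bp, ancestry_label) tuples
    """
    totals = {}
    for start, end, label in segments:
        totals[label] = totals.get(label, 0) + (end - start)
    genome = sum(totals.values())
    if genome == 0:
        return {}
    return {label: length / genome for label, length in totals.items()}
```

For example, a 750 bp dog segment followed by a 250 bp coyote segment gives proportions of 0.75 and 0.25.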
TreeMix analysis
We applied TreeMix (version 1.13)59 to investigate the genetic relationship between
the CTVT founder, PCDs, other ancient and present-day canids. Ultra-low depth
ancient samples, Tibetan wolves and Eastern coyotes were excluded, and labels of
samples used are recorded in Supplementary information, Table S1. Allele
frequencies were counted by plink (version v1.90b5.2)93 and converted to TreeMix
input using the script plink2treemix.py. The analysis was performed with the
parameters: -k 100 -root CYT. To further investigate how well the tree model fits the
data, we visualized the matrix of residuals for the tree model with no admixture. We
tested trees for zero to three migration events and show results for zero and three
migration events.
Accession number
Sequencing data is archived and available in the Genome Sequence Archive (GSA,
http://bigd.big.ac.cn/gsa/). The accession number for the 22 newly collected canids is
CRA000938, and the accession number of the two CTVT samples and their
corresponding hosts is CRA000939. The genotype-monomorphic sites based on the
five CTVT samples are available in the DogGD database from the iDog platform
(http://bigd.big.ac.cn/doggd/pages/modules/download/download.jsp). The
transmissible tumor genotyper is available at https://github.com/xuan-wang/ttgeno.
Supplementary References
1. Ní Leathlobhair M, et al. The evolutionary history of dogs in the Americas.
Science 361, 81 (2018).
2. Ostrander EA, Wayne RK, Freedman AH, Davis BW. Demographic history,
selection and functional diversity of the canine genome. Nat. Rev. Genet. 18,
705 (2017).
3. Freedman AH, Wayne RK. Deciphering the origin of dogs: from fossils to
genomes. Annu. Rev. Anim. Biosci. 5, 281-307 (2017).
4. Botigué LR, et al. Ancient European dog genomes reveal continuity since the
Early Neolithic. Nat. Commun. 8, 16082 (2017).
5. Frantz LAF, et al. Genomic and archaeological evidence suggest a dual origin
of domestic dogs. Science 352, 1228-1231 (2016).
6. Wang G-D, et al. Out of Southern East Asia: the natural history of domestic
dogs across the world. Cell Res. 26, 21 (2015).
7. Skoglund P, Ersmark E, Palkopoulou E, Dalén L. Ancient wolf genome
reveals an early divergence of domestic dog ancestors and admixture into
high-latitude breeds. Curr. Biol. 25, 1515-1519 (2015).
8. Freedman AH, et al. Genome sequencing highlights the dynamic early history
of dogs. PLoS Genet. 10, e1004016 (2014).
9. Fan Z, et al. Worldwide patterns of genomic variation and admixture in gray
wolves. Genome Res. 26, 163-173 (2016).
10. Yang MA, Fu Q. Insights into modern human prehistory using ancient
genomes. Trends Genet. 34, 184-196 (2018).
11. Nielsen R, et al. Tracing the peopling of the world through genomics. Nature
541, 302 (2017).
12. MacHugh DE, Larson G, Orlando L. Taming the past: ancient DNA and the
study of animal domestication. Annu. Rev. Anim. Biosci. 5, 329-351 (2017).
13. Pont C, et al. Paleogenomics: reconstruction of plant evolutionary trajectories
from modern and ancient DNA. Genome Biol. 20, 29 (2019).
14. Larson G, et al. Rethinking dog domestication by integrating genetics,
archeology, and biogeography. Proc. Natl. Acad. Sci. USA 109, 8878-8883
(2012).
15. Murchison EP, et al. Transmissible dog cancer genome reveals the origin and
history of an ancient cell lineage. Science 343, 437-440 (2014).
16. Ostrander EA, Davis BW, Ostrander GK. Transmissible tumors: breaking the
cancer paradigm. Trends Genet. 32, 1-15 (2016).
17. Murgia C, et al. Clonal origin and evolution of a transmissible cancer. Cell
126, 477-487 (2006).
18. Decker B, et al. Comparison against 186 canid whole-genome sequences
reveals survival strategies of an ancient clonally transmissible canine tumor.
Genome Res. 25, 1646-1655 (2015).
19. Strakova A, et al. Mitochondrial genetic diversity, selection and
recombination in a canine transmissible cancer. eLife 5, e14552 (2016).
20. Ujvari B, Papenfuss AT, Belov K. Transmissible cancers in an evolutionary
context. BioEssays 38, S14-S23 (2016).
21. Thomas R, et al. Extensive conservation of genomic imbalances in canine
transmissible venereal tumors (CTVT) detected by microarray-based CGH
analysis. Chromosome Res. 17, 927 (2009).
22. Stammnitz MR, et al. The origins and vulnerabilities of two transmissible
cancers in Tasmanian devils. Cancer Cell 33, 607-619.e615 (2018).
23. Meirmans PG. Subsampling reveals that unbalanced sampling affects structure
results in a multi-species dataset. Heredity 122, 276-287 (2019).
24. Auton A, et al. Genetic recombination is targeted towards gene promoter
regions in dogs. PLoS Genet. 9, e1003984 (2013).
25. vonHoldt BM, et al. Whole-genome sequence analysis shows that two
endemic species of North American wolf are admixtures of the coyote and
gray wolf. Sci. Adv. 2, e1501714 (2016).
26. Zhang W, et al. Hypoxia adaptations in the grey wolf (Canis lupus chanco)
from Qinghai-Tibet Plateau. PLoS Genet. 10, e1004466 (2014).
27. Marsden CD, et al. Bottlenecks and selective sweeps during domestication
have increased deleterious genetic variation in dogs. Proc. Natl. Acad. Sci.
USA 113, 152 (2016).
28. Gou X, et al. Whole-genome sequencing of six dog breeds from continuous
altitudes reveals adaptation to high-altitude hypoxia. Genome Res. 24, 1308-
1315 (2014).
29. Kim H-M, et al. Whole genome comparison of donor and cloned dogs. Sci.
Rep. 3, 2998 (2013).
30. Kim RN, et al. Genome analysis of the domestic dog (Korean Jindo) by
massively parallel sequencing. DNA Res. 19, 275-288 (2012).
31. Wiedmer M, et al. A RAB3GAP1 SINE insertion in Alaskan Huskies with
Polyneuropathy, Ocular Abnormalities, and Neuronal Vacuolation (POANV)
resembling human Warburg Micro Syndrome 1 (WARBM1). G3: Genes,
Genomes, Genet. 6, 255-262 (2016).
32. Liu Y-H, et al. Whole-genome sequencing of African dogs provides insights
into adaptations against tropical parasites. Mol. Biol. Evol. 35, 287-298 (2018).
33. Favero F, et al. Sequenza: allele-specific copy number and mutation profiles
from tumor sequencing data. Ann. Oncol. 26, 64-70 (2015).
34. Lee JK, Choi YL, Kwon M, Park PJ. Mechanisms and consequences of cancer
genome instability: lessons from genome sequencing studies. Annu. Rev.
Pathol. 11, 283-312 (2016).
35. Helleday T, Eshtad S, Nik-Zainal S. Mechanisms underlying mutational
signatures in human cancers. Nat. Rev. Genet. 15, 585-598 (2014).
36. Rosenthal R, McGranahan N, Herrero J, Swanton C. Deciphering genetic
intratumor heterogeneity and its impact on cancer evolution. Annu. Rev.
Cancer Biol. 1, 223-240 (2017).
37. Schwartz R, Schaffer AA. The evolution of tumour phylogenetics: principles
and practice. Nat. Rev. Genet. 18, 213-229 (2017).
38. Salk JJ, Schmitt MW, Loeb LA. Enhancing the accuracy of next-generation
sequencing for detecting rare and subclonal mutations. Nat. Rev. Genet. 19,
269-285 (2018).
39. Shpak M, Lu J. An evolutionary genetic perspective on cancer biology. Annu.
Rev. Ecol. Evol. Syst. 47, 25-49 (2016).
40. Wu CI, Wang HY, Ling S, Lu X. The ecology and evolution of cancer: the
ultra-microevolutionary process. Annu. Rev. Genet. 50, 347-369 (2016).
41. Alexandrov LB, et al. Signatures of mutational processes in human cancer.
Nature 500, 415-421 (2013).
42. Rosales RA, et al. signeR: an empirical Bayesian approach to mutational
signature discovery. Bioinformatics 33, 8-16 (2017).
43. Milholland B, et al. Differences between germline and somatic mutation rates
in humans and mice. Nat. Commun. 8, 15183 (2017).
44. Smith TCA, Arndt PF, Eyre-Walker A. Large scale variation in the rate of
germ-line de novo mutation, base composition, divergence and diversity in
humans. PLoS Genet. 14, e1007254 (2018).
45. Rahbari R, et al. Timing, rates and spectra of human germline mutation. Nat.
Genet. 48, 126-133 (2016).
46. Harris K, Pritchard JK. Rapid evolution of the human mutation spectrum.
eLife 6, e24284 (2017).
47. Scally A. Global clues to the nature of genomic mutations in humans. eLife 6,
e27605 (2017).
48. Shannon LM, et al. Genetic structure in village dogs reveals a Central Asian
domestication origin. Proc. Natl. Acad. Sci. USA 112, 13639-13644 (2015).
49. Patterson N, et al. Ancient admixture in human history. Genetics 192, 1065
(2012).
50. Raghavan M, et al. Upper Palaeolithic Siberian genome reveals dual ancestry
of Native Americans. Nature 505, 87 (2013).
51. Alexander DH, Novembre J, Lange K. Fast model-based estimation of
ancestry in unrelated individuals. Genome Res. 19, 1655-1664 (2009).
52. Gopalakrishnan S, et al. Interspecific gene flow shaped the evolution of the
genus Canis. Curr. Biol. 28, 3441-3449.e3445 (2018).
53. Monzon J, Kays R, Dykhuizen DE. Assessment of coyote-wolf-dog admixture
using ancestry-informative diagnostic SNPs. Mol. Ecol. 23, 182-197 (2014).
54. vonHoldt BM, et al. A genome-wide perspective on the evolutionary history
of enigmatic wolf-like canids. Genome Res. 21, 1294-1305 (2011).
55. Sinding M-HS, et al. Population genomics of grey wolves and wolf-like
canids in North America. PLoS Genet. 14, e1007745 (2018).
56. Martin SH, Davey JW, Jiggins CD. Evaluating the use of ABBA-BABA
statistics to locate introgressed loci. Mol. Biol. Evol. 32, 244-257 (2015).
57. Malinsky M, et al. Genomic islands of speciation separate cichlid ecomorphs
in an East African crater lake. Science 350, 1493-1498 (2015).
58. Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: a discriminative
modeling approach for rapid and robust local-ancestry inference. Am. J. Hum.
Genet. 93, 278-288 (2013).
59. Pickrell JK, Pritchard JK. Inference of population splits and mixtures from
genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).
60. Leonard JA, et al. Ancient DNA evidence for Old World origin of New World
dogs. Science 298, 1613-1616 (2002).
61. Castroviejo-Fisher S, et al. Vanishing native American dog lineages. BMC
Evol. Biol. 11, 73 (2011).
62. Brown SK, Darwent CM, Sacks BN. Ancient DNA evidence for genetic
continuity in arctic dogs. J. Archaeol. Sci. 40, 1279-1288 (2013).
63. Witt KE, et al. DNA analysis of ancient dogs of the Americas: Identifying
possible founding haplotypes and reconstructing population histories. J. Hum.
Evol. 79, 105-118 (2015).
64. van Asch B, et al. Pre-Columbian origins of Native American dog breeds,
with only limited replacement by European dogs, confirmed by mtDNA
analysis. Proc. R. Soc. B 280, 20131142 (2013).
65. Patten MM, Carioscia SA, Linnen CR. Biased introgression of mitochondrial
and nuclear genes: a comparison of diploid and haplodiploid systems. Mol.
Ecol. 24, 5200-5210 (2015).
66. Wheeldon TJ, et al. Y-chromosome evidence supports asymmetric dog
introgression into eastern coyotes. Ecol. Evol. 3, 3005-3020 (2013).
67. Bekoff M, Wells MC. Behavioral ecology of coyotes: social organization,
rearing patterns, space use, and resource defense. Z. Tierpsychol. 60, 281-305
(1982).
68. Gese EM, Ruff RL. Scent-marking by coyotes, Canis latrans: the influence of
social and ecological factors. Anim. Behav. 54, 1155-1166 (1997).
69. Perri A, et al. New evidence of the earliest domestic dogs in the Americas. Am.
Antiq. 84, 68-87 (2019).
70. Miao B, Wang Z, Li Y. Genomic analysis reveals hypoxia adaptation in the
Tibetan Mastiff by introgression of the gray wolf from the Tibetan Plateau.
Mol. Biol. Evol. 34, 734-743 (2017).
71. vonHoldt B, Fan Z, Ortega-Del Vecchyo D, Wayne RK. EPAS1 variants in
high altitude Tibetan wolves were selectively introgressed into highland dogs.
PeerJ 5, e3522 (2017).
72. Brown SK, Darwent CM, Wictum EJ, Sacks BN. Using multiple markers to
elucidate the ancient, historical and modern relationships among North
American Arctic dog breeds. Heredity 115, 488 (2015).
73. Posth C, et al. Reconstructing the deep population history of Central and South
America. Cell 175, 1185-1197.e1122 (2018).
74. Moreno-Mayar JV, et al. Early human dispersals within the Americas. Science
362, eaav2621 (2018).
75. Ostrander EA, Dreger DL, Evans JM. Canine cancer genomics: lessons for
canine and human health. Annu. Rev. Anim. Biosci. 7, 449-472 (2019).
76. Tang B, et al. iDog: an integrated resource for domestic dogs and wild canids.
Nucleic Acids Res. 47, D793-D800 (2018).
77. Manichaikul A, et al. Robust relationship inference in genome-wide
association studies. Bioinformatics 26, 2867-2873 (2010).
78. Koepfli K-P, et al. Genome-wide evidence reveals that African and Eurasian
golden jackals are distinct species. Curr. Biol. 25, 2158-2165 (2015).
79. Morgulis A, et al. Database indexing for production MegaBLAST searches.
Bioinformatics 24, 1757-1764 (2008).
80. Zhang Z, Schwartz S, Wagner L, Miller W. A greedy algorithm for aligning
DNA sequences. J. Comput. Biol. 7, 203-214 (2000).
81. Li H. Aligning sequence reads, clone sequences and assembly contigs with
BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997v1 (2013).
82. Li H, et al. The Sequence Alignment/Map format and SAMtools.
Bioinformatics 25, 2078-2079 (2009).
83. McKenna A, et al. The Genome Analysis Toolkit: A MapReduce framework
for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297-
1303 (2010).
84. Li H. A statistical framework for SNP calling, mutation discovery, association
mapping and population genetical parameter estimation from sequencing data.
Bioinformatics 27, 2987-2993 (2011).
85. Danecek P, et al. The variant call format and VCFtools. Bioinformatics 27,
2156-2158 (2011).
86. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to
discover, genotype, and characterize typical and atypical CNVs from family
and population genome sequencing. Genome Res. 21, 974-984 (2011).
87. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics
analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870-1874 (2016).
88. Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-
likelihood trees for large alignments. PLoS One 5, e9490 (2010).
89. Zheng X, et al. A high-performance computing toolset for relatedness and
principal component analysis of SNP data. Bioinformatics 28, 3326-3328
(2012).
90. Wickham H. ggplot2: elegant graphics for data analysis (Springer, 2016).
91. Kahle D, Wickham H. ggmap: spatial visualization with ggplot2. R Journal 5,
144-161 (2013).
92. Browning SR, Browning BL. Rapid and accurate haplotype phasing and
missing-data inference for whole-genome association studies by use of
localized haplotype clustering. Am. J. Hum. Genet. 81, 1084-1097 (2007).
93. Chang CC, et al. Second-generation PLINK: rising to the challenge of larger
and richer datasets. GigaScience 4, 7 (2015).
Supplementary Figures
Supplementary information, Fig. S1. Tumor appearance (KM2, a male dog).
Supplementary information, Fig. S2. Ploidy and cellularity estimation using
sequenza. The log posterior probability (LPP) of the observed data was calculated
for a range of candidate ploidy and cellularity values. The point estimate is the ploidy
and cellularity with maximum LPP. The 95% C.R. (Confidence Region) is the
smallest (not necessarily contiguous) set of points with a total posterior
probability >0.95. The background color indicates the rank of the LPP (blue = most
likely, white = least likely), provided here to contrast other possible parameters that
are very unlikely under our model but might still be of interest. Local maxima are
indicated with a “+” and indicate possible alternative solutions.
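The credible region described in this caption can be illustrated with a minimal sketch. It assumes the LPP values are available as a plain mapping from (ploidy, cellularity) grid points to log posterior probabilities; the function name and the toy grid below are hypothetical and are not taken from sequenza or from the study's data.

```python
import math

def credible_region(lpp, mass=0.95):
    """Return the point estimate and the smallest set of grid points whose
    total posterior probability exceeds `mass`.

    `lpp` maps (ploidy, cellularity) -> unnormalized log posterior probability.
    """
    # Normalize in a numerically stable way by subtracting the maximum first.
    peak = max(lpp.values())
    weights = {k: math.exp(v - peak) for k, v in lpp.items()}
    total = sum(weights.values())
    post = {k: w / total for k, w in weights.items()}

    # Add grid points from most to least probable until the mass is covered.
    # The resulting set need not be contiguous on the grid.
    ranked = sorted(post, key=post.get, reverse=True)
    region, covered = [], 0.0
    for point in ranked:
        region.append(point)
        covered += post[point]
        if covered > mass:
            break
    return ranked[0], region

# Hypothetical toy grid, not values from the study.
toy_lpp = {
    (2.0, 0.6): math.log(0.70),
    (2.0, 0.7): math.log(0.20),
    (4.0, 0.3): math.log(0.07),
    (3.0, 0.5): math.log(0.03),
}
best, region = credible_region(toy_lpp)
```

In this toy grid the region contains a point at a different ploidy from the point estimate, which is why the caption notes that the region is not necessarily contiguous.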
Supplementary information, Fig. S3. Unrooted neighbor-joining tree built from 2.9M
genotype-polymorphic loci (allowing 2 samples with missing genotypes) among 5 CTVTs. Node
labels indicate bootstrap values.
Supplementary information, Fig. S4. Mutation spectrum. Signatures barplot with
error bars reflecting the sample percentiles 0.05, 0.25, 0.75, and 0.95 for each entry.
Supplementary information, Fig. S5. Boxplot of Bayesian Information Criterion
values of signeR analysis, showing that the optimal number of signatures is 8.
Supplementary information, Fig. S6. Differential Exposure Analysis plot showing
group-specific exposed signatures. P-values were calculated by comparing the
exposure of each signature among sample groups with the Kruskal-Wallis rank sum test.
The Benjamini & Hochberg (1995) method was used to adjust P-values in the post-hoc
tests. GDJ, golden jackals; CYT, coyotes; CTVT intersected, CTVT sites intersected
with the panel’s SNPs; CTVT private, CTVT private alleles.
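The Benjamini & Hochberg adjustment named in this caption can be sketched as follows. This is a generic step-up implementation written for illustration, not the code used by signeR; the function name and the toy P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy P-values, not results from the study.
toy_p = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(toy_p)
```

The running minimum is what makes the procedure a step-up correction: a test can never receive a larger adjusted P-value than a test with a larger raw P-value.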
Supplementary information, Fig. S7. Box plots showing the significant
differences found for each signature when groups are compared against each
other. X axis labels represent eigen groups assigned according to the contribution of
this signature in different sample groups. Y axis represents mutation counts assigned
to each sample group for this signature. GDJ, golden jackals; CYT, coyotes; PCD,
pre-contact dogs; DOG, all dogs except PCDs; WOLF, all wolves except Taimyr wolf.
Supplementary information, Fig. S8. A neighbor-joining tree based on whole
genomes of 104 individuals. The golden jackals are the outgroup. Node labels
indicate bootstrap values. GDJ, golden jackals; CYT, coyotes; NWW, New World
wolves; OWW, Old World wolves; TMR, Taimyr wolf; PCD, pre-contact dogs; ASD,
Arctic sled dogs; EAD, East Asia dogs; NCD, Northern China dogs; IPD, India
Peninsula dogs; MECAD, Middle East and Central Asia dogs; AFD, African dogs;
MSD, mixed sled dogs; EUD, European dogs; NGD, Newgrange dog; HXH,
Herxheim dog; CTC, Cherry Tree Cave dog; CTVT_F, the CTVT founder.
Supplementary information, Fig. S9. An approximate maximum-likelihood tree
based on whole genomes of 99 individuals (excluding specific admixed East Asian
dogs). The golden jackals are the outgroup. Node labels indicate bootstrap values.
GDJ, golden jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World
wolves; TMR, Taimyr wolf; PCD, pre-contact dogs; ASD, Arctic sled dogs; EAD,
East Asia dogs; NCD, Northern China dogs; IPD, India Peninsula dogs; MECAD,
Middle East and Central Asia dogs; AFD, African dogs; MSD, mixed sled dogs; EUD,
European dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave
dog; CTVT_F, the CTVT founder.
Supplementary information, Fig. S10. A neighbor-joining tree based on whole
genomes of 99 individuals (excluding specific East Asian admixed dogs). The
golden jackals are the outgroup. Node labels indicate bootstrap values. GDJ, golden
jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; TMR,
Taimyr wolf; PCD, pre-contact dogs; ASD, Arctic sled dogs; EAD, East Asia dogs;
NCD, Northern China dogs; IPD, India Peninsula dogs; MECAD, Middle East and
Central Asia dogs; AFD, African dogs; MSD, mixed sled dogs; EUD, European dogs;
NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog; CTVT_F,
the CTVT founder.
Supplementary information, Fig. S11. An approximate maximum-likelihood tree
based on whole genomes of 96 individuals (excluding PCDs and the CTVT
founder). The golden jackals are the outgroup. Node labels indicate bootstrap values.
GDJ, golden jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World
wolves; TMR, Taimyr wolf; ASD, Arctic sled dogs; EAD, East Asia dogs; NCD,
Northern China dogs; IPD, India Peninsula dogs; MECAD, Middle East and Central
Asia dogs; AFD, African dogs; MSD, mixed sled dogs; EUD, European dogs; NGD,
Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog; CTVT_F, the
CTVT founder.
Supplementary information, Fig. S12. A neighbor-joining tree based on whole
genomes of 96 individuals (excluding PCDs and the CTVT founder). The golden
jackals are the outgroup. East Asian dogs are basal to all the dogs, consistent with
recently published work. Node labels indicate bootstrap values. GDJ, golden jackals;
CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; TMR, Taimyr
wolf; ASD, Arctic sled dogs; EAD, East Asia dogs; NCD, Northern China dogs; IPD,
India Peninsula dogs; MECAD, Middle East and Central Asia dogs; AFD, African
dogs; MSD, mixed sled dogs; EUD, European dogs; NGD, Newgrange dog; HXH,
Herxheim dog; CTC, Cherry Tree Cave dog; CTVT_F, the CTVT founder.
Supplementary information, Fig. S13. Principal components analysis of 102
individuals (excluding golden jackals). Each PCD is colored separately to
distinguish it from the others. CYT, coyotes; WOLF, worldwide wolves; TMR, Taimyr wolf;
DOG, all dogs except PCDs; PCD, pre-contact dogs; CTVT_F, the CTVT founder.
Supplementary information, Fig. S14. a Population structure between the CTVT
founder, ancient and contemporary canids. ADMIXTURE clustering for K=2 to K=7
on pruned sites excluding golden jackals. Vertical lines represent individuals. Colors
indicate different ancestral components. The coefficient of variation is lowest at
K=2. CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; PCD,
pre-contact dogs; ASD, Arctic sled dogs; EAD, East Asia dogs; NCD, Northern China
dogs; IPD, India Peninsula dogs; MECAD, Middle East and Central Asia dogs; AFD,
African dogs; MSD, mixed sled dogs; EUD, European dogs; CTC, Cherry Tree Cave
dog; CTVT_F, the CTVT founder. b Cross-validation plot for the ADMIXTURE
analyses. K ranges from 2 to 7.
Supplementary information, Fig. S15. D(CTVT founder, Pop2; Pop3, Andean Fox),
with the Z-score given on the x axis. Dashed lines mark Z-scores of ±3. GDJ, golden
jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; DOG,
all dogs except PCDs; PCD, pre-contact dogs. Details of the Pop3 labels are recorded in
Supplementary information, Table S1.
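For readers unfamiliar with how a D-statistic and its Z-score are obtained, a minimal sketch follows. It assumes the sign convention D = (ABBA - BABA) / (ABBA + BABA), which may differ from the convention of the software used in the study, and it uses an unweighted delete-one-block jackknife as a simplification of the weighted block jackknife. All function names and the toy data are hypothetical.

```python
import math

def d_statistic(sites):
    """D = (ABBA - BABA) / (ABBA + BABA) from derived-allele frequencies.

    Each site is a tuple (p1, p2, p3, po) for Pop1, Pop2, Pop3 and the outgroup.
    """
    num = den = 0.0
    for p1, p2, p3, po in sites:
        abba = (1 - p1) * p2 * p3 * (1 - po)
        baba = p1 * (1 - p2) * p3 * (1 - po)
        num += abba - baba
        den += abba + baba
    return num / den

def jackknife_z(blocks):
    """Return (D, Z) using an unweighted delete-one-block jackknife."""
    d_full = d_statistic([s for b in blocks for s in b])
    n = len(blocks)
    # Recompute D with each genomic block left out in turn.
    leave_one_out = [
        d_statistic([s for j, b in enumerate(blocks) if j != i for s in b])
        for i in range(n)
    ]
    mean = sum(leave_one_out) / n
    var = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    return d_full, d_full / math.sqrt(var)

# Hypothetical toy data: fixed ABBA and BABA sites grouped into three blocks.
ABBA = (0.0, 1.0, 1.0, 0.0)
BABA = (1.0, 0.0, 1.0, 0.0)
toy_blocks = [
    [ABBA] * 3 + [BABA],
    [ABBA] * 2 + [BABA],
    [ABBA] * 4 + [BABA],
]
d, z = jackknife_z(toy_blocks)
```

Under this convention a Z-score beyond the ±3 dashed lines indicates an excess of allele sharing between Pop3 and one of the first two populations that is unlikely under a simple tree without gene flow.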
Supplementary information, Fig. S16. D(Pop1, Pop2; Coyote, Andean Fox), with
the Z-score given on the x axis. Dashed lines mark Z-scores of ±3. NWW, New
World wolves; OWW, Old World wolves; DOG, all dogs except PCDs; PCD,
pre-contact dogs; CTVT_F, the CTVT founder. Details of the Pop3 labels are recorded in
Supplementary information, Table S1.
Supplementary information, Fig. S17. D(Pop1, Pop2; CTVT founder, Andean Fox),
with the Z-score given on the x axis. Dashed lines mark Z-scores of ±3. GDJ, golden
jackals; CYT, coyotes; NWW, New World wolves; OWW, Old World wolves; DOG,
all dogs except PCDs; PCD, pre-contact dogs. Details of the Pop3 labels are recorded in
Supplementary information, Table S1.
Supplementary information, Fig. S18. a TreeMix graph without migration edges. b
Matrix of residuals. CYT, coyotes; NWW, New World wolves; OWW, Old World
wolves; EAD, East Asia dogs; PCD, pre-contact dogs; SIH, Siberian Husky; ALH,
Alaskan Husky; ALM, Alaskan Malamute; GRD, Greenland dog; NCD, Northern
China dogs; ESL, East Siberian Laika; IPD, India Peninsula dogs; MECAD, Middle
East and Central Asia dogs; AFD, African dogs; SAM, Samoyed; EUD, European
dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog;
CTVT_F, the CTVT founder.
Supplementary information, Fig. S19. a TreeMix graph with one migration edge. b
Matrix of residuals. CYT, coyotes; NWW, New World wolves; OWW, Old World
wolves; EAD, East Asia dogs; PCD, pre-contact dogs; SIH, Siberian Husky; ALH,
Alaskan Husky; ALM, Alaskan Malamute; GRD, Greenland dog; NCD, Northern
China dogs; ESL, East Siberian Laika; IPD, India Peninsula dogs; MECAD, Middle
East and Central Asia dogs; AFD, African dogs; SAM, Samoyed; EUD, European
dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog;
CTVT_F, the CTVT founder.
Supplementary information, Fig. S20. a TreeMix graph with two migration edges. b
Matrix of residuals. CYT, coyotes; NWW, New World wolves; OWW, Old World
wolves; EAD, East Asia dogs; PCD, pre-contact dogs; SIH, Siberian Husky; ALH,
Alaskan Husky; ALM, Alaskan Malamute; GRD, Greenland dog; NCD, Northern
China dogs; ESL, East Siberian Laika; IPD, India Peninsula dogs; MECAD, Middle
East and Central Asia dogs; AFD, African dogs; SAM, Samoyed; EUD, European
dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog;
CTVT_F, the CTVT founder.
Supplementary information, Fig. S21. a TreeMix graph with three migration edges. b
Matrix of residuals. CYT, coyotes; NWW, New World wolves; OWW, Old World
wolves; EAD, East Asia dogs; PCD, pre-contact dogs; SIH, Siberian Husky; ALH,
Alaskan Husky; ALM, Alaskan Malamute; GRD, Greenland dog; NCD, Northern
China dogs; ESL, East Siberian Laika; IPD, India Peninsula dogs; MECAD, Middle
East and Central Asia dogs; AFD, African dogs; SAM, Samoyed; EUD, European
dogs; NGD, Newgrange dog; HXH, Herxheim dog; CTC, Cherry Tree Cave dog;
CTVT_F, the CTVT founder.