30 - Guedes Et Al. (2024) - Macroecological Correlates of Darwinian Shortfalls
1. Introduction
Author for correspondence: Jhonny J. M. Guedes
In a rapidly changing world where biodiversity is being lost at unparalleled rates [1], biodiversity knowledge shortfalls [2] can hinder the effective implementation of data-driven conservation strategies [3]. For instance, only a small fraction (about 20%) of all extant species on Earth have been named [4,5], and for most described species the available information is incomplete and biased towards charismatic or easily accessible organisms [6–8]. These knowledge shortfalls, either individually or through their interactions [3,9], beset our ability to better understand biodiversity patterns and their underlying processes. For example, mechanisms like mutation, gene flow, natural selection and genetic drift have shaped the genetic inheritance that species share. Hence, comparative analyses must consider the shared evolutionary history among taxa since species are not independent of each other [10]. But for most species, their placement in the tree of life is largely unknown (a problem named the Darwinian shortfall [11]), especially owing to a lack of comprehensive molecular and/or morphological data [12]. This biodiversity shortfall hampers the explicit incorporation of evolution in large-scale conservation and biodiversity analyses, which could lead to biased results and impair conservation practices [11,13,14].
Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.7358195.
© 2024 The Author(s). Published by the Royal Society. All rights reserved.
The Darwinian shortfall can be divided into three main components: (i) the lack of data on phylogenies, that is, on basic
relationships among species, (ii) high uncertainties in divergence time estimates owing to a lack of knowledge about branch
lengths and temporal calibration of diversification events, and (iii) a paucity of models of trait evolution [11]. The latter two
2. Methods
(a) Phylogenetic assessment and species-level covariates
We first identified which tetrapod species had their evolutionary relationships imputed in fully sampled phylogenies available
for the group [16–20], thus creating a binary response variable for 33 281 species included in such phylogenies. We excluded 228
marine species (according to the International Union for Conservation of Nature (IUCN)), keeping only terrestrial vertebrates
(n = 33 053) in subsequent analyses. We then selected eight putative predictors to investigate how species’ biology, geography,
biodiversity appeal and socioeconomic-related factors affect the probability of species being imputed on phylogenies. Many
predictors used here come from TetrapodTraits, a recent database of the world’s tetrapods that includes standardized species-level
attributes [48]. Here, we outline the attributes used as predictors, along with brief explanations of how they were
computed.
The year of species description was based on the original description publication date and was available for all but one
species (Indotyphlops pushpakumara, an undescribed species included in Tonini’s phylogeny [19]). For body size, we used body
mass information for birds, mammals and reptiles as data coverage was on average higher than 95% (n = 24 761 out of 25 815
species). Conversely, body mass information was available for only 21.4% (n = 1547) of amphibian species; for amphibians we therefore used
body length, which was available for 97.9% (n = 7083) of species. Information on microhabitat use was available for 31 501 (95.3%)
species and it was converted into a continuous metric of verticality [49], with species being scored as 0 = strictly fossorial, 0.25
= fossorial and aquatic/terrestrial or fossorial and aquatic and terrestrial; 0.5 = aquatic/terrestrial, or fossorial and arboreal or
fossorial and aquatic/terrestrial and arboreal; 0.75 = terrestrial/aquatic and arboreal, or terrestrial and aquatic and arboreal or
terrestrial and aerial and 1 = strictly arboreal or aerial. Species-specific sources on body size and microhabitat are available in
[48].
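The verticality scores listed above are consistent with a simple averaging rule, sketched here in Python for illustration (the study itself used R and took the scores from [48,49]). This is one reading that reproduces the listed values, not a documented algorithm: it assumes three strata, with fossorial = 0, ground level (terrestrial and/or aquatic) = 0.5 and above ground (arboreal and/or aerial) = 1, and averages the strata a species uses. The function name `verticality` and the microhabitat labels are hypothetical.

```python
# Hypothetical stratum scores. Grouping microhabitats into three strata is an
# assumption, chosen because it reproduces the scores listed in the text.
STRATUM_SCORE = {
    "fossorial": 0.0,
    "terrestrial": 0.5,
    "aquatic": 0.5,
    "arboreal": 1.0,
    "aerial": 1.0,
}


def verticality(microhabitats):
    """Average the scores of the distinct strata a species uses."""
    if not microhabitats:
        raise ValueError("at least one microhabitat is required")
    # Terrestrial and aquatic collapse into one stratum, as do arboreal and aerial.
    strata = {STRATUM_SCORE[m] for m in microhabitats}
    return sum(strata) / len(strata)
```

Under these assumptions, a fossorial and terrestrial species averages the strata 0 and 0.5 to give 0.25, and a terrestrial and arboreal species averages 0.5 and 1 to give 0.75, matching the scores in the text.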
Some TetrapodTraits attributes were derived as within-range predictors based on expert-based range maps for amphibians
[50–52], birds [17,52], mammals [51–53] and reptiles [16,51,54] overlaid onto a 110 × 110 km cylindrical equal-area grid cell
scheme. This grain size minimizes false presences related to the use of expert range maps [55]. Species range size was represented as the number of 110 × 110 km grid cells overlapped by range maps. Two other attributes corresponded to within-range averages of raster layers aggregated to the resolution of the grid cell scheme: (i) elevation, which was based on a 1 km topographic layer [56] and roughly captures broad elevational patterns in a continuous format, without the discretization of species into lowland versus highland areas; and (ii) human density (inhabitants per km² in 2017), derived from the HYDE 3.2 database at a spatial resolution of 5 arc-min [57]. Additionally, we obtained endemism richness, a proxy for range rarity. This metric was computed as the sum of the inverse range sizes of all species per taxonomic class in a grid cell [58]; for each species, we then took the median endemism richness across the cells in which it occurs.
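The endemism richness calculation can be illustrated with a minimal Python sketch (the study used R; this is not the authors' code). It assumes that each species' range is stored as a non-empty set of grid-cell identifiers and that the input holds a single taxonomic class. The function name `endemism_richness` and the data layout are hypothetical.

```python
from statistics import median


def endemism_richness(ranges):
    """Compute cell-level endemism richness and its per-species median.

    ranges: dict mapping species name -> non-empty set of grid-cell ids
            (assumed to contain species of one taxonomic class only).
    Returns (cell_endemism, species_median).
    """
    cell_endemism = {}
    for species, cells in ranges.items():
        weight = 1.0 / len(cells)  # inverse range size, in grid cells
        for cell in cells:
            cell_endemism[cell] = cell_endemism.get(cell, 0.0) + weight

    # Median endemism richness across the cells each species occupies.
    species_median = {
        species: median(cell_endemism[cell] for cell in cells)
        for species, cells in ranges.items()
    }
    return cell_endemism, species_median


# Toy example with three species on a four-cell grid.
toy_ranges = {"sp_a": {1, 2}, "sp_b": {2}, "sp_c": {1, 2, 3, 4}}
cell_values, species_values = endemism_richness(toy_ranges)
```

In the toy example, cell 2 is occupied by all three species and so accumulates 1/2 + 1/1 + 1/4 = 1.75, whereas cells 3 and 4 hold only the wide-ranging species and score 0.25 each.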
We extracted the number of preserved specimens per species deposited in biological collections worldwide using the function occ_count in the rgbif R package [59]. For that, we created search queries containing the valid species name plus its unique synonyms (i.e. invalid names that can be traced back to a single valid name) and set the basisOfRecord argument to preserved specimen. Synonym information was obtained from the IUCN taxonomy backbone using the rl_synonyms function in the rredlist R package [60]. Data on these and the spatially based predictors were available for more than 99.9% of species. After excluding species with missing data in at least one predictor, our final dataset included 30 321 species: 6526 amphibians, 9369 birds, 5052 mammals and 9374 reptiles.
Box 1. Potential drivers that may affect the probability of a species being imputed in available phylogenies.
Species biology
Biological attributes can affect species’ detectability and researchability, as well as sampling logistics, which can increase the chances of a species being explicitly included in phylogenies.
— Body size. Larger animals are easier to detect, collect and study [23–25]. They also tend to attract more scientific, societal and governmental attention [26–28]. We expect that larger animals will be less likely to be phylogenetically imputed than small-bodied organisms.
— Verticality. Terrestrial and aquatic species are more easily found and collected in their microhabitats than fossorial, arboreal or aerial species using standard sampling techniques [29,30]. For instance, fossorial reptiles are usually underrepresented in scientific collections [31], which contributes to the relatively low research effort directed towards this group [27]. As small sample sizes may reduce the chances of a species being phylogenetically evaluated, we anticipate an inverse hump-shaped relationship between verticality and the probability of a species being imputed on phylogenies, i.e. higher likelihood for species positioned at the extremes of the verticality spectrum.
— Number of preserved specimens. Several millions of preserved specimens are currently housed in scientific collections [32,33], with the number of preserved specimens potentially mirroring species abundance in nature [34]. These specimens form the basis for most subsequent scientific investigations [35,36]. We hypothesize that species better represented in scientific collections will be less likely to be phylogenetically imputed.
Socioeconomics
Socioeconomic factors can influence phylogenetic research through the availability of either researchers or the infrastructure needed for conducting studies, as well as through effects on sampling logistics and financial constraints.
— Human density. Species occurring in highly populated areas may have increased detectability as these areas are more accessible to researchers, who usually work under many logistical and financial constraints, particularly in developing countries [41]. Hence, remote and usually less populated areas, and the species therein, are visited less frequently by researchers [42,43]. We expect that species living close to densely populated areas will have a lower probability of being phylogenetically imputed.
— Year of description. Early described species usually receive more scientific attention than either recently described taxa or those yet to be named [8,27,44]. Time is therefore needed for accumulating scientific knowledge about the described diversity [45], contributing to their proper inclusion in the tree of life. However, collecting tissue samples is common in recent collections. In contrast, older specimens, especially from rare species, often lack suitable genetic material owing to historical and methodological constraints or storage in media that hinder or no longer permit extraction. Thus, the year of description can influence the accumulation of phylogenetic knowledge in contrasting ways.
Figure 1. Top 30 families with knowledge gaps on species evolutionary relationships. Per-family proportion of phylogenetically imputed species across (a) amphibians, (b) reptiles, (c) birds and (d) mammals. The number above the bars indicates total species richness per family. Silhouette illustrations reproduced from phylopic.org. (Plot not reproduced here; y-axis: proportion of phylogenetically imputed species, 0.0 to 1.0; x-axis: family.)
[62]. We included the taxonomic family as a random variable in our models to minimize dependence issues among species.
Families with fewer than three species (electronic supplementary material, table S1) were removed to reduce instability in model
estimates [63], which decreased the number of species to be modelled to 30 178.
We evaluated whether phylogenetic regression models were needed by examining the phylogenetic autocorrelation of
GLMM residuals through Moran’s I correlograms computed across 14 distance classes [10]. The phylogenetic correlograms were
based on averaged results from 50 fully sampled phylogenies for each taxonomic class [16–20]. For reptiles, we first constructed
supertrees by combining Tonini’s and Colston’s phylogenies [16,19] using the function tree.merger in the R package RRphylo [64],
which preserves branch length information in the combined trees [65]. For the global models, we used only five trees owing
to computational limitations. Analyses of phylogenetic autocorrelation were performed using the R packages phylobase [66] and
phylosignal [67].
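To make the correlogram idea concrete, the following Python sketch computes Moran's I of model residuals within successive phylogenetic distance classes. It is a generic textbook formulation with binary weights (1 for pairs falling in the class, 0 otherwise), not the implementation in phylosignal, and the function names, the distance-matrix layout and the class breaks are all hypothetical.

```python
def morans_i(values, dist, lower, upper):
    """Moran's I using pairs (i, j) whose distance d satisfies lower <= d < upper."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and lower <= dist[i][j] < upper:
                num += dev[i] * dev[j]  # cross-product of deviations
                weight_sum += 1.0       # binary weight for this pair
    if weight_sum == 0 or denom == 0:
        return float("nan")  # no pairs in this class, or no variation
    return (n / weight_sum) * (num / denom)


def correlogram(values, dist, breaks):
    """Moran's I for each consecutive pair of distance-class breaks."""
    return [morans_i(values, dist, lo, hi) for lo, hi in zip(breaks, breaks[1:])]


# Toy example: two pairs of close relatives with similar residuals.
residuals = [1.0, 1.0, -1.0, -1.0]
distances = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 1],
    [3, 3, 1, 0],
]
toy_correlogram = correlogram(residuals, distances, breaks=[0, 2, 4])
```

In the toy data, close relatives share identical residuals and distant ones have opposite residuals, so Moran's I is strongly positive in the first class and strongly negative in the second. Values near zero across all classes would indicate no phylogenetic autocorrelation in the residuals.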
Prior to constructing the GLMMs, continuous predictors were log10-transformed if their skewness or kurtosis fell outside the range of −2 to +2 [68], then centred and scaled (z-transformed) to allow direct comparisons of their effect sizes. We checked for multicollinearity among predictors using variance inflation factors (VIFs); strong multicollinearity is usually attributed to VIFs > 5, in which case variables should be removed from the analysis [69]. Since none of our continuous variables had VIF > 4, we kept them all in subsequent analyses (electronic supplementary material, table S2).
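The transformation rule for continuous predictors can be sketched in Python as follows (the study used R; this is an illustration only). Several details are assumptions not stated in the text: moments are computed with the population standard deviation, kurtosis is taken as excess kurtosis, and values are strictly positive so that log10 is defined. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def skewness(x):
    """Population skewness (third standardized moment)."""
    m, s = mean(x), pstdev(x)
    return sum(((v - m) / s) ** 3 for v in x) / len(x)


def excess_kurtosis(x):
    """Population excess kurtosis (fourth standardized moment minus 3)."""
    m, s = mean(x), pstdev(x)
    return sum(((v - m) / s) ** 4 for v in x) / len(x) - 3.0


def prepare_predictor(x, limit=2.0):
    """Log10-transform if skewness or kurtosis is outside [-limit, limit], then z-transform."""
    if abs(skewness(x)) > limit or abs(excess_kurtosis(x)) > limit:
        x = [math.log10(v) for v in x]  # assumes strictly positive values
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


# A strongly right-skewed predictor (e.g. a range size spanning orders of magnitude).
skewed = [10.0 ** k for k in range(10)]
z_skewed = prepare_predictor(skewed)

# A roughly symmetric predictor is only centred and scaled.
z_symmetric = prepare_predictor([1.0, 2.0, 3.0, 4.0, 5.0])
```

After this step every predictor has mean 0 and unit standard deviation, which is what allows the effect sizes of the fitted coefficients to be compared directly.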
We modelled our binary response variable separately for amphibians, birds, mammals and reptiles. Models were constructed
globally as well as separately for each biogeographic realm [70], except for Oceania owing to its low sample size (electronic
supplementary material, table S3). Species with more than 70% of their range within a realm were assigned to that realm’s model. We
inspected the model fit using the R package DHARMa [71] and assessed the explained variation by calculating the pseudo-R²
with the R package performance [72]. We used the package lme4 [73] for fitting the mixed-effect models and usdm [74] for
computing VIF values. All analyses were performed using R version 4.3.1 [75]. See data accessibility for raw data and R-code.
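The realm assignment rule can be illustrated with a short Python sketch. The text does not say how range overlap was measured, so this example assumes it is the fraction of a species' grid cells that fall within a realm; the function name `assign_realm`, the cell-to-realm lookup and the realm labels are hypothetical.

```python
def assign_realm(range_cells, cell_realm, threshold=0.70):
    """Return the realm holding more than `threshold` of a species' range cells, else None."""
    counts = {}
    for cell in range_cells:
        realm = cell_realm.get(cell)  # cells without a realm are counted in the total only
        if realm is not None:
            counts[realm] = counts.get(realm, 0) + 1
    total = len(range_cells)
    for realm, n in counts.items():
        if total and n / total > threshold:
            return realm
    return None  # species not assigned to any realm-scale model


# Toy lookup: cells 1-8 in one realm, cells 9-10 in another.
toy_cell_realm = {c: "Neotropic" for c in range(1, 9)}
toy_cell_realm.update({9: "Nearctic", 10: "Nearctic"})

mostly_neotropic = assign_realm(range(1, 11), toy_cell_realm)  # 8 of 10 cells
split_species = assign_realm([5, 6, 7, 8, 9, 10], toy_cell_realm)  # 4 of 6 cells
```

Under this reading, a species split roughly evenly between two realms enters the global model but none of the realm-scale models.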
Finally, we computed a measure of ‘Darwinian deficit’ [76] per family and class to quantify the relative contribution of phylogenetically imputed relationships in representing the accumulated evolutionary history across taxa. This measure is based on Faith’s phylogenetic diversity (PD) metric [77] and was calculated as PD(imputed species) / [PD(imputed species) + PD(non-imputed species)]. The Darwinian deficit ranges from 0 to 1 and gives the proportion of total PD (i.e. the sum of branch lengths) that is attributed to imputed species in a sample (e.g. a family). We obtained the Darwinian deficit per family across 100 fully sampled phylogenies for each taxonomic class, using only families with at least one imputed species. We then inspected whether average values per family were influenced by the respective species richness.
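As a worked illustration of the ratio above, the following Python sketch computes the Darwinian deficit on a toy tree. It is not the authors' R code. The tree is stored as hypothetical `parent` and `length` dictionaries, and PD is computed by summing the branches on the paths from a set of tips to the root; including the path to the root is an assumption, as the convention used in the study is not stated in this section.

```python
def faith_pd(tips, parent, length):
    """Sum the lengths of all distinct branches on the paths from `tips` to the root.

    parent: dict mapping each non-root node to its parent node.
    length: dict mapping each non-root node to the length of the branch above it.
    """
    visited = set()
    total = 0.0
    for node in tips:
        # Walk towards the root, stopping at branches already counted.
        while node in parent and node not in visited:
            visited.add(node)
            total += length[node]
            node = parent[node]
    return total


def darwinian_deficit(imputed, non_imputed, parent, length):
    """PD of imputed species divided by the sum of PD of imputed and non-imputed species."""
    pd_imputed = faith_pd(imputed, parent, length)
    pd_non_imputed = faith_pd(non_imputed, parent, length)
    return pd_imputed / (pd_imputed + pd_non_imputed)


# Toy tree: root R has children A (internal) and t3; A has children t1 and t2.
toy_parent = {"A": "R", "t1": "A", "t2": "A", "t3": "R"}
toy_length = {"A": 1.0, "t1": 2.0, "t2": 2.0, "t3": 3.0}

toy_deficit = darwinian_deficit({"t1"}, {"t2", "t3"}, toy_parent, toy_length)
```

Here the imputed species t1 has PD = 2 + 1 = 3 and the non-imputed species t2 and t3 have PD = 2 + 1 + 3 = 6, giving a deficit of 3 / 9. Note that the internal branch above A is counted in both terms, so under this reading the denominator can exceed the total tree length.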
3. Results
The proportion of phylogenetically imputed species was highly variable across taxa (figure 1 and electronic supplementary
material, table S4). Some suborders (sensu reference [25]), such as Crocodylia, Perissodactyla and Sphenisciformes already have
all of their described species explicitly incorporated into the most recent fully sampled phylogenies available. Conversely,
coverage is still relatively low for many taxa, such as Gymnophiona (n = 127 imputed species out of 190, or 66.8%), Tinamiformes
(n = 27 out of 43, or 62.8%), Dibamoidea (n = 13 out of 21, or 61.9%) and Monotremata (n = 2 out of 4, or 50%). Across tetrapod classes, the proportion of phylogenetically imputed species per family was unrelated to species richness (electronic supplementary material, figure S1). Furthermore, imputed species accounted for approximately 35% of the total phylogenetic diversity for birds and mammals, 41% for reptiles and 43% for amphibians (figure 2). At the family level, the Darwinian deficit was not influenced by species richness (electronic supplementary material, table S5 and figure S2).
Figure 2. Density plots show the distribution of ‘Darwinian deficits’ computed across 100 phylogenetic trees for each taxonomic class. This metric ranges from 0 to 1 and gives the proportion of total phylogenetic diversity that is attributed to imputed species. Vertical dashed lines show mean values. (Plot not reproduced here; x-axis: Darwinian deficit; y-axis: density.)
Figure 3. (Plot and caption not recovered from the source. Predictor labels: body size, verticality², number of preserved specimens, range size, elevation, endemism richness, human density and year of description.)
Global models explained between 47.6 and 53.4% of the variation in the probability of species being imputed in fully
sampled phylogenies, but this number varied from 43% to 93.4% in realm-specific analyses (electronic supplementary material,
table S6). GLMMs were mostly robust in terms of fit, and their residuals did not show phylogenetic autocorrelation (electronic
supplementary material, figures S3–S7). Across all tetrapod classes, the number of preserved specimens emerged as the most
important predictor, which had a negative effect on the chances of a species being phylogenetically imputed (figure 3 and
electronic supplementary material, figure S8). Large-bodied and wide-ranged species showed lower chances of being phylogenetically imputed across most class-realm combinations. Verticality (in the hump-shaped form) and year of description had a positive relationship with our response variable globally, while elevation, human density and endemism richness affected different taxonomic classes in contrasting ways (figure 3). The effect sizes of other predictors were also important, but their direction and magnitude varied across tetrapod classes and realms (figure 3).
4. Discussion
Despite recent advances in the field of phylogenetics, major gaps and biases still remain in the tree of life [13,15], with about
Ethics. This work did not require ethical approval from a human subject or animal welfare committee.
Data accessibility. Raw data and R-code to replicate the findings of this study are available at Dryad Digital Repository [95].
Supplementary material is available online [96].
Declaration of AI use. We have not used AI-assisted technologies in creating this article.
Authors’ contributions. J.J.M.G.: conceptualization, formal analysis, investigation, methodology, writing—original draft, writing—review and editing;
J.A.F.D.-F.: supervision, writing—review and editing; M.R.M.: supervision, writing—review and editing, visualization, resources.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Conflict of interest declaration. We declare we have no competing interests.
Funding. This work was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) (finance code 001), as a
PhD scholarship provided to J.J.M.G. (proc. 88887.478942/2020-00); CNPq productivity fellowship and the National Institutes for Science and
Technology (INCT) in Ecology, Evolution and Biodiversity Conservation (grant numbers 465610/2014-5); Goiás Research Foundation (FAPEG)
(grant no. 201810267000023); and São Paulo Research Foundation (FAPESP) (grant nos. 2021/11840-6 and 2022/12231-6).
Acknowledgements. The authors are grateful to Gabriel Nakamura, Juliana Stropp, Luis M. Bini and Leandro Duarte for their comments on an
earlier version of this manuscript. J.J.M.G. thanks the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) for a
PhD scholarship. Work by J.A.F.D.-F. is supported by Goiás Research Foundation (FAPEG) and CNPq productivity fellowships, and has been
developed associated with the National Institutes for Science and Technology (INCT) in Ecology, Evolution and Biodiversity Conservation.
M.R.M. acknowledges support from São Paulo Research Foundation (FAPESP).
References