1 Introduction

In a world dominated by data, where the hegemonic male norm serves as the implicit benchmark, the absence of data on women results in their invisibility (Criado, 2019). Monitoring has been strongly promoted by international organisations, such as the United Nations and the European Union, as a means to advance gender equality. Its main motivations have been to assess programme outcomes, generate comparable national data over time, and ensure that gender is not marginalised in decision-making (Walby, 2005).

However, the proliferation of indicators and monitoring practices has not been free from criticism. Feminist scholars have expressed concern about the tendency to generate abstract data that fails to account for lived experiences or contextual realities (Bear et al., 2015). Moreover, the EU governance frameworks embedding gender monitoring and equality policies have been questioned for their competitiveness-driven and market-oriented logics (Bruno et al., 2006; Pollack & Hafner-Burton, 2000). The prioritisation of strategy instruments such as monitoring, over a deeper engagement with the diverse meanings and complexities of equality, has led to a lack of problem definition and limited policy impact (Lombardo & Meier, 2006; Pollack & Hafner-Burton, 2000; Lombardo et al., 2009; McNutt & Béland, 2015; Minto & Mergaert, 2018; Otero-Hermida & Lorenzo, 2019; Woodward, 2003).

While monitoring can make inequalities visible and support policy interventions, it is also shaped by broader political contexts and power structures, raising doubts about its potential to drive transformative change (Connell, 2006; Squires, 2007). Yet monitoring itself is subject to contestation, and its evolution has been made possible by the entry of women into scientific fields, thereby incorporating, at least to some extent, their perspectives (Walby, 2005; Walby & Armstrong, 2010).

In response to these challenges, this article examines the evolution of gender monitoring in science and innovation through the European Commission’s flagship She Figures series, from its inception in 2003 to the 2021 edition. Specifically, we ask: How has gender monitoring in science and innovation evolved through She Figures, and to what extent does this evolution open or close debates on gender equality, facilitating or hindering the transformation of research and innovation policy?

To address this overarching question, we pose a series of specific inquiries: What conceptual and methodological changes have occurred in She Figures between 2003 and 2021? What are the justifications (why we monitor) and purposes (intended uses) of monitoring across editions, and how have these evolved? Which measurement dimensions have been prioritised, and which have been omitted? To what extent does She Figures incorporate equity dimensions beyond numerical representation? How are issues such as intersectionality, researcher mobility, and gendered innovation framed, and what biases persist?

We analyse and discuss these developments through equality policy literature and insights from gender-sensitive and responsible metrics in science and innovation.

Behind our analysis lies a key concern: Are we achieving a form of monitoring that overcomes the entrenched logic of invisibilisation and provides actionable knowledge for researchers, policymakers, and organisations seeking to improve equality practices? Beyond charting the historical evolution of She Figures, this study aims to identify its limitations and suggest how gender monitoring can better support transformative equality policies.

Our results show significant progress in She Figures, but also reveal persistent gaps in the ways gendered inequalities are conceptualised, measured, and addressed. Central feminist concerns—such as work-life balance, organisational cultures, and structural barriers—remain underexplored or absent. We highlight emerging trends that move beyond the mere presence of women to address deeper structural dynamics, and offer proposals to confront the remaining challenges.

To this end, we first trace the evolution of equality policies and their relationship with monitoring. Second, we review key gender issues in the scientific and innovation domains, alongside debates on gendered and responsible metrics, which underpin our analytical framework and methods. We then present our findings and discussion, followed by a section proposing concrete improvements to monitoring, and finally, our conclusions.

2 Equality Policies Evolution and Monitoring

This section introduces the evolution of equality policies and how this trajectory shapes monitoring, informing our analysis of gender monitoring in European science and innovation.

Equality policies have evolved alongside different waves of feminism, adapting their objectives and problem definitions to shifting political contexts. As a result, gender regimes have been reconfigured over time (Sümer, 2016; Walby, 2020). These policies can be seen as clusters that reflect competing visions of how gender issues should be addressed (Verloo, 2007).

Feminist theory has long debated sameness versus difference, and systemic versus individualistic approaches to equality. Rees (1998) proposes an evolutionary framework for the European context: beginning with a superficial emphasis on equal opportunities and individual rights (tinkering, 1970s), moving towards positive actions like quotas (tailoring, 1980s), and culminating in systemic transformation (transforming, 1990s), closely tied to gender mainstreaming strategies. National spheres would replicate this evolution, as in the British case (Breitenbach et al., 2002), while in others, such as the Spanish case, each stage would begin a decade later, owing to late democratisation and incorporation into the EU (Otero-Hermida & Lorenzo, 2019). Scholars and the European Commission (COM 96 67; Booth & Bennett, 2002) argue that all three strategies remain relevant and complementary.

Since the 2010s, newer theoretical approaches have gained visibility, though not always in policy. Constructivist and new materialist perspectives examine how gender discourses and emotions are mobilised in political processes (Kantola & Lombardo, 2017), offering deeper insights into power and agency. Realist and post-humanist thinkers like Barad (2003, 2007), Bennett (2010), and Latour (1999) challenge individualistic frameworks, emphasising collective agency, ontological relationality, and the co-production of knowledge. These foundations invite new ways of framing (in)equality and recognising how socio-material assemblages shape experience and exclusion. However, such perspectives remain largely absent from mainstream political analysis (Kantola & Lombardo, 2017).

Intersectionality, which addresses how gender interacts with variables like race, age, and ability (Crenshaw, 1989), has gained some traction in EU policy (Lombardo & Verloo, 2010, 2017; COM/2020/152 final), but its application remains limited (Alonso et al., 2012). Earlier strategies continue to dominate, often reducing equality to women attaining male standards, while leaving patriarchal values unchallenged (Squires, 2013; Walby, 2005).

Squires (2007) identifies three foci of equality policies that may appear within any of the strategies presented above, thus adding a powerful analytical lens for studying gender in policies: presence (increasing women's numbers), voice (improving the articulation of women's specific concerns and transformative capacities) and process (a systematic approach to equality in policymaking). These dimensions help us assess She Figures, as will be shown later.

Drawing on Fraser (2006) and Sandercock (2002), Squires critiques the technocratic turn in equality policies, in which voice has been marginalised. In neoliberal contexts, feminist practices have been absorbed and repurposed through utility-based rationales and promoted through technocratic processes. Presence and process thus become the indicators of parity of participation, while what is lost is the emphasis on participatory democracy and on the feminist action of "extending the boundaries": transforming the policies themselves, what is valued and what is not, and their reach and scope, thereby opening up other worlds linked to voice (Squires, 2007).

Squires (2007) uses precisely the case of monitoring to illustrate this loss. Indicators allow for quantification but tend to foreclose debates on the conceptualisation of equality itself, on which transformative capacity depends. Connell (2006) adds that Western policies often reify binary categories, equating gender with rigid statistical differences between men and women. This approach obscures the dynamic and relational nature of gender (West & Zimmerman, 1987), and overlooks how organisational cultures reproduce inequalities by assigning value to masculine-related practices, capabilities, or norms and presenting them as neutral, while dismissing feminine-related ones (Acker, 1990; Burton, 2005). For Connell, addressing gendered structures requires attention to power, labour division, and affective dimensions within institutions.

Recent theoretical contributions continue to illuminate these issues, revealing how gendered power emerges from what is valued or excluded in organisations and policy processes, including how emotions are framed as irrational or invisible. As noted above, these perspectives remain peripheral in mainstream policy and are largely absent from science and innovation monitoring.

3 Gender (in)Equality in Science and Innovation and Monitoring

Having introduced the broader equality policy debate, we now examine how it plays out in the field of science and innovation. Inequality in academia is often framed through metaphors such as the leaky pipeline, which describes the progressive disappearance of women as academic careers advance, resulting in their underrepresentation in senior positions. This phenomenon spans both feminised disciplines, such as the humanities, and masculinised ones like engineering or physics. It is often portrayed as passive attrition, neglecting the underlying reasons (Grogan, 2019). Factors such as career dropouts and slower advancement, linked to the glass ceiling, help explain disparities in research output and citation impact (Huang et al., 2020). In the US, 40% of women versus 23% of men leave academia after their first child (Cech & Blair-Loy, 2019). In Europe, international mobility, limited social capital, high productivity demands, and caregiving responsibilities are frequently cited as reasons for leaving academia (Nielsen, 2017).

A central issue in the debate is how science and academic excellence are conceptualised and operationalised. Access, promotion, and funding processes, framed as meritocratic and neutral, often reproduce gender biases (Blickenstaff, 2005a, b; Samper-Gras, 2022; Samper-Gras et al., 2021). In mathematics, for example, women's representation has declined significantly since 2006, despite near parity in the 1980s (according to data from 1985 to 2020), paralleling growing competitiveness in the field (Samper-Gras, 2022). Disciplinary and institutional dynamics are crucial here. For instance, in medicine, informal recruitment practices mean women are less likely to be invited to apply for professorships (Van den Brink & Benschop, 2012).

Although European discourse increasingly promotes responsible science and open science (related to public participation in science and public access to data), normative aspects have little bearing on how academic excellence is assessed (Forsberg et al., 2018). This raises the critical question: Can science that excludes a gender perspective truly be considered good science? Excluding half the population and their perspectives risks perpetuating biased frameworks.

European policies increasingly link excellence and equality: within the Horizon Europe framework, equality is both a cross-cutting principle and an eligibility criterion. Institutions must have monitored and resourced equality plans. Evaluation criteria also assess gender balance in research teams and the integration of gender into research questions, data, theory, and results (EU 2021/695).

Innovation policy, however, presents added complexity. Women remain underrepresented in technology, entrepreneurship, and patents, holding only one inventorship for every ten held by men (EU She Figures, 2021). This limits the diversity of solutions and ideas that address the needs of underrepresented groups. Moreover, diversity-driven innovation is widely acknowledged to enhance creativity (Tannenbaum et al., 2019). Yet, the innovation sector—especially private industry—often lacks active equality policies (Karaulova et al., 2023), and gender-disaggregated data is scarce in core monitoring systems (Alsos et al., 2013; Incorvaia et al., 2023; Otero-Hermida et al., 2020).

Academic literature highlights several limitations in current gender indicators in science and innovation. A dominant concern is the overreliance on quantitative metrics focused on representation, which fail to capture structural inequalities or institutional dynamics (Bordons et al., 2003; Nygaard et al., 2022). For instance, productivity indicators often undervalue women’s contributions due to disciplinary norms, collaboration styles, and co-authorship practices (Nygaard et al., 2022).

A second concern is the lack of qualitative and contextual indicators. Scholars have called for moving beyond headcounts to include metrics on academic culture, discrimination, and caregiving responsibilities (Hernández et al., 2023; Otero-Hermida & García-Melón, 2018).

Another point of contention is the opacity and rigidity of indicator design processes. Existing systems often reflect established power relations and fail to involve women scientists or gender experts in the co-construction of indicators (Huyer & Westholm, 2007). Furthermore, it remains unclear whether these metrics are intended to evaluate the effective implementation of gender equality policies, to support processes of learning and policy adaptation, or to serve other purposes (Otero-Hermida & García-Melón, 2018).

To address these issues, authors propose participatory approaches to indicator development (Otero-Hermida & García-Melón, 2018), the inclusion of qualitative indicators such as perceptions of inclusion (Hernández et al., 2023), and monitoring that captures how overlapping inequalities affect participation. The latter would allow national authorities to identify the micro-mechanisms that operate in social, economic and scientific contexts, and to avoid programmes that do not address this diversity (Sugimoto et al., 2015; Larivière et al., 2015). A shift from individual-level outputs to institutional change processes is also recommended to enhance the transformative potential of gender monitoring (Huyer & Westholm, 2007; Nygaard et al., 2022).

We connect this field to debates on responsible metrics to explore how metrics shape not only scientific careers, but also the lives of those affected by scientific practices (Wilsdon, 2015). The ultimate justification (why we monitor) and purpose (what for, such as policy improvement or benchmarking organisations or countries for distributing funds) influence the choice of frameworks and indicators, and their clarity is essential to the debate (Ràfols, 2018). Justifications such as efficiency versus social justice generate different monitoring trajectories (Molas-Gallart & Castro-Martínez, 2007; Ràfols, 2018). Likewise, whether the purpose is policy improvement or benchmarking changes how we conceptualise innovation.

For example, gender-sensitive innovation monitoring driven by equality as justification—rather than EU performance comparisons—shifts the focus from inputs and outputs to the people who innovate and their contexts. This also broadens the concept of innovation to include social and public innovation, not just technological advances in the market (Otero-Hermida, 2022; Otero-Hermida et al., 2020).

A clear definition of equality dimensions is crucial to selecting and assessing relevant indicators (Otero-Hermida & García-Melón, 2018; Walby et al., 2008). Examples of such dimensions include health, participation, safe environments, the collaborativeness of the environment, career promotion, and work-life balance. These authors also raise the issue of data needs, that is, indicators that are conceptually necessary even if statistical data do not yet exist (Otero-Hermida & García-Melón, 2018; Walby et al., 2008).

The dimensions change if the purposes and justifications change, as we have indicated with gender monitoring in innovation policies (Otero-Hermida, 2022). Both feminist literature and responsible metrics stress the importance of contextualisation. Indicators do not have meaning but receive it in institutional practices (Gläser & Laudel, 2007; Leydesdorff et al., 2016). We must be cautious with the indicators that are inevitably used as a proxy, making their translations explicit, locating in the measure what the data are for, their sources and limitations (Ràfols, 2018; Saltelli & Di Fiore, 2020; Wilsdon, 2015; Wilsdon et al., 2017).

Finally, the responsible metrics literature urges us to ask whether monitoring opens or closes debates (Ràfols, 2018; Stirling, 2008). This question is central to equality policies that aspire to achieve transformative change, as we have discussed before.

4 Research Frame and Methods

Building on the insights and demands outlined in the previous sections, this study empirically analyses the European Commission’s She Figures monitoring reports on science and innovation. Our general expectation is that gender monitoring has evolved, producing more detailed metrics that go beyond merely counting how many women are present in the field and moving towards a more nuanced understanding of the gendered problems at stake.

Our main research questions, which guide our analytical focus, are the following: 1. How has gender monitoring in science and innovation evolved through the European Commission’s She Figures series (2003–2021), and to what extent has this evolution contributed to opening or closing debates on gender equality in research and innovation policy? 2. What improvements could be made to gender monitoring to enhance its transformative potential in shaping equality policies in science and innovation? Considering this focus, we have asked the reports specific questions that help to address those overarching questions: What changes have occurred in gender-related concepts within She Figures from 2003 to 2021? What are the justifications, purposes, and target audiences of gender monitoring across editions, and how have these evolved? Which dimensions and measurement themes have been prioritised in She Figures, and which have been omitted? How does She Figures address the equality policy foci of presence, process, and voice (Squires, 2007)? How has the process of constructing gender monitoring evolved, and to what extent does it enable effective evaluation of gender equality policies? How are emerging issues, such as intersectionality, researcher mobility, and gendered innovation, addressed, and what conceptual biases may persist?

These questions guide our discussion of what is measured, the policy priorities these measurements reflect, and the understandings of equality and inequality embedded in the monitoring framework. We examine how selected indicators are structured, the concepts applied, and the nature of policy recommendations, with attention to how these elements have evolved over time.

To answer these questions, we adopted a qualitative content analysis of the full She Figures series from 2003 to 2021. The corpus includes 13 documents: main reports, policy briefs, and methodological annexes where available (see database: Furió-Vico & Otero-Hermida, 2024). We used Atlas.ti software to support the coding process.

Initial coding followed a deductive strategy grounded in our analytical dimensions: why, what, and how equality is monitored. This was later complemented by inductive coding to refine analytical categories. For example, early topic codes were based on main measurement categories identified in the 2003 report, with new categories added iteratively as subsequent documents introduced new indicators.

Through this process, aggregate codes such as "ultimate justifications and purposes", "policy focus", and "measurement topics" were defined to address our specific research questions. We employed axial coding to iterate between the data and the literature used to frame our inquiry. More detailed information on coding structure and data organisation is available in Furió-Vico & Otero-Hermida (2024).

5 Results: European Commission's She Figures Evolution 2003–2021

This section presents the findings regarding the above-mentioned research questions. In the following tables, we present a comparative evolution of the different editions of She Figures. The main categories common to all editions are included: the title, the justification for monitoring, the purpose of the document, the targets and the structure. A comparison is also made in relation to the policy focus of the indicators (Squires, 2007). Finally, a set of measurement topics and their contextualisation is proposed.

As a preliminary note, we observe that the changes are not merely additive, including more issues and indicators, but rather there are shifts in substance.

5.1 The Use of Key Concepts Related to Gender in She Figures over Time

First, we look at the different concepts used in the different editions. Some concepts have remained stable over time—such as horizontal and vertical segregation—while others, considered key, display significant intermittency. Terms like equal opportunities, work-life balance, and related notions such as domestic work appear inconsistently across editions. Although references to gender equality have increased, its counterpart, gender inequality, is entirely absent in some reports. We also observe the disappearance of terms such as critical mass (Table 1).

Table 1 Evolution of Gender Concepts. She Figures 2003–2021
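The kind of concept tracking summarised in Table 1 can be approximated with a simple term count per edition. The sketch below is purely illustrative: the function name `concept_rates`, the short concept list, and the toy texts are assumptions introduced here, not the Atlas.ti procedure used in this study or the actual content of any report. Counts are normalised per 10,000 words so that longer reports do not mechanically inflate the number of mentions.

```python
import re

# Hypothetical concept list for illustration only (not the study's codebook).
CONCEPTS = ["gender equality", "gender inequality", "work-life balance", "critical mass"]


def concept_rates(text, concepts=CONCEPTS, per=10_000):
    """Return mentions of each concept per `per` words of `text`."""
    lowered = text.lower()
    # Count word tokens; hyphenated terms such as "work-life" count as one word.
    n_words = len(re.findall(r"\b[\w'-]+\b", lowered))
    rates = {}
    for concept in concepts:
        hits = len(re.findall(r"\b" + re.escape(concept) + r"\b", lowered))
        rates[concept] = hits * per / n_words if n_words else 0.0
    return rates


# Toy stand-ins for report texts; real use would load each edition's full text.
editions = {
    "edition_a": "Gender equality requires a critical mass. Gender equality matters.",
    "edition_b": "Work-life balance shapes careers and gender equality.",
}

for name, text in editions.items():
    print(name, concept_rates(text))
```

A count of this kind only captures explicit wording; it cannot replace the interpretive coding of how a concept is framed in each edition.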

Notably, mentions of both women and sex declined in certain editions, even as the overall length of the reports expanded. However, both terms reappear prominently in the most recent edition. These shifts are also reflected in the evolution of the report titles (Table 2).

Table 2 She Figures Evolution 2003–2021: Titles

5.2 Justifications, Purposes and Targets in She Figures Monitoring over Time

The ultimate justification refers to why monitoring takes place, and how it relates to different purposes or intended outcomes. Over time, we observe a shift from utilitarian justifications—centred on enhancing the competitiveness and success of the European Research Area (ERA)—towards a justification grounded in equality as an intrinsic value, which becomes more prominent in the later editions.

Notably, equality is framed in varied ways across reports: as voice in decision-making and field configuration (2003); as linked to women’s concerns and untapped talent and potential (2009, 2018); and as an integral component of science policy and excellence itself (2015, 2018) (Table 3).

Table 3 She Figures Evolution 2003–2021: Justifications

Alongside these justifications, the 2003 edition introduced two initial purposes: benchmarking and neutral reporting. Over time, additional purposes emerged—such as raising awareness and identifying biases—culminating in aims related to policy evaluation and systemic change.

While the report generally maintains a neutral tone, from 2012 onwards it acknowledges that change does not occur organically and emphasises the need for policy-driven transformation. Accordingly, the most recent editions include policy recommendations and, in 2021, thematic policy briefs. These reports are also interspersed with examples of good practice, complementing the data and reinforcing a normative stance over a neutral one (Table 4).

Table 4 She Figures Evolution 2003–2021: Purposes

With regard to target audiences, policymakers have been consistently addressed across all editions, alongside researchers and universities. However, references to the general public appear sporadically: students in 2009 and 2021, citizens in 2012, and individuals interested in these issues in 2018—without continuity or clear connection between editions (Table 5).

Table 5 She Figures Evolution 2003–2021: Targets

5.3 Dimensions and Measurement Topics in She Figures Monitoring over Time

We now turn to what the reports are measuring, presenting the frequency of mentions across the topics we have identified. The overarching dimensions—such as vertical segregation, which could link several of these topics—are not explicitly stated and must be inferred from the chapter structure, which provides limited information.

Our first step was to examine the issues for which indicators are provided. A large number of indicators relate to employment (e.g. ratio of the scientific workforce to total employment) and academic careers (e.g. distribution of pre-doctoral and senior positions), with extensive breakdowns by discipline (e.g. STEM, Social Sciences) and sector (e.g. government, entrepreneurship, private sector). As discussed previously, these dimensions are particularly significant and clearly predominate in the monitoring framework.

From the earliest editions, the reports have also included indicators on equality/inequality dynamics in meritocratic environments, such as the gender pay gap and success rates in funding applications. Over time, additional topics have been incorporated, including authorship, mobility, and the gender and intersectional dimension in Horizon 2020 projects.

However, some critical areas remain underexplored. Decision-making, for instance—measured through gender distribution among institutional heads—receives limited attention. Even more notably, employment conditions are scarcely addressed. It is worth highlighting that work-life balance is measured only once, in the 2012 edition, and is absent from all subsequent reports (Table 6).

Table 6 She Figures Evolution 2003–2021: Measurement topics

5.4 Policy Focus in She Figures Monitoring over Time

Advancing to the next level of analysis, we present the results for the policy focus of the indicators. As presented before, Squires (2007) identifies three foci of equality policies. Presence (increasing women's numbers) accounts for almost all of the She Figures indicators. Process (a systematic approach to equality in policymaking) is covered by only a couple of indicators, introduced in the most recent editions, which identify organisations with equality plans and measures and the public visibility of these on institutional websites. Finally, voice (amplifying women’s concerns and expanding the system’s boundaries) is scarcely addressed. Only a limited number of indicators refer to this dimension, such as the inclusion of gender and intersectional considerations in Horizon 2020 projects (Table 7).

Table 7 She Figures Evolution 2003–2021: Presence, Process and Voice

5.5 The Monitoring Process and its Contextualisation in She Figures over Time

She Figures has significantly enhanced data contextualisation in its most recent editions (2018, 2021), with handbooks exceeding two hundred pages. These include explanations for each indicator—its rationale or motivation, precise definition, calculation method, data source, and, crucially, specifications, comments, and critical issues, such as changes across editions or the limitations of bibliometric data.

Most data are sourced from Eurostat, the EU’s statistical agency, and from Statistical Correspondents in EU member and associated countries, who provide information such as the gender composition of scientific boards. Since 2012, additional sources have been integrated to address emerging topics. These include the MORE Surveys on the Mobility and Career Paths of Researchers in Europe (for mobility), the Web of Science database (2015) (for gendered authorship and content), and the EU Open Data Portal (for indicators on the integration of gender and intersectional dimensions in EU-funded projects). Some knowledge-based gender projects, such as Hypatia, are cited to support indicator rationales, though they are not used directly as data sources.

Importantly, the latest reports provide insight into the monitoring process, noting that the selection of new indicators involves in-depth consultation with key stakeholders (She Figures, 2021, p. 172). The relevance of indicators is informed by steering group discussions and state-of-the-art mappings (She Figures, 2021, p. 173). However, key questions remain unanswered—particularly regarding who the stakeholders are, beyond the ERA Committee’s Standing Working Group on Gender in Research and Innovation.

Policy briefs, which are crucial for translating data into policy insights, are issued as separate reports authored by large consultancies. The handbook also includes a “History and Background of the She Figures” section, yet this only mentions the founding actor, the Helsinki Group on Women in Science, in 1999, leaving unclear who has been involved since. For these reasons, the report still functions to some extent as a technical black box regarding participant involvement (Table 8).

Table 8 She Figures Evolution 2003–2021: Contextualisation

5.6 Snapshots in Mobility, Innovation and Intersectionality: Translations and Implications

We cannot analyse the conceptualisation of all She Figures issues in detail. However, as previously discussed, paying close attention to indicator translation and operationalisation is essential for both responsible metrics and gender equality. In this section, we focus on three priority policy areas: intersectionality, mobility, and innovation.

Intersectionality was introduced in She Figures 2021 as an exploratory indicator that assesses its integration in Horizon 2020 projects, with the aim of informing future development. The report explicitly outlines the need for intersectionality and identifies potential sub-dimensions (e.g. disability, sexual orientation, socioeconomic status, vulnerable populations), offering some promise for broader inclusion of diverse gendered voices. However, caution is needed. In practice, intersectionality has at times diluted the focus on women, treating them as one diversity category among many and thereby reducing the attention and resources allocated to them (Walby et al., 2012). Moreover, this approach has been linked to an emphasis on individual anti-discrimination via legal channels (Bustelo, 2009), or identity-based claims that overlook deeper issues such as power and wealth redistribution (Squires, 2007).

Mobility became a key focus following the policy shift in 2012. As a core marker of excellence in research careers, mobility is also a strategic priority for the European Research Area (EU Council, 2021). It was initially defined as international mobility lasting a minimum of three months over the previous three years (She Figures, 2012, p. 43). The data reveal notable gender gaps with crucial implications: “Besides age, part-time jobs and mobility are possible explanations for gender differences in scientific employment” (She Figures, 2012, p. 34).

Globally, mobility is seen as both an opportunity and a constraint—particularly for women, due to their role in caregiving (Xie & Shauman, 2003; Ackers, 2008; González Ramos & Malpica-Lander, 2013; Zippel, 2011). Additionally, gendered patterns in mobility are detected. The most mobile researchers are not always the most productive, and men and women display distinct mobility behaviours (Cañibano et al., 2016). For example, women exhibit greater international mobility than men, but their stays are typically shorter, closer to home, and occur earlier in their careers—particularly during their twenties and thirties. The lowest rates of female mobility are observed at the postdoctoral stage.

In more recent editions, the narrative evolves. She Figures 2021 disaggregates the data and notes that mobility gaps mainly affect senior women, while at the predoctoral stage, gender differences are minimal. A dedicated "tip box" highlights good practices, such as universities funding childcare during international mobility (She Figures, 2021, p. 162). The definition of mobility was also adjusted: three months during the PhD stage, and three months minimum within ten years for postdoctoral researchers (She Figures, 2021, Handbook).

Still, the thresholds themselves may require re-evaluation. Given that women often engage in shorter mobility periods (Cañibano et al., 2016), would the data change if the threshold were lowered to one month? Another concern is the lack of further differentiation within the postdoctoral category. The immediate post-PhD stage, arguably the most vulnerable, often coincides with family formation and is precisely when mobility is required in grant applications, such as for the prestigious Marie Skłodowska-Curie fellowships. This stage may mark the beginning of the gendered "leaky pipeline" (Grogan, 2019).

This raises broader questions about the purpose of mobility and its centrality in research careers. Mobility has been criticised as a proxy for internationalisation, an accepted "rite of passage" that is often disconnected from actual knowledge transfer or collaboration (Ackers, 2008, 2013). Should we instead monitor and promote scientific collaboration? This, too, is a gendered issue. Women tend to collaborate with more diverse partners, which could enhance transfer policies and engagement with non-academic actors (Bozeman & Gaughan, 2011; Díaz-Faes et al., 2020). Such patterns are not visible through simplistic metrics like co-authorship counts, currently the main measure of collaboration in She Figures.

Turning to innovation, its conceptualisation remains highly gender-biased, as previously discussed (Alsos et al., 2013). Innovation in monitoring panels is typically equated with market-oriented technological development, excluding broader interpretations such as social or public innovation (Otero-Hermida, 2022; Otero et al., 2020). Although She Figures 2021 acknowledges this bias (Policy Briefs, p. 57), its data framework reinforces the reductionist view, focusing on digitalisation, entrepreneurship, and indicators such as leadership and venture capital access (p. 59). No data needs are identified regarding social or public innovation, areas where women are often more present. While the conceptual issue is noted, it remains unaddressed in terms of data collection and indicator development.

6 Discussion: Key Findings, Study Limitations and Detected Challenges

This section synthesises the main findings of the study, contextualising them within broader literature on gender monitoring and responsible metrics.

The evolution of the European Commission’s flagship initiative in gender monitoring, She Figures, reflects growing efforts towards contextualisation. Its justification has gradually shifted from competitiveness to equality, and its purposes have evolved from information provision to awareness-raising, and more recently, to policy evaluation for systemic change.

These shifts have influenced, to some extent, what is being measured. For example, more critical gender concepts such as critical mass or inequality have been abandoned. The notion of critical mass implies a minimum number of women necessary to produce effects and prevent tokenism in male-dominated environments (Grey, 2006), although more is required to produce transformation—such as critical actors or leadership (Childs & Krook, 2009; Etzkowitz et al., 1994). In recent editions, critical framing re-emerges within the justification and purpose of the reports, indicating aspirations for more ambitious, systemic change.

There has also been a shift in terminology, with categories such as woman and sex replaced by gender. Recent editions appear to offer a compromise aligned with EU policy trends. The European Strategy for Gender Equality 2020–2025 (COM/2020/152 final) explicitly acknowledges the diversity within the categories of men and women, referencing sex, gender identity, expression, and sexual characteristics. This embraces fluidity and non-binarism while continuing to centre the category woman as essential to understanding inequality and violence (Jabazz, 2023). Future editions may incorporate non-binary perspectives, although doing so may be challenging within a monitoring framework and statistical data primarily based on binary presence indicators.

As discussed in earlier sections, all three equality policy foci—presence, process, and voice—are necessary. She Figures has made efforts to expand its focus beyond presence, incorporating limited indicators addressing process and voice. Nevertheless, the imbalance remains significant. The voice dimension, which entails expanding the boundaries of the system (Squires, 2007), remains marginal. It is not just the lack of voice-oriented indicators; this limitation is also evident in other dimensions of the monitoring.

Firstly, this is evident in the topics measured. The imbalance between the public (masculinised) and private (feminised) spheres is a longstanding issue in gender theory and science policy, yet remains unmeasured in She Figures. The only indicator related to work-life balance appeared in 2012 and was subsequently dropped. While the issue is frequently mentioned in recommendations and tip boxes, monitoring for systemic change and policy evaluation, as claimed in recent editions, requires actual data. Without it, women’s concerns are not truly integrated, and system boundaries remain unchallenged.

This gap reflects a broader issue: EU equality policy has long been shaped by economic priorities, framing equality largely as women’s labour market integration. This has marginalised other visions of equality, especially those relating to the private and reproductive sphere (Lombardo, 2017; Walby, 2004, 2005).

Secondly, core conceptual frameworks and their gendered nature are rarely problematised. As seen with mobility and innovation, even when acknowledged, issues remain unaddressed. As a result, data may suggest that women contribute little to innovation, when in fact they are often active in social, non-technical, public and non-entrepreneurial domains—none of which are measured or identified as areas for future data development. While the report aims to make inequality visible to improve policymaking, many crucial aspects remain obscured, and key translations of feminist concerns fail to challenge dominant norms. Consequently, women’s outcomes may still be interpreted as the problem, rather than the systems that marginalise them.

This raises the issue of who is being monitored. Gender inequality is not only driven by irrational bias or stereotypes; institutions themselves are gendered in ways that exclude women, non-binary, and non-heteronormative individuals and perspectives. Thus, monitoring should also pay attention to institutions (Connell, 2006; Gläser & Laudel, 2007; Huyer & Westholm, 2007; Leydesdorff et al., 2016; Nygaard et al., 2022), particularly R&I actors such as funding agencies and research organisations. Yet very few indicators of this kind are found.

After two decades of She Figures, its intended users remain largely the same: policymakers, researchers, employers, and “anyone with a general interest” (She Figures, 2018, p. 216). There is a need to reflect on how these actors actually use the data, and how this might influence or limit its impact—an issue addressed in more detail in the following section.

This leads us to the monitoring process itself. While recent editions provide more detailed contextualisation for each indicator—aligning with responsible metrics principles through clearer rationales, sources and limitations—gender concerns still appear as external to the system. Data sources and their conceptual assumptions have barely changed. One cannot genderise data without first genderising its definitions, as shown in the cases of mobility and innovation.

Some new sources, such as EU-funded projects, have been incorporated, but no gender-specific projects are used as data sources, despite their potential. Moreover, information on actors involved in indicator development remains vague. It is unclear who is consulted during indicator selection or conceptualisation, or how data interpretation is shaped. In this respect, the report retains elements of a technical black box, offering limited transparency on process or participation. Its connections and history are barely visible, although these constitute the very “fact-data” (Latour, 1999) needed to trace the translations and rationales behind monitoring decisions.

Thus, although progress has been made, what remains absent is the participatory democratic emphasis championed by feminist approaches (Squires, 2007). If we are to move from tinkering to transforming (Rees, 1998), a rethinking of the monitoring process is needed—including greater involvement of actors beyond experts and agencies.

Consequently, in response to Research Question 1—regarding the evolution of gender monitoring through the European Commission’s She Figures series and its contribution to opening or closing debates on gender equality in research and innovation policy—our analysis suggests that, despite some relevant progress, the debate remains largely closed.

Finally, this study is limited to analysing She Figures as a monitoring product. A fundamental question remains: How are these indicators used in science and innovation policy after twenty years of monitoring? And how does their ultimate use relate to the way they are produced through the policy process? (Otero-Hermida, 2022).

7 Proposals: Imagining New Monitoring Paths

In light of the challenges discussed, and in response to our second research question concerning how She Figures can be improved, we identify several areas where further development is needed.

We argue that the key issue is not replacing the policy focus but achieving a better balance between presence, process, and voice. Recently, the denial of gender inequality has intensified with the rise of anti-feminist far-right movements and their growing institutionalisation. This development places gender at the heart of broader anti-democratisation trends (Lombardo et al., 2021). In our view, presence indicators remain essential for awareness-raising purposes and should not be diminished. However, improved representation does not automatically translate into systemic change, as evidenced in feminised research areas. Other critical elements, such as leadership, critical actors, and institutional transformation, must also be addressed (Childs & Krook, 2009; Etzkowitz et al., 1994; Grey, 2006).

The governance approach underpinning monitoring in the European Union has been criticised for its reliance on benchmarking, a soft mechanism often seen as replacing more ambitious tools like legislation (Borrás & Jacobsson, 2004; Bruno et al., 2006). As previously mentioned, participatory democracy and the incorporation of voice are essential elements of feminist and gender-sensitive policy. Monitoring should support this aim, rather than closing down debates or reinforcing ostensibly neutral framings that conceal gendered dynamics.

Our main proposal, therefore, is to improve the monitoring process itself, especially by incorporating missing voices.

  • First, after two decades, a participatory process is needed to define the dimensions and sub-dimensions we wish to measure. These are not clearly stated in the reports, and important issues—such as work-life balance—are absent. The R&I landscape has also changed significantly over the past 20 years, with increased competitiveness and productivity demands. This raises new questions: should we measure who delivers care? Or should we ask how hyper-competitive environments disproportionately affect work-life balance and gender equality? The focus could shift from “fixing women” to supporting all researchers’ well-being. These kinds of shifts in problem framing require inclusive debate and participation.

  • Second, equality policies and their associated metrics reflect different, often competing, visions. Engaging actors currently underrepresented in policymaking could help diversify perspectives. Intersectionality can provide a framework to understand the heterogeneity of experiences across women, men, and non-binary individuals in science and innovation. Other underrepresented perspectives—such as new materialist or ecofeminist approaches that emphasise socio-material conditions, collectiveness, emotion, and care—could also enrich the conceptualisation of indicators.

  • Third, we propose reflecting on interrelationships among monitored dimensions. For instance, challenges like vertical and horizontal segregation, the gender pay gap, or mobility patterns may be linked to leadership or work-life balance. These interconnections are inherently multifactorial and multi-criteria, as explored in some monitoring methodologies (Sánchez-López et al., 2024). Recognising such links could help achieve a more balanced integration of presence, process, and voice in gender-informed policy. Similarly, exploring how other dimensions of responsible research and innovation—such as public engagement, science communication, ethics, or openness—intersect with gender equality would be valuable (Mejlgaard et al., 2018).

  • Fourth, we recommend expanding data sources beyond statistical indicators to include qualitative insights and the voices of those affected. This approach is widely supported in gender indicator research (Hernández et al., 2023; Humbert et al., 2021; Huyer & Westholm, 2007; Nygaard et al., 2022; Otero-Hermida & García-Melón, 2018). Some existing initiatives provide inspiration: the EU Expert Group in Responsible Research and Innovation (2015) recommends perception-based indicators, and the Spanish gender monitoring initiative incorporates qualitative work to include voices typically absent from datasets, such as research technicians or women who have left the academic system (MICIU, 2025). Results from this project reveal concerns not captured by existing indicators, such as the need to decelerate the scientific system to improve both social impact and researchers’ well-being. These insights reflect a more collective vision of scientific work and may contribute to systemic change in science policies. Moreover, the report incorporates emotional analysis that offers valuable insight into how these issues affect researchers. Nevertheless, there remains considerable scope for further exploring the role of emotions in future monitoring efforts (MICIU, 2025).

  • Fifth, monitoring could also be directed towards specific institutional actors, such as funders, ministries, or companies, rather than focusing solely on men/women or individual “beneficiaries”. If the purpose is policy evaluation, this shift would be essential, and there are precedents in EU Open Science monitoring (European Commission, 2020). Gender monitoring at team, conference, or research-centre level may also offer insights, especially when integrating variables like age, caregiving responsibilities, education, tenure, or contract type (Humbert et al., 2021; Corona-Sobrino et al., 2020; Sánchez-López et al., 2024). Context-sensitive integration of such data could better capture the dynamism of gendered structures.

  • Sixth, if the main goal is policy evaluation, several clarifications are needed. Monitoring alone does not constitute evaluation, but can serve as a valuable tool within it. This nuance may be lost in the current She Figures framing, potentially leading to unrealistic expectations. The report could benefit from referencing other ongoing evaluation initiatives, and from providing more information on equality measures, particularly to understand the wide disparities between EU countries.

  • Finally, to enhance policy learning, we recommend stronger links between monitoring and the policy process, incorporating the policy lifecycle of indicators (Otero-Hermida, 2022). This includes understanding:

    1. How monitoring panels are created: What are the underlying values and purposes? Who participates in their design and data collection?

    2. How indicators are used: Who uses them, in which forums, for what decisions, and how are they contextualised or complemented by qualitative perspectives or other data? How do they inform or close debates, what is the focus of the work, and do they justify policies or serve other purposes?

    3. How their usefulness and limitations are assessed: Are they reviewed or replaced across political cycles? How are new issues incorporated into statistical systems?

As previously argued, the utility of monitoring for policymaking can depend heavily on how indicators are created and used within a broader and more inclusive policy process (Otero-Hermida, 2022).

8 Conclusions and Further Research

This study has examined the evolution of gender monitoring in European research and innovation policy through a critical analysis of the European Commission’s She Figures series (2003–2021). Drawing on literature on gender equality policies and gendered and responsible metrics, we have identified both significant advancements and persistent shortcomings in how gender disparities are conceptualised and measured. Our general expectation was that gender monitoring had evolved, producing more detailed metrics that go beyond merely counting how many women are in the area and moving towards more nuanced understandings of gendered problematics. This expectation has only been partially met.

Our findings show that She Figures has contributed to establishing a common monitoring framework across Europe, improving data comparability and contextualisation, and enhancing the visibility of gender gaps. Over time, the series has broadened its scope from a narrow focus on women’s representation in academic positions to include dimensions such as research funding and publication patterns. It has also shifted its emphasis from mere presence to the inclusion of institutional measures, while the ultimate justifications and purposes of monitoring have become more ambitious, oriented towards systemic change in the latest editions.

However, the evolution of why we monitor has advanced more substantially than the what and how. There remains considerable scope for improvement in integrating a thoughtful gender perspective, especially in the selection and framing of dimensions to be measured. A central gap concerns the lack of voice from those directly affected by gender monitoring in science and innovation.

For these reasons, the capacity to open up debates, rather than close them, around prevailing value systems and practices also remains in need of improvement. These systems and practices tend to devalue the feminine, the non-binary, or that which does not conform to the male norm perceived as universal or neutral.

To address these limitations and enhance the transformative potential of gender monitoring, we have outlined several avenues for development in Section 7. These include expanding data sources and methodologies, incorporating qualitative and institutional-level data to move beyond individual metrics and headcounting, exploring the multidimensionality of data and relationships among indicators, and including a more diverse range of expertise to integrate theoretical insights and the perspectives of policy beneficiaries. Participation is crucial in translating gender-related problems into meaningful monitoring practices for improved policy learning.

Future research should explore how these challenges persist across national contexts and governance levels, and how they may be addressed. Similar patterns may be observed in other policy domains, as monitoring approaches are deeply embedded in the governance architecture of the European Union. Strengthening monitoring through a gender lens holds transformative potential to generate significant impact on public policies and the broader pursuit of equality.