Background

Phylogenetic analyses are commonly used to infer the relationships among genes, proteins, species, or organisms [1,2,3,4,5,6]. These analyses have proven effective in predicting gene function [7], identifying protein–protein interactions [8, 9], discovering natural products [10], classifying cell types [11], exploring evolutionary processes [12] and establishing taxonomy [13]. More recently, phylogenetics has also played a crucial role during the COVID-19 pandemic [14,15,16], with the aim of identifying the origin of virus outbreaks, tracking viral evolution, and understanding the mechanisms of the pathogenic virus.

As data complexity increases, effective visualization, especially within specific scenarios, has become critical for phylogenetic analysis. One of the core challenges is the joint display of phylogenetic trees and complementary charts. Although some traditional tools offer scenario extensions, they still require further development. An online platform that integrates these visualizations through intuitive interfaces and straightforward parameter controls is therefore needed.

In light of this, we present PhyloScape, a web-based application for the scalable visualization, editing, and annotation of phylogenetic trees (Fig. 1). PhyloScape features composable plug-ins that allow users to freely combine and customize visualization components on the page. In addition to common tree settings, layouts, and styles, PhyloScape is designed to be user-friendly and publication-ready, with metadata annotations tailored to specific scenarios. Notably, we have developed a plug-in ecosystem that serves as a versatile toolkit, enabling users to conveniently select and combine plug-ins to meet the demands of different scenarios.

Fig. 1

Overview of PhyloScape. a Supported tree format and annotation types. b Optimization for tree branch heterogeneity with a reshape method. c PhyloScape plug-in ecosystems. d Applications of PhyloScape

In the following sections, we introduce the design and architecture of PhyloScape, along with the optimizations made to improve the user experience. We present the user interface and use cases of PhyloScape, illustrating how it supports a wide range of applications in phylogenetic analysis. We then discuss the advantages and limitations of PhyloScape and compare it with other visualization software. All the use cases were implemented on the PhyloScape web platform, allowing users to access and interact with the data directly.

Implementation

Architecture

The PhyloScape web application enables users to perform real-time tree editing, achieve interactivity between different charts, and leverage composable plug-ins for customizable visualizations. The core of the platform is the PhyloScape JavaScript library, which was built with d3.js v7 and is both lightweight and fast. This makes it easy to integrate PhyloScape into other web-based applications. PhyloScape exposes built-in callback functions as configurable features that developers can reuse. One such function, named “formatter”, can be configured for tree nodes. This function accepts two inputs: a d3.js instance used to modify the display of the node, and a JSON-formatted configuration containing node-specific information, such as branch length, node name, and bootstrap value. The function is invoked each time a node is drawn and can be customized to meet developers' needs.
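The calling convention of such a callback can be sketched as follows. PhyloScape's formatter is a JavaScript function that receives a d3.js handle; the Python sketch below only mirrors the two-argument pattern described above. The names `NodeHandle`, `render_nodes`, and `highlight_low_support`, the configuration keys `name`, `branch_length`, and `bootstrap`, and the threshold of 70 are all assumptions made for illustration and are not part of the PhyloScape API.

```python
from typing import Callable, Dict, List


class NodeHandle:
    """Hypothetical stand-in for the d3.js handle of one drawn node."""

    def __init__(self) -> None:
        self.attrs: Dict[str, object] = {}

    def attr(self, key: str, value: object) -> "NodeHandle":
        # Store a display attribute and return self so calls can be chained.
        self.attrs[key] = value
        return self


Formatter = Callable[[NodeHandle, dict], None]


def highlight_low_support(handle: NodeHandle, config: dict) -> None:
    """Example formatter: mark nodes whose bootstrap value is below 70."""
    bootstrap = config.get("bootstrap")
    if bootstrap is not None and bootstrap < 70:
        handle.attr("fill", "red").attr("r", 5)
    else:
        handle.attr("fill", "black").attr("r", 3)


def render_nodes(configs: List[dict], formatter: Formatter) -> List[NodeHandle]:
    """Invoke the formatter once per node, passing the handle and its config."""
    handles = []
    for config in configs:
        handle = NodeHandle()
        formatter(handle, config)
        handles.append(handle)
    return handles


# Toy node configurations (assumed keys, invented values).
nodes = [
    {"name": "A", "branch_length": 0.12, "bootstrap": 95},
    {"name": "B", "branch_length": 0.40, "bootstrap": 52},
]
drawn = render_nodes(nodes, highlight_low_support)
print([h.attrs["fill"] for h in drawn])
```

Because the renderer only depends on the two-argument signature, a different formatter can be swapped in without touching the drawing loop, which is the reuse the callback design is meant to allow.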

Annotation system

As the number of tree features increases, editing annotation files becomes a challenge. The PhyloScape application enables users to display and manage tree annotations effectively. Users start by uploading files in CSV or TXT format, in which the first column holds the leaf names and the remaining columns hold other features. For trees, features in the input files include node signs, leaf settings, metadata signs and tooltips (Fig. 2). PhyloScape provides two annotation modes: simple and detailed. Metadata and node signs are initially assigned default values, and users can then adjust these features as needed through the control panel or the table settings. For detailed annotation, common settings include the font, position, and color. Each symbol has corresponding settings for the width, height, and value.
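The table layout described above (leaf names in the first column, one feature per remaining column) can be read with a few lines of code. This is a minimal sketch rather than PhyloScape's own parser; the function name `read_annotations` and the column names `host` and `country` are assumptions, and the rows are invented toy data.

```python
import csv
import io
from typing import Dict


def read_annotations(text: str) -> Dict[str, Dict[str, str]]:
    """Map each leaf name (first column) to its remaining feature columns."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    feature_names = header[1:]
    table: Dict[str, Dict[str, str]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        leaf, values = row[0], row[1:]
        table[leaf] = dict(zip(feature_names, values))
    return table


# Toy input; the column names and values are illustrative only.
example = "leaf,host,country\nstrain_1,human,China\nstrain_2,plant,Japan\n"
annotations = read_annotations(example)
print(annotations["strain_1"]["host"])
```

Keying the table by leaf name is what allows each row to be matched to a tip of the tree regardless of row order.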

Fig. 2

Tree annotation. A phylogenetic tree of artificial data with multiple annotation types. The table displays the input formats of the tree annotations, either in simple or detailed mode. a Annotation of tree leaf types illustrates visualization using different node symbols. With the simple mode, only the first column of the input table is used as group information. b Leaf settings allow modification of font size and color for various annotations, and the background color is set when the Leaf Mask annotation is on. c Hovering over the leaf names displays a tooltip. d Annotation of metadata information via diverse labels and symbols. Each parameter included in the code can be set by the table and the table is set by the long annotation format

Different scenarios correspond to different inputs. The input for plug-ins has been redesigned for integration with the annotation system. As the features in plug-ins correspond with the tree features, the leaf name is placed in the first column of both the data and the tree input for an interactive view with the trees. In cases where features exist in both trees and plug-ins, priority is given to the tree features.
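One way to read the priority rule above is as a join on leaf name in which the tree's value wins whenever both tables define the same feature. The sketch below illustrates that reading only; `merge_features` and the feature names are hypothetical, and the actual resolution logic inside PhyloScape may differ.

```python
from typing import Dict

FeatureTable = Dict[str, Dict[str, str]]


def merge_features(tree_features: FeatureTable,
                   plugin_features: FeatureTable) -> FeatureTable:
    """Join two feature tables on leaf name; tree values win on conflicts."""
    merged: FeatureTable = {}
    for leaf in tree_features.keys() | plugin_features.keys():
        combined = dict(plugin_features.get(leaf, {}))
        combined.update(tree_features.get(leaf, {}))  # tree overrides plug-in
        merged[leaf] = combined
    return merged


# Toy tables with one conflicting feature ("color").
tree_table = {"strain_1": {"color": "red"}}
plugin_table = {"strain_1": {"color": "blue", "latitude": "30.1"}}
merged_table = merge_features(tree_table, plugin_table)
print(merged_table["strain_1"])
```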

Tree visualization optimization

There are several challenges to visualizing trees, especially for trees with extreme branch length variation. PhyloScape addresses these problems through built-in functions, including a reshaping method to enhance branch visibility. Specifically, we design a multi-classification-based branch length reshaping method, which resolves branch length heterogeneity by grouping branches into multiple classes using adaptive length intervals and injective functions. Each class maps original branch lengths to normalized scales, improving the interpretability of evolutionary relationships in trees with heterogeneous branch lengths.
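The idea of class-wise reshaping can be illustrated with a small sketch. The text does not specify how the intervals or the per-class functions are chosen, so the version below makes its own assumptions: interval edges are taken at evenly spaced ranks of the sorted branch lengths, each class receives an equal share of the output range [0, 1], and lengths are mapped linearly within a class. Each per-class map is strictly increasing, so it is injective and preserves the ordering of branch lengths. The function names `class_boundaries` and `reshape` are hypothetical.

```python
from bisect import bisect_right
from typing import List


def class_boundaries(lengths: List[float], n_classes: int) -> List[float]:
    """Pick interval edges at evenly spaced ranks of the sorted lengths."""
    ordered = sorted(lengths)
    last = len(ordered) - 1
    edges = [ordered[(k * last) // n_classes] for k in range(n_classes + 1)]
    # Drop repeated edges so that every interval has positive width.
    return sorted(set(edges))


def reshape(length: float, edges: List[float]) -> float:
    """Map a length into [0, 1]; each class gets an equal share of the range."""
    n = len(edges) - 1
    if n < 1:
        return 0.0  # all branches share one length; nothing to spread out
    # Index of the class whose interval contains `length`, clamped to range.
    k = min(max(bisect_right(edges, length) - 1, 0), n - 1)
    low, high = edges[k], edges[k + 1]
    fraction = (length - low) / (high - low)
    return (k + fraction) / n


# Toy branch lengths spanning four orders of magnitude.
lengths = [0.001, 0.002, 0.003, 0.5, 10.0]
edges = class_boundaries(lengths, 2)
scaled = [reshape(x, edges) for x in lengths]
print(scaled)
```

In this toy input the three shortest branches would be nearly invisible next to the branch of length 10.0 on a linear scale; after reshaping they occupy the lower half of the range while their relative order is unchanged.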

Dependencies

PhyloScape is written primarily in JavaScript and Python. The PhyloScape library was developed with d3.js. For large tree visualization, PhyloScape adapts Phylocanvas.gl [17], a WebGL-based library capable of efficiently rendering hundreds of thousands of nodes. For the dynamic integration of plug-ins, Vue is used for the front-end display, and an iframe is employed to embed each plug-in. Specific extensions for PhyloScape include a heatmap plug-in for displaying pairwise correlations and ACMap, which is designed for antigenic cartography and is based on Racmacs [18]. For 3D protein structure visualization, the pdbe-molstar library [19] is used. For map visualization, OpenLayers is used in conjunction with the Tile Python package for tile rendering. Additionally, ECharts (https://echarts.apache.org/zh/index.html) is used for other statistical charts.

General workflow

The general workflow of the PhyloScape web application can be divided into the following steps: panel selection, tree upload, tree style editing, plug-in selection, plug-in file upload, plug-in style editing, visualization, and tree sharing (Fig. 3).

Fig. 3

PhyloScape workflow: (1) online layout selection, (2) upload tree files in various formats, (3) customize the tree styles, (4) upload plug-in files via the plug-in control panel, (5) customize the plug-in styles, (6) visualize, and (7) share a tree

Data input and output

The PhyloScape web interface imports common tree formats, including Newick [20], NEXUS [21], PhyloXML [22], and NeXML [23], and also supports a PhyloScape JSON format. The tree visualization can be exported in PNG or SVG format, and specific subtrees can be selected and exported. Users can edit the input tree and annotation data on the webpage and adjust values without re-uploading the files.
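For readers unfamiliar with Newick, the simplest of the formats listed above, the following sketch parses a basic Newick string into nested dictionaries. It is an illustration of the format and not PhyloScape's importer: it handles only unquoted labels and branch lengths, ignores comments, and the function names and the dictionary keys `name`, `length`, and `children` are assumptions.

```python
from typing import List, Optional, Tuple


def parse_newick(text: str) -> dict:
    """Parse a basic Newick string (no quoted labels or comments)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    node, pos = _parse_node(text, 0)
    if pos != len(text) - 1:
        raise ValueError(f"unexpected character at position {pos}")
    return node


def _parse_node(text: str, pos: int) -> Tuple[dict, int]:
    children: List[dict] = []
    if text[pos] == "(":
        pos += 1  # consume '('
        while True:
            child, pos = _parse_node(text, pos)
            children.append(child)
            if text[pos] == ",":
                pos += 1
            elif text[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError(f"expected ',' or ')' at position {pos}")
    # Read the optional label, then the optional ':length'.
    start = pos
    while text[pos] not in ",():;":
        pos += 1
    name = text[start:pos].strip()
    length: Optional[float] = None
    if text[pos] == ":":
        start = pos + 1
        pos = start
        while text[pos] not in ",();":
            pos += 1
        length = float(text[start:pos])
    return {"name": name, "length": length, "children": children}, pos


def leaf_names(node: dict) -> List[str]:
    """Collect leaf labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [n for child in node["children"] for n in leaf_names(child)]


tree = parse_newick("((A:0.1,B:0.2)90:0.05,C:0.3);")
print(leaf_names(tree))
```

In this example the label `90` on the internal node is kept as a plain name; whether such a label is interpreted as a bootstrap value is left to the caller.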

Shared results

The PhyloScape application supports sharing results. We also provide a gallery page where users can communicate and share their results with other users. On this page, basic information and styles of the trees can be shared, and users can copy, edit, download, and reuse the styles. A unique web address is given for sharing and reuse on the user’s own webpage.

Results

User interface

The PhyloScape web application interface can be divided into four main parts (Fig. 4): the layout panel, the tree control panel, the plug-in control panel, and the drawing panel. The layout panel is designed to divide the main drawing panel into different layouts. The tree control panel is used to upload tree files and edit settings such as branch patterns, leaf patterns, tree layouts, and metadata annotations. The plug-in control panel allows users to define and select different plug-ins from the PhyloScape plug-in ecosystem, and we currently provide three different themes of plug-ins: geographic maps, statistical diagrams and protein structures. Finally, the drawing panel jointly displays the tree and the selected plug-ins.

Fig. 4

Overview of the PhyloScape web application. a Panel selection for different layouts. b Control panels for tree settings, including the common tree and large tree views. c Plug-in control panels allow the editing of the selected plug-in. d A partial phylogenetic tree of the West Nile virus transmission shown on the drawing panel, visualized in rectangular views with plug-ins of MapTransmission and FreqStack [24, 25]. The color represents different lineages

Case study of pathogen phylogeny

Acinetobacter pittii, a Gram-negative bacterial species, is a pathogen that causes opportunistic infections [26]. In this case, we included a total of 149 strains to visualize the phylogenetic inference of the pathogen. Sample metadata were represented by distinct symbols for attributes such as isolation source, host, country, disease, collection date, and genome length, providing a comprehensive overview of the evolutionary characteristics of the species. As shown in Fig. 5, the main hosts of A. pittii include humans, animals, plants, food and the environment. The strains were isolated from tissue samples from the blood, urinary system, respiratory system and digestive system. A complete display is provided in the gcPathogen database (https://nmdc.cn/gcpathogen/species?taxonid=48296) [27].

Fig. 5

A phylogenetic tree of A. pittii with multiple annotation types visualized using the PhyloScape JS library. The tree is annotated with metadata including the host, disease, country, isolation source, and time. With interactive views, the tooltip can associate the rings with their corresponding legends

Case study of Ruegeria taxonomy

In taxonomic studies, the average amino acid identity (AAI) is a crucial metric for evaluating protein similarity between taxa. We designed an interactive heatmap plug-in to display pairwise AAI values corresponding to a phylogenetic tree. The genome of Ruegeria pomeroyi DSS-3 (Alphaproteobacteria group, accession number GCF_000011965.2) [28] was chosen to demonstrate the Heatmap plug-in's functionality (Fig. 6). The phylogenomic tree was generated via the TYGS genome server (https://tygs.dsmz.de) [29] for strain-level resolution. Pairwise AAI values were calculated using the EzAAI tool [30] and formatted into a CSV matrix as input for the heatmap. Selecting a heatmap grid cell highlights the phylogenetic tree tips that correspond to the AAI value, thereby clarifying the values and relationships. In Fig. 6a, the tips of the input genome and Ruegeria pomeroyi DSS-3, which share an AAI value of 100% on the same branch, are highlighted in red. Upon clicking on a clade of the tree, the AAI heatmap automatically zooms in on the specified taxa. Figure 6b shows that selecting the branch corresponding to the input genome, Ruegeria pomeroyi DSS-3, and Ruegeria alba 1NDH52C regenerates a focused heatmap for these three taxa.
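The clade-to-heatmap interaction described above amounts to extracting a sub-matrix for the selected taxa. The sketch below shows that step under stated assumptions: the CSV matrix is taken to have taxon names in its first row and first column, the function names `read_matrix` and `focus` are hypothetical, and the taxa and AAI values are invented placeholders rather than results from this study.

```python
import csv
import io
from typing import Dict, List

Matrix = Dict[str, Dict[str, float]]


def read_matrix(text: str) -> Matrix:
    """Read a square CSV matrix whose first row and column hold taxon names."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    columns = rows[0][1:]
    return {
        row[0]: {c: float(v) for c, v in zip(columns, row[1:])}
        for row in rows[1:]
    }


def focus(matrix: Matrix, taxa: List[str]) -> List[List[float]]:
    """Return the sub-matrix for the selected taxa, in the given order."""
    return [[matrix[a][b] for b in taxa] for a in taxa]


# Placeholder taxa and values, invented for illustration.
example = (
    "taxon,taxon_A,taxon_B,taxon_C\n"
    "taxon_A,100.0,91.5,78.2\n"
    "taxon_B,91.5,100.0,77.9\n"
    "taxon_C,78.2,77.9,100.0\n"
)
aai = read_matrix(example)
print(focus(aai, ["taxon_A", "taxon_B"]))
```

The reverse interaction, from a selected grid cell to tree tips, only needs the row and column names of that cell, which is why the heatmap and the tree must share the same leaf names.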

Fig. 6

Visualization of microbial taxonomy with pairwise average amino acid identity. The tree was produced via TYGS with the input of the whole genome sequence (GCF_000011965.2). a When a grid cell in the heatmap of average amino acid identity is selected, the corresponding tree tips are highlighted. In this case, the input genome and Ruegeria pomeroyi DSS-3 are highlighted in red. b If a clade is selected, the relevant section of the heatmap is displayed for the pairwise average amino acid identities. This figure is visualized with a PhyloScape Heatmap

Case study of the Chinese vascular plant tree of life

Research on plant resources facilitates conservation planning, especially by identifying hotspots of phylogenetic diversity and discovering areas of high species richness [31]. Here we present the visualization of the tree of life for 13,663 Chinese vascular plants (http://darwintree.cn/index.shtml) in the large tree view, colored by plant distribution at both the provincial and county levels. These species cover 44.0% of all native Chinese vascular plants [32, 33]. The visualization started with the input of a Newick format file with 15,092 tips, with the MapColor plug-in chosen to show the geographical distribution of the plant species. In general, the distribution of vascular plants varies from region to region. Yunnan Province has the most vascular plant species collected, ranging from 2521 to 2802, followed by other southwestern regions of China (Fig. 7a). When zooming in, the same pattern can be visualized at the county level. Areas around the Qinling Mountains, i.e. southern Shaanxi Province, have a greater distribution than northern regions (Fig. 7b).

Fig. 7

Visualization of Chinese vascular plant trees. a Visualization of tree and plant distributions at the province level. b Plant distribution at the county level (zoom in). c Amaranthaceae Suaeda corniculata distribution with a subset of trees of life. This figure is visualized with PhyloScape MapColor

The plug-in also supports rebuilding trees for specific regions and recoloring regions by plant lineage. For example, if Suaeda corniculata, a species in the Amaranthaceae family, is selected, its distribution is displayed on the map and the northwestern region of China is highlighted, which is consistent with the knowledge that the plant can grow on saline-alkaline soils (Fig. 7c). The results are available on the gallery page under the title "Chinese vascular plant tree of life".

Discussion

Advantages and shortcomings of PhyloScape

Phylogenetic tree visualization serves as a powerful analytical tool for evolutionary studies. As technologies and applications continue to advance, the demand for effective phylogenetic tree visualization is growing. In addition to simply displaying tree topologies, integrating and visualizing associated biological or evolutionary data presents an additional challenge.

We present PhyloScape, an intuitive online platform designed for phylogenetic tree visualization and analysis. Unlike conventional tools, PhyloScape not only displays tree topologies but also integrates common data types with different scenarios, enabling dynamic and interactive visualizations. Its user-friendly graphical interface allows seamless online operation without requiring high-performance local hardware, making it accessible to users unfamiliar with command-line tools. For developers, PhyloScape also supports customization and integration through APIs and modular extensions.

PhyloScape allows users to visualize extremely large trees and also provides an optimization for the display of trees with extreme branch length variation. It offers visualizations for a range of application scenarios, with plug-ins developed and integrated for common scenarios such as maps, charts, and protein structures. PhyloScape supports the online combination and display of plug-ins. It also simplifies complex metadata annotation by offering two data upload formats, so that users can choose the level of detail they need. Additionally, the platform supports the sharing of constructed phylogenetic trees, facilitating collaborative research and downstream visualization development.

PhyloScape has several limitations. It only supports prebuilt trees and does not cover the entire analysis pipeline. For visualization, support for processing multiple trees simultaneously or comparing trees is currently lacking. Additionally, while optimized for common use cases, its plug-in ecosystem remains limited, and continuous development of new plug-ins will be necessary to address this.

A comparison with other available visualization tools

As the primary data type of phylogenetic analyses, phylogenetic trees can be visualized and annotated using several existing tools. Among them, PhyloScape has unique features (Table 1). Desktop applications such as FigTree [17], MEGA4 [18] and CamlTree [19] are well-suited for local use, but they lack features for complex tree visualizations. The R library ggtree [20, 21] and the Python package ETE3 [22] offer multiple features and support various tree formats, but these tools often require some level of programming skill to use all their features. In contrast to those tools, PhyloScape supports most tree features and requires no additional installation. The web applications iTOL [23] and Evolview [24] provide easy online access, supporting multiple input formats and dynamic interactions. PhyloScape enables the visualization of more features. As the amount of genomic data analyzed increases, efficiently visualizing large trees becomes a major challenge. Several tools, including Dendroscope [25], are specifically designed for large tree visualization. However, these tools lack the ability to integrate with web applications. Tools developed with JavaScript, such as phylotree.js [27], Archaeopteryx.js [28], PhyD3 [29] and IcyTree [30], are easy to use in a web browser and support interactive annotations, but they require additional development to adapt to different scenarios. Nextstrain [31] provides one-stop visualization output after data curation and analysis, but it is optimized primarily for pathogen evolution. In contrast, PhyloScape allows the combination of the tree view and scenario views, and is suitable for composable visualization on the platform.

Table 1 Comparison of PhyloScape with other visualization tools

Conclusions

This study introduces PhyloScape, a multifunctional online platform that supports the interactive visualization of phylogenetic trees in various scenarios. It enables the combination and joint display of multiple application scenarios along with the tree view on a single page, thereby simplifying and accelerating the analysis and mining of data. The platform reduces usage complexity by enabling parameter modification through the control panel. PhyloScape also offers online sharing capabilities and APIs to support collaborative research and system integration.