A Distributed Approach to Solving Overlay Mismatching Problem
Yunhao Liu1, Zhenyun Zhuang1, Li Xiao1, and Lionel M. Ni2
1 Department of Computer Science & Engineering
Michigan State University
East Lansing, Michigan, USA
{liuyunha, zhuangz1, lxiao}@cse.msu.edu
2 Department of Computer Science
Hong Kong University of Science and Technology
Kowloon, Hong Kong, China
ni@cs.ust.hk
Abstract
In unstructured peer-to-peer (P2P) systems, the
mechanism of a peer randomly joining and leaving a P2P
network causes topology mismatching between the P2P
logical overlay network and the physical underlying net-
work, causing a large volume of redundant traffic in the
Internet. In order to alleviate the mismatching problem, we
propose Adaptive Connection Establishment (ACE), an
algorithm of building an overlay multicast tree among each
source node and the peers within a certain diameter from
the source peer, and further optimizing the neighbor con-
nections that are not on the tree, while retaining the search
scope. Our simulation study shows that this approach can
effectively solve the mismatching problem and significantly
reduce P2P traffic. We further study the tradeoffs between
the topology optimization rate and the information ex-
change overhead by changing the diameter used to build
the tree.
1. Introduction
In unstructured P2P systems, queries are flooded among
peers (such as in Gnutella [2]) or among supernodes (such
as in KaZaA [1]). In such systems, all participating peers
form a P2P network over a physical network. A P2P net-
work is an abstract, logical network called an overlay net-
work. When a new peer wants to join a P2P network, a
bootstrapping node provides the IP addresses of a list of
existing peers in the P2P network. The new peer then tries
to connect with these peers. If some attempts succeed, the
connected peers will be the new peer's neighbors. Once this
peer connects into a P2P network, the new peer will peri-
odically ping the network connections and obtain the IP
addresses of some other peers in the network. These IP
addresses are cached by this new peer. When a peer leaves
the P2P network and then wants to join the P2P network
again (no longer the first time), the peer will try to connect
to the peers whose IP addresses have already been cached.
This mechanism of a peer joining a P2P network, together
with the fact that peers randomly join and leave, causes an
interesting matching problem between a P2P overlay network
topology and the underlying physical network topology.
Studies in [15] show that only 2 to 5 percent of Gnutella
connections link peers within a single autonomous system
(AS), but more than 40 percent of all Gnutella peers are
located within the top 10 ASes. This means that most
Gnutella-generated traffic crosses AS borders, which
increases topology mismatching costs. The same message can
traverse the same physical link multiple times, causing a
large amount of unnecessary traffic.
The objective of this paper is to minimize the effect due
to topology mismatching. We propose the Adaptive Con-
nection Establishment (ACE) that builds an overlay multi-
cast tree among each source node and the peers within a
certain diameter from the source peer, and further optimizes
the neighbor connections that are not on the tree, while
retaining the search scope. ACE is scalable and completely
distributed in the sense that it does not require global
knowledge of the whole overlay network when each node is
optimizing the organization of its logical neighbors. Our
simulations show that ACE can significantly improve the
performance. We also show that a larger diameter leads to a
better topology optimization rate and a higher overhead due
to extra information exchanging. Our experiments and
discussions provide a guide on how to achieve a good per-
formance by considering the tradeoffs between the topol-
ogy optimization rate and the information exchange over-
head in selecting the diameter to determine the peers to
form the multicast tree for a source peer.
The rest of the paper is organized as follows. Section 2
discusses related work. Section 3 presents the adaptive
connection establishment (ACE) scheme. Section 4
describes our simulation methodology. Performance
evaluation of ACE is presented in Section 5, and we conclude
the work in Section 6.
This work was partially supported by the US National Science
Foundation (NSF) under grant ACI-0325760, by Michigan State University
IRGP Grant 41114, and by Hong Kong RGC Grant HKUST6161/03E.
Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS’04)
1063-6927/04 $20.00 © 2004 IEEE
2. Related Work
In order to reduce unnecessary flooding traffic and improve
search performance, two approaches have typically
been used to improve on the flooding-based search
mechanism in unstructured P2P systems. Rather than
flooding a query to all neighbors, the first approach routes
queries to peers that are likely to have the requested items,
using heuristics based on maintained statistical information
[10, 22, 23]. In the second approach, a peer keeps indices
of other peers’ sharing information or caches query
responses in the hope that subsequent queries can be satisfied
quickly by the cached indices or responses [6, 11, 12, 14, 18,
20, 22]. The performance gains of both approaches are
seriously limited by the topology mismatching problem.
A third approach is based on overlay topology optimization,
which is closely related to what we present in
this paper. Here we briefly introduce three types of solutions
and compare them with our approach. End system
multicast, Narada, was proposed in [5]; it first constructs
a richly connected graph on which it further constructs
shortest path spanning trees. Each tree is rooted at the
corresponding source and built using well-known routing
algorithms. This approach introduces a large overhead for
forming the graph and trees in a large scope, and does not
consider the dynamic joining and leaving characteristics of
peers. The overhead of Narada is proportional to the multicast
group size, so this approach is infeasible for large-scale
P2P systems.
Researchers have also considered clustering close peers
based on their IP addresses (e.g., [7, 13]). We believe there
are two limitations for this approach. First, the mapping
accuracy is not guaranteed by this approach. Second, this
approach may affect the search scope in P2P networks. In
contrast, our technique is measurement based and can ac-
curately and dynamically connect the physically closer
peers, and disconnect physically distant peers. Furthermore,
our scheme does not shrink the search scope.
Recently, researchers in [21] have proposed measuring
the latency between each peer and multiple stable Internet
servers called “landmarks”. The measured latency is used
to determine the distance between peers. This measurement
is conducted in a global P2P domain and needs the support
of additional landmarks. Similarly, this approach also af-
fects the search scope in P2P systems. In contrast, our
measurement is conducted in many small regions, signifi-
cantly reducing the network traffic.
Gia [4] introduced a topology adaptation algorithm to
ensure that high capacity nodes are indeed the ones with
high degree and low capacity nodes are within short reach
of high capacity nodes. It addresses a different matching
problem in overlay networks, but does not address the to-
pology mismatching problem between the overlay and
physical networks.
A preliminary design of ACE, which is called AOTO,
has been discussed in [8]. We have also proposed a loca-
tion-aware topology matching scheme [9], in which each
peer issues a detector in a small region so that the peers
receiving the detector can record relative delay information.
Based on the delay information, a receiver can detect and
cut most of the inefficient and redundant logical links, and
add closer nodes as its direct neighbors. However, this
approach creates slightly more overhead and requires that
the clocks in all peers be synchronized.
3. Adaptive Connection Establishment
In unstructured P2P systems, the most popular search
mechanism in use is to blindly “flood” a query to the network
among peers or among supernodes. A query is
broadcast and rebroadcast until a certain criterion is satis-
fied. If a peer receiving the query can provide the requested
object, a response message will be sent back to the source
peer along the inverse of the query path. This mechanism
ensures that the query will be “flooded” to as many peers as
possible within a short period of time in a P2P overlay
network. A query message will also be dropped if the query
message has visited the peer before. In this section, we use
examples to explain the unnecessary traffic incurred by
flooding based search and the topology mismatching
problem. Then we introduce the design of the proposed
approach, ACE.
3.1. Unnecessary Traffic by Flooding
Figure 1 shows an example of a P2P overlay topology
where solid lines denote overlay connections among logical
P2P neighbors. Consider the case when node S sends a
query. A solid arrow represents a delivery of the query
message along one logical connection. The query is relayed
by many peers, which incurs a lot of unnecessary traffic.
For example, after node S sends the query to L and M, since
neither L nor M knows that the other will receive the same
query from S, they will forward the query to each other. The
pair of transmissions on the logical link LM is unnecessary.
In such a simple overlay, node M will receive the same
query message four times. In this case, it is clear that the
search scope of the query from node S will not shrink
without the logical connections LM, MQ, LQ, and MP.
Figure 1: An example of P2P overlay (nodes S, L, M, P, and Q)
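The duplicate deliveries described in Section 3.1 can be reproduced with a small flooding simulation. This is only a sketch: the text does not list the edges of Figure 1, so the edge set below is an assumption chosen to be consistent with the statements that M receives the query four times and that removing LM, MQ, LQ, and MP does not shrink the search scope. The function names are illustrative.

```python
from collections import deque

# Assumed edge set for the Figure 1 overlay (not stated explicitly in the text).
EDGES = [("S", "L"), ("S", "M"), ("S", "P"), ("S", "Q"),
         ("L", "M"), ("M", "Q"), ("L", "Q"), ("M", "P")]

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of logical links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def flood(adj, source):
    """Blind flooding: a peer forwards a first-seen query to every
    neighbor except the sender and drops any later copy."""
    received = {node: 0 for node in adj}
    seen = {source}
    queue = deque((source, nbr) for nbr in sorted(adj[source]))
    while queue:
        sender, node = queue.popleft()
        received[node] += 1
        if node in seen:
            continue  # duplicate copy: counted as traffic, then dropped
        seen.add(node)
        for nbr in sorted(adj[node]):
            if nbr != sender:
                queue.append((node, nbr))
    return received, seen

full_counts, full_scope = flood(build_adjacency(EDGES), "S")
pruned_edges = [e for e in EDGES if "S" in e]  # drop LM, MQ, LQ, MP
pruned_counts, pruned_scope = flood(build_adjacency(pruned_edges), "S")
```

Under these assumptions the pruned overlay reaches the same peers with far fewer transmissions, which is the effect ACE aims for.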
3.2. Topology Mismatching
As we have discussed, the stochastic peer connection
and peers randomly joining and leaving a P2P network can
cause topology mismatching between the P2P logical
overlay network and the physical underlying network. For
example, Figures 2(a) and 2(b) are two overlay topologies
on top of the underlying physical topology shown in Figure
2(c). Suppose nodes S and B are in the same autonomous
system (AS) at Michigan State University (MSU) in USA,
while nodes A and C are in another AS at Tsinghua University
in China. So we can assume that the physical link
delay between nodes S and C is much longer than the link
between nodes S and B, or nodes A and C in Figure 2(c).
Clearly, in the inefficient mismatching overlay of Figure
2(a), the query message from source S will traverse the
longest link SC three times, which is a scenario of topology
mismatching. If we can construct an efficient overlay
shown in Figure 2(b), the message needs to traverse all the
physical links in Figure 2(c) only once.
Figure 2: Topology mismatching problem. (a) Mismatching overlay; (b) Matching overlay; (c) Underlying physical topology
3.3. Design of ACE
Optimizing inefficient overlay topologies can funda-
mentally improve P2P search efficiency. The proposed
approach, Adaptive Connection Establishment (ACE),
includes three phases.
Phase 1: we use network delay between two nodes as a
metric for measuring the cost between nodes. We modify
the Limewire implementation of Gnutella 0.6 P2P protocol
by adding one routing message type. Each peer probes the
costs with its immediate logical neighbors and forms a
neighbor cost table. Two neighboring peers exchange their
neighbor cost tables so that a peer can obtain the cost be-
tween any pair of its logical neighbors. Thus, a small
overlay topology of a source peer and all its logical
neighbors is known to the source peer.
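A minimal sketch of Phase 1 follows. It is not the modified Limewire code: `probe` is a hypothetical stand-in for a real delay measurement, the table exchange is modeled as a direct function call, and the delay values and the extra peer H are illustrative assumptions.

```python
# Hypothetical measured delays between logical neighbors (illustrative values).
DELAY = {frozenset(("S", "E")): 4, frozenset(("S", "F")): 6,
         frozenset(("S", "G")): 15, frozenset(("E", "G")): 14,
         frozenset(("F", "G")): 20, frozenset(("G", "H")): 18}

# Logical neighbor lists; H is a neighbor of G but not of S.
NEIGHBORS = {"S": ["E", "F", "G"], "E": ["S", "G"], "F": ["S", "G"],
             "G": ["S", "E", "F", "H"], "H": ["G"]}

def probe(a, b):
    """Stand-in for a delay measurement between two connected peers."""
    return DELAY[frozenset((a, b))]

def neighbor_cost_table(peer):
    """Costs from a peer to each of its immediate logical neighbors."""
    return {n: probe(peer, n) for n in NEIGHBORS[peer]}

def local_view(source):
    """Combine the source's own table with the tables received from its
    neighbors, keeping only links among the source and its neighbors."""
    own = neighbor_cost_table(source)
    members = set(own) | {source}
    view = {frozenset((source, n)): cost for n, cost in own.items()}
    for n in own:
        received = neighbor_cost_table(n)  # table sent by neighbor n
        for other, cost in received.items():
            if other in members:
                view[frozenset((n, other))] = cost
    return view

view_of_s = local_view("S")
```

The resulting view is the small overlay topology on which Phase 2 operates; links that leave the neighborhood, such as G-H here, are not part of it.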
Phase 2: based on obtained neighbor cost tables, a
minimum spanning tree among each peer and its immediate
logical neighbors can then be built by simply using an
algorithm like Prim's, which has a computation complexity of
O(m^2), where m is the number of logical neighbors of the
source node. Now the message routing strategy of a peer is
to select the peers that are the direct neighbors in the mul-
ticast tree to send its queries, instead of flooding queries to
all neighbors. An example is shown in Figure 3. In Figure
3(a), the traffic incurred by node S’s flooding of messages
to its direct neighbor E, F, and G is: 4+14+14+15+6+20+
20=93. After phase 2, we can see the forwarding connec-
tions are changed as shown in Figure 3(b), and the total
traffic cost becomes: 6+4+14=24.
In Figure 3(b), node S sends a message only to nodes E
and F and expects that node E will forward the message to
node G. Note that in this phase, even though node S no longer
floods its query messages to node G, S still retains the
connection with G and keeps exchanging neighbor cost
tables with it. We call node G a non-flooding neighbor of
node S, which is a direct neighbor that may be replaced in the
next phase.
Figure 3: Second phase in ACE. (a) Overlay of S, E, F, and G before Phase 2, with link costs 4, 14, 15, 6, and 20; (b) forwarding links after Phase 2, with costs 4, 14, and 6
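The Phase 2 computation can be sketched as follows. The link costs are an assumption reconstructed from the sums 93 and 24 quoted in the text (S-E=4, S-F=6, S-G=15, E-G=14, F-G=20). The sketch uses a heap-based variant of Prim's algorithm rather than the O(m^2) array form, and `flooding_cost` only applies to a 1-neighbor closure in which every other peer is a direct neighbor of the source.

```python
import heapq

# Link costs reconstructed from the arithmetic in the text (an assumption).
COSTS = {("S", "E"): 4, ("S", "F"): 6, ("S", "G"): 15,
         ("E", "G"): 14, ("F", "G"): 20}

def adjacency(costs):
    """Undirected weighted adjacency map."""
    adj = {}
    for (u, v), c in costs.items():
        adj.setdefault(u, {})[v] = c
        adj.setdefault(v, {})[u] = c
    return adj

def prim_mst(adj, root):
    """Return the (parent, child, cost) edges of a minimum spanning tree."""
    in_tree = {root}
    heap = [(c, root, v) for v, c in adj[root].items()]
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < len(adj):
        cost, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        tree.append((u, v, cost))
        for w, c in adj[v].items():
            if w not in in_tree:
                heapq.heappush(heap, (c, v, w))
    return tree

def flooding_cost(costs, source):
    """Blind flooding in a 1-neighbor closure: the source sends once on each
    of its links; every other link carries the query in both directions."""
    return sum(c if source in edge else 2 * c for edge, c in costs.items())

adj = adjacency(COSTS)
tree = prim_mst(adj, "S")
tree_cost = sum(c for _, _, c in tree)
flooding_neighbors = {v for u, v, _ in tree if u == "S"}
non_flooding_neighbors = set(adj["S"]) - flooding_neighbors
```

With these costs S keeps E and F as flooding neighbors, and G becomes a non-flooding neighbor that is reached through E.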
Phase 3: this phase reorganizes the overlay topology.
Note that each peer has a neighbor list which is further
divided into flooding neighbors and non-flooding neighbors
in Phase 2. Each peer also has the neighbor cost tables of all
its neighbors. In this phase, it tries to replace those physi-
cally far away neighbors by physically close by neighbors,
thus minimizing the topology mismatching traffic. An ef-
ficient method to identify such a candidate peer to replace a
far away neighbor is critical to the system performance.
Many methods may be proposed. In ACE, a non-flooding
neighbor may be replaced by one of that non-flooding
neighbor’s neighbors.
The basic concept of phase 3 is illustrated in Figure 4.
In Figure 4(a), node S is probing the distance to one of its
non-flooding neighbor G’s neighbors, for example, H. If
SH is smaller than SG, as shown in Figure 4(b), connection
SG will be cut. If SG is smaller than SH, but S finds that the
cost between nodes G and H is even larger than the cost
between nodes S and H, as shown in Figure 4(c), S will
keep H as a new neighbor. Since the algorithm is executed
in each peer independently, S cannot make G remove H
from its neighbor list. However, as long as S keeps both G
and H as its logical neighbors, we may expect that node H
will become a non-flooding neighbor of node G after node
G’s Phase 2, since node G expects S to forward messages to
H to reduce unnecessary traffic. Then G will try to find
another peer to replace H as its neighbor. After learning
that H is no longer a neighbor of G from the periodically
exchanged neighbor cost tables from node G (or from node H),
S will cut connection SG, although S has already stopped
sending query messages to G for a period of time since the
spanning tree has been built for S. Obviously, if SH is larger
than both SG and GH, as shown in Figure 4(d), the connection
SH will not be built and S will keep probing G’s other direct
neighbors.
Figure 4: Third phase in ACE. (a) S probes G’s neighbor H; (b) SH<SG, replace G by H; (c) SH>SG, but SH<GH, S keeps H as a direct neighbor; (d) SH>SG and SH>GH, S starts probing G’s next neighbor
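The three outcomes of Phase 3 can be written as a small decision function. This is a sketch of the rule as described in the text, not the authors' code. The text does not say how ties are handled, so treating equal costs as "not smaller" is an assumption. The example costs follow the labels that appear in Figure 4 (SG=15; SH=8, 18, or 28; GH=18 or 28), except for the GH value in case (b), which is hypothetical.

```python
def phase3_decision(sg, sh, gh):
    """Decide what S does after probing H, a neighbor of its non-flooding
    neighbor G. sg, sh, gh are the costs of links S-G, S-H, and G-H."""
    if sh < sg:
        return "replace_G_with_H"  # Figure 4(b): cut SG, connect SH
    if sh < gh:
        return "keep_G_and_H"      # Figure 4(c): add H, keep G for now
    return "probe_next"            # Figure 4(d): try G's next neighbor

case_b = phase3_decision(sg=15, sh=8, gh=20)   # gh=20 is hypothetical
case_c = phase3_decision(sg=15, sh=18, gh=28)
case_d = phase3_decision(sg=15, sh=28, gh=18)
```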
3.4. Depth of Optimization
We define h-neighbor closure of a source peer as the set
of peers within h hops from the source peer. For example, a
2-neighbor closure includes the source peer, all its direct
neighbors, and all the neighbors of the direct neighbors. In
the initial ACE described in Section 3.3, the optimization is
only conducted within 1-neighbor closure (among each
source peer and all its direct logical neighbors). We can
enlarge the optimization scope by increasing the value of h.
A larger value of h leads to a better topology matching
improvement, but a higher overhead due to the extra in-
formation exchanging. We will further study in this direc-
tion with the aim of reaching a good performance level by
considering the tradeoffs between the topology optimiza-
tion improvement and the information exchange overhead.
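The h-neighbor closure defined above is the set reached by a breadth-first search that stops after h hops. The sketch below assumes the overlay is available as an adjacency map; the small graph is hypothetical and is not taken from the figures.

```python
from collections import deque

def h_neighbor_closure(adj, source, h):
    """Return the set of peers within h hops of source (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == h:
            continue  # do not expand beyond depth h
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)

# Hypothetical overlay used only for illustration.
OVERLAY = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"],
           "D": ["B", "E"], "E": ["D"]}

closure_1 = h_neighbor_closure(OVERLAY, "A", 1)
closure_2 = h_neighbor_closure(OVERLAY, "A", 2)
```

In a distributed setting each peer would learn this set from exchanged neighbor cost tables rather than from a global map, which is why the overhead grows with h.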
Figure 5 illustrates the overlay trees constructed for
each peer within 1-neighbor closures. Peer A initiates a
query. The bold links denote the links on the tree, and the
arrows indicate the query directions. The query is sent from
peer A to B and D, since both B and D are direct logical
neighbors of A on the overlay tree. Peer B then forwards the
query to E, and D forwards the query to E. Peer E finally
forwards the query to D and C. Peer C will not forward the
query because only E is its direct neighbor, but E is the peer
who forwards the query to C. So the query process termi-
nates. The query paths and corresponding costs for this
query are listed in Table 1. The total cost for this query from
peer A to be forwarded to all other peers through the overlay
trees built in 1-neighbor closures is 68. The number of
unnecessary messages is reduced from 3 to 1 compared
with blind flooding in this example. In 1-neighbor closure,
the query message traverses one path twice, namely E–D.
In blind flooding, the same query message traverses three
paths twice, namely B–D, D–E, and C–E.
Figure 6 and Table 2 illustrate the overlay tree built in
2-neighbor closure and the corresponding query direction
and cost. The total cost to forward a query from peer A to all
other peers is 39. No path is traversed twice by ACE with
h=2 in this example. We can see that the number of
unnecessary messages and the total traffic both decrease as the
value of h is increased.
Figure 5: Overlay trees built in 1-neighbor closure. Panels: original topology and the overlay trees rooted at A, B, C, D, and E, over peers A to E with link costs 10, 12, 7, 14, 8, 20, and 15
Table 1. Query paths and costs on overlay trees built in 1-neighbor closure
Query Path
From    To      Corresponding Cost
A       B, D    10+15=25
B       E       8
D       E       14
E       C, D    7+14=21
Total Cost      68
Table 2. Query paths and costs on the overlay tree built in 2-neighbor closure
Query Path
From    To      Corresponding Cost
A       B       10
B       E       8
E       C, D    7+14=21
Total Cost      39
Figure 6: Overlay tree built in 2-neighbor closure
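The totals in Tables 1 and 2 can be checked by summing the cost of every forwarding step. The sketch below encodes each table row as directed (from, to) hops with the costs given in the tables; the relative saving is derived here and is not a figure stated in the paper.

```python
# Forwarding hops and costs from Table 1 (1-neighbor closure).
TABLE_1 = {("A", "B"): 10, ("A", "D"): 15, ("B", "E"): 8,
           ("D", "E"): 14, ("E", "C"): 7, ("E", "D"): 14}

# Forwarding hops and costs from Table 2 (2-neighbor closure).
TABLE_2 = {("A", "B"): 10, ("B", "E"): 8, ("E", "C"): 7, ("E", "D"): 14}

def total_cost(table):
    """Sum the cost of every hop on which the query is forwarded."""
    return sum(table.values())

cost_h1 = total_cost(TABLE_1)
cost_h2 = total_cost(TABLE_2)
relative_saving = 1 - cost_h2 / cost_h1  # fraction saved by going from h=1 to h=2
```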
4. Simulation Methodology
We describe the topology generation, performance
metrics used in our simulations, our simulation setup, and
parameter settings in this section.
4.1. Topology Generation
Both physical topologies and logical overlay topologies
which can accurately reflect the topological properties of
real networks in each layer are needed in the simulation
study. Previous studies have shown that both large scale
Internet physical topologies [6] and P2P overlay topologies
[7] follow small world and power law properties. Power law
describes the node degree while small world describes
characteristics of path length and clustering coefficient [9].
The study in [6] found that the topologies generated using
the AS Model have the properties of small world and power
law. BRITE is a topology generation tool that provides the
option to generate topologies based on the AS Model. We
generate 10 physical topologies each with 20,000 nodes.
The logical topologies are generated with the number of
peers (nodes) ranging from 1000 to 8000. For each given
number of nodes, we generate logical topologies with av-
erage edge connections between 1 and 20. Note that an edge
connection of value m indicates that there are 2m logical
neighbors for each peer.
4.2. Performance Metrics
A well-designed search mechanism should seek to op-
timize both efficiency and Quality of Service (QoS). Effi-
ciency focuses on better utilizing resources, such as band-
width and processing power, while QoS focuses on
user-perceived qualities, such as number of returned results
and response time. In unstructured P2P systems, the QoS of
a search mechanism generally depends on the number of
peers being explored (queried), response time, and traffic
overhead. If more peers can be queried by a certain query, it
is more likely that the requested object can be found. So we
use several performance metrics as follows.
Traffic cost is one of the parameters of serious concern
to network administrators. Heavy network traffic limits the
scalability of P2P networks [16] and is also a reason why a
network administrator may prohibit P2P applications. We
define the traffic cost as network resource used in an in-
formation search process of P2P systems, which is a func-
tion of consumed network bandwidth and other related
expenses.
Search scope is defined as the number of peers that
queries have reached in an information search process.
Thus, with the same traffic cost, we aim to maximize the
search scope; while with the same search scope, we aim to
minimize the traffic cost.
Response time of a query is one of the parameters of
concern to P2P users. We define the response time of a query
as the time period from when the query is issued until
the source peer receives a response from the first
responder.
Optimization rate is defined as the gain/penalty ratio, i.e.,
the ratio of the query traffic reduction to the overhead traffic
increment. We use it to study the tradeoffs between query
traffic and overhead traffic as the optimization depth h
changes. One major factor affecting the traffic
overhead is the frequency of exchanging cost information.
We define the frequency ratio, R, as the ratio of the frequency
of queries that use the overlay trees to the frequency of cost
information changes. For a given P2P network topology, if
the frequency of topology and cost changes and the query
frequency can be measured so that R is determined, we
should be able to adjust the value of h to achieve the optimal
gain/penalty ratio. ACE is worth using only if the
gain/penalty ratio is larger than 1.
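The gain/penalty ratio can be expressed directly. The first function below follows the definition in the text. The second is an assumption, not a formula from the paper: it supposes that each cost-information change triggers one tree update with a fixed overhead and that R queries benefit from each update. All numbers are illustrative.

```python
def optimization_rate(query_traffic_reduction, overhead_traffic_increment):
    """Gain/penalty ratio: query traffic saved per unit of added overhead."""
    if overhead_traffic_increment <= 0:
        raise ValueError("overhead traffic increment must be positive")
    return query_traffic_reduction / overhead_traffic_increment

def rate_for_frequency_ratio(saving_per_query, overhead_per_update, R):
    """Assumed model: R queries use the trees between two cost changes, and
    each change costs one update of overhead_per_update traffic units."""
    return optimization_rate(R * saving_per_query, overhead_per_update)

def worth_using(rate):
    """ACE pays off only when the gain/penalty ratio exceeds 1."""
    return rate > 1

example_rate = optimization_rate(30.0, 20.0)             # illustrative numbers
break_even = rate_for_frequency_ratio(10.0, 20.0, R=2.0)
```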
4.3. A Dynamic P2P Environment
P2P networks are highly dynamic, with peers joining
and leaving frequently. The observations in [19] have
shown that over 20% of the logical connections in a P2P
system last 1 minute or less, and around 60% of the IP
addresses keep active in FastTrack for no more than 10
minutes each time after they join the system. The measurement
reported in [17] indicated that the median up-time for a node
in Gnutella and Napster is 60 minutes. Studies in [3] have
argued that measurement according to host IP addresses
underestimates peer-to-peer host availability and have shown
that each host joins and leaves a P2P system 6.4 times a day
on average, and over 20% of the hosts arrive and depart every
day. Although the numbers they provide differ to some
extent, they share the same point that the peer population is
quite transient. We simulate the joining and leaving
behavior of peers by turning logical peers on and off. In our
simulation, every node issues 0.3 queries per minute, which
is calculated from the observation data shown in [20], i.e.,
12,805 unique IP addresses issued 1,146,782 queries in 5
hours. When a peer joins, a lifetime in seconds is
assigned to the peer. The lifetime of a peer is defined as the
time period the peer will stay in the system. The lifetime is
generated according to the distribution observed in [17].
The mean of the distribution is chosen to be 10 minutes [19].
The value of the variance is chosen to be half of the value of
the mean. The lifetime is decreased by one after
each second passes. A peer will leave in the next second when
its lifetime reaches zero. During each second, there are a
number of peers leaving the system. We then randomly pick
(turn on) the same number of peers from the physical
network to join the overlay.
Figure 7: Traffic reduction vs. optimization step (x-axis: ACE optimization steps; y-axis: traffic cost per query (10^5); curves: 10, 8, 6, and 4 neighbors)
Figure 8: Average response time vs. optimization step (x-axis: ACE optimization steps; y-axis: response time per query; curves: 10, 8, 6, and 4 neighbors)
Figure 9: Average traffic cost reduction in a dynamic P2P environment (x-axis: queries (10^5); y-axis: traffic cost per query (10^5); curves: Gnutella-like, ACE)
Figure 10: Average response time reduction in a dynamic P2P environment (x-axis: queries (10^5); y-axis: response time per query; curves: Gnutella-like, ACE)
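The query rate and the churn procedure of Section 4.3 can be sketched as follows. The query rate uses only the numbers quoted from [20]. The churn step is an illustrative reading of the text, with assumptions labeled in the comments: the lifetime sampler is a placeholder and is not the distribution from [17], and peers that leave are assumed to return to the pool of offline peers.

```python
import random

def queries_per_minute(total_queries, unique_peers, hours):
    """Average number of queries issued per peer per minute."""
    return total_queries / unique_peers / (hours * 60)

def placeholder_lifetime(rng):
    """Placeholder sampler (assumption): NOT the distribution from [17]."""
    return max(1, round(rng.gauss(600.0, 300.0)))

def churn_step(online, offline, sample_lifetime, rng):
    """Advance the simulation by one second.
    online maps peer -> remaining lifetime in seconds; offline is a set."""
    # Peers whose lifetime reached zero in the previous second leave now.
    leaving = [p for p, life in online.items() if life <= 0]
    for p in leaving:
        del online[p]
    # Every remaining peer ages by one second.
    for p in online:
        online[p] -= 1
    # Turn on the same number of randomly chosen offline peers.
    joining = rng.sample(sorted(offline), len(leaving))
    for p in joining:
        offline.discard(p)
        online[p] = sample_lifetime(rng)
    # Assumption: departed peers become available to rejoin later.
    offline.update(leaving)
    return leaving, joining

rate = queries_per_minute(1_146_782, 12_805, 5)
```

The computed rate is about 0.2985, which rounds to the 0.3 queries per minute used in the simulation. Because each step replaces exactly the peers that leave, the number of online peers stays constant.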
5. Performance Evaluation
We have simulated ACE for all the generated logical
topologies on top of each of the 10 generated physical to-
pologies with 20,000 nodes. We have also simulated ACE
in a real-world P2P topology (based on DSS Clip2 trace).
We obtained consistent results on the real-world topology
and the generated topologies. We present representative
results based on 8,000 peers only.
5.1. ACE in Static Environments
In our first simulation, we study the effectiveness of
ACE in a static P2P environment where the peers do not
join and leave frequently. This shows how many optimization
steps are required to reach a better topology matching when
the overlay topology is not changed by peers joining and leaving.
As we have discussed, the first goal of ACE schemes is
to reduce traffic cost as much as possible while retaining the
same search scope. Figure 7 shows the traffic cost reduction
of ACE, where the curve of ‘cn neighbors’ means the average
traffic cost caused by a query to cover the search scope in
x-axis, and in the system the average number of logical
neighbors is cn. We can see that the traffic cost decreases
when ACE is conducted multiple times, where the search
scope is all 8000 peers. ACE may reduce traffic cost by
around 65% and it converges in around 10 steps.
The simulation results in Figure 8 show that ACE can
shorten the query response time by about 35% after 10 steps.
The tradeoff between query traffic cost and response time
has been discussed in [23]. P2P systems with a large num-
ber of average connections offer a faster search speed while
increasing traffic. One of the strengths of ACE schemes is
that it reduces both query traffic cost and response time
without decreasing the query success rate.
Figure 11: Query traffic reduction rate (x-axis: depth of neighbor closure (h); y-axis: query traffic reduction rate (%); curves: E=4, 8, 12, 16)
Figure 12: Overhead traffic (x-axis: depth of neighbor closure (h); y-axis: overhead traffic; curves: E=4, 8, 12, 16)
Figure 13: Optimization rate (E=16) (x-axis: depth of neighbor closure (h); y-axis: optimization rate; curves: Base and R=1.0 to 2.0)
Figure 14: Optimization rate (E=4) (x-axis: depth of neighbor closure (h); y-axis: optimization rate; curves: Base and R=1.0 to 2.4)
Figure 15: Optimization rate (E=16) (x-axis: frequency ratio; y-axis: optimization rate; curves: Base and h=1 to 4)
Figure 16: Optimization rate (E=4) (x-axis: frequency ratio; y-axis: optimization rate; curves: Base and h=1 to 8)
5.2. ACE in Dynamic Environments
We further evaluate the effectiveness of ACE in dy-
namic P2P systems. In this simulation, we assume that peer
average lifetime in a P2P system is 10 minutes; 0.3 queries
are issued by each peer per minute; and the frequency for
ACE at every peer to conduct optimization operations is
twice per minute. Figure 9 shows the average traffic cost
per query of Gnutella-like P2P systems and of ACE-enabled
Gnutella. Note that here the traffic cost includes the overhead
needed by each operation in the optimization steps.
ACE could significantly reduce the traffic cost while re-
taining the same search scope. Figure 10 shows that with
reduction of the traffic, the queries’ average response times
of ACE are also reduced in a dynamic environment.
In a dynamic P2P environment, we simulate ACE em-
ployed together with other approaches, such as response
index caching scheme or some forwarding based strategies.
We obtained very good results. For example, using a
200-item size cache at each peer, ACE with index cache
will reduce 75% of the traffic cost and 70% of the response
time. Due to the page limitation, we do not show the de-
tailed curves here.
5.3. The Impact of Optimization Depth
Figure 11 illustrates the query traffic reduction rate
over blind flooding versus the depth of neighbor closure used
to construct overlay trees. Different curves correspond to the
performance on different topologies with different values of
E, where E is the average number of neighbors. For a given
depth of neighbor closure, the reduction rate increases with
the average number of neighbors. For a given average
number of neighbors, the reduction rate also increases
as the depth of neighbor closure increases. For each E there
is a threshold depth beyond which the query traffic
is hard to reduce further.
Figure 12 shows the overhead traffic versus the depth
of neighbor closure. The overhead traffic increases as the
depth of neighbor closure increases, or as the average
number of neighbors increases.
Figures 13 and 14 show the optimization rate versus
the depth of neighbor closure with E=16 and E=4,
respectively. Different curves in each figure correspond to
different values of R. Based on these figures, we can determine,
for a given value of R, the minimal value of h to achieve
performance gain in ACE. The minimal value of h is
defined as the value of h that leads to an optimization rate of 1.
To achieve performance gain, we should choose depth
values that lead to optimization rates greater
than 1. When we increase R, the optimization rate increases
for a given depth value (h) and the minimal value of h to
achieve performance gain decreases. As h increases, the
optimization rate also increases. However, there is a
threshold of h beyond which the optimization rate hardly
increases any further. Figures 13 and 14 also show that for a
large value of E, a small minimal value of h is needed to
achieve performance gain for a given value of R.
Figures 15 and 16 show the optimization rate versus
frequency ratio with E=16 and E=4, respectively. When the
value R increases, the optimization rate significantly in-
creases. A large value of R means that the query frequency
is high and the tree reconstruction frequency is low. For a
given network after a period of time, if we can find a
relatively stable value of R, we will be able to find a minimal
value of h to construct overlay trees and achieve
performance gain in ACE. We can see from Figure 15 that for R=1,
the optimization rate is always less than 1. Thus, using ACE
in an environment with R=1 will not improve the
performance of the given topology. From Figure 15, the minimal
value of h is 2 for R=1.5, and is 1 for R=2. Comparing
Figure 15 with Figure 16, for the same value of R, the
minimal value of h is small for a large value of E. For
example, for R=2, the minimal value of h is 1 for E=16, while
the minimal value of h is 5 for E=4. Thus, ACE is more
effective in a topology with high connectivity density.
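Choosing the minimal depth from a measured curve can be sketched as a simple scan. The curves below are hypothetical values used only to show the mechanics; they are not read from Figures 13 to 16. The sketch takes the minimal h to be the smallest depth whose optimization rate reaches 1, following the definition in the text.

```python
def minimal_depth(rate_by_depth, threshold=1.0):
    """Return the smallest h whose optimization rate reaches the threshold,
    or None if no measured depth does."""
    for h in sorted(rate_by_depth):
        if rate_by_depth[h] >= threshold:
            return h
    return None

# Hypothetical optimization-rate curves for two frequency ratios.
CURVE_LOW_R = {1: 0.5, 2: 0.7, 3: 0.8, 4: 0.85}
CURVE_HIGH_R = {1: 0.6, 2: 0.9, 3: 1.1, 4: 1.2}

depth_low_r = minimal_depth(CURVE_LOW_R)    # never reaches 1
depth_high_r = minimal_depth(CURVE_HIGH_R)
```

A result of `None` corresponds to the case in which ACE brings no performance gain at any measured depth.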
6. Conclusion
In this paper, we propose a distributed approach to
solving the overlay mismatching problem. Our simulation
shows that the average cost of each query to reach the same
scope of nodes is reduced by about 65% when using our
proposed ACE in a Gnutella-like P2P network without
losing any autonomy feature, and the average response time
of each query can be reduced by 35%. The proposed ACE
technique is fully distributed, easy to implement, and
adaptive to the dynamic nature of P2P systems. Furthermore,
a larger diameter leads to a better topology optimization
rate but also a higher overhead due to extra information
exchange. ACE is more effective in a topology with high
connectivity density. It will make decentralized
flooding-based P2P file sharing systems more scalable and
efficient.
It is very important for ACE to quickly identify the best
candidate from a non-flooding neighbor’s neighbor list to
minimize replacement overhead. In our simulations, we
use only the random policy, which replaces a non-flooding
neighbor with a randomly selected candidate. We are studying
several alternatives for choosing the candidate. The first is
the naïve policy, which simply disconnects the source node’s
most expensive neighbor. The source node then probes the
costs to some other nodes and tries to find a less expensive
node as a replacement for the disconnected neighbor. The
second is the closest policy, in which the source probes the
costs to all of the non-flooding neighbor’s neighbors and
selects the closest one.
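The difference between the random and closest policies can be sketched as follows. This is a hedged illustration, not the simulator's code: the `Probe` callback, the function names, and the sample costs are all hypothetical, and the naïve policy is omitted.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical probe: returns the measured cost (e.g. delay)
# from the source node to the given peer.
Probe = Callable[[str], float]


def random_policy(candidates: Sequence[str],
                  rng: random.Random) -> Optional[str]:
    """Pick any neighbor of the non-flooding neighbor at random;
    no probing is needed to make the choice."""
    return rng.choice(candidates) if candidates else None


def closest_policy(candidates: Sequence[str],
                   probe: Probe) -> Optional[str]:
    """Probe every candidate once and return the cheapest one."""
    if not candidates:
        return None
    costs = {peer: probe(peer) for peer in candidates}
    return min(costs, key=costs.get)


# Made-up costs from source S to the neighbors of a non-flooding
# neighbor G (illustrative values only).
cost_from_s = {"H": 18.0, "J": 8.0, "K": 28.0}
candidates = list(cost_from_s)

print(closest_policy(candidates, cost_from_s.__getitem__))
print(random_policy(candidates, random.Random(0)) in candidates)
```

In this sketch the closest policy issues one probe per candidate, while the random policy issues none when choosing, which reflects the tradeoff between replacement quality and probing overhead.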
References
[1] KaZaA, http://www.kazaa.com
[2] Gnutella, http://gnutella.wego.com/
[3] R. Bhagwan, S. Savage, and G. M. Voelker, "Understanding
Availability," Proceedings of the 2nd International Workshop
on Peer-to-Peer Systems (IPTPS), 2003.
[4] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S.
Shenker, "Making Gnutella-like P2P Systems Scalable," Pro-
ceedings of ACM SIGCOMM, 2003.
[5] Y. Chu, S. G. Rao, and H. Zhang, "A case for end system
multicast," Proceedings of ACM SIGMETRICS, 2000.
[6] E. Cohen and S. Shenker, "Replication strategies in un-
structured peer-to-peer networks," Proceedings of SIGCOMM,
2002.
[7] B. Krishnamurthy and J. Wang, "Topology modeling via
cluster graphs," Proceedings of SIGCOMM Internet Meas-
urement Workshop, 2001.
[8] Y. Liu, Z. Zhuang, L. Xiao, and L. M. Ni, "AOTO: Adaptive
Overlay Topology Optimization in Unstructured P2P Systems,"
Proceedings of GLOBECOM, San Francisco, USA, 2003.
[9] Y. Liu, X. Liu, L. Xiao, L. M. Ni, and X. Zhang, "Loca-
tion-Aware Topology Matching in Unstructured P2P Systems,"
Proceedings of INFOCOM, 2004.
[10] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, "Search and
replication in unstructured peer-to-peer networks," Proceedings
of the 16th ACM International Conference on Supercomputing,
2002.
[11] E. P. Markatos, "Tracing a large-scale peer to peer system:
an hour in the life of gnutella," Proceedings of the 2nd
IEEE/ACM International Symp. on Cluster Computing and the
Grid, 2002.
[12] D. A. Menasce and L. Kanchanapalli, "Probabilistic Scal-
able P2P Resource Location Services," ACM SIGMETRICS
Performance Evaluation Review, vol. 30, pp. 48-58, 2002.
[13] V. N. Padmanabhan and L. Subramanian, "An investigation
of geographic mapping techniques for Internet hosts," Pro-
ceedings of ACM SIGCOMM, 2001.
[14] S. Patro and Y. C. Hu, "Transparent Query Caching in
Peer-to-Peer Overlay Networks," Proceedings of the 17th In-
ternational Parallel and Distributed Processing Symposium
(IPDPS), 2003.
[15] M. Ripeanu, A. Iamnitchi, and I. Foster, "Mapping the
Gnutella Network," IEEE Internet Computing, 2002.
[16] Ritter, "Why Gnutella can't scale. No, really,"
http://www.tch.org/gnutella.html
[17] S. Saroiu, P. Gummadi, and S. Gribble, "A Measurement
Study of Peer-to-Peer File Sharing Systems," Proceedings of
Multimedia Computing and Networking (MMCN), 2002.
[18] S. Saroiu, K. P. Gummadi, R. J. Dunn, S. D. Gribble, and H.
M. Levy, "An Analysis of Internet Content Delivery Systems,"
Proceedings of the 5th Symposium on Operating Systems De-
sign and Implementation, 2002.
[19] S. Sen and J. Wang, "Analyzing peer-to-peer traffic across
large networks," Proceedings of ACM SIGCOMM Internet
Measurement Workshop, 2002.
[20] K. Sripanidkulchai, "The popularity of Gnutella queries and
its implications on scalability,"
http://www2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
[21] Z. Xu, C. Tang, and Z. Zhang, "Building topology-aware
overlays using global soft-state," Proceedings of ICDCS, 2003.
[22] B. Yang and H. Garcia-Molina, "Efficient search in
peer-to-peer networks," Proceedings of ICDCS, 2002.
[23] Z. Zhuang, Y. Liu, L. Xiao, and L. M. Ni, "Hybrid Periodical
Flooding in Unstructured Peer-to-Peer Networks," Proceedings
of International Conference on Parallel Processing, 2003.
A Distributed Approach to Solving Overlay Mismatching Problem

  • 1. A Distributed Approach to Solving Overlay Mismatching Problem Yunhao Liu1 , Zhenyun Zhuang1 , Li Xiao1 and Lionel M. Ni2 1 Department of Computer Science & Engineering Michigan State University East Lansing, Michigan, USA {liuyunha, zhuangz1,lxiao}@cse.msu.edu 2 Department of Computer Science Hong Kong University of Science and Technology Kowloon, Hong Kong, China [email protected] Abstract In unstructured peer-to-peer (P2P) systems, the mechanism of a peer randomly joining and leaving a P2P network causes topology mismatching between the P2P logical overlay network and the physical underlying net- work, causing a large volume of redundant traffic in the Internet. In order to alleviate the mismatching problem, we propose Adaptive Connection Establishment (ACE), an algorithm of building an overlay multicast tree among each source node and the peers within a certain diameter from the source peer, and further optimizing the neighbor con- nections that are not on the tree, while retaining the search scope. Our simulation study shows that this approach can effectively solve the mismatching problem and significantly reduce P2P traffic. We further study the tradeoffs between the topology optimization rate and the information ex- change overhead by changing the diameter used to build the tree. 1. Introduction In unstructured P2P systems, queries are flooded among peers (such as in Gnutella [2]) or among supernodes (such as in KaZaA [1]). In such systems, all participating peers form a P2P network over a physical network. A P2P net- work is an abstract, logical network called an overlay net- work. When a new peer wants to join a P2P network, a bootstrapping node provides the IP addresses of a list of existing peers in the P2P network. The new peer then tries to connect with these peers. If some attempts succeed, the connected peers will be the new peer's neighbors. 
Once this peer connects into a P2P network, the new peer will peri- odically ping the network connections and obtain the IP addresses of some other peers in the network. These IP addresses are cached by this new peer. When a peer leaves the P2P network and then wants to join the P2P network again (no longer the first time), the peer will try to connect to the peers whose IP addresses have already been cached. This mechanism of a peer joining a P2P network and the fact of a peer randomly joining and leaving causes an in- teresting matching problem between a P2P overlay network topology and the underlying physical network topology. Studies in [15] show that only 2 to 5 percent of Gnutella connections link peers within a single autonomous system (AS), but more than 40 percent of all Gnutella peers are located within the top 10 ASes. This means that most Gnutella-generated traffic crosses AS borders so as to in- crease topology mismatching costs. The same message can traverse the same physical link multiple times, causing large amount of unnecessary traffic. The objective of this paper is to minimize the effect due to topology mismatching. We propose the Adaptive Con- nection Establishment (ACE) that builds an overlay multi- cast tree among each source node and the peers within a certain diameter from the source peer, and further optimizes the neighbor connections that are not on the tree, while retaining the search scope. ACE is scalable and completely distributed in the sense that it does not require global knowledge of the whole overlay network when each node is optimizing the organization of its logical neighbors. Our simulations show that ACE can significant improve the performance. We also show that a larger diameter leads to a better topology optimization rate and a higher overhead due to extra information exchanging. 
Our experiments and discussions provide a guide on how to achieve a good per- formance by considering the tradeoffs between the topol- ogy optimization rate and the information exchange over- head in selecting the diameter to determine the peers to form the multicast tree for a source peer. The rest of the paper is organized as follows. Section 2 discusses related work. Section III presents the adaptive connection establishment (ACE) scheme. Section IV de- scribes our simulation methodology. Performance evalua- tion of the ACE is presented in Section V, and we conclude the work in Section VI. 2. Related Work In order to reduce unnecessary flooding traffic and im- prove search performance, two approaches have typically been used to improve from the flooding-based search mechanism in unstructured P2P systems. Rather than flooding a query to all neighbors, the first approach routes queries to peers that are likely to have the requested items by some heuristics based on maintained statistic informa- This work was partially supported by the US National Science Foundation (NSF) under grant ACI-0325760, by Michigan State Univer- sity IRGP Grant 41114, and by Hong Kong RGC Grant HKUST6161/03E. Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS’04) 1063-6927/04 $20.00 © 2004 IEEE
  • 2. tion [10, 22, 23]. In the second approach, a peer keeps in- dices of other peers’ sharing information or caches query responses in hoping that subsequent queries can be satisfied quickly by the cached indices or responses [6, 11, 12, 14, 18, 20, 22]. The performance gains of both approaches are seriously limited by the topology mismatching problem. The third approach is based on overlay topology opti- mization that is closely related to what we are presenting in this paper. Here we briefly introduce three types of solu- tions and their comparisons with our approach. End system multicast, Narada, was proposed in [5], which first con- structs a rich connected graph on which to further construct shortest path spanning trees. Each tree rooted at the corre- sponding source using well-known routing algorithms. This approach introduces large overhead of forming the graph and trees in a large scope, and does not consider the dy- namic joining and leaving characteristics of peers. The overhead of Narada is proportional to the multicast group size. This approach is infeasible to large-scale P2P systems. Researchers have also considered to cluster close peers based on their IP addresses (e.g., [7, 13]). We believe there are two limitations for this approach. First, the mapping accuracy is not guaranteed by this approach. Second, this approach may affect the search scope in P2P networks. In contrast, our technique is measurement based and can ac- curately and dynamically connect the physically closer peers, and disconnect physically distant peers. Furthermore, our scheme does not shrink the search scope. Recently, researchers in [21] have proposed to measure the latency between each peer to multiple stable Internet servers called ``landmarks”. The measured latency is used to determine the distance between peers. This measurement is conducted in a global P2P domain and needs the support of additional landmarks. 
Similarly, this approach also af- fects the search scope in P2P systems. In contrast, our measurement is conducted in many small regions, signifi- cantly reducing the network traffic. Gia [4] introduced a topology adaptation algorithm to ensure that high capacity nodes are indeed the ones with high degree and low capacity nodes are within short reach of high capacity nodes. It addresses a different matching problem in overlay networks, but does not address the to- pology mismatching problem between the overlay and physical networks. A preliminary design of ACE, which is called AOTO, has been discussed in [8]. We have also proposed a loca- tion-aware topology matching scheme [9], in which each peer issues a detector in a small region so that the peers receiving the detector can record relative delay information. Based on the delay information, a receiver can detect and cut most of the inefficient and redundant logical links, and add closer nodes as its direct neighbors. However, this approach creates slightly more overhead and requires that the clocks in all peers be synchronized. 3. Adaptive Connection Establishment In unstructured P2P systems, the most popular search mechanism in use is to blindly “flood" a query to the net- work among peers or among supernodes. A query is broadcast and rebroadcast until a certain criterion is satis- fied. If a peer receiving the query can provide the requested object, a response message will be sent back to the source peer along the inverse of the query path. This mechanism ensures that the query will be “flooded” to as many peers as possible within a short period of time in a P2P overlay network. A query message will also be dropped if the query message has visited the peer before. In this section, we use examples to explain the unnecessary traffic incurred by flooding based search and the topology mismatching problem. Then we introduce the design of proposed ap- proach, ACE. 3.1. 
Unnecessary Traffic by Flooding Figure 1 shows an example of a P2P overlay topology where solid lines denote overlay connections among logical P2P neighbors. Consider the case when node S sends a query. A solid arrow represents a delivery of the query message along one logical connection. The query is relayed by many peers, which incurs a lot of unnecessary traffic. For example, after node S sends the query to L and M, since none of L or M knows the other one will receive the same query from S, they will forward the query to each other. The pair of transmission on the logical link LM is unnecessary. In such a simple overlay, node M will receive the same query message for 4 times. In this case, it is clear that the search scope of the query from node S will not shrink without logical connections of LM, MQ, LQ and MP. S L M P Q Figure 1: An example of P2P overlay 3.2. Topology Mismatching As we have discussed the stochastic peer connection and peers randomly joining and leaving a P2P network can cause topology mismatching between the P2P logical overlay network and the physical underlying network. For example, Figures 2(a) and 2(b) are two overlay topologies on top of the underlying physical topology shown in Figure 2(c). Suppose nodes S and B are in the same autonomous system (AS) at Michigan State University (MSU) in USA, while nodes A and C are in another AS at Tsinghua Uni- Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS’04) 1063-6927/04 $20.00 © 2004 IEEE
  • 3. versity in China. So we can assume that the physical link delay between nodes S and C is much longer than the link between nodes S and B, or nodes A and C in Figure 2(c). Clearly, in the inefficient mismatching overlay of Figure 2(a), the query message from source S will traverse the longest link SC three times, which is a scenario of topology mismatching. If we can construct an efficient overlay shown in Figure 2(b), the message needs to traverse all the physical links in Figure 2(c) only once. S MSUB D C Tsinghua A B S C A S A B C (a) Mismatching Overlay (b) Matching Overlay (c) Underlying Physical Topology Figure 2: Topology mismatching problem 3.3. Design of ACE Optimizing inefficient overlay topologies can funda- mentally improve P2P search efficiency. The proposed approach, Adaptive Connection Establishment (ACE), includes three phases. Phase 1: we use network delay between two nodes as a metric for measuring the cost between nodes. We modify the Limewire implementation of Gnutella 0.6 P2P protocol by adding one routing message type. Each peer probes the costs with its immediate logical neighbors and forms a neighbor cost table. Two neighboring peers exchange their neighbor cost tables so that a peer can obtain the cost be- tween any pair of its logical neighbors. Thus, a small overlay topology of a source peer and all its logical neighbors is known to the source peer. Phase 2: based on obtained neighbor cost tables, a minimum spanning tree among each peer and its immediate logical neighbors then can be built by simply using an al- gorithm like PRIM which has a computation complexity of O(m2 ), where m is the number of logical neighbors of the source node. Now the message routing strategy of a peer is to select the peers that are the direct neighbors in the mul- ticast tree to send its queries, instead of flooding queries to all neighbors. An example is shown in Figure 3. 
In Figure 3(a), the traffic incurred by node S’s flooding of messages to its direct neighbor E, F, and G is: 4+14+14+15+6+20+ 20=93. After phase 2, we can see the forwarding connec- tions are changed as shown in Figure 3(b), and the total traffic cost becomes: 6+4+14=24. In Figure 3(b), node S sends a message only to nodes E and F and expects that node E will forward the message to node G. Note that in this phase, even node S does not flood its query message to node G any more. S still retains the connections with G and keeps exchanging the neighbor cost tables. We call node G non-flooding neighbor of node S, which is the direct neighbor potentially to be replaced in the next phase. E S G F 4 14 15 6 20 E S G F 4 14 6 ( a) ( b) Figure 3: Second phase in ACE Phase 3: this phase reorganizes the overlay topology. Note that each peer has a neighbor list which is further divided into flooding neighbors and non-flooding neighbors in Phase 2. Each peer also has the neighbor cost tables of all its neighbors. In this phase, it tries to replace those physi- cally far away neighbors by physically close by neighbors, thus minimizing the topology mismatching traffic. An ef- ficient method to identify such a candidate peer to replace a far away neighbor is critical to the system performance. Many methods may be proposed. In ACE, a non-flooding neighbor may be replaced by one of the non-flooding neighbor’s neighbor. The basic concept of phase 3 is illustrated in Figure 4. In Figure 4(a), node S is probing the distance to one of its non-flooding neighbor G’s neighbors, for example, H. If SH is smaller than SG, as shown in Figure 4(b), connection SG will be cut. If SG is smaller than SH, but S finds that the cost between nodes G and H is even larger than the cost between nodes S and H, as shown in Figure 4(c), S will keep H as a new neighbor. Since the algorithm is executed in each peer independently, S cannot let G to remove H from its neighbor list. 
However, as long as S keeps both G and H as its logical neighbors, we may expect that node H will become a non-flooding neighbor to node G after node G’s Phase 2 since node G expects S to forward messages to H to reduce unnecessary traffic. Then G will try to find another peer to replace H as its neighbor. After knowing that H is no longer a neighbor to G from periodically ex- changed neighbor cost tables from node G (or from node H), Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS’04) 1063-6927/04 $20.00 © 2004 IEEE
S will cut connection SG, although S has already stopped sending query messages to G for some time, since the spanning tree has already been built for S. If SH is larger than both SG and GH, as shown in Figure 4(d), this connection will not be built and S will keep probing G's other direct neighbors.

Figure 4: Third phase in ACE. (a) S probes G's neighbor H. (b) SH<SG: replace G by H. (c) SH>SG but SH<GH: S keeps H as a direct neighbor. (d) SH>SG and SH>GH: S starts probing G's next neighbor.

3.4. Depth of Optimization

We define the h-neighbor closure of a source peer as the set of peers within h hops of the source peer. For example, a 2-neighbor closure includes the source peer, all its direct neighbors, and all the neighbors of those direct neighbors. In the initial ACE described in Section 3.3, optimization is conducted only within the 1-neighbor closure (among each source peer and all its direct logical neighbors). We can enlarge the optimization scope by increasing the value of h. A larger value of h leads to a better topology matching improvement, but also to a higher overhead due to the extra information exchange. We study this direction further, aiming to reach a good performance level by weighing the topology optimization improvement against the information exchange overhead.

Figure 5 illustrates the overlay trees constructed for each peer within 1-neighbor closures. Peer A initiates a query. The bold links denote the links on the tree, and the arrows indicate the query directions. The query is sent from peer A to B and D, since both B and D are direct logical neighbors of A on the overlay tree. Peer B then forwards the query to E, and D also forwards the query to E. Peer E finally forwards the query to D and C. Peer C does not forward the query because E is its only direct neighbor, and E is the peer that forwarded the query to C.
So the query process terminates. The query paths and corresponding costs for this query are listed in Table 1. The total cost for this query from peer A to be forwarded to all other peers through the overlay trees built in 1-neighbor closures is 68. Compared with blind flooding, the number of unnecessary messages is reduced from 3 to 1 in this example. With 1-neighbor closures, the query message traverses one path twice, namely E–D. With blind flooding, the same query message traverses three paths twice, namely B–D, D–E, and C–E. Figure 6 and Table 2 illustrate the overlay tree built in the 2-neighbor closure and the corresponding query directions and costs. The total cost to forward a query from peer A to all other peers is 39. No path is traversed twice by ACE with h=2 in this example. We can see that both the number of unnecessary messages and the total traffic decrease as the value of h increases.

Figure 5: Overlay trees built in 1-neighbor closure. Panels: original topology; overlay trees rooted at A, B, C, D, and E. Nodes A, B, C, D, and E; link costs 10, 12, 7, 14, 8, 20, and 15.
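The totals of 68 and 39 quoted above can be checked by summing the per-hop costs of the forwarding steps described in the text. The Python sketch below is only a bookkeeping aid: the helper name `summarize` and the list-of-tuples encoding are assumptions made for illustration, and the assignment of cost 10 to A–B and 15 to A–D is read off the order in which the example lists them.

```python
from collections import Counter


def summarize(messages):
    """Return (total cost, links traversed more than once) for a list of
    (sender, receiver, cost) forwarding steps."""
    total = sum(cost for _, _, cost in messages)
    # Count traversals per undirected overlay link.
    uses = Counter(frozenset((src, dst)) for src, dst, _ in messages)
    repeated = sorted("-".join(sorted(link)) for link, n in uses.items() if n > 1)
    return total, repeated


# Forwarding steps for the query from A with 1-neighbor closures.
h1 = [("A", "B", 10), ("A", "D", 15), ("B", "E", 8),
      ("D", "E", 14), ("E", "C", 7), ("E", "D", 14)]
# Forwarding steps for the same query with the 2-neighbor closure.
h2 = [("A", "B", 10), ("B", "E", 8), ("E", "C", 7), ("E", "D", 14)]

print(summarize(h1))  # (68, ['D-E'])
print(summarize(h2))  # (39, [])
```

The only link used twice with h=1 is the one between D and E, which matches the single doubly traversed path reported for the 1-neighbor closure.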
Table 1. Query paths and costs on overlay trees built in 1-neighbor closure

    Query path from    To      Corresponding cost
    A                  B, D    10+15=25
    B                  E       8
    D                  E       14
    E                  C, D    7+14=21
    Total cost                 68

Table 2. Query paths and costs on the overlay tree built in 2-neighbor closure

    Query path from    To      Corresponding cost
    A                  B       10
    B                  E       8
    E                  C, D    7+14=21
    Total cost                 39

Figure 6: Overlay tree built in 2-neighbor closure. Nodes A, B, C, D, and E; link costs 10, 12, 7, 14, 8, 20, and 15.

4. Simulation Methodology

In this section we describe the topology generation, the performance metrics used in our simulations, our simulation setup, and the parameter settings.

4.1. Topology Generation

The simulation study needs both physical topologies and logical overlay topologies that accurately reflect the topological properties of real networks in each layer. Previous studies have shown that both large-scale Internet physical topologies [6] and P2P overlay topologies [7] follow small world and power law properties. The power law describes the node degree, while the small world property describes characteristics of path length and clustering coefficient [9]. The study in [6] found that topologies generated using the AS Model have the small world and power law properties. BRITE is a topology generation tool that provides the option to generate topologies based on the AS Model. We generate 10 physical topologies, each with 20,000 nodes. The logical topologies are generated with the number of peers (nodes) ranging from 1,000 to 8,000. For each given number of nodes, we generate logical topologies with average edge connections between 1 and 20. Note that an edge connection of value m indicates that each peer has 2m logical neighbors.

4.2. Performance Metrics

A well-designed search mechanism should seek to optimize both efficiency and Quality of Service (QoS).
Efficiency focuses on better utilization of resources, such as bandwidth and processing power, while QoS focuses on user-perceived qualities, such as the number of returned results and the response time. In unstructured P2P systems, the QoS of a search mechanism generally depends on the number of peers explored (queried), the response time, and the traffic overhead. If more peers can be queried by a given query, it is more likely that the requested object will be found. We therefore use the following performance metrics.

Traffic cost is one of the parameters of serious concern to network administrators. Heavy network traffic limits the scalability of P2P networks [16] and is also a reason why a network administrator may prohibit P2P applications. We define the traffic cost as the network resource used in an information search process of a P2P system, which is a function of the consumed network bandwidth and other related expenses.

Search scope is defined as the number of peers that queries have reached in an information search process. Thus, with the same traffic cost we aim to maximize the search scope, while with the same search scope we aim to minimize the traffic cost.

Response time of a query is one of the parameters of concern to P2P users. We define the response time of a query as the time from when the query is issued until the source peer receives a response from the first responder.

Optimization rate is defined as the gain/penalty ratio, i.e., the ratio of the query traffic reduction to the overhead traffic increment. We use it to study the tradeoff between query traffic and overhead traffic as the optimization depth h changes. One major factor affecting the traffic overhead is the frequency of exchanging cost information. We define the frequency ratio, R, as the ratio of the frequency of queries that use the overlay trees to the frequency of cost information changes.
For a given P2P network topology, if the frequency of topology and cost changes and the query frequency can be measured so that R is determined, we should be able to adjust the value of h to achieve the optimal gain/penalty ratio. ACE is worth using only if the gain/penalty ratio is larger than 1.

4.3. A Dynamic P2P Environment

P2P networks are highly dynamic, with peers joining and leaving frequently. The observations in [19] have shown that over 20% of the logical connections in a P2P system last 1 minute or less, and that around 60% of the IP addresses stay active in FastTrack for no more than 10 minutes each time after they join the system. The measurement reported in [17] indicated that the median up-time for a node in Gnutella and
Napster is 60 minutes. Studies in [3] have argued that measurements based on host IP addresses underestimate peer-to-peer host availability, and have shown that each host joins and leaves a P2P system 6.4 times a day on average and that over 20% of the hosts arrive and depart every day. Although the numbers reported differ to some extent, these studies agree that the peer population is quite transient.

Figure 7: Traffic reduction vs. optimization step. x-axis: ACE optimization steps; y-axis: traffic cost per query (10^5); curves for 10, 8, 6, and 4 neighbors.
Figure 8: Average response time vs. optimization step. x-axis: ACE optimization steps; y-axis: response time per query; curves for 10, 8, 6, and 4 neighbors.
Figure 9: Average traffic cost reduction in a dynamic P2P environment. x-axis: queries (10^5); y-axis: traffic cost per query (10^5); curves for Gnutella-like and ACE.
Figure 10: Average response time reduction in a dynamic P2P environment. x-axis: queries (10^5); y-axis: response time per query; curves for Gnutella-like and ACE.

We simulate the joining and leaving behavior of peers by turning logical peers on and off. In our simulation, every node issues 0.3 queries per minute, a rate calculated from the observation data in [20]: 12,805 unique IP addresses issued 1,146,782 queries in 5 hours. When a peer joins, it is assigned a lifetime in seconds, defined as the time the peer will stay in the system. The lifetime is generated according to the distribution observed in [17]. The mean of the distribution is chosen to be 10 minutes [19], and the variance is chosen to be half of the mean. The lifetime is decreased by one after each second, and a peer leaves in the next second once its lifetime reaches zero.
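The query rate and the lifetime countdown described above can be sketched as follows. This is a minimal Python illustration, not the simulator used in the paper: the names `queries_per_minute` and `tick` and the dictionary representation of peers are assumptions, and the lifetimes in the example are arbitrary placeholders rather than samples from the distribution in [17].

```python
def queries_per_minute(total_queries: int, unique_peers: int, hours: float) -> float:
    """Average number of queries issued by one peer per minute."""
    return total_queries / unique_peers / (hours * 60)


def tick(lifetimes: dict) -> list:
    """Advance the simulation by one second.

    `lifetimes` maps a peer id to its remaining lifetime in seconds.
    Peers whose lifetime has already reached zero leave in this second
    (they are removed and returned); all other lifetimes drop by one.
    """
    leaving = [peer for peer, left in lifetimes.items() if left <= 0]
    for peer in leaving:
        del lifetimes[peer]
    for peer in lifetimes:
        lifetimes[peer] -= 1
    return leaving


# Trace figures cited from [20]: 1,146,782 queries, 12,805 addresses, 5 hours.
rate = queries_per_minute(1_146_782, 12_805, 5)
print(round(rate, 2))  # 0.3

peers = {"p1": 2, "p2": 1}  # placeholder lifetimes in seconds
print(tick(peers), peers)   # [] {'p1': 1, 'p2': 0}
print(tick(peers), peers)   # ['p2'] {'p1': 0}
```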
During each second, a number of peers leave the system. We then randomly pick (turn on) the same number of peers from the physical network to join the overlay.

5. Performance Evaluation

We have simulated ACE for all the generated logical topologies on top of each of the 10 generated physical topologies with 20,000 nodes. We have also simulated ACE on a real-world P2P topology (based on the DSS Clip2 trace). We obtained consistent results on the real-world topology and the generated topologies. As representative results, we present only those based on 8,000 peers.

5.1. ACE in Static Environments

In our first simulation, we study the effectiveness of ACE in a static P2P environment where peers do not join and leave frequently. This shows how many optimization steps are required to reach a better topology matching when the overlay topology is not otherwise changing. As we have discussed, the first goal of ACE is to reduce traffic cost as much as possible while retaining the same search scope. Figure 7 shows the traffic cost reduction of ACE, where the curve labeled 'cn neighbors' gives the average traffic cost incurred by a query to cover the search scope in a system whose average number of logical neighbors is cn. We can see that the traffic cost decreases as ACE is conducted multiple times, where the search scope is all 8,000 peers. ACE may reduce traffic cost by around 65%, and it converges in around 10 steps.

The simulation results in Figure 8 show that ACE can shorten the query response time by about 35% after 10 steps. The tradeoff between query traffic cost and response time has been discussed in [23]. P2P systems with a large average number of connections offer a faster search speed at the price of increased traffic. One of the strengths of ACE is that it reduces both query traffic cost and response time without decreasing the query success rate.
Figure 11: Query traffic reduction rate. x-axis: depth of neighbor closure (h); y-axis: query traffic reduction rate (%); curves for E=4, 8, 12, and 16.
Figure 12: Overhead traffic. x-axis: depth of neighbor closure (h); y-axis: overhead traffic; curves for E=4, 8, 12, and 16.
Figure 13: Optimization rate (E=16). x-axis: depth of neighbor closure (h); y-axis: optimization rate; curves for Base and R=1.0, 1.2, 1.4, 1.6, 1.8, and 2.0.
Figure 14: Optimization rate (E=4). x-axis: depth of neighbor closure (h); y-axis: optimization rate; curves for Base and R=1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, and 2.4.
Figure 15: Optimization rate (E=16). x-axis: frequency ratio; y-axis: optimization rate; curves for Base and h=1 to 4.
Figure 16: Optimization rate (E=4). x-axis: frequency ratio; y-axis: optimization rate; curves for Base and h=1 to 8.

5.2. ACE in Dynamic Environments

We further evaluate the effectiveness of ACE in dynamic P2P systems. In this simulation, we assume that the average peer lifetime in a P2P system is 10 minutes, that each peer issues 0.3 queries per minute, and that every peer conducts ACE optimization operations twice per minute. Figure 9 shows the average traffic cost per query of a Gnutella-like P2P system and of ACE-enabled Gnutella. Note that here the traffic cost includes the overhead needed by each operation in the optimization steps. ACE significantly reduces the traffic cost while retaining the same search scope. Figure 10 shows that, along with the traffic reduction, the average query response time of ACE is also reduced in a dynamic environment.

In a dynamic P2P environment, we also simulate ACE employed together with other approaches, such as a response index caching scheme or some forwarding-based strategies. We obtained very good results.
For example, using a cache of 200 items at each peer, ACE with index caching reduces the traffic cost by 75% and the response time by 70%. Due to the page limitation, we do not show the detailed curves here.

5.3. The Impact of Optimization Depth

Figure 11 illustrates the query traffic reduction rate over blind flooding versus the depth of neighbor closure used to construct overlay trees. Different curves correspond to the performance on topologies with different values of E, where E is the average number of neighbors. For a given depth of neighbor closure, the reduction rate increases with the average number of neighbors. For a given average number of neighbors, the reduction rate also increases as the depth of neighbor closure increases. For each E there is a threshold depth beyond which the query traffic is hard to reduce further. Figure 12 shows the overhead traffic versus the depth of neighbor closure. The overhead traffic increases as the depth of neighbor closure increases, or as the average number of neighbors increases.

Figures 13 and 14 show the optimization rate versus the depth of neighbor closure with E=16 and E=4, respectively. Different curves in each figure correspond to different values of R. Based on these figures, we can determine, for a given value of R, the minimal value of h needed to achieve a performance gain in ACE. The minimal value of h is defined as the value of h that leads to an optimization rate of 1. To achieve a performance gain, we should choose depth values that lead to optimization rates greater than 1. When we increase R, the optimization rate increases for a given depth value h, and the minimal value of h needed to achieve a performance gain decreases. As h increases, the optimization rate also increases. However, there is a threshold of h beyond which the optimization rate hardly increases any more.
Figures 13 and 14 also show that, for a given value of R, a larger value of E needs a smaller minimal value of h to achieve a performance gain.
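How a minimal h follows from R can be illustrated with a short calculation. The Python sketch below assumes, purely for illustration, that the optimization rate equals R times the query traffic saved per query divided by the overhead traffic per cost exchange. The paper defines the rate only as a gain/penalty ratio, so this formula, the function names, and all numbers in the example are hypothetical and are not taken from Figures 11 to 14.

```python
def optimization_rate(r: float, saved_per_query: float,
                      overhead_per_exchange: float) -> float:
    """Gain/penalty ratio under the assumed model: R * saved / overhead."""
    return r * saved_per_query / overhead_per_exchange


def minimal_h(r, saved_by_h, overhead_by_h):
    """Smallest depth h whose assumed optimization rate reaches 1, or None."""
    for h in sorted(saved_by_h):
        if optimization_rate(r, saved_by_h[h], overhead_by_h[h]) >= 1:
            return h
    return None


# Made-up traffic figures per depth h (arbitrary units), for illustration only.
saved = {1: 40.0, 2: 70.0, 3: 85.0}
overhead = {1: 70.0, 2: 100.0, 3: 150.0}

print(minimal_h(1.0, saved, overhead))  # None
print(minimal_h(1.5, saved, overhead))  # 2
print(minimal_h(2.0, saved, overhead))  # 1
```

With these placeholder values a larger R lowers the minimal h, and no depth pays off at R=1, which is the qualitative behavior the text describes.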
Figures 15 and 16 show the optimization rate versus the frequency ratio with E=16 and E=4, respectively. When the value of R increases, the optimization rate increases significantly. A large value of R means that the query frequency is high and the tree reconstruction frequency is low. For a given network, if we can find a relatively stable value of R after a period of time, we will be able to find a minimal value of h with which to construct overlay trees and achieve a performance gain in ACE. We can see from Figure 15 that for R=1 the optimization rate is always less than 1. Thus, using ACE in an environment with R=1 will not improve performance on the given topology. From Figure 15, the minimal value of h is 2 for R=1.5 and 1 for R=2. Comparing Figure 15 with Figure 16, for the same value of R the minimal value of h is smaller for a larger value of E. For example, for R=2 the minimal value of h is 1 for E=16, while it is 5 for E=4. Thus, ACE is more effective in a topology with high connectivity density.

6. Conclusion

In this paper, we propose a distributed approach to solving the overlay mismatching problem. Our simulation shows that the average cost for each query to reach the same scope of nodes is reduced by about 65% when using the proposed ACE in a Gnutella-like P2P network, without losing any autonomy feature, and that the average response time of each query can be reduced by 35%. The proposed ACE technique is fully distributed, easy to implement, and adaptive to the dynamic nature of P2P systems. Furthermore, a larger diameter leads to a better topology optimization rate but also to a higher overhead due to the extra information exchange. ACE is more effective in a topology with high connectivity density. It makes decentralized flooding-based P2P file sharing systems more scalable and efficient.
It is very important for ACE to quickly identify the best candidate from a non-flooding neighbor's neighbor list in order to minimize the replacement overhead. In our simulations, we use only a random policy, which replaces a non-flooding neighbor with a randomly selected candidate. We are studying several alternative policies for choosing the candidate. For example, the naïve policy simply disconnects the source node's most expensive neighbor; the source node then probes the costs to some other nodes and tries to find a less expensive node to replace the disconnected neighbor. Another is the closest policy, in which the source probes the costs to all of the non-flooding neighbor's neighbors and selects the closest one.

References

[1] KaZaA, http://www.kazaa.com
[2] Gnutella, http://gnutella.wego.com/
[3] R. Bhagwan, S. Savage, and G. M. Voelker, "Understanding Availability," Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS), 2003.
[4] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker, "Making Gnutella-like P2P Systems Scalable," Proceedings of ACM SIGCOMM, 2003.
[5] Y. Chu, S. G. Rao, and H. Zhang, "A case for end system multicast," Proceedings of ACM SIGMETRICS, 2000.
[6] E. Cohen and S. Shenker, "Replication strategies in unstructured peer-to-peer networks," Proceedings of SIGCOMM, 2002.
[7] B. Krishnamurthy and J. Wang, "Topology modeling via cluster graphs," Proceedings of SIGCOMM Internet Measurement Workshop, 2001.
[8] Y. Liu, Z. Zhuang, L. Xiao, and L. M. Ni, "AOTO: Adaptive Overlay Topology Optimization in Unstructured P2P Systems," Proceedings of GLOBECOM, San Francisco, USA, 2003.
[9] Y. Liu, X. Liu, L. Xiao, L. M. Ni, and X. Zhang, "Location-Aware Topology Matching in Unstructured P2P Systems," Proceedings of INFOCOM, 2004.
[10] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, "Search and replication in unstructured peer-to-peer networks," Proceedings of the 16th ACM International Conference on Supercomputing, 2002.
[11] E. P. Markatos, "Tracing a large-scale peer to peer system: an hour in the life of gnutella," Proceedings of the 2nd IEEE/ACM International Symp. on Cluster Computing and the Grid, 2002.
[12] D. A. Menasce and L. Kanchanapalli, "Probabilistic Scalable P2P Resource Location Services," ACM SIGMETRICS Performance Evaluation Review, vol. 30, pp. 48-58, 2002.
[13] V. N. Padmanabhan and L. Subramanian, "An investigation of geographic mapping techniques for Internet hosts," Proceedings of ACM SIGCOMM, 2001.
[14] S. Patro and Y. C. Hu, "Transparent Query Caching in Peer-to-Peer Overlay Networks," Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS), 2003.
[15] M. Ripeanu, A. Iamnitchi, and I. Foster, "Mapping the Gnutella Network," IEEE Internet Computing, 2002.
[16] Ritter, Why Gnutella can't scale. No, really, http://www.tch.org/gnutella.html
[17] S. Saroiu, P. Gummadi, and S. Gribble, "A Measurement Study of Peer-to-Peer File Sharing Systems," Proceedings of Multimedia Computing and Networking (MMCN), 2002.
[18] S. Saroiu, K. P. Gummadi, R. J. Dunn, S. D. Gribble, and H. M. Levy, "An Analysis of Internet Content Delivery Systems," Proceedings of the 5th Symposium on Operating Systems Design and Implementation, 2002.
[19] S. Sen and J. Wang, "Analyzing peer-to-peer traffic across large networks," Proceedings of ACM SIGCOMM Internet Measurement Workshop, 2002.
[20] K. Sripanidkulchai, The popularity of Gnutella queries and its implications on scalability, http://www2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
[21] Z. Xu, C. Tang, and Z. Zhang, "Building topology-aware overlays using global soft-state," Proceedings of ICDCS, 2003.
[22] B. Yang and H. Garcia-Molina, "Efficient search in peer-to-peer networks," Proceedings of ICDCS, 2002.
[23] Z. Zhuang, Y. Liu, L. Xiao, and L. M. Ni, "Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks," Proceedings of International Conference on Parallel Processing, 2003.

Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS’04) 1063-6927/04 $20.00 © 2004 IEEE