Anıl ALİBEYOĞLU
JNCIE-SP #2681
JNCIE-ENT #710
CCIE-R&S #24974
1. Quick Recap Of "BGP Persistent Route Oscillation Condition (RFC3345)"
2. Before We Start
First, it might be surprising that this scenario can still be hit 16 years after its discovery :)
So, I wanted to recap RFC3345 for the people who troubleshoot and/or design BGP in service/content provider networks.
I also thought that it would be more beneficial for readers to have both Turkish and English versions of the same document.
/* Yes :) This is the English version */
3. Content
• Summary of what RFC3345 tells us.
• Theory and step-by-step explanation of the scenario.
• Possible solutions to mitigate this scenario.
• Practical investigation of the scenario on Juniper Networks routers. /* I reproduced the scenario in my lab with MX960 routers + Junos OS 15.1R1 code, but the outputs are valid for any Junos OS code */
4. What Is RFC3345 About, Basically?
• RFC3345 explains the control plane oscillation in BGP that results from improper design in RR and confederation environments.
• It's important to mention that the loop is in the control plane, which means convergence never completes (TTL doesn't help; it's not a forwarding plane loop).
• Many service/content providers hit this issue, and it is a clear "if-then" scenario.
5. Topology
/* I recommend printing out the topology and keeping an eye on it while reading the steps explained in the following slides */
6. Setup Introduction
• There is nothing tricky about steps (1) and (2) in the topology: basic NLRI distribution between EBGP peers. We focus on AS1, so the next slide starts from step (3).
• There are 2 RR clusters in AS1, with cluster-ids "12" and "3".
• C1 and C2 have IBGP sessions with RR12; C3 has an IBGP session with RR3; the RRs also have an IBGP session with each other.
• AS1 is a neighbor of 2 autonomous systems: 1 EBGP peering with AS3 and 2 EBGP peerings with AS2.
• MED and IGP metric values are shown in the topology.
7. Step By Step Analysis (1)
3) C3, C1 and C2 share the 20/8 NLRI with their respective RRs.
4) At first, RR12 has 2 sources for 20/8. It selects C1 as the exit point over C2 (AS Path is the same, Local Preference is the same, so the remaining tie-breaker is the IGP metric).
5) RR12 shares this information with RR3. So, what is the status on RR3 in this case?
8. Step By Step Analysis (2)
6) RR3 already receives the NLRI from C3, which is its only client. It also receives the "C1 is the exit point" information from RR12. Now RR3 has to choose between C1 and C3.
7) Of course, RR3 selects C3 since its MED is lower (both paths come from the same AS and the AS Path is also the same, so the remaining tie-breaker is MED). As expected, RR3 shares this decision with RR12. :) The party has not started yet; so far, so good.
9. Step By Step Analysis (3)
8) Now RR12 has 3 exit points to choose from: C1, C2, and C3 (learned from RR3). Which one does RR12 select?
9) RR12 has already selected C1 as the exit point; let's keep that in mind. The reason for that selection was the lower IGP metric. But now RR12 also has C3, which came from RR3. Since C1 and C3 are exit point candidates from the same AS, MED can be compared, which means C3 is selected.
10. Step By Step Analysis (4)
10) But after the selection of C3, the conditions change for RR12: C2 becomes the candidate exit point because it comes from a different AS (so MED is not compared) and has a lower IGP metric (15 < 30). For that reason, RR12 selects C2 and shares this information with RR3.
11) Now we are at RR3 again. RR3 has C3 and C2 (from RR12). Of course, RR3 selects C2 in this comparison (AS Path is the same, LP is the same, so the remaining tie-breaker is the IGP metric).
11. Step By Step Analysis (5)
12) As you might guess, RR3 stops announcing C3 since it selected C2 in the previous step ("C3" is not best anymore!).
13) Finally, RR12 has only C1 and C2 to choose from. C1 is selected, and so on: the whole scenario restarts from the beginning.
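The loop in steps 3) to 13) can be reproduced with a small Python sketch. This is only a model under stated assumptions, not router behavior: the names `Path`, `best_exit` and `simulate` are hypothetical, the MED values are assumed (the slides only say that C3's MED is lower than C1's), and of the IGP metrics only 15 and 30 come from step 10. Local Preference and AS Path are treated as equal, as in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    exit_router: str  # border router that learned 20/8 over EBGP
    neighbor_as: int  # AS the route was learned from
    med: int


# C1 and C3 peer with AS2, C2 peers with AS3. MED values are assumptions.
PATHS = {
    "C1": Path("C1", 2, 1),
    "C2": Path("C2", 3, 10),
    "C3": Path("C3", 2, 0),
}

# IGP metric from each RR to each exit point. 15 and 30 are from step 10;
# the other values are assumptions consistent with steps 4 and 11.
IGP = {
    "RR12": {"C1": 10, "C2": 15, "C3": 30},
    "RR3": {"C1": 15, "C2": 20, "C3": 25},
}
CLIENTS = {"RR12": {"C1", "C2"}, "RR3": {"C3"}}


def best_exit(rr, candidates):
    """Pick the best exit point; Local Preference and AS Path are equal here."""
    paths = [PATHS[name] for name in candidates]
    # MED is compared only between paths learned from the same neighbor AS.
    survivors = [
        p for p in paths
        if not any(q.neighbor_as == p.neighbor_as and q.med < p.med for q in paths)
    ]
    # Remaining tie-breaker: lowest IGP metric to the exit point.
    return min(survivors, key=lambda p: IGP[rr][p.exit_router]).exit_router


def simulate(rounds=6):
    """Let RR12 and RR3 decide in turn; each RR sends only its best path."""
    advertised = {"RR12": None, "RR3": None}  # what each RR sends to the other
    history = []
    for _ in range(rounds):
        chosen = {}
        for rr, peer in (("RR12", "RR3"), ("RR3", "RR12")):
            candidates = set(CLIENTS[rr])
            if advertised[peer] is not None:
                candidates.add(advertised[peer])
            chosen[rr] = best_exit(rr, candidates)
            # A best path learned from the other RR is not reflected back to it.
            advertised[rr] = chosen[rr] if chosen[rr] in CLIENTS[rr] else None
        history.append((chosen["RR12"], chosen["RR3"]))
    return history


if __name__ == "__main__":
    for i, (rr12, rr3) in enumerate(simulate(), start=1):
        print(f"round {i}: RR12 -> {rr12}, RR3 -> {rr3}")
```

With these assumed values the choices never settle: RR12 alternates between C1 and C2 while RR3 alternates between C3 and C2, which is the ping-pong described in the steps above.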
12. Possible Solutions (1)
Depending on the complexity of the topology, different options can be considered:
1) IGP metrics can be used. If the metric between routers in different RR clusters is higher than the metric between routers in the same RR cluster, the control-plane oscillation stops. (E.g. RR3 doesn't select C2 if the metric to it is > 25.)
2) MED can always be compared (to cover the "Update messages from different AS" scenarios). E.g. in this topology, C3 is selected without oscillation.
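Both options can be checked against a toy model of the two RRs. The following Python sketch is an illustration under assumptions, not a statement about any router implementation: `best_exit` and `converges` are hypothetical names, the MED values are assumed, and the IGP metrics other than 15 and 30 are assumed (25 for RR3 to C3 is my reading of the example in solution 1).

```python
# Each exit point: (neighbor AS, MED). MED values are assumptions.
PATHS = {"C1": (2, 1), "C2": (3, 10), "C3": (2, 0)}
CLIENTS = {"RR12": {"C1", "C2"}, "RR3": {"C3"}}


def best_exit(igp, candidates, always_compare_med):
    """Lowest MED among comparable paths first, then lowest IGP metric."""
    def comparable(a, b):
        # Without always-compare, MED only counts within the same neighbor AS.
        return always_compare_med or PATHS[a][0] == PATHS[b][0]

    survivors = [
        c for c in candidates
        if not any(comparable(c, o) and PATHS[o][1] < PATHS[c][1] for o in candidates)
    ]
    return min(survivors, key=lambda c: igp[c])


def converges(rr3_to_c2=20, always_compare_med=False, rounds=10):
    """Return the stable (RR12, RR3) choice, or None if the choices keep changing."""
    igp = {
        "RR12": {"C1": 10, "C2": 15, "C3": 30},        # 15 and 30 are from step 10
        "RR3": {"C1": 15, "C2": rr3_to_c2, "C3": 25},  # 25 assumed from solution 1
    }
    advertised = {"RR12": None, "RR3": None}
    previous = None
    for _ in range(rounds):
        chosen = {}
        for rr, peer in (("RR12", "RR3"), ("RR3", "RR12")):
            candidates = set(CLIENTS[rr])
            if advertised[peer] is not None:
                candidates.add(advertised[peer])
            chosen[rr] = best_exit(igp[rr], candidates, always_compare_med)
            # Only a client-learned best path is sent to the other RR.
            advertised[rr] = chosen[rr] if chosen[rr] in CLIENTS[rr] else None
        current = (chosen["RR12"], chosen["RR3"])
        if current == previous:  # same choices twice in a row: a fixed point
            return current
        previous = current
    return None


if __name__ == "__main__":
    print("baseline:", converges())
    print("RR3 to C2 metric 30:", converges(rr3_to_c2=30))
    print("always compare MED:", converges(always_compare_med=True))
```

In this model the baseline never reaches a fixed point, raising the RR3 to C2 metric above 25 lets RR12 stay on C2 and RR3 on C3, and comparing MED across ASes makes both RRs settle on C3.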
13. Possible Solutions (2)
3) RRs can send "all paths" instead of only the "best" path. Logically, if the RRs have the same view of all exit points, the oscillation stops at some point because they become synchronized (which means no more Update message ping-pong on the control plane).
4) Last but not least, using local-preference eliminates the possibility of this issue entirely; very safe. I'm sure (let's hope :)) nobody will miss it in any network. Traffic steering 101!
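Why these two options help can be sketched in Python: when every RR sees every path, its choice depends only on fixed inputs and no longer on what the other RR currently advertises. This is an illustration under assumptions: `best_exit` and `network_choice` are hypothetical names, the MED values and the baseline Local Preference of 100 are assumed (only "500 is higher than the baseline" matters), and of the IGP metrics only 15 and 30 come from the slides. AS Path is treated as equal and therefore left out.

```python
# name: (neighbor AS, MED). MED values are assumptions.
PATHS = {"C1": (2, 1), "C2": (3, 10), "C3": (2, 0)}
BASE_LOCAL_PREF = 100  # assumed baseline; only the ordering matters

# IGP metric from each RR to each exit point; 15 and 30 are from step 10,
# the other values are assumptions.
IGP = {
    "RR12": {"C1": 10, "C2": 15, "C3": 30},
    "RR3": {"C1": 15, "C2": 20, "C3": 25},
}


def best_exit(rr, local_pref):
    """Best exit for one RR when it sees every path (the "all paths" case)."""
    # 1. Highest Local Preference wins before anything else is compared.
    top = max(local_pref.values())
    candidates = [name for name, lp in local_pref.items() if lp == top]
    # 2. Among paths from the same neighbor AS, drop those with a higher MED.
    survivors = [
        c for c in candidates
        if not any(
            PATHS[o][0] == PATHS[c][0] and PATHS[o][1] < PATHS[c][1]
            for o in candidates
        )
    ]
    # 3. Lowest IGP metric to the exit point.
    return min(survivors, key=lambda c: IGP[rr][c])


def network_choice(c3_local_pref=BASE_LOCAL_PREF):
    """Choice of every RR, optionally with a raised Local Preference on C3."""
    local_pref = {
        "C1": BASE_LOCAL_PREF,
        "C2": BASE_LOCAL_PREF,
        "C3": c3_local_pref,
    }
    return {rr: best_exit(rr, local_pref) for rr in IGP}


if __name__ == "__main__":
    print("all paths visible:", network_choice())
    print("C3 with Local Preference 500:", network_choice(c3_local_pref=500))
```

With the assumed metrics, full visibility leads both RRs to C2, and a Local Preference of 500 on C3 leads both to C3; in either case there is nothing left to ping-pong.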
14. Practical Analysis (1)
• I won't cover all in-depth BGP troubleshooting vectors/methods, etc., because the issue can easily be seen with a few basic outputs if you already know the theory well.
• The topology is the one shown on slide 5. Only the 20/8 subnet is announced by RTR4, so we focus on 20/8.
• Physical/loopback interfaces, subnets, and IGP costs are shown in the topology. All information can be found there.
• You can find the screenshots in the following slides.
15. Practical Analysis (2)
• Possible symptoms are high CPU usage, flapping routes, RTT changes on testers, continuous path changes, a huge amount of bidirectional Update message propagation, and an unstable network.
• Let's have a look at RR12 before the oscillation. At the beginning, the cost on ge-0/0/0 is 105 (solution #1 on slide #12). So far, so good.
16. Practical Analysis (3)
• As you see, there is no oscillation; Update messages are stable (the output is refreshed every 3 seconds).
17. Practical Analysis (4)
• Let's trigger the issue by decreasing the IGP cost from 105 to 15 on RR12 (towards C2).
18. Practical Analysis (5)
• Let's check the network for just 1 NLRI loop :) The exit point is moving between 34.0.0.3 and 48.0.0.8 continuously. Also, ~210 Update messages are sent/received every 3 seconds!
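If you poll the active next hop for 20/8 at a fixed interval, the flapping can be quantified with a few lines of Python. This is only a sketch: `count_exit_changes` is a hypothetical helper and the sample list is made up for illustration; only the two addresses and the ~210 Updates per 3 seconds come from the slide.

```python
def count_exit_changes(samples):
    """Count how often the selected exit point differs between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


# Made-up samples of the active next hop for 20/8, one per poll.
samples = ["34.0.0.3", "48.0.0.8", "34.0.0.3", "48.0.0.8", "34.0.0.3"]

changes = count_exit_changes(samples)
updates_per_second = 210 / 3  # ~210 Update messages per 3-second refresh

if __name__ == "__main__":
    print(f"exit point changed {changes} times in {len(samples)} samples")
    print(f"about {updates_per_second:.0f} Update messages per second")
```

A stable network would show zero changes for the same prefix across samples.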
19. Practical Analysis (6)
• E.g. you can stop it by changing the Local Preference attribute at the desired border point. I selected C3 and changed it to 500 before sending 20/8 to RR3.
• It is received from RTR2 and by default LP is 0 (67.0.0.7 is RTR2). So, RR3 receives this NLRI with LP 500:
20. Practical Analysis (7)
• After stopping the oscillation, everything is stable as expected. Stable Update messages, stable routing table:
21. Final Words
• I haven't written anything new here; the wheel has already been invented. :) For more details and different scenarios, please check the RFC itself.
• My purpose is to prepare a brief summary for colleagues who are afraid of reading RFCs.
• If this topic is not clearly understood, troubleshooting will be very painful the first time you hit it (still happening :)), until you figure out the issue.
• I hope this will be useful and helpful for colleagues on both the operational and design sides.
