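TWNOG WORKSHOP 2010/7/2, Taipei. Common Causes of Network Operations Problems and Troubleshooting Techniques: The Relationship Between Network and TCP Performance. 智匯亞洲有限公司, 許至凱, CCIE/JNCIE, kaeatforum [at] gmail.com
Objectives. Audience: network equipment operators and operations staff. Understand which network environment factors affect TCP performance, so that network operations can be linked to network application performance and used as a reference for improving the network environment. Topics: how TCP works; which network events affect TCP performance; countermeasures.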
Agenda: TCP Briefing; TCP Performance Factors; Network Event Impact; Improvement – Network approach; Improvement – Appliance approach; Reference
TCP Briefing: TCP/IP stack in a computer system (Linux). Application; Socket Layer (net/socket.c); Inet Layer (net/ipv4/af_inet.c); IP Layer (various ip files in net/ipv4); TCP Layer (net/ipv4/tcp.c); UDP Layer (net/ipv4/udp.c); Ethernet Device Driver; Ethernet Card; Other Drivers; Parallel/Serial/Other Interface Drivers
TCP Briefing TCP/IP stack in a computer system Windows TCP/IP Stack (Tcpip.sys) Windows Sockets Applications Windows Sockets AFD WSK Clients WSK NetBT and other TDI clients TDI TDX TCP UDP RAW IPv6 IPv4 802.3 PPP 802.11 Loopback IPv4 Tunnel NDIS User Kernel
TCP Briefing TCP/IP position in computer and network environment
TCP Briefing TCP header format (RFC793)
TCP Briefing TCP header format (updated by RFC3168)
TCP Performance Factors: Monitoring tools; Flow control; Congestion control
TCP Performance Factors: Measurement tools. Monitoring tools: tcpdump (on the Windows platform: Wireshark), tcpstat. Benchmarking tools: ttcp, Netperf, NetPIPE, DBS (Distributed Benchmark System).
TCP Performance Factors: Flow control. Sliding window (window size = 6 in the example). [Figure: Steps 1 to 4 along a time axis, each showing segments 0 to 13. Legend: ACK received; awaiting ACK; sendable range; non-sendable range.]
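The four segment states in the sliding-window figure can be sketched in a few lines of Python. This is only an illustration, not any TCP stack's implementation; the function name `classify_segments` and the parameters `base` (lowest unacknowledged segment) and `next_seq` (next segment to transmit) are names assumed here.

```python
def classify_segments(total, window, base, next_seq):
    """Label each segment the way the figure's legend does.

    base     -- lowest segment number not yet acknowledged (assumed name)
    next_seq -- next segment number the sender will transmit (assumed name)
    """
    states = []
    for seq in range(total):
        if seq < base:
            states.append("acked")         # ACK already received
        elif seq < next_seq:
            states.append("awaiting_ack")  # sent, ACK still outstanding
        elif seq < base + window:
            states.append("sendable")      # inside the window, not yet sent
        else:
            states.append("blocked")       # outside the window
    return states


# Segments 0-13, window size 6, segments 0-1 acknowledged, 2-4 in flight.
for seq, state in enumerate(classify_segments(14, 6, base=2, next_seq=5)):
    print(seq, state)
```

Each arriving ACK advances `base`, which slides the window to the right and turns previously blocked segments into sendable ones.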
TCP Performance Factors: Flow control. Window size adjustment: the "receiver window size" field in the TCP header.
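TCP Performance Factors: Congestion control. Flow control lets the receiver limit incoming traffic to avoid buffer overflow, using AdvertisedWindow to adjust the sender's window size. It cannot reflect the state of the network path, and it cannot prevent buffer-overflow-like conditions inside the network being traversed. To detect possible network congestion, TCP uses congestion control, which adjusts sending through CongestionWindow (cwnd). Congestion control consists of four main algorithms (RFC5681): slow start, congestion avoidance, fast retransmit, fast recovery.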
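How the two windows combine can be shown with a one-line calculation: a sender may have in flight no more than the smaller of the receiver's advertised window and its own congestion window. The function name `effective_window` is an assumption for illustration, and units are whole segments for simplicity.

```python
def effective_window(cwnd, advertised_window):
    """Upper bound on unacknowledged data the sender may have in flight.

    cwnd is the sender-side congestion estimate; advertised_window is what
    the receiver announced in the TCP header. Both are in segments here.
    """
    return min(cwnd, advertised_window)


print(effective_window(cwnd=10, advertised_window=4))   # receiver-limited
print(effective_window(cwnd=3, advertised_window=64))   # network-limited
```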
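TCP Performance Factors: Slow start. When a TCP connection is first established, it uses a small window size and increases it gradually as ACKs arrive. The initial cwnd is 1; the aim is to probe the available network bandwidth. cwnd increases by 1 for every ACK received, so after each round-trip time (RTT) cwnd is twice its value in the previous RTT: exponential growth. To keep cwnd from growing too fast, once cwnd exceeds the "slow start threshold" (ssthresh), it increases by only 1 per RTT: linear growth.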
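The growth pattern described for slow start can be traced per RTT with a small simulation. This is a simplified sketch in whole segments with no loss; the function name `cwnd_per_rtt` and the choice to cap the doubling at `ssthresh` are assumptions made for illustration.

```python
def cwnd_per_rtt(ssthresh, rtts, initial_cwnd=1):
    """Return cwnd (in segments) at the start and after each RTT, no loss."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rtts):
        if cwnd < ssthresh:
            # Slow start: one increment per ACK doubles cwnd every RTT.
            # Capping at ssthresh is a simplification of this sketch.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one additional segment per RTT.
            cwnd += 1
        history.append(cwnd)
    return history


print(cwnd_per_rtt(ssthresh=8, rtts=6))
```

With `ssthresh=8` the trace doubles up to 8 and then grows by one segment per RTT, which is the exponential-then-linear shape the slide describes.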
TCP Performance Factors: Congestion avoidance. In this phase cwnd > ssthresh, and cwnd increases by 1 for each RTT. When packet loss occurs: ssthresh -> cwnd/2; cwnd -> 1; the packet is retransmitted. Once packet loss occurs, TCP performance is severely affected.
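The reaction to packet loss can be written out as a small function. It follows the slide's rule (ssthresh becomes half of cwnd, cwnd restarts at 1); the lower bound of 2 segments on ssthresh follows RFC 5681 and is not on the slide, and the name `on_packet_loss` is an assumption.

```python
def on_packet_loss(cwnd):
    """Return (new_ssthresh, new_cwnd) after a loss, in segments."""
    new_ssthresh = max(cwnd // 2, 2)  # floor of 2 segments per RFC 5681
    new_cwnd = 1                      # restart from slow start
    return new_ssthresh, new_cwnd


ssthresh, cwnd = on_packet_loss(cwnd=20)
print(ssthresh, cwnd)
```

A connection that had grown to 20 segments falls back to 1 and needs several RTTs to recover, which is why a single loss is so costly.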
TCP Performance Factors Slow start & Congestion avoidance characteristic
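TCP Performance Factors: Fast retransmit (Tahoe). Still applies slow start + congestion avoidance. The sender retransmits the packet as soon as it receives 3 duplicate ACKs, which avoids the severe TCP performance drop caused by having to adjust ssthresh/cwnd after a sender timeout. Fast recovery (Reno). First applies fast retransmit. After receiving duplicate ACKs it enters congestion avoidance and then performs fast recovery: ssthresh -> cwnd/2; retransmit the packet; cwnd -> ssthresh + 3. NewReno, SACK, Vegas, and others all improve performance on the TCP endpoint side.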
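The duplicate-ACK trigger and the Reno window adjustment can be sketched as a tiny state machine. This is a simplified model in whole segments, not a full Reno implementation; the class name `RenoSender` and its attributes are assumptions, and the floor of 2 on ssthresh follows RFC 5681.

```python
class RenoSender:
    """Toy sender that reacts to ACK numbers only."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.last_ack = None
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmitted = []  # ACK numbers that triggered a retransmit

    def on_ack(self, ack_no):
        if ack_no == self.last_ack:
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                # Fast retransmit: resend without waiting for a timeout.
                self.ssthresh = max(self.cwnd // 2, 2)
                self.retransmitted.append(ack_no)
                # Fast recovery: inflate by the 3 segments that left the network.
                self.cwnd = self.ssthresh + 3
                self.in_recovery = True
        else:
            if self.in_recovery:
                # New data acknowledged: deflate and continue in congestion avoidance.
                self.cwnd = self.ssthresh
                self.in_recovery = False
            self.last_ack = ack_no
            self.dup_acks = 0


sender = RenoSender(cwnd=20, ssthresh=64)
for ack in [100, 200, 200, 200, 200]:
    sender.on_ack(ack)
print(sender.retransmitted, sender.ssthresh, sender.cwnd)
```

Compared with the timeout case, cwnd stays near half its previous value instead of dropping to 1.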
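Network Event Impact: Packet loss. Under TCP congestion control, packet loss triggers TCP retransmission. No matter how well TCP congestion control performs, packet loss always degrades TCP performance.
Network Event Impact: Packet out-of-order. When packets arrive out of order, TCP can reassemble them, but if TCP fast recovery kicks in it may waste resources. Reno starts retransmitting after receiving duplicate ACKs and stops only once it receives a Partial ACK. If a packet is merely late rather than lost, the sender will inevitably retransmit packets that did not need retransmission, wasting resources. NewReno, to improve on Reno's efficiency, stops retransmitting lost packets only after receiving the Final ACK. NewReno may therefore resend more duplicate packets than Reno.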
Improvement – Network approach: Reduce packet loss. Packet loss has a large impact on TCP performance; every source of packet loss in the network should be eliminated as far as possible. Layer 1, layer 2 errors: unqualified physical media; CRC, P3 error, etc. Layer 3: router/switch hardware or software errors; congestion. Reduce congestion impact by deploying QoS: avoid packet drops for highly sensitive TCP applications.
Improvement – Network approach: Packet forwarding without QoS. Tail-drop: when a network device's hardware queue fills up because the link is congested and can hold no more packets awaiting transmission, the device simply drops them. The hardware queue cannot judge packet priority, so once the queue is full, packets are dropped indiscriminately. This situation is called tail-drop, and it should be avoided as far as possible.
Improvement – Network approach: Packet forwarding with QoS. Packets of different priority are first stored in separate logical queues and then placed into the hardware queue. Before the hardware queue fills up, some packets held in the low-priority queues are proactively dropped to prevent tail-drop. RED – Random Early Detection. WRED – Weighted Random Early Detection.
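The early-drop idea behind RED and WRED can be illustrated with the usual linear drop-probability curve: no drops below a minimum average queue depth, certain drop at or above a maximum, and a linear ramp in between. The function names, the threshold values, and the `WRED_PROFILES` table below are hypothetical examples, not settings from any vendor.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability for a given average queue depth (in packets)."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical per-priority profiles: (min_th, max_th, max_p).
WRED_PROFILES = {
    "low": (10, 30, 0.2),    # starts dropping early and more aggressively
    "high": (25, 40, 0.05),  # protected until the queue is much deeper
}


def wred_drop_probability(avg_queue, priority):
    """WRED: the same curve as RED, with thresholds chosen by priority."""
    min_th, max_th, max_p = WRED_PROFILES[priority]
    return red_drop_probability(avg_queue, min_th, max_th, max_p)


for priority in ("low", "high"):
    print(priority, wred_drop_probability(20, priority))
```

At an average depth of 20 packets, low-priority traffic is already being thinned while high-priority traffic is untouched, so the queue rarely reaches the point where tail-drop would hit every flow at once.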
Improvement – Network approach: Reduce out-of-order packets. Avoid sending the same TCP session over different paths. Per-packet load-sharing. Load-sharing by destination IP only. Per-flow load-sharing: load-sharing by IP packet hash value; the hash index includes source IP, destination IP, protocol, source port, and destination port. Packets with the same hash value take the same next-hop interface, which avoids packets arriving out of order. TCP implementations of Selective Acknowledgements: RFC2018, RFC2883.
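Per-flow load-sharing can be sketched by hashing the five fields listed above and using the result to pick a next hop. Real routers use their own hardware hash functions; SHA-256 here is only a stand-in, and the function name `pick_next_hop` and the interface names are assumptions for illustration.

```python
import hashlib


def pick_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    """Map a flow's 5-tuple to one of the available next-hop interfaces."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


links = ["ge-0/0/0", "ge-0/0/1"]  # hypothetical interface names
flow = ("192.0.2.10", "198.51.100.20", "tcp", 51514, 80)

# Every packet of the same flow yields the same choice, so it stays on one path.
print(pick_next_hop(*flow, links))
print(pick_next_hop(*flow, links))
```

Because the choice depends only on the flow's fields, all packets of one TCP session follow the same path, while different sessions can still spread across links.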
Improvement – Appliance approach. The operating system has to handle routine TCP session processing, which depends on CPU and memory. A large number of TCP sessions consumes CPU cycles and memory, leaving less of both for the actual service processes. Goal: reduce the system resources consumed by TCP session handling. Approaches: TCP Offload; TCP Optimization.
Improvement – Appliance approach: TCP Offload. Move TCP handling out of the kernel. Use dedicated hardware to handle TCP. Save system resources for the actual service processes. TOE (TCP Offload Engine) NIC: handles TCP/IP on the NIC.
Improvement – Appliance approach TCP Offload NIC w/o TOE and NIC w/ TOE comparison
Improvement – Appliance approach: TCP Offload. TOE is widely deployed in iSCSI environments. [Figure: iSCSI]
Improvement – Appliance approach: TCP Optimization. Move the bulk of TCP session handling off the server system. Every TCP session requires a 3-way handshake for connection establishment and a 4-way handshake for connection termination. Reducing the number of TCP connections reduces connection "overhead". Deploy dedicated hardware in front of the servers.
Improvement – Appliance approach: TCP Optimization. Regular TCP connection. [Figure: message sequence between Client and Server; messages shown: SYN, SYN+ACK, ACK, GET, Data, FIN.]
Improvement – Appliance approach: TCP Optimization. Reduce the number of server TCP connections: only ONE 3-way handshake is necessary, in the early stage. [Figure: message sequence between Client, TCP Proxy, and Server; messages shown: SYN, SYN+ACK, ACK, GET, Data, FIN.]
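The saving on the server side can be shown with a toy model that only counts handshakes: the proxy completes a handshake with every client but reuses a single established connection toward the server. The class name `PoolingProxy` and its fields are assumptions; no real sockets or appliance behavior are modeled.

```python
class PoolingProxy:
    """Toy TCP proxy that reuses one server-side connection."""

    def __init__(self):
        self.client_handshakes = 0  # 3-way handshakes with clients
        self.server_handshakes = 0  # 3-way handshakes with the server
        self._idle_server_conns = []

    def handle_client_request(self, request):
        # Each client still completes its own handshake with the proxy.
        self.client_handshakes += 1
        if self._idle_server_conns:
            conn_id = self._idle_server_conns.pop()   # reuse, no new handshake
        else:
            self.server_handshakes += 1               # first use: open once
            conn_id = self.server_handshakes
        response = f"response to {request} via server connection {conn_id}"
        self._idle_server_conns.append(conn_id)       # keep it open for reuse
        return response


proxy = PoolingProxy()
for req in ["GET /a", "GET /b", "GET /c"]:
    proxy.handle_client_request(req)
print(proxy.client_handshakes, proxy.server_handshakes)
```

Three client requests cost the server one connection setup instead of three, and the per-connection teardown is avoided in the same way.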
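Improvement – Appliance approach: TCP Optimization. In practice it is rarely deployed solely to improve TCP performance; it is usually combined with other functions, such as L4~L7 load balancing. Because the client's end-to-end TCP connection terminates on the TCP proxy, further functions can be added: SSL acceleration; reverse cache.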
Reference. Books: High-Speed Networks and Internets – Performance and Quality of Service, 2nd Ed., by William Stallings (Prentice Hall); High Performance TCP/IP Networking – Concepts, Issues and Solutions, by Mahbub Hassan and Raj Jain (Pearson Prentice Hall); TCP/IP Illustrated, Volume 1, by W. Richard Stevens (Addison Wesley). Articles: TCP Performance, by Geoff Huston, The Internet Protocol Journal, Volume 3, No. 2. A very good "sliding window" description: http://www.it.uu.se/edu/course/homepage/datakom/civinght04/schema/sliding_window.pps
Q & A

Network and TCP performance relationship workshop
