Users complain about slow page loads, API response delays, and soaring bandwidth costs, yet few realize the bottleneck may be gzip, a compression algorithm that has served faithfully for decades. Once sufficient, gzip now struggles to keep pace with today's dynamic content and high-concurrency demands. That's why we've integrated the modern compression algorithm zstd into OpenResty Edge: it delivers higher compression ratios and faster transmission at lower CPU overhead. Curious how to adopt next-generation compression in OpenResty Edge? This article has the answers you're looking for: https://lnkd.in/gS8zrGKj
How zstd improves OpenResty Edge's performance
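To make the gzip-vs-zstd trade-off concrete, here is a minimal Go sketch (not OpenResty Edge's implementation; the payload and helper names are made up for illustration) that compresses the same data with the standard library's gzip and with the klauspost/compress zstd package, printing output size and elapsed time for each:

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"time"

	"github.com/klauspost/compress/zstd"
)

// compressGzip returns the gzip-compressed size of data.
func compressGzip(data []byte) int {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write(data)
	w.Close()
	return buf.Len()
}

// compressZstd returns the zstd-compressed size of data.
func compressZstd(data []byte) int {
	var buf bytes.Buffer
	w, _ := zstd.NewWriter(&buf) // error ignored for brevity
	w.Write(data)
	w.Close()
	return buf.Len()
}

func main() {
	// A repetitive payload stands in for a typical JSON/HTML response.
	data := bytes.Repeat([]byte(`{"user":"alice","status":"ok"}`), 4096)

	start := time.Now()
	gz := compressGzip(data)
	fmt.Printf("gzip: %d -> %d bytes in %v\n", len(data), gz, time.Since(start))

	start = time.Now()
	zs := compressZstd(data)
	fmt.Printf("zstd: %d -> %d bytes in %v\n", len(data), zs, time.Since(start))
}

On repetitive web-style payloads like this, zstd typically produces comparable or smaller output in less CPU time, which is the effect the post describes.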
-
How Are OpenTelemetry and Fluent Bit Related? Learn about Fluent Bit's relationship to OpenTelemetry and its evolution in capturing not only logs but also local metrics such as CPU, memory, and storage use. By Phil Wilkins, thanks to Chronosphere.
-
𝗪𝗲 𝗼𝗻𝗰𝗲 𝘁𝗵𝗼𝘂𝗴𝗵𝘁 𝗰𝗼𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝗼𝗻 𝘄𝗼𝘂𝗹𝗱 𝗺𝗮𝗸𝗲 𝗲𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴 𝗳𝗮𝘀𝘁𝗲𝗿. Spoiler: it didn't.
We were saving bandwidth like champs. But then latency crept up, CPU usage hit the roof, and users started noticing slower responses.
That's when it hit me: compression is a trade-off, not a magic fix. You're trading CPU time for bandwidth. Sometimes it's worth it. Sometimes it's a disaster.
Here's what I've learned from that mess:
• If your system is CPU-bound, compression will hurt more than it helps.
• Tiny payloads? Don't bother. The compression headers alone can make them bigger.
• Stop using gzip for everything. LZ4 is better for real-time traffic; Brotli wins for static assets.
• Adaptive compression is the real deal: let your system decide when to compress based on data size and latency budget (see the sketch below).
And for the love of clean architecture, measure everything. Don't just benchmark compression speed; measure end-to-end latency, including decompression on the client side. Saving 10 KB sounds great until it adds 50 ms to your API response.
The takeaway? Compression vs. latency isn't about picking sides. It's about context. Figure out your constraints, pick your battles, and maybe skip that "compress everything" toggle next time.
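As a rough illustration of that adaptive idea, here is a minimal Go sketch; the 1 KB threshold, the gzip.BestSpeed level, and the helper name maybeCompress are assumptions for the example, not recommendations:

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// minCompressSize is an assumed threshold: payloads smaller than this are
// sent as-is, since compression overhead can exceed the savings.
const minCompressSize = 1024

// maybeCompress gzips the payload only when it is large enough to plausibly
// repay the CPU cost. It returns the bytes to send and whether they are
// compressed, so the caller can set Content-Encoding accordingly.
func maybeCompress(payload []byte) ([]byte, bool) {
	if len(payload) < minCompressSize {
		return payload, false
	}
	var buf bytes.Buffer
	w, _ := gzip.NewWriterLevel(&buf, gzip.BestSpeed) // favor latency over ratio
	w.Write(payload)
	w.Close()
	// If compression did not actually shrink the payload, skip it.
	if buf.Len() >= len(payload) {
		return payload, false
	}
	return buf.Bytes(), true
}

func main() {
	small := []byte(`{"ok":true}`)
	large := bytes.Repeat([]byte("the same log line over and over "), 200)

	for _, p := range [][]byte{small, large} {
		out, compressed := maybeCompress(p)
		fmt.Printf("in=%d bytes, out=%d bytes, compressed=%v\n", len(p), len(out), compressed)
	}
}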
-
Bloom Filter Go optimization journey: 3s -> 440ms -> 675ms(!!) -> 66µs
A Go optimization story. I just published a deep dive on the (sometimes painful) process of optimizing my Go Bloom filter. It's a real-world pprof war story, with a 6,595x speedup at the end.
It started with a benchmark that took 3 seconds per op. Spoiler: the benchmark was lying. Profiling showed 79% of CPU time was in fmt.Sprintf. After fixing the test, the real baseline was 440ms. pprof showed the new bottleneck: runtime.mallocgc and runtime.mapassign_fast64. I was allocating a map on every single Add/Contains call.
* My first "fix" (map pooling with for...delete) was a disaster. It was 1.5x slower (~675ms). Lesson learned: zero allocations ≠ fast.
* The real win came from a data structure change: replacing the map[uint64] with a direct-access array (a minimal sketch of the idea follows below). This gave O(1) indexing and zero allocations, bringing the time down to 66 microseconds.
The journey taught me some critical lessons about performance:
1. Your benchmark is lying: my first profile showed 79% of CPU time was in fmt.Sprintf, not my code. Always pprof your tests!
2. Zero allocations ≠ fast: my map-pooling "fix" achieved 1 alloc/op but was 1.5x slower. We traded a GC bottleneck for a worse CPU bottleneck (the O(N) for...delete).
3. Data structure > allocations: the real win wasn't just fewer allocations, it was replacing the map with a direct-access array. This single change gave O(1) access and eliminated 74% of CPU overhead.
I documented the entire saga, with all the profiles, failed attempts, and benchmark charts. If you're into Go performance, profiling, or optimization, this one's for you.
You can read the full story here: https://lnkd.in/dCzTvbFS
#golang #performance #optimization #pprof #softwareengineering #simd #go
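The post links to the full write-up; as a stand-in, here is a minimal Go sketch (not the author's code) of the general shape of that change: a Bloom filter whose Add/Contains index directly into a preallocated bit array instead of building a map on every call. The sizes and double-hashing scheme are illustrative assumptions.

package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter backed by a preallocated bit array
// ([]uint64), so lookups index directly into memory instead of allocating
// a map per operation.
type bloom struct {
	bits []uint64
	m    uint64 // number of bits
	k    int    // number of hash functions
}

func newBloom(m uint64, k int) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hash derives two 64-bit hashes; k positions are built from them
// via double hashing.
func (b *bloom) hash(data []byte) (uint64, uint64) {
	h := fnv.New64a()
	h.Write(data)
	h1 := h.Sum64()
	h2 := h1>>33 | h1<<31 // cheap second hash derived from the first
	return h1, h2
}

func (b *bloom) Add(data []byte) {
	h1, h2 := b.hash(data)
	for i := 0; i < b.k; i++ {
		p := (h1 + uint64(i)*h2) % b.m
		b.bits[p/64] |= 1 << (p % 64) // set bit p directly, O(1), no map
	}
}

func (b *bloom) Contains(data []byte) bool {
	h1, h2 := b.hash(data)
	for i := 0; i < b.k; i++ {
		p := (h1 + uint64(i)*h2) % b.m
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1<<20, 4)
	f.Add([]byte("hello"))
	fmt.Println(f.Contains([]byte("hello")), f.Contains([]byte("world"))) // true, almost certainly false
}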
-
This blog explains how a high-traffic payment service in Go was CPU-bound, so its most intensive endpoints were rewritten in Rust for better performance and efficiency. The new Rust implementation doubled traffic capacity and cut CPU usage, saving nearly $300,000 annually.
-
React Tip: Pass a Function to useState, Don't Call It
We've all written code like this before:
const [value, setValue] = useState(getInitialValue());
It looks innocent, but this line hides a subtle performance pitfall. When you call useState(getInitialValue()), that function executes on every render, not just the first one. That's because React re-runs your component function on every render, and the argument to useState() is evaluated before React decides whether to reuse the existing state. If getInitialValue() does something expensive, like computing, parsing, or reading from storage, you're wasting CPU cycles on every render.
Instead, pass the function itself so React calls it lazily, only when initializing the state:
function getInitialValue() {
  console.log("getInitialValue() called");
  // imagine something slow here...
  return Math.random();
}

function ExampleComponent() {
  const [value, setValue] = useState(getInitialValue);
  return (
    <div>
      <p>Value: {value}</p>
      <button onClick={() => setValue(v => v + 1)}>Increment</button>
    </div>
  );
}
If you open the console, you'll see "getInitialValue() called" logged only on the initial render, no matter how many times the component re-renders. https://lnkd.in/gPA6WT8d
-
std::variant has been around since 2017 and it still never ceases to amaze me. It promises to hold objects of a fixed set of types (fundamental types, user-defined types, etc.) in an internal buffer big enough for the largest type. In other words: it *directly* holds the underlying object.
The classic way of holding heterogeneous objects belonging to an inheritance hierarchy in a vector is via an (owning) pointer to the base class:
class Base {};
class Derived1 : public Base {};
class Derived2 : public Base {};
...
class DerivedN : public Base {};
using MyCollection = std::vector<std::unique_ptr<Base>>;
But even though the contents of the std::vector may fit in a cache line, the underlying objects lie elsewhere in memory: better if they come from a custom allocator, worse if they were just new'd randomly (which also leads to memory fragmentation).
***So, every time we try to access the underlying objects, we almost certainly nuke the CPU caches***
Enter std::variant: we can directly hold the underlying heterogeneous objects in a vector as:
using MyCollection = std::vector<std::variant<Derived1, Derived2, ..., DerivedN>>;
This paves the way for the objects to fit in cache lines, since there is no indirection now. As a result, once we access one object, adjacent objects get pulled into the CPU caches owing to data locality, leading to performance gains on subsequent accesses. The gains are more pronounced when the objects are small, since more of them pack into a single cache line.
So don't just reach for std::variant to look and feel modern: you're getting free performance gains under the hood, given the way CPU caches work :)
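The locality argument itself is language-agnostic. As a rough Go analogy (hypothetical types, not the C++ from the post): iterating a slice of values keeps adjacent objects on shared cache lines, while a slice of heap pointers reintroduces the indirection the post warns about.

package main

import "fmt"

// point is a small hypothetical type; small objects pack many per cache line.
type point struct{ x, y int64 }

// sumValues walks a contiguous slice of values: adjacent elements share
// cache lines, so most accesses hit data already pulled into cache.
func sumValues(ps []point) int64 {
	var s int64
	for i := range ps {
		s += ps[i].x + ps[i].y
	}
	return s
}

// sumPointers walks a slice of pointers: each element may live anywhere on
// the heap, so each access risks a cache miss (the indirection problem).
func sumPointers(ps []*point) int64 {
	var s int64
	for _, p := range ps {
		s += p.x + p.y
	}
	return s
}

func main() {
	const n = 1 << 20
	vals := make([]point, n)
	ptrs := make([]*point, n)
	for i := range vals {
		vals[i] = point{x: int64(i), y: int64(i)}
		ptrs[i] = &point{x: int64(i), y: int64(i)}
	}
	fmt.Println(sumValues(vals), sumPointers(ptrs)) // same sums, very different memory behavior
}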
-
Reliability and large storage capacity are just two benefits that multilayer ceramic capacitors (MLCCs) and polymer tantalum capacitors bring to AI systems. We take a look at the role that MLCCs and polymer tantalum capacitors have in AI servers here: http://arw.li/60417zgCB
-
𝘆𝗼𝘂 𝘁𝗵𝗶𝗻𝗸 𝗼𝗻𝗹𝘆 𝗵𝘂𝗺𝗮𝗻𝘀 𝗵𝗮𝘃𝗲 𝗳𝗿𝗶𝗰𝘁𝗶𝗼𝗻𝘀? here are some cool examples of deep system conflict:
• GC compaction invalidates CPU caches and TLBs, spoiling temporal locality. under memory pressure, they blame each other
• CPUs reorder instructions for throughput, but exceptions (signals, page faults) demand a precise architectural state, forcing the CPU to roll back execution and undo any gain from reordering
• the JIT optimizes based on the currently visible bytecode. when new classes or modules load (e.g. OSGi), everything is invalidated, forcing deoptimization storms
• NUMA optimizes for locality, paging optimizes for availability. locality vanishes once the OS swaps or migrates pages, defeating NUMA's purpose
• GCC's vectorizer expects predictable loops. polymorphic code breaks those assumptions, forcing de-vectorization and falling back to slow paths
• instrumentation injects extra logic, monkey patches rewrite functions, profilers skew timing. every probe alters code paths, object lifetimes, and optimization. end result: the observer mutates the observed
• hardware prefetchers assume linear access. in bad cases, they pull stale or useless data, polluting caches
• simultaneous multithreading boosts CPU utilization, but shared execution units cause random stalls, which is deadly for real-time workloads
• branch prediction boosts performance, but some security mitigations (e.g. fence instructions for Spectre) nullify the advantage, trading gigahertz for safety
bottom line: conflicts in systems arise from unreconciled assumptions (machine) and incoherent creativity (man)
-
Our friends at Signal Integrity Journal ran an interesting article by Samtec’s Andrew Josephson, “ManyPoint Networks: A System Co-Design Framework for 448 Gbps AI Fabrics and Beyond.” This article introduces a hardware-centric definition of compute cluster bisection bandwidth as a performance metric for AI-scale 448 Gbps systems. Unlike traditional abstractions, this metric is grounded in physical interconnect layout and IO port availability, enabling system architects to evaluate bandwidth provisioning through real, bidirectional link paths. https://lnkd.in/gPEXRCWv