If you’ve ever noticed sudden pauses in your Java application, even when CPU usage seemed normal, you’ve likely encountered a Stop-the-World (STW) event. These pauses are moments when the JVM halts all application threads so it can perform critical internal tasks such as Garbage Collection (GC), code deoptimization, or class redefinition.
In this tutorial, we’ll explore what Stop-the-World events are, why they happen, their impact on performance, and strategies to minimize them in production systems.
Why Stop-the-World Events Matter
- Directly impact application latency.
- Critical in low-latency systems like trading platforms and real-time services.
- Understanding STW is essential for GC tuning and troubleshooting.
Analogy: Imagine a busy highway where all traffic is stopped so workers can repair the road. Once repairs finish, cars move again. JVM STW pauses work in much the same way: application threads wait until the JVM finishes its internal work.
What is a Stop-the-World Event?
A Stop-the-World (STW) event occurs when the JVM pauses all application threads at a safepoint so it can safely perform memory management or other internal operations. No application Java code executes during this pause, while the JVM's own GC and VM threads keep working.
Common Triggers of STW Events
- Garbage Collection (GC) → Most frequent cause.
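- JIT Deoptimization → Compilation itself runs on background threads, but discarding compiled code whose assumptions no longer hold requires a safepoint.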
- Heap Dump Creation → For debugging.
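- Class Redefinition → Hot-swapping classes through agents or debuggers (ordinary class loading does not by itself stop the world).
- Biased Lock Revocation → Synchronization-related; mainly relevant on older JVMs, since biased locking was deprecated and disabled by default in JDK 15.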
Garbage Collection and STW
Most garbage collectors in the JVM rely on STW events:
- Minor GC → Brief pause to clean the Young Generation.
- Major/Full GC → Longer pause to collect the Old Generation (a Full GC covers the entire heap).
- Compact Phase → Moves objects to reduce fragmentation.
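Even mostly concurrent collectors (G1, ZGC, Shenandoah) require short STW phases, for example to start or finish a marking cycle. G1 additionally moves live objects during STW evacuation pauses.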
Example: Observing STW in Action
public class STWExample {
    public static void main(String[] args) {
        // Allocate many short-lived objects to create garbage.
        for (int i = 0; i < 1000000; i++) {
            String s = new String("STW-" + i);
        }
        System.gc(); // Suggests a Full GC -> STW pause likely
        System.out.println("Finished");
    }
}
Running this with -XX:+PrintGCDetails on Java 8, or with -Xlog:gc* on Java 9 and later (where the old flag is deprecated), will show GC pauses indicating STW events.
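To make the log output easier to act on, here is a minimal sketch that pulls the pause duration out of a single unified-logging line (the Java 9+ -Xlog:gc format). The class name GcPauseLineParser and the sample line are illustrative assumptions, and real logs vary with collector and logging options, so treat the pattern as a starting point rather than a complete parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts pause times from -Xlog:gc style lines.
public class GcPauseLineParser {
    // Intended to match lines shaped like:
    // [0.123s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms
    private static final Pattern PAUSE_LINE = Pattern.compile(
            "GC\\((\\d+)\\) (Pause .*?) \\d+[KMG]->\\d+[KMG]\\(\\d+[KMG]\\) (\\d+\\.\\d+)ms$");

    /** Returns the pause duration in milliseconds, or empty if this is not a pause line. */
    public static Optional<Double> pauseMillis(String line) {
        Matcher m = PAUSE_LINE.matcher(line.trim());
        return m.find() ? Optional.of(Double.parseDouble(m.group(3))) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative sample line, not output captured from a real run.
        String sample = "[0.123s][info][gc] GC(0) Pause Young (Normal) "
                + "(G1 Evacuation Pause) 24M->4M(256M) 3.456ms";
        System.out.println(pauseMillis(sample).orElse(-1.0));
    }
}
```

Lines that describe concurrent phases do not start with "Pause" after the GC id, so the sketch skips them and only reports work that stopped application threads.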
Impact of STW Events
- Latency spikes in user-facing systems.
- Throughput reduction in batch systems.
- Unpredictable performance in microservices.
- Jitter in real-time applications.
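One rough way to see these effects from inside the application is to measure how late a short sleep wakes up. The sketch below is an assumption-laden illustration (the PauseDetector class and its parameters are made up for this example): the overshoot it reports includes OS scheduling jitter as well as STW pauses, so it shows an upper bound on stalls rather than an exact GC measurement.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: detects stalls by timing short sleeps.
public class PauseDetector {
    /**
     * Sleeps repeatedly for intervalMillis over roughly durationMillis and
     * returns the largest overshoot (in ms) beyond the requested sleep time.
     */
    public static long maxOvershootMillis(long intervalMillis, long durationMillis) {
        long intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        long maxOvershootNanos = 0;
        while (System.nanoTime() - deadline < 0) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop measuring
                break;
            }
            long overshoot = (System.nanoTime() - start) - intervalNanos;
            maxOvershootNanos = Math.max(maxOvershootNanos, overshoot);
        }
        return TimeUnit.NANOSECONDS.toMillis(maxOvershootNanos);
    }

    public static void main(String[] args) {
        // Background allocation to provoke GC activity while measuring.
        Thread allocator = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                byte[] garbage = new byte[1024 * 1024];
            }
        });
        allocator.setDaemon(true);
        allocator.start();

        long worst = maxOvershootMillis(1, 2000);
        allocator.interrupt();
        System.out.println("Worst observed stall: " + worst + " ms");
    }
}
```

Comparing the reported stall across collectors or heap sizes gives a quick, if coarse, feel for how much latency the application threads actually experience.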
GC Algorithms and STW Duration
- Serial GC → Long STW, not for large heaps.
- Parallel GC → Uses multiple GC threads to shorten pauses, but every collection is still fully STW.
- CMS → Concurrent phases but still has STW; deprecated in JDK 9 and removed in JDK 14.
- G1 GC → Region-based, predictable pause times.
- ZGC & Shenandoah → Concurrent, very short STW (typically a few milliseconds or less, largely independent of heap size).
Monitoring STW Events
- Java Flight Recorder (JFR) → Low-overhead monitoring.
- Java Mission Control (JMC) → Visualize pauses and GC activity.
- VisualVM → Heap and GC graphs.
- jstat → Command-line GC statistics.
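The same counters that tools like jstat display can also be read from inside the application through the standard java.lang.management API. The following is a minimal sketch (the GcStats class name is an assumption for illustration). Note that getCollectionTime() is an approximate accumulated time, and for concurrent collectors it is not necessarily pure pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: summarizes GC activity via the platform MXBeans.
public class GcStats {
    /** Total collections reported by all collectors since JVM start. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    public static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these totals periodically and exporting the deltas to a metrics system is one simple way to track GC activity over time without attaching an external tool.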
JVM Tuning to Reduce STW Pauses
- -XX:+UseG1GC → Use G1 for balanced pauses.
- -XX:+UseZGC or -XX:+UseShenandoahGC → For ultra-low latency.
- -XX:MaxGCPauseMillis=<n> → Target pause times.
- -Xms / -Xmx → Proper heap sizing to reduce Full GCs.
- Profile workloads before tuning.
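After changing flags, it helps to confirm that the running JVM actually picked them up. The sketch below, with a hypothetical GcConfigCheck class, prints the active collector names, the GC-related startup arguments, and the maximum heap using standard management APIs. The collector names returned are JVM-specific strings, so the example only prints them and does not interpret them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: reports which GC settings the running JVM is using.
public class GcConfigCheck {
    /** Names of the collectors registered in this JVM (format is JVM-specific). */
    public static List<String> activeCollectors() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    /** Startup arguments that look GC- or heap-related (-XX:, -Xms/-Xmx/-Xmn, -Xlog). */
    public static List<String> gcRelatedArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(a -> a.startsWith("-XX:") || a.startsWith("-Xm") || a.startsWith("-Xlog"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + activeCollectors());
        System.out.println("JVM args:   " + gcRelatedArgs());
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap:   " + maxHeapMb + " MB");
    }
}
```

Running it once with and once without a flag such as -XX:+UseG1GC makes it easy to see whether a container or launcher script is silently overriding the intended settings.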
Real-World Case Study
A high-frequency trading application suffered 500ms pauses due to Full GCs. By switching from CMS to ZGC and tuning heap size, pauses dropped to <5ms, enabling stable, predictable latency.
JVM Version Tracker
- Java 8 → Parallel GC default, CMS widely used.
- Java 9 → G1 became default.
- Java 11 → ZGC introduced (experimental).
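- Java 15 → ZGC & Shenandoah became production-ready; Java 17 is the first LTS release to ship them as stable.
- Java 21+ → Generational ZGC introduced (JEP 439); Project Lilliput's compact object headers follow later as an experimental option in JDK 24.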
Best Practices
- Use modern collectors (G1, ZGC, Shenandoah).
- Avoid unnecessary System.gc() calls.
- Tune Young vs Old Gen sizes to balance collections.
- Monitor GC logs regularly.
- For microservices, prioritize predictable low-latency GC.
Conclusion & Key Takeaways
- Stop-the-World events pause all application threads for JVM internal work.
- GC is the primary cause, but other JVM activities also trigger STW.
- Modern GCs minimize pause duration, but STW cannot be eliminated entirely.
- Monitoring and tuning are essential for reducing STW impact in production.
FAQs
1. What is the JVM memory model and why does it matter?
It defines memory interaction rules across threads, ensuring safe concurrency.
2. How does G1 GC differ from CMS?
G1 is region-based with predictable pauses; CMS was prone to fragmentation.
3. When should I use ZGC or Shenandoah?
For latency-sensitive applications needing ultra-low pause times.
4. What are JVM safepoints?
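Points in execution where a thread's state is well defined, so the JVM can pause all application threads there for GC, deoptimization, and other VM operations.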
5. How do I solve long STW pauses?
Switch to concurrent collectors, tune heap, and monitor GC logs.
6. What are the trade-offs of throughput vs latency in GC?
Throughput maximizes work done, latency minimizes pause times.
7. How do I read STW duration from logs?
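Enable GC logging with -XX:+PrintGCDetails (Java 8) or -Xlog:gc* (Java 9+) and analyze the log with a tool such as GCViewer; JMC can show the same pauses from a JFR recording.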
8. How does JIT compilation affect STW?
JIT compilation itself runs in background threads. The related STW work is deoptimization of compiled code, which is usually short.
9. What’s new in Java 21 for STW reduction?
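Java 21 adds Generational ZGC (JEP 439), which collects young objects more frequently and lowers GC overhead. Project Lilliput's smaller object headers arrive later, as an experimental option in JDK 24, and mainly reduce heap usage.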
10. How does GC differ in microservices vs monoliths?
Microservices focus on predictable latency; monoliths optimize throughput.