Mark-and-Sweep Algorithm: Foundations of Garbage Collection

Illustration for Mark-and-Sweep Algorithm: Foundations of Garbage Collection
By Last updated:

Garbage Collection (GC) in Java is built on several foundational algorithms, and one of the earliest and most important is the Mark-and-Sweep algorithm. This method introduced the idea of automatically reclaiming unused memory by identifying unreachable objects and clearing them from the heap.

In this tutorial, we’ll explore how the Mark-and-Sweep algorithm works, its role in the JVM, real-world relevance, and how modern collectors have evolved from this foundation.


Why the Mark-and-Sweep Algorithm Matters

  • It’s the basis of most modern garbage collectors (G1, CMS, ZGC).
  • Helps developers understand object reachability and root sets.
  • Explains GC pauses and performance bottlenecks.
  • Essential for debugging memory leaks and tuning JVMs.

Analogy: Imagine a janitor checking every desk in an office (mark phase) and then throwing away unused papers (sweep phase). That’s how Mark-and-Sweep reclaims memory.


How the Mark-and-Sweep Algorithm Works

The process involves two main phases: Mark and Sweep.

1. Mark Phase

  • Start from GC Roots (e.g., local variables, static fields, active threads).
  • Traverse the object graph, marking all reachable objects.
  • Unmarked objects are considered garbage.

2. Sweep Phase

  • Scan through the heap.
  • Collect unmarked objects and reclaim memory.
[GC Roots] → Reachable Objects → Marked
Others → Unmarked → Collected in Sweep

Example Code Demonstrating Garbage Creation

public class MarkSweepDemo {
    public static void main(String[] args) {
        String data1 = new String("Hello");
        String data2 = new String("World");
        data1 = null; // Eligible for GC
        System.gc(); // May trigger GC (Mark-and-Sweep in action)
        System.out.println("GC suggested");
    }
}
  • data1 becomes unreachable → marked as garbage.
  • GC reclaims it in the sweep phase.

Strengths of Mark-and-Sweep

  • Simple and effective → The foundation of GC.
  • Automatic memory reclamation.
  • Eliminates manual memory management.

Limitations of Mark-and-Sweep

  • Stop-the-World Pauses → Application threads pause during GC.
  • Fragmentation → Memory may become scattered.
  • Performance Issues → Inefficient for large heaps.

Evolution Beyond Mark-and-Sweep

Modern collectors improve on Mark-and-Sweep with additional techniques:

  • Mark-Compact → Compacts heap to reduce fragmentation.
  • Copying Collector → Moves live objects to new space for efficiency.
  • Generational GC → Young vs Old generation for optimized collection.
  • Concurrent Collectors → Reduce pause times (CMS, G1, ZGC, Shenandoah).

GC Algorithms Built on Mark-and-Sweep

  • CMS (Concurrent Mark-Sweep) → Parallelized marking, concurrent sweeping.
  • G1 GC → Region-based, incremental marking.
  • ZGC & Shenandoah → Ultra-low pause collectors with concurrent marking.

Monitoring and Tuning

  • Use -XX:+PrintGCDetails and -XX:+PrintGC to observe GC logs.
  • Tools like VisualVM, JMC, and JFR reveal pause times and heap usage.
  • Modern GCs handle fragmentation better, but tuning heap sizes still matters.

Real-World Case Study

A web server with a large heap experienced frequent pauses due to fragmentation from Mark-and-Sweep GC. Switching to G1 GC reduced pauses and improved throughput by balancing incremental collection and compaction.


JVM Version Tracker

  • Java 8 → Parallel GC default; CMS used widely.
  • Java 9 → G1 became default; CMS deprecated.
  • Java 11 → ZGC introduced.
  • Java 17 → ZGC & Shenandoah stable.
  • Java 21+ → Project Lilliput reduces object header size for efficiency.

Best Practices

  • Understand the basics of Mark-and-Sweep before tuning advanced collectors.
  • Use modern collectors (G1, ZGC, Shenandoah) for production.
  • Avoid unnecessary System.gc() calls.
  • Monitor GC logs and heap fragmentation patterns.

Conclusion & Key Takeaways

  • The Mark-and-Sweep algorithm laid the foundation for JVM garbage collection.
  • It works in two phases: mark reachable objects and sweep away garbage.
  • Limitations include pauses and fragmentation, which modern collectors solve.
  • Knowledge of Mark-and-Sweep is essential for understanding GC internals.

FAQs

1. What is the JVM memory model and why does it matter?
It defines thread interactions with memory, ensuring consistency and correctness.

2. How does G1 GC differ from CMS?
G1 is region-based with compaction, while CMS suffered from fragmentation.

3. When should I use ZGC or Shenandoah?
For applications requiring ultra-low latency and large heaps.

4. What are JVM safepoints?
Points where threads pause to allow GC or JIT optimizations.

5. How do I solve GC fragmentation issues?
Use compaction-based collectors (G1, ZGC, Shenandoah).

6. What is the downside of Mark-and-Sweep?
Pauses and memory fragmentation.

7. How do I monitor Mark-and-Sweep activity?
Enable GC logs and analyze with tools like JMC or GCViewer.

8. How does JIT affect GC?
JIT can reduce allocations (via escape analysis), lowering GC frequency.

9. What’s new in Java 21 for GC?
NUMA-aware GC and Project Lilliput improve efficiency and pause reduction.

10. How does GC differ in microservices vs monoliths?
Microservices focus on predictable low latency; monoliths prioritize throughput.