HashSet in Java – How It Works Internally

Illustration for HashSet in Java – How It Works Internally
By Last updated:

HashSet is one of the most efficient and widely used implementations of the Set interface in Java. Backed by a HashMap, it ensures that no duplicate elements are stored while offering constant-time performance for basic operations.

This tutorial demystifies the internals of HashSet, how it leverages HashMap, its performance profile, and how to use it effectively in enterprise-grade Java applications.


📚 What is a HashSet?

  • Implements the Set interface
  • Ensures uniqueness of elements
  • Backed internally by a HashMap
  • Offers average-case O(1) time for add, remove, and contains operations
Set<String> names = new HashSet<>();
names.add("Alice");
names.add("Bob");
names.add("Alice"); // Duplicate ignored
System.out.println(names); // [Alice, Bob]

🧠 Internal Structure: HashMap at Work

Internally, HashSet uses a HashMap<E, Object> where each element becomes a key, and a dummy constant PRESENT is used as the value.

private transient HashMap<E,Object> map;
private static final Object PRESENT = new Object();

When you call hashSet.add("Java"), it internally calls:

map.put("Java", PRESENT);

This leverages all of HashMap’s hashing, bucket, and collision resolution logic.


🧩 How HashSet Ensures Uniqueness

  • Uses the hashCode() of the object to find a bucket
  • Then uses equals() to check for actual value equality
  • If both match, the element is not added (duplicate)
  • Otherwise, it’s inserted into the appropriate bucket

⚠️ Important: Custom objects in a HashSet must override both equals() and hashCode().

@Override
public int hashCode() {
    return Objects.hash(id);
}

@Override
public boolean equals(Object obj) {
    if (this == obj) return true;
    if (obj == null || getClass() != obj.getClass()) return false;
    MyObject other = (MyObject) obj;
    return id == other.id;
}

⏱️ Performance and Time Complexity

Operation Time Complexity Notes
add() O(1) average O(n) worst (collisions)
remove() O(1) average O(n) worst
contains() O(1) average O(n) worst
iterator() O(n) Linear traversal

🔍 Memory Layout

  • HashMap buckets are stored in an array of Node<K, V>
  • Each node contains: hash, key, value, next
  • Collisions are resolved by chaining (linked list or balanced tree)

Since values are constant (PRESENT), only keys occupy logical space.


🔄 Real-World Use Cases

  • Removing duplicates from a list or stream
  • Tracking visited nodes in algorithms (DFS/BFS)
  • Caching or membership testing (e.g., already processed elements)
List<String> emails = List.of("a@example.com", "b@example.com", "a@example.com");
Set<String> unique = new HashSet<>(emails); // Removes duplicates

🧬 Streams + HashSet

List<String> items = List.of("apple", "banana", "apple");

Set<String> unique = items.stream()
    .collect(Collectors.toSet());

Preserve insertion order:

Set<String> ordered = items.stream()
    .collect(Collectors.toCollection(LinkedHashSet::new));

🔁 Comparison: HashSet vs TreeSet vs LinkedHashSet

Feature HashSet LinkedHashSet TreeSet
Order No Insertion order Sorted order
Performance O(1) O(1) O(log n)
Null support Yes (1 null) Yes (1 null) No (unless handled)
Backed by HashMap HashMap TreeMap

❌ Anti-Patterns and Pitfalls

  • ❌ Failing to override equals() and hashCode() in custom objects
  • ❌ Assuming order (use LinkedHashSet instead)
  • ❌ Modifying the set during iteration without using Iterator.remove()

✅ Always use:

Iterator<String> it = set.iterator();
while (it.hasNext()) {
    if (condition) it.remove();
}

📌 What's New in Java Versions?

Java 8

  • Streams + Collectors.toSet()
  • removeIf, forEach for functional ops

Java 9

  • Set.of(...) for immutable sets

Java 10

  • var keyword simplifies declarations

Java 11

  • Minor internal performance tuning

Java 17 & 21

  • Garbage collector improvements affecting hash collections
  • More efficient string hashing in JDK core

🔧 Refactoring Example

Before:

List<String> raw = Arrays.asList("a", "b", "a");
List<String> clean = new ArrayList<>();
for (String s : raw) {
    if (!clean.contains(s)) clean.add(s);
}

After (HashSet):

List<String> clean = new ArrayList<>(new HashSet<>(raw));

🧠 Real-World Analogy

A HashSet is like a digital guest list: if someone tries to RSVP twice, the system silently ignores them. The hash function assigns each guest a unique bucket, and equality checks avoid duplication.


❓ FAQ – Expert-Level Questions

  1. Can HashSet store null?

    • Yes, only one null value allowed.
  2. What is the load factor in HashSet?

    • Default is 0.75; triggers resizing when exceeded.
  3. How does resizing affect performance?

    • Causes rehashing, which is expensive.
  4. How to avoid resizing?

    • Use new HashSet<>(expectedSize, loadFactor)
  5. Is HashSet thread-safe?

    • No. Use Collections.synchronizedSet()
  6. How does HashSet handle collisions?

    • Uses linked list or tree structure per bucket
  7. How many collisions break HashMap performance?

    • Excessive collisions lead to O(n) operations
  8. What is PRESENT used for?

    • Dummy constant as value in internal HashMap
  9. How to make HashSet immutable?

    • Use Set.of(...) (Java 9+)
  10. How to sort a HashSet?

Set<String> sorted = new TreeSet<>(hashSet);

🏁 Conclusion and Key Takeaways

  • HashSet is the go-to for uniqueness with performance
  • Internally powered by HashMap
  • Override equals() and hashCode() for custom objects
  • Use Collectors.toSet() in streams for modern Java code
  • Avoid using it when order or thread-safety is essential

💡 Pro Tip: If your object’s hashCode() is unstable or mutable — HashSet will betray you.