HashSet
is one of the most efficient and widely used implementations of the Set
interface in Java. Backed by a HashMap
, it ensures that no duplicate elements are stored while offering constant-time performance for basic operations.
This tutorial demystifies the internals of HashSet
, how it leverages HashMap
, its performance profile, and how to use it effectively in enterprise-grade Java applications.
📚 What is a HashSet?
- Implements the
Set
interface - Ensures uniqueness of elements
- Backed internally by a
HashMap
- Offers average-case O(1) time for
add
,remove
, andcontains
operations
Set<String> names = new HashSet<>();
names.add("Alice");
names.add("Bob");
names.add("Alice"); // Duplicate ignored
System.out.println(names); // [Alice, Bob]
🧠 Internal Structure: HashMap at Work
Internally, HashSet
uses a HashMap<E, Object>
where each element becomes a key, and a dummy constant PRESENT
is used as the value.
private transient HashMap<E,Object> map;
private static final Object PRESENT = new Object();
When you call hashSet.add("Java")
, it internally calls:
map.put("Java", PRESENT);
This leverages all of HashMap
’s hashing, bucket, and collision resolution logic.
🧩 How HashSet Ensures Uniqueness
- Uses the
hashCode()
of the object to find a bucket - Then uses
equals()
to check for actual value equality - If both match, the element is not added (duplicate)
- Otherwise, it’s inserted into the appropriate bucket
⚠️ Important: Custom objects in a HashSet must override both equals()
and hashCode()
.
@Override
public int hashCode() {
return Objects.hash(id);
}
@Override
public boolean equals(Object obj) {
if (this == obj) return true;
if (obj == null || getClass() != obj.getClass()) return false;
MyObject other = (MyObject) obj;
return id == other.id;
}
⏱️ Performance and Time Complexity
Operation | Time Complexity | Notes |
---|---|---|
add() |
O(1) average | O(n) worst (collisions) |
remove() |
O(1) average | O(n) worst |
contains() |
O(1) average | O(n) worst |
iterator() |
O(n) | Linear traversal |
🔍 Memory Layout
- HashMap buckets are stored in an array of
Node<K, V>
- Each node contains:
hash
,key
,value
,next
- Collisions are resolved by chaining (linked list or balanced tree)
Since values are constant (PRESENT
), only keys occupy logical space.
🔄 Real-World Use Cases
- Removing duplicates from a list or stream
- Tracking visited nodes in algorithms (DFS/BFS)
- Caching or membership testing (e.g., already processed elements)
List<String> emails = List.of("a@example.com", "b@example.com", "a@example.com");
Set<String> unique = new HashSet<>(emails); // Removes duplicates
🧬 Streams + HashSet
List<String> items = List.of("apple", "banana", "apple");
Set<String> unique = items.stream()
.collect(Collectors.toSet());
Preserve insertion order:
Set<String> ordered = items.stream()
.collect(Collectors.toCollection(LinkedHashSet::new));
🔁 Comparison: HashSet vs TreeSet vs LinkedHashSet
Feature | HashSet | LinkedHashSet | TreeSet |
---|---|---|---|
Order | No | Insertion order | Sorted order |
Performance | O(1) | O(1) | O(log n) |
Null support | Yes (1 null) | Yes (1 null) | No (unless handled) |
Backed by | HashMap | HashMap | TreeMap |
❌ Anti-Patterns and Pitfalls
- ❌ Failing to override
equals()
andhashCode()
in custom objects - ❌ Assuming order (use
LinkedHashSet
instead) - ❌ Modifying the set during iteration without using
Iterator.remove()
✅ Always use:
Iterator<String> it = set.iterator();
while (it.hasNext()) {
if (condition) it.remove();
}
📌 What's New in Java Versions?
Java 8
- Streams +
Collectors.toSet()
removeIf
,forEach
for functional ops
Java 9
Set.of(...)
for immutable sets
Java 10
var
keyword simplifies declarations
Java 11
- Minor internal performance tuning
Java 17 & 21
- Garbage collector improvements affecting hash collections
- More efficient string hashing in JDK core
🔧 Refactoring Example
Before:
List<String> raw = Arrays.asList("a", "b", "a");
List<String> clean = new ArrayList<>();
for (String s : raw) {
if (!clean.contains(s)) clean.add(s);
}
After (HashSet):
List<String> clean = new ArrayList<>(new HashSet<>(raw));
🧠 Real-World Analogy
A HashSet is like a digital guest list: if someone tries to RSVP twice, the system silently ignores them. The hash function assigns each guest a unique bucket, and equality checks avoid duplication.
❓ FAQ – Expert-Level Questions
-
Can HashSet store null?
- Yes, only one null value allowed.
-
What is the load factor in HashSet?
- Default is 0.75; triggers resizing when exceeded.
-
How does resizing affect performance?
- Causes rehashing, which is expensive.
-
How to avoid resizing?
- Use
new HashSet<>(expectedSize, loadFactor)
- Use
-
Is HashSet thread-safe?
- No. Use
Collections.synchronizedSet()
- No. Use
-
How does HashSet handle collisions?
- Uses linked list or tree structure per bucket
-
How many collisions break HashMap performance?
- Excessive collisions lead to O(n) operations
-
What is PRESENT used for?
- Dummy constant as value in internal
HashMap
- Dummy constant as value in internal
-
How to make HashSet immutable?
- Use
Set.of(...)
(Java 9+)
- Use
-
How to sort a HashSet?
Set<String> sorted = new TreeSet<>(hashSet);
🏁 Conclusion and Key Takeaways
HashSet
is the go-to for uniqueness with performance- Internally powered by
HashMap
- Override
equals()
andhashCode()
for custom objects - Use
Collectors.toSet()
in streams for modern Java code - Avoid using it when order or thread-safety is essential
💡 Pro Tip: If your object’s
hashCode()
is unstable or mutable — HashSet will betray you.