Ever wondered how HashMap
and HashSet
work under the hood in Java? Or wanted to implement your own for better control, experimentation, or interview prep? Building a custom HashMap or HashSet gives you a deeper understanding of hashing, buckets, resizing, and performance bottlenecks.
This tutorial will walk you through the fundamentals of building these core data structures — from scratch — with examples, best practices, and expert-level guidance.
What Is a HashMap?
A HashMap is a key-value pair data structure that allows constant-time access on average using hashing. Internally, it uses:
- Hash function to compute bucket index
- Array of buckets to store entries
- Collision resolution (via chaining or probing)
HashSet = HashMap without values
A HashSet<E>
is effectively a HashMap<E, Object>
with dummy values.
Custom HashMap – Core Implementation
Step 1: Define the Entry Class
class Entry<K, V> {
final K key;
V value;
Entry<K, V> next;
Entry(K key, V value) {
this.key = key;
this.value = value;
}
}
Step 2: Define the Map Class
public class MyHashMap<K, V> {
private static final int DEFAULT_CAPACITY = 16;
private Entry<K, V>[] buckets;
private int size = 0;
public MyHashMap() {
buckets = new Entry[DEFAULT_CAPACITY];
}
private int hash(K key) {
return key == null ? 0 : Math.abs(key.hashCode()) % buckets.length;
}
Step 3: put() Method
public void put(K key, V value) {
int index = hash(key);
Entry<K, V> existing = buckets[index];
while (existing != null) {
if ((key == null && existing.key == null) || (key != null && key.equals(existing.key))) {
existing.value = value;
return;
}
existing = existing.next;
}
Entry<K, V> newEntry = new Entry<>(key, value);
newEntry.next = buckets[index];
buckets[index] = newEntry;
size++;
}
Step 4: get() Method
public V get(K key) {
int index = hash(key);
Entry<K, V> current = buckets[index];
while (current != null) {
if ((key == null && current.key == null) || (key != null && key.equals(current.key))) {
return current.value;
}
current = current.next;
}
return null;
}
Step 5: remove() Method (Optional)
public void remove(K key) {
int index = hash(key);
Entry<K, V> current = buckets[index];
Entry<K, V> prev = null;
while (current != null) {
if ((key == null && current.key == null) || (key != null && key.equals(current.key))) {
if (prev == null) buckets[index] = current.next;
else prev.next = current.next;
size--;
return;
}
prev = current;
current = current.next;
}
}
}
Building a Custom HashSet
public class MyHashSet<E> {
private static final Object DUMMY = new Object();
private final MyHashMap<E, Object> map = new MyHashMap<>();
public void add(E key) {
map.put(key, DUMMY);
}
public boolean contains(E key) {
return map.get(key) != null;
}
public void remove(E key) {
map.remove(key);
}
}
Performance and Big-O Complexity
Operation | Average Time | Worst-case (many collisions) |
---|---|---|
put() | O(1) | O(n) |
get() | O(1) | O(n) |
remove() | O(1) | O(n) |
- Collision handling determines performance ceiling
- Java 8+ uses treeified buckets after a threshold
Real-World Use Cases
- Interview prep and algorithm practice
- Embedded systems where full Java collections aren’t available
- Teaching or educational platforms
- Low-GC custom cache or symbol table
Internal Memory Model
- Array of
Entry<K, V>
→ buckets - Each bucket → linked list of collisions
- Resizing logic usually doubles the capacity when a threshold (load factor) is reached
Java Version Tracker
📌 What’s New in Java?
- Java 8
- Tree-based buckets after 8 collisions
- Lambdas and functional patterns for iterating
- Java 9
Map.of()
for small immutable maps
- Java 10
- Type inference (
var
) for cleaner code
- Type inference (
- Java 21
- Better memory layout optimizations under the hood
Best Practices
- Always override
equals()
andhashCode()
correctly for keys - Keep load factor between 0.5 to 0.75 for performance
- Use prime bucket sizes to reduce collisions (optional)
- Consider using open addressing for high-read scenarios
Anti-Patterns
- Forgetting to resize — leads to performance degradation
- Poor hash function — causes clustering
- Using mutable objects as keys — breaks equality and hash logic
Refactoring Legacy Code
- Replace inefficient List-based lookups with custom
HashMap
- Use
HashSet
semantics for deduplication or uniqueness checks - Wrap existing classes in a
Map
interface for testing or logging
Conclusion and Key Takeaways
- Creating a custom
HashMap
orHashSet
boosts understanding of hashing, buckets, and collision resolution - It’s not just for fun — it’s vital for interviews, performance tuning, and building low-level libraries
- Be mindful of correct hash code implementation, collision handling, and memory management
FAQ – Custom HashMap and HashSet
-
Is Java’s HashMap faster than a custom one?
Yes — it's highly optimized and battle-tested. -
Can I build a resizable map?
Yes. Tracksize
and rehash when load factor exceeds threshold. -
What is treeification?
Replacing linked list buckets with balanced trees to improve worst-case lookup time. -
How does HashSet work internally?
It wraps aHashMap
with dummy values. -
Can I use null keys in my custom map?
Yes, but handlenull.hashCode()
safely. -
Should I use open addressing instead of chaining?
It depends — open addressing has better cache locality but is complex to implement. -
Can I use generics in custom HashMap?
Yes — always prefer<K, V>
for type safety. -
What if two keys have the same hashCode()?
They go into the same bucket — equality is resolved viaequals()
. -
Is custom map thread-safe?
No. Add synchronization or use concurrent alternatives. -
Why learn to build this manually?
It deepens understanding and prepares you for system design interviews.