How to Build a Custom HashMap or HashSet in Java – Internals, Code, and Best Practices

Illustration for How to Build a Custom HashMap or HashSet in Java – Internals, Code, and Best Practices
By Last updated:

Ever wondered how HashMap and HashSet work under the hood in Java? Or wanted to implement your own for better control, experimentation, or interview prep? Building a custom HashMap or HashSet gives you a deeper understanding of hashing, buckets, resizing, and performance bottlenecks.

This tutorial will walk you through the fundamentals of building these core data structures — from scratch — with examples, best practices, and expert-level guidance.


What Is a HashMap?

A HashMap is a key-value pair data structure that allows constant-time access on average using hashing. Internally, it uses:

  • Hash function to compute bucket index
  • Array of buckets to store entries
  • Collision resolution (via chaining or probing)

HashSet = HashMap without values

A HashSet<E> is effectively a HashMap<E, Object> with dummy values.


Custom HashMap – Core Implementation

Step 1: Define the Entry Class

class Entry<K, V> {
    final K key;
    V value;
    Entry<K, V> next;

    Entry(K key, V value) {
        this.key = key;
        this.value = value;
    }
}

Step 2: Define the Map Class

public class MyHashMap<K, V> {
    private static final int DEFAULT_CAPACITY = 16;
    private Entry<K, V>[] buckets;
    private int size = 0;

    public MyHashMap() {
        buckets = new Entry[DEFAULT_CAPACITY];
    }

    private int hash(K key) {
        return key == null ? 0 : Math.abs(key.hashCode()) % buckets.length;
    }

Step 3: put() Method

    public void put(K key, V value) {
        int index = hash(key);
        Entry<K, V> existing = buckets[index];

        while (existing != null) {
            if ((key == null && existing.key == null) || (key != null && key.equals(existing.key))) {
                existing.value = value;
                return;
            }
            existing = existing.next;
        }

        Entry<K, V> newEntry = new Entry<>(key, value);
        newEntry.next = buckets[index];
        buckets[index] = newEntry;
        size++;
    }

Step 4: get() Method

    public V get(K key) {
        int index = hash(key);
        Entry<K, V> current = buckets[index];

        while (current != null) {
            if ((key == null && current.key == null) || (key != null && key.equals(current.key))) {
                return current.value;
            }
            current = current.next;
        }
        return null;
    }

Step 5: remove() Method (Optional)

    public void remove(K key) {
        int index = hash(key);
        Entry<K, V> current = buckets[index];
        Entry<K, V> prev = null;

        while (current != null) {
            if ((key == null && current.key == null) || (key != null && key.equals(current.key))) {
                if (prev == null) buckets[index] = current.next;
                else prev.next = current.next;
                size--;
                return;
            }
            prev = current;
            current = current.next;
        }
    }
}

Building a Custom HashSet

public class MyHashSet<E> {
    private static final Object DUMMY = new Object();
    private final MyHashMap<E, Object> map = new MyHashMap<>();

    public void add(E key) {
        map.put(key, DUMMY);
    }

    public boolean contains(E key) {
        return map.get(key) != null;
    }

    public void remove(E key) {
        map.remove(key);
    }
}

Performance and Big-O Complexity

Operation Average Time Worst-case (many collisions)
put() O(1) O(n)
get() O(1) O(n)
remove() O(1) O(n)
  • Collision handling determines performance ceiling
  • Java 8+ uses treeified buckets after a threshold

Real-World Use Cases

  • Interview prep and algorithm practice
  • Embedded systems where full Java collections aren’t available
  • Teaching or educational platforms
  • Low-GC custom cache or symbol table

Internal Memory Model

  • Array of Entry<K, V> → buckets
  • Each bucket → linked list of collisions
  • Resizing logic usually doubles the capacity when a threshold (load factor) is reached

Java Version Tracker

📌 What’s New in Java?

  • Java 8
    • Tree-based buckets after 8 collisions
    • Lambdas and functional patterns for iterating
  • Java 9
    • Map.of() for small immutable maps
  • Java 10
    • Type inference (var) for cleaner code
  • Java 21
    • Better memory layout optimizations under the hood

Best Practices

  • Always override equals() and hashCode() correctly for keys
  • Keep load factor between 0.5 to 0.75 for performance
  • Use prime bucket sizes to reduce collisions (optional)
  • Consider using open addressing for high-read scenarios

Anti-Patterns

  • Forgetting to resize — leads to performance degradation
  • Poor hash function — causes clustering
  • Using mutable objects as keys — breaks equality and hash logic

Refactoring Legacy Code

  • Replace inefficient List-based lookups with custom HashMap
  • Use HashSet semantics for deduplication or uniqueness checks
  • Wrap existing classes in a Map interface for testing or logging

Conclusion and Key Takeaways

  • Creating a custom HashMap or HashSet boosts understanding of hashing, buckets, and collision resolution
  • It’s not just for fun — it’s vital for interviews, performance tuning, and building low-level libraries
  • Be mindful of correct hash code implementation, collision handling, and memory management

FAQ – Custom HashMap and HashSet

  1. Is Java’s HashMap faster than a custom one?
    Yes — it's highly optimized and battle-tested.

  2. Can I build a resizable map?
    Yes. Track size and rehash when load factor exceeds threshold.

  3. What is treeification?
    Replacing linked list buckets with balanced trees to improve worst-case lookup time.

  4. How does HashSet work internally?
    It wraps a HashMap with dummy values.

  5. Can I use null keys in my custom map?
    Yes, but handle null.hashCode() safely.

  6. Should I use open addressing instead of chaining?
    It depends — open addressing has better cache locality but is complex to implement.

  7. Can I use generics in custom HashMap?
    Yes — always prefer <K, V> for type safety.

  8. What if two keys have the same hashCode()?
    They go into the same bucket — equality is resolved via equals().

  9. Is custom map thread-safe?
    No. Add synchronization or use concurrent alternatives.

  10. Why learn to build this manually?
    It deepens understanding and prepares you for system design interviews.