Back to Blog
Bloom Filters: Space-Efficient Username Availability Check
system-designdata-structuresalgorithmsbackend

Bloom Filters: Space-Efficient Username Availability Check

Learn how Bloom Filters provide a space-efficient probabilistic data structure for checking username availability across millions of users with O(1) lookup time.

Introduction

Imagine you're building a website with 20 million users, and a new user comes to sign up. They want to register with the username @batman. How do you efficiently check if this username is already taken?

This seemingly simple problem becomes a significant engineering challenge at scale. Let's explore different solutions and understand why Bloom Filters are the elegant answer to this problem.

The Problem: Username Availability at Scale

When a new user tries to sign up with a username like @batman, we need to:

  1. Check our database of 20 million existing users
  2. Determine if @batman already exists
  3. Return the result quickly to provide a good user experience

The Naive Approach: Direct Database Query

The straightforward solution is to query the database directly:

SELECT * FROM users WHERE username = 'batman';

But what happens when 5,000 users come for signup simultaneously?

  • Database and server get more load
  • Performance degrades significantly
  • Time complexity: O(n) for each lookup
  • This is bad performance-wise

Solution 1: HashMap

Our first optimization is to use a HashMap to store all usernames:

const usernameMap = {
  '@batman': true,
  '@superman': true,
  '@wonderwoman': true,
  // ... millions more entries
};

HashMap Lookup

When a user enters their desired username, we simply check:

// O(1) lookup time
const isUsernameTaken = usernameMap[username];

Advantages:

  • Lookup time is O(1) - constant time!
  • Much faster than database queries

Disadvantages:

  • Space will escalate dramatically
  • If 1,000 users are added every day → 1,000 new entries daily
  • A lot of space will be needed to store all usernames

For 20 million users with average username length of 15 characters:

  • Each entry ≈ 20 bytes (username + boolean + overhead)
  • Total memory ≈ 400 MB just for usernames!

Solution 2: Bloom Filters

This is where Bloom Filters come to the rescue. A Bloom Filter is a space-efficient probabilistic data structure that works on the principle of false positives.

Key Characteristics

  • If username is taken → Bloom Filter will say "definitely taken"
  • If username is not taken → Bloom Filter will say "maybe available"

Wait, why "maybe"? Let's understand the two scenarios:

Scenario 1: Username IS Taken (@batman exists)

@batman (Taken) → Bloom Filter says "Taken"

This is correct! If the Bloom Filter says it's taken, it IS taken for sure.

Scenario 2: Username is NOT Taken (@batman1 doesn't exist)

@batman1 (Not Taken) → Bloom Filter might say "Taken"

This is okay! A false positive means we might reject an available username, but we'll never allow a duplicate. The user can simply try another username.

Why False Positives are Acceptable

Actual StateBloom Filter SaysResultAcceptable?
TakenTakenCorrect✅ Yes
TakenNot TakenWould be BAD❌ Never
Not TakenTakenFalse Positive✅ Yes
Not TakenNot TakenCorrect✅ Yes

The critical point: Bloom Filters NEVER produce false negatives. If something is in the set, it will always be detected.

How Bloom Filters Work

A Bloom Filter consists of two main components:

  1. Hash Functions - Convert input to array indices
  2. Bit Array (Buckets) - Store presence information as bits

The Bit Array

Let's take a simple example with a 10-bucket bit array:

Index:  0   1   2   3   4   5   6   7   8   9
Bits:  [0] [0] [0] [0] [0] [0] [0] [0] [0] [0]

Each bucket stores just a single bit (0 or 1), making it extremely space-efficient.

Adding a Username: @batman

When we add @batman, the hash function might return indices: 0, 2, 4, 8

hash('@batman') → [0, 2, 4, 8]

We set these bits to 1:

Index:  0   1   2   3   4   5   6   7   8   9
Bits:  [1] [0] [1] [0] [1] [0] [0] [0] [1] [0]
        ↑       ↑       ↑               ↑

Adding Another Username: @superman

Hash function returns: 1, 4, 6, 8

hash('@superman') → [1, 4, 6, 8]

Updated bit array:

Index:  0   1   2   3   4   5   6   7   8   9
Bits:  [1] [1] [1] [0] [1] [0] [1] [0] [1] [0]
            ↑               ↑
            (new)           (new)

Checking Username Availability

Check: Is @superman available?

hash('@superman') → [1, 4, 6, 8]

Check each index:

  • Index 1 → 1 (taken)
  • Index 4 → 1 (taken)
  • Index 6 → 1 (taken)
  • Index 8 → 1 (taken)

All bits are 1 → Username is TAKEN

Check: Is @lemon available?

hash('@lemon') → [2, 3, 4, 7]

Check each index:

  • Index 2 → 1 (taken by @batman)
  • Index 3 → 0 (not set!)

Even one 0 means the username is definitely available! ✅

Check: Is @appleuser available? (False Positive Example)

hash('@appleuser') → [1, 2, 6, 8]

Let's check step by step:

  • Index 1 → 1 (taken by @superman)
  • Index 2 → 1 (taken by @batman)
  • Index 6 → 1 (taken by @superman)
  • Index 8 → 1 (taken by @batman & @superman)

All bits are 1 → Bloom Filter says "Taken"

But wait! We know @appleuser was never added. This is a false positive!

Why False Positives Happen

Since we know there is no @appleuser in our hash function, but all bits at positions [1, 2, 6, 8] are taken by other usernames, the Bloom Filter incorrectly reports it as taken.

This is the reason it's called a false positive.

Reducing False Positives

The false positive rate depends on:

  1. Bucket size - More buckets = fewer collisions
  2. Number of hash functions - More functions = better distribution
  3. Number of elements - More elements = more collisions

Impact of Bucket Size

Bucket SizeFalse Positive Rate
10High collisions
1,000Lower collisions
1,000,000Very low collisions

With a bucket size of 1,000 instead of 10:

  • Number of collisions decreases
  • We can insert more users reliably
  • Still much smaller than storing actual usernames!

Space Efficiency: The Real Magic

For 20 million users:

ApproachSpace Required
HashMap~400 MB
Bloom Filter~1 MB

A Bloom Filter with 1% false positive rate for 20 million items needs only about ~20 MB of space!

Real-World Applications

Bloom Filters are used everywhere:

  1. Chrome Browser - Checks if a URL is malicious against a list of known bad URLs
  2. Gmail - Spam detection
  3. Databases - Check if a key exists before expensive disk lookups (LSM trees)
  4. CDNs - Avoid caching one-hit-wonders
  5. Blockchain - Bitcoin uses Bloom Filters for wallet synchronization

Implementation in JavaScript

Here's a simple Bloom Filter implementation:

class BloomFilter {
  constructor(size = 1000, hashCount = 3) {
    this.size = size;
    this.hashCount = hashCount;
    this.bitArray = new Array(size).fill(0);
  }

  // Simple hash functions
  hash(str, seed) {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      hash = (hash * seed + str.charCodeAt(i)) % this.size;
    }
    return Math.abs(hash);
  }

  // Get all hash positions for a string
  getHashPositions(str) {
    const positions = [];
    for (let i = 0; i < this.hashCount; i++) {
      positions.push(this.hash(str, i * 10 + 1));
    }
    return positions;
  }

  // Add an element
  add(element) {
    const positions = this.getHashPositions(element);
    positions.forEach((pos) => {
      this.bitArray[pos] = 1;
    });
  }

  // Check if element might exist
  mightContain(element) {
    const positions = this.getHashPositions(element);
    return positions.every((pos) => this.bitArray[pos] === 1);
  }
}

// Usage
const usernameFilter = new BloomFilter(10000, 4);

// Add existing usernames
usernameFilter.add('@batman');
usernameFilter.add('@superman');

// Check availability
console.log(usernameFilter.mightContain('@batman')); // true (definitely taken)
console.log(usernameFilter.mightContain('@wonderwoman')); // false (definitely available)
console.log(usernameFilter.mightContain('@newuser')); // might be false positive

When to Use Bloom Filters

Use Bloom Filters when:

  • Space efficiency is critical
  • False positives are acceptable
  • You need O(1) lookup time
  • The dataset is large

Don't use Bloom Filters when:

  • You need to delete elements (use Counting Bloom Filter instead)
  • False positives are unacceptable
  • You need to retrieve actual values (Bloom Filters only check membership)

Conclusion

Bloom Filters are a brilliant example of trading accuracy for efficiency. By accepting a small probability of false positives, we gain:

  • O(1) constant time lookups
  • Massive space savings (often 10x or more)
  • Scalability for millions or billions of elements

Next time you're designing a system that needs to check membership in a large set, consider whether a Bloom Filter might be the perfect solution!


Have questions about Bloom Filters or other system design concepts? Feel free to reach out!

Related Posts

System Design Behind Multi-Conference Video Calls

System Design Behind Multi-Conference Video Calls

Dive deep into the architectural patterns powering popular video conferencing apps like Google Meet and Zoom. Learn about WebRTC, Peer-to-Peer (P2P), Mesh P2P, Multi-point Control Unit (MCU), and the highly scalable Selective Forwarding Unit (SFU) architectures.

system-designwebrtcvideo-conferencing+1 more
Read More

Design & Developed by Saket Kothari
© 2026. All rights reserved.