Dictionary Performance

Understanding time complexity and performance optimization

Python dictionaries are implemented as hash tables, making them one of the most efficient data structures for key-based operations. This page explores the performance characteristics of Python dictionaries, including time complexity analysis, internal implementation details, and optimization techniques.

Dictionary Internal Implementation

Python dictionaries are implemented using hash tables, which provide fast lookup, insertion, and deletion operations. Here's how they work:

  • Each key is passed through a hash function that produces a hash value (an integer)
  • This hash value is used to determine where to store the key-value pair in memory
  • When looking up a key, Python computes the hash value and goes directly to that location
  • This allows for constant-time O(1) average-case operations instead of O(n) linear search
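
As a toy illustration of the idea (not CPython's actual algorithm, which masks the hash with the table size and probes other slots on collision), a key's hash can be reduced to a slot index like this:

# Toy illustration: reduce a key's hash to a slot in a fixed-size table.
# CPython starts with a small table and grows it as the dict fills up.
table_size = 8

for key in ["apple", "banana", 42]:
    slot = hash(key) % table_size  # real CPython uses hash & (size - 1)
    print(f"{key!r} -> hash {hash(key)} -> slot {slot}")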

Hash Function

Python uses the built-in hash() function to compute hash values for dictionary keys. This function is designed to be fast and produce well-distributed values to minimize collisions.
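
A quick look at hash() in practice. Note that string hashes are randomized per interpreter run (controlled by PYTHONHASHSEED), so the exact values below will differ on your machine:

print(hash(42))          # small ints hash to themselves in CPython
print(hash("hello"))     # string hashes are randomized per process
print(hash((1, 2, 3)))   # tuples of hashable items are hashable

# Mutable objects are unhashable, which is why they cannot be dict keys:
try:
    hash([1, 2, 3])
except TypeError as e:
    print(e)             # unhashable type: 'list'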

Time Complexity Analysis

Operation                   Average Case   Worst Case   Amortized
Get item: d[key]            O(1)           O(n)         O(1)
Set item: d[key] = value    O(1)           O(n)         O(1)
Delete item: del d[key]     O(1)           O(n)         O(1)
Check existence: key in d   O(1)           O(n)         O(1)
Iteration: for k in d       O(n)           O(n)         O(n)
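
To make the O(1) vs. O(n) difference concrete, the sketch below times membership tests on a dict and a list of the same size. Absolute numbers depend on your machine, but the list time should grow with n while the dict time stays roughly flat:

import timeit

for n in (1_000, 100_000):
    d = {i: None for i in range(n)}
    lst = list(range(n))
    missing = -1  # worst case for the list: it must scan every element
    dict_t = timeit.timeit(lambda: missing in d, number=10_000)
    list_t = timeit.timeit(lambda: missing in lst, number=10_000)
    print(f"n={n:>7,}: dict {dict_t:.4f}s  list {list_t:.4f}s")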

Performance Factors

Several factors can affect dictionary performance:

  1. Hash Collisions: When two different keys have the same hash value, lookups take longer
  2. Dictionary Size: Very large dictionaries may experience more collisions
  3. Key Types: Some types have more complex hash functions than others (see the timing sketch after this list)
  4. Python Version: Recent Python versions have optimized dictionary implementations
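
The timing sketch below illustrates the third factor: hashing an int is trivial, while hashing a 100-element tuple must combine the hashes of all its elements. Strings are avoided here because CPython caches a string's hash after the first call, which would skew the measurement:

import timeit

int_key = 123456789
tuple_key = tuple(range(100))  # hashing must visit all 100 elements

print("int:  ", timeit.timeit(lambda: hash(int_key), number=1_000_000))
print("tuple:", timeit.timeit(lambda: hash(tuple_key), number=1_000_000))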

Hash Collisions

Hash collisions occur when two different keys hash to the same value. Python handles collisions using a technique called "open addressing with probing":

  • When a collision occurs, Python looks for the next available slot
  • This leads to worst-case O(n) performance if many collisions occur
  • In practice, Python's hash function is designed to minimize collisions
  • When the dictionary gets too full, Python automatically resizes it to maintain performance

Note: The worst-case O(n) performance happens only in extremely rare cases when the hash function performs poorly for a specific set of keys.
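
You can provoke the worst case deliberately. The sketch below uses a hypothetical key class whose hash is constant, so every instance collides; inserting n such keys degrades from roughly O(n) total work to roughly O(n²):

import time

class CollidingKey:
    """Hypothetical key whose hash is constant, forcing every insert to collide."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 1  # constant hash: all instances land in the same probe chain
    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.n == other.n

def build(keys):
    start = time.perf_counter()
    d = {k: None for k in keys}
    return time.perf_counter() - start

n = 2_000
print("normal keys:   ", build(range(n)))                            # ~O(n) total
print("colliding keys:", build(CollidingKey(i) for i in range(n)))   # ~O(n^2) total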

Performance Benchmarks

Let's see how dictionary operations scale with size:

# Performance comparison for dictionaries of different sizes
import time

def test_dict_performance(size):
    # Create dictionary
    start = time.perf_counter()  # perf_counter is better suited to timing than time()
    d = {i: f"value-{i}" for i in range(size)}
    create_time = time.perf_counter() - start

    # Lookup (successful)
    start = time.perf_counter()
    for i in range(1000):
        value = d[i % size]
    lookup_time = (time.perf_counter() - start) / 1000

    # Lookup (unsuccessful): membership tests for keys that are not present
    start = time.perf_counter()
    for i in range(1000):
        present = (size + i) in d
    failed_lookup_time = (time.perf_counter() - start) / 1000
    
    return {
        "size": size,
        "create_time": create_time,
        "lookup_time": lookup_time,
        "failed_lookup_time": failed_lookup_time
    }

# Test with different sizes
sizes = [100, 1000, 10000, 100000, 1000000]
results = [test_dict_performance(size) for size in sizes]

for r in results:
    print(f"Size: {r['size']:,}")
    print(f"  Creation time: {r['create_time']:.6f} seconds")
    print(f"  Lookup time: {r['lookup_time']:.9f} seconds")
    print(f"  Failed lookup time: {r['failed_lookup_time']:.9f} seconds")
    print()

The above benchmark demonstrates that dictionary lookups remain fast even as dictionary size increases, confirming the O(1) average-case performance.

Performance Optimization Tips

  1. Choose appropriate keys: Simple immutable types like integers, strings, and tuples of immutable objects make the best keys
  2. Build in one step: CPython offers no public way to reserve a dictionary's capacity in advance, but creating all entries in a single call (for example with dict.fromkeys()) is faster than growing the dictionary one assignment at a time
    d = dict.fromkeys(range(1000))
  3. Use dictionary comprehensions for creation instead of repeated assignments
    # Better:
    d = {i: i*i for i in range(1000)}
    
    # Slower:
    d = {}
    for i in range(1000):
        d[i] = i*i
  4. Cache frequently accessed results: Use dictionaries for memoization to speed up recursive functions (a sketch follows this list)
  5. Avoid repeated lookups: If you need to use a dictionary value multiple times, store it in a variable first
  6. Use get() with a default instead of checking for existence and then accessing
    # Better:
    value = d.get(key, default_value)
    
    # Slower:
    if key in d:
        value = d[key]
    else:
        value = default_value
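
As a sketch of tip 4, here is a recursive Fibonacci memoized with a plain dictionary; the standard library's functools.lru_cache decorator packages the same idea:

# Memoization with a plain dict: cache each result keyed by the argument.
cache = {}

def fib(n):
    if n in cache:
        return cache[n]
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    cache[n] = result
    return result

print(fib(100))  # instant; naive recursion would take astronomically long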

Memory Usage

Dictionaries provide fast access but have memory overhead:

  • Python dictionaries typically use more memory than equivalent lists or tuples
  • Each entry requires space for the key, value, and hash value
  • Dictionaries maintain spare capacity to keep collisions rare (CPython resizes when a table is about two-thirds full, so at least a third of the slots stay empty)
  • In Python 3.6+, a more compact implementation is used that saves about 20-25% memory

Measuring Dictionary Size

import sys

# Empty dictionary
empty_dict = {}
print(f"Empty dict size: {sys.getsizeof(empty_dict)} bytes")

# Dictionary with 1000 integer keys
d1 = {i: i for i in range(1000)}
print(f"Dict with 1000 integer keys: {sys.getsizeof(d1)} bytes")

# Dictionary with 1000 string keys
d2 = {f"key-{i}": i for i in range(1000)}
print(f"Dict with 1000 string keys: {sys.getsizeof(d2)} bytes")

# Note: this only measures the dictionary container itself,
# not the size of keys and values
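
If you want a rough total that includes keys and values, a recursive estimate can be sketched as below; this is only an approximation and ignores many object types and sharing patterns:

import sys

def deep_sizeof(obj, seen=None):
    """Rough recursive size estimate in bytes (a sketch, not exact accounting)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:        # don't count shared objects twice; avoids cycles
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(x, seen) for x in obj)
    return size

d = {f"key-{i}": i for i in range(1000)}
print(f"Container only: {sys.getsizeof(d)} bytes")
print(f"Deep estimate:  {deep_sizeof(d)} bytes")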

Dictionaries vs. Other Data Structures

Operation             Dictionary           List                          Set
Access by index/key   O(1)                 O(1)                          N/A
Search for value      O(n)                 O(n)                          O(1)
Insert/Update         O(1)                 O(1) at end, O(n) elsewhere   O(1)
Delete                O(1)                 O(n)                          O(1)
Memory usage          High                 Low                           Medium
Ordered               Yes (Python 3.7+)    Yes                           No
Mutable               Yes                  Yes                           Yes
Duplicate values      Yes                  Yes                           No

When to use a dictionary: Use dictionaries when you need fast lookup by key, when the data naturally forms key-value pairs, or when you need to frequently modify values associated with keys.