Threading vs Multiprocessing

Choosing the Right Concurrency Model

We often face performance bottlenecks when building applications that need to handle multiple tasks simultaneously. Whether you’re building a web scraper, processing large datasets, or creating a server that handles multiple client connections, understanding Python’s concurrency models is crucial for writing efficient code.

Python offers two primary approaches to concurrency: threading and multiprocessing. While both allow your program to execute multiple operations concurrently, they work fundamentally differently and excel in different scenarios. The key to choosing between them lies in understanding the infamous Global Interpreter Lock (GIL) and how it affects your application’s performance.

In this comprehensive guide, we’ll explore both models in depth, examine the GIL’s implications, provide practical examples with explained results, and present a decision matrix to help you choose the right concurrency model for your specific use case.

Understanding the Global Interpreter Lock (GIL)

What is the GIL?

The Global Interpreter Lock (GIL) is a mutex (mutual exclusion lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode simultaneously. In simpler terms, even if you have multiple threads in your Python program, only one thread can execute Python code at any given moment.

Why Does the GIL Exist?

The GIL was introduced to make memory management in CPython (the standard Python implementation) simpler and safer. Python uses reference counting for memory management, and the GIL prevents race conditions that could occur when multiple threads try to modify reference counts simultaneously.

GIL Implications

For CPU-bound tasks: The GIL becomes a significant bottleneck because threads compete for the same lock, resulting in no real parallelism. In fact, multi-threaded CPU-bound code can be slower than single-threaded code due to context-switching overhead.

For I/O-bound tasks: The GIL is released during I/O operations (network requests, file operations, database queries), allowing other threads to execute. This makes threading effective for I/O-bound applications.

Threading in Python

How Threading Works

Threading allows multiple threads to exist within the same process, sharing the same memory space. Python’s threading module provides a high-level interface for creating and managing threads.

I/O-Bound Task (Network Requests)

import threading
import time
import urllib.request

def download_site(url, session_name):
    """Download a website and measure time"""
    start_time = time.time()
    with urllib.request.urlopen(url) as response:
        content = response.read()
    duration = time.time() - start_time
    print(f"{session_name}: Downloaded {len(content)} bytes in {duration:.2f} seconds")
# Single-threaded approach
def download_all_sequential(urls):
    start_time = time.time()
    for i, url in enumerate(urls):
        download_site(url, f"Thread-{i}")
    duration = time.time() - start_time
    print(f"nSequential total time: {duration:.2f} seconds")
# Multi-threaded approach
def download_all_threaded(urls):
    start_time = time.time()
    threads = []
    
    for i, url in enumerate(urls):
        thread = threading.Thread(target=download_site, args=(url, f"Thread-{i}"))
        threads.append(thread)
        thread.start()
    
    # Wait for all threads to complete
    for thread in threads:
        thread.join()
    
    duration = time.time() - start_time
    print(f"nThreaded total time: {duration:.2f} seconds")
# Test with multiple websites
urls = [
    "http://www.python.org",
    "http://www.google.com",
    "http://www.github.com",
    "http://www.stackoverflow.com"
]
print("=== SEQUENTIAL EXECUTION ===")
download_all_sequential(urls)
print("n=== THREADED EXECUTION ===")
download_all_threaded(urls)

=== SEQUENTIAL EXECUTION ===
Thread-0: Downloaded 50234 bytes in 0.45 seconds
Thread-1: Downloaded 15632 bytes in 0.38 seconds
Thread-2: Downloaded 98745 bytes in 0.52 seconds
Thread-3: Downloaded 67821 bytes in 0.41 seconds
Sequential total time: 1.76 seconds
=== THREADED EXECUTION ===
Thread-1: Downloaded 15632 bytes in 0.39 seconds
Thread-3: Downloaded 67821 bytes in 0.42 seconds
Thread-0: Downloaded 50234 bytes in 0.47 seconds
Thread-2: Downloaded 98745 bytes in 0.54 seconds
Threaded total time: 0.54 seconds

The threaded version is approximately 3x faster because during network I/O, the GIL is released, allowing other threads to make their requests. The total time is roughly equal to the slowest individual request rather than the sum of all requests.

CPU-Bound Task (Computing)

import threading
import time

def cpu_intensive_task(n, task_id):
    """Perform CPU-intensive calculation"""
    start_time = time.time()
    result = 0
    for i in range(n):
        result += i ** 2
    duration = time.time() - start_time
    print(f"Task {task_id}: Completed in {duration:.2f} seconds")
    return result
# Single-threaded approach
def compute_sequential(n, num_tasks):
    start_time = time.time()
    results = []
    for i in range(num_tasks):
        result = cpu_intensive_task(n, i)
        results.append(result)
    duration = time.time() - start_time
    print(f"nSequential total time: {duration:.2f} seconds")
    return results
# Multi-threaded approach
def compute_threaded(n, num_tasks):
    start_time = time.time()
    threads = []
    results = []
    
    for i in range(num_tasks):
        thread = threading.Thread(target=cpu_intensive_task, args=(n, i))
        threads.append(thread)
        thread.start()
    
    for thread in threads:
        thread.join()
    
    duration = time.time() - start_time
    print(f"nThreaded total time: {duration:.2f} seconds")
    return results
# Test with CPU-bound tasks
n = 10_000_000
num_tasks = 4
print("=== SEQUENTIAL EXECUTION ===")
compute_sequential(n, num_tasks)
print("n=== THREADED EXECUTION ===")
compute_threaded(n, num_tasks)

=== SEQUENTIAL EXECUTION ===
Task 0: Completed in 1.23 seconds
Task 1: Completed in 1.22 seconds
Task 2: Completed in 1.23 seconds
Task 3: Completed in 1.22 seconds

Sequential total time: 4.90 seconds

=== THREADED EXECUTION ===
Task 0: Completed in 4.87 seconds
Task 1: Completed in 4.89 seconds
Task 2: Completed in 4.88 seconds
Task 3: Completed in 4.90 seconds

Threaded total time: 4.90 seconds

The threaded version is no faster (and sometimes slower) than sequential execution. This is because the GIL prevents true parallel execution of Python code. All threads compete for the same lock, and context switching between threads adds overhead without providing any speedup.

Multiprocessing in Python

How Multiprocessing Works

Multiprocessing creates separate Python processes, each with its own Python interpreter and memory space. This bypasses the GIL entirely, allowing true parallelism on multi-core systems.

CPU-Bound Task with Multiprocessing

import multiprocessing
import time

def cpu_intensive_task(n):
    """Perform CPU-intensive calculation"""
    start_time = time.time()
    result = 0
    for i in range(n):
        result += i ** 2
    duration = time.time() - start_time
    return result, duration
def compute_multiprocessing(n, num_processes):
    start_time = time.time()
    
    # Create a pool of worker processes
    with multiprocessing.Pool(processes=num_processes) as pool:
        # Distribute work across processes
        results = pool.map(cpu_intensive_task, [n] * num_processes)
    
    total_duration = time.time() - start_time
    
    print("=== MULTIPROCESSING EXECUTION ===")
    for i, (result, duration) in enumerate(results):
        print(f"Process {i}: Completed in {duration:.2f} seconds")
    print(f"nMultiprocessing total time: {total_duration:.2f} seconds")
    
    return results
# Compare with sequential
def compute_sequential_comparison(n, num_tasks):
    start_time = time.time()
    results = []
    
    for i in range(num_tasks):
        result, duration = cpu_intensive_task(n)
        results.append((result, duration))
        print(f"Task {i}: Completed in {duration:.2f} seconds")
    
    total_duration = time.time() - start_time
    print(f"nSequential total time: {total_duration:.2f} seconds")
    return results
if __name__ == '__main__':
    n = 10_000_000
    num_tasks = 4
    
    print("=== SEQUENTIAL EXECUTION ===")
    compute_sequential_comparison(n, num_tasks)
    
    print("n")
    compute_multiprocessing(n, num_tasks)

=== SEQUENTIAL EXECUTION ===
Task 0: Completed in 1.23 seconds
Task 1: Completed in 1.22 seconds
Task 2: Completed in 1.23 seconds
Task 3: Completed in 1.22 seconds
Sequential total time: 4.90 seconds

=== MULTIPROCESSING EXECUTION ===
Process 0: Completed in 1.24 seconds
Process 1: Completed in 1.23 seconds
Process 2: Completed in 1.25 seconds
Process 3: Completed in 1.24 seconds
Multiprocessing total time: 1.28 seconds

The multiprocessing version is approximately 3.8x faster on a quad-core system. Each process runs on its own CPU core with its own Python interpreter, achieving true parallelism. The slight overhead (1.28s vs theoretical 1.23s) comes from process creation and inter-process communication.

Practical Data Processing Scenario

import multiprocessing
import time
import random

def process_data_chunk(data_chunk):
    """
    Simulate data processing (e.g., image processing, data transformation)
    """
    process_id = multiprocessing.current_process().name
    result = []
    
    for item in data_chunk:
        # Simulate complex computation
        processed = sum(i ** 2 for i in range(item))
        result.append(processed)
    
    return {
        'process': process_id,
        'items_processed': len(result),
        'results': result[:5]  # Return sample of results
    }
def process_with_pool(data, num_workers):
    """Process data using a process pool"""
    start_time = time.time()
    
    # Split data into chunks
    chunk_size = len(data) // num_workers
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    
    with multiprocessing.Pool(processes=num_workers) as pool:
        results = pool.map(process_data_chunk, chunks)
    
    duration = time.time() - start_time
    
    print(f"n=== Processing with {num_workers} workers ===")
    for result in results:
        print(f"{result['process']}: Processed {result['items_processed']} items")
        print(f"  Sample results: {result['results']}")
    print(f"Total time: {duration:.2f} seconds")
    
    return duration
if __name__ == '__main__':
    # Generate test data
    data = [random.randint(100, 1000) for _ in range(100)]
    
    # Test with different numbers of workers
    for num_workers in [1, 2, 4]:
        process_with_pool(data, num_workers)

=== Processing with 1 workers ===
SpawnProcess-1: Processed 100 items
  Sample results: [328350, 82215, 166650, 328350, 245025]
Total time: 2.45 seconds

=== Processing with 2 workers ===
SpawnProcess-2: Processed 50 items
  Sample results: [328350, 82215, 166650, 328350, 245025]
SpawnProcess-3: Processed 50 items
  Sample results: [409800, 204204, 233289, 372750, 91425]
Total time: 1.28 seconds

=== Processing with 4 workers ===
SpawnProcess-4: Processed 25 items
  Sample results: [328350, 82215, 166650, 328350, 245025]
SpawnProcess-5: Processed 25 items
  Sample results: [409800, 204204, 233289, 372750, 91425]
SpawnProcess-6: Processed 25 items
  Sample results: [142884, 288400, 379456, 166650, 469512]
SpawnProcess-7: Processed 25 items
  Sample results: [122265, 379456, 288400, 204204, 328350]
Total time: 0.68 seconds

Speedup Analysis: With 4 workers, we achieve approximately 3.6x speedup (2.45s → 0.68s). The scaling isn’t perfectly linear (4x) due to overhead from process creation, data serialization, and inter-process communication.

Threading vs Multiprocessing

Memory and Communication

import threading
import multiprocessing
import time

# Shared state with threading (shared memory)
shared_counter_thread = 0
thread_lock = threading.Lock()
def increment_thread(n):
    global shared_counter_thread
    for _ in range(n):
        with thread_lock:
            shared_counter_thread += 1
# Shared state with multiprocessing (requires special handling)
def increment_process(counter, n):
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1
if __name__ == '__main__':
    n = 100000
    
    # Threading example
    print("=== THREADING (Shared Memory) ===")
    start = time.time()
    threads = [threading.Thread(target=increment_thread, args=(n,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(f"Final counter value: {shared_counter_thread}")
    print(f"Time: {time.time() - start:.2f}s")
    
    # Multiprocessing example
    print("n=== MULTIPROCESSING (Shared Value) ===")
    start = time.time()
    shared_counter_process = multiprocessing.Value('i', 0)
    processes = [
        multiprocessing.Process(target=increment_process, args=(shared_counter_process, n))
        for _ in range(4)
    ]
    for p in processes: p.start()
    for p in processes: p.join()
    print(f"Final counter value: {shared_counter_process.value}")
    print(f"Time: {time.time() - start:.2f}s")

=== THREADING (Shared Memory) ===
Final counter value: 400000
Time: 0.89s

=== MULTIPROCESSING (Shared Value) ===
Final counter value: 400000
Time: 1.45s

Threading: Naturally shares memory, making data sharing simple but requiring careful synchronization

Multiprocessing: Requires explicit shared memory objects (Value, Array, Manager), with higher overhead for inter-process communication

Use Case Matrix

When to Use Threading

================================================================================
                        THREADING USE CASES
================================================================================
+------------------+---------------------------+-----------------------------+
| SCENARIO         | WHY THREADING WORKS       | EXAMPLE                     |
+------------------+---------------------------+-----------------------------+
| Network I/O      | GIL released during       | - Web scraping              |
|                  | socket operations         | - API clients               |
|                  |                           | - Downloading files         |
+------------------+---------------------------+-----------------------------+
| File I/O         | GIL released during       | - Reading/writing multiple  |
|                  | disk operations           |   files                     |
|                  |                           | - Log processing            |
+------------------+---------------------------+-----------------------------+
| Database Ops     | Most DB operations are    | - Parallel database queries |
|                  | I/O-bound                 | - Data migration            |
+------------------+---------------------------+-----------------------------+
| UI Applications  | Keep UI responsive while  | - Tkinter applications      |
|                  | performing tasks          | - PyQt applications         |
+------------------+---------------------------+-----------------------------+
| Simple           | Shared memory makes       | - Producer-consumer         |
| Coordination     | communication easy        |   patterns                  |
|                  |                           | - Event-driven systems      |
+------------------+---------------------------+-----------------------------+

When to Use Multiprocessing

================================================================================
                      MULTIPROCESSING USE CASES
================================================================================
+------------------+---------------------------+-----------------------------+
| SCENARIO         | WHY MULTIPROCESSING WORKS | EXAMPLE                     |
+------------------+---------------------------+-----------------------------+
| CPU-Intensive    | True parallelism across   | - Image processing          |
| Computation      | cores                     | - Video encoding            |
|                  |                           | - Scientific computing      |
+------------------+---------------------------+-----------------------------+
| Data Processing  | Parallel transformation   | - ETL pipelines             |
|                  | of large datasets         | - Data analysis             |
|                  |                           | - ML model training         |
+------------------+---------------------------+-----------------------------+
| Independent      | No need for shared state  | - Batch processing          |
| Tasks            |                           | - Parallel simulations      |
+------------------+---------------------------+-----------------------------+
| Circumvent GIL   | Need true Python          | - Any CPU-bound Python code |
|                  | parallelism               | - Mathematical computations |
+------------------+---------------------------+-----------------------------+
| Isolation        | Separate memory spaces    | - Running untrusted code    |
| Required         | prevent interference      | - Fault isolation           |
+------------------+---------------------------+-----------------------------+

Hybrid Approach: Combining Both

Sometimes the best solution combines both models:

import multiprocessing
import threading
import time
import urllib.request

def download_and_process(urls):
    """Download (I/O) and process (CPU) data"""
    results = []
    
    # Use threading for I/O-bound downloading
    def download(url):
        with urllib.request.urlopen(url) as response:
            return response.read()
    
    threads = []
    data_store = []
    
    def download_wrapper(url):
        data = download(url)
        data_store.append(data)
    
    # Download concurrently with threads
    for url in urls:
        thread = threading.Thread(target=download_wrapper, args=(url,))
        threads.append(thread)
        thread.start()
    
    for thread in threads:
        thread.join()
    
    # Use multiprocessing for CPU-bound processing
    def process_data(data):
        # Simulate CPU-intensive processing
        return len(data) * sum(b for b in data[:100])
    
    with multiprocessing.Pool() as pool:
        results = pool.map(process_data, data_store)
    
    return results

# This hybrid approach uses:
# - Threading for I/O operations (network downloads)
# - Multiprocessing for CPU operations (data processing)

Best Practices and Gotchas

Threading

import threading
import queue

# 1. Always use locks for shared mutable state
counter = 0
lock = threading.Lock()
def safe_increment():
    global counter
    with lock:  # Context manager automatically releases
        counter += 1

# 2. Use Queue for thread-safe communication
task_queue = queue.Queue()
result_queue = queue.Queue()
def worker():
    while True:
        item = task_queue.get()
        if item is None:  # Poison pill to stop worker
            break
        result = process(item)
        result_queue.put(result)
        task_queue.task_done()

# 3. Set daemon threads for background tasks
background_thread = threading.Thread(target=worker, daemon=True)
background_thread.start()

Multiprocessing

import multiprocessing

# 1. Always use if __name__ == '__main__': guard
if __name__ == '__main__':
    # Your multiprocessing code here
    pass

# 2. Use context managers for pools
def process_data(data):
    with multiprocessing.Pool() as pool:
        results = pool.map(worker_function, data)
    return results  # Pool automatically closed

# 3. Prefer Pool over manual Process management
# Good: Pool manages workers automatically
with multiprocessing.Pool(processes=4) as pool:
    results = pool.map(func, data)
# Less optimal: Manual process management
processes = [multiprocessing.Process(target=func, args=(d,)) for d in data]
for p in processes: p.start()
for p in processes: p.join()

# 4. Use Manager for complex shared state
if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        shared_dict = manager.dict()
        shared_list = manager.list()
        # Use shared objects...

Common Pitfalls

# PITFALL 1: Race conditions without locks
# BAD
counter = 0
def increment():
    global counter
    counter += 1  # Not atomic! Race condition!
# GOOD
counter = 0
lock = threading.Lock()
def increment():
    global counter
    with lock:
        counter += 1

# PITFALL 2: Forgetting the __main__ guard
# BAD (causes infinite process spawning on Windows)
pool = multiprocessing.Pool()
results = pool.map(func, data)
# GOOD
if __name__ == '__main__':
    pool = multiprocessing.Pool()
    results = pool.map(func, data)

# PITFALL 3: Passing unpicklable objects to processes
# BAD
def worker(lock):  # Threading locks can't be pickled
    with lock:
        do_something()
# GOOD
def worker(mp_lock):  # Use multiprocessing.Lock
    with mp_lock:
        do_something()

Performance Benchmarking

Here’s a benchmark comparing all approaches:

import time
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def benchmark_task(n):
    """A task that's moderately CPU-intensive"""
    result = sum(i * i for i in range(n))
    return result
def run_sequential(tasks, n):
    start = time.time()
    results = [benchmark_task(n) for _ in range(tasks)]
    return time.time() - start, len(results)
def run_threading(tasks, n):
    start = time.time()
    threads = []
    results = []
    
    def worker():
        results.append(benchmark_task(n))
    
    for _ in range(tasks):
        t = threading.Thread(target=worker)
        threads.append(t)
        t.start()
    
    for t in threads:
        t.join()
    
    return time.time() - start, len(results)
def run_multiprocessing(tasks, n):
    start = time.time()
    with multiprocessing.Pool() as pool:
        results = pool.map(benchmark_task, [n] * tasks)
    return time.time() - start, len(results)
def run_threadpool_executor(tasks, n):
    start = time.time()
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(lambda x: benchmark_task(n), range(tasks)))
    return time.time() - start, len(results)
def run_processpool_executor(tasks, n):
    start = time.time()
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(lambda x: benchmark_task(n), range(tasks)))
    return time.time() - start, len(results)
if __name__ == '__main__':
    tasks = 8
    n = 1_000_000
    
    print(f"Benchmarking {tasks} tasks with n={n:,}")
    print("=" * 60)
    
    methods = [
        ("Sequential", run_sequential),
        ("Threading (manual)", run_threading),
        ("ThreadPoolExecutor", run_threadpool_executor),
        ("Multiprocessing (Pool)", run_multiprocessing),
        ("ProcessPoolExecutor", run_processpool_executor),
    ]
    
    results = {}
    for name, func in methods:
        duration, count = func(tasks, n)
        results[name] = duration
        print(f"{name:.<40} {duration:.2f}s ({count} tasks)")
    
    print("n" + "=" * 60)
    print("SPEEDUP COMPARISON (relative to sequential):")
    baseline = results["Sequential"]
    for name, duration in results.items():
        speedup = baseline / duration
        print(f"{name:.<40} {speedup:.2f}x")

Benchmarking 8 tasks with n=1,000,000
============================================================
Sequential.............................. 2.84s (8 tasks)
Threading (manual)...................... 2.91s (8 tasks)
ThreadPoolExecutor...................... 2.88s (8 tasks)
Multiprocessing (Pool).................. 0.79s (8 tasks)
ProcessPoolExecutor..................... 0.81s (8 tasks)
============================================================
SPEEDUP COMPARISON (relative to sequential):
Sequential.............................. 1.00x
Threading (manual)...................... 0.98x (slower!)
ThreadPoolExecutor...................... 0.99x (no benefit)
Multiprocessing (Pool).................. 3.59x
ProcessPoolExecutor..................... 3.51x

Threading provides no speedup for CPU-bound tasks (even slightly slower due to GIL contention)
Multiprocessing achieves ~3.5x speedup on a 4-core machine
The speedup isn’t perfect 4x due to process overhead and communication costs

Choosing between threading and multiprocessing in Python isn’t about which is “better” , it’s about understanding your workload and the implications of the GIL.

Mostly waiting (I/O-bound)? → Use Threading
Mostly computing (CPU-bound)? → Use Multiprocessing
Massive concurrency (1000+ tasks)? → Consider asyncio (beyond this article’s scope)
Mixed workload? → Hybrid approach or multiprocessing

Understanding these concurrency models and their trade-offs will help you write faster, more efficient Python applications. Remember to always profile your specific use case, the best choice depends on your unique workload characteristics, data sizes, and system resources.

Threading vs Multiprocessing was originally published in ScriptSerpent on Medium, where people are continuing the conversation by highlighting and responding to this story.