The Most Underused Data Structure in Python
If you’ve been writing Python for a while, you’ve almost certainly written some version of this pattern: loop through a list, check if a key exists in a dictionary, and increment a count. It works, but it’s verbose and easy to get wrong. Python’s standard library has a better tool hiding in plain sight — collections.Counter. It’s a dictionary subclass purpose-built for counting hashable objects, and once you understand what it can do, you’ll wonder how you ever managed without it.
In this article, we’ll go well beyond the basics. You’ll learn how to create counters from different data sources, use arithmetic and set operations on them, solve real-world problems like frequency analysis and inventory tracking, and avoid common pitfalls that trip up intermediate developers.
What Is collections.Counter?
Counter is a subclass of dict that lives in the collections module. Instead of manually building frequency maps, you hand it any iterable and it returns a dictionary-like object where keys are elements and values are their counts. It also comes with a surprisingly rich set of methods for querying, combining, and manipulating counted data.
Let’s start with the manual approach most developers default to:
# The manual way
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
freq = {}
for word in words:
    if word in freq:
        freq[word] += 1
    else:
        freq[word] = 1
print(freq)
# {'apple': 3, 'banana': 2, 'cherry': 1}
Now here’s the same thing with Counter:
from collections import Counter
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
freq = Counter(words)
print(freq)
# Counter({'apple': 3, 'banana': 2, 'cherry': 1})
A single Counter call (plus an import) replaces the six-line counting loop. The counts are the same, and Counter also gives you methods that a plain dictionary doesn’t have. Let’s explore them.
Creating Counters from Different Sources
Counter is flexible about what you feed it. You can create one from a list, a string, a dictionary of pre-existing counts, or even keyword arguments. Each approach is useful in different contexts.
from collections import Counter
# From a string (counts each character)
letter_freq = Counter("mississippi")
print(letter_freq)
# Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
# From a dictionary of existing counts
inventory = Counter({"apples": 12, "oranges": 5, "bananas": 8})
print(inventory)
# Counter({'apples': 12, 'bananas': 8, 'oranges': 5})
# From keyword arguments
colors = Counter(red=4, blue=2, green=1)
print(colors)
# Counter({'red': 4, 'blue': 2, 'green': 1})
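Notice that the printed representation lists elements from most common to least common, with ties shown in first-seen order. That makes a quick print() immediately useful for frequency analysis. The sorting applies only to the repr, though: iterating over a Counter follows insertion order, like any dict, so call most_common() when you need count order in code.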
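To see what that ordering does and does not cover, here is a minimal sketch. The variable names first_seen and by_count are illustrative only. The printed form is sorted by count, while iteration follows first-seen order, as with any dict on Python 3.7+.

```python
from collections import Counter

letter_freq = Counter("mississippi")

# The repr is sorted by count, highest first
print(letter_freq)

# Iteration follows first-seen (insertion) order, like any dict
first_seen = list(letter_freq)
print(first_seen)
# ['m', 'i', 's', 'p']

# For a count-sorted view, ask for it explicitly
by_count = [letter for letter, _ in letter_freq.most_common()]
print(by_count)
# ['i', 's', 'p', 'm']
```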
The most_common() Method
One of Counter’s most practical features is the most_common(n) method, which returns the n most frequent elements as a list of (element, count) tuples. If you omit n, you get all elements sorted by frequency.
from collections import Counter
log_levels = [
    "INFO", "WARNING", "INFO", "ERROR", "INFO",
    "DEBUG", "WARNING", "INFO", "ERROR", "INFO",
    "INFO", "DEBUG", "WARNING", "ERROR", "INFO",
]
level_counts = Counter(log_levels)
# Get the 2 most frequent log levels
print(level_counts.most_common(2))
# [('INFO', 7), ('WARNING', 3)]
# Get all elements sorted by frequency
print(level_counts.most_common())
# [('INFO', 7), ('WARNING', 3), ('ERROR', 3), ('DEBUG', 2)]
This is incredibly handy for dashboards, log analysis, or any scenario where you need a quick “top N” summary. The method returns a list of tuples, so you can easily unpack them or iterate over them in a for loop.
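As a small sketch of both ideas, the snippet below unpacks the tuples in a comprehension and then uses the slicing idiom from the Python documentation to get the n least common elements. The sample counts are made up for illustration.

```python
from collections import Counter

# Illustrative counts, not real log data
level_counts = Counter(INFO=4, WARNING=3, ERROR=2, DEBUG=1)

# Unpack each (element, count) tuple directly in the loop
summary = [f"{level}: {count}" for level, count in level_counts.most_common(2)]
print(summary)
# ['INFO: 4', 'WARNING: 3']

# The n least common elements: reverse-slice the full sorted list
n = 2
least_common = level_counts.most_common()[:-n - 1:-1]
print(least_common)
# [('DEBUG', 1), ('ERROR', 2)]
```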
Arithmetic with Counters
This is where Counter really separates itself from a plain dictionary. You can add, subtract, intersect, and union counters using standard operators. When you add two counters, their counts are summed. When you subtract, counts are reduced. The & operator gives you minimum counts (intersection), while | gives maximum counts (union).
from collections import Counter
store_a = Counter(apples=5, oranges=3, bananas=2)
store_b = Counter(apples=3, oranges=7, grapes=4)
# Addition: combine inventories
print(store_a + store_b)
# Counter({'oranges': 10, 'apples': 8, 'grapes': 4, 'bananas': 2})
# Subtraction: what does store_a have that store_b doesn't?
# Note: zero and negative counts are automatically dropped
print(store_a - store_b)
# Counter({'apples': 2, 'bananas': 2})
# Intersection (minimum counts): items in common
print(store_a & store_b)
# Counter({'apples': 3, 'oranges': 3})
# Union (maximum counts): best of both
print(store_a | store_b)
# Counter({'oranges': 7, 'apples': 5, 'grapes': 4, 'bananas': 2})
A key detail worth noting: the binary operators (+, -, &, and |) only keep positive counts. If a subtraction results in zero or negative values, those keys are silently removed from the result. This is usually the behavior you want, but if you need to preserve negative counts, use the subtract() method instead, which modifies the counter in place.
from collections import Counter
stock = Counter(apples=5, oranges=3)
sold = Counter(apples=7, oranges=1)
# subtract() keeps negative counts
stock.subtract(sold)
print(stock)
# Counter({'oranges': 2, 'apples': -2})
# The - operator would have dropped 'apples' entirely
print(Counter(apples=5, oranges=3) - Counter(apples=7, oranges=1))
# Counter({'oranges': 2})
The subtract() method is useful in inventory management where overselling is a real scenario you need to track rather than ignore.
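Once negative counts are in play, the unary + and - operators are a handy way to split the result. The sketch below assumes a made-up stock counter: +stock keeps only the items still in stock, and -stock flips the signs and keeps only what was oversold.

```python
from collections import Counter

# Hypothetical inventory, for illustration only
stock = Counter(apples=5, oranges=3, pears=2)
stock.subtract(Counter(apples=7, oranges=1, pears=2))
print(stock)
# Counter({'oranges': 2, 'pears': 0, 'apples': -2})

# Unary plus keeps only the strictly positive counts
in_stock = +stock
print(in_stock)
# Counter({'oranges': 2})

# Unary minus negates the counts, then keeps only the positive ones,
# which leaves just the oversold items
oversold = -stock
print(oversold)
# Counter({'apples': 2})
```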
Real-World Use Case: Word Frequency Analysis
Let’s put Counter to work on a practical task: analyzing the most frequent words in a block of text. This kind of problem comes up in search engines, content analysis, and natural language processing pipelines.
from collections import Counter
import re
text = """
Python is a versatile programming language. Python is used
in web development, data science, and automation. Many developers
choose Python because Python has a simple and readable syntax.
"""
# Normalize to lowercase and extract words using regex
words = re.findall(r'[a-z]+', text.lower())
word_counts = Counter(words)
# Top 5 most frequent words
for word, count in word_counts.most_common(5):
    print(f"{word:>15}: {count}")
# Output:
#          python: 4
#              is: 2
#               a: 2
#             and: 2
#       versatile: 1
The combination of re.findall() for tokenization and Counter for frequency counting makes this a concise yet powerful text analysis pipeline. In a real application, you’d likely also filter out stop words (common words like “the”, “is”, “a”) to get more meaningful results.
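Stop-word filtering can be sketched with a generator expression passed straight to Counter. The stop_words set here is a tiny hand-picked assumption for illustration; a real pipeline would more likely take its list from an NLP library or a curated file.

```python
from collections import Counter
import re

text = "Python is a versatile language. Python is used in data science and automation."

# A tiny, hand-picked stop-word set, for illustration only
stop_words = {"is", "a", "in", "and", "the"}

words = re.findall(r"[a-z]+", text.lower())

# Count only the words that are not stop words
meaningful = Counter(word for word in words if word not in stop_words)

print(meaningful.most_common(3))
# [('python', 2), ('versatile', 1), ('language', 1)]
```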
Useful Counter Methods You Should Know
Beyond most_common(), Counter has a few more methods that are worth knowing. The elements() method returns an iterator that repeats each element as many times as its count (skipping counts below one), which is useful for reconstructing the original data, though not its original order. The update() method adds counts from another iterable or counter; unlike dict.update(), it adds to existing counts instead of replacing them. And total(), added in Python 3.10, returns the sum of all counts.
from collections import Counter
c = Counter(a=3, b=1, c=2)
# elements() - repeats each element by its count
print(sorted(c.elements()))
# ['a', 'a', 'a', 'b', 'c', 'c']
# update() - add counts from another iterable
c.update(['a', 'b', 'b', 'd'])
print(c)
# Counter({'a': 4, 'b': 3, 'c': 2, 'd': 1})
# total() - sum of all counts (Python 3.10+)
print(c.total())
# 10
# Accessing missing keys returns 0 (not KeyError!)
print(c['z'])
# 0
That last detail is easy to miss but extremely useful. With a regular dictionary, accessing a missing key raises a KeyError. With Counter, it simply returns 0. This means you never need to check if a key exists before reading its count — you can safely use it in comparisons and calculations without any guard clauses.
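One related detail, shown as a minimal sketch: reading a missing key from a Counter does not add that key. The comparison with defaultdict(int) is included only as a contrast, since it also returns 0 but stores the key as a side effect.

```python
from collections import Counter, defaultdict

c = Counter(a=3)
missing = c["z"]  # returns 0 without creating the key
print(missing, "z" in c, len(c))
# 0 False 1

# defaultdict(int) also returns 0, but the lookup inserts the key
d = defaultdict(int, a=3)
_ = d["z"]
print("z" in d, len(d))
# True 2
```

This means repeated lookups of absent keys will not quietly grow a Counter.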
Common Pitfalls to Watch Out For
Counter is intuitive, but there are a few gotchas that catch intermediate developers off guard.
First, Counter only works with hashable objects. You can count strings, numbers, and tuples, but not lists or dictionaries. If you try to pass a list of lists, you’ll get a TypeError.
Second, setting a count to zero does not remove the key. Counter allows zero and negative counts, so a key assigned 0 still exists and still shows up in membership tests and iteration. Use the del statement when you want to remove the key entirely.
Third, remember that Counter is a dict subclass, so passing it to functions that expect a regular dict will work — but the reverse is not always true. Methods like most_common() and elements() are only available on Counter objects.
from collections import Counter
# Pitfall 1: Unhashable types
try:
    Counter([[1, 2], [3, 4], [1, 2]])
except TypeError as e:
    print(f"Error: {e}")
# Error: unhashable type: 'list'
# Fix: convert to tuples first
print(Counter([tuple(x) for x in [[1, 2], [3, 4], [1, 2]]]))
# Counter({(1, 2): 2, (3, 4): 1})
# Pitfall 2: del vs setting to zero
c = Counter(a=3, b=2)
c['a'] = 0 # key still exists with count 0
print('a' in c) # True
del c['b'] # key is completely removed
print('b' in c) # False
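The third pitfall can be sketched the same way. The plain dictionary below stands in for counts that arrive as an ordinary dict (for example, parsed from JSON); that scenario is an assumption for illustration. Wrapping it in Counter restores the extra methods.

```python
from collections import Counter

plain = {"a": 3, "b": 1}            # hypothetical counts from elsewhere
counts = Counter({"a": 2, "c": 5})

# A Counter is accepted wherever a dict is expected
is_dict = isinstance(counts, dict)
print(is_dict)
# True

# A plain dict has no most_common(); wrap it in Counter first
has_method = hasattr(plain, "most_common")
print(has_method)
# False
top = Counter(plain).most_common(1)
print(top)
# [('a', 3)]
```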
collections.Counter is one of those standard library tools that, once you learn it, makes you question why you ever wrote manual counting loops. It replaces boilerplate dictionary code with a clean, expressive interface and gives you arithmetic, set-style operations, and frequency ranking through most_common() for free.
Whether you’re analyzing log files, tracking inventory, processing text, or comparing datasets, Counter should be your first instinct. The next time you catch yourself writing a for loop with an if-else block just to count things, reach for Counter instead. Your code will be shorter, more readable, and less error-prone.
collections.Counter was originally published in ScriptSerpent on Medium.