# Is using “not in” faster than using “in” in Python3?

68
January 01, 2020, at 12:30 PM

Let's say we're solving a simple word count problem. There's a list and we're trying to find the word count of each word occurring in the list. Which pattern is faster here?

``````book_title =  ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_count_dict = {}
``````
pattern 1
``````for word in book_title:
if word in word_count_dict:
word_count_dict[word] += 1
else:
word_count_dict[word] = 1
``````
pattern 2
``````for word in book_title:
if word not in word_counter:
word_counter[word] = 1
else:
word_counter[word] += 1
``````

Six of one, half a dozen of the other. They should be roughly equivalent to each other - in computational terms, a `not` operation is nearly negligible (literally the cheapest possible operation), and the `in` operation in a hashtable like a dictionary runs in constant time (either the hash is there, or it's not). If we were dealing with a list, it would run in linear time, but still the between `in` and `not in`. See also computational complexity of python data structures.

So basically, use whichever makes your code more understandable.

That said, have you considered using a `collections.Counter`, a data structure specifically designed for this purpose?

``````import collections
word_counts = collections.Counter(book_title)
print(word_counts)
# Counter({'great': 2, 'the': 2, 'adventures': 2, 'of': 2, 'expectations': 1, 'sherlock': 1, 'holmes': 1, 'gasby': 1, 'hamlet': 1, 'huckleberry': 1, 'fin': 1})
``````

You can typecast a `collections.Counter` to a `dict` if you need to, and in fact `collections.Counter` is a subclass of `dict`. It even has an `.update()` method specifically designed to work with other counters - if you add another book title, just feed it into a `Counter` and then `.update()` the original with that.

Using `dis`, you can look at the bytecode generated for each of your methods.

The only difference I could see running your code through the disassembler was right at `in` and `not in`, where the difference in bytecode was:

`COMPARE_OP 7 (not in)` or `COMPARE_OP 6 (in)`

And afterwards `POP_JUMP_IF_FALSE` (i.e. continue onto the next instruction for this condition)

All in all, both of the methods seems to have the same amount of instructions, regardless of the comparison returning true or false, and therefore most probably executes equally fast.

There might be, however, some underlying optimizations closer to CPU instructions which would cause one or the other methods to be faster, but I would consider that territory out of scope for this question. If this would be the case, then I believe a simple measure of execution time over a larger list would prove which one is faster.

Execution speed of both instructions, beneath Python bytecode, might differ between Python versions, build, OS or architecture. You'd probably be able to make a small change in the Python source code to make one or the other instruction execute faster.

They have approximately the same cost. Think of `not in` operator as being `in` operator applied first and then a logical `not` applied to that result(which is almost negligible).

For confirmation, here's a little experiment you can perform to measure execution time

``````from time import time
book_title =  ['hi']*100000+['there']*10000
word_count_dict1 = {}
word_count_dict2 = {}
start = time()
for word in book_title:
if word in word_count_dict1:
word_count_dict1[word] += 1
else:
word_count_dict1[word] = 1
print(time()-start)
start = time()
for word in book_title:
if word not in word_count_dict2:
word_count_dict2[word] = 1
else:
word_count_dict2[word] += 1
print(time()-start)
``````

Output (may vary for you)

``````0.021044015884399414
0.02713179588317871
``````

Basically, they have the same cost. From Python 3 reference

The operator `not in` is defined to have the inverse truth value of `in`.

You can check the timing of operation for both pattern and compare yourself.

``````import timeit
def pattern1():
counts = {}
for word in title:
if word in counts:
counts[word] += 1
else:
counts[word] = 1
def pattern2():
counts = {}
for word in title:
if word not in counts:
counts[word] = 1
else:
counts[word] += 1
sample1 = [timeit.timeit(pattern1, number=10000) for _ in range(10)]
sample2 = [timeit.timeit(pattern2, number=10000) for _ in range(10)]
print(sum(sample1) / len(sample1))
# 0.01713230140048836
print(sum(sample2) / len(sample2))
# 0.017954919600015273
``````

As we can see, the difference is negligible.

POPULAR ONLINE

### How do I downgrade my version of python from 3.7.5 to 3.6.5 on ubuntu

So currently, I have ubuntu 19And it comes by default with python 3

2149

### How to read numbers from file and take actions on them in java?

this is my file information:

104

### Failing when inserting generated column of date or time type from a datetime/timstamp source

A generated column is a column containing the result of an expression or function based on another fieldI want a simple date and time field derived from a datetime field, but to insert records I must adopt a workaround

101

### What is the best way to load the bootstrap card with contents dynamically?

I am a beginner in web developmentI need to create a page with bootstrap card dynamically by fetching values from database

97