bisect — 数组二分查找算法

源代码: Lib/bisect.py


This module provides support for maintaining a list in sorted order without having to sort the list after each insertion. For long lists of items with expensive comparison operations, this can be an improvement over linear searches or frequent resorting.

The module is called bisect because it uses a basic bisection algorithm to do its work. Unlike other bisection tools that search for a specific value, the functions in this module are designed to locate an insertion point. Accordingly, the functions never call an __eq__() method to determine whether a value has been found. Instead, the functions only call the __lt__() method and will return an insertion point between values in an array.

定义了以下函数:

bisect.bisect_left(a, x, lo=0, hi=len(a), *, key=None)

a 中找到 x 合适的插入点以维持有序。参数 lohi 可以被用于确定需要考虑的子集;默认情况下整个列表都会被使用。如果 x 已经在 a 里存在,那么插入点会在已存在元素之前(也就是左边)。如果 a 是列表(list)的话,返回值是可以被放在 list.insert() 的第一个参数的。

The returned insertion point ip partitions the array a into two slices such that all(elem < x for elem in a[lo : ip]) is true for the left slice and all(elem >= x for elem in a[ip : hi]) is true for the right slice.

key specifies a key function of one argument that is used to extract a comparison key from each element in the array. To support searching complex records, the key function is not applied to the x value.

If key is None, the elements are compared directly and no key function is called.

在 3.10 版更改: 增加了 key 形参。

bisect.bisect_right(a, x, lo=0, hi=len(a), *, key=None)
bisect.bisect(a, x, lo=0, hi=len(a), *, key=None)

类似于 bisect_left(),但是返回的插入点是 a 中已存在元素 x 的右侧。

The returned insertion point ip partitions the array a into two slices such that all(elem <= x for elem in a[lo : ip]) is true for the left slice and all(elem > x for elem in a[ip : hi]) is true for the right slice.

在 3.10 版更改: 增加了 key 形参。

bisect.insort_left(a, x, lo=0, hi=len(a), *, key=None)

按照已排序顺序将 x 插入到 a 中。

此函数首先会运行 bisect_left() 来定位一个插入点。 然后,它会在 a 上运行 insert() 方法在正确的位置插入 x 以保持排序顺序。

To support inserting records in a table, the key function (if any) is applied to x for the search step but not for the insertion step.

请记住 O(log n) 搜索是由缓慢的 O(n) 抛入步骤主导的。

在 3.10 版更改: 增加了 key 形参。

bisect.insort_right(a, x, lo=0, hi=len(a), *, key=None)
bisect.insort(a, x, lo=0, hi=len(a), *, key=None)

类似于 insort_left(),但是把 x 插入到 a 中已存在元素 x 的右侧。

此函数首先会运行 bisect_right() 来定位一个插入点。 然后,它会在 a 上运行 insert() 方法在正确的位置插入 x 以保持排序顺序。

To support inserting records in a table, the key function (if any) is applied to x for the search step but not for the insertion step.

请记住 O(log n) 搜索是由缓慢的 O(n) 抛入步骤主导的。

在 3.10 版更改: 增加了 key 形参。

性能说明

当使用 bisect()insort() 编写时间敏感的代码时,请记住以下概念。

  • 二分法对于搜索一定范围的值是很高效的。 对于定位特定的值,则字典的性能更好。

  • insort() 函数的时间复杂度为 O(n) 因为对数时间的搜索步骤被线性时间的插入步骤所主导。

  • 这些搜索函数都是无状态的并且会在它们被使用后丢弃键函数的结果。 因此,如果在一个循环中使用搜索函数,则键函数可能会在同一个数据元素上被反复调用。 如果键函数速度不快,请考虑用 functools.cache() 来包装它以避免重复计算。 另外,也可以考虑搜索一个预先计算好的键数组来定位插入点(如下面的示例节所演示的)。

参见

  • Sorted Collections is a high performance module that uses bisect to managed sorted collections of data.

  • SortedCollection recipe 使用 bisect 构建了一个功能完整的多项集类,拥有直观的搜索方法和对键函数的支持。 所有键函数都 是预先计算好的以避免在搜索期间对键函数的不必要的调用。

搜索有序列表

上面的 bisect() 函数对于找到插入点是有用的,但在一般的搜索任务中可能会有点尴尬。下面 5 个函数展示了如何将其转变成有序列表中的标准查找函数

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

def find_lt(a, x):
    'Find rightmost value less than x'
    i = bisect_left(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_le(a, x):
    'Find rightmost value less than or equal to x'
    i = bisect_right(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_gt(a, x):
    'Find leftmost value greater than x'
    i = bisect_right(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

def find_ge(a, x):
    'Find leftmost item greater than or equal to x'
    i = bisect_left(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

例子

函数 bisect() 还可以用于数字表查询。这个例子是使用 bisect() 从一个给定的考试成绩集合里,通过一个有序数字表,查出其对应的字母等级:90 分及以上是 ‘A’,80 到 89 是 ‘B’,以此类推

>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

The bisect() and insort() functions also work with lists of tuples. The key argument can serve to extract the field used for ordering records in a table:

>>> from collections import namedtuple
>>> from operator import attrgetter
>>> from bisect import bisect, insort
>>> from pprint import pprint

>>> Movie = namedtuple('Movie', ('name', 'released', 'director'))

>>> movies = [
...     Movie('Jaws', 1975, 'Speilberg'),
...     Movie('Titanic', 1997, 'Cameron'),
...     Movie('The Birds', 1963, 'Hitchcock'),
...     Movie('Aliens', 1986, 'Scott')
... ]

>>> # Find the first movie released after 1960
>>> by_year = attrgetter('released')
>>> movies.sort(key=by_year)
>>> movies[bisect(movies, 1960, key=by_year)]
Movie(name='The Birds', released=1963, director='Hitchcock')

>>> # Insert a movie while maintaining sort order
>>> romance = Movie('Love Story', 1970, 'Hiller')
>>> insort(movies, romance, key=by_year)
>>> pprint(movies)
[Movie(name='The Birds', released=1963, director='Hitchcock'),
 Movie(name='Love Story', released=1970, director='Hiller'),
 Movie(name='Jaws', released=1975, director='Speilberg'),
 Movie(name='Aliens', released=1986, director='Scott'),
 Movie(name='Titanic', released=1997, director='Cameron')]

If the key function is expensive, it is possible to avoid repeated function calls by searching a list of precomputed keys to find the index of a record:

>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])       # Or use operator.itemgetter(1).
>>> keys = [r[1] for r in data]         # Precompute a list of keys.
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
('blue', 1)
>>> data[bisect_left(keys, 5)]
('red', 5)
>>> data[bisect_left(keys, 8)]
('yellow', 8)