IndexedHeap

Introduction

A heap organizes items into a tree structure and maintians the invariant such that the value of parent is always smaller than both the left child and right child (WLOG, talking about a min-heap).

Python has the built-in library heapq implementation.

The default heap has some constraints/ inconveniences:

You can only retrieve/ pop element at the top.
If you need to update an item, you may want to call internal functions like _siftup() or _siftdown() depending on whether the value is increased or decreased.
You can not directly remove an item in the middle of the heap.
You can not locate an item's position in a heap directly.

Towards this end, I build the IndexedHeap template. It can be used in following scenarios:

Maintain a priority queue, whose item value can be updated. This is handy in an SPFA algorithm. You can make sure there is no duplicate nodes in the queue and at the same time be able to update their value. (example behind)
One element has multiple attributes. Solving the problem requires efficiently calculating minimum of multiple attributes. With the help of indexed heap, you can maintain multiple heaps of ordered attributes and connect them via a common index. You can lookup extreme value from one heap and arbitrarily remove element from another heap via index.

Implementation of IndexedHeap

# min heap
class IndexedHeap:
    '''
    Every element is (index, item) format.
    The index is any arbitrary idenfifier you can use to look up/ pop a heap item
    '''
    def __init__(self):
        self.heap = []
        self.hash = {}

    def extreme_item(self):
        '''
        The bottom element of the heap.
        min by default.
        '''
        return self.heap[0][0]

    def extreme_index(self):
        return self.heap[0][1]

    def pop_item(self):
        item = self.extreme_item()
        self.remove_by_heap_pos(0)
        return item

    def pop_index(self):
        index = self.extreme_index()
        self.remove_by_heap_pos(0)
        return index

    def has_index(self, index):
        return (index in self.hash)

    def size(self):
        return len(self.heap)

    def add(self, index, item):
        self.heap.append((item, index))
        self.hash[index] = len(self.heap) - 1
        pos = self._siftdown(0, len(self.heap) - 1)
        return pos

    def remove(self, index):
        pos = self.hash[index]
        return self.remove_by_heap_pos(pos)

    def remove_by_heap_pos(self, pos):
        removed_item = self.heap[pos]
        del self.hash[removed_item[1]]
        self.heap[pos] = self.heap[-1]
        self.heap.pop() #list pop, remove last element
        if pos < len(self.heap):
            self._siftup(pos)
            self._siftdown(0, pos)

    def update(self, index, item):
        if self.has_index(index):
            self.remove(index)
        self.add(index, item)

    # https://github.com/python/cpython/blob/108e45524d5b2d8aa0d7feb1e593ea061fb36ba4/Lib/heapq.py#L205
    def _siftdown(self, startpos, pos):
        heap, hash = self.heap, self.hash
        newitem = heap[pos]
        # Follow the path to the root, moving parents down until finding a place
        # newitem fits.
        while pos > startpos:
            parentpos = (pos - 1) >> 1
            parent = heap[parentpos]
            if newitem < parent:
                heap[pos] = parent
                hash[parent[1]] = pos
                pos = parentpos
                continue
            break
        heap[pos] = newitem
        hash[newitem[1]] = pos
        return pos

    # https://github.com/python/cpython/blob/108e45524d5b2d8aa0d7feb1e593ea061fb36ba4/Lib/heapq.py#L205
    def _siftup(self, pos):
        heap, hash = self.heap, self.hash
        endpos = len(heap)
        startpos = pos
        newitem = heap[pos]
        # Bubble up the smaller child until hitting a leaf.
        childpos = 2*pos + 1    # leftmost child position
        while childpos < endpos:
            # Set childpos to index of smaller child.
            rightpos = childpos + 1
            if rightpos < endpos and not heap[childpos] < heap[rightpos]:
                childpos = rightpos
            # Move the smaller child up.
            heap[pos] = heap[childpos]
            hash[heap[childpos][1]] = pos
            pos = childpos
            childpos = 2*pos + 1
        # The leaf at pos is empty now.  Put newitem there, and bubble it up
        # to its final resting place (by sifting its parents down).
        heap[pos] = newitem
        hash[newitem[1]] = pos
        return self._siftdown(startpos, pos)

Example: SPFA

Example: LeetCode 2577. Minimum Time to Visit a Cell In a Grid

It is a typical shortest path problem. The key observation is that, once you reach a cell, you can always revisit the cell at even steps later. We can quickly validate if the problem has a solution by looking at cells (0,0), (0,1), (1,0) . If there is no blocking issue to start with, you can reach any cell and it is a matter of time.

Suppose we already know an interim solution (current optimal value) to reach cell (i,j), it is easy to expand to four directions (one cell being (ii, jj)) and determine if going from source to (ii, jj) via (i, j) is more efficient ("relaxing"). If so ("relaxed"), we shall update the current known optimal for (ii, jj) and then put it back to the queue so it can further relax other nodes.

The "queue" mentioned above can be two data structures:

A normal priority queue. In this way, we can always expand from the (i,j) that has minimum known value, giving a better chance of finding optimal for other nodes. This is a good heuristic. Drawback is that a relaxed node (ii, jj) might be already in the priority queue, but we don't have an efficient method to look it up, so the enqueue step results in duplicate values. In this way, the priority queue gets longer and longer and costs more time (logarithmically increase thanks to the heap structure).
A set. In this way, we can ensure no duplicate items in the active set of nodes. However, set pop is not ordered, so there is a good chance that some nodes are relaxed multiple times, because the later popped items have better route on shortest path.

Using an indexed heap, we can nicely solve the above issues. The overall codes are as follows:

class Solution:
    def minimumTime(self, grid: List[List[int]]) -> int:        
        m, n = len(grid), len(grid[0])
        if grid[0][0] > 0:
            return -1
        if grid[0][1] > 1 and grid[1][0] > 1:
            return -1
        opt = []
        for _ in range(m):
            opt.append([float('inf')] * n)
        opt[0][0] = 0

        ih = IndexedHeap()
        ih.update((0, 0), 0)

        dirs = [(1, 0), (0, 1), (-1, 0), (0, -1)]
        while len(ih.hash) > 0:
            i, j = ih.pop_index()
            for di, dj in dirs:
                ii = i + di
                jj = j + dj
                if 0 <= ii < m and 0 <= jj < n:
                    new = opt[i][j] + 1
                    if new >= grid[ii][jj]:
                        pass
                    else:
                        delta = grid[ii][jj] - new
                        if delta % 2 == 1:
                            delta += 1
                        new += delta
                    if new < opt[ii][jj]:
                        opt[ii][jj] = new
                        ih.update((ii, jj), new)
        return opt[m - 1][n - 1]

Example: Connect Two Heaps

Example: LeetCode: 218. The Skyline Problem

Given a list of (left, right, height) tuples representing the buildlings, find the skyline.

We can sort the list by left and apply a scan line algorithm across all left positions to find critical points. The key complexity of this problem lies when there are overlapping buildings. Suppose we maintain a set of current overlapping buildings, there are two sub problems we need to solve:

Given the current scanning horizontal location x, determine if we need to pop out some buildings from the overlapping set. At a given x, we can pop out all the buildings whose right side is smaller than x. In order to find "all buildings whose right < x", we can:
- Find the building with minimum right and test if it is < x:
  - If so, pop the building and continue the process.
  - If not, all the right in our set are >= x, so we don't further pop anything
Query the max height from the overlapping set, combining the with scanning horizontal location x, the point (x, max_height) is a critical point.

The first can be achieved with the min-heap of right. The second can be achived with the max-heap of height. Note that in the first step, we need to pop out items sometimes. It is easy to pop out min item from the heap (as a built in feature of normal heap). However, it becomes not so straightfoward to pop out the same building from the max-heap of height.

This is where the IndexedHeap can come handy.

# Test the indexed heap functionality
# case from https://leetcode.com/problems/the-skyline-problem/
# The skyline problem
buildings = [2,9,10],[3,7,15],[5,12,12],[15,20,10],[19,24,8](https://hash.hupili.net/291037155121215201019248/)
ih_height = IndexedHeap() # min heap; use as max heap
ih_right = IndexedHeap() # min heap; use as min heap
critical_points = []
i = 0
buildings.append((float('inf'), float('inf'), 0))
while i < len(buildings):
    b = buildings[i]
    print(i, b)
    print(ih_right.hash, ih_right.heap)
    print(ih_height.hash, ih_height.heap)
    l, r, h = b
    if ih_right.size() == 0 or ih_right.extreme_item() > l:
        print('add new')
        # l is the next scan line
        x = l
        ih_right.add(i, r)
        ih_height.add(i, (-h, h))
        y = ih_height.extreme_item()[1]
        i += 1
    else:
        print('pop existing')
        x = ih_right.extreme()
        ii = ih_right.extreme_index()
        ih_right.remove(ii)
        ih_height.remove(ii)
        if ih_height.size() == 0:
            y = 0
        else:
            y = ih_height.extreme()[1]
    critical_points.append((x, y))
print(critical_points)

Note on the template

Firstly, the update() method is not the most efficient at implementation level. It takes two \(\log N\) operations to remove and add the item. One can optimise it by directly applying the siftup or siftdown algorithm. This is left to future work.

Second note is about the min/max feature. We only implemented the min-heap here, similar to that of the standard library. If you need a max-heap, the common practice is to put a tuple (-value, value, other_data...) . If this way, it is a max-heap based on value. You can retrieve original value and auxiliary data from the remaining of the tuple.

Third note is that the index can be anything that is hashable, i.e. that can be used as key of collections. It does not have to be an integer (like normal sense of "index"). It can be a tuple.

IndexedHeap

Introduction

Implementation of IndexedHeap

Example: SPFA

Example: Connect Two Heaps

Note on the template

Comments

More from this blog

Vibeo Concept Notes

asyncio wrapper for any sync functions

Read Hackers and Painters again for GenAI

BKTree for similar string search

Quick note of Agent based on GenAI Playbook and Tool on CloudRun

Command Palette

Introduction

Implementation of IndexedHeap

Example: SPFA

Example: Connect Two Heaps

Note on the template

Comments

More from this blog