Leetcode, Data structure and Algorithm
1. Introduction
This is the main file that contains all the problems and their solutions. Each problem is organised as follows:
- Problem number Problem name :problemtag:
- Description (Basic info)
- Examples
- Solution (Tackling process)
- Understanding the problem
- Algorithm (written by me)
- Code (written by me)
- Complexity
- Time complexity
- Space complexity
- Leetcode solution
- Time complexity
- Space complexity
- More analysis (Thoughts)
- General thoughts
- Related problems
2. Problems
Here we start to finish our problems 😈.
2.1. Imports for typing
from typing import List, Tuple, Set
2.2. DONE 1313 Decompress Run-length encoded list array easy
2.2.1. Description
We are given a list nums
of integers representing a list compressed with run-length encoding.
Consider each adjacent pair of elements [freq, val] = [nums[2*i], nums[2*i+1]], i >= 0.
For each such pair, there are freq
elements with value val
concatenated in a sublist.
Concatenate all the sublists from left to right to generate the decompressed list.
Return the decompressed list.
Constraints:
- 2 <= nums.length <= 100
- nums.length % 2 == 0
- 1 <= nums[i] <= 100
2.2.1.1. Examples:
Input: nums = [1,2,3,4]
Output: [2,4,4,4]
Explanation: The first pair [1,2] means we have freq = 1 and val = 2 so we generate the array [2]. The second pair [3,4] means we have freq = 3 and val = 4 so we generate [4,4,4]. At the end the concatenation [2] + [4,4,4] is [2,4,4,4].

Input: nums = [1,1,2,3]
Output: [1,3,3]
2.2.2. Solution
2.2.2.1. Understanding the problem
Just an array problem.
2.2.2.2. Algorithm
We can use for i in range(0, len(nums), 2) and, for each pair, create a sublist consisting of nums[i] copies of nums[i+1].
Finally we concatenate these sublists.
2.2.2.3. Code
<<imports for typing>>
def decompress_RLE_list(nums: List[int]) -> List[int]:
    res = []
    for i in range(0, len(nums), 2):
        sub_lst = [nums[i+1] for j in range(nums[i])]
        # extend appends in place; res = res + sub_lst would re-copy res every iteration
        res.extend(sub_lst)
    return res

print(decompress_RLE_list([1,1,2,3]))
[1, 3, 3]
2.2.2.4. Complexity
2.2.2.4.1. Time complexity:
\(O(M)\text{, where } M=\sum_{i=0,2,4,\ldots}^{len(nums)-2}nums[i]\). The summation corresponds to the outer for loop; we simply sum up the time it takes to build each sublist (this bound assumes the sublists are appended in place, e.g. with extend).
2.2.2.4.2. Space complexity:
\(O(M)\text{, where } M=\sum_{i=0,2,4...}^{len(nums)-2}nums[i]\)
2.2.2.5. Leetcode solution
Not available.
2.2.2.5.1. Time complexity:
2.2.2.5.2. Space complexity:
2.2.3. More analysis
2.2.3.1. General thoughts
To concatenate lists in Python:
lst1 = [1,2,3]
lst2 = [3,3,3]
print(lst1+lst2)
[1, 2, 3, 3, 3, 3]
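A note on cost: lst1 + lst2 always allocates a new list, so res = res + sub_lst inside a loop re-copies the accumulator every iteration, while list.extend (or itertools.chain) keeps the total work linear. A small sketch:

```python
from itertools import chain

lst1 = [1, 2, 3]
lst2 = [3, 3, 3]

# '+' allocates a brand-new list; the operands are untouched.
combined = lst1 + lst2

# extend() appends in place in amortized O(len(lst2)),
# so it avoids re-copying the accumulator inside a loop.
acc = []
for part in (lst1, lst2):
    acc.extend(part)

# itertools.chain concatenates lazily, without building
# intermediate lists until list() is called.
chained = list(chain(lst1, lst2))

print(combined == acc == chained)  # True
```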
2.2.3.2. Related problems
2.3. DONE 1295 Find numbers with even number of digits array easy
2.3.1. Description
Given an array nums
of integers, return how many of them contain an even number of digits.
Constraints:
- 1 <= nums.length <= 500
- 1 <= nums[i] <= 10^5
2.3.1.1. Examples:
Input: nums = [12,345,2,6,7896]
Output: 2
Explanation: 12 contains 2 digits (even number of digits). 345 contains 3 digits (odd number of digits). 2 contains 1 digit (odd number of digits). 6 contains 1 digit (odd number of digits). 7896 contains 4 digits (even number of digits). Therefore only 12 and 7896 contain an even number of digits.

Input: nums = [555,901,482,1771]
Output: 1
Explanation: Only 1771 contains an even number of digits.
2.3.2. Solution
2.3.2.1. Understanding the problem
2.3.2.2. Algorithm
We simply loop the nums
and check each number.
We can use len(str(num)) to get the number of digits of a number.
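Besides len(str(num)), the digit count can also be computed arithmetically; a small comparison sketch (the function names are mine):

```python
import math

def digit_count_str(num: int) -> int:
    # The approach used in the solution: string conversion.
    return len(str(num))

def digit_count_div(num: int) -> int:
    # Pure-arithmetic alternative: repeatedly strip the last digit.
    count = 0
    while num > 0:
        num //= 10
        count += 1
    return count

def digit_count_log(num: int) -> int:
    # O(1) via log10; exact for this problem's range (num <= 10^5),
    # though floating point can misbehave for very large integers.
    return int(math.log10(num)) + 1

for n in (1, 9, 10, 99, 100, 7896, 100000):
    assert digit_count_str(n) == digit_count_div(n) == digit_count_log(n)
```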
2.3.2.3. Code
<<imports for typing>>
def find_numbers(nums: List[int]) -> int:
    res = 0
    for ele in nums:
        if len(str(ele)) % 2 == 0:
            res += 1
    return res

print(find_numbers([1,2,3]))
print(find_numbers([11,2,3]))
0
1
2.3.2.4. Complexity
2.3.2.4.1. Time complexity:
\(O(N)\).
2.3.2.4.2. Space complexity:
\(O(1)\): only a constant number of extra variables (plus a short temporary string per element) is used.
2.3.2.5. Leetcode solution
Not available.
2.3.2.5.1. Time complexity:
2.3.2.5.2. Space complexity:
2.3.3. More analysis
2.3.3.1. General thoughts
Pretty straightforward.
2.3.3.2. Related problems
2.4. DONE 1365 How many numbers are smaller than the current number array easy
2.4.1. Description
Given the array nums
, for each nums[i]
, find out how many numbers in the array are smaller than it.
That is, for each nums[i], you have to count the number of valid j's such that j != i and nums[j] < nums[i].
Return the answer in an array.
Constraints:
- 2 <= nums.length <= 500
- 0 <= nums[i] <= 100
2.4.1.1. Examples:
Input: nums = [8,1,2,2,3]
Output: [4,0,1,1,3]
Explanation: For nums[0]=8 there exist four smaller numbers than it (1, 2, 2 and 3). For nums[1]=1 there does not exist any smaller number. For nums[2]=2 there exists one smaller number (1). For nums[3]=2 there exists one smaller number (1). For nums[4]=3 there exist three smaller numbers (1, 2 and 2).

Input: nums = [6,5,4,8]
Output: [2,1,0,3]

Input: nums = [7,7,7,7]
Output: [0,0,0,0]
2.4.2. Solution
2.4.2.1. Understanding the problem
2.4.2.2. Algorithm
We first sort nums
, and get sorted_nums
.
We loop through sorted_nums
, and get smaller_count
mapping in the form of {num: smaller_count}
.
We then loop nums
and get smaller_count
of each num
, store them in res
.
2.4.2.3. Code
<<imports for typing>>
def smaller_numbers_than_current(nums: List[int]) -> List[int]:
    import math
    sorted_nums = sorted(nums)
    smaller_counts = {}
    for i, e in enumerate(sorted_nums):
        smaller_counts[e] = min(smaller_counts.get(e, math.inf), i)
    res = []
    for num in nums:
        res.append(smaller_counts[num])
    return res

print(smaller_numbers_than_current([2,2,3,4,1]))
[1, 1, 3, 4, 0]
2.4.2.4. Complexity
2.4.2.4.1. Time complexity:
\(O(N\log{N})\).
2.4.2.4.2. Space complexity:
\(O(N)\).
2.4.2.5. Leetcode solution
Not available.
2.4.2.5.1. Time complexity:
2.4.2.5.2. Space complexity:
2.4.3. More analysis
2.4.3.1. General thoughts
\(O(N\log N)\) is the best we can do with comparison sorting; however, since the constraints bound the values to 0 <= nums[i] <= 100, a counting / prefix-sum approach achieves \(O(N+K)\) with \(K=101\).
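A sketch of the counting / prefix-sum approach permitted by the value bound 0 <= nums[i] <= 100 (the function name is mine):

```python
from typing import List

def smaller_numbers_counting(nums: List[int]) -> List[int]:
    # The constraint 0 <= nums[i] <= 100 lets us bucket-count values.
    K = 101
    count = [0] * K
    for v in nums:
        count[v] += 1
    # prefix[v] = how many elements are strictly smaller than v
    prefix = [0] * K
    for v in range(1, K):
        prefix[v] = prefix[v - 1] + count[v - 1]
    return [prefix[v] for v in nums]

print(smaller_numbers_counting([8, 1, 2, 2, 3]))  # [4, 0, 1, 1, 3]
```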
2.4.3.2. Related problems
2.5. DONE 1409 Queries on a permutation with key array medium
2.5.1. Description
Given the array queries
of positive integers between 1
and m
, you have to process all queries[i]
(from i=0
to i=queries.length-1
) according to the following rules:
- In the beginning, you have the permutation P=[1,2,3,...,m].
- For the current i, find the position of queries[i] in the permutation P (indexing from 0) and then move it to the beginning of the permutation P. Notice that the position of queries[i] in P is the result for queries[i].
Return an array containing the result for the given queries
.
Constraints:
- 1 <= m <= 10^3
- 1 <= queries.length <= m
- 1 <= queries[i] <= m
2.5.1.1. Examples:
Input: queries = [3,1,2,1], m = 5
Output: [2,1,2,1]
Explanation: The queries are processed as follows:
For i=0: queries[i]=3, P=[1,2,3,4,5], position of 3 in P is 2, then we move 3 to the beginning of P resulting in P=[3,1,2,4,5].
For i=1: queries[i]=1, P=[3,1,2,4,5], position of 1 in P is 1, then we move 1 to the beginning of P resulting in P=[1,3,2,4,5].
For i=2: queries[i]=2, P=[1,3,2,4,5], position of 2 in P is 2, then we move 2 to the beginning of P resulting in P=[2,1,3,4,5].
For i=3: queries[i]=1, P=[2,1,3,4,5], position of 1 in P is 1, then we move 1 to the beginning of P resulting in P=[1,2,3,4,5].
Therefore, the array containing the result is [2,1,2,1].

Input: queries = [4,1,2,2], m = 4
Output: [3,1,2,0]

Input: queries = [7,5,5,8,3], m = 8
Output: [6,5,0,7,5]
2.5.2. Solution
2.5.2.1. Understanding the problem
2.5.2.2. Algorithm
We can use brute force.
We loop through queries for i, e in enumerate(queries)
, then have an inner loop for j, e_p in enumerate(p_copy)
, when e_p==e
, we do P.pop(j)
, P.insert(0, e_p)
, and res.append(j)
.
2.5.2.3. Code
<<imports for typing>>
def process_queries(queries: List[int], m: int) -> List[int]:
    res = []
    p = list(range(1, m+1))
    for i, e in enumerate(queries):
        p_copy = p.copy()
        for j, e_p in enumerate(p_copy):
            if e_p == e:
                p.pop(j)
                p.insert(0, e_p)
                res.append(j)
    return res

print(process_queries([3,1,2,1],5))
[2, 1, 2, 1]
2.5.2.4. Complexity
2.5.2.4.1. Time complexity:
\(O(N\times M)\text{ , where }N=len(queries), M=m\).
2.5.2.4.2. Space complexity:
\(O(M)\).
2.5.2.5. Leetcode solution
Not available.
Here is one interesting solution from the discussion forum.
<<imports for typing>>
def process_queries(queries: List[int], m: int) -> List[int]:
    # the original code uses def process(arr, idx), which is quite
    # confusing as we are trying to find the idx of an element
    def process(arr, elem):
        ans = arr.index(elem)
        arr.insert(0, arr.pop(ans))
        return ans
    m = [x for x in range(1, m+1)]
    return [process(m, i) for i in queries]

print(process_queries([3,1,2,1],5))
[2, 1, 2, 1]
2.5.2.5.1. Time complexity:
\(O(N\times M)\text{ , where }N=len(queries), M=m\).
2.5.2.5.2. Space complexity:
\(O(M)\).
2.5.3. More analysis
2.5.3.1. General thoughts
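The \(O(N\times M)\) brute force can be brought down to \(O((N+M)\log(N+M))\) with a Fenwick (binary indexed) tree: keep the permutation in an array of q+m slots with spare slots at the front, track each value's slot, and answer each query with a prefix sum of occupied slots. This sketch, including the BIT helper and its naming, is my own construction rather than an official solution:

```python
from typing import List

class BIT:
    """Fenwick tree over 1-indexed positions: point updates and
    prefix sums in O(log n)."""
    def __init__(self, n: int):
        self.n = n
        self.tree = [0] * (n + 1)

    def add(self, i: int, delta: int) -> None:
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i: int) -> int:
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

def process_queries_bit(queries: List[int], m: int) -> List[int]:
    q = len(queries)
    bit = BIT(q + m)
    pos = {}  # current slot of each value; slots 1..q are spare front slots
    for v in range(1, m + 1):
        pos[v] = q + v
        bit.add(q + v, 1)  # mark the slot occupied
    front = q  # next spare slot at the front
    res = []
    for v in queries:
        p = pos[v]
        # occupied slots strictly before p = current index of v in P
        res.append(bit.prefix_sum(p - 1))
        bit.add(p, -1)      # vacate the old slot
        pos[v] = front
        bit.add(front, 1)   # occupy a fresh front slot
        front -= 1
    return res

print(process_queries_bit([3, 1, 2, 1], 5))  # [2, 1, 2, 1]
```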
2.5.3.2. Related problems
2.6. 1329 Sort the matrix diagonally
This is based on an incorrect understanding of the problem description. See 2.7 for a correct understanding and solution.
2.6.1. Description
Given an m*n matrix mat of integers, sort it diagonally in ascending order from the top-left to the bottom-right, then return the sorted matrix.
Constraints:
- m == mat.length
- n == mat[i].length
- 1 <= m, n <= 100
- 1 <= mat[i][j] <= 100
2.6.1.1. Examples:
Input: mat = [[3,3,1,1],[2,2,1,2],[1,1,1,2]] Output: [[1,1,1,1],[1,2,2,2],[1,2,3,3]]
2.6.2. Solution
2.6.2.1. Understanding the problem
2.6.2.2. Algorithm
We first concatenate all rows of the matrix, then sort the result to get sorted_lst.
We then create a new empty m*n matrix res filled with None.
We loop through sorted_lst, filling the first row of res, then the first column, then the second row, the second column, and so on; finally we return res.
Recall that a 2D array can be mapped to a 1D array.
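The row-major mapping between 2D and 1D indices can be sketched as (the names are mine):

```python
m, n = 3, 4  # rows, columns

def to_flat(row: int, col: int) -> int:
    # Row-major order: each full row contributes n cells.
    return row * n + col

def to_2d(idx: int):
    # Inverse mapping: (row, col) = divmod(idx, n)
    return divmod(idx, n)

assert to_flat(1, 2) == 6
assert to_2d(6) == (1, 2)
# Round trip over every cell of the matrix
assert all(to_2d(to_flat(r, c)) == (r, c) for r in range(m) for c in range(n))
```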
2.6.2.3. Code
<<imports for typing>>
def diagonal_sort(mat: List[List[int]]) -> List[List[int]]:
    import itertools
    m = len(mat)
    n = len(mat[0])
    res = [[None for i in range(n)] for j in range(m)]
    sorted_lst = sorted(itertools.chain(*mat))
    cur_col = 0
    cur_row = 0
    idx = 0
    while cur_col < n or cur_row < m:
        if cur_col < n:
            for moving_col in range(cur_col, n):
                res[cur_row][moving_col] = sorted_lst[idx]
                idx += 1
        # cur_row+1 so that it does not overwrite
        # the first data in the column
        if cur_row < m:
            for moving_row in range(cur_row+1, m):
                res[moving_row][cur_col] = sorted_lst[idx]
                idx += 1
        cur_col = min(n-1, cur_col+1)
        cur_row = min(m-1, cur_row+1)
        if cur_col == n-1 and cur_row == m-1 and res[m-1][n-1] is not None:
            break
    return res

#print(diagonal_sort([[2,1],[3,2],[3,2]]))
#print(diagonal_sort([[1,2,1],[3,2,4],[7,7,4]]))
#print(diagonal_sort([[3,3,1,1],[2,2,1,2],[1,1,1,2]]))
print(diagonal_sort(
    [[11,25,66,1,69,7],[23,55,17,45,15,52],[75,31,36,44,58,8],[22,27,33,25,68,4],[84,28,14,11,5,50]]
))
[[1, 4, 5, 7, 8, 11], [11, 22, 23, 25, 25, 27], [14, 28, 36, 44, 45, 50], [15, 31, 52, 58, 66, 68], [17, 33, 55, 69, 75, 84]]
2.6.2.4. Complexity
2.6.2.4.1. Time complexity:
2.6.2.4.2. Space complexity:
2.6.2.5. Leetcode solution
2.6.2.5.1. Time complexity:
2.6.2.5.2. Space complexity:
2.6.3. More analysis
2.6.3.1. General thoughts
2.6.3.2. Related problems
2.7. DONE 1329 Sort the matrix diagonally 2 array medium
2.7.1. Description
Given an m*n matrix mat of integers, sort it diagonally in ascending order from the top-left to the bottom-right, then return the sorted matrix.
Constraints:
- m == mat.length
- n == mat[i].length
- 1 <= m, n <= 100
- 1 <= mat[i][j] <= 100
2.7.1.1. Examples:
Input: mat = [[3,3,1,1],[2,2,1,2],[1,1,1,2]] Output: [[1,1,1,1],[1,2,2,2],[1,2,3,3]]
2.7.2. Solution
2.7.2.1. Understanding the problem
2.7.2.2. Algorithm
With the existing mat, we want to sort each diagonal so that it ascends from top-left to bottom-right. We do not want to sort the entire matrix.
We first need the starting cell (x, y) of each diagonal, i.e. the cells of the first row and the first column.
From a starting cell, we can walk the diagonal by repeatedly adding 1 to both x and y.
We then sort each diagonal list and put it back into mat.
2.7.2.3. Code
<<imports for typing>>
def diagonal_sort(mat: List[List[int]]) -> List[List[int]]:
    n = len(mat)
    m = len(mat[0])
    def sort_list(head: Tuple[int]) -> None:
        res = []
        x, y = head
        while x < n and y < m:
            res.append(mat[x][y])
            x += 1
            y += 1
        res.sort()
        res = iter(res)
        x, y = head
        while x < n and y < m:
            mat[x][y] = next(res)
            x += 1
            y += 1
    # only cells of the first row and the first column start a diagonal
    diagonal_heads = [(x, 0) for x in range(n)] + [(0, y) for y in range(1, m)]
    for head in diagonal_heads:
        sort_list(head)
    return mat

print(diagonal_sort([[4,2,3],[1,3,4]]))
print(diagonal_sort([[2,1],[3,2],[3,2]]))
print(diagonal_sort([[1,2,1],[3,2,4],[7,7,4]]))
print(diagonal_sort([[3,3,1,1],[2,2,1,2],[1,1,1,2]]))
print(diagonal_sort(
    [[11,25,66,1,69,7],[23,55,17,45,15,52],[75,31,36,44,58,8],[22,27,33,25,68,4],[84,28,14,11,5,50]]
))
[[3, 2, 3], [1, 4, 4]]
[[2, 1], [2, 2], [3, 3]]
[[1, 2, 1], [3, 2, 4], [7, 7, 4]]
[[1, 1, 1, 1], [1, 2, 2, 2], [1, 2, 3, 3]]
[[5, 17, 4, 1, 52, 7], [11, 11, 25, 45, 8, 69], [14, 23, 25, 44, 58, 15], [22, 27, 31, 36, 50, 66], [84, 28, 75, 33, 55, 68]]
2.7.2.4. Complexity
2.7.2.4.1. Time complexity:
This is a bit hard to analyse, but the basic idea is to add up the time taken to sort all diagonal lists. The total number of such lists is \(m+n-1\). The number of diagonals of the maximum length \(\min(m,n)\) is \(|n-m|+1\). The number of remaining lists is \(m+n-1-(|n-m|+1)=2\times(\min(m,n)-1)\).
Therefore, the total time required by the algorithm is:
\(O(2\times (\sum_{i=1}^{\min(n,m)-1} i\log i)+(|n-m|+1)\times \min(m,n)\times \log{\min(m,n)})\), which is bounded by \(O(mn\log{\min(m,n)})\).
2.7.2.4.2. Space complexity:
\(O(\min(m,n))\), for the temporary list holding the longest diagonal.
2.7.2.5. Leetcode solution
Not available.
Interesting one here.
def diagonal_sort(mat):
    m, n = len(mat), len(mat[0])
    def sort(i, j):
        # materialise the coordinates: in the original, the zip iterator
        # was exhausted by sorted(), so the write-back loop saw nothing
        ij = list(zip(range(i, m), range(j, n)))
        vals = iter(sorted(mat[i][j] for i, j in ij))
        for i, j in ij:
            mat[i][j] = next(vals)
    for i in range(m):
        sort(i, 0)
    for j in range(n):  # was range(m), which misses columns when n > m
        sort(0, j)
    return mat

print(diagonal_sort([[4,2,3],[1,3,4]]))
print(diagonal_sort([[2,1],[3,2],[3,2]]))
print(diagonal_sort([[1,2,1],[3,2,4],[7,7,4]]))
print(diagonal_sort([[3,3,1,1],[2,2,1,2],[1,1,1,2]]))
print(diagonal_sort(
    [[11,25,66,1,69,7],[23,55,17,45,15,52],[75,31,36,44,58,8],[22,27,33,25,68,4],[84,28,14,11,5,50]]
))
[[3, 2, 3], [1, 4, 4]]
[[2, 1], [2, 2], [3, 3]]
[[1, 2, 1], [3, 2, 4], [7, 7, 4]]
[[1, 1, 1, 1], [1, 2, 2, 2], [1, 2, 3, 3]]
[[5, 17, 4, 1, 52, 7], [11, 11, 25, 45, 8, 69], [14, 23, 25, 44, 58, 15], [22, 27, 31, 36, 50, 66], [84, 28, 75, 33, 55, 68]]
2.7.2.5.1. Time complexity:
2.7.2.5.2. Space complexity:
2.7.3. More analysis
2.7.3.1. General thoughts
2.7.3.2. Related problems
2.8. DONE 1395 Count number of teams array medium
2.8.1. Description
There are n
soldiers standing in a line. Each soldier is assigned a unique rating
value.
You have to form a team of 3 soldiers amongst them under the following rules:
- Choose 3 soldiers with indices (i, j, k) and ratings (rating[i], rating[j], rating[k]).
- A team is valid if rating[i] < rating[j] < rating[k] or rating[i] > rating[j] > rating[k], where 0 <= i < j < k < n.
Soldiers can be part of multiple teams.
Return the number of teams you can form given the conditions.
Constraints:
- n == rating.length
- 1 <= n <= 200
- 1 <= rating[i] <= 10^5
2.8.1.1. Examples:
Input: rating = [2,5,3,4,1]
Output: 3
Explanation: We can form three teams given the conditions. (2,3,4), (5,4,1), (5,3,1).

Input: rating = [2,1,3]
Output: 0
Explanation: We can't form any team given the conditions.

Input: rating = [1,2,3,4]
Output: 4
2.8.2. Solution
2.8.2.1. Understanding the problem
2.8.2.2. Algorithm
We return 0 for all n < 3.
This is a combination problem. We need to try all combinations of the list.
2.8.2.3. Code
<<imports for typing>>
def num_teams(rating: List[int]) -> int:
    if len(rating) < 3:
        return 0
    from itertools import combinations
    count = 0
    for comb in combinations(rating, 3):
        if comb[0] < comb[1] < comb[2] or comb[0] > comb[1] > comb[2]:
            count += 1
    return count

print(num_teams([2,5,3,4,1]))
print(num_teams([2,1,3]))
print(num_teams([1,2,3,4]))
print(num_teams([0,1]))
3
0
4
0
2.8.2.4. Complexity
2.8.2.4.1. Time complexity:
\(O(C_{n}^{3})\), i.e. \(O(n^3)\).
2.8.2.4.2. Space complexity:
\(O(1)\).
2.8.2.5. Leetcode solution
Not available.
This following is adapted from this Java solution.
<<imports for typing>>
def num_teams(rating: List[int]) -> int:
    ans = 0
    n = len(rating)
    biggerLeft, biggerRight = {}, {}
    for i in range(n-1):
        for j in range(i+1, n):
            if rating[i] < rating[j]:
                biggerRight[i] = biggerRight.get(i, 0) + 1
            elif rating[i] > rating[j]:
                biggerLeft[j] = biggerLeft.get(j, 0) + 1
    for i in range(n-1):
        for j in range(i+1, n):
            if rating[i] < rating[j]:
                ans += biggerRight.get(j, 0)
            elif rating[i] > rating[j]:
                ans += biggerLeft.get(i, 0)
    return ans

print(num_teams([2,5,3,4,1]))
print(num_teams([2,1,3]))
print(num_teams([1,2,3,4]))
print(num_teams([0,1]))
3
0
4
0
2.8.2.5.1. Time complexity:
\(O(N^2)\).
2.8.2.5.2. Space complexity:
\(O(N)\).
2.8.3. More analysis
2.8.3.1. General thoughts
Brute-forcing this problem should be avoided as this is a medium-level problem. We should try to use a better algorithm to solve it.
2.8.2.5 is a neat solution to the problem and a special case of a more general solution, which is provided here. (A caveat: for team_len > 3 the combination count does not force the extra drawn soldiers to be in increasing/decreasing order among themselves, so the generalisation is only guaranteed correct for team_len = 3; the test below happens to use a fully increasing rating list.)
<<imports for typing>>
def calc_combinations(n: int, k: int) -> int:
    """
    n: total number of items
    k: number of items to be drawn
    """
    calculated = {}
    def factorial(n: int) -> int:
        if n == 0 or n == 1:
            return 1
        if n not in calculated.keys():
            calculated[n] = n * factorial(n-1)
        return calculated[n]
    if n < k:
        return 0
    return int(factorial(n)/factorial(k)/factorial(n-k))

def num_teams(rating: List[int], team_len: int = 3) -> int:
    ans = 0
    n = len(rating)
    biggerLeft, biggerRight = {}, {}
    for i in range(n-1):
        for j in range(i+1, n):
            if rating[i] < rating[j]:
                biggerRight[i] = biggerRight.get(i, 0) + 1
            elif rating[i] > rating[j]:
                biggerLeft[j] = biggerLeft.get(j, 0) + 1
    for i in range(n-1):
        for j in range(i+1, n):
            if rating[i] < rating[j]:
                # team_len-2 gives the remaining numbers that need to be drawn
                # from biggerRight.get(j, 0) given rating[i] and rating[j]
                ans += calc_combinations(biggerRight.get(j, 0), team_len-2)
            elif rating[i] > rating[j]:
                ans += calc_combinations(biggerLeft.get(i, 0), team_len-2)
    return ans

print(num_teams([2,7,8,9,10], 4))
print(num_teams([2,1,3]))
print(num_teams([1,2,3,4]))
print(num_teams([0,1]))
5
0
4
0
2.8.3.2. Related problems
2.9. DONE 442 Find all duplicates in an array array medium
2.9.1. Description
Given an array of integers, \(1\le a[i] \le n, n=\text{size of array}\), some elements appear twice and others appear once.
Find all the elements that appear twice in this array.
Constraints: Do it without extra space and in \(O(n)\) runtime.
2.9.1.1. Examples:
Input: [4,3,2,7,8,2,3,1]
Output: [2,3]
2.9.2. Solution
2.9.2.1. Understanding the problem
2.9.2.2. Algorithm
See Discussion.
\(\because\): Given the condition \(\max(arr)\le \text{len}(arr)\), we know that each element of the array can be mapped to an index of the array.
We also know that a number in the array occurs either once or twice.
\(\therefore\): while iterating the array, for each value v we negate the element at index v-1 the first time we encounter v. If we never see v again, we can safely ignore it (it occurred once). If we encounter v again, the element at index v-1 is already negative, so we add v to the final result (it occurred twice).
2.9.2.3. Code
<<imports for typing>>
def find_duplicates(nums: List[int]) -> List[int]:
    res = []
    for n in nums:
        if nums[abs(n)-1] < 0:
            res.append(abs(n))
        else:
            nums[abs(n)-1] *= -1
    return res

print(find_duplicates([1,2,3,3]))
[3]
2.9.2.4. Complexity
2.9.2.4.1. Time complexity:
\(O(N)\).
2.9.2.4.2. Space complexity:
\(O(1)\), excluding the returned result res. Note that the input array is modified in place; it can be restored afterwards by taking absolute values.
2.9.2.5. Leetcode solution
Not available.
The most interesting idea has been provided in the previous section.
<<imports for typing>>
2.9.2.5.1. Time complexity:
2.9.2.5.2. Space complexity:
2.9.3. More analysis
2.9.3.1. General thoughts
2.9.3.2. Related problems
2.10. DONE 78 Subsets array medium
2.10.1. Description
Given a set of distinct integers, nums
, return all possible subsets (the power set).
Constraints: The solution set must not contain duplicate subsets.
2.10.1.1. Examples:
Input: nums = [1,2,3]
Output: [ [3], [1], [2], [1,2,3], [1,3], [2,3], [1,2], [] ]
2.10.2. Solution
2.10.2.1. Understanding the problem
2.10.2.2. Algorithm
We can use Python's itertools.combinations
.
2.10.2.3. Code
<<imports for typing>>
def subsets(nums: List[int]) -> List[List[int]]:
    from itertools import combinations, chain
    res = [combinations(nums, i) for i in range(1, len(nums) + 1)]
    return list(map(list, chain(*res))) + [[]]

print(subsets([1,2,3]))
[[1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3], []]
2.10.2.4. Complexity
2.10.2.4.1. Time complexity:
\(O(N\times 2^N)\): there are \(2^N\) subsets and each can contain up to \(N\) elements.
2.10.2.4.2. Space complexity:
\(O(N\times 2^N)\), to hold the output.
2.10.2.5. Leetcode solution
Bitmask solution.
The idea is that we map each subset to a bitmask of length n, where a 1 in the ith position of the bitmask means the presence of nums[i] in the subset, and a 0 means its absence.
<<imports for typing>>
def subsets(nums: List[int]) -> List[List[int]]:
    n = len(nums)
    output = []
    # nth_bit = "1000"
    nth_bit = 1 << n
    for i in range(2**n):
        # generate bit mask, from 0..00 to 1..11
        bitmask = bin(i | nth_bit)[3:]
        # append subset corresponding to that bit mask
        output.append([nums[j] for j in range(n) if bitmask[j] == '1'])
    return output

print(subsets([1,2,3]))
[[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
2.10.2.5.1. Time complexity:
\(O(N\times 2^N)\): the loop runs \(2^N\) times and each iteration does \(O(N)\) work.
2.10.2.5.2. Space complexity:
\(O(N\times 2^N)\), to hold the output.
2.11. DONE 48 Rotate image array medium
2.11.1. Description
You are given an n*n
2D matrix representing an image.
Rotate the image 90 degrees (clockwise).
Constraints:
You have to rotate the image in-place.
2.11.1.1. Examples:
Given input matrix = [ [1,2,3], [4,5,6], [7,8,9] ], rotate the input matrix in-place such that it becomes: [ [7,4,1], [8,5,2], [9,6,3] ]

Given input matrix = [ [ 5, 1, 9,11], [ 2, 4, 8,10], [13, 3, 6, 7], [15,14,12,16] ], rotate the input matrix in-place such that it becomes: [ [15,13, 2, 5], [14, 3, 4, 1], [12, 6, 8, 9], [16, 7,10,11] ]
2.11.2. Solution
2.11.2.1. Understanding the problem
2.11.2.2. Algorithm
Given the coordinate of a point (row, col) in the n*n matrix mat, rotating the matrix clockwise by 90 degrees moves each element at (row, col) to (col, n-1-row).
2.11.2.3. Code
<<imports for typing>>
def rotate(matrix: List[List[int]]) -> None:
    import math
    n = len(matrix)
    def rotate_square(mat: List[List[int]], coor: Tuple[int]) -> None:
        row, col = coor
        target_val = mat[col][n-1-row]
        mat[col][n-1-row] = mat[row][col]
        row, col = col, n-1-row
        temp = mat[col][n-1-row]
        mat[col][n-1-row] = target_val
        row, col = col, n-1-row
        target_val = mat[col][n-1-row]
        mat[col][n-1-row] = temp
        row, col = col, n-1-row
        mat[col][n-1-row] = target_val
    for cur_row in range(math.ceil(n/2)):
        for cur_col in range(cur_row, n-cur_row-1):
            rotate_square(matrix, (cur_row, cur_col))
    print(matrix)

rotate([[1,2],[3,4]])
rotate([[1,2,3],[4,5,6],[7,8,9]])
rotate([[5,1,9,11],[2,4,8,10],[13,3,6,7],[15,14,12,16]])
[[3, 1], [4, 2]]
[[7, 4, 1], [8, 5, 2], [9, 6, 3]]
[[15, 13, 2, 5], [14, 3, 4, 1], [12, 6, 8, 9], [16, 7, 10, 11]]
2.11.2.4. Complexity
2.11.2.4.1. Time complexity:
\(O(N^2)\).
2.11.2.4.2. Space complexity:
\(O(1)\)
2.11.2.5. Leetcode solution
Not available.
<<imports for typing>>
2.11.2.5.1. Time complexity:
2.11.2.5.2. Space complexity:
2.11.3. More analysis
2.11.3.1. General thoughts
My solution is very cumbersome, especially the rotate_square() function.
As shown in the Leetcode discussion forum, it is easier to flip the matrix (transpose it, then reverse each row) than to rotate it directly, because each flip step only swaps two values, whereas my rotate_square() needs two temporary variables to finish one rotation.
The other way to mitigate the problem in my solution is to use the following code, which does the rotation of four elements at once.
matrix[i][j], matrix[j][n - 1 - i], matrix[n - 1 - i][n - 1 - j], matrix[n - 1 - j][i] = matrix[n - 1 - j][i], matrix[i][j], matrix[j][n - 1 - i], matrix[n - 1 - i][n - 1 - j]
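The flip-based approach mentioned above (transpose, then reverse each row) can be sketched as follows; each step only ever swaps two values (the function name is mine):

```python
from typing import List

def rotate_by_flip(matrix: List[List[int]]) -> None:
    n = len(matrix)
    # Transpose in place: mirror across the main diagonal.
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j], matrix[j][i] = matrix[j][i], matrix[i][j]
    # Reverse each row: mirror left-right. Together with the
    # transpose this is a 90-degree clockwise rotation.
    for row in matrix:
        row.reverse()

mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rotate_by_flip(mat)
print(mat)  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```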
2.11.3.2. Related problems
2.12. DONE 72 Edit distance dp hard
2.12.1. Description
Given two words, word1
and word2
, find the minimum number of operations required to convert word1
to word2
.
You have the following 3 operations permitted on a word:
- Insert a character
- Delete a character
- Replace a character
Constraints:
2.12.1.1. Examples:
Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')
2.12.2. Solution
2.12.2.1. Understanding the problem
2.12.2.2. Algorithm
This is a typical dynamic programming problem: define tbl[i][j] as the minimum number of operations to convert the first i characters of word1 into the first j characters of word2. If word1[i-1] == word2[j-1], then tbl[i][j] = tbl[i-1][j-1]; otherwise tbl[i][j] = 1 + min(tbl[i-1][j] (delete), tbl[i][j-1] (insert), tbl[i-1][j-1] (replace)).
2.12.2.3. Code
<<imports for typing>>
def min_distance(word1: str, word2: str) -> int:
    m = len(word1)
    n = len(word2)
    if m == 0:
        return n
    if n == 0:
        return m
    tbl = [[0 for j in range(n+1)] for i in range(m+1)]
    for i in range(m+1):
        for j in range(n+1):
            if i == 0:
                tbl[i][j] = j
            elif j == 0:
                tbl[i][j] = i
            elif word1[i-1] == word2[j-1]:
                tbl[i][j] = tbl[i-1][j-1]
            else:
                tbl[i][j] = 1 + min(
                    tbl[i-1][j],
                    tbl[i][j-1],
                    tbl[i-1][j-1]
                )
    return tbl[m][n]
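Each row of the table only reads from the previous row, so the \(O(MN)\) table can be shrunk to two rows of length \(n+1\); a sketch of this variant (my own, not from the problem's solutions):

```python
def min_distance_two_rows(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    # prev[j]: edit distance between word1[:i-1] and word2[:j]
    prev = list(range(n + 1))  # base row: "" -> word2[:j] needs j inserts
    for i in range(1, m + 1):
        cur = [i] + [0] * n    # cur[0]: word1[:i] -> "" needs i deletes
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                cur[j] = prev[j - 1]
            else:
                cur[j] = 1 + min(prev[j],      # delete from word1
                                 cur[j - 1],   # insert into word1
                                 prev[j - 1])  # replace
        prev = cur
    return prev[n]

print(min_distance_two_rows("intention", "execution"))  # 5
```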
2.12.2.4. Complexity
2.12.2.4.1. Time complexity:
\(O(MN)\)
2.12.2.4.2. Space complexity:
\(O(MN)\)
2.12.2.5. Leetcode solution
Not available.
<<imports for typing>>
2.12.2.5.1. Time complexity:
2.12.2.5.2. Space complexity:
2.13. DONE 121 Best time to buy and sell stock array hard dp
2.13.1. Description
Say you have an array for which the ith
element is the price of a given stock on day i
.
If you were only permitted to complete at most one transaction (i.e., buy one and sell one share of the stock), design an algorithm to find the maximum profit.
Constraints: You cannot sell a stock before you buy one.
2.13.1.1. Examples:
Input: [7,1,5,3,6,4]
Output: 5
Explanation: Buy on day 2 (price = 1) and sell on day 5 (price = 6), profit = 6-1 = 5. Not 7-1 = 6, as selling price needs to be larger than buying price.

Input: [7,6,4,3,1]
Output: 0
Explanation: In this case, no transaction is done, i.e. max profit = 0.
2.13.2. Solution
2.13.2.1. Understanding the problem
2.13.2.2. Algorithm
(First idea, abandoned: build up a table of all possible buy/sell combinations from the prices list; half of the table is not necessary.)
We first sort the list prices and convert it to an ordered map of the form {price: day}.
Then we start from the lowest price p_low and check it against the highest price p_hi: if p_hi is higher than p_low and its day comes after p_low's day, we have found one candidate for the result and move p_low to the right; otherwise, we move p_hi to the left.
2.13.2.3. Code
<<imports for typing>>
def max_profit(prices: List[int]) -> int:
    max_p = 0
    # The original mapped each price only to its first day, which misses
    # a sell at a later duplicate of the same price (e.g. [5,1,5]).
    first_day = {}  # earliest day each price appears: best day to buy
    last_day = {}   # latest day each price appears: best day to sell
    for i, p in enumerate(prices):
        if p not in first_day:
            first_day[p] = i
        last_day[p] = i
    prices.sort()
    for p_buy in prices:
        d_buy = first_day[p_buy]
        for p_sell in reversed(prices):
            if p_buy < p_sell and d_buy < last_day[p_sell]:
                max_p = max(max_p, p_sell - p_buy)
                break
    return max_p

print(max_profit([7,1,5,3,6,4]))
print(max_profit([7,6,5]))
print(max_profit([1,4,1,4,3,1]))
5
0
3
2.13.2.4. Complexity
2.13.2.4.1. Time complexity:
Sorting is \(O(N\log N)\) and the nested scan is \(O(N^2)\) in the worst case, so \(O(N^2)\) overall.
2.13.2.4.2. Space complexity:
\(O(N)\) for the price-to-day maps.
2.13.2.5. Leetcode solution
If we plot the numbers in the array on a graph, we will have a line graph.
The points of interest are the peaks and valleys in the graph. We need to find the largest peak following the smallest valley. We can maintain two variables, minprice and maxprofit, corresponding to the smallest valley and the maximum profit (maximum difference between a selling price and minprice) obtained so far.
<<imports for typing>>
def max_profit(prices: List[int]) -> int:
    from math import inf
    minprice = inf
    maxprofit = 0
    for i in range(len(prices)):
        if prices[i] < minprice:
            minprice = prices[i]
        elif prices[i] - minprice > maxprofit:
            maxprofit = prices[i] - minprice
    return maxprofit

print(max_profit([1,2,3,4,5]))
4
2.13.2.5.1. Time complexity:
\(O(N)\)
2.13.2.5.2. Space complexity:
\(O(1)\)
2.13.3. More analysis
2.13.3.1. General thoughts
Initially I was trying to use dynamic programming to solve this problem, as indicated by the problem's DP tag, but it was very hard to start because the problem does not seem to have obvious subproblems that are solved multiple times.
The solutions provided by Leetcode did not use DP either, but then other people suggested that it can be categorised as a very simple DP problem and tabulation can be used to solve it.
The prerequisites for using DP are:
- optimal substructure
- overlapping sub-problems
In this problem, the sub-problem f[i] is defined as "the minimum stock price from day 0 to day i", which depends on f[i-1] and prices[i], whichever is lower. The overlapping sub-problem here is very obvious: there is no need to calculate f[i] by comparing stock prices from day 0 to day i-1 (the brute-force way); we just reuse the pre-calculated result f[i-1].
The maxprofit variable may or may not be in the DP table, as it can be viewed as a simple global max: maxprofit = max(maxprofit, prices[i-1] - dp[i-1][0]).
Also people mentioned that this is similar to Kadane's Algorithm for Maximum Subarray Sum.
def max_profit(prices) -> int:
    from math import inf
    n = len(prices)
    dp = [[0, 0] for i in range(n+1)]
    # starting price
    dp[0][0] = inf
    for i in range(1, n+1):
        dp[i][0] = min(dp[i-1][0], prices[i-1])
        dp[i][1] = max(dp[i-1][1], prices[i-1]-dp[i-1][0])
    return dp[n][1]

print(max_profit([1,2,3,4,5]))
4
2.13.3.2. Related problems
2.14. DONE 53 Maximum subarray dp easy
2.14.1. Description
Given an integer array nums
, find the contiguous subarray (containing at least one number) which has the largest sum and return its sum.
Constraints: Use \(O(N)\) time.
2.14.1.1. Examples:
Input: [-2,1,-3,4,-1,2,1,-5,4]
Output: 6
Explanation: [4,-1,2,1] has the largest sum = 6.
2.14.2. Solution
2.14.2.1. Understanding the problem
2.14.2.2. Algorithm
Use dynamic programming. The sub-problem dp[i] is "the maximum sum of a subarray ending at position i".
2.14.2.3. Code
<<imports for typing>>
def max_subarray(nums: List[int]) -> int:
    from math import inf
    n = len(nums)
    if n == 1:
        # this is actually not needed; see General thoughts
        return nums[0]
    dp = [0 for i in range(n+1)]
    dp[0] = -inf
    for i in range(1, n+1):
        dp[i] = max(nums[i-1], nums[i-1] + dp[i-1])
    return max(dp)

print(max_subarray([-2,1,-3,4,-1,2,1,-5,4]))
print(max_subarray([-2,1]))
6
1
2.14.2.4. Complexity
2.14.2.4.1. Time complexity:
\(O(N)\).
2.14.2.4.2. Space complexity:
\(O(N)\) but \(O(1)\) can be achieved by removing the dp
array.
2.14.2.5. Leetcode solution
Not available.
<<imports for typing>>
2.14.2.5.1. Time complexity:
2.14.2.5.2. Space complexity:
2.14.3. More analysis
2.14.3.1. General thoughts
Wikipedia introduces the history of this problem as "a simplified model for maximum likelihood estimate of patterns in digitized images".
In this problem, \(O(1)\) space can be achieved.
<<imports for typing>>
def max_subarray(nums: List[int]) -> int:
    from math import inf
    n = len(nums)
    if n == 1:
        return nums[0]
    cur_sum = -inf
    best_sum = cur_sum
    for i in range(1, n+1):
        cur_sum = max(nums[i-1], nums[i-1] + cur_sum)
        best_sum = max(best_sum, cur_sum)
    return best_sum

print(max_subarray([-2,1,-3,4,-1,2,1,-5,4]))
#print(max_subarray([-2,1]))
6
2.15. DONE 70 Climbming stairs dp easy
2.15.1. Description
You are climbing a staircase. It takes n steps to reach the top.
Each time you can climb either 1 or 2 steps. In how many distinct ways can you climb to the top?
Constraints:
n will be a positive integer.
2.15.1.1. Examples:
Input: 2
Output: 2
Explanation: There are two ways to climb to the top. 1. 1 step + 1 step 2. 2 steps

Input: 3
Output: 3
Explanation: There are three ways to climb to the top. 1. 1 step + 1 step + 1 step 2. 1 step + 2 steps 3. 2 steps + 1 step
2.15.2. Solution
2.15.2.1. Understanding the problem
2.15.2.2. Algorithm
This is basically the Fibonacci series.
We can do a "top-down" calculation of the number of ways. Starting from the top (n), there are two ways to step back (one step or two steps), which puts us at n-1 or n-2. By the same token, from each of those positions there are again two steps we can take. We compute one branch first and store its number of ways in a map, so the map can be reused when another branch reaches the same position. We continue this until we reach the bottom, and then do the addition: ans[n] = ans[n-1] + ans[n-2].
The code below is written in "bottom-up" form instead. Somehow for me it is easier to think this way. The algorithm above still works, but it is hard for me to implement, as I am not sure how to handle the recursive part when we use a map to simplify the calculation.
2.15.2.3. Code
<<imports for typing>>
def climb_stairs(n: int) -> int:
    dp = [0 for i in range(n+1)]
    dp[0], dp[1] = 1, 1
    for i in range(2, n+1):
        dp[i] = dp[i-1] + dp[i-2]
    return dp[n]

print(climb_stairs(4))
5
2.15.2.4. Complexity
2.15.2.4.1. Time complexity:
\(O(n)\).
2.15.2.4.2. Space complexity:
\(O(n)\).
2.15.2.5. Leetcode solution
There are 6 approaches on Leetcode.
- Approach 1 is brute force.
- Approaches 2 to 4 are essentially the same as my solution.
- Approach 5 uses Binet's Method, which uses matrix multiplication to obtain the \(n^{th}\) Fibonacci number. Its time complexity is \(O(\log n)\). Space complexity is \(O(1)\).
- Approach 6 just uses the Fibonacci formula to calculate the \(n^{th}\) Fibonacci number. It is asymptotically the same as approach 5. However, precision is lost when n is big, and the formula will give an incorrect number.
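Approach 5 is not reproduced on this page; a sketch of the matrix-power idea (the function name is mine) could look like this. The \((0,0)\) entry of \([[1,1],[1,0]]^n\) is \(Fib(n+1)\), which is exactly the answer for n stairs:

```python
def climb_stairs_matrix(n: int) -> int:
    # Multiply two 2x2 matrices.
    def mat_mult(a, b):
        return [
            [a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]],
            [a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]],
        ]

    # Fast exponentiation by squaring: O(log n) multiplications.
    result = [[1, 0], [0, 1]]  # identity
    base = [[1, 1], [1, 0]]
    while n:
        if n & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        n >>= 1
    return result[0][0]

print(climb_stairs_matrix(4))  # -> 5
```

Because only a constant number of 2x2 matrices is kept around, the space stays \(O(1)\).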
Here we take the formula solution.
<<imports for typing>>
def climb_stairs(n: int) -> int:
    sqrt5 = pow(5, 1/2)
    fibn = pow((1+sqrt5)/2, n+1) - pow((1-sqrt5)/2, n+1)
    return int(fibn/sqrt5)

print(climb_stairs(5))
8
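The precision loss mentioned above can be observed directly by comparing the closed-form value against the recurrence (the helper names below are my own):

```python
def climb_dp(n: int) -> int:
    # exact integer recurrence
    a, b = 1, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

def climb_formula(n: int) -> int:
    # float-based closed form, as in the block above
    sqrt5 = pow(5, 1/2)
    fibn = pow((1 + sqrt5)/2, n + 1) - pow((1 - sqrt5)/2, n + 1)
    return int(fibn / sqrt5)

# Small n agree; somewhere below n = 100 the float formula drifts off,
# because doubles can only represent integers exactly up to 2^53.
mismatches = [n for n in range(2, 100) if climb_dp(n) != climb_formula(n)]
print(mismatches[0] if mismatches else "no mismatch")
```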
2.15.2.5.1. Time complexity:
2.15.2.5.2. Space complexity:
2.15.3. More analysis
2.15.3.1. General thoughts
It is amazing how we can optimise this seemingly daunting problem so dramatically, from \(O(2^n)\) down to \(O(\log n)\)!
2.15.3.2. Related problems
2.16. DONE 746 Min cost climbing stairs dp easy
2.16.1. Description
On a staircase, the \(i^{th}\) step has some 0 indexed non-negative cost cost[i]
assigned.
Once you pay the cost, you can either climb one or two steps. You need to find minimum cost to reach the top of the floor, and you can either start from the step with index 0 or with index 1.
Constraints:
- cost will have a length in the range [2, 1000].
- Every cost[i] will be an integer in the range [0, 999].
2.16.1.1. Examples:
Input: cost = [10, 15, 20] Output: 15 Explanation: Cheapest is start on cost[1], pay that cost and go to the top. Input: cost = [1, 100, 1, 1, 1, 100, 1, 1, 100, 1] Output: 6 Explanation: Cheapest is start on cost[0], and only step on 1s, skipping cost[3].
2.16.2. Solution
2.16.2.1. Understanding the problem
2.16.2.2. Algorithm
This is an optimisation problem. The final problem, "minimum cost to reach the top of the floor", can be divided into two subproblems, namely the minimum cost to reach each of the previous two stairs. We have \(\mathit{min\_cost}_n = \min(\mathit{min\_cost}_{n-1}, \mathit{min\_cost}_{n-2}) + cost_n\), and \(\mathit{min\_cost}_{n-1}\) and \(\mathit{min\_cost}_{n-2}\) have overlapping subsubproblems. Therefore, we can use dynamic programming to solve this problem.
2.16.2.3. Code
<<imports for typing>>
def min_cost_climbing_stairs(cost: List[int]) -> int:
    from math import inf
    n = len(cost)
    min_cost = [inf for i in range(n)]
    min_cost[0], min_cost[1] = cost[0], cost[1]
    for i in range(2, n):
        min_cost[i] = min(min_cost[i-1], min_cost[i-2]) + cost[i]
    return min(min_cost[n-1], min_cost[n-2])

print(min_cost_climbing_stairs([1, 100, 1, 1, 1, 100, 1, 1, 100, 1]))
6
2.16.2.4. Complexity
2.16.2.4.1. Time complexity:
\(O(N)\).
2.16.2.4.2. Space complexity:
\(O(N)\).
2.16.2.5. Leetcode solution
The explanation from Leetcode is not very intuitive or clear. The concept is the same as mine, but the way it explains things is very twisted.
Basically, they want i in range(n, 1, -1), and then calculate each f[i], which is the total cost of climbing from the current floor.
Intuition: There is a clear recursion available: the final cost f[i] to climb the staircase from some step i is f[i] = cost[i] + min(f[i+1], f[i+2]). This motivates dynamic programming.
Algorithm: Let's evaluate f backwards in order. That way, when we are deciding what f[i] will be, we've already figured out f[i+1] and f[i+2].
We can do even better than that. At the i-th step, let f1, f2 be the old value of f[i+1], f[i+2], and update them to be the new values f[i], f[i+1]. We keep these updated as we iterate through i backwards. At the end, we want min(f1, f2).
<<imports for typing>>
def min_cost_climbing_stairs(cost: List[int]):
    f1 = f2 = 0
    for x in reversed(cost):
        f1, f2 = x + min(f1, f2), f1
    return min(f1, f2)

print(min_cost_climbing_stairs([10, 14, 10]))
14
Based on the Leetcode solution, we can improve our code to get \(O(1)\) space complexity.
<<imports for typing>>
def min_cost_climbing_stairs(cost: List[int]) -> int:
    n = len(cost)
    mc1, mc2 = cost[0], cost[1]
    for i in range(2, n):
        mc1, mc2 = mc2, min(mc1, mc2) + cost[i]
    return min(mc1, mc2)

print(min_cost_climbing_stairs([1, 100, 1, 1, 1, 100, 1, 1, 100, 1]))
6
2.16.2.5.1. Time complexity:
\(O(N)\)
2.16.2.5.2. Space complexity:
\(O(1)\)
2.16.3. More analysis
2.16.3.1. General thoughts
Again, to determine if a question fits dynamic programming, we want to determine if:
- It asks for an optimal solution
- It can be divided into finding the optimal solution to subproblems
- The subproblems have overlapping subsubproblems
- DP's goal is to only solve these overlapping problems once to save time
- A recursive definition of the optimal solution can be found
2.16.3.2. Related problems
2.17. DONE 703 Kth largest element in a stream heap
2.17.1. Description
Design a class to find the kth largest element in a stream. Note that it is the kth largest element in the sorted order, not the kth distinct element.
Your KthLargest class will have a constructor which accepts an integer k and an integer array nums, which contains initial elements from the stream. For each call to the method KthLargest.add, return the element representing the kth largest element in the modified stream.
Constraints:
You may assume that nums.length >= k-1 and k >= 1.
2.17.1.1. Examples:
int k = 3; int[] arr = [4,5,8,2]; KthLargest kthLargest = new KthLargest(3, arr); kthLargest.add(3); // returns 4 kthLargest.add(5); // returns 5 kthLargest.add(10); // returns 5 kthLargest.add(9); // returns 8 kthLargest.add(4); // returns 8
2.17.2. Solution
2.17.2.1. Understanding the problem
2.17.2.2. Algorithm
See 6.4.
Brute-force algorithm:
- Constructing the object:
  - simply sort the nums array in descending order and store it in self.nums.
- Running the KthLargest.add(val) method:
  - insert val into the sorted self.nums, then get self.nums[k-1].
2.17.2.3. Code
<<imports for typing>>
class KthLargest:
    def __init__(self, k: int, nums: List[int]):
        self.k = k
        self.nums = sorted(nums, reverse=True)

    def add(self, val: int) -> int:
        for i in range(len(self.nums)):
            if val > self.nums[i]:
                self.nums.insert(i, val)
                break
        else:
            self.nums.append(val)
        return self.nums[self.k-1]

k = 3
arr = [4, 5, 8, 2]
obj = KthLargest(k, arr)
print(obj.add(3))
print(obj.add(5))
print(obj.add(10))
print(obj.add(9))
print(obj.add(4))
4
5
5
8
8
2.17.2.4. Leetcode solution
Not available.
6.7.4 was used by many to solve the problem.
The question was probably asking us to implement the (max) priority queue.
The following solution uses Python's heapq
data structure.
<<imports for typing>>
import heapq

class KthLargest:
    def __init__(self, k: int, nums: List[int]):
        self.arr = []
        self.n = k
        for i in nums:
            self._push(i)

    def _push(self, val: int):
        if len(self.arr) == self.n:
            top = heapq.heappop(self.arr)
            if top > val:
                heapq.heappush(self.arr, top)
            else:
                heapq.heappush(self.arr, val)
        else:
            heapq.heappush(self.arr, val)

    def add(self, val: int) -> int:
        self._push(val)
        return self.arr[0] if len(self.arr) else None

nums = [10, 9, 8, 7]
k = 3
obj = KthLargest(k, nums)
val = 6
param_1 = obj.add(val)
print(param_1)
8
2.17.2.4.1. Time complexity:
Initial heap construction: \(O(n\lg k)\), since each of the \(n\) pushes works on a heap of at most \(k\) elements.
The add() method: \(O(\lg k)\).
2.17.2.4.2. Space complexity:
\(O(n)\).
2.17.3. More analysis
2.17.3.1. General thoughts
The heapq
provided by Python follows min-heap properties.
- arr[0] is the smallest element
- arr[k] <= arr[2k+1] and arr[k] <= arr[2k+2]
We can modify the Leetcode solution to make it use less memory in theory since we are dealing with a stream and do not care about storing it.
Basically, we chop off the list after each add(val)
operation, so the list has only k
elements.
<<imports for typing>>
import heapq

class KthLargest:
    def __init__(self, k: int, nums: List[int]):
        self.k = k
        self.arr = nums
        heapq.heapify(self.arr)
        self._chop_off_queue()

    def add(self, val: int) -> int:
        heap_top = 0
        # always keep heap size = k
        # top element = kth largest element
        if len(self.arr) < self.k:
            heapq.heappush(self.arr, val)
        else:
            heapq.heappushpop(self.arr, val)
        return self.arr[heap_top]

    def _chop_off_queue(self):
        while len(self.arr) > self.k:
            # heappop() pops the smallest element
            heapq.heappop(self.arr)
2.18. DONE 215 Kth largest element in an array heap
2.18.1. Description
Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.
Constraints: You may assume k is always valid, 1 ≤ k ≤ array's length.
2.18.1.1. Examples:
Input: [3,2,1,5,6,4] and k = 2 Output: 5 Input: [3,2,3,1,2,4,5,5,6] and k = 4 Output: 4
2.18.2. Solution
2.18.2.1. Understanding the problem
2.18.2.2. Algorithm
We can use quicksort.
See 6.4 for theory discussion and linear time algorithm (randomized quickselect).
2.18.2.3. Code
<<imports for typing>>
def find_kth_largest(nums: List[int], k: int) -> int:
    # sort in descending order so that index k-1 holds the kth largest
    return sorted(nums, reverse=True)[k-1]

print(find_kth_largest([3,2,1,5,6,4], 2))
5
Quickselect implementation (a replica of 6.4.2).
<<imports for typing>>
import random

def kth_largest(arr: List[int], k: int) -> int:
    def partition(arr, lo, hi):
        pivot = arr[hi]
        i = lo
        for j in range(lo, hi):
            if arr[j] > pivot:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
        arr[i], arr[hi] = arr[hi], arr[i]
        return i

    def quickselect(arr: List[int], lo, hi, k) -> int:
        rand_pivot_idx = random.randrange(lo, hi+1)
        arr[hi], arr[rand_pivot_idx] = arr[rand_pivot_idx], arr[hi]
        par = partition(arr, lo, hi)
        if par == k - 1:
            return arr[par]
        elif par > k - 1:
            return quickselect(arr, lo, par-1, k)
        return quickselect(arr, par+1, hi, k)

    return quickselect(arr, 0, len(arr)-1, k)

print(kth_largest([3,2,1,5,6,4], 3))
4
2.18.2.4. Leetcode solution
Not available.
<<imports for typing>>
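Since the official solution is not transcribed here, this is a heap-based sketch of my own using heapq.nlargest:

```python
import heapq
from typing import List

def find_kth_largest(nums: List[int], k: int) -> int:
    # nlargest maintains a size-k min-heap internally: O(n lg k) time
    return heapq.nlargest(k, nums)[-1]

print(find_kth_largest([3, 2, 1, 5, 6, 4], 2))  # -> 5
```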
2.18.2.4.1. Time complexity:
2.18.2.4.2. Space complexity:
2.19. DONE 973 K closest points to origin heap medium
2.19.1. Description
We have a list of points
on the plane. Find the K
closest points to the origin (0, 0)
.
Constraints:
- The distance between two points is the Euclidean distance
- The answer is unique, but its elements can be in any order.
- \(1\le K \le points.length \le 10000\)
- \(-10000 < points[i][0] < 10000\)
- \(-10000 < points[i][1] < 10000\)
2.19.1.1. Examples:
Input: points = [[1,3], [-2,2]], k=1 Output: [[-2,2]]
2.19.2. Solution
2.19.2.1. Understanding the problem
2.19.2.2. Algorithm
We can treat the input points as a stream and push each point onto a priority queue (keyed by distance) as we iterate through points. Once we have processed all points, we just pop the first k elements, which have the smallest distances.
2.19.2.3. Code
<<imports for typing>>
import heapq

def k_closest_points(points: List[List[int]], K: int) -> List[List[int]]:
    def get_distance(x, y) -> float:
        return (x**2 + y**2)

    if len(points) == K:
        return points
    pq = []
    # n pushes take O(n lg n) in total
    for point in points:
        distance = get_distance(*point)
        heapq.heappush(pq, (distance, point))
    return [heapq.heappop(pq)[1] for i in range(K)]

print(k_closest_points([[1,3],[2,-2]], 2))
print(k_closest_points([[1,3],[2,-2],[2,-3]], 2))
print(k_closest_points([[3,3],[5,-1],[-2,4],[3,4]], 3))
[[1, 3], [2, -2]]
[[2, -2], [1, 3]]
[[3, 3], [-2, 4], [3, 4]]
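As a side note, pushing all \(n\) points costs \(O(n\lg n)\). Capping the heap at K entries (with negated distances, since heapq is a min-heap) would reduce this to \(O(n\lg K)\). A sketch of that variant (the names are mine):

```python
import heapq
from typing import List

def k_closest_points(points: List[List[int]], k: int) -> List[List[int]]:
    heap = []  # stores (-distance, point): the farthest kept point sits on top
    for x, y in points:
        d = x*x + y*y
        if len(heap) < k:
            heapq.heappush(heap, (-d, [x, y]))
        elif -heap[0][0] > d:
            # current farthest kept point is farther than this one: replace it
            heapq.heappushpop(heap, (-d, [x, y]))
    return [p for _, p in heap]

print(k_closest_points([[1, 3], [-2, 2]], 1))  # -> [[-2, 2]]
```

The returned order is arbitrary, which the problem explicitly allows.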
2.19.2.4. Leetcode solution
Uses Quickselect.
<<imports for typing>>
import random

class Solution(object):
    def kClosest(self, points, K):
        dist = lambda i: points[i][0]**2 + points[i][1]**2

        def sort(i, j, K):
            # Partially sorts A[i:j+1] so the first K elements are
            # the smallest K elements.
            if i >= j:
                return
            # Put a random element at A[i] - this is the pivot
            k = random.randint(i, j)
            points[i], points[k] = points[k], points[i]
            mid = partition(i, j)
            if K < mid - i + 1:
                sort(i, mid - 1, K)
            elif K > mid - i + 1:
                sort(mid + 1, j, K - (mid - i + 1))

        def partition(i, j):
            # Partition by pivot A[i], returning an index mid
            # such that A[i] <= A[mid] <= A[j] for i < mid < j.
            oi = i
            pivot = dist(i)
            i += 1
            while True:
                while i < j and dist(i) < pivot:
                    i += 1
                while i <= j and dist(j) >= pivot:
                    j -= 1
                if i >= j:
                    break
                points[i], points[j] = points[j], points[i]
            points[oi], points[j] = points[j], points[oi]
            return j

        sort(0, len(points) - 1, K)
        return points[:K]
2.19.2.4.1. Time complexity:
\(O(n^2)\) worst case, average case \(O(n)\).
2.19.2.4.2. Space complexity:
\(O(n)\).
2.19.3. More analysis
2.19.3.1. General thoughts
2.19.3.2. Related problems
Any problem that asks for "Closest", "Smallest", "Largest", K elements.
2.20. DONE 347 Top K frequent elements heap medium
2.20.1. Description
Given a non-empty array of integers, return the \(k\) most frequent elements.
Constraints:
- You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
- Your algorithm's time complexity must be better than O(n log n), where n is the array's size.
- It's guaranteed that the answer is unique, in other words the set of the top k frequent elements is unique.
- You can return the answer in any order.
2.20.1.1. Examples:
Input: nums = [1,1,1,2,2,3], k = 2 Output: [1,2] Input: nums = [1], k = 1 Output: [1]
2.20.2. Solution
2.20.2.1. Understanding the problem
2.20.2.2. Algorithm
We can use quickselect to achieve \(O(n)\) time complexity on average.
We first scan the array and build a count of the elements (\(O(n)\)). We then quickselect the counts and get the \(k\) most frequent elements (\(O(n)\) on average).
2.20.2.3. Code
<<imports for typing>>
def top_k_frequent(nums: List[int], k: int) -> List[int]:
    def partition(arr, low, high):
        pivot = arr[high]
        i = low
        for j in range(low, high):
            if arr[j][1] >= pivot[1]:
                arr[j], arr[i] = arr[i], arr[j]
                i += 1
        arr[i], arr[high] = arr[high], arr[i]
        return i

    def quickselect(arr, low, high, k: int) -> List[int]:
        par = partition(arr, low, high)
        if par == k - 1:
            return arr[:k]
        elif par < k - 1:
            return quickselect(arr, par+1, high, k)
        else:
            return quickselect(arr, low, par-1, k)

    temp = {}
    for i in nums:
        temp[i] = temp.get(i, 0) + 1
    arr = [(k, v) for k, v in temp.items()]
    return [i[0] for i in quickselect(arr, 0, len(arr)-1, k)]

print(top_k_frequent([1,1,1,2,2,3], 2))
print(top_k_frequent([1,1,1,2,2,3,3,3], 2))
[1, 2] [1, 3]
2.20.2.4. Leetcode solution
<<imports for typing>>
def top_k_frequent(nums, k):
    from collections import Counter
    import heapq
    count = Counter(nums)
    return heapq.nlargest(k, count.keys(), key=count.get)

print(top_k_frequent([1,1,1,2,3,2], 2))
[1, 2]
2.20.2.4.1. Time complexity:
\(O(n \lg k)\)
2.20.2.4.2. Space complexity:
\(O(n)\)
2.20.3. More analysis
2.20.3.1. General thoughts
My solution uses Quickselect without randomization, which means that some particular array setup could hit it hard and run at \(O(n^2)\), but in general, it works at \(O(n)\).
Another quick hack is to use the Counter object. When \(k\) is omitted, most_common sorts all the counts; when \(k\) is given, it uses heapq.nlargest internally, which also gives us \(O(n\lg k)\) time complexity.
from collections import Counter

def top_k_frequent(nums, k):
    count = Counter(nums)
    return [i[0] for i in count.most_common(k)]

print(top_k_frequent([1,2,1,1,24,2], 2))
[1, 2]
Seems we can also use bucket sort.
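A sketch of that bucket-sort idea (my own, not from Leetcode): since no element can occur more than \(n\) times, the counts can be dropped into frequency buckets and collected from the highest frequency down, giving \(O(n)\) time.

```python
from collections import Counter
from typing import List

def top_k_frequent_bucket(nums: List[int], k: int) -> List[int]:
    count = Counter(nums)
    n = len(nums)
    # buckets[f] holds every value that occurs exactly f times; f <= n
    buckets = [[] for _ in range(n + 1)]
    for val, freq in count.items():
        buckets[freq].append(val)
    res = []
    # walk from the highest frequency down until k values are collected
    for freq in range(n, 0, -1):
        for val in buckets[freq]:
            res.append(val)
            if len(res) == k:
                return res
    return res

print(top_k_frequent_bucket([1, 1, 1, 2, 2, 3], 2))  # -> [1, 2]
```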
2.20.3.2. Related problems
2.21. DONE 355 Design twitter medium heap
2.21.1. Description
Design a simplified version of Twitter where users can post tweets, follow/unfollow another user, and see the 10 most recent tweets in the user's news feed. Your design should support the following methods:
- postTweet(userId, tweetId): compose a new tweet.
- getNewsFeed(userId): retrieve the 10 most recent tweet ids in the user's news feed. Each item in the news feed must be posted by users whom the user followed or by the user herself. Tweets must be ordered from most recent to least recent.
- follow(followerId, followeeId): follower follows a followee.
- unfollow(followerId, followeeId): follower unfollows a followee.
Constraints: None.
2.21.1.1. Examples:
Twitter twitter = new Twitter(); // User 1 posts a new tweet (id = 5). twitter.postTweet(1, 5); // User 1's news feed should return a list with 1 tweet id -> [5]. twitter.getNewsFeed(1); // User 1 follows user 2. twitter.follow(1, 2); // User 2 posts a new tweet (id = 6). twitter.postTweet(2, 6); // User 1's news feed should return a list with 2 tweet ids -> [6, 5]. // Tweet id 6 should precede tweet id 5 because it is posted after tweet id 5. twitter.getNewsFeed(1); // User 1 unfollows user 2. twitter.unfollow(1, 2); // User 1's news feed should return a list with 1 tweet id -> [5], // since user 1 is no longer following user 2. twitter.getNewsFeed(1);
2.21.2. Solution
2.21.2.1. Understanding the problem
2.21.2.2. Algorithm
We would have a set of users that stores all the users. Each user has tweets ordered by time.
We also have a newsfeed for every user. This newsfeed is a priority queue built on timestamps that selects feeds from the followees of the current user. We also keep followers and followees for every user, and map each userId to their newsfeed.
- We update users in postTweet for user A.
- We update the newsfeed of user A in follow.
- We update the newsfeed of user A in unfollow.
- We update the newsfeeds of user A's followers in postTweet.
newsfeed has to maintain a length of 10, unless the total number of tweets from followees is less than 10.
2.21.2.3. Code
<<imports for typing>>
import collections
from datetime import datetime
import heapq

class Twitter:

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.users_tweets = collections.defaultdict(list)
        self.users_followers = collections.defaultdict(set)
        self.users_followees = collections.defaultdict(set)
        self.users_newsfeeds = collections.defaultdict(list)

    def postTweet(self, userId: int, tweetId: int) -> None:
        """
        Compose a new tweet.
        """
        # if this userId is new, ta must follow taself
        self.users_followees[userId].add(userId)
        self.users_followers[userId].add(userId)
        # add tweet to the userId
        tweet = datetime.now(tz=None), tweetId
        self.users_tweets[userId].append(tweet)
        # update newsfeeds of the followers
        for user in self.users_followers[userId]:
            heapq.heappush(self.users_newsfeeds[user], tweet)

    def getNewsFeed(self, userId: int) -> List[int]:
        """
        Retrieve the 10 most recent tweet ids in the user's news feed.
        Each item in the news feed must be posted by users who the user
        followed or by the user herself. Tweets must be ordered from
        most recent to least recent.
        """
        num_total_tweets = len(self.users_newsfeeds[userId])
        num_tweets = 10 if num_total_tweets > 10 else num_total_tweets
        res = heapq.nlargest(num_tweets, self.users_newsfeeds[userId])
        return [i[1] for i in res]

    def follow(self, followerId: int, followeeId: int) -> None:
        """
        Follower follows a followee. If the operation is invalid,
        it should be a no-op.
        """
        # Only follow if followee is not followed by us
        if followeeId not in self.users_followees[followerId]:
            self.users_followees[followerId].add(followeeId)
            self.users_followers[followeeId].add(followerId)
            # get the tweets from the new followee
            for tweet in self.users_tweets[followeeId]:
                heapq.heappush(self.users_newsfeeds[followerId], tweet)

    def unfollow(self, followerId: int, followeeId: int) -> None:
        """
        Follower unfollows a followee. If the operation is invalid,
        it should be a no-op.
        """
        # Do not unfollow oneself
        if followeeId == followerId:
            return
        # Only unfollow if followee is followed by us
        if followeeId in self.users_followees[followerId]:
            # remove followee (unfollow)
            self.users_followees[followerId].remove(followeeId)
            # remove follower (unfollow)
            self.users_followers[followeeId].remove(followerId)
            # remove tweets from followee
            for tweet in self.users_tweets[followeeId]:
                self.users_newsfeeds[followerId].remove(tweet)
            heapq.heapify(self.users_newsfeeds[followerId])

# Your Twitter object will be instantiated and called as such:
obj = Twitter()
obj.postTweet(1, 5)
param_2 = obj.getNewsFeed(1)
print(param_2)
obj.follow(1, 2)
param_2 = obj.getNewsFeed(1)
print(param_2)
obj.postTweet(2, 6)
param_2 = obj.getNewsFeed(1)
print(param_2)
obj.unfollow(1, 2)
param_2 = obj.getNewsFeed(1)
print(param_2)
[5]
[5]
[6, 5]
[5]
2.21.2.3.1. Complexity
- Time complexity:
- postTweet takes \(O(k)\), where \(k\) is the number of followers \(userId\) has.
- getNewsFeed takes \(O(n \lg n)\) with heapq.nlargest, where \(n\) is the number of tweets in the user's news feed.
- follow takes \(O(n \lg k)\), where \(n\) is the number of tweets the followee has and \(k\) is the number of tweets in the user's news feed (this part can be improved significantly).
- unfollow takes \(O(n+m)\), where \(n\) is the number of tweets the followee has and \(m\) is the number of tweets in the user's news feed.
- Space complexity:
- users_tweets: \(O(m\times n)\), where \(m\) is the number of users and \(n\) is the number of tweets of each user.
- users_followers: \(O(m\times n)\), where \(m\) is the number of users and \(n\) is the number of users who choose to follow a user.
- users_followees: \(O(m\times n)\), where \(m\) is the number of users and \(n\) is the number of users a user chooses to follow (this is not strictly necessary).
- users_newsfeeds: \(O(m\times n)\), where \(m\) is the number of users and \(n\) is the number of tweets in their news feed (we could also keep just 10 entries).
2.21.2.4. Leetcode solution
Not available.
<<imports for typing>>
2.21.2.4.1. Time complexity:
2.21.2.4.2. Space complexity:
2.21.3. More analysis
2.21.3.1. General thoughts
Design, design, design!
From this question, we need to again emphasise the importance of thinking ahead and designing our program before we write any code. Jumping into writing code right after reading the description can be attractive, because it gives us a false feeling of doing something and getting our hands dirty; but when we are doing Leetcode or serious programming, it is paramount that we do more paperwork, more design.
The most important thing here is to actually define the question. Medium/hard questions, and even easy ones, can contain a lot of information, and we need to process that information and make sure we understand it by clearly defining it in our own words. The same applies to real-world programming, where it takes on even more significance, since problems out there are vaguer and more complex.
If we do not think ahead and design well, it is likely that we will write some code that passes some tests and fails others. Then we go back to the code trying to find out what happened, run it again, watch it fail another test, and fix it again. Essentially we become a bug ourselves, hitting the lightbulb blindly over and over. We reduce ourselves to a patch that fixes one case while breaking another part of the program. Best case, our program becomes a heavy if else monster that runs fine without clear logic. Worst case, we give up and do not even have the courage to check the solution and fix our program, because this is human nature - we do not want to humiliate ourselves, even if the only one who knows we failed is ourselves.
2.21.3.2. Related problems
2.22. DONE 778 Swim in rising water heap hard
2.22.1. Description
On an \(N\times N\) grid, each square grid[i][j]
represents the elevation at that point (i, j)
.
Now rain starts to fall. At time t
, the depth of the water everywhere is t
.
You can swim from a square to another 4-directionally adjacent square if and only if the elevation of both squares individually are at most t
.
You can swim infinite distance in zero time.
Of course, you must stay within the boundaries of the grid during your swim.
You start at the top left square (0, 0)
. What is the least time until you can reach the bottom right square (N-1, N-1)
?
Constraints:
- 2 <= N <= 50.
- grid[i][j] is a permutation of [0, …, N*N - 1].
2.22.1.1. Examples:
Input: [[0,2],[1,3]] Output: 3 Explanation: At time 0, you are in grid location (0, 0). You cannot go anywhere else because 4-directionally adjacent neighbors have a higher elevation than t = 0. You cannot reach point (1, 1) until time 3. When the depth of water is 3, we can swim anywhere inside the grid. Input: [[0,1,2,3,4],[24,23,22,21,5],[12,13,14,15,16],[11,17,18,19,20],[10,9,8,7,6]] Output: 16 Explanation: 0 1 2 3 4 24 23 22 21 5 12 13 14 15 16 11 17 18 19 20 10 9 8 7 6 We need to wait until time 16 so that (0, 0) and (4, 4) are connected.
2.22.2. Solution
2.22.2.1. Understanding the problem
2.22.2.2. Algorithm
We have a loop that controls the time t, which is also the depth of the rain. In each iteration, we swim from the last_pos we are at: swim(last_pos). last_pos = (0, 0) at time 0.
In the swim() function, we choose a direction whose elevation is at most t and which we have not visited before (tracked in visited_pos). Each swim() gives a new last_pos; we check whether it is (N-1, N-1), and if so, we just return t.
2.22.2.3. Code
<<imports for typing>>
def swim_in_water(grid: List[List[int]]) -> int:
    n = len(grid)
    visited_pos: Set[Tuple[int]] = set()
    # row, col
    last_pos = (0, 0)
    visited_pos.add(last_pos)

    def avail_poses(last_pos: Tuple[int]) -> List[Tuple[int]]:
        res = []
        for row_offset in [-1, 0, 1]:
            for col_offset in [-1, 0, 1]:
                row = last_pos[0] + row_offset
                col = last_pos[1] + col_offset
                if 0 <= row < n and 0 <= col < n and (row, col) not in visited_pos and grid[row][col] <= t:
                    res.append((row, col))
        return res

    def swim(last_pos: Tuple[int]) -> Tuple[int]:
        ele = n + 1
        return_pos = last_pos
        for pos in avail_poses(last_pos):
            if grid[pos[0]][pos[1]] <= ele:
                ele = grid[pos[0]][pos[1]]
                return_pos = pos
        return return_pos

    for t in range(0, n**2):
        print(t)
        last_pos = swim(last_pos)
        visited_pos.add(last_pos)
        print(last_pos)
        if last_pos == (n-1, n-1):
            return t

#print(swim_in_water([[0,2],[1,3]]))
print(swim_in_water([[0,1,2,3,4],[24,23,22,21,5],[12,13,14,15,16],[11,17,18,19,20],[10,9,8,7,6]]))
0
(0, 0)
1
(0, 1)
(0, 1)
2
(0, 2)
(0, 2)
3
(0, 3)
(0, 3)
4
(0, 4)
(0, 4)
5
(1, 4)
(1, 4)
6
(1, 4)
7
(1, 4)
8
(1, 4)
9
(1, 4)
10
(1, 4)
11
(1, 4)
12
(1, 4)
13
(1, 4)
14
(1, 4)
15
(2, 3)
(1, 4)
16
(2, 3)
(2, 4)
(1, 4)
17
(2, 3)
(2, 4)
(1, 4)
18
(2, 3)
(2, 4)
(1, 4)
19
(2, 3)
(2, 4)
(1, 4)
20
(2, 3)
(2, 4)
(1, 4)
21
(1, 3)
(2, 3)
(2, 4)
(1, 4)
22
(1, 3)
(2, 3)
(2, 4)
(1, 4)
23
(1, 3)
(2, 3)
(2, 4)
(1, 4)
24
(1, 3)
(2, 3)
(2, 4)
(1, 4)
None
<<imports for typing>>
def swim_in_water(grid: List[List[int]]) -> int:
    n = len(grid)
    # row, col
    last_pos = (0, 0)
    visited_poses = set()

    def avail_poses(last_pos: Tuple[int], cur_t: int) -> List[Tuple[int]]:
        offsets = ((-1, 0), (0, 1), (1, 0), (0, -1))
        res = []
        for row_offset, col_offset in offsets:
            row = last_pos[0] + row_offset
            col = last_pos[1] + col_offset
            if 0 <= row < n and 0 <= col < n and (row, col) not in visited_poses and grid[row][col] <= cur_t:
                res.append((row, col))
        return res

    def swim(cur_pos: Tuple[int], cur_t: int) -> int:
        if cur_pos == (n-1, n-1):
            return cur_t
        while len(avail_poses(cur_pos, cur_t)) == 0:
            cur_t += 1
        print(cur_pos)
        print(cur_t)
        for pos in avail_poses(cur_pos, cur_t):
            visited_poses.add(pos)
            return swim(pos, cur_t)

    return swim(last_pos, 0)

print(swim_in_water([[0,2],[1,3]]))
#print(swim_in_water([[0,1,2,3,4],[24,23,22,21,5],[12,13,14,15,16],[11,17,18,19,20],[10,9,8,7,6]]))
(0, 0)
1
(1, 0)
1
(0, 0)
2
(0, 1)
3
3
Dynamic programming.
<<imports for typing>>
from math import inf

def swim_in_water(grid: List[List[int]]) -> int:
    def adjacent_cells(pos: Tuple[int]) -> List[Tuple[int]]:
        offsets = ((-1, 0), (0, 1), (1, 0), (0, -1))
        res = []
        for row_offset, col_offset in offsets:
            row = pos[0] + row_offset
            col = pos[1] + col_offset
            if 0 <= row < n and 0 <= col < n:
                res.append((row, col))
        return res

    n = len(grid)
    min_times = [[inf for i in range(n)] for j in range(n)]
    min_times[0][0] = 0
    for row in range(n):
        for col in range(n):
            if (row, col) != (0, 0):
                min_times[row][col] = min(
                    [abs(grid[row][col] - grid[adj_row][adj_col]) + min_times[adj_row][adj_col]
                     for adj_row, adj_col in adjacent_cells((row, col))]
                )
    return min_times[n-1][n-1]

#print(swim_in_water([[0,2],[1,3]]))
print(swim_in_water([[0,1,2,3,4],[24,23,22,21,5],[12,13,14,15,16],[11,17,18,19,20],[10,9,8,7,6]]))
[[0, 1, 2, 3, 4], [24, 23, 22, 21, 5], [36, 33, 30, 27, 16], [37, 37, 34, 31, 20], [38, 39, 40, 41, 34]]
Dijkstra's algorithm.
<<imports for typing>>
from math import inf
import heapq

def swim_in_water(grid: List[List[int]]) -> int:
    def adjacent_cells(pos: Tuple[int]) -> List[Tuple[int]]:
        offsets = ((-1, 0), (0, 1), (1, 0), (0, -1))
        res = []
        for row_offset, col_offset in offsets:
            row = pos[0] + row_offset
            col = pos[1] + col_offset
            if 0 <= row < n and 0 <= col < n:
                res.append((row, col))
        return res

    n = len(grid)
    # dist[v] = the smallest t at which v is reachable from (0, 0)
    dist = {(row, col): inf for row in range(n) for col in range(n)}
    dist[(0, 0)] = grid[0][0]
    visited = set()
    pq = [(dist[(0, 0)], (0, 0))]
    while pq:
        d, pos = heapq.heappop(pq)
        if pos in visited:
            continue
        visited.add(pos)
        for adj in adjacent_cells(pos):
            # the "edge weight" is the elevation we must wait for
            new_d = max(d, grid[adj[0]][adj[1]])
            if new_d < dist[adj]:
                dist[adj] = new_d
                heapq.heappush(pq, (new_d, adj))
    return dist[(n-1, n-1)]

print(swim_in_water([[0,2],[1,3]]))
print(swim_in_water([[0,1,2,3,4],[24,23,22,21,5],[12,13,14,15,16],[11,17,18,19,20],[10,9,8,7,6]]))
2.22.2.4. Leetcode solution
We use 4.3.
Given a grid like below:
1 | 2 | 5 |
3 | 7 | 6 |
8 | 4 | 0 |
Starting from grid[0][0]
, we have two adjacent cells, we add them to the heap
because we can naively reach them starting from grid[0][0]
.
By definition, we want to choose the cell with the minimum elevation in a list of possible choices,
so we pop from the heap, and get grid[0][1]
, 2
.
Then we get its adjacent cells, grid[0][2]
, 5
and grid[1][1]
, 7
.
We add these two to the heap.
Now, it is obvious to the naked eye that 3<5<7
, so at this moment we have to choose the grid[1][0]
, 3
, path.
Again, we simply pop from the heap to do this. After this operation, our heap now has two elements left, grid[0][2]
, 5
, and grid[1][1]
, 7
.
We then again check the adjacent cells of grid[1][0]
, 3
, and add them to the heap.
Now in the heap we have four elements: grid[0][2], 5; grid[1][1], 7; grid[1][1], 7 (pushed twice); and grid[2][0], 8.
We continue this loop until we reach grid[n-1][n-1]
.
Intuitively speaking, all elements in the heap have been discovered by us before, meaning that there is a valid path from the start to it.
For example, when we are at grid[0][1]
, 2
, and we find that 5
and 7
are larger than another option 3
, then there must be a way from grid[0][0]
for us to swim to grid[1][0]
, 3
, in no time.
When we are at grid[1][0]
, 3
, and find that 7
and 8
are larger than another option 5
, we can swim to grid[0][2]
, 5
, in no time.
The process of finding and getting (swimming to) the smallest option/heading/cell/elevation is done by the priority queue automatically.
<<imports for typing>>
import heapq

def swim_in_water(grid: List[List[int]]) -> int:
    visited, heap = set(), [(grid[0][0], 0, 0)]
    time, n = 0, len(grid)
    # The idea to use a heap is that:
    # 1. It keeps a list of possible cells to take, and
    # 2. The cell with the minimum cost/elevation is stored at the top
    # 3. At the start of every iteration, every cell in the heap has been
    #    seen before, meaning that there is at least one path from (0, 0)
    #    to that cell
    # Intuitively, we always want to choose the cell with the minimum elevation.
    while heap:
        cur, x, y = heapq.heappop(heap)
        visited.add((x, y))
        time = max(time, cur)
        if x == y == n-1:
            return time
        for i, j in [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]:
            if 0 <= i <= n-1 and 0 <= j <= n-1 and (i, j) not in visited:
                heapq.heappush(heap, (grid[i][j], i, j))
        print(cur)
        print(heap)
    return -1

print(swim_in_water([[8,2,7],[3,4,6],[1,5,0]]))
8 [(2, 0, 1), (3, 1, 0)] 2 [(3, 1, 0), (4, 1, 1), (7, 0, 2)] 3 [(1, 2, 0), (4, 1, 1), (4, 1, 1), (7, 0, 2)] 1 [(4, 1, 1), (4, 1, 1), (7, 0, 2), (5, 2, 1)] 4 [(4, 1, 1), (5, 2, 1), (7, 0, 2), (5, 2, 1), (6, 1, 2)] 4 [(5, 2, 1), (5, 2, 1), (6, 1, 2), (6, 1, 2), (5, 2, 1), (7, 0, 2)] 5 [(0, 2, 2), (5, 2, 1), (5, 2, 1), (6, 1, 2), (7, 0, 2), (6, 1, 2)] 8
2.22.2.4.1. Time complexity:
\(O(n^2\lg n)\), where \(n\) is the number of rows of the grid. The while loop runs \(O(n^2)\) times, once per popped cell. Inside it, each heapq.heappush and heapq.heappop costs \(O(\lg n^2)=O(\lg n)\).
2.22.2.4.2. Space complexity:
\(O(n^2)\): both visited and heap can hold up to \(O(n^2)\) cells.
2.23. DONE 1022 Sum of root to leaf binary numbers tree easy
2.23.1. Description
Given a binary tree, each node has value 0 or 1.
Each root-to-leaf path represents a binary number starting with the most significant bit.
For example, if the path is 0->1->1->0->1, then this could represent 01101 in binary, which is 13.
For all leaves in the tree, consider the numbers represented by the path from the root to that leaf.
Return the sum of these numbers.
Constraints:
- The number of nodes in the tree is between 1 and 1000.
- node.val is 0 or 1.
- The answer will not exceed \(2^{31} - 1\).
2.23.1.1. Examples:
Input: [1,0,1,0,1,0,1] Output: 22 Explanation: (100) + (101) + (110) + (111) = 4 + 5 + 6 + 7 = 22
2.23.2. Solution
2.23.2.1. Understanding the problem
2.23.2.2. Algorithm
We try to use generators to solve the calculation part of the problem. For traversing the tree, we use pre-order traversal.
2.23.2.3. Code
<<imports for typing>> class TreeNode: def __init__(self, val=0, left=None, right=None): self.val = val self.left = left self.right = right def calculator(): total = 0 bit = None num = "" # every path uses the same num in the calculator generator, so this is messed up while True: bit = yield if bit is None: # We must terminate the coroutine normally # to return a value. # If we do try...except GeneratorExit, we will not be able to return a value later break elif bit == "leaf": total += int(num, 2) num = num[:-1] else: num = num + str(bit) return total def dele_gen(): global results while True: results = yield from calculator() results = 0 calc = dele_gen() next(calc) def traverse(node, calc): if node: calc.send(node.val) if node.left is None or node.right is None: calc.send("leaf") else: traverse(node.left, calc) traverse(node.right, calc) else: calc.send("") l_1 = TreeNode(0) r_0 = TreeNode(1) l = TreeNode(0, l_1, r_0) r = TreeNode(1) root = TreeNode(1, l, r) traverse(root, calc) calc.send(None) print(results)
11
2.23.2.4. Leetcode solution
Not available.
<<imports for typing>> class TreeNode: def __init__(self, val=0, left=None, right=None): self.val = val self.left = left self.right = right def sum_root_to_leaf(root) -> int: if not root: return 0 res = [] def helper(node, path): if node.left is None and node.right is None: res.append(path + str(node.val)) return if node.left: helper(node.left, path + str(node.val)) if node.right: helper(node.right, path+str(node.val)) helper(root, "0b") return sum([int(p, 2) for p in res]) l_1 = TreeNode(0) r_0 = TreeNode(1) l = TreeNode(0, l_1, r_0) r = TreeNode(1) root = TreeNode(1, l, r) print(sum_root_to_leaf(root))
12
2.23.2.4.1. Time complexity:
\(O(n)\), where n
is the number of nodes, as this algorithm visits all nodes in the tree.
2.23.2.4.2. Space complexity:
Assuming a full binary tree, \(O(n\log n)\): we keep \(O(n)\) paths (one per leaf of the tree), each of length \(O(\log n)\) (the height of the tree).
2.23.3. More analysis
2.23.3.1. General thoughts
A key problem in my solution is that I have no way of creating path copies easily.
Initially I thought it was a good idea to use a generator responsible for calculating the total, as this is somewhat similar to the running-average problem.
The problem came when the variable num could only hold one path, and the generator had no easy way to keep track of which path corresponds to which node. Therefore, after it added the num for the left-most leaf to the total, it could not determine which path to take next.
The only way to do it is to backtrack num by removing the bits that correspond to the visited leaf and to its parent, grandparent, great-grandparent… that have no other children. This causes a lot of trouble, as every node references this single variable num, and there are many states to check.
The code from the Leetcode discussion solves this problem by copying the current path at each node, though this is done implicitly, since Python strings are immutable and path+str(node.val) builds a new string.
Therefore, every node has its own copy of the current path and can safely extend it without affecting other nodes.
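The implicit-copy behaviour is easy to verify in isolation. A tiny standalone sketch (extend is a hypothetical helper, not part of the solution above):

```python
def extend(path: str, bit: int) -> str:
    # Concatenation builds a brand-new string;
    # the caller's `path` is never mutated.
    return path + str(bit)

base = "10"
left_path = extend(base, 0)    # the left branch gets its own "100"
right_path = extend(base, 1)   # the right branch gets its own "101"
```

Because base is untouched after both calls, sibling recursive calls can never corrupt each other's paths.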
2.23.3.2. Related problems
2.24. DONE 1207 Unique number of occurrences hash_table easy
2.24.1. Description
Given an array of integers arr, write a function that returns true iff the number of occurrences of each value in the array is unique.
Constraints:
- 1 <= arr.length <= 1000
- -1000 <= arr[i] <= 1000
2.24.1.1. Examples:
Input: arr = [1,2,2,1,1,3] Output: true Explanation: The value 1 has 3 occurrences, 2 has 2 and 3 has 1. No two values have the same number of occurrences. Input: arr = [1,2] Output: false Input: arr = [-3,0,1,-3,1,1,1,-3,10,0] Output: true
2.24.2. Solution
2.24.2.1. Understanding the problem
The problem:
- gives an array of integers whose elements could be negative, zero, or positive;
- wants us to count the number of occurrences (occur) of each value; and
- return true if the occur of each distinct value is different.
2.24.2.2. Algorithm
We scan the integer array and add one to the counter (occurs) map entry for each value we meet.
We then check whether all counts in the map are distinct by comparing the set of counts with the full collection of counts.
2.24.2.3. Code
<<imports for typing>> def unique_occurrences(arr: List[int]) -> bool: occurs = {} for i in arr: occurs[i] = occurs.get(i, 0) + 1 uniques = set(occurs.values()) return len(uniques) == len(occurs.values()) arr = [1,2,2,1,1,3] arr = [2,2,1,1,3] print(unique_occurrences(arr))
False
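The same counting idea fits in a couple of lines with the standard library's collections.Counter. This is an alternative sketch, not the code above:

```python
from collections import Counter

def unique_occurrences(arr):
    counts = Counter(arr)  # value -> number of occurrences
    # all counts are distinct iff deduplicating them loses nothing
    return len(set(counts.values())) == len(counts)
```

For [1,2,2,1,1,3] the counts are {1: 3, 2: 2, 3: 1}, all distinct, so the function returns True.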
2.24.2.4. Leetcode solution
Not available.
<<imports for typing>>
2.24.2.4.1. Time complexity:
2.24.2.4.2. Space complexity:
2.24.3. More analysis
2.24.3.1. General thoughts
This is an easy question. The hard part of this question is actually understanding the description clearly and quickly. This is why I suddenly decided to add an Understanding the problem heading.
It's actually amazing that after so many questions and problems that I have solved, I put this heading in so late. Yes, the importance of understanding the question, or figuring out the definitions of the terms used in the question, has been known to me for quite some time, but somehow I just did not give credit where it's due. It's almost as if I was unconsciously suppressing the known idea that understanding the problem clearly and quickly is crucial to solving it.
I want to take this opportunity to write it down explicitly that the first thing we need to do in any coding task (or anything, really) is to come to terms with the author (How to Read a Book). We must first understand what the words mean before we understand what the author of the problem is saying with those words. Again, this is an extremely simple and important but often ignored rule.
2.24.3.2. Related problems
2.25. TODO 706 Design HashMap hash_table easy
2.25.1. Description
Design a HashMap without using any built-in hash table libraries.
To be specific, your design should include these functions:
- put(key, value): insert a (key, value) pair into the HashMap. If the key already exists in the HashMap, update the value.
- get(key): returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.
- remove(key): removes the mapping for the key if this map contains it.
Constraints:
- All keys and values will be in the range of [0, 1000000].
- The number of operations will be in the range of [1, 10000].
- Please do not use the built-in HashMap library.
2.25.1.1. Examples:
MyHashMap hashMap = new MyHashMap(); hashMap.put(1, 1); hashMap.put(2, 2); hashMap.get(1); // returns 1 hashMap.get(3); // returns -1 (not found) hashMap.put(2, 1); // update the existing value hashMap.get(2); // returns 1 hashMap.remove(2); // remove the mapping for 2 hashMap.get(2); // returns -1 (not found)
2.25.2. Solution
2.25.2.1. Understanding the problem
The remove()
method, though not mentioned, should be idempotent.
The code template also gives more information on requirements.
The key is an integer and value will always be non-negative.
Because the number of operations will be at most 10000, this can be used as the length of the array backing the map, assuming that all operations are put().
2.25.2.2. Algorithm
We need to find a proper hash function.
For the hash function, we simply integer-divide the key by 100. We can use a doubly linked list in each bucket to handle collisions.
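One caveat worth noting: with keys up to 1,000,000, key // 100 can evaluate to 10,000, which is one past the last index of a 10,000-slot array. A modulo guard keeps every key in range. A minimal sketch (NUM_BUCKETS and hash_func are illustrative names, not the code below):

```python
NUM_BUCKETS = 10000  # hypothetical bucket count matching the backing array

def hash_func(key: int) -> int:
    # // spreads neighbouring keys across buckets;
    # % guarantees the result is a valid array index
    return (key // 100) % NUM_BUCKETS
```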
2.25.2.3. Code
<<imports for typing>> class MyHashMap: class Node: def __init__(self, val: int, key: int, parent: 'Node'=None, child: 'Node'=None): self._val = val self._key = key self._parent = parent self._child = child @property def val(self): return self._val @val.setter def val(self, val): self._val = val @property def key(self): return self._key @key.setter def key(self, key): self._key = key @property def child(self): return self._child @child.setter def child(self, child: 'Node'): self._child = child @property def parent(self): return self._parent @parent.setter def parent(self, parent: 'Node'): self._parent = parent @staticmethod def gen_nodes(root: 'Node'): node = root while node: yield node node = node.child def __init__(self): """ Initialize your data structure here. """ self.arr = [self.Node(-1, -1) for i in range(10000)] def _get_nodes(self, key: int): hash_value = self._hash_func(key) return self.gen_nodes(self.arr[hash_value]) def put(self, key: int, value: int) -> None: """ value will always be non-negative. """ nodes = self._get_nodes(key) for node in nodes: if node.key == key: node.val = value break else: node.child = self.Node(value, key, node) def get(self, key: int) -> int: """ Returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key """ nodes = self._get_nodes(key) for node in nodes: if node.key == key: return node.val return -1 def remove(self, key: int) -> None: """ Removes the mapping of the specified value key if this map contains a mapping for the key """ nodes = self._get_nodes(key) for node in nodes: if node.key == key: if node.child: node.child.parent = node.parent node.parent.child = node.child @staticmethod def _hash_func(key: int) -> int: """ Hashes key """ return key // 100 obj = MyHashMap() obj.put(100, 123) print(obj.get(100)) obj.put(101, 124) obj.put(101, 1024) print(obj.get(101)) obj.put(102, 125) print(obj.get(102)) obj.remove(102) print(obj.get(102)) print(obj.get(101)) print(obj.get(103))
123 1024 125 -1 1024 -1
2.25.2.4. Leetcode solution
Not available.
<<imports for typing>>
2.25.2.4.1. Time complexity:
2.25.2.4.2. Space complexity:
2.25.3. More analysis
2.25.3.1. General thoughts
Coming up with the solution algorithm conceptually is not hard as this is well-documented in CLRS.
The hard part comes at the actual coding. Basically, I encountered the following issues:
- Properly use reference to inner class
- Properly reference the class itself when setting types for arguments
- Properly use
@property
decorator andproperty.setter
as I did not use them often.
Besides those issues, I was also sceptical about whether the following code would work.
That is to say, whether I would still have a reference to the last node after breaking out of the loop.
The answer is yes: Python's for loop does not create a new scope, so the loop variable persists after the loop. Therefore, whether we break out of the loop or the loop finishes, node still points to the last object it was bound to in the loop.
node = None for node in nodes: if node.key == key: node.val = value break # this is a slight modification of my solution node.child = self.Node(value, key, node)
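The scoping behaviour can be checked in isolation. A standalone sketch (last_seen is a hypothetical function, unrelated to the HashMap):

```python
def last_seen(items):
    x = None
    for x in items:
        if x > 10:
            break
    # Python's for loop does not create a new scope,
    # so x keeps whatever it was last bound to,
    # whether we broke out or the loop was exhausted
    return x
```

last_seen([1, 5, 20, 3]) returns 20 (break), while last_seen([1, 2]) returns 2 (loop exhausted).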
2.25.3.2. Related problems
2.26. TODO 208 Implementing Trie
2.26.1. Description
Implement a Trie with insert, search and startsWith.
Constraints:
- You may assume that all inputs consist of lowercase letters a-z.
- All inputs are guaranteed to be non-empty strings.
2.26.1.1. Examples:
Trie trie = new Trie(); trie.insert("apple"); trie.search("apple"); // returns true trie.search("app"); // returns false trie.startsWith("app"); // returns true trie.insert("app"); trie.search("app"); // returns true
2.26.2. Solution
2.26.2.1. Understanding the problem
A Trie is based on a tree structure.
insert(word) inserts a word into the trie.
search(word) tries to find word in the trie and returns True if it can find the word, else returns False.
startsWith(prefix) tries to find prefix in the trie and returns True if it can find the prefix, else returns False.
The difference between search() and startsWith() is that the last character of the word for search() must also be the last character of a word stored in the trie, i.e. its node's is_end property must be True.
2.26.2.2. Algorithm
We use standard Trie structure with a TrieNode
and a Trie
.
2.26.2.3. Code
<<imports for typing>> class Trie: class TrieNode: def __init__(self, is_end=False, num_children=26): self.is_end = is_end self.num_children = num_children self.links = [None for i in range(self.num_children)] @property def is_end(self): return self._is_end @is_end.setter def is_end(self, val): self._is_end = val def contains_key(self, ch: str) -> bool: return self.links[ord(ch)-ord('a')] is not None def get(self, ch: str) -> 'TrieNode': return self.links[ord(ch) - ord('a')] def put(self, ch: str, node: 'TrieNode') -> None: self.links[ord(ch) - ord('a')] = node def __init__(self): """ Initialize your data structure here. """ self.root = self.TrieNode() def insert(self, word: str) -> None: """ Inserts a word into the trie. """ node = self.root for ch in word: if not node.contains_key(ch): node.put(ch, self.TrieNode()) node = node.get(ch) # set the last node.is_end to True node.is_end = True def search(self, word: str) -> bool: """ Returns if the word is in the trie. """ node = self.search_prefix(word) return (node is not None) and node.is_end def startsWith(self, prefix: str) -> bool: """ Returns if there is any word in the trie that starts with the given prefix. """ node = self.search_prefix(prefix) return node is not None def search_prefix(self, prefix: str) -> TrieNode: node = self.root for ch in prefix: if node.contains_key(ch): node = node.get(ch) else: return None return node # Your Trie object will be instantiated and called as such: obj = Trie() word="apple" obj.insert(word) param_2 = obj.search(word) print(param_2) word="app" obj.insert(word) param_2 = obj.search(word) print(param_2) prefix="apt" param_3 = obj.startsWith(prefix) print(param_3) param_2 = obj.search("aa") print(param_2)
True True False False
2.26.2.4. Leetcode solution
<<imports for typing>>
2.26.2.4.1. Time complexity:
2.26.2.4.2. Space complexity:
2.26.3. More analysis
2.26.3.1. General thoughts
A Trie is built from two parts: a TrieNode and a Trie.
The most important thing to decide when building a TrieNode is the number of children it should have, since the links attribute is an array of fixed length. In cases where we are not certain of the alphabet size, we can use a HashMap instead.
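When the alphabet size is unknown, a HashMap-backed node works. A dict-based sketch (illustrative names, not the solution below):

```python
class DictTrieNode:
    def __init__(self):
        self.children = {}   # ch -> DictTrieNode, grown on demand
        self.is_end = False  # marks the end of a stored word

def insert(root: 'DictTrieNode', word: str) -> None:
    node = root
    for ch in word:
        # setdefault creates the child node only when it is missing
        node = node.children.setdefault(ch, DictTrieNode())
    node.is_end = True
```

The dict replaces the fixed-size links array, so no alphabet assumption is baked in.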
The TrieNode should have at least three methods and one property to be functional: put, get, contains_key, and is_end as a property.
The three methods operate on the node's links array; they are essentially the array's set, get, and membership operations, each with time complexity \(O(1)\).
The is_end property marks whether the current node is the end of a word.
When we have a TrieNode, we then build a Trie upon it.
The most important thing with Trie is the subtle difference between a prefix and a word.
A word is also a prefix, but a prefix is not necessarily a word.
A prefix is also a word iff prefix.last_node.is_end==True.
This tells us that the method search_prefix(prefix) can be used both to find a prefix and to decide whether a word is in the Trie, since for a word we only need the extra check that the last node is also an end.
The Trie needs at least three methods to be functional: insert, search_prefix and search. Typically a startsWith method is also added, and it is as trivial as search.
2.26.3.2. Related problems
2.27. TODO 1038 Binary Search Tree to Greater Sum Tree
2.27.1. Description
Given the root of a binary search tree with distinct values, modify it so that every node has a new value equal to the sum of the values of the original tree that are greater than or equal to node.val.
As a reminder, a binary search tree is a tree that satisfies these constraints:
- The left subtree of a node contains only nodes with keys less than the node's key.
- The right subtree of a node contains only nodes with keys greater than the node's key.
- Both the left and right subtrees must also be binary search trees.
Constraints:
- The number of nodes in the tree is between 1 and 100.
- Each node will have value between 0 and 100.
- The given tree is a binary search tree.
2.27.1.1. Examples:
2.27.2. Solution
2.27.2.1. Understanding the problem
The question is a bit hard to understand at first, but it is asking us to do the following:
- We have a BST.
- Starting from the right-most leaf, we replace every node's value with the sum of itself and all values that are greater than it, i.e. all values that come after it in the in-order sequence.
2.27.2.2. Algorithm
As described in Understanding the problem, we start from the right-most leaf. We use DFS with a reverse in-order traversal (right, node, left). Every node's new value is its original value plus the running sum of all nodes processed before it.
2.27.2.3. Code
<<imports for typing>> # Definition for a binary tree node. class TreeNode: def __init__(self, val=0, left=None, right=None): self.val = val self.left = left self.right = right class Solution: def bstToGst(self, root: TreeNode) -> TreeNode: def helper(node) -> None: nonlocal running_sum if node.right: helper(node.right) node.val += running_sum running_sum = node.val if node.left: helper(node.left) running_sum = 0 helper(root) return root
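To sanity-check the reverse in-order idea independently of the class above, here is a self-contained version run on a tiny BST (root 2, children 1 and 3):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def bst_to_gst(root):
    running = 0
    def visit(node):
        # right -> node -> left: every value seen so far is >= node.val
        nonlocal running
        if node is None:
            return
        visit(node.right)
        running += node.val
        node.val = running
        visit(node.left)
    visit(root)
    return root

tree = bst_to_gst(Node(2, Node(1), Node(3)))
# expected new values: root 5 (2+3), left leaf 6 (1+2+3), right leaf 3
```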
2.27.2.4. Leetcode solution
<<imports for typing>>
2.27.2.4.1. Time complexity:
2.27.2.4.2. Space complexity:
2.27.3. More analysis
2.27.3.1. General thoughts
2.27.3.2. Related problems
2.28. TODO 21 Merge Two Sorted Lists
2.28.1. Description
Merge two sorted linked lists and return it as a new sorted list. The new list should be made by splicing together the nodes of the first two lists.
Constraints:
2.28.1.1. Examples:
Input: 1->2->4, 1->3->4 Output: 1->1->2->3->4->4
2.28.2. Solution
2.28.2.1. Understanding the problem
The problem gives two sorted linked lists a and b, and asks us to merge them into a new sorted linked list c that is:
- in the same order as a and b, though this is not stated explicitly;
- either descending or ascending;
- ignorant of the relative order of elements from a and b when they are equal.
The lists could be empty or have only one node. The lists could have different orders.
2.28.2.2. Algorithm
We assume that a and b are both descending or both ascending.
At first I thought we needed to know whether a and b are descending or ascending. Actually, we can't always tell ahead of time, since a whole list could consist of equal values.
Once the order is known, we pop nodes from each list and always take the smallest/largest one; that is, we pop nodes from each list and compare the new nodes with the one we have to form the next node of the new list.
Or maybe we just quicksort all the nodes, considering that a and b can have different orders???
2.28.2.3. Code
<<imports for typing>> # Definition for singly-linked list. class ListNode: def __init__(self, val=0, next=None): self.val = val self.next = next def mergeTwoLists(l1: ListNode, l2: ListNode) -> ListNode: def gen_node(lst: ListNode): yield lst node = lst while node.next: node = node.next yield node if l1 == None: return l2 if l2 == None: return l1 gen_l1 = gen_node(l1) gen_l2 = gen_node(l2) res = [] for node in gen_l1: res.append(node) for node in gen_l2: res.append(node) res = sorted(res, key=lambda node: node.val) for i, node in enumerate(res): if i == len(res) - 1: break node.next = res[i+1] res[-1].next = None return res[0] l1_1 = ListNode(4) l1_2 = ListNode(2, l1_1) l1_3 = ListNode(1, l1_2) l2_1 = ListNode(4) l2_2 = ListNode(3, l2_1) l2_3 = ListNode(1, l2_2) print(mergeTwoLists(l1_3, l2_3).next.next.next.val)
3
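For reference, the splicing approach that the problem statement hints at runs in O(m+n) time without any sorting, assuming (as the examples do) that both lists are ascending. This is a sketch, not the official solution; build and to_list are helper names for the demo:

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val, self.next = val, next

def merge_two_lists(l1, l2):
    dummy = tail = ListNode()          # dummy head simplifies edge cases
    while l1 and l2:
        if l1.val <= l2.val:           # splice the smaller head node
            tail.next, l1 = l1, l1.next
        else:
            tail.next, l2 = l2, l2.next
        tail = tail.next
    tail.next = l1 or l2               # attach whichever list remains
    return dummy.next

def build(vals):                       # demo helper: Python list -> linked list
    head = None
    for v in reversed(vals):
        head = ListNode(v, head)
    return head

def to_list(node):                     # demo helper: linked list -> Python list
    out = []
    while node:
        out.append(node.val)
        node = node.next
    return out
```

Merging 1->2->4 with 1->3->4 this way yields 1->1->2->3->4->4, reusing the original nodes.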
2.28.2.4. Leetcode solution
<<imports for typing>>
2.28.2.4.1. Time complexity:
2.28.2.4.2. Space complexity:
2.28.3. More analysis
2.28.3.1. General thoughts
2.28.3.2. Related problems
2.29. TODO 20 Valid Parentheses
2.29.1. Description
Constraints:
2.29.1.1. Examples:
2.29.2. Solution
2.29.2.1. Understanding the problem
2.29.2.2. Algorithm
2.29.2.4. Leetcode solution
<<imports for typing>>
2.29.2.4.1. Time complexity:
2.29.2.4.2. Space complexity:
2.29.3. More analysis
2.29.3.1. General thoughts
2.29.3.2. Related problems
3. Data structures
This section contains widely used data structures and related problems.
4. Algorithms
This section contains widely used algorithms and related problems.
4.1. Sorting
4.1.1. Insertion sort
In-place sort. Easy to implement.
def insertion_sort(arr): i = 0 while i < len(arr): j = i while j > 0 and arr[j-1] > arr[j]: temp = arr[j-1] arr[j-1] = arr[j] arr[j] = temp j -= 1 i+=1 return arr print(insertion_sort([2,1,3,4,2,5]))
[1, 2, 2, 3, 4, 5]
4.2. Dynamic programming
The prerequisites for using DP are:
- optimal substructure
- overlapping sub-problems
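A tiny illustration of both prerequisites is Fibonacci: fib(n) is assembled from optimal sub-results, and the recursive calls overlap heavily. A generic sketch, unrelated to any particular problem above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # optimal substructure: fib(n) combines the answers to fib(n-1), fib(n-2);
    # overlapping subproblems: both branches recompute the same values,
    # which the cache exploits to bring the cost down to O(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```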
4.2.1. Edit distance
def edit(s: str, t: str) -> int: from math import inf def get_element(tbl: List[List[int]], i: int, j: int) -> int: if i>=0 and j>=0: return tbl[i][j] else: return inf m = len(s) n = len(t) tbl = [[0 for j in range(n+1)] for i in range(m+1)] for i in range(0, m+1): for j in range(0, n+1): if i==0 and j==0: continue tbl[i][j] = min(1+get_element(tbl,i, j-1), 1+get_element(tbl,i-1, j), get_element(tbl, i-1, j-1)+(0 if s[i-1]==t[j-1] else 1)) return tbl[m][n] print(edit("cbc", "ac"))
2
def edit(s: str, t: str) -> int: m = len(s) n = len(t) # +1 accounts for the empty string in the table tbl = [[0 for j in range(n+1)] for i in range(m+1)] for i in range(m+1): for j in range(n+1): if i == 0: tbl[i][j] = j elif j==0: tbl[i][j] = i # we want to compare the chars s[i-1] and t[j-1] # not the chars s[i] and t[j] # because at this point, i is in [1..m], j is in [1..n] # we need to map them back to [0..m-1] and [0..n-1] elif s[i-1]==t[j-1]: tbl[i][j] = tbl[i-1][j-1] else: tbl[i][j] = 1+min(tbl[i][j-1], # insert tbl[i-1][j], # remove tbl[i-1][j-1] # replace ) return tbl[m][n] print(edit("abc", "ac"))
1
4.2.2. Discussions from Introduction to Algorithms
See 6.5.
4.3. Dijkstra's Algorithm
4.4. Binary search
Simple recursive solution.
def binary_search(arr, left, right, target): """Return position of the target, or -1 if not found.""" # check base case if left <= right: mid = left + (right - left) // 2 if arr[mid] == target: return mid elif arr[mid] > target: return binary_search(arr, left, mid - 1, target) else: return binary_search(arr, mid + 1, right, target) return -1
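An iterative variant of the same search avoids Python's recursion depth limit on very large arrays. A sketch following the same mid-point convention:

```python
def binary_search_iter(arr, target):
    """Return the position of target in sorted arr, or -1 if not found."""
    left, right = 0, len(arr) - 1
    while left <= right:
        # left + (right - left) // 2 avoids overflow in fixed-width languages
        mid = left + (right - left) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] > target:
            right = mid - 1
        else:
            left = mid + 1
    return -1
```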
5. Techniques
This section contains techniques that do not make a full algorithm and related problems.
5.1. General
5.1.1. Bitmask
See 2.10.2.5.
We have a set of objects that we want to draw from. How do we represent the set's subsets such that we get its power set?
We can easily think of a map that associates each object with a boolean value indicating whether the object is picked. The problem with this is that it can take a lot of memory and can be slow due to the overhead of creating the map.
So we can use a bitmask instead.
arr | 1 | 2 | 3 |
mask | 1 | 0 | 1 |
def get_power_set(arr: List[int]) -> List[List[int]]: n = len(arr) # helps create left padding zeros # if n == 4, we have 1000 nth_bit = 1<<n res = [[]] # time complexity: 2**n for i in range(2**n): # if n==4, we might have 1000 | 010 # 1000 | 110, etc # [3:] takes out the leading "0b1" in "0b1010" mask = bin(nth_bit | i)[3:] sub = [] # time complexity: n for j in range(n): if mask[j] == "1": sub.append(arr[j]) res.append(sub) return res print(get_power_set([1,2,3]))
[[], [], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
5.2. Python
k in dict.keys()
is \(O(1)\) because keys views are set-like: dictionary keys are unique and hashable, so membership testing is a hash lookup.
6. CLRS Notes & Yufei Tao lecture notes
Read chapters 3-4, 6-7, 10-17 and 22 of CLRS, and practice on LeetCode. Also check Yufei Tao's lecture notes for concise introductions.
6.1. Growth of functions
We want to compare which algorithm performs better with respect to input size. We study the asymptotic efficiency of algorithms when we make the input size large enough that only the order of growth of the running time is relevant.
6.1.1. Asymptotic notation
Notations:
- Domain for all functions of interest: \(\mathbb{N}\).
- Worst-case running-time function \(T(n)\).
6.1.1.1. List of notations from YFT lecture notes:
6.1.1.1.1. Big-\(O\) (upper bound)
Let \(f(n)\) and \(g(n)\) be two functions of \(n\).
We say that \(f(n)\) grows asymptotically no faster than \(g(n)\) if there is a constant \(c_1>0\) and a constant \(c_2\) such that \(f(n)\le c_1 g(n)\) holds for all \(n \ge c_2\). We denote this by \(f(n)=O(g(n))\).
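As a concrete instance of the definition, \(f(n)=3n^2+10n\) is \(O(n^2)\):

```latex
3n^2 + 10n \le 3n^2 + n \cdot n = 4n^2 \quad \text{for all } n \ge 10,
```

so the constants \(c_1=4\) and \(c_2=10\) witness \(3n^2+10n=O(n^2)\).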
The time complexity of an algorithm is described in asymptotic form (big-\(O\)).
6.1.1.1.2. Big-\(\Omega\) (lower bound)
Let \(f(n)\) and \(g(n)\) be two functions of \(n\). If \(g(n)=O(f(n))\), then we define: \(f(n)=\Omega(g(n))\), to indicate that \(f(n)\) grows asymptotically no slower than \(g(n)\).
6.1.1.1.3. Big-\(\Theta\)
Let \(f(n)\) and \(g(n)\) be two functions of \(n\). If \(f(n)=O(g(n)) \text{ and } f(n)=\Omega(g(n))\), in other words, \(f(n)=O(g(n)) \text{ and } g(n)=O(f(n))\), then we define: \(f(n)=\Theta(g(n))\). This indicates that \(f(n)\) grows asymptotically as fast as \(g(n)\), no faster (big-\(O\)), no slower (big-\(\Omega\)).
6.1.2. Standard notations and common functions
6.2. Divide and conquer
We solve the problem recursively, applying the following 3 steps at each level of the recursion:
- Divide the problem into a number of subproblems that are smaller instances of the same problem (recursive case).
- Conquer the subproblems by solving them recursively. If the subproblem sizes are small enough, however, just solve the subproblems in a straightforward manner (base case).
- Combine the solutions to the subproblems into the solution for the original problem.
6.2.1. Recurrences
A recurrence is an equation or inequality that describes a function in terms of its value on smaller inputs.
The book provides 3 methods for obtaining asymptotic \(\Theta\) or \(O\) bounds on the recurrence's solution:
- Substitution method - we guess a bound and then use mathematical induction to prove our guess correct.
- Recursion-tree method - convert the recurrence into a tree whose nodes represent the costs incurred at various levels of the recursion. We use techniques for bounding summations to solve the recurrence.
- Master method - provides bounds for recurrences of the form \(T(n)=\underbrace{aT(n/b)}_{a \text{ subproblems}}+f(n),a\ge 1, b>1\) and \(f(n)\) is a given function.
To use the master method, its 3 cases need to be memorized.
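As a worked instance, merge sort's recurrence fits the form with \(a=2, b=2\):

```latex
T(n) = 2T(n/2) + \Theta(n), \qquad n^{\log_b a} = n^{\log_2 2} = n,
```

so \(f(n)=\Theta(n)=\Theta(n^{\log_b a})\), and the master method gives \(T(n)=\Theta(n\lg n)\).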
6.3. Sorting
The sorting problem:
- Input: A sequence of \(n\) numbers \(\langle a_1, a_2,\ldots,a_n\rangle\).
- Output: A permutation \(\langle a_1',a_2',\ldots,a_n'\rangle\) of the input sequence such that \(a_1'\le a_2'\le \ldots \le a_n'\).
Two kinds of sorting based on time complexity:
- Comparison sort: sorted order is based only on comparison between the input elements
- Non-comparison sort: use operations other than comparisons to determine the sorted order
Algorithm | Worst-case running time | Average-case/expected running time | Comparison? |
---|---|---|---|
Insertion sort | \(\Theta(n^2)\) | \(\Theta(n^2)\) | yes |
Merge | \(\Theta(n\lg n)\) | \(\Theta(n\lg n)\) | yes |
Heap | \(O(n\lg n)\) | - | yes |
Quick | \(\Theta(n^2)\) | \(\Theta(n\lg n)\) (expected) | yes |
Counting | \(\Theta(k+n)\) | \(\Theta(k+n)\) | no |
Radix | \(\Theta(d(k+n))\) | \(\Theta(d(k+n))\) | no |
Bucket | \(\Theta(n^2)\) | \(\Theta(n)\) (average-case) | no |
6.3.1. Heapsort comparison_sort
- Running time: \(O(n\lg n)\)
- In-place: yes
The (binary) heap data structure is an array object that can be viewed as a nearly complete binary tree (all levels are full except possibly the lowest). Each node of the tree corresponds to an element of the array.
A heap \(A\) is an array that has two attributes:
- \(A.length\)
- \(A.heap-size\), only \(A[1..A.heap-size]\) are valid elements to the heap
\(A.heap-size\le A.length\).
6.3.1.1. Heap properties
For any node \(i\), we have:
- parent: \(\lfloor i/2\rfloor\).
- left child: \(2i\)
- right child: \(2i+1\)
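These formulas translate directly into code. For 0-indexed Python arrays each index shifts down by one (a small sketch):

```python
def parent(i: int) -> int:
    return (i - 1) // 2   # 0-indexed counterpart of floor(i/2)

def left(i: int) -> int:
    return 2 * i + 1      # 0-indexed counterpart of 2i

def right(i: int) -> int:
    return 2 * i + 2      # 0-indexed counterpart of 2i + 1
```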
There are two kinds of binary heaps: max-heaps and min-heaps. For every node \(i\), they satisfy a heap property,
- max-heap: \(A[parent(i)]\ge A[i]\).
- min-heap: \(A[parent(i)]\le A[i]\).
Max-heaps are used in the heapsort algorithm; min-heaps implement the priority queues of 6.7.4.
6.3.1.2. Max-Heapify (maintaining \(A[parent(i)]\ge A[i]\))
Here is the Python code for the Max-Heapify pseudo-code. Time complexity: \(O(\lg n), n=A.length\).
max_heapify lets the value at A[i] "float down" in the max-heap so that the subtree rooted at index i obeys the max-heap property.
/-------\ | arr(i)| | | \-------/ / \ / \ / \ /-------\ /-------\ | max | | max | | heap | | heap | \-------/ \-------/
def max_heapify(A, i, heap_size) -> None: # A is 0-indexed l = 2*i+1 r = 2*i+2 largest = i if l < heap_size and A[l]>A[i]: largest = l if r < heap_size and A[r]>A[largest]: largest = r if largest != i: A[i], A[largest] = A[largest], A[i] # after exchanging A[i] and A[largest] # the subtree now rooted at A[largest] might violate the max-heap property # thus we need to heapify it # at the end, the value of the original A[i] will float down # as many heights as possible max_heapify(A, largest, heap_size) # The children of the root node must be already max-heaps A = [4,14,13,10,1,12] max_heapify(A, 0, len(A)) print(A)
[14, 10, 13, 4, 1, 12]
6.3.1.3. Building a heap
Here is the Python code for the Build-Max-Heap pseudo-code, which works from the bottom up (this is faster than top-down, but the details are not important here).
This also makes sense because we start from the direct parents of leaf nodes, which matches the prerequisite of max_heapify() that the children of a 3-node construct already be max-heaps. In this case, those children are leaves of the binary tree, which are max-heaps with a single element.
As we go up the tree, every node we have passed is now the root of a max-heap, which again satisfies the condition of max_heapify() when those nodes serve as children of an upper node.
This is \(O(n)\). Although the obvious time complexity is \(O(n\lg n)\), we can prove that a tighter bound exists and it is \(O(n)\).
This makes sure that every element \(A[i]\) in the heap satisfies \(A[parent(i)]\ge A[i]\).
Once again, compared to max_heapify(), this algorithm does not need the children of the root to be max-heaps already, since establishing that property is exactly what it does.
<<max-heapify>> def build_max_heap(A, heap_size) -> None: # we simply skip all leaves # because they are max-heaps with one element for i in range((len(A)//2-1), -1, -1): max_heapify(A, i, heap_size) #A = [4,14,7] #build_max_heap(A, len(A)) #print(A)
[14, 4, 7]
6.3.1.4. Heapsort algorithm
<<build-max-heap>> def heapsort(A): # As the name suggests, # build a max-heap out of the whole array A. # heap_size must be len(A), not len(A)-1, # otherwise the last element is left out of the build. # We have the largest number at root after this build_max_heap(A, len(A)) # Keep exchanging the root and the last element; # max_heapify() guarantees that # the largest remaining element floats to the root for i in range(len(A)-1, 0, -1): # heap_size is i A[0], A[i] = A[i], A[0] max_heapify(A, 0, i) A=[3,13,2,123,4,1] heapsort(A) print(A)
[1, 2, 3, 4, 13, 123]
6.3.1.5. Issues with heapsort in real world applications
Notes from Princeton COS 226.
Performance on real computers is heavily impacted by really messy factors like cache performance.
Cache works in a way that when we fetch one memory address, its nearby addresses are fetched as well, since memory access patterns of most programs/algorithms are highly localized.
Therefore, in an \(n\times n\) matrix, [arr[i][j] for i in range(n) for j in range(n)]
(row-major traversal) is faster than [arr[i][j] for j in range(n) for i in range(n)]
(column-major traversal; the first for clause is the outer loop), because when we fetch arr[i][0]
, the computer has already cached arr[i][1]
, etc., for us.
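A rough way to observe this from Python (a sketch; note that CPython nested lists are arrays of pointers rather than contiguous storage, so the effect is much weaker than with C arrays or NumPy):

```python
import timeit

n = 250
# arr[i][j] == i * n + j, so a row-major walk yields 0, 1, 2, ...
arr = [[i * n + j for j in range(n)] for i in range(n)]

# Row-major: the inner loop walks adjacent elements of one row
row_major = lambda: [arr[i][j] for i in range(n) for j in range(n)]
# Column-major: consecutive accesses jump between rows
col_major = lambda: [arr[i][j] for j in range(n) for i in range(n)]

t_row = timeit.timeit(row_major, number=3)
t_col = timeit.timeit(col_major, number=3)
print(f"row-major: {t_row:.4f}s, column-major: {t_col:.4f}s")
```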
Sorting algo | characteristic | cache-friendly☺? |
---|---|---|
mergesort | sort subarrays first | cache-friendly ☺ |
quicksort | partition into subarrays | cache-friendly ☺ |
heapsort | all over the place | cache-unfriendly 🤕 |
6.3.2. Quicksort comparison_sort
Worst-case: \(\Theta(n^2)\). Expected-case: \(O(n\lg n)\). In-place: true.
Three step divide-and-conquer process for a subarray \(A[p..r]\):
- Divide: Partition \(A[p..r]\) into two (possibly empty) subarrays \(A[p..q-1]\) and \(A[q+1..r]\) such that \(a\le A[q]\le b\) for all \(a\in A[p..q-1]\) and \(b\in A[q+1..r]\). Compute the index \(q\) as part of this partitioning procedure.
- Conquer: Sort the two subarrays \(A[p..q-1]\) and \(A[q+1..r]\) by recursive calls to quicksort.
- Combine: Because the subarrays are already sorted, no work is needed to combine them.
6.3.2.1. Implementation
Implementation of the above steps:
<<imports for typing>>
<<partitioning the array>>
def quicksort(arr, low, high):
    if low < high:
        par_idx = partition(arr, low, high)
        quicksort(arr, low, par_idx-1)
        quicksort(arr, par_idx+1, high)

A = [2,1,4,5]
quicksort(A, 0, len(A)-1)
print(A)
[1, 2, 4, 5]
Implementation of partitioning the array:
def partition(arr, low, high):
    """
    This function takes the last element as pivot, places the pivot
    element at its correct position in the sorted array, places all
    smaller elements to the left of the pivot, and all greater
    elements to the right of the pivot.
    This partitions arr in-place, which is why the for loop
    can be hard to follow.
    """
    pivot = arr[high]
    # arr[low:i+1] is the segment of smaller elements (<= pivot);
    # i is the index of the last smaller element
    i = low - 1
    # After each iteration:
    #   arr[low:i+1] are smaller elements,
    #   arr[i+1:j]   are larger elements,
    #   arr[j:high]  are unchecked elements,
    #   arr[high]    is the pivot
    for j in range(low, high):
        # if the current element is smaller than or equal to the pivot,
        # we can add another element to the smaller segment
        if arr[j] <= pivot:
            # increment the index of the smaller segment,
            # because we found a smaller element at arr[j]
            # and are going to exchange it with arr[i]
            i += 1
            # if j > i, then arr[i] > pivot and we need to
            # exchange it with arr[j]; otherwise the exchange
            # would be a no-op. This condition can also be dropped,
            # since a self-swap does not slow things down.
            if j > i:
                arr[i], arr[j] = arr[j], arr[i]
    # exchange the first element of the larger segment with the pivot
    arr[i+1], arr[high] = arr[high], arr[i+1]
    return i+1
6.3.2.2. Notes on choosing the pivot and running time
We can randomly pick any element of \(A\) to be our pivot, and this random pick takes \(O(1)\) time.
Because this is a randomized algorithm, its running time is a random variable depending on the random choices made, so we can only characterize it case by case:
Case | # of elements in segments after each partitioning | Running time |
---|---|---|
Worst | n-1 elements and 0 elements | \(O(n^2)\) |
Best | \(\lfloor n/2 \rfloor\), \(\lceil n/2 \rceil -1\) | \(O(n\lg n)\) |
Average | as long as segment not empty (not the worst) | \(O(n\lg n)\) |
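To make the worst case unlikely for any particular input, the pivot choice can be randomized. A self-contained sketch (the partition step is the same Lomuto-style scheme as above, inlined here; the function name is my own):

```python
import random

def randomized_quicksort(arr, low, high):
    """Quicksort with a uniformly random pivot, which makes the
    O(n^2) worst case unlikely for any fixed input."""
    if low < high:
        # Pick a random index and move it to the end so the
        # Lomuto-style partition below can use arr[high] as pivot
        p = random.randrange(low, high + 1)
        arr[p], arr[high] = arr[high], arr[p]
        pivot = arr[high]
        i = low - 1
        for j in range(low, high):
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[high] = arr[high], arr[i + 1]
        q = i + 1
        randomized_quicksort(arr, low, q - 1)
        randomized_quicksort(arr, q + 1, high)

data = [3, 13, 2, 123, 4, 1]
randomized_quicksort(data, 0, len(data) - 1)
print(data)  # [1, 2, 3, 4, 13, 123]
```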
6.3.3. Sorting in linear time
We assume all input elements are distinct, without loss of generality.
6.3.3.1. TODO Decision-tree
Comparison sorts can be viewed abstractly in terms of decision trees. A decision tree is a full binary tree that represents the comparisons between elements that are performed by a particular sorting algorithm operating on an input of a given size.
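This model immediately gives the classic lower bound for comparison sorts: a binary tree of height \(h\) has at most \(2^h\) leaves, and a correct decision tree must have at least \(n!\) reachable leaves (one per input permutation), so

```latex
\begin{equation}
2^{h} \ge n! \quad\Longrightarrow\quad h \ge \lg(n!) = \Omega(n \lg n)
\end{equation}
```

by Stirling's approximation. Hence no comparison sort can beat \(O(n\lg n)\) comparisons in the worst case.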
6.4. Medians and order statistics
6.4.1. Definition
The \(i\) th order statistic of a set of \(n\) elements is the \(i\) th smallest element. For example, the minimum of a set of elements is the first order statistic (\(i=1\)), and the maximum is the \(n\) th order statistic (\(i=n\)). A median is the halfway point of the set.
The selection problem:
- Input: A set \(A\) of \(n\) (distinct) numbers and an integer \(i\), with \(1\le i \le n\).
- Output: The element \(x \in A\) that is larger than exactly \(i-1\) other elements of \(A\).
6.4.2. DONE Selection in expected linear time (Quickselect)
We use randomized_select
, modeled after the 6.3.2 algorithm, but recursing into only one side of the partition.
Implementation of Quickselect from Leetcode.
import random

def partition(arr, l, r):
    x = arr[r]
    i = l
    for j in range(l, r):
        # use > to sort the elements in descending order
        if arr[j] > x:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[r] = arr[r], arr[i]
    return i

def quick_select(nums, start, n, k):
    # get a random pivot index and move it to the end
    rand_pivot_idx = random.randrange(start, n+1)
    nums[n], nums[rand_pivot_idx] = nums[rand_pivot_idx], nums[n]
    pos = partition(nums, start, n)
    if pos == k-1:
        return nums[pos]
    elif pos >= k:
        return quick_select(nums, start, pos-1, k)
    return quick_select(nums, pos+1, n, k)

def find_kth_largest(nums, k):
    return quick_select(nums, 0, len(nums)-1, k)

print(find_kth_largest([3,2,1,5,6,4], 1))
6
6.4.3. TODO Selection in worst-case linear time (median-of-medians)
Steps to select:
- Divide the \(n\) elements of the input array into \(\lfloor n/5 \rfloor\) groups of 5 elements each and at most one group made up of the remaining \(n \bmod 5\) elements.
- Find the median of each of the \(\lceil n/5 \rceil\) groups by first insertion-sorting the elements of each group.
- Use select() recursively to find the median \(x\) of the \(\lceil n/5 \rceil\) medians found in step 2.
- Partition the input array around the median-of-medians \(x\) using the modified version of partition(). Let \(k\) be one more than the number of elements on the low side of the partition, so that \(x\) is the kth smallest element and there are \(n-k\) elements on the high side of the partition.
- If \(i=k\), then return \(x\); otherwise, use select() recursively to find the \(i\) th smallest element on the low side if \(i<k\), or the \((i-k)\) th smallest element on the high side if \(i>k\).
Python implementation of the algorithm.
def partition(arr, lo, hi, pivot) -> int:
    # move the pivot element to the end
    for i in range(lo, hi):
        if arr[i] == pivot:
            arr[hi], arr[i] = arr[i], arr[hi]
            break
    i = lo
    for j in range(lo, hi):
        if arr[j] >= pivot:
            arr[j], arr[i] = arr[i], arr[j]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i

def find_median(arr, lo, n):
    lst = []
    for i in range(lo, lo+n):
        lst.append(arr[i])
    lst.sort()
    return lst[n//2]

def select(arr, lo, hi, k) -> int:
    # number of elements in arr[lo..hi]
    n = hi - lo + 1
    medians = []
    i = 0
    while i < n//5:
        medians.append(find_median(arr, lo+i*5, 5))
        i += 1
    # last group
    if i*5 < n:
        medians.append(find_median(arr, lo+i*5, n%5))
        i += 1
    # Find the median of all medians using a recursive call.
    # If medians has only one element, there is no need
    # for a recursive call
    if i == 1:
        med_of_meds = medians[0]
    else:
        med_of_meds = select(medians, 0, i-1, i//2)
    # Partition the array around the med_of_meds element
    # and get the position of the pivot element.
    # Note: because partition() uses >=, larger elements go left,
    # so this select() returns the kth largest element.
    pos = partition(arr, lo, hi, med_of_meds)
    # If the position is the same as k
    if pos - lo == k - 1:
        return arr[pos]
    # If the position is larger,
    # recur on the left subarray
    if pos - lo > k - 1:
        return select(arr, lo, pos-1, k)
    # Else recur on the right subarray
    return select(arr, pos+1, hi, k-pos+lo-1)

print(select([1,2,3,4,5], 0, 4, 2))
4
6.5. Dynamic programming
Dynamic programming solves problems by combining the solutions to subproblems. "Programming" in this context refers to a tabular method, not to writing computer code.
DP is usually applied to optimization problems that can be divided into subproblems, and these subproblems have overlapping subsubproblems, i.e. they share some common subsubproblems.
Usually, a problem has many feasible solutions, and we want one that gives an optimal (minimum or maximum) value; there might be several solutions that achieve the same optimal value.
6.5.1. Four steps to develop a dynamic programming algorithm
- Characterize the structure of an optimal solution
- Recursively define the value of an optimal solution
- Compute the value of an optimal solution, typically in a bottom-up fashion
- Construct an optimal solution from computed information
- This step is only needed when we want not only the value of the solution, but also the solution itself.
- This step is where DP needs \(O(n)\) extra space (if the input is a one-dimensional array), as it needs to store the computed information.
6.5.2. Rod cutting example
6.5.2.1. The problem
Given a rod of length n
inches and a table of prices[i], i=1,2,...,n
, determine the maximum revenue rn
obtainable by cutting up the rod and selling the pieces. Note that if the price prices[n]
for a rod of length n
is large enough, an optimal solution may require no cutting at all.
6.5.2.2. The solution
To cut a rod of length n
inches into rods of 1 inch, we need to cut n-1
times.
By thinking of this as a bitmasking problem, where we have a binary number of n-1
bits and 1 represents a cut while 0 represents no cut, it is easy to deduce that the number of possible ways
to cut the rod is \(2^{n-1}\).
Given the prices
table, the maximum revenue \(r_n\) of a length-n rod can be expressed as follows:
\(r_n=\max(p_n, r_1+r_{n-1}, r_2+r_{n-2},\ldots,r_{n-1}+r_1)\). (Recursively define the value of an optimal solution.)
\(p_n\) corresponds to making no cuts at all. The remaining n-1
arguments correspond to the maximum revenue obtained by making an initial cut of the rod into two pieces of size i
and n-i
, for each \(i=1,2,\ldots,n-1\), and then optimally cutting up those pieces further,
obtaining revenues \(r_i\) and \(r_{n-i}\) from the two pieces.
We say the rod-cutting problem exhibits optimal substructure, i.e. optimal solutions to a problem incorporate optimal solutions to related subproblems, which we may solve independently. It also has overlapping subsubproblems: for example, \(r_1\) is needed in computing both \(r_i\) and \(r_{n-i}\).
The equation \(r_n=\max(p_n, r_1+r_{n-1}, r_2+r_{n-2},...,r_{n-1}+r_1)\) has duplicate values since when doing cutting (bitmasking), "100" is the same as "001". Therefore, we can simplify this process of cutting by the following:
- We cut a first piece of length i
off the left-hand end.
- We only further divide the right-hand remainder of length n-i
.
- The revenue is then \(r_n=\max_{1\le i\le n}(p_i+r_{n-i})\), which removes one recursion (\(r_i\)) for us.
6.5.2.2.1. Recursive top-down implementation
Here is an implementation of the above formula \(r_n=\max_{1\le i\le n}(p_i+r_{n-i})\).
<<imports for typing>>
def cut_rod(p: List[int], n: int) -> int:
    from math import inf
    if n == 0:
        return 0
    q = -inf
    for i in range(1, n+1):
        # implementation detail:
        # the array in the book starts at 1,
        # here the array starts at 0
        q = max(q, p[i-1] + cut_rod(p, n-i))
    return q

print(cut_rod([1,2,3,9], 3))
3
This implementation is slow because its time complexity is exponential: \(O(2^n)\), where \(n\) is the rod length.
The same subproblems are solved many times.
For example, while solving cut_rod(p, n)
, cut_rod(p, n-2)
is computed twice: once directly, and once inside cut_rod(p, n-1)
.
6.5.2.2.2. Dynamic programming solution
We only need to fix one issue in 6.5.2.2.1 to make it significantly faster: solve each subproblem exactly once and save the result (using extra memory) for future use.
Dynamic programming runs in polynomial time when the number of distinct subproblems involved is polynomial in the input size and we can solve each subproblem in polynomial time. There are usually two ways to implement a dynamic programming approach: top-down with memoization, and bottom-up.
Top-down with memoization:
- Recursive
- Save the result of each subproblem in an array or hash table
- For every subproblem, first check if we have already solved it
- If so, just return the value
- If not, compute the value for the subproblem
Bottom-up:
- Typically depends on some natural notion of the "size" of a subproblem, s.t. solving any particular subproblem depends only on solving "smaller" subproblems.
- We sort the subproblems by size and solve them in size order, smallest first.
- When solving a particular subproblem, we have already solved all of the smaller subproblems its solution depends upon beforehand, and we have saved their solutions.
These two methods are asymptotically the same.
- top-bottom approach code
Here is the Python code for dynamic programming using top-bottom memoization method.
def memoized_cut_rod(p, n):
    from math import inf

    def memoized_cut_rod_aux(p, n, r):
        if r[n] >= 0:
            return r[n]
        if n == 0:
            q = 0
        else:
            q = -inf
            for i in range(1, n+1):
                q = max(q, p[i-1] + memoized_cut_rod_aux(p, n-i, r))
        r[n] = q
        return q

    r = [-inf for i in range(n+1)]
    return memoized_cut_rod_aux(p, n, r)

print(memoized_cut_rod([1,2,3,9], 3))
3
- bottom-up approach code
Here is the bottom-up approach, which is simpler and easier to understand for me. Recall the revenue is \(r_n=\max_{1\le i\le n}(p_i+r_{n-i})\). We need to prove that \(r_n=\max(p_n, r_1+r_{n-1}, r_2+r_{n-2},...,r_{n-1}+r_1)\) is in effect equivalent to \(r_n=\max_{1\le i\le n}(p_i+r_{n-i})\), so that we can justify the use of an inner loop in the code below.
def bottom_up_cut_rod(p, n):
    from math import inf
    r = [0 for i in range(n+1)]
    for i in range(1, n+1):
        q = -inf
        # This inner loop is introduced because
        # we cannot know, ahead of time, the best cutting strategy
        # that will give us the highest return.
        # Therefore we need to check all possible strategies,
        # which were already calculated.
        for j in range(1, i+1):
            # p[j-1] is the price of a length-j rod;
            # r[i-j] is the best return of a length i-j rod,
            # which has already been calculated
            #q = max(q, p[j-1] + r[i-j])
            # q is the best return found so far in this inner loop
            q = max(q, r[j] + r[i-j], p[j-1])  # this is easier to understand
        r[i] = q
    return r[n]

print(bottom_up_cut_rod([1,4,4,4,9], 4))
8
- Reconstructing a solution
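This step can be sketched along the lines of CLRS's extended-bottom-up-cut-rod: alongside the revenue table r, record in a second table s the size of the first piece to cut off each length, then walk s to recover the cuts (the return convention of the helper below is my own):

```python
from math import inf

def extended_bottom_up_cut_rod(p, n):
    # r[i]: best revenue for a rod of length i
    # s[i]: optimal size of the first piece to cut off a rod of length i
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for i in range(1, n + 1):
        q = -inf
        for j in range(1, i + 1):
            if q < p[j - 1] + r[i - j]:
                q = p[j - 1] + r[i - j]
                s[i] = j  # remember the first-piece size that achieved q
        r[i] = q
    return r, s

def print_cut_rod_solution(p, n):
    r, s = extended_bottom_up_cut_rod(p, n)
    pieces = []
    while n > 0:
        # cut off the recorded optimal first piece, then repeat
        pieces.append(s[n])
        n -= s[n]
    return r[-1], pieces

print(print_cut_rod_solution([1, 4, 4, 4, 9], 4))  # (8, [2, 2])
```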
6.6. Single-Source Shortest Paths
Relaxation in Dijkstra's Algorithm: http://web.cs.unlv.edu/larmore/Courses/CSC269/pathing.
The notion of "relaxation" comes from an analogy between the estimate of the shortest path and the length of a helical tension spring, which is not designed for compression. Initially, the cost of the shortest path is an overestimate, likened to a stretched out spring. As shorter paths are found, the estimated cost is lowered, and the spring is relaxed. Eventually, the shortest path, if one exists, is found and the spring has been relaxed to its resting length.
6.6.1. Definition
In a shortest-paths problem, we are given a weighted, directed graph \(G=(V, E)\), with weight function \(w: E \rightarrow \mathbb{R}\) mapping edges to real-valued weights. The weight \(w(p)\) of a path \(p=\langle v_0, v_1,\ldots,v_k\rangle\) is the sum of the weights of its constituent edges:
\begin{equation} w(p) = \sum_{i=1}^k w(v_{i-1}, v_i) \end{equation}We define the shortest-path weight \(\delta(u,v)\) from \(u\) to \(v\) by
\begin{equation} \delta(u,v)= \begin{cases} \min{\{ w(p): u \overset{p}{\leadsto} v\}} &\text{if there is a path from \(u\) to \(v\)}\\ \infty &\text{else} \end{cases} \end{equation}To put it simply, the shortest-path weight from \(u\) to \(v\) either
- does not exist (\(\infty\)), or
- is the minimum weight of all possible weights of different paths (or simply one path) from \(u\) to \(v\).
A shortest path from vertex \(u\) to vertex \(v\) is then defined as any path \(p\) with weight \(w(p)=\delta (u,v)\).
Concepts:
- Vertices, that are connected to create
- directed Edges, that are given real-valued
- Weights
- Directed Edges with Weights, that form
- Paths with Weights
Single-source shortest-paths problem (SSSP): Given a directed graph \(G=(V, E)\), we want to find a shortest path from a given source vertex \(s\in V\) to each vertex \(v\in V\).
This seems also like a dynamic programming problem because:
- We are looking for a minimum value
- This value can be obtained by obtaining the minimum value of subproblems of the same kind
6.6.1.1. Variants of the SSSP problem
Single-destination shortest-paths problem: Find a shortest path to a given destination vertex \(t\) from each vertex \(v\). By reversing the direction of each edge in the graph, we can reduce this problem to SSSP.
Single-pair shortest-path problem: Find a shortest path from \(u\) to \(v\) for given vertices \(u\) and \(v\). If we solve the SSSP problem with source vertex \(u\), then we solve this problem as well. Moreover, all known algorithms for this problem have the same worst-case asymptotic running time as the best SSSP algorithms (❓).
All-pairs shortest-paths problem: Find a shortest path from \(u\) to \(v\) for every pair of vertices \(u\) and \(v\). Although we can solve this problem by running a single-source algorithm once from each vertex, we usually can solve it faster (as there are many repeated calculations in that approach).
6.6.1.2. Summary of the four problems
Using database cardinality:
- \(1:1\): One vertex to one vertex.
- \(1:many\): One vertex to multiple vertices.
- \(src:many\): the base/example problem the algorithm works with.
- \(many:1\): Multiple vertices to one vertex.
- \(many:dest\): can be solved by inverting the directions of the edges and starting from \(dest\) to \(many\).
- \(many:many\): Every vertex to every vertex.
6.6.2. Optimal substructure of a shortest path
Shortest paths algorithms typically rely on the property that a shortest path between two vertices contains other shortest paths within it. This property calls for 4.2 and Greedy method.
Dijkstra's algorithm is a greedy algorithm, and the Floyd-Warshall algorithm, which finds shortest paths in \(many:many\) vertices, is a dynamic-programming algorithm.
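As a minimal sketch of the greedy relaxation idea behind Dijkstra's algorithm (the adjacency-list dict representation and the example graph are made up for illustration):

```python
import heapq
from math import inf

def dijkstra(graph, src):
    """graph: {u: [(v, w), ...]} with non-negative weights w.
    Returns the shortest-path weight from src to every vertex."""
    dist = {u: inf for u in graph}
    dist[src] = 0
    pq = [(0, src)]  # min-priority queue keyed on the current estimate
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale entry, a shorter path was already found
        for v, w in graph[u]:
            # Relaxation: lower the overestimate if going through u helps
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                heapq.heappush(pq, (dist[v], v))
    return dist

g = {'s': [('a', 5), ('b', 10)], 'a': [('b', 3)], 'b': []}
print(dijkstra(g, 's'))  # {'s': 0, 'a': 5, 'b': 8}
```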
6.6.3. Negative-weight edges
A graph \(G=(V, E)\) can contain negative-weight edges and still be well defined, provided that for any vertex pair \((u,v)\) with \(w(u, v)<0\) and \(w(v, u) > 0\), we have \(w(u,v)+w(v, u)\ge 0\). This prevents a negative-weight cycle.
Non-negative (+6-3) +----+ 5 +----+ +6 +----+ |src |---->|5 |---->|11 | | | | |<----| | +----+ +----+ -3 +----+ Negative (+6-7) +----+ 5 +----+ +6 +----+ |src |---->|-inf|---->|-inf| | | | |<----| | +----+ +----+ -7 +----+
6.7. Data structure
Dynamic sets: sets that can grow, shrink, or otherwise change over time.
Dictionary: a dynamic set that supports insert
, delete
, and test
(membership).
Category | Operation |
---|---|
Query | search() |
Modify | insert(), delete() |
6.7.1. Stacks and queues
data structure | policy | example |
---|---|---|
Queue | first in, first out | real life queue |
Stack | last in, first out | recursive call stack |
Priority queue | whatever in, min/max key out | Triage |
Stack operations (all \(O(1)\)):
push(element)
pop()
stack_empty()
Queue operations (all \(O(1)\)):
enqueue(element)
dequeue()
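In Python, a plain list already provides the \(O(1)\) stack operations, and collections.deque provides the \(O(1)\) queue operations (list.pop(0) would be \(O(n)\), because it shifts every remaining element). A sketch:

```python
from collections import deque

# Stack: last in, first out
stack = []
stack.append(1)          # push(1)
stack.append(2)          # push(2)
print(stack.pop())       # 2
print(len(stack) == 0)   # stack_empty() -> False

# Queue: first in, first out
queue = deque()
queue.append('a')        # enqueue('a')
queue.append('b')        # enqueue('b')
print(queue.popleft())   # dequeue() -> 'a'
```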
6.7.2. Linked lists
Three types of linked lists:
- singly linked list
- doubly linked list
- circular list
The following content concerns unsorted Doubly Linked List.
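A minimal sketch of an unsorted doubly linked list supporting insert (at the head), search, and delete, in the spirit of the CLRS operations (class and method names are my own):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None

    def insert(self, key):
        # O(1): splice the new node in at the front
        x = Node(key)
        x.next = self.head
        if self.head is not None:
            self.head.prev = x
        self.head = x
        return x

    def search(self, key):
        # O(n) linear scan; returns the node or None
        x = self.head
        while x is not None and x.key != key:
            x = x.next
        return x

    def delete(self, x):
        # O(1) given the node itself, thanks to the prev pointer
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev

lst = DoublyLinkedList()
for k in (1, 4, 16):
    lst.insert(k)
lst.delete(lst.search(4))
node, keys = lst.head, []
while node:
    keys.append(node.key)
    node = node.next
print(keys)  # [16, 1]
```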
6.7.3. Hash tables
Addressing: Putting one item to a place according to some rule.
Direct-address tables: associate keys with indexes in an array of size \(U\)
, when the number of possible keys \(U\) is small. Searching is \(O(1)\) in the worst case.
When \(U\) is large, creating an array/table \(T\) of size \(U\)
may be impractical.
In addition, the number of keys actually stored may be so small relative to \(U\) that most of the space allocated to the table \(T\) would be wasted.
Thus we come up with another idea (Hash Table) to save storage and maintain \(O(1)\) search time in average-case scenario.
With hash tables, we now cannot use an element's key to store it because its key might be larger than the length of the hash table. To solve this problem, we use a hash function \(h\) to compute the slot/position in the table \(T\) from key \(k\). \(h\) maps the universe \(U\) of keys into the slots of a hash table \(T[0\ldots m-1]\): \(h:U\rightarrow \{0,1,\ldots,m-1\}\) and \(m\) is usually much less than \(|U|\).
We also say that an element with key \(k\) hashes to slot \(h(k)\), and that \(h(k)\) is the hash value of key \(k\). If two keys somehow hash to the same slot, this is called a collision.
To minimise or even avoid collision, we make \(h\) appear to be "random", making its output as dispersed as possible.
There are two ways to deal with collision: chaining and open addressing. Chaining: make every slot in the table \(T\) a linked list (doubly linked list makes deletion faster).
In a nutshell:
- We want to store some objects in such a way that it's fast for us to get to them later via some key.
- We then just use an array to store the objects, each is placed in the array using their key as the index, and one can use a key to find the corresponding object in the array. This is called direct addressing.
- As the number of potential objects grows, it requires more storage. However, most of the time, we only want a subset of the objects to be readily available because those are what we use anyway.
- Therefore, there is really no need to store all possible objects.
- Then we decide to use a smaller array to store the objects.
- Because the key of an object is no longer also an index in the smaller array (the key might be larger than the length of the array), it is logical to develop a function to map the key to an index of the array.
- The function is properly named hash function.
- The array is then called a hash table.
- The story does not end here. Because there are more objects than the number of places available to store them in the array (hash table), it is possible that two elements will be put in the same position (two different keys end up with the same index through the hash function). This is the pigeonhole principle.
- To solve this, we have chaining and open addressing
- Chaining links all elements that ended up in the same index so the 1-d array now becomes 2-d.
- Because the chain at one index can grow without bound (as long as memory is sufficient), the hash table can store arbitrarily many objects, but the time spent retrieving one object also increases until it becomes asymptotically the same as searching through all objects directly.
- Open addressing tries to find the next available index if the current index is already taken.
- This means that the table can fill up and no more insertion can be made.
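The chaining story above can be sketched as a toy table (the class name and the modular hash are my own; Python lists stand in for the per-slot linked lists):

```python
class ChainedHashTable:
    """A toy chained hash table with h(k) = hash(k) % m."""

    def __init__(self, m=8):
        self.m = m
        self.table = [[] for _ in range(m)]  # one chain per slot

    def _slot(self, key):
        return hash(key) % self.m

    def insert(self, key, value):
        bucket = self.table[self._slot(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:            # overwrite an existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # collisions simply extend the chain

    def search(self, key):
        for k, v in self.table[self._slot(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        slot = self._slot(key)
        self.table[slot] = [(k, v) for k, v in self.table[slot] if k != key]

t = ChainedHashTable(m=4)
t.insert(1, 'a')
t.insert(5, 'b')  # 5 % 4 == 1 % 4: collides with key 1, chained in slot 1
print(t.search(1), t.search(5))  # a b
t.delete(1)
print(t.search(1))  # None
```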
6.7.4. Priority queue
We use Heap to implement it.
A priority queue is a data structure for maintaining a set \(S\) of elements, each with an associated value called a \(key\) (priority). A max-priority queue supports the following operations:
- \(insert(S, x)\) inserts the element \(x\) into the set \(S\), i.e. \(S=S\cup \{x\}\), \(O(\lg n)\)
- \(maximum(S)\) returns the element of \(S\) with the largest \(key\), \(O(1)\)
- \(extract\_max(S)\) removes and returns the element of \(S\) with the largest \(key\), \(O(\lg n)\)
- \(increase\_key(S, x, k)\) increases the value of element \(x\text{'s}\) \(key\) to the new value \(k\), which is assumed to be at least as large as the current \(key\), \(O(\lg n)\)
Similarly, a min-priority queue supports:
- \(insert(S, x)\) inserts the element \(x\) into the set \(S\), i.e. \(S=S\cup \{x\}\)
- \(minimum(S)\) returns the element of \(S\) with the smallest \(key\)
- \(extract\_min(S)\) removes and returns the element of \(S\) with the smallest \(key\)
- \(decrease\_key(S, x, k)\) decreases the value of element \(x\text{'s}\) \(key\) to the new value \(k\), which is assumed to be at least as small as the current \(key\)
Max-priority queue can be used to schedule jobs on a shared computer. Min-priority queue can be used in an event-driven simulator.
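Python's standard library heapq module implements a binary min-heap on a plain list, which maps directly onto the min-priority queue operations; a max-priority queue can be simulated by negating the keys:

```python
import heapq

# Min-priority queue via heapq (a binary min-heap on a list)
pq = []
for key in (5, 1, 9):
    heapq.heappush(pq, key)   # insert(S, x), O(lg n)
print(pq[0])                  # minimum(S), O(1) -> 1
print(heapq.heappop(pq))      # extract_min(S), O(lg n) -> 1

# Max-priority queue: negate keys so the smallest stored value
# corresponds to the largest original key
max_pq = []
for key in (5, 1, 9):
    heapq.heappush(max_pq, -key)
print(-max_pq[0])             # maximum(S) -> 9
print(-heapq.heappop(max_pq)) # extract_max(S) -> 9
```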
6.7.4.1. Implementation of max-priority queue
Based on the pseudo-code on CLRS.
<<imports for typing>>
class max_priority_queue:
    def __init__(self, nums: List[int]):
        self.q = nums
        # build a max-heap in place (max_heapify returns None,
        # so its result cannot be assigned)
        for i in range(len(self.q)//2 - 1, -1, -1):
            self.max_heapify(self.q, i, len(self.q))

    def heap_max(self):
        return self.q[0]

    def heap_extract_max(self):
        if len(self.q) < 1:
            raise Exception("heap underflow")
        m = self.q[0]
        # move the last element to the root, shrink the heap,
        # and restore the max-heap property from the root
        self.q[0] = self.q[len(self.q)-1]
        self.q.pop()
        self.max_heapify(self.q, 0, len(self.q))
        return m

    @staticmethod
    def max_heapify(arr, i, heap_size):
        # same procedure as the max_heapify() defined earlier
        l, r = 2*i + 1, 2*i + 2
        largest = i
        if l < heap_size and arr[l] > arr[i]:
            largest = l
        if r < heap_size and arr[r] > arr[largest]:
            largest = r
        if largest != i:
            arr[i], arr[largest] = arr[largest], arr[i]
            max_priority_queue.max_heapify(arr, largest, heap_size)
7. Clock table
Headline | Time |
---|---|
Total time | 1d 15:49 |
Problems | 23:22 |
Algorithms | 0:54 |
Techniques | 0:20 |
CLRS Notes & Yufei Tao lecture notes | 11:47 |
Exporting | 0:38 |
Fix up | 2:48 |
8. Appendix
List of Listings
- Listing 1: imports for typing
- Listing 2: 1313 Decompress Run-length encoded list my solution
- Listing 3: 1313 Decompress Run-length encoded list leetcode solution
- Listing 4: concatenate lists
- Listing 5: 1295 Find numbers with even number of digits my solution
- Listing 6: 1295 Find numbers with even number of digits leetcode solution
- Listing 7: 1365 How many numbers are smaller than the current number my solution
- Listing 8: 1365 How many numbers are smaller than the current number leetcode solution
- Listing 9: 1409 Queries on a permutation with key my solution
- Listing 10: 1409 Queries on a permutation with key discussion solution
- Listing 11: 1329 Sort the matrix diagonally my solution
- Listing 12: 1329 Sort the matrix diagonally leetcode solution
- Listing 13: 1329 Sort the matrix diagonally 2 my solution
- Listing 14: 1329 Sort the matrix diagonally 2 leetcode solution
- Listing 15: 1395 Count number of teams my solution
- Listing 16: 1395 Count number of teams efficient solution
- Listing 17: 1395 Count number of teams general solution
- Listing 18: 442 Find all duplicates in an array my solution
- Listing 19: 442 Find all duplicates in an array leetcode solution
- Listing 20: 78 Subsets my solution
- Listing 21: 78 Subsets leetcode solution
- Listing 22: 48 Rotate image my solution
- Listing 23: 48 Rotate image leetcode solution
- Listing 24: 48 rotate image sample
- Listing 25: 72 Edit distance my solution
- Listing 26: 72 Edit distance leetcode solution
- Listing 27: 121 Best time to buy and sell stock my solution
- Listing 28: 121 Best time to buy and sell stock leetcode solution
- Listing 29: 121 Best time to buy and sell stock dp solution
- Listing 30: 53 Maximum subarray my solution
- Listing 31: 53 Maximum subarray leetcode solution
- Listing 32: 53 Maximum subarray my solution O(1) space
- Listing 33: 70 Climbing stairs my solution
- Listing 34: 70 Climbing stairs leetcode solution
- Listing 35: 746 Min cost climbing stairs my solution
- Listing 36: 746 Min cost climbing stairs leetcode solution
- Listing 37: 746 Min cost climbing stairs my solution improved
- Listing 38: 703 Kth largest element in a stream my solution
- Listing 39: 703 Kth largest element in a stream my solution-results
- Listing 40: 703 Kth largest element in a stream leetcode solution
- Listing 41: 703 Kth largest element in a stream leetcode solution-results
- Listing 42: 703 Kth largest element in a stream leetcode solution improved
- Listing 43: 215 Kth largest element in an array my solution
- Listing 44: 215 Kth largest element in an array my solution-results
- Listing 45: quickselect implementation
- Listing 46: quickselect implementation-results
- Listing 47: 215 Kth largest element in an array leetcode solution
- Listing 48: 215 Kth largest element in an array leetcode solution-results
- Listing 49: 973 K closest points to origin my solution
- Listing 50: 973 K closest points to origin my solution-results
- Listing 51: 973 K closest points to origin leetcode solution
- Listing 52: 973 K closest points to origin leetcode solution-results
- Listing 53: 347 Top K frequent elements my solution
- Listing 54: 347 Top K frequent elements my solution-results
- Listing 55: 347 Top K frequent elements leetcode solution
- Listing 56: 347 Top K frequent elements leetcode solution-results
- Listing 57: mostcommon
- Listing 58: 355 Design twitter my solution
- Listing 59: 355 Design twitter my solution-results
- Listing 60: 355 Design twitter leetcode solution
- Listing 61: 355 Design twitter leetcode solution-results
- Listing 62: 778 Swim in rising water my solution
- Listing 63: 778 Swim in rising water my solution-results
- Listing 64: 778 Swim in rising water my solution - 2
- Listing 65: 778 Swim in rising water my solution - 3
- Listing 66: 778 Swim in rising water my solution - 4
- Listing 67: 778 Swim in rising water leetcode solution
- Listing 68: 778 Swim in rising water leetcode solution-results
- Listing 69: 1022 Sum of root to leaf binary numbers my solution
- Listing 70: 1022 Sum of root to leaf binary numbers my solution-results
- Listing 71: 1022 Sum of root to leaf binary numbers leetcode solution
- Listing 72: 1022 Sum of root to leaf binary numbers leetcode solution-results
- Listing 73: 1207 Unique number of occurrences my solution
- Listing 74: 1207 Unique number of occurrences my solution-results
- Listing 75: 1207 Unique number of occurrences leetcode solution
- Listing 76: 1207 Unique number of occurrences leetcode solution-results
- Listing 77: 706 Design HashMap my solution
- Listing 78: 706 Design HashMap my solution-results
- Listing 79: 706 Design HashMap leetcode solution
- Listing 80: 706 Design HashMap leetcode solution-results
- Listing 81: 208 Implementing Trie my solution
- Listing 82: 208 Implementing Trie my solution-results
- Listing 83: 208 Implementing Trie leetcode solution
- Listing 84: 208 Implementing Trie leetcode solution-results
- Listing 85: 1038 Binary Search Tree to Greater Sum Tree my solution
- Listing 86: 1038 Binary Search Tree to Greater Sum Tree my solution-results
- Listing 87: 1038 Binary Search Tree to Greater Sum Tree leetcode solution
- Listing 88: 1038 Binary Search Tree to Greater Sum Tree leetcode solution-results
- Listing 89: 21 Merge Two Sorted Lists my solution
- Listing 90: 21 Merge Two Sorted Lists my solution-results
- Listing 91: 21 Merge Two Sorted Lists leetcode solution
- Listing 92: 21 Merge Two Sorted Lists leetcode solution-results
- Listing 93: 20 Valid Parentheses my solution
- Listing 94: 20 Valid Parentheses my solution-results
- Listing 95: 20 Valid Parentheses leetcode solution
- Listing 96: 20 Valid Parentheses leetcode solution-results
- Listing 97: Insertion sort
- Listing 98: dynamic programming - edit distance
- Listing 99: dynamic programming - edit distance improved
- Listing 100: bitmask
- Listing 101: A 3-node construct that max-heapify works on
- Listing 102: max-heapify
- Listing 103: max-heapify-results
- Listing 104: build-max-heap
- Listing 105: heapsort
- Listing 106: quicksort
- Listing 107: quicksort-results
- Listing 108: partitioning the array
- Listing 109: Quickselect python implementation
- Listing 110: selection worst case linear
- Listing 111: cutting rod recursive top-down implementation
- Listing 112: top-bottom dynamic programming memoization
- Listing 113: bottom-up dynamic programming approach
- Listing 114: max-priority queue
Created: 2022-01-19 Wed 22:22