# Algorithm(9) Search

Find particular properties based on the abstracted decision tree or abstracted graph.

The input data is not the explicit tree with Tree node for questions listed in this blog. For the string structure, it may use some ideas like sliding window with loer bound and upper bound as the mechanism to search things, we do not list them here, we mainly list the question with the decision tree or graph in an logical way but the input is not the not the explicit tree structure.

Input data DFS
(up to bottom)
DFS
(bottom to up)
Naive BFS BFS with visited flag BFS+DFS Binary search Accelaration by Map
single number (combination)
22
array of numbers (combination)
39,40,77
78,90,216
(permuataion)
46,47,996
string (permutation)784
(expression)856,726
301
(expression) 394
2d Array 943,37,51 17,

### Key techniques and considerations

• Extracting decision tree from the actual problem. One difficulty with the search problem is that we need to extract the decision tree logically from different problem contexts. Typical scenarios are 1> the binary tree type, which decide add or not add current character. Each node represent choose or not choose here 2> Each node represents an actual character as the tree node, such as the permulation problems or combination of different cases. 3> Each node represents the derived results about the givin input, a little bit similar to the reduction opetaion. such as 241, the resutls can be a set contains all possible solutions.

• Using DFS or BFS, we can decide to use BFS or DFS from several aspects:

1. For the tree node expression, the BFS case might need to use a customized tree node (which is rarely used in leetcode questions), usually use the pair or tuple if we need multiple variables in the tree node. For the DFS code, there are less restrictions since we basically hide the tree node inexplicitly in the function call. We can transfer the necessary variables used in each call by function parameters. We can also set some varaibles as the class variables to avoid transfering them by tree node and simplify the code a little bit.

2. Time complexity and if there is explicit layered structure. Although both the time compexity is O(V+E), the DFS is suitable for the narrow long tree, ans the BFS is suitable for the wide and short tree. In practical case, we may need to evaluate the data scale of the leetcode question, this may decide which algorithm is used for the giving data sets. This is tricky and is easy to be neglected when we start to code in in a rush without considering time compexity in advance carefully.

If the question shows an obvious layered structure, the BFS can be more approporiate. For example, we find the minimal path with the particular constrians, we use the BFS, once find the path, we return it direactly. These type of questions are typical BFS style, and the BFS can ususlly save more efficient for solving these type of questions.

3. For the DFS solution, pay attention that if we need return sth from the child tree. Namely make sure which is more convenient, either top to bottom operation (the previous informaion in the path can be used to determin the subsequent nodes in the path) or bottom to up operation. The core idea is to make sure what are computation processed in each node (is it merging two results from subtree or divide the current results to two subtree). It is worth noting that, for the bottom to up case, we can only use the DFS since it keep the status of the current node, so we can update the status when getting back results from the subtree. For the BFS, it just pop the parent out during the traverse process and we can not further update the previous status of the nodes.

For the DFS case, there are three typical design: (a) Check all solutions and return void, put the valid path into a separate ans (b) Return true or false, we only need to find one path in this case, if we find the valid path, just return true (such as 37) (c) Return a specific value for each dfs call, maybe a reduced integer or more complex results such as a set.

1. Set back the status. It is important to always check if we set the status back for the dfs operation. For example, a> when we find specific path, remember to pop the current node back from the ans if we insert it for the current layer (make sure the pop_back match with the insert, otherwise, there might be the memory issue). b> The status can also be the flag variable that label if a specific character is visited or not. c> In the for loop scenarios, we need to make sure the status if set back to the parent case before calling the dfs operations each time.

If there is no difference between the aspects discussed above, we can either use BFS or DFS. And using DFS might be more easier from coding’s perspective if we write it into a recursive way instead of using a stack.

• Commonly used debug strategy. This part of question is easy prone and some small type of condition error can lead to totally different results. Start from the small scale case or print out the traverse path explicitly are commonly used strategies.

• Commonly used optimization strategy or trimming strategy. Sort the input array. Using flag to label the status of associated elements. Using map to memorize existing path (if the key and values are clear). Avoid the unnecessary string copy.

• Ensentially the bfs or dfs questions discussed this part can process the unweighted graph. For the question which can be abstracted as the weighted graph, we need to use more complex mechanism to do that such as the dp method to acclerate the searching process (such as dp approach of the 943)

• Be careful about using the vector<bool> when you need to use a flag array. There are lots of discussion online, you may just use the vector<int> for simplicity or use the bit mask if the flag length is fixed. Using the vector<bool> can slow down the your code heavily (a least 10x times slower compared with usinig the vector<int>) 996 is a good example (three important conditions) to discuss about how to set the flag to make sure how the same node can be visited once in the same level.

• A tip for checking the associated elements in 2d case. We might check the element at the different direaction such as up down left and right even for slash direaction sometimes, instead of go through each dimention separately, the good way is to initilize a 2d vector, such as (-1,1),(1,-1)... then we use one loop to range this vector and adjust the x and y coordinates accordingly. This method is more simple to check the different direction of the 2d grid (Related question: 51).

### Naive DFS-UpToBottom-Combination

The nature of the combination is the cartision product, we may need to have an idea about the set. Then the results of the combination might be to do the cartision products between different sets. It is important to figure out what are avalible set for some questions with complex background.

39. Combination Sum

This is the recursion version, which is really fast. For this recursion version based on dfs, the pop out operation at the end of dfs call is really important. Sorting the input array is an important techniques, sometimes, it can trim the searching tree (when the target value is negative one we trim the tree). The key idea to make sure that the elements can be used multiple times is that for each step of creating the children node, we go through all possible options no matter it is visited or not.

Another key point is that, if we do not use the sort, we may have repeated resutls such as [2,3,3] and [3,2,3]. From the implementaion perspective, using the remain value, and we decrease it gradually (instead of sum value) can be more convenient. Otherwise, we need to check if the current path exists in the ans set (we need to sort the array and check back and forth, which is redundant)

We can also use the BFS for this simple context, but for this question, there is no obvious benifits for using the BFS, since we need to search all possible solutions. However, it needs some difficuties to process the intermidiate node. Using the targetNew value as the distinction for different strategies is important. (how to define the control block is important)

40. Combination Sum II

This one is similar with previous one, but the only difference is that the element is not unique in the input string, we add some constrains to avoid the repeated ans. The sorting operation is still necessary, we just skip the current element if it has been visited.

77. Combinations

The idea is similar with previous one. Be careful the meaning of the n and k. This can be viewed as the basic one for this type of question. Using the dfs can make things much more easier. For the first layer, there are n possibilites, and then the next layer, there is n-1 combinations.

78. Subsets

This is also comparatively easy and we can use the similar ideas to solve this questions. The difference is that is stores the results into the ans for every possibilities. For previous quesitons, it stores the value only when it satisfies specific constraints. This one is more generous without specific constraints.

90. Subsets II

There are some difference, it contains the repeated elements in queue, and results should not be repeative. We can solve this using the similar techniques used in 40

216. Combination Sum III

There are not much differences compred with previous type, we put the curr into the ans when the constraints are satisfied.

22. Generate Parentheses

Evey node can be the left or right parentheses, and then we trim some results. From the top to bottom, the final path from the root to the leaf node is the actual ans we need to record. We can only place the right parathesis if the count of the left parathesis is larger than 0.

37. Sudoku Solver

the idea of checking the conflicts is just check it from the 1-9 and keep calling the next level, try to set the cell to a dot if there is no possible results. return the true if all cells are filled in.

The logic is not complex but it sill take me some time to figure it out. Originally we did not add return true or false, so it basically search all possibilities even if we find a valid results. The idea here is that we return it immediately if we find a valid ans. For the case when there is number at a specific position, we also need to add the return true or false.

51. N-Queens

This question is similar to previous one. The dfs need to go thrrough all possible combinations and store the valid results during the traverse. For each level, we can assume the possible options (namely the elements in the set) is the position for each one in different column. Then we use a function to decide if each position is conflict with the current grid results to decide if the current tree can move one step further. The recursion depth is the size of the grid.

Fur the current code, the meaning of each node means if the tree is at the specific row, column position. Since each row can only put one element, so we can simply use a for loop to detect each case and range it. (Every option is exclusive)

Origianlly, we consider a more general case and consider each cell separately. Each cell can put a queue or not, for that code, even though it can work, but we create too many recursions and the code become slow. The time complexity is (2^(n*n)) for that case.

### Naive DFS-UpToBottom-Permutation

46. Permutations

Typical permutation question, we can reuse the similar framework and put things into the ans only when all elements are checked. Still go through the path as previous one, put the path into the ans only when it get to the end.

Maybe there are more efficient way to consider which path is unnecessary, in this example, we just check if the particular elements is in the curr, if it is already in the curr, we do not go through along that path. The start position in this question start from the 0 to end instead of customized start position to end (since the results can be any order).

We can use another flag vector to label if the element is used or not, if then when we pop the element out, we set the specific label into false. This can further improve the code a little bit.

This is the code with the visited flag. It looks that the permutation related problem use the visited flag a lot (each ans is a complete path) and the combination problems use the sort a lot (the answer can be half of the path).

47. Permutations II

Similar to the previous question, the input of this problem contains the repeated elements. In previous one, we solve this issue by sorting the input firstly and then compares nums[i] and nums[i-1], when they appears at the second time, we do not go through along that path. It use both the sorting strategy and the visited flag to trim the path

943. Find the Shortest Superstring

The original idea of this question is that, do all the permulations, and then remove the potential overlapping part.

The original idea to remove the overlapping part is to find if the start position of second string is in the first string. Assuming in the position i of the first string, then compare the i to the end, to see if it appears in the second string, it is possible there are multiple positions. This operation can be applied every time we add a new string.

this operation can be also done before searching (this is a good point), and we store the results explicitly. then we do not need to do the similar operation during the search (we store the information beforehand) This info is important to construct the final path, dfs call is n! for every node, we call n nodes.

This is the version that computes the prior information about the overlap length between each two pairs.

This code can not pass all tests because and exceeds the limited time. The ideal way is still to view the problem as a weighted graph and find the shortest path. Refer to the graph listed in huahua’s blog and the official solution here. Using the mask to remember the existing visited nodes is a little bit tricky. The 1 or 0 in each bit position of the mask represents that if the associated value is visited or not. The mask ^ (1<<j) represents that we set jth position as 1 and put it into the mask. (The ^ is the xor opertaion, if the original bit is 0 and current one 1<<j is 1, the results here for specific bit is 1). Using s & (1 << j) operation can check if the specific bit is set. If it is set, the results is a non zero number, otherwise, the result is zero. We do not discuss the dp solution here. The dp solution can be written into the recursive way or not. The core ideas is that we keeping the intermidiate status during the search. How to memorise that information is key point to improve the speed.

996. Number of Squareful Arrays

This is a permutation operation pluse the constraint. The idea is a little bit similar with previous one. Same to the previous one, the naive searching method is still time consuming and the dp method can improve the code a lot, but it is not easy to figure out that dp solution. There are several small points, for example, 0 is also a square number. We can also sort the array and compute the prefix sum to see if the results is squere number. We can avoid the repeated solutions by this way.

This is the version to utilize the prior knoledge, maybe the good strategy is not to compute them all at once, we can cache them during the searching process.

The efficient way is still to use the dp method, it is ensentially a hamitonion path in the graphy theory, refer to this explanation (https://www.youtube.com/watch?v=ISnktonx7rY). When there two numbers are prefect squre, there is a path between them.

### Naive DFS-UpToBottom-Expression

This type of question provide a expression, then maybe we need to compute some properties according to the expression. There usually exist the inserted structure which can be abstracted into a tree. The question in this type is a little bit like decoding the string into a tree structure with layered information, even if we do not use an actually tree structure, we still need to keep a layer variable to trace current part is at which layer. The 856 is a good example.

856 Score of Parentheses

the code looks super easy we do not even write it into a recursion way, but it is not straightforward to figure out why it works. The whole idea for solving this problem is to emulate a tree structure, different with the previous one, every node is a () pair.

For example, if the input is (()), the imaginary tree is as follows:

If there is (()()), the imaginary tree (or the decision tree) is as follows:

The key is that we do not consider the implementaion, we consider the abstraction first. Then according to the rule of the score, for each leaf node, its score is 2^depth, then we accumulate all the scores of leaf node together. Although the description of the question is to compute the score from the bottom to top, it is much easier to compute scores from the top to the bottom.

Regarding to the implementation, when there is (, the depth increase, when there is ) the depth decrease, and if the previous one is (, we assume that we met the leaf node. The whole sequence is just like the LRN traversal of the tree

Another way is to find the balanced outlayer strcutre, and then use the recursion to call the inner layer, which is like the tree taversal from the top to the bottom, in the string case, it can use the left and right index to label the each layer of the structure, when you find each layer, the tree depth increase one.

Enssentially, it like the process of decoding the string into the tree and then compute some properties of the tree. For the 394. Decode String, this idea is more obvious, and we basically use the bottom to up strategy of the tree based approach and we decode each subtree recursively. For each level, we just need a function to let it return the reuslts of the subtree. For the tree structure in logically, the leaf node is the character and the parent node is the number. The issue is that in the string structure, we basically flatten everything and not knwo when it end. So that’s why we also return a right index. That right index labels when the current level moves to the end. And then we start from the next segment (which is the node at the same level with the current node)

726. Number of Atoms

This is the a hard one, the main reason is that the rules are a liitle bit complex. The idea is same with previous one, the different place is that we read the string from right side to the left side, it might be easier to consider things in this way. By this way, we check the number first (the non leaf node) then the atom (the leaf node). The code can be further optimized. Although current code can work, it is still a little bit unclear for when to put the information into the map.

Another example is 736, which is a little bit complex. (we will solve it in future)

301. Remove Invalid Parentheses

For this type of question, we need to generate new expression based on given expression. The naive way is to use the binaray tree, if the results increase the specific character or not. Without the trim, the complexity is 2^n, every time we map the status array into a new case.

This is the version based on the bit mask and the cache, it does not improve the performance a lot, it however slows down the code, not sure the reason. (maybe the cache is not necessary)

394. Decode String

We write this code into a recursion way, once we got the bracket, we do a recursion call. It actually use the bottom to up way to do the dfs

### Naive BFS

17 Output all possible combinations.

We basically need to output all the paths, each number represents a new level for each all possible nodes. Essentailly, there is a tree structure and it is good to use the bfs method to search all the tree and only put the results in vector at the last layer. We do not need to put the value in ans at the first several layers and we just need to put the value into the ans at the final layer. The code structure is same with the tree based version, the difference is that the node is not the actual tree node, it is just the imaginary tree node.

784. Letter Case Permutation

This is the sample code, the framework is still same with previous one (dfs split the current string into the current part and the remaning part). We just classifiy the type of each elements can consider each operations gradually. the idea is still the recursion based on dfs manner.

The goal of algorithm is to find all the combinations, for each character which is a letter, it can be either a small case or large case, so there are two options for current layer