Question: Given m sorted arrays each of size n, what is the fastest algorithm to find the kth smallest element in the union of the m arrays?
A friend sent me this optimization question on Quora. He said he’d solved it in Python, on a plane ride, in O(log(k)) time (Note: the algorithm uses O(log(k)) time, [presumably] not my friend). I thought sure, challenge accepted.
I didn’t have the luxury of a full plane ride to figure it out, but I came up with a solution in R, using linear search. I know, I know, linear search is O(n) time. No excuse, I lost the challenge. But again, no plane ride! C’mon!
I wonder, though. Sortation is a precondition for binary search, and sorting itself takes O(n log n) time for a comparison sort (O(n+k) for counting sort, O(nk) for radix sort). So binary searching a list you had to sort first really costs sort time plus O(log(n)) — at best O(n) + O(log(n)) in total — as opposed to linear/brute force search, which tends to O(n) with no setup at all. Of course, in this problem the arrays arrive already sorted. Not sure… should come back to Big O notation later.
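For curiosity's sake, here is a sketch of how a faster-than-linear answer to the original question could look. This is an assumption on my part — I never saw my friend's code — and kth.smallest is my own made-up name. It binary-searches the value space, using findInterval() to count, per sorted array, how many elements fall at or below a candidate value (integer arrays assumed):

```r
# kth.smallest: binary search over the value space (integer arrays assumed).
# The count of elements <= mid across all arrays tells us which side k falls on.
kth.smallest <- function(array.list, k) {
  lo <- min(sapply(array.list, min))
  hi <- max(sapply(array.list, max))
  while (lo < hi) {
    mid <- (lo + hi) %/% 2
    # findInterval() binary-searches each sorted array: O(log n) per array
    count <- sum(sapply(array.list, function(a) findInterval(mid, a)))
    if (count < k) lo <- mid + 1 else hi <- mid
  }
  lo  # the smallest value with at least k elements <= it
}

kth.smallest(list(c(1L, 3L, 5L), c(2L, 4L, 6L)), 3)  # 3
```

Each candidate costs O(m log n), so it beats a linear scan of the union, though it still isn't the O(log(k)) my friend claimed.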
As promised, please find the below solution on my recently not-so-anemic github page.
```r
# searchArrays takes a list of sorted vectors as an argument
# and finds the smallest element common to all of them
# note that the input vectors must be sorted ascending
searchArrays <- function(array.list, k = 1) {
  for (e in seq_along(array.list)) {
    # separate the current "search" array from the rest
    array.list.search <- array.list[-e]
    for (p in seq_along(array.list[[e]])) {
      x.search <- array.list[[e]][p]
      element.vector <- x.search
      for (l in seq_along(array.list.search)) {
        for (n in seq_along(array.list.search[[l]])) {
          element.vector <- append(element.vector, array.list.search[[l]][n])
          if (range(element.vector)[1] == range(element.vector)[2]) {
            break  # x.search found in this array; move on to the next one
          }
          # the new element broke uniformity: drop it and keep scanning
          element.vector <- element.vector[-length(element.vector)]
        }
      }
      # one copy of x.search per array means it is common to all of them
      if (length(element.vector) == length(array.list) &&
          range(element.vector)[1] == range(element.vector)[2]) {
        return(element.vector)
      }
    }
  }
  print("Error: no common element across arrays")
}
```
The input to the function searchArrays() is a list of sorted vectors. The goal is to find the smallest common element in all of the arrays.
Using loops, I take each array in turn as the “search” array e (starting with the first), then loop through each element of e, which I label x.search (maybe I should have called it e.i). I then separate e from the remaining arrays, labeling those array.list.search.
At the heart of the nested loops is element.vector, which builds up one element at a time, checking for uniformity at each step. To illustrate, if you print it in real time it looks something like:
a a b a a a a a a a a a c a a a a a a a a a a
To test the above:
```r
test.gen <- function(n) { sort(sample(letters, n, replace = TRUE)) }
test <- list()
for (i in 1:5) {
  a <- list(test.gen(100))
  test <- append(test, a)
}
searchArrays(test)
```
Final thought: solving the above required checking whether element.vector was uniform at each step. It was an unexpected challenge which I enjoyed solving. The aha moment came when I realized that if the vector is uniform, its min and max must be equal. Thus you can compare the two ends of its range:
range(element.vector)[1] == range(element.vector)[2]
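To see the trick in isolation — is.uniform is just an illustrative name here, not something from the function above. It works for character vectors too, since min and max compare strings alphabetically:

```r
# a vector is uniform iff the two ends of range() coincide
is.uniform <- function(v) range(v)[1] == range(v)[2]

is.uniform(c("a", "a", "a"))  # TRUE
is.uniform(c("a", "a", "b"))  # FALSE
```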
This blog post brought to you by Charles Mingus radio, via Spotify.