Question: Given m sorted arrays each of size n, what is the fastest algorithm to find the kth smallest element in the union of the m arrays?
A friend sent me this optimization question on Quora. He said he’d solved it in Python, on a plane ride, in O(log(k)) time (Note: the algorithm uses O(log(k)) time, [presumably] not my friend). I thought sure, challenge accepted.
I didn’t have the luxury of a full plane ride to figure it out, but I came up with a solution in R, using linear search. I know, I know, linear search is O(n) time. No excuse, I lost the challenge. But again, no plane ride! C’mon!
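For what it's worth, here is my guess at the kind of divide-and-conquer my friend had in mind, sketched in R for the two-array case. The name kth_two and the slicing style are mine, not his, and since R copies vectors on slicing, only the number of comparisons is O(log(k)) here, not the total work:

```r
# kth smallest element in the union of two sorted (ascending) vectors,
# using O(log(k)) comparisons: discard roughly k/2 candidates per step
kth_two <- function(a, b, k) {
  if (length(a) == 0) return(b[k])
  if (length(b) == 0) return(a[k])
  if (k == 1) return(min(a[1], b[1]))
  i <- min(length(a), k %/% 2)
  j <- min(length(b), k - i)
  if (a[i] <= b[j]) {
    # the first i elements of a cannot contain the kth smallest
    kth_two(a[-(1:i)], b, k - i)
  } else {
    kth_two(a, b[-(1:j)], k - j)
  }
}

kth_two(c(1, 3, 5, 7), c(2, 4, 6, 8), 5)  # 5
```

Each call throws away about half of the remaining k candidates, which is where the logarithm comes from.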
I wonder, though. Sorting is a precondition for binary search, and sorting itself costs O(n log n) for a comparison sort (or O(n+k), O(nk) for counting/radix-style sorts). So binary searching an unsorted list really costs O(n log n) + O(log(n)) in total, as opposed to linear/brute force search, which tends to O(n). Of course, here the arrays arrive already sorted, so that cost is sunk. Not sure… should come back to Big O notation later.
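Concretely, here is what the O(log(n)) lookup looks like once a vector is sorted. The binary.search function below is mine, purely for illustration; it halves the search window on every pass:

```r
# binary search on a sorted (ascending) vector: O(log(n)) comparisons,
# but only because the sort has already been paid for
binary.search <- function(v, target) {
  lo <- 1; hi <- length(v)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2
    if (v[mid] == target) return(mid)
    if (v[mid] < target) lo <- mid + 1 else hi <- mid - 1
  }
  NA  # not found
}

binary.search(c(2, 4, 6, 8, 10), 8)  # 4
```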
As promised, please find the solution below, also on my recently not-so-anemic GitHub page.
#searchArrays takes a list of sorted vectors as an argument
#and finds the smallest element common to all of them
#(the k argument is a placeholder for the general kth-smallest
#problem; only k = 1 is handled for now)
#note that the input vectors must be sorted ascending
searchArrays <- function(array.list, k = 1) {
  element.vector <- vector()
  for (e in 1:length(array.list)) {
    # every array except the current "search" array e
    array.list.search <- array.list[!1:length(array.list) %in% e]
    for (p in 1:length(array.list[[e]])) {
      # candidate value, taken from array e in ascending order
      x.search <- array.list[[e]][p]
      element.vector <- x.search
      for (l in 1:length(array.list.search)) {
        # linear search: stop at the first element >= the candidate
        for (n in 1:length(array.list.search[[l]])) {
          if (array.list.search[[l]][n] >= x.search) break
        }
        # build element.vector one element at a time,
        # checking for uniformity at each step
        element.vector <- append(element.vector, array.list.search[[l]][n])
        if (length(element.vector) == length(array.list) &&
            range(element.vector)[1] == range(element.vector)[2]) {
          return(element.vector)
        }
        # a non-matching element means this candidate is not common
        if (array.list.search[[l]][n] != x.search) break
      }
    }
  }
  print("Error: no common element across arrays")
}
The input to the function searchArrays() is a list of sorted vectors. The goal is to find the smallest element common to all of the arrays, i.e. the k = 1 case of the original question.
Using loops, I define the “search” array e (the first array), then loop through each element in e, which I label x.search (maybe I should have called this e.i). I then separate e from the search arrays, labeling those array.list.search.
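As an aside, the !1:length(array.list) %in% e construction works, but negative indexing does the same split more idiomatically (toy list below, just for illustration):

```r
# dropping element e from a list: two equivalent spellings
array.list <- list(c("a", "b"), c("a", "c"), c("a", "d"))
e <- 2
array.list.search <- array.list[-e]  # everything except array e
identical(array.list.search,
          array.list[!1:length(array.list) %in% e])  # TRUE
```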
At the heart of the nested loops is the “element.vector”, which is built one element at a time, checked for uniformity at each step. To illustrate, printing it in real time looks something like:
a
a b
a
a a
a a a
a a a c
a
a a
a a a
a a a a
To test the above:
test.gen <- function(n) { sort(sample(letters, n, replace = TRUE)) }
test <- list()
for (i in 1:5) { test <- append(test, list(test.gen(100))) }
searchArrays(test)
Final thought: solving the above required checking whether element.vector was uniform at each step. It was an unexpected challenge which I enjoyed solving. The aha moment came when I realized that if the vector is uniform, its min and max must be the same. Thus you can compare the two ends of range():
range(element.vector)[1] == range(element.vector)[2]
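The range trick works; for the record, a couple of equivalent uniformity checks in R, on a throwaway vector:

```r
# three ways to test that a vector holds only one distinct value
x <- c("a", "a", "a")
range(x)[1] == range(x)[2]  # TRUE: min and max agree
length(unique(x)) == 1      # TRUE: only one distinct value
all(x == x[1])              # TRUE: everything matches the first element
```

range() works on character vectors because min() and max() are defined for them, which is why the trick carries over from numbers to letters.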
This blog post brought to you by Charles Mingus radio, via Spotify.