An Accidental K-NN

Problem: For a large fitness chain, identify the clubs that were overstaffing. The best way to do this would be a simple supply-demand comparison of staffed hours vs. member usage on a day-by-day or even hourly basis, but I didn’t have that information readily available.

With very limited data access and pressed for time, I decided to look for clubs whose staffing levels were high relative to comparable clubs. Larger clubs will naturally staff more hours, so I needed to normalize by things like club revenue, total member usage in hours, square footage of the club and other parameters.
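
To make the need for normalizing concrete, here is a minimal sketch on made-up numbers. The hours column and every value in it are hypothetical; only the three Max.of.* column names are taken from the input file used further down.

```r
# Toy data: three clubs of very different sizes (all values invented)
clubs <- data.frame(
  club = c("A", "B", "C"),
  hours = c(400, 900, 450),  # hypothetical staffed-hours column
  Max.of.2013.Total.Usage = c(2000, 6000, 1500),
  Max.of.Sq.Ft = c(10000, 30000, 9000),
  Max.of.2013.Club.Revenue = c(500000, 1500000, 400000),
  stringsAsFactors = FALSE
)

# Staffed hours per unit of each size measure
clubs$hours.per.usage   <- clubs$hours / clubs$Max.of.2013.Total.Usage
clubs$hours.per.sqft    <- clubs$hours / clubs$Max.of.Sq.Ft
clubs$hours.per.revenue <- clubs$hours / clubs$Max.of.2013.Club.Revenue

# Rank by one of the ratios, heaviest staffing first
ranked <- clubs[order(clubs$hours.per.usage, decreasing = TRUE), ]
print(ranked[, c("club", "hours", "hours.per.usage")])
```

In this toy set, club B staffs the most raw hours but ranks last once usage is taken into account, which is exactly the effect a raw comparison would hide.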

By the end, I had accidentally, incrementally hand-built a K-NN model from scratch.

First cut (v.1):


##using a new file, saving old file in workspace for reference
#file_init<-file
#df1_input1<-df1

file<-"~/Data/Labor KNN/LaborKNNInput2.csv"

df1<-read.csv(file, stringsAsFactors=F)
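#removing rows with 0's in any of the size measures
#(logical indexing: df[-which(cond),] would drop every row when nothing matches)
df2<-df1[!(df1$"Max.of.2013.Total.Usage" %in% 0),]
df3<-df2[!(df2$"Max.of.Sq.Ft" %in% 0),]
df4<-df3[!(df3$"Max.of.2013.Club.Revenue" %in% 0),]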
row.diff<-nrow(df1)-nrow(df4)
cat(paste("total of",row.diff,"observations removed of",nrow(df1)))

#stratifying data by role_type for clustering
df4$ROLE_TYPE<-as.factor(df4$ROLE_TYPE)
df.split<-split(df4,df4$ROLE_TYPE)

#===K-NN===#
#system.time({
k<-50 #maximum number of neighbours per club

df.fin<-data.frame()
for(i in 1:length(df.split)){ #for each partition by role-type
  t1<-df.split[[i]]
  t1<-cbind(t1,"group"=i)
  d<-min(k,nrow(t1)-1) #cannot use more neighbours than the partition has
  df.int1<-data.frame()
  for(j in 1:nrow(t1)){ #for each row in each partition
    t2<-t1[j,]  #the club being scored
    t3<-t1[-j,] #every other club in the partition
    vals<-abs(t3[[7]]-as.numeric(t2[[7]])) #distance on column 7
    hours<-t3[order(vals),6] #neighbours' hours (column 6), nearest first
    av<-mean(hours[1:d])
    std<-sd(hours[1:d])
    zsc<-(as.numeric(t2[[6]])-av)/std
    #data.frame() rather than cbind() keeps the numeric columns numeric
    df.int<-data.frame(t1[j,"group"],t1[j,1],zsc,std,d,nrow(t1),stringsAsFactors=F)
    df.int1<-rbind(df.int1,df.int)
  }
  df.fin<-rbind(df.fin,df.int1)
}
names(df.fin)<-c("group#",names(df1)[1],"cluster z-score","cluster std","cluster size","group size")

#})#for system.time

mer<-merge(x=df2,y=df.fin,by.x="ROLE_TYPE_LOCATION",by.y="ROLE_TYPE_LOCATION")
mer2<-mer[!(mer$Standard_Role %in% "#N/A"),]
mer3<-cbind("AnlysID"=1:nrow(mer2),mer2)

#head(mer3[order(mer3[10],decreasing=T),],n=20)
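
The core of the loop above can also be written as a small standalone function, which makes it easier to check on toy data. This is only a sketch and not the code used in the analysis: the name knn_zscore, its arguments and the sample numbers are all made up for illustration, and it assumes, as the loop does, that neighbours are chosen on a single size measure.

```r
# Leave-one-out neighbour z-score: for each club, compare its hours with the
# hours of the k clubs closest to it on a single size measure.
# (knn_zscore is a hypothetical helper name, not part of any package.)
knn_zscore <- function(hours, size, k) {
  n <- length(hours)
  d <- min(k, n - 1)  # cannot use more neighbours than exist
  sapply(seq_len(n), function(j) {
    dist <- abs(size[-j] - size[j])           # distance to every other club
    nb <- hours[-j][order(dist)][seq_len(d)]  # hours of the d nearest clubs
    (hours[j] - mean(nb)) / sd(nb)
  })
}

# Toy data (invented): six clubs, the last one staffed well above its peers
size  <- c(100, 110, 120, 130, 140, 125)
hours <- c(40, 42, 44, 46, 48, 80)
z <- knn_zscore(hours, size, k = 3)
print(round(z, 2))
print(which.max(z))  # index of the club that stands out most
```

A large positive score means a club staffs far more hours than the clubs most similar to it in size, which is the signal the analysis was after.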

For future research:

  • The ‘kknn’ package by Klaus Schliep & Klaus Hechenbichler could be used to the same effect (see its reference manual, kknn.pdf).
