Problem: For a large fitness chain, identify the clubs that were overstaffed. The best way to do this would be a simple supply-demand comparison of staffed hours vs. member usage on a day-by-day or even hourly basis, but I didn’t have this information readily available.
With very limited data access and pressed for time, I decided to look for clubs with high staffing levels relative to comparable clubs. Larger clubs naturally staff more hours, so I needed to normalize by measures such as club revenue, total member usage in hours, club square footage and other parameters.
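As a rough illustration of that normalization, the sketch below divides staffed hours by each size measure for three made-up clubs. The column names (`hours`, `revenue`, `usage_hours`, `sq_ft`) and all of the values are hypothetical and are not taken from the actual data.

```r
# Hypothetical toy data: three clubs of different sizes.
clubs <- data.frame(
  club        = c("A", "B", "C"),
  hours       = c(400, 900, 450),         # staffed hours
  revenue     = c(100000, 300000, 90000), # club revenue
  usage_hours = c(8000, 20000, 6000),     # total member usage, in hours
  sq_ft       = c(20000, 45000, 15000)    # club square footage
)

# Staffed hours per 1,000 units of each size measure.
clubs$hours_per_1k_revenue <- clubs$hours / (clubs$revenue / 1000)
clubs$hours_per_1k_usage   <- clubs$hours / (clubs$usage_hours / 1000)
clubs$hours_per_1k_sq_ft   <- clubs$hours / (clubs$sq_ft / 1000)

# Rank clubs from most to least staffed relative to revenue.
ranked <- clubs[order(clubs$hours_per_1k_revenue, decreasing = TRUE), ]
print(ranked[, c("club", "hours", "hours_per_1k_revenue")])
```

In this toy data, club B staffs the most hours in absolute terms but the fewest per 1,000 of revenue, which is why raw hours alone would be misleading.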
By the end, I had accidentally, incrementally hand-built a K-NN model from scratch.
First cut (v.1):
    ## using a new file, saving old file in workspace for reference
    # file_init <- file
    # df1_input1 <- df1
    file <- "~./Data/Labor KNN/LaborKNNInput2.csv"
    df1 <- read.csv(file, stringsAsFactors = FALSE)

    # removing rows where a normalizing value is 0
    # (logical indexing; -which() would drop every row when nothing matches)
    df2 <- df1[!(df1$Max.of.2013.Total.Usage %in% 0), ]
    df3 <- df2[!(df2$Max.of.Sq.Ft %in% 0), ]
    df4 <- df3[!(df3$Max.of.2013.Club.Revenue %in% 0), ]
    row.diff <- nrow(df1) - nrow(df4)
    cat(paste("total of", row.diff, "observations removed of", nrow(df1)))

    # stratifying data by role_type for clustering
    df4$ROLE_TYPE <- as.factor(df4$ROLE_TYPE)
    df.split <- split(df4, df4$ROLE_TYPE)

    #===K-NN===#
    # system.time({
    k <- 50  # maximum number of neighbors
    df.fin <- data.frame()
    for (i in seq_along(df.split)) {  # for each partition by role type
      t1 <- df.split[[i]]
      t1 <- cbind(t1, "group" = i)
      # never ask for more neighbors than there are other rows in the group;
      # groups with fewer than 3 rows produce NA z-scores
      d <- min(k, nrow(t1) - 1)
      df.int1 <- data.frame()
      for (j in seq_len(nrow(t1))) {  # for each row in each partition
        t2 <- t1[j, ]   # the row being scored
        t3 <- t1[-j, ]  # every other row in the group
        # distance to the other rows on column 7 (the matching variable)
        vals <- abs(t3[[7]] - as.numeric(t2[[7]]))
        # column 6 of the neighbors, nearest first; indexed on t3 so the
        # order matches vals
        hours <- t3[order(vals), 6]
        av <- mean(hours[seq_len(d)])
        std <- sd(hours[seq_len(d)])
        zsc <- (as.numeric(t2[[6]]) - av) / std
        df.int <- data.frame(t1[j, "group"], t1[j, 1], zsc, std, d, nrow(t1),
                             stringsAsFactors = FALSE)
        df.int1 <- rbind(df.int1, df.int)
      }
      df.fin <- rbind(df.fin, df.int1)
    }
    names(df.fin) <- c("group#", names(df1)[1], "cluster z-score",
                       "cluster std", "cluster size", "group size")
    # })  # for system.time

    mer <- merge(x = df2, y = df.fin,
                 by.x = "ROLE_TYPE_LOCATION", by.y = "ROLE_TYPE_LOCATION")
    mer2 <- mer[!(mer$Standard_Role %in% "#N/A"), ]
    mer3 <- cbind("AnlysID" = seq_len(nrow(mer2)), mer2)
    # head(mer3[order(mer3[[10]], decreasing = TRUE), ], n = 20)
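The core of the script can be sketched on its own as a leave-one-out nearest-neighbor z-score: for each club, find the k clubs closest in size, then measure how far its hours sit from that neighborhood's mean, in standard deviations. The function name `knn_zscore`, its argument names and the toy vectors below are my own illustrative assumptions; this is a minimal sketch of the idea, not the script itself.

```r
# Leave-one-out nearest-neighbor z-score (illustrative sketch).
# size:  numeric size measure used to find comparable clubs (assumed name)
# hours: staffed hours for the same clubs (assumed name)
# k:     number of neighbors to compare against
knn_zscore <- function(size, hours, k) {
  n <- length(size)
  stopifnot(length(hours) == n, n >= 3, k >= 2)
  k <- min(k, n - 1)  # cannot have more neighbors than other clubs
  z <- numeric(n)
  for (j in seq_len(n)) {
    others <- setdiff(seq_len(n), j)            # every club except j
    dist <- abs(size[others] - size[j])         # distance on the size measure
    nn <- others[order(dist)][seq_len(k)]       # indices of the k nearest clubs
    z[j] <- (hours[j] - mean(hours[nn])) / sd(hours[nn])
  }
  z
}

# Hypothetical toy data: four small clubs and four large ones.
size  <- c(10, 11, 12, 13, 50, 52, 54, 56)
hours <- c(100, 105, 95, 160, 400, 410, 390, 405)
z <- knn_zscore(size, hours, k = 3)
print(round(z, 2))
```

In this toy data the fourth club staffs 160 hours while its three nearest neighbors staff 95 to 105 (mean 100, standard deviation 5), so its z-score is 12 and it stands out even though the large clubs staff far more hours in absolute terms.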
For future research:
- Package ‘kknn’ by Klaus Schliep & Klaus Hechenbichler could be used to the same effect: