Installing RecordLinkage from the archive

While working on database linkage with approximate, or “fuzzy”, matching, I needed to link customer names in one database to possible matches in another. Fuzzy matching one name against a small list of candidate names is well documented and quite simple in R with agrep() and adist(). The challenge compounds, though, as the list of potential matches grows. I needed to match 12,000 names against a list of roughly 115,000 names, which is over a billion possible pairs (12,000 × 115,000 ≈ 1.38 billion). Computation was an issue, especially under tight time constraints.
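As a rough illustration of the small-scale case, here is a minimal sketch using only base R. The names are made up for the example, and the edit-distance threshold of 2 is an arbitrary assumption rather than a recommended setting.

```r
# Hypothetical example data; these names are invented for illustration.
target     <- "Jon Smith"
candidates <- c("John Smith", "Jon Smyth", "Jane Smithe", "Robert Jones")

# agrep() returns the candidates that approximately contain the pattern.
# max.distance = 2 allows up to two edits (an assumed, arbitrary threshold).
close_matches <- agrep(target, candidates, max.distance = 2, value = TRUE)

# adist() computes the generalized Levenshtein distance for every pair.
# The result is a matrix with one row per target and one column per candidate.
distances <- adist(target, candidates)
colnames(distances) <- candidates

print(close_matches)
print(distances)

# Size of the full comparison space for 12,000 names against 115,000 names.
n_pairs <- 12000 * 115000
print(format(n_pairs, big.mark = ","))
```

Because adist() builds the full distance matrix, memory and run time grow with the product of the two list lengths, which is why this approach stops being practical at the scale described above.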

The package RecordLinkage, by Murat Sariyar and Andreas Borg, attempts to solve this problem in R by implementing the matching with the ff data classes, alongside many other useful utilities. For reasons I don’t know, RecordLinkage was abandoned as a project and archived. The package still works, and the work is fascinating: http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Sariyar+Borg.pdf

To install RecordLinkage from the CRAN archive, follow the instructions here:
http://stackoverflow.com/questions/24194409/how-do-i-install-a-package-that-has-been-archived-from-cran

On Windows, this requires first installing Rtools, then running this code:

# Download the archived source tarball from CRAN
url <- "http://cran.r-project.org/src/contrib/Archive/RecordLinkage/RecordLinkage_0.4-1.tar.gz"
pkgFile <- "RecordLinkage_0.4-1.tar.gz"
download.file(url = url, destfile = pkgFile)

# Install dependencies
install.packages(c("ada", "ipred", "evd"))

# Install package
install.packages(pkgs=pkgFile, type="source", repos=NULL)

# Delete package tarball
unlink(pkgFile)
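
After the install finishes, a quick check like the following can confirm that the package is actually available. This is only a sketch using base R functions, and the messages are my own wording, not output from RecordLinkage.

```r
# Check whether RecordLinkage is installed, without attaching it.
pkg_installed <- requireNamespace("RecordLinkage", quietly = TRUE)

if (pkg_installed) {
  # Report the installed version so it can be compared with the tarball used.
  message("RecordLinkage version: ",
          as.character(packageVersion("RecordLinkage")))
} else {
  message("RecordLinkage is not installed; re-check Rtools and the dependencies.")
}
```

If the check fails, the error printed by install.packages() during the source build is usually the best place to look first.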