Thursday, March 27, 2014

Assign n Email Addresses to x Cells, Intrinsically (Part II)

Part I introduced a method for pseudo-randomly assigning n email addresses to x cells without maintaining a log of each assignment.

The earlier post covered the basic case in which every cell receives approximately the same number of email addresses. In practice, cell sizes often vary. The technique below works well when the total number of email addresses needed is less than the product of the cell sizes' greatest common divisor (GCD) and the average email address length; equivalently, when the total divided by the GCD is smaller than a typical address length. For example, with cell sizes of 500, 500, and 1,000, the total is 2,000, the GCD is 500, and the average length is roughly 25 characters, so 2,000 < 500 × 25.
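The rule of thumb above can be checked before any addresses are assigned. Below is a minimal sketch; the helper name enough_length_for_cells and the default average length of 25 are assumptions for illustration, and the GCD is computed in base R rather than with a package.

```r
# Hypothetical helper: tests whether total < GCD * average address length
enough_length_for_cells <- function(cell.sizes, avg.len = 25) {
    # Euclid's algorithm for two numbers, folded over the whole vector
    gcd2 <- function(a, b) if (b == 0) a else gcd2(b, a %% b)
    cell.gcd <- Reduce(gcd2, cell.sizes)
    list(gcd = cell.gcd,
         buckets = sum(cell.sizes) / cell.gcd,  # distinct remainders needed
         ok = sum(cell.sizes) < cell.gcd * avg.len)
}

enough_length_for_cells(c(500, 500, 1000))   # sizes from the example above
enough_length_for_cells(c(499, 501, 1000))   # GCD of 1, so far too many buckets
```

Cell sizes that share a large common divisor keep the number of buckets small, which is what the method relies on.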

Assign n Email Addresses to x Cells, Intrinsically; Part 2 (Variable Cell Sizes)

Sample Use Case:
Marketing requests that an email address list be divided randomly into a given number of cells so that each cell would receive a different version of copy.
Below is a technique that takes n email addresses and pseudo-randomly assigns each to one of x cells. The advantage of this method is that the user does not need to maintain a log of each email address's assigned cell since the cell assignment can be reproduced at any time.
This post extends the technique from Part I to accommodate cells of varying sizes.
First, generate a random list of fictitious email addresses.
set.seed(4444)
library(numbers)

fict.email <- function(n = 5) {
    fict.emails <- data.frame(email = NA)
    for (i in 1:n) {
        fict.emails[i, "email"] <- paste0(paste(sample(letters, sample(3:25, 
            1, TRUE), TRUE), collapse = ""), "@", paste(sample(letters, sample(3:15, 
            1, TRUE), TRUE), collapse = ""), ".", paste(sample(letters, sample(2:3, 
            1, TRUE), TRUE), collapse = ""))
    }
    fict.emails
}
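# fict.email() already returns a one-column data frame of random addresses;
# wrapping it in sample() would only permute its single column
emails <- fict.email(10000)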
Next, assign the cell sizes.
cell.sizes <- c(500, 500, 1500, 2000)
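Count the characters in each email address. This count is the key to the method: it never changes for a given address, so the assignment can always be reproduced. Next, find the greatest common divisor (GCD) of the cell sizes. Finally, use the modulo operator (%%) to compute the remainder of each length when divided by the total of the cell sizes over the GCD.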
cells <- length(cell.sizes)
cell.gcd <- mGCD(cell.sizes)
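em.len <- nchar(emails$email)  # plain vector of lengths, one per address
# %% yields 0 .. (buckets - 1); add 1 so the values match the 1-based ranges built below
em.mod <- em.len%%(sum(cell.sizes)/cell.gcd) + 1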
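Map the remainders to cell numbers. Each cell is given a block of consecutive remainder values whose width is the cell size divided by the GCD.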
ranges <- data.frame(start = 0, end = 0)
for (j in 1:cells) {
    ranges[j, "start"] <- (sum(cell.sizes[1:j]) - cell.sizes[j])/cell.gcd + 
        1
    ranges[j, "end"] <- sum(cell.sizes[1:j])/cell.gcd
}

for (k in 1:cells) {
    emails$cell[em.mod >= ranges$start[k] & em.mod <= ranges$end[k]] <- k
}
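The two loops above can also be expressed as a single lookup vector. The sketch below is a vectorized variant, not the post's original code: the name assign_cell is hypothetical, the GCD is computed in base R so the block has no package dependency, and remainders are shifted by 1 so that every value from 0 to buckets - 1 maps to a cell.

```r
# Base-R GCD (Euclid's algorithm) for two numbers
gcd2 <- function(a, b) if (b == 0) a else gcd2(b, a %% b)

# Hypothetical helper: returns the cell number for each address
assign_cell <- function(emails, cell.sizes) {
    cell.gcd <- Reduce(gcd2, cell.sizes)
    # One entry per remainder: cell j appears cell.sizes[j] / cell.gcd times
    lookup <- rep(seq_along(cell.sizes), cell.sizes / cell.gcd)
    # Remainders run from 0 to length(lookup) - 1; add 1 to index the lookup
    lookup[nchar(emails) %% length(lookup) + 1]
}

assign_cell(c("vladputin@gmail.ru", "j.wilshere@gmail.com",
              "1736384647.6365227@compuserve.net"),
            c(500, 500, 1500, 2000))
```

For cell sizes 500, 500, 1500, and 2000 the lookup vector has nine entries, with cells 1 through 4 occupying 1, 1, 3, and 4 of them respectively.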
Split the data frame by cell, then trim each cell to its required size. The resulting lists are the final output.
email.lists <- split(emails, emails$cell)
for (l in 1:cells) {
    # head() keeps at most cell.sizes[l] addresses and never pads with NA
    email.lists[[l]] <- head(email.lists[[l]]$email, cell.sizes[l])
}
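If a cell ends up with fewer addresses than requested, it is worth knowing before the lists go out. Below is a minimal sketch of a trimming step with a shortfall check, assuming each cell is a plain character vector; the helper name trim_cells is hypothetical and not part of the original post.

```r
# Hypothetical helper: trims each cell to its target size and warns on shortfalls
trim_cells <- function(email.lists, cell.sizes) {
    have <- lengths(email.lists)   # addresses available in each cell
    short <- have < cell.sizes
    if (any(short)) {
        warning("cells short of target: ", paste(which(short), collapse = ", "))
    }
    # head(x, n) returns at most n elements, so short cells are left as they are
    Map(head, email.lists, cell.sizes)
}

lists <- list(c("a@x.com", "b@x.com", "c@x.com"), c("d@x.com", "e@x.com"))
trim_cells(lists, c(2, 2))
```

A shortfall usually means the address lengths are not spread evenly enough across the remainders, or that the source list is too small for the requested sizes.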
Now each email address has been assigned to a specific cell.
Because the number of characters in an address never changes, each address will always land in the same cell, provided the cell sizes stay the same.

Wednesday, March 5, 2014

Assign n Email Addresses to x Cells, Intrinsically

Sample Use Case:
Marketing requests that an email address list be divided randomly into a given number of cells so that each cell would receive a different version of copy.
Below is a technique that takes n email addresses and pseudo-randomly assigns each to one of x cells. The advantage of this method is that the user does not need to maintain a log of each email address's assigned cell since the cell assignment can be reproduced at any time.
First, read in a list of email addresses to be assigned.
emails <- c("vladputin@gmail.ru", "j.wilshere@gmail.com", "princess27@hotmail.com", 
    "dnnyby@yahoo.com", "doctoroctagon@met.com", "tommy2@aol.com", "mikef@preds.com", 
    "vandyfan@vanderbilt.org", "omaha@peyton.com", "cash.johnny@bmi.com", "tbright@caterpillar.com", 
    "soccermom@aol.com", "1736384647.6365227@compuserve.net", "ninfan@aol.com")
length(emails)
## [1] 14
Next, assign the number of cells.
cells <- 3
Create a vector of the number of characters in each email address.
em.len <- nchar(emails)
Use the modulo operator (%%) to create a vector of remainders. The divisor is the number of cells plus 1, so that remainder 0 can serve as a holdout (control) group.
em.mod <- em.len%%(cells + 1)
The table function summarizes how many email addresses have been assigned to each cell (including the holdout).
table(em.mod)
## em.mod
## 0 1 2 3 
## 3 3 4 4
Separate the original list of email addresses into the assigned cells.
em.1 <- emails[em.mod == 1]  #  cell 1
em.2 <- emails[em.mod == 2]  #  cell 2
em.3 <- emails[em.mod == 3]  #  cell 3
em.0 <- emails[em.mod == 0]  #  control
Display the email addresses assigned to each cell.
em.1
## [1] "doctoroctagon@met.com"             "soccermom@aol.com"                
## [3] "1736384647.6365227@compuserve.net"
em.2
## [1] "vladputin@gmail.ru"     "princess27@hotmail.com"
## [3] "tommy2@aol.com"         "ninfan@aol.com"
em.3
## [1] "mikef@preds.com"         "vandyfan@vanderbilt.org"
## [3] "cash.johnny@bmi.com"     "tbright@caterpillar.com"
em.0
## [1] "j.wilshere@gmail.com" "dnnyby@yahoo.com"     "omaha@peyton.com"
Now each email address has been assigned to one of the given cells (or to the holdout).
Because the number of characters in an address never changes, each address will always land in the same cell, as long as the number of cells stays the same.
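The reproducibility claim can be checked directly. Below is a minimal sketch using four of the addresses from the list above; the function name assign_equal_cells is hypothetical, and it simply wraps the modulo step from this post.

```r
# Hypothetical wrapper around the modulo step:
# remainder 0 is the holdout (control), 1..cells are the test cells
assign_equal_cells <- function(emails, cells) {
    nchar(emails) %% (cells + 1)
}

emails <- c("vladputin@gmail.ru", "tommy2@aol.com",
            "omaha@peyton.com", "mikef@preds.com")

first <- assign_equal_cells(emails, 3)        # original order
second <- assign_equal_cells(rev(emails), 3)  # same addresses, reversed order

# Each address gets the same cell regardless of its position in the list
all(first == rev(second))
```

The assignment depends only on the address itself and the number of cells, so it can be recomputed later from the same list without storing anything.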