Tag Archives: statistics

A Sequence Clustering Model in R

I’ve just released my first R package!
Over the past 1.5 years or so, I’ve been studying an obscure statistical model over ranking (full or partial) data called Mallows’ model. It hypothesizes that a set of sequence data has a “modal” sequence about which the data cluster, and that the probability of a sequence decays exponentially with its distance d from that mode, proportionally to \exp(-\lambda d), where \lambda is estimated from the data.
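
To make the form of the model concrete, here is a tiny illustration (not RMallow code) of that weight as a function of the distance d from the mode:

# Unnormalized Mallows weight: a sequence at distance d from the
# modal sequence gets weight exp(-lambda * d) (illustration only).
mallows.weight <- function(d, lambda) exp(-lambda * d)
mallows.weight(d = 0:5, lambda = 0.5)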

There is a closed-form solution for the \lambda parameter and the modal sequence, as outlined in the third reference in the package manual. However, we often have heterogeneous data, for which it makes more sense to hypothesize multiple modal sequences. That is where this package comes in.

The RMallow package can take any set of sequence data of reasonable size (currently I have gotten it to process data sets with 2000+ rows and 20+ attributes in under a minute) and fit a Multi-Modal Mallows’ Model to the data using an EM algorithm, as described in the first reference in the package summary.

Certain conditions have to be met for the model to fit very well. For example, if there happen to be multiple modal sequences but they are very close together (as measured by Kendall’s metric, which counts the number of adjacent transpositions, i.e. bubble-sort swaps, needed to turn one sequence into the other), Mallows’ model may simply group them all into the same “cluster.” Here is an example where I fit the model to some crude synthetic data and discover two constructed modal sequences. The two modal sequences are only distance 23 apart (the maximum possible distance in this space is 20*(20-1)/2 = 190), and the model still groups the data very well!
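
If you want to check such distances yourself, here is a small stand-alone helper (not part of RMallow) that computes Kendall’s distance by counting inversions:

# Kendall's distance between two sequences: the number of item pairs that
# appear in opposite order (equivalently, the number of adjacent swaps a
# bubble sort would need to turn one sequence into the other).
KendallDistance <- function(s1, s2) {
  pos <- match(s2, s1)
  n <- length(pos)
  sum(outer(seq_len(n), seq_len(n), "<") & outer(pos, pos, ">"))
}
KendallDistance(1:20, 20:1)  # 190, the maximum possible for length 20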

The idea behind the construction of the data below is to create two modal sequences for the code to find. I make a good number of copies of each modal sequence and randomly permute between 1 and 10 of the elements of each copy (uniformly, which I stress is NOT how Mallows’ model generates data…). I then add a good deal of noise in the form of partial rankings (rankings in which certain groups of items are not fully ordered relative to one another). The final call is the code to fit the model.

# Some synthetic data to represent the capability of RMallow
library(RMallow)
modal <- 1:20
second.modal <- c(1, 2, 3, 4, 5, 17, 7, 9, 8, 11, 10, 
                 12, 13, 14, 15, 16, 6, 18, 19, 20)
partial.noise <- list()
modal.noise <- list()
second.noise <- list()
for (i in 1:300) {
  # Partial ranking noise
  partial.noise[[i]] <- sample(20, replace = TRUE)
  # Modal sequence 1
  modal.noise[[i]] <- modal
  # Modal sequence 2
  second.noise[[i]] <- second.modal
  # Choose to switch between 1 and 10 random indices
  swit <- sample(20, sample(10, 1))
  swit.2 <- sample(20, sample(10, 1))
  modal.noise[[i]][swit] <- modal.noise[[i]][rev(swit)]
  second.noise[[i]][swit.2] <- second.modal[rev(swit.2)]
}
partial.noise <- do.call("rbind", partial.noise)
partial.noise <- SimplifySequences(partial.noise)
modal.noise <- do.call("rbind", modal.noise)
second.noise <- do.call("rbind", second.noise)
datas <- do.call("rbind", list(modal.noise, partial.noise, second.noise))
datas <- datas[sample(nrow(datas)), ]
test <- Mallows(datas, iter = 100, G = 2)

Now for the moment of truth: did the code find the sequences? The output is a list, where the first element is the modal sequences found, the second is the proportion of data assigned to each cluster, the third is the cluster membership probabilities for each row, and the fourth is a data frame with the data, cluster membership probabilities, cluster assignments, and distance from each modal sequence.

> test[[1]]
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

[[2]]
 [1]  1  2  3  4  5 17  7  9  8 11 10 12 13 14 15 16  6 18 19 20

GREAT SUCCESS! I purposely did not set a seed here because, when I ran the code above, it actually did not produce what I was hoping for. I DID get that output when I ran a function I will put into the next release, which runs the algorithm several times and selects the best fit based on the log-likelihood of the data under the model. EM algorithms are prone to local maxima, so this is a necessary fix. You may very well get the output above when you run that code on your machine, depending on the state of your random number generator. If not, try running it 5 times or so and it will most likely work out!
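
In the meantime, a rough sketch of that idea looks like the following. BestOfSeveral and its score argument are placeholders of mine (not functions in the released package); score should map a fitted object to whatever goodness measure you trust, such as the final log-likelihood if your version of the fit exposes it.

# Sketch: refit the model several times from random starts and keep the
# fit that scores best (e.g. highest log-likelihood).  Placeholder code,
# not part of the current RMallow release.
BestOfSeveral <- function(datas, G = 2, iter = 100, runs = 5, score) {
  fits <- lapply(seq_len(runs), function(r) Mallows(datas, iter = iter, G = G))
  fits[[which.max(vapply(fits, score, numeric(1)))]]
}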

So there we have it. I have demonstrated that this package has some value in mining patterns from noisy sequence data. I will be working on implementing some solutions to increase the probability of convergence, and on extending the model to larger sequence spaces soon. I will also be providing a goodness-of-fit assessment of the model when the data satisfy certain conditions (sample size, …). Finally, I hope to soon be able to demonstrate the value of this model on real-world data in my next post.


A spell-checker in R

I came across Dr. Peter Norvig’s blog about writing a basic spell-checker (http://norvig.com/spell-correct.html), and just had to try to implement it in R. Please excuse the ugly-ish code (I have not optimized it or commented it adequately at this point, but you can get the idea of what it does by reading Dr. Norvig’s blog). If anyone knows of any pre-built spell-checker packages in R, please let me know in a comment!

I do not think R is a particularly good language for this sort of activity, but I got it to work out fine. The first few lines here build a frequency table of words in the English language from a large text corpus (note that they rely on the strip.text function defined in the next block, so load that first). They may take a few minutes to run on an average machine, but I will try to upload the table soon so that you can download it instead of creating it yourself…

words <- scan("http://norvig.com/big.txt", what = character())
words <- strip.text(words)
counts <- table(words)

Next, here are the functions we need to do the spell-check operations…

# This is a text processing function, which I
# borrowed from a CMU Data mining course professor.
strip.text <- function(txt) {
  # remove apostrophes (so "don't" -> "dont", "Jane's" -> "Janes", etc.)
  txt <- gsub("'","",txt)
  # convert to lowercase
  txt <- tolower(txt)
  # change other non-alphanumeric characters to spaces
  txt <- gsub("[^a-z0-9]"," ",txt)
  # remove digits (replace runs of digits with spaces)
  txt <- gsub("[0-9]+"," ",txt)
  # split and make one vector
  txt <- unlist(strsplit(txt," "))
  # remove empty words
  txt <- txt[txt != ""]
  return(txt)
}
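
# For example (illustration, not in the original post's code):
# strip.text("Jane's dog, 3 cats!") returns c("janes", "dog", "cats")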

# Words within 1 transposition.
Transpositions <- function(word = FALSE) {
  N <- nchar(word)
  if (N > 2) {
    out <- rep(word, N - 1)
    word <- unlist(strsplit(word, NULL))
    # Permutations of the letters
    perms <- matrix(c(1:(N - 1), 2:N), ncol = 2)
    reversed <- perms[, 2:1]
    trans.words <- matrix(rep(word, N - 1), byrow = TRUE, nrow = N - 1)
    for(i in 1:(N - 1)) {
      trans.words[i, perms[i, ]] <- trans.words[i, reversed[i, ]]
      out[i] <- paste(trans.words[i, ], collapse = "")
    }
  }
  else if (N == 2) {
    out <- paste(rev(unlist(strsplit(word, NULL))), collapse = "")
  }
  else {
    out <- paste(word, collapse = "")
  }
  return(out)
}

# Single letter deletions.
# Thanks to luiscarlosmr for partial correction in comments
Deletes <- function(word = FALSE) {
  N <- nchar(word)
  out <- character(N)
  word <- unlist(strsplit(word, NULL))
  for(i in 1:N) {
    out[i] <- paste(word[-i], collapse = "")
  }
  return(out)
}

# Single-letter insertions.
Insertions <- function(word = FALSE) {
  N <- nchar(word) 
  out <- list()
  for (letter in letters) {
    out[[letter]] <- rep(word, N + 1)
    for (i in 1:(N + 1)) {
      out[[letter]][i] <- paste(substr(word, i - N, i - 1), letter, 
                                substr(word, i, N), sep = "")
    }
  }
  out <- unlist(out)
  return(out)
}

# Single-letter replacements.
Replaces <- function(word = FALSE) {
  N <- nchar(word) 
  out <- list()
  for (letter in letters) {
    out[[letter]] <- rep(word, N)
    for (i in 1:N) {
      out[[letter]][i] <- paste(substr(word, i - N, i - 1), letter, 
                                substr(word, i + 1, N + 1), sep = "")
    }
  }
  out <- unlist(out)
  return(out)
}
# All Neighbors with distance "1"
Neighbors <- function(word) {
  neighbors <- c(word, Replaces(word), Deletes(word),
                 Insertions(word), Transpositions(word))
  return(neighbors)
}

# Probability as determined by our corpus.
Probability <- function(word, dtm) {
  # Number of distinct words in the corpus
  N <- length(dtm)
  word.number <- which(names(dtm) == word)
  count <- dtm[word.number]
  pval <- count/N
  return(pval)
}

# Correct a single word.
Correct <- function(word, dtm) {
  neighbors <- Neighbors(word)
  # If it is a word, just return it.
  if (word %in% names(dtm)) {
    out <- word
  }
  # Otherwise, check for neighbors.
  else {
    # Which of the neighbors are known words?
    known <- which(neighbors %in% names(dtm))
    N.known <- length(known)
    # If there are no known neighbors, including the word,
    # look farther away.
    if (N.known == 0) {
      print(paste("Having a hard time matching '", word, "'...", sep = ""))
      neighbors <- unlist(lapply(neighbors, Neighbors))
    }
    # Then filter out non-words.
    neighbors <- neighbors[which(neighbors %in% names(dtm))]
    N <- length(neighbors)
    # If we found some neighbors, keep the one with the highest
    # probability in the corpus.
    if (N >= 1) {
      P <- 0*(1:N)
      for (i in 1:N) {
        P[i] <- Probability(neighbors[i], dtm)
      }
      out <- neighbors[which.max(P)]
    }
    # If no neighbors still, return the word.
    else {
      out <- word
    }
  }
  return(out)
}

# Correct an entire document.
CorrectDocument <- function(document, dtm) {
  by.word <- unlist(strsplit(document, " "))
  N <- length(by.word)
  for (i in 1:N) {
    by.word[i] <- Correct(by.word[i], dtm = dtm)
  }
  corrected <- paste(by.word, collapse = " ")
  return(corrected)
}

The above functions generate “neighbors” of words, determine probabilities of the neighbors, and return the best ones. Function “CorrectDocument” will correct an entire document (with special characters and punctuation removed), and “Correct” will simply correct a word. Here are some sample runs.

> Correct("speling", dtm = counts)
        l4 
"spelling" 
> Correct("korrecter", dtm = counts)
[1] "Having a hard time matching 'korrecter'..."
      c1.d9 
"corrected" 
> CorrectDocument("the quick bruwn fowx jumpt ovre tha lasy dog", dtm = counts)
[1] "the quick brown fox jump over the last dog"

As you can see, this function is obviously not perfect. It handles some basic corrections automatically, but there are improvements to be made. More to come!

Decoding a Substitution Cipher using Simulated Annealing

My last post discussed a method to decode a substitution cipher using a Metropolis-Hastings algorithm. It was brought to my attention that this code could be improved by using Simulated Annealing methods to jump around the sample space and avoid some of the local maxima. Here is a basic description of the difference:

In a basic Metropolis-Hastings algorithm, it is relatively unlikely that we will jump to states of the system which worsen our objective function (note: one usually tries to minimize a cost function, but in the code below our goal is to maximize a plausibility instead). The algorithm greedily tries to attain a maximum value of the objective, and is thus very prone to hitting local maxima instead of the global one.

In contrast, when we use simulated annealing we are much more likely to jump to “worse” states during the early iterations. This is because in the early stages we want to explore the sample space and find where the global maximum is likely to occur. We also build a “cooling” characteristic into the algorithm: our willingness to jump to “worse” states slowly decreases as the algorithm iterates.

In a standard Metropolis-Hastings algorithm, we jump to state s^* with probability \frac{P(s^*)}{P(s)} when P(s^*) < P(s). In my annealing method to decode a cipher, I first fix a temperature T, and move to state s^* with probability \frac{P(s^*)}{P(s)} \frac{TN}{Ti - 2i + N} when P(s^*) < P(s), where N is the number of iterations and i is the current iteration number. The reasoning behind this acceptance probability is that at first we are basically multiplying the acceptance probability by T, and by the end we are basically not changing it at all from \frac{P(s^*)}{P(s)}: at i = 0 the extra factor is \frac{TN}{N} = T, and at i = N it is \frac{TN}{N(T - 1)} = \frac{T}{T - 1}, which is essentially 1 for a large temperature. The result is the following transition probability function, which can perform both the Annealing and Metropolis-Hastings probability calculations.

# Function to calculate the transition probability.
Prob = function(pl.fstar, pl.f, temp, i, iter, method = "Anneal") {
  if(method == "Anneal") {
    flip = (pl.fstar/pl.f)*temp/((temp - 1)*(i/iter) + (iter-i)/iter)
  }
  else{
    flip = pl.fstar/pl.f
  }
  return(flip)
}

An additional variable that I added was a method of changing neighbors. In very large sample spaces, it can take a lot of transpositions to move from one state to another. In fact, the furthest state from any given state is at distance \frac{n(n-1)}{2}, where n is the length of the permutation. In our case, that is 1326 moves. The chance that the previous algorithm would get stuck in a local maximum en route is very high. We combat this by allowing the neighbors of any particular permutation to be farther away. Consider the following neighbor-generating function.

# Function to generate a new candidate function.
# Inputs:
#  f: the function we want to find a neighbor of.
#  N: the domain size of f
#  top: the maximum number of elements to transpose.
Neighbor = function(f, N = length(f), top = 2) {
  # Decide how many elements to switch (between 2 and top)
  if(top == 2) {
    num = 2
  }
  else {
    num = sample(2:top, 1)
  }
  # Which elements to switch?
  swit = sample(1:N, num)
  
  # Decide how to transpose our elements
  if(num == 2) {
    ord = 2:1
  }
  else {
    ord = sample(1:num)
  }
  # Prevent the case where we do not
  # switch any elements.
  while(all(ord == 1:num)) {
    ord = sample(1:num)
  }
  # fstar will be a neighbor of f
  fstar = f
  fstar[swit] = f[swit[ord]]
  return(fstar)
}

We supply the function with the maximum number of elements, top, it is allowed to transpose, and it decides to transpose between 2 and top elements, at random (uniformly). It then decides on a random ordering of that transposition (there are a lot of ways to move around 10 different elements, for example). The function has a check to ensure that it does not simply output the same function as was input.
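
For example, the following call returns a permutation of 1:10 that differs from the identity in at most 4 positions (the exact output is random):

Neighbor(1:10, top = 4)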

Another idea I came up with to allow the algorithm to zero in on a good solution was to make the algorithm eventually fall back to single transpositions. This allows us to fine-tune a “good” solution in the later iterations of the algorithm. In the code, we use the “attack” parameter to determine how quickly we begin this fine-tuning. Small values of “attack” correspond to a lot of jumping around throughout the iterations, while high values correspond to fine-tuning earlier on in the iterations. Below is a quick look at how this schedule behaves, followed by our final message-decoding function.
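
The expression below is the same one used inside DecodeMessage to shrink the allowed transposition size over the run; evaluating it for two values of “attack” shows how quickly we fall back to simple 2-element transpositions (the 52-character dictionary size and the iteration count are just for illustration):

# How the maximum transposition size "top" shrinks over a run of 1000
# iterations with a 52-character dictionary, for two attack values.
iters <- 1000; N <- 52; i <- 1:iters
top.slow <- round(2 + (N - 2) * ((iters - i) / iters)^1)    # attack = 1
top.fast <- round(2 + (N - 2) * ((iters - i) / iters)^10)   # attack = 10
plot(i, top.slow, type = "l", xlab = "iteration", ylab = "top")
lines(i, top.fast, lty = 2)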

# Generate the plausibility of fstar
new.pl <- function(fstar, converted, len, trans) {
  test.mat = fstar[converted]
  compare = matrix(c(test.mat[1:(len -1)], test.mat[2:len]), ncol = 2)
  pl.fstar = prod(trans[compare])
  return(pl.fstar)
}
# Inputs:
#  trans: a matrix of transition probabilities
#  dictionary: list of possible characters
#  message: the secret message
#  iter: number of iterations
#  temp: the initial temperature of the system
#  attack: parameter to determine how quickly we
#          approach single-transposition versus
#          multiple-transpositions.  High values 
#          of "attack" will cause us to rapidly approach
#          single-transpositions.
DecodeMessage = function(trans, dictionary, message, iter = 1000,
                       temp = 10, method = "Anneal",
                       attack = 1) {
  # Size of the dictionary
  N = length(dictionary)
  # Number of characters in the message.
  len = length(message) 
  # Convert the message to the corresponding numeric values in
  # the dictionary.
  converted = 0*(1:len)
  for(i in 1:len) {
    converted[i] = which(dictionary == message[i])
  }
  # Take a random guess at the permutation:
  # f will be 'permanent', fstar will be a guess.
  f = sample(N, N)
  fstar = f
  best.guess = f
  # Keep track of the plausibility of the best guess,
  # and of the current state of the system.
  pl.f = 0*(1:iter)
  pl.f.max = 0
  # There is no reason not to generate all of the
  # random numbers at once, since they are independent.
  test.val = runif(iter, 0, 1)
  for(i in 1:iter) {
    pl.fstar = new.pl(fstar, converted, len, trans)
    # flip is the probability that we switch to 
    # fstar.  Note that this function will output
    # values greater than 1, in which case we will always
    # switch.
    flip = Prob(pl.fstar, pl.f[i], temp, i, iter, method) 
    if(flip >= test.val[i]) {
      f = fstar
      pl.f[i] = pl.fstar
    }
    # Keep track of the best over-all guess.
    if(pl.f[i] > pl.f.max) {
      best.guess = f
      pl.f.max = pl.f[i]
    }
    # Periodically output the current state
    # of the decoding candidates.
    if(i %% 10000 == 0) {
      cat(dictionary[f[converted]][1:44])
      cat("\n")
    }
    # Generate the "top" input for the Neighbor function.
    top = round(2 + (N - 2)*((iter - i)/iter)^attack)
    # Generate a new candidate function,
    # and initiate next pl.f value.
    fstar = Neighbor(f, N, top)
    pl.f[i + 1] = pl.f[i]
  }
  return(list(current = dictionary[f[converted]], 
              pl.f = pl.f, 
              best.guess = dictionary[best.guess[converted]], 
              best.pl = pl.f.max))
}

Here we read in the data, and format the transitions matrix (to prevent infinity and 0 values of the plausibility function).

# Read in the matrix of transitions
trans = as.matrix(read.csv("http://www2.research.att.com/~volinsky/DataMining/Columbia2011/HW/KennyMCMC/wp_transition_counts.csv",
                   header = FALSE))
# Vector of all possible characters
dictionary = scan("http://www2.research.att.com/~volinsky/DataMining/Columbia2011/HW/KennyMCMC/wp_char_vector_sort.txt",
             what = character(), sep = "\n")
# The secret message
message = scan("http://www2.research.att.com/~volinsky/DataMining/Columbia2011/HW/KennyMCMC/message1.txt", 
             what = character())
# Separate the message into individual characters
message = unlist(strsplit(do.call(paste, as.list(message)), NULL))
# We change the probabilities of transition so that there are 
# no zero-values using a sort of Laplacian smoothing, and 
# find something to multiply the values by so that we don't get 0
# probability (restrictions on R's computational precision)
fixed.trans1 = log(trans + 1)
fixed.trans1 = 50*(fixed.trans1 + 1)/(rowSums(fixed.trans1) + 51)

# Set a seed to make this reproducible...
set.seed(8675309)
# Set a different, random seed, so as to avoid some bias
# (your random number will still be the same as mine here)
set.seed(runif(1, 0, 999999))

Here are a couple of sample calls of the function. One uses the standard Metropolis-Hastings algorithm with the varying neighbor-generating function; the other uses the Annealing method and the varying neighbor-generating function. I semi-arbitrarily chose “attack” to be 10 here, and temp to be 10000.

DecodeMessage(fixed.trans1, dictionary, message, 
            iter = 100000, temp = 10000,
            method = "Anneal", attack = 10) -> annel
u o t e s c o s p t c i t - o h e s t i s t c l o t n e d   c t i n t n o - d ' a d y t
z a t e s c a s p t c o t - a k e s t o s t c h a t u e l   c t o u t u a - l y i l b t
w a t e n c a n p t c o t g a d e n t o n t c h a t f e r   c t o f t f a g r u i r y t
r e   i n t e n d   t o   k e v i n   o n   t h e   w i s p t   o w   w e k s u a s f  
w e   i n c e n d   c o   r e g i n   o n   c h e   m i s t c   o m   m e r s u a s y  
w e   i n c e n d   c o   b e g i n   o n   c h e   f i s t c   o f   f e b s u a s y  
w e   i n t e n d   t o   b e g i n   o n   t h e   f i l s t   o f   f e b l u a l y  
w e   i n t e n d   t o   b e g i n   o n   t h e   f i r s t   o f   f e b r u a r y  
w e   i n t e n d   t o   r e g i n   o n   t h e   v i s f t   o v   v e r s u a s y  
w e   i n t e n d   t o   p e g i n   o n   t h e   f i l s t   o f   f e p l u a l z  
DecodeMessage(fixed.trans1, dictionary, message, 
              iter = 100000, method = "Metrop",
              attack = 10) -> metrop
p i   a n s i n v   s o   c i g a n   o n   s h i   f a d m s   o f   f i c d u e d w  
k e   i n c e n f   c o   y e g i n   o n   c h e   w i s t c   o w   w e y s u a s "  
k e   i n c e n d   c o   w e g i n   o n   c h e   f i s t c   o f   f e w s u a s y  
w e   i n t e n d   t o   b e g i n   o n   t h e   f i r s t   o f   f e b r u a r y  
w e   i n t e n d   t o   b e g i n   o n   t h e   f i l s t   o f   f e b l u a l y  
w e   i n t e n d   t o   y e g i n   o n   t h e   f i r s t   o f   f e y r u a r v  
w e   i n t e n g   t o   p e d i n   o n   t h e   v i r f t   o v   v e p r u a r k  
w e   i n t e n v   t o   b e g i n   o n   t h e   f i r s t   o f   f e b r u a r y  
w e   i n t e n d   t o   z e k i n   o n   t h e   f i l s t   o f   f e z l u a l y  
w e   i n t e n d   t o   b e g i n   o n   t h e   f i r s t   o f   f e b r u a r y  

We get the following graph of the plausibility for each method. The black line is the Annealing method, the red line is the Metropolis method.
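
A plot along these lines can be produced from the pl.f traces that DecodeMessage returns:

# Plausibility trace for each run: black = Annealing, red = Metropolis.
plot(annel$pl.f, type = "l", col = "black",
     xlab = "iteration", ylab = "plausibility")
lines(metrop$pl.f, col = "red")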

Note that the Annealing method drops down more often, but the two eventually converge to similar values. Here are the final decodings of the message by the different methods:

> cat(annel$best.guess, fill = 60)
w e   i n t e n d   t o   b e g i n   o n   t h e   f i r s 
t   o f   f e b r u a r y   u n r e s t r i c t e d   s u b 
m a r i n e   w a r f a r e !   w e   s h a l l   e n d e a 
k o r   i n   s p i t e   o f   t h i s   t o   v e e p   t 
h e   u n i t e d   s t a t e s   o f   a m e r i c a   n e 
u t r a l !   i n   t h e   e k e n t   o f   t h i s   n o 
t   s u c c e e d i n g ,   w e   m a v e   m e x i c o   a 
  p r o p o s a l   o f   a l l i a n c e   o n   t h e   f 
o l l o w i n g   b a s i s ?   m a v e   w a r   t o g e t 
h e r ,   m a v e   p e a c e   t o g e t h e r ,   g e n e 
r o u s   f i n a n c i a l   s u p p o r t   a n d   a n   
u n d e r s t a n d i n g   o n   o u r   p a r t   t h a t 
  m e x i c o   i s   t o   r e c o n q u e r   t h e   l o 
s t   t e r r i t o r y   i n   t e x a s ,   n e w   m e x 
i c o ,   a n d   a r i z o n a !   t h e   s e t t l e m e 
n t   i n   d e t a i l   i s   l e f t   t o   y o u !   y 
o u   w i l l   i n f o r m   t h e   p r e s i d e n t   o 
f   t h e   a b o k e   m o s t   s e c r e t l y   a s   s 
o n   a s   t h e   o u t b r e a v   o f   w a r   w i t h 
  t h e   u n i t e d   s t a t e s   o f   a m e r i c a   
i s   c e r t a i n   a n d   a d d   t h e   s u g g e s t 
i o n   t h a t   h e   s h o u l d ,   o n   h i s   o w n 
  i n i t i a t i k e ,   i n k i t e   j a p a n   t o   i 
m m e d i a t e   a d h e r e n c e   a n d   a t   t h e   
s a m e   t i m e   m e d i a t e   b e t w e e n   j a p a 
n   a n d   o u r s e l k e s !   p l e a s e   c a l l   t 
h e   p r e s i d e n t . s   a t t e n t i o n   t o   t h 
e   f a c t   t h a t   t h e   r u t h l e s s   e m p l o 
y m e n t   o f   o u r   s u b m a r i n e s   n o w   o f 
f e r s   t h e   p r o s p e c t   o f   c o m p e l l i n 
g   e n g l a n d   i n   a   f e w   m o n t h s   t o   m 
a v e   p e a c e !
> cat(metrop$best.guess, fill = 60)
w e   i n t e n d   t o   b e g i n   o n   t h e   f i r s 
t   o f   f e b r u a r y   u n r e s t r i c t e d   s u b 
m a r i n e   w a r f a r e .   w e   s h a l l   e n d e a 
v o r   i n   s p i t e   o f   t h i s   t o   k e e p   t 
h e   u n i t e d   s t a t e s   o f   a m e r i c a   n e 
u t r a l .   i n   t h e   e v e n t   o f   t h i s   n o 
t   s u c c e e d i n g ,   w e   m a k e   m e x i c o   a 
  p r o p o s a l   o f   a l l i a n c e   o n   t h e   f 
o l l o w i n g   b a s i s :   m a k e   w a r   t o g e t 
h e r ,   m a k e   p e a c e   t o g e t h e r ,   g e n e 
r o u s   f i n a n c i a l   s u p p o r t   a n d   a n   
u n d e r s t a n d i n g   o n   o u r   p a r t   t h a t 
  m e x i c o   i s   t o   r e c o n j u e r   t h e   l o 
s t   t e r r i t o r y   i n   t e x a s ,   n e w   m e x 
i c o ,   a n d   a r i z o n a .   t h e   s e t t l e m e 
n t   i n   d e t a i l   i s   l e f t   t o   y o u .   y 
o u   w i l l   i n f o r m   t h e   p r e s i d e n t   o 
f   t h e   a b o v e   m o s t   s e c r e t l y   a s   s 
o n   a s   t h e   o u t b r e a k   o f   w a r   w i t h 
  t h e   u n i t e d   s t a t e s   o f   a m e r i c a   
i s   c e r t a i n   a n d   a d d   t h e   s u g g e s t 
i o n   t h a t   h e   s h o u l d ,   o n   h i s   o w n 
  i n i t i a t i v e ,   i n v i t e   " a p a n   t o   i 
m m e d i a t e   a d h e r e n c e   a n d   a t   t h e   
s a m e   t i m e   m e d i a t e   b e t w e e n   " a p a 
n   a n d   o u r s e l v e s .   p l e a s e   c a l l   t 
h e   p r e s i d e n t ' s   a t t e n t i o n   t o   t h 
e   f a c t   t h a t   t h e   r u t h l e s s   e m p l o 
y m e n t   o f   o u r   s u b m a r i n e s   n o w   o f 
f e r s   t h e   p r o s p e c t   o f   c o m p e l l i n 
g   e n g l a n d   i n   a   f e w   m o n t h s   t o   m 
a k e   p e a c e .

In this case, the Metropolis method seems to have found the better decoding. But that is not necessarily the point. The goal of the Annealing method is to avoid local maxima, and it was successful in that pursuit (as evidenced by the readability of the message).

I have had varying results with both methods, depending on the parameter inputs. It is not clear which of the two is better, and I do not have the computing power to benchmark them. In any event, there is probably some fine-tuning of the parameters that could be done to improve the efficacy of the decoding methods. Both do seem to come close to the correct message more often than the code in my last post.

This post is linked to on http://www.R-bloggers.com