My friend introduced me to a web site for a data mining course at Columbia, and I have been pretty hooked on some of the projects. One of the most recent ones I’ve worked on was decoding a simple cipher, using a Metropolis-Hastings algorithm. It is a pretty amazing little technique, which jumps around the sample space of 51! possible substitutions (something like 10^66 possible choices) and yet still quickly converges to a hopefully decoded version of the message. I finally got it to work, and the code runs in around 15 seconds. Check it out.
# Function to decode a message of symbols or characters, given
# A matrix of transitions, a dictionary, and a secret message
Decode = function(trans, dictionary, message, iter = 10,
thresh = 3) {
# Size of the dictionary
N = length(dictionary)
# Number of characters in the message.
len = length(message)
# Convert the matrix to the cooresponding numeric value in
# the dictionary.
converted = 0*(1:len)
for(i in 1:len) {
converted[i] = which(dictionary == message[i])
}
# Take a random guess at the permutation:
# f will be 'permanent', fstar will be a guess.
f = sample(N, N)
fstar = f
pl.f = 0
# Keep track of the plausability of the best guess,
# and of each individual guess.
k = 1
rem = 0
rem.plfstar = 0*(1:iter)
top.plaus = 0
stopped = 0
for(i in 1:iter) {
# The current test of the decoded message.
test.mat = fstar[converted]
# This is a len(message)x 2 matrix of indecies, to index the
# matrix containing the transision probabilities/frequencies.
compare = matrix(c(test.mat[1:(len -1)], test.mat[2:len]), ncol = 2)
# The plausability of fstar is the product of the probabilities
# in the transition matrix.
pl.fstar = prod(trans[compare])
# If the plausibility of fstar is better than f, change f
# to fstar.
if(pl.fstar > pl.f) {
f = fstar
pl.f = pl.fstar
# Record the new plausibility.
rem[k] = pl.f
k = k + 1
}
# If the plausibility did NOT increase,
# change to the new guess with p value
# pl.fstar/pl.f
else {
flip = pl.fstar/(pl.f + 1E-99)
if (runif(1, 0, 1) < flip) {
f = fstar
rem[k] = pl.fstar
k = k + 1
}
}
# Periodically output the best effort
# at decoding the message.
if(i %% 10000 == 0) {
pl.change = pl.f - top.plaus
if(pl.change == 0) {
stopped = stopped + 1
if (stopped == thresh) {
cat("Chain Converged \n")
break
}
}
else {
stopped = 0
cat(dictionary[f[converted]][1:44])
cat("\n")
cat(pl.f)
cat("\n")
}
top.plaus = pl.f
}
# After each iterations, randomly transpose
# two entries of our best guess and assign them
# to fstar.
swit = sample(1:N, 2)
fstar = f
fstar[swit] = f[rev(swit)]
}
return(list(dictionary[f[converted]],
rem))
}
It makes sense to eventually add some capability to test whether the algorithm is jumping around properly (is the probability of f changing?) and to start the chain at different points, since this algorithm is not guaranteed to converge. It may reach a local maximum, and never get to jump to the final solution. The following code scans in the data, and formats the transition probabilities using the log p-value and a little bit of Laplacian smoothing.
# Read in the matrix of transitions
trans = as.matrix(read.csv("http://www2.research.att.com/~volinsky/DataMining/Columbia2011/HW/KennyMCMC/wp_transition_counts.csv",
header = FALSE))
# Vector of all possible characters
dictionary = scan("http://www2.research.att.com/~volinsky/DataMining/Columbia2011/HW/KennyMCMC/wp_char_vector_sort.txt",
what = character(), sep = "\n")
# The secret message
message = scan("http://www2.research.att.com/~volinsky/DataMining/Columbia2011/HW/KennyMCMC/message1.txt",
what = character())
# Separate the message into individual characters
message = unlist(strsplit(do.call(paste, as.list(message)), NULL))
fixed.trans1 = log(trans + 1)
fixed.trans1 = 50*(fixed.trans1 + 1)/(rowSums(fixed.trans1) + 51)
And here is a sample call of the function, with output. It prints out the best decoding every 10,000 iterations, and will stop trying if the message doesn’t change at all for 30,000 iterations.
> cat(message)
, ' r k . t ' . s r t r n ' ( k . r . r t * ' r ; k ! d t r ; r ; ' n ! 6 c ! 2 r 6 . ! ' d t ! k o t ' s r d 6 n 1 c ! k . ' r , c ! ; c ! ' 9 r , ' r d * c 3 3 r ' . s ' c h ! r k . r d e k t ' r ; r t * k d r t r z ' ' e r t * ' r 6 . k t ' s r d t c t ' d r ; r c 1 ' ! k o c r . ' 6 t ! c 3 9 r k . r t * ' r ' h ' . t r ; r t * k d r . t r d 6 o o ' ' s k . ( 8 r , ' r 1 c z ' r 1 ' / k o r c r e ! e d c 3 r ; r c 3 3 k c . o ' r . r t * ' r ; 3 3 , k . ( r n c d k d l r 1 c z ' r , c ! r t ( ' t * ' ! 8 r 1 c z ' r e ' c o ' r t ( ' t * ' ! 8 r ( ' . ' ! 6 d r ; k . c . o k c 3 r d 6 e e ! t r c . s r c . r 6 . s ' ! d t c . s k . ( r . r 6 ! r e c ! t r t * c t r 1 ' / k o r k d r t r ! ' o . m 6 ' ! r t * ' r 3 d t r t ' ! ! k t ! 2 r k . r t ' / c d 8 r . ' , r 1 ' / k o 8 r c . s r c ! k w . c 9 r t * ' r d ' t t 3 ' 1 ' . t r k . r s ' t c k 3 r k d r 3 ' ; t r t r 2 6 9 r 2 6 r , k 3 3 r k . ; ! 1 r t * ' r e ! ' d k s ' . t r ; r t * ' r c n h ' r 1 d t r d ' o ! ' t 3 2 r c d r d . r c d r t * ' r 6 t n ! ' c z r ; r , c ! r , k t * r t * ' r 6 . k t ' s r d t c t ' d r ; r c 1 ' ! k o c r k d r o ' ! t c k . r c . s r c s s r t * ' r d 6 ( ( ' d t k . r t * c t r * ' r d * 6 3 s 8 r . r * k d r , . r k . k t k c t k h ' 8 r k . h k t ' r ) c e c . r t r k 1 1 ' s k c t ' r c s * ' ! ' . o ' r c . s r c t r t * ' r d c 1 ' r t k 1 ' r 1 ' s k c t ' r n ' t , ' ' . r ) c e c . r c . s r 6 ! d ' 3 h ' d 9 r e 3 ' c d ' r o c 3 3 r t * ' r e ! ' d k s ' . t b d r c t t ' . t k . r t r t * ' r ; c o t r t * c t r t * ' r ! 6 t * 3 ' d d r ' 1 e 3 2 1 ' . t r ; r 6 ! r d 6 n 1 c ! k . ' d r . , r ; ; ' ! d r t * ' r e ! d e ' o t r ; r o 1 e ' 3 3 k . ( r ' . ( 3 c . s r k . r c r ; ' , r 1 . t * d r t r 1 c z ' r e ' c o ' 9
> Decode(fixed.trans1, dictionary, message, iter = 100000) -> test2
w e i n t e n g t o m e k i n o n t h e f i r s t o f f e m r u a r y
3.082826e+263
w e i n t e n d t o b e g i n o n t h e f i r s t o f f e b r u a r y
2.657652e+267
w e i n t e n d t o b e g i n o n t h e f i r s t o f f e b r u a r y
7.711922e+267
w e i n t e n d t o b e g i n o n t h e f i r s t o f f e b r u a r y
8.556139e+268
Chain Converged
It doesn’t always work like you would hope, but here is the original message, followed by our decoded version. Note that in the final version, the algorithm decided that “d” and “g” should be switched.
cat(message, fill = 88)
, ' r k . t ' . s r t r n ' ( k . r . r t * ' r ; k ! d t r ; r ; ' n ! 6 c ! 2 r
6 . ! ' d t ! k o t ' s r d 6 n 1 c ! k . ' r , c ! ; c ! ' 9 r , ' r d * c 3 3 r ' . s
' c h ! r k . r d e k t ' r ; r t * k d r t r z ' ' e r t * ' r 6 . k t ' s r d t
c t ' d r ; r c 1 ' ! k o c r . ' 6 t ! c 3 9 r k . r t * ' r ' h ' . t r ; r t * k
d r . t r d 6 o o ' ' s k . ( 8 r , ' r 1 c z ' r 1 ' / k o r c r e ! e d c 3 r
; r c 3 3 k c . o ' r . r t * ' r ; 3 3 , k . ( r n c d k d l r 1 c z ' r , c !
r t ( ' t * ' ! 8 r 1 c z ' r e ' c o ' r t ( ' t * ' ! 8 r ( ' . ' ! 6 d r ; k .
c . o k c 3 r d 6 e e ! t r c . s r c . r 6 . s ' ! d t c . s k . ( r . r 6 ! r e
c ! t r t * c t r 1 ' / k o r k d r t r ! ' o . m 6 ' ! r t * ' r 3 d t r t ' !
! k t ! 2 r k . r t ' / c d 8 r . ' , r 1 ' / k o 8 r c . s r c ! k w . c 9 r t *
' r d ' t t 3 ' 1 ' . t r k . r s ' t c k 3 r k d r 3 ' ; t r t r 2 6 9 r 2 6 r ,
k 3 3 r k . ; ! 1 r t * ' r e ! ' d k s ' . t r ; r t * ' r c n h ' r 1 d t r d
' o ! ' t 3 2 r c d r d . r c d r t * ' r 6 t n ! ' c z r ; r , c ! r , k t * r t
* ' r 6 . k t ' s r d t c t ' d r ; r c 1 ' ! k o c r k d r o ' ! t c k . r c . s r c
s s r t * ' r d 6 ( ( ' d t k . r t * c t r * ' r d * 6 3 s 8 r . r * k d r , .
r k . k t k c t k h ' 8 r k . h k t ' r ) c e c . r t r k 1 1 ' s k c t ' r c s * ' !
' . o ' r c . s r c t r t * ' r d c 1 ' r t k 1 ' r 1 ' s k c t ' r n ' t , ' ' . r ) c
e c . r c . s r 6 ! d ' 3 h ' d 9 r e 3 ' c d ' r o c 3 3 r t * ' r e ! ' d k s ' . t
b d r c t t ' . t k . r t r t * ' r ; c o t r t * c t r t * ' r ! 6 t * 3 ' d d r '
1 e 3 2 1 ' . t r ; r 6 ! r d 6 n 1 c ! k . ' d r . , r ; ; ' ! d r t * ' r e
! d e ' o t r ; r o 1 e ' 3 3 k . ( r ' . ( 3 c . s r k . r c r ; ' , r 1 . t *
d r t r 1 c z ' r e ' c o ' 9
> cat(test2[[1]], fill = 88)
w e i n t e n g t o b e d i n o n t h e f i r s t o f f e b r u a r y
u n r e s t r i c t e g s u b m a r i n e w a r f a r e , w e s h a l l e n g
e a v o r i n s p i t e o f t h i s t o k e e p t h e u n i t e g s t
a t e s o f a m e r i c a n e u t r a l , i n t h e e v e n t o f t h i
s n o t s u c c e e g i n d . w e m a k e m e x i c o a p r o p o s a l
o f a l l i a n c e o n t h e f o l l o w i n d b a s i s q m a k e w a r
t o d e t h e r . m a k e p e a c e t o d e t h e r . d e n e r o u s f i n
a n c i a l s u p p o r t a n g a n u n g e r s t a n g i n d o n o u r p
a r t t h a t m e x i c o i s t o r e c o n j u e r t h e l o s t t e r
r i t o r y i n t e x a s . n e w m e x i c o . a n g a r i ! o n a , t h
e s e t t l e m e n t i n g e t a i l i s l e f t t o y o u , y o u w
i l l i n f o r m t h e p r e s i g e n t o f t h e a b o v e m o s t s
e c r e t l y a s s o n a s t h e o u t b r e a k o f w a r w i t h t
h e u n i t e g s t a t e s o f a m e r i c a i s c e r t a i n a n g a
g g t h e s u d d e s t i o n t h a t h e s h o u l g . o n h i s o w n
i n i t i a t i v e . i n v i t e ( a p a n t o i m m e g i a t e a g h e r
e n c e a n g a t t h e s a m e t i m e m e g i a t e b e t w e e n ( a
p a n a n g o u r s e l v e s , p l e a s e c a l l t h e p r e s i g e n t
" s a t t e n t i o n t o t h e f a c t t h a t t h e r u t h l e s s e
m p l o y m e n t o f o u r s u b m a r i n e s n o w o f f e r s t h e p
r o s p e c t o f c o m p e l l i n d e n d l a n g i n a f e w m o n t h
s t o m a k e p e a c e ,
A little Googling will tell us that this is “The Zimmerman Telegram.”
Although this is a simple type of cipher that could be done quite easily with simple frequency analysis of the characters, the method is an interesting one.
References:
Diaconis, Persi; “The Markov Chain Monte Carlo Revolution”