Frequency analysis (substitution)

A frequency analysis procedure, at least in Python, turns out to be long and involved. It is composed of several steps: determining the letter frequencies, making the first frequency comparisons, ascertaining two or three key letters, using various idiosyncrasies of English (in this case) to ascertain more letters, and then, if fully automating, using a dictionary to compare words with the partially deciphered text.

The process is made marginally simpler if there are spaces between words.

The list of words that I used here can be found at:

https://svnweb.freebsd.org/csrg/share/dict/words?view=log

here goes!

import modules

import collections
from random import shuffle
import string
from collections import Counter
from collections import OrderedDict
import pandas as pd
import numpy as np
import re

key for random substitution. I used the same one as in the book (Simon Singh)

random_substitution = {'a': 'x', 'b': 'z', 'c': 'a', 'd': 'v', 'e': 'o', 'f': 'i', 'g': 'd', 'h': 'b', 'i': 'y', 'j': 'g', 'k': 'e', 'l': 'r', 'm': 's', 'n': 'p', 'o': 'c', 'p': 'f', 'q': 'h', 'r': 'j', 's': 'k', 't': 'l', 'u': 'm', 'v': 'n', 'w': 'q', 'x': 't', 'y': 'u', 'z': 'w', ' ':' '}

plaintext, also from the book.

plaintext = "now during this time shahrazad had borne king shahriyar three sons on the thousand and first night when she had ended", \
    "the tale of maaruf she rose and kissed the ground before him saying great king for a thousand and one nights i have been recounting to", \
    "you the fables of past ages and the legends of ancient kings may i make so bold as to crave a favour of your majesty epilogue tales from", \
    "the thousand and one nights"

make the ciphertext:

# plaintext is a tuple of strings; flatten it to one string by stripping the tuple punctuation
plaintext = str(plaintext)
plaintext = plaintext.replace("(", "")
plaintext = plaintext.replace(")", "")
plaintext = plaintext.replace("'", "")
plaintext = plaintext.replace(",", "")
ciphertext = ''
for letter in plaintext:
    substitution = random_substitution[letter]
    ciphertext = ciphertext+substitution
ciphertext = ciphertext.upper()
print "ciphertext is: ",ciphertext

gives: ciphertext is: PCQ VMJYPD LBYK LYSO KBXBJXWXV BXV ZCJPO EYPD KBXBJYUXJ LBJOO KCPK CP LBO LBCMKXPV XPV IYJKL PYDBL QBOP KBO BXV OPVOV LBO LXRO CI SXXJMI KBO JCKO XPV EYKKOV LBO DJCMPV ZOICJO BYS KXUYPD DJOXL EYPD ICJ X LBCMKXPV XPV CPO PYDBLK Y BXNO ZOOP JOACMPLYPD LC UCM LBO IXZROK CI FXKL XDOK XPV LBO RODOPVK CI XPAYOPL EYPDK SXU Y SXEO KC ZCRV XK LC AJXNO X IXNCMJ CI UCMJ SXGOKLU OFYRCDMO LXROK IJCS LBO LBCMKXPV XPV CPO PYDBLK
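For comparison, here is a minimal Python 3 sketch of the same encipherment using `str.translate` rather than a character loop. The names `PLAIN`, `CIPHER` and `encipher` are my own for illustration; the cipher alphabet is simply the values of the key above written out in a-z order.

```python
import string

# Cipher alphabet: the values of random_substitution, in a-z order of their keys.
PLAIN = string.ascii_lowercase
CIPHER = "xzavoidbygerspcfhjklmnqtuw"

# Characters not in the table (such as spaces) pass through unchanged.
TABLE = str.maketrans(PLAIN, CIPHER)

def encipher(text):
    """Substitute each lowercase letter, then upper-case the result."""
    return text.lower().translate(TABLE).upper()

print(encipher("now during this time"))
```

Because unmapped characters pass through untouched, there is no need for a `' ': ' '` entry in the key with this approach.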

produce frequencies of letters in ciphertext
ciphertext = ciphertext.lower()
cipher_freq = OrderedDict(Counter(ciphertext))
d = {}
for key, value in cipher_freq.items():
    d[key] = value
letters_list = []
frequency_list = []
total_letters = len(plaintext)
for key, value in d.iteritems():
    letters_list.append(key)
    floated = float(value)
    percented = (floated/total_letters)*100
    rounded = round(percented,1)
    frequency_list.append(rounded)

prepare dataframe of ciphertext frequencies
c1 = pd.Series(letters_list,index=None)
c2 = pd.Series(frequency_list,index=None)
freq = pd.concat([c1, c2], axis=1)
freq.columns = ['cipherletter', 'frequency']
freq_sorted = freq.sort_values(ascending=False,by='frequency')

sorted_cipher = freq_sorted['cipherletter'].values.tolist()
if ' ' in sorted_cipher:
    sorted_cipher.remove(' ')
print sorted_cipher
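The counting step can also be done without pandas. The following is a Python 3 sketch, not the code used in this post: `letter_frequencies` is a hypothetical helper, and unlike the code above it computes percentages over letters only, so spaces do not dilute the figures.

```python
from collections import Counter

def letter_frequencies(ciphertext):
    """Return (letter, percentage) pairs, most frequent first.

    Percentages are taken over alphabetic characters only (an assumption
    of this sketch; the post's code divides by the full text length).
    """
    letters_only = [ch for ch in ciphertext.lower() if ch.isalpha()]
    total = len(letters_only)
    counts = Counter(letters_only)
    return [(ch, round(100.0 * n / total, 1)) for ch, n in counts.most_common()]

# The first few entries are the letters worth inspecting first.
sample = "LBO LBCMKXPV XPV CPO PYDBLK"
print(letter_frequencies(sample)[:3])
```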

letters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
frequencies = [8.2,1.5,2.8,4.3,12.7,2.2,2.0,6.1,7.0,0.2,0.8,4.0,2.4,6.7,7.5,1.9,0.1,6.0,6.3,9.1,2.8,1.0,2.4,0.2,2.0,0.1]

produce dataframe for plaintext frequencies (normal)
p1 = pd.Series(letters,index=None)
p2 = pd.Series(frequencies,index=None)
alpha_freq = pd.concat([p1, p2], axis=1)
alpha_freq.columns = ['plaintextletter', 'frequency']
alpha_freq_sorted = alpha_freq.sort_values(ascending=False,by='frequency')

reconcile the two alphabets (cipher/plain)
sorted_plain = alpha_freq_sorted['plaintextletter'].values.tolist()
letters_missing_exact = set(sorted_plain)- set(sorted_cipher)
letters_missing_number = len(sorted_plain)-len(sorted_cipher)
print "number of letters missing: ",letters_missing_number," these are: ",letters_missing_exact

gives:
number of letters missing: 2 these are: set([‘h’, ‘t’])

Take top three letters to inspect.
patterns = sorted_cipher[:3]
print patterns

gives:
[‘o’, ‘x’, ‘p’]

Now we need to work out which of the letters is which: consonants, vowels, etc. For each of the top three letters, we check how often it precedes or follows every other letter in the ciphertext (patternA and patternB). Every combination checked is recorded in ‘checked_combinations’; the matches actually found are stored in ‘combinations’. The total number of checks must be divisible by 3, as we are searching against the top three letters.

def multi_re_find(patterns,phrase):
    combinations = []
    checked_combinations = []
    for pattern in patterns:
        for letter in letters:
            checked = pattern+letter
            checked_combinations.append(checked)
            letter_copyA = letter
            letter_copyB = letter
            # patternA: top letter followed by this letter
            patternA = pattern+letter_copyA
            list_front = []
            list_front.append(re.findall(patternA,phrase))
            list_front = list_front[0]
            list_back = []
            # patternB: this letter followed by top letter
            patternB = letter_copyB+pattern
            list_back.append(re.findall(patternB,phrase))
            list_back = list_back[0]
            if (len(list_front)==0) and (len(list_back)==0):
                continue
            elif (len(list_front)>0) and (len(list_back)==0):
                combinations.append(list_front)
            elif (len(list_front)==0) and (len(list_back)>0):
                combinations.append(list_back)
            else:
                merged_list = list_front+list_back
                combinations.append(merged_list)
            checked_combinations.append(patternA)
            patternA = ''
            patternB = ''
            pattern = pattern[0]
            list_front = []
            list_back = []
            merged_list = []
    return combinations, checked_combinations

call function multi_re_find, and then make an empty dictionary of the potential results.

combinations, checked_combinations = multi_re_find(patterns,ciphertext)
checked_combinations = {letter:0 for letter in checked_combinations}

Populate the dictionary
i=0
while i < len(combinations):
    length_sublist = len(combinations[i])
    if length_sublist > 0:
        sublist_combination = combinations[i][0]
        if sublist_combination in checked_combinations:
            checked_combinations[sublist_combination] = length_sublist
        else:
            # match was found in reverse order; file it under the forward key
            new_combo = sublist_combination[::-1]
            checked_combinations[new_combo] = length_sublist
    i=i+1
print "number of checked combinations: ",checked_combinations

gives:
number of checked combinations: {‘xj’: 4, ‘xk’: 6, ‘xh’: 0, ‘xi’: 2, ‘xn’: 3, ‘xo’: 0, ‘xl’: 3, ‘xm’: 0, ‘xb’: 7, ‘xc’: 0, ‘xa’: 0, ‘xf’: 1, ‘xg’: 1, ‘xd’: 1, ‘xe’: 1, ‘xz’: 1, ‘xx’: 2, ‘xy’: 0, ‘xr’: 2, ‘xs’: 4, ‘xp’: 9, ‘xq’: 0, ‘xv’: 3, ‘xw’: 2, ‘xt’: 0, ‘xu’: 3, ‘pr’: 0, ‘ps’: 0, ‘pp’: 0, ‘pq’: 0, ‘pv’: 11, ‘pw’: 0, ‘pt’: 0, ‘pu’: 0, ‘pz’: 0, ‘px’: 0, ‘py’: 9, ‘pb’: 0, ‘pc’: 5, ‘pa’: 1, ‘pf’: 0, ‘pg’: 0, ‘pd’: 6, ‘pe’: 0, ‘pj’: 1, ‘pk’: 1, ‘ph’: 0, ‘pi’: 0, ‘pn’: 0, ‘po’: 8, ‘pl’: 2, ‘pm’: 2, ‘oo’: 4, ‘on’: 2, ‘om’: 1, ‘ol’: 0, ‘ok’: 6, ‘oj’: 4, ‘oi’: 1, ‘oh’: 0, ‘og’: 1, ‘of’: 1, ‘oe’: 1, ‘od’: 3, ‘oc’: 0, ‘ob’: 9, ‘oa’: 1, ‘oz’: 2, ‘oy’: 1, ‘ox’: 1, ‘ow’: 0, ‘ov’: 3, ‘ou’: 0, ‘ot’: 0, ‘os’: 1, ‘or’: 4, ‘oq’: 0, ‘op’: 8}
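The same neighbour tallies can be produced more directly by walking over adjacent character pairs. This is a Python 3 sketch with a hypothetical helper name (`neighbour_counts`), not the function used above. The idea behind the check is unchanged: a letter that never sits next to many other letters is more likely a consonant, while vowels mix with almost everything.

```python
from collections import Counter

def neighbour_counts(text, target):
    """Count how often each letter sits directly before or after `target`."""
    counts = Counter()
    for left, right in zip(text, text[1:]):
        if left == target and right.isalpha():
            counts[right] += 1
        if right == target and left.isalpha():
            counts[left] += 1
    return counts

def zero_neighbours(text, target, alphabet):
    """Number of letters in `alphabet` that never touch `target`."""
    counts = neighbour_counts(text, target)
    return sum(1 for letter in alphabet if counts[letter] == 0)

sample = "lbo lbcmkxpv xpv cpo pydblk"
print(neighbour_counts(sample, "p"))
```

Spaces are skipped by the `isalpha()` test, so word boundaries do not count as neighbours.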

Create the first analysis dataframe; parse results; and top three keys, take from unordered dictionary

k1 = patterns[0]
k2 = patterns[1]
k3 = patterns[2]
key1_dict = {}
key2_dict = {}
key3_dict = {}
key1_dict = {k:v for k,v in checked_combinations.iteritems() if k.startswith(patterns[0])}
key2_dict = {k:v for k,v in checked_combinations.iteritems() if k.startswith(patterns[1])}
key3_dict = {k:v for k,v in checked_combinations.iteritems() if k.startswith(patterns[2])}

Create dataframe. I’ve done this in a fairly verbose way. This is because using an unordered dictionary did not provide the pandas columns in the right order. This was important for me when checking my results against the solution in the text…

df2a = pd.DataFrame(key1_dict,index=[0])
transpose_2a = df2a.T
transpose_2a['letters'] = letters
transpose_2a.set_index('letters',inplace=True)
df2b = pd.DataFrame(key2_dict,index=[0])
transpose_2b = df2b.T
transpose_2b['letters'] = letters
transpose_2b.set_index('letters',inplace=True)
df2c = pd.DataFrame(key3_dict,index=[0])
transpose_2c = df2c.T
transpose_2c['letters'] = letters
transpose_2c.set_index('letters',inplace=True)
df_comb = pd.concat([transpose_2a,transpose_2b,transpose_2c],axis=1)
df_comb.columns = [k1,k2,k3]
df_comb = df_comb.T

df_comb['sum'] = df_comb.iloc[:,:].sum(axis=1)
df_comb['zeros_sum'] = (df_comb==0).sum(axis=1)
print df_comb

letters a b c d e f g h i j … s t u v w x y z \
o 1 9 0 3 1 1 1 0 1 4 … 1 0 0 3 0 1 1 2
x 0 7 0 1 1 1 1 0 2 4 … 4 0 3 3 2 2 0 1
p 1 0 5 6 0 0 0 0 0 1 … 0 0 0 11 0 0 9 0

letters sum zeros_sum
o 54 7
x 55 8
p 46 16

[3 rows x 28 columns]

identify top ranking consonant
key1 = {}
# the letter with the most zero neighbour counts is the likeliest consonant
mask_cons1 = df_comb["zeros_sum"]==df_comb["zeros_sum"].max()
letter_cons1 = df_comb[mask_cons1]
letter_cons1 = str(letter_cons1.index)[9:10]
print "this ciphertext letter is likely to be a consonant: ",letter_cons1, " we will hold for now"

confirm candidates for a and e
mask_eo = df_comb["sum"]==df_comb["sum"].max()
letter_eo = df_comb[mask_eo]
letter_eo = str(letter_eo.index)[9:10]
print "based on frequency alone this ciphertext letter is a candidate for an 'e': ",letter_eo
patterns_copy = patterns
if letter_cons1 in patterns_copy:
    patterns_copy.remove(letter_cons1)
#while letter_cons1 in patterns: patterns.remove(letter_cons1); if there had been more than one item to remove.
double_1 = patterns_copy[0]
double_2 = patterns_copy[1]
# how often each candidate appears doubled (e.g. 'ee' is common, 'aa' is rare)
double_1_count = df_comb.ix[double_1,double_1]
double_2_count = df_comb.ix[double_2,double_2]
if double_1_count > double_2_count:
    print "from a 'double-letter' pattern, 'e' could be: ",double_1
    if letter_eo != double_1:
        print "based on language usage, we will go with the 'double' selection for 'e' over letter selected on frequency only, and make that letter 'a'"
    key1[double_1] = 'e'
    key1[double_2] = 'a'
elif double_1_count == double_2_count:
    print "number of double-letter patterns are the same so no inference can be made, we will go with the frequency-based selection"
    key1[letter_eo] = 'e'
    if letter_eo in patterns_copy:
        patterns_copy.remove(letter_eo)
    final_letter = patterns_copy[0] #check if this is
    key1[final_letter] = 'a'
else:
    print "from a 'double-letter' pattern, 'e' could be: ",double_2
    if letter_eo != double_2:
        print "based on language usage, we will go with this selection for 'e' over letter selected on frequency only, and make that letter 'a'"
    key1[double_2] = 'e'
    key1[double_1] = 'a'
print "first partial cipher key to try (vowels first): ",key1

gives:

this ciphertext letter is likely to be a consonant: p we will hold for now
based on frequency alone this ciphertext letter is a candidate for an ‘e’: x
from a ‘double-letter’ pattern, ‘e’ could be: o
based on language usage, we will go with the ‘double’ selection for ‘e’ over letter selected on frequency only, and make that letter ‘a’
first partial cipher key to try (vowels first): {‘x’: ‘a’, ‘o’: ‘e’}
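The double-letter test on its own is small enough to show in isolation. Below is a Python 3 sketch using made-up helper names (`count_doubles`, `likely_e`); it relies on the same observation as above, namely that ‘ee’ is common in English while ‘aa’ is rare.

```python
def count_doubles(text, letter):
    """Count adjacent pairs where both characters are `letter`."""
    return sum(1 for a, b in zip(text, text[1:]) if a == b == letter)

def likely_e(text, candidates):
    """Pick the candidate that appears doubled most often.

    On a tie, the first candidate in the list wins, so the caller should
    list the frequency-based favourite first.
    """
    return max(candidates, key=lambda c: count_doubles(text, c))

# Three words from the ciphertext above: 'three', 'been', 'maaruf'.
sample = "LBJOO ZOOP SXXJMI"
print(count_doubles(sample, "O"), count_doubles(sample, "X"))
print(likely_e(sample, ["X", "O"]))
```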

first round of substitution
ciphertext = ciphertext.upper()
def call_cipher_analysis(key,ciphertext):
    # deciphered letters are written in lowercase; unknown letters stay uppercase
    plaintext_back = ''
    for letter in ciphertext:
        if letter.lower() in key:
            substitution_back = key[letter.lower()]
            plaintext_back = plaintext_back+substitution_back.lower()
        else:
            plaintext_back = plaintext_back+letter
    return plaintext_back
plaintext_back = call_cipher_analysis(key1,ciphertext)
print plaintext_back

gives:
PCQ VMJYPD LBYK LYSe KBaBJaWaV BaV ZCJPe EYPD KBaBJYUaJ LBJee KCPK CP LBe LBCMKaPV aPV IYJKL PYDBL QBeP KBe BaV ePVeV LBe LaRe CI SaaJMI KBe JCKe aPV EYKKeV LBe DJCMPV ZeICJe BYS KaUYPD DJeaL EYPD ICJ a LBCMKaPV aPV CPe PYDBLK Y BaNe ZeeP JeACMPLYPD LC UCM LBe IaZReK CI FaKL aDeK aPV LBe ReDePVK CI aPAYePL EYPDK SaU Y SaEe KC ZCRV aK LC AJaNe a IaNCMJ CI UCMJ SaGeKLU eFYRCDMe LaReK IJCS LBe LBCMKaPV aPV CPe PYDBLK

(ONLY RELEVANT IF THE WORDS ARE SEPARATED)
identify letters that appear on their own in the cipher text. If ‘a’ has already been assigned any other non-assigned letter must be ‘i’

single_letter_search = plaintext_back.split()
for word in single_letter_search:
    if len(word)==1:
        if word == word.lower():
            # already deciphered (this is the 'a')
            continue
        else:
            print "this letter is an 'i': ",word
            word = word.lower()
            key1[word] = 'i'
            break
print key1
call_cipher_analysis(key1,ciphertext)

gives:
this letter is an ‘i’: Y
{‘y’: ‘i’, ‘x’: ‘a’, ‘o’: ‘e’}

‘PCQ VMJiPD LBiK LiSe KBaBJaWaV BaV ZCJPe EiPD KBaBJiUaJ LBJee KCPK CP LBe LBCMKaPV aPV IiJKL PiDBL QBeP KBe BaV ePVeV LBe LaRe CI SaaJMI KBe JCKe aPV EiKKeV LBe DJCMPV ZeICJe BiS KaUiPD DJeaL EiPD ICJ a LBCMKaPV aPV CPe PiDBLK i BaNe ZeeP JeACMPLiPD LC UCM LBe IaZReK CI FaKL aDeK aPV LBe ReDePVK CI aPAiePL EiPDK SaU i SaEe KC ZCRV aK LC AJaNe a IaNCMJ CI UCMJ SaGeKLU eFiRCDMe LaReK IJCS LBe LBCMKaPV aPV CPe PiDBLK’

print key1
call_cipher_analysis(key1,ciphertext)

gives:
{‘y’: ‘i’, ‘x’: ‘a’, ‘o’: ‘e’}

Establish h from e/h combination, i.e. H goes before E but rarely after

for k,v in key1.iteritems():
    if v == 'e':
        e_key = k
combinations_forward = []
combinations_reverse = []
for letter in letters:
    letter = letter.upper()
    letter_copyA = e_key.upper()
    letter_copyB = e_key.upper()
    patternA = letter+letter_copyA
    combinations_forward.append(re.findall(patternA,ciphertext))
    patternB = letter_copyB+letter
    combinations_reverse.append(re.findall(patternB,ciphertext))
    patternA = ''
    patternB = ''
h_e = [len(x) for x in combinations_forward]
e_h = [len(x) for x in combinations_reverse]
for combo in combinations_forward:
    if len(combo)==max(h_e):
        pattern_h = combo[0]
for combo2 in combinations_reverse:
    if len(combo2) > 0:
        if combo2[0]==pattern_h[::-1]:
            # float() avoids Python 2 integer division truncating the ratio to 0
            ratio = float(len(combo2))/max(h_e)
            if ratio >= 0.25:
                print "cannot confirm 'h' in this case, ratio of he to eh is too high"
            elif ratio <0.25:
                pattern_h = pattern_h[0].lower()
                key1[pattern_h] = 'h'
        else:
            continue
    else:
        continue
pattern_h = pattern_h[0].lower()
key1[pattern_h] = 'h'
print key1
call_cipher_analysis(key1,ciphertext)

gives:

{‘y’: ‘i’, ‘x’: ‘a’, ‘b’: ‘h’, ‘o’: ‘e’}

‘PCQ VMJiPD LhiK LiSe KhahJaWaV haV ZCJPe EiPD KhahJiUaJ LhJee KCPK CP Lhe LhCMKaPV aPV IiJKL PiDhL QheP Khe haV ePVeV Lhe LaRe CI SaaJMI Khe JCKe aPV EiKKeV Lhe DJCMPV ZeICJe hiS KaUiPD DJeaL EiPD ICJ a LhCMKaPV aPV CPe PiDhLK i haNe ZeeP JeACMPLiPD LC UCM Lhe IaZReK CI FaKL aDeK aPV Lhe ReDePVK CI aPAiePL EiPDK SaU i SaEe KC ZCRV aK LC AJaNe a IaNCMJ CI UCMJ SaGeKLU eFiRCDMe LaReK IJCS Lhe LhCMKaPV aPV CPe PiDhLK’
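As a compact illustration of the he/eh test, here is a Python 3 sketch. It is a slight variant of the code above rather than a translation: `guess_h` is a hypothetical helper that picks the letter most often found before ‘e’ among those whose reverse pairing stays under the threshold, and the 0.25 cut-off is carried over from above as an assumption.

```python
import string

def guess_h(ciphertext, e_cipher, max_ratio=0.25):
    """Return the cipher letter most likely to stand for 'h', or None.

    A candidate must often precede the cipher 'e' and rarely follow it.
    """
    best = None
    best_forward = 0
    for letter in string.ascii_uppercase:
        if letter == e_cipher:
            continue
        forward = ciphertext.count(letter + e_cipher)   # e.g. 'he'
        reverse = ciphertext.count(e_cipher + letter)   # e.g. 'eh'
        # forward > best_forward >= 0 guarantees a non-zero divisor
        if forward > best_forward and reverse / forward < max_ratio:
            best, best_forward = letter, forward
    return best

print(guess_h("LBO LBO KBO QBOP", "O"))
```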

Most common three-letter words in English: THE and AND… determine N and T (and D)

the_pattern = '[A-Z]'
for k,v in key1.iteritems():
    if v == 'h':
        the_pattern = the_pattern+k
for k,v in key1.iteritems():
    if v == 'e':
        the_pattern = the_pattern+k
the_pattern = the_pattern.upper()
the_pattern = re.findall(the_pattern,ciphertext)
the_pattern_dict = {word:0 for word in the_pattern}
for word in the_pattern:
    if word in the_pattern:
        the_pattern_dict[word]+=1
frequencies_the = the_pattern_dict.values()
maxi = max(frequencies_the)
if maxi>2:
    for k,v in the_pattern_dict.iteritems():
        if v == maxi:
            print "likely match for 'the' is, ",k
            k = k[:-2].lower()
            key1[k]= 't'
and_pattern = ''
for k,v in key1.iteritems():
    if v == 'a':
        and_pattern = and_pattern+k
and_pattern = and_pattern+'[A-Z][A-Z]'
and_pattern = and_pattern.upper()
and_pattern = re.findall(and_pattern,ciphertext)
and_pattern_dict = {word:0 for word in and_pattern}
for word in and_pattern:
    if word in and_pattern:
        and_pattern_dict[word]+=1
frequencies_and = and_pattern_dict.values()
maxi = max(frequencies_and)
if maxi>2:
    for k,v in and_pattern_dict.iteritems():
        if v == maxi:
            print "likely match for 'and' is, ",k
            k1 = k
            k = k[1:2].lower()
            key1[k]= 'n'
            k1 = k1[2:3].lower()
            key1[k1]= 'd'
print key1
call_cipher_analysis(key1,ciphertext)

gives:

likely match for ‘the’ is, LBO
likely match for ‘and’ is, XPV
{‘b’: ‘h’, ‘l’: ‘t’, ‘o’: ‘e’, ‘p’: ‘n’, ‘v’: ‘d’, ‘y’: ‘i’, ‘x’: ‘a’}

‘nCQ dMJinD thiK tiSe KhahJaWad had ZCJne EinD KhahJiUaJ thJee KCnK Cn the thCMKand and IiJKt niDht Qhen Khe had ended the taRe CI SaaJMI Khe JCKe and EiKKed the DJCMnd ZeICJe hiS KaUinD DJeat EinD ICJ a thCMKand and Cne niDhtK i haNe Zeen JeACMntinD tC UCM the IaZReK CI FaKt aDeK and the ReDendK CI anAient EinDK SaU i SaEe KC ZCRd aK tC AJaNe a IaNCMJ CI UCMJ SaGeKtU eFiRCDMe taReK IJCS the thCMKand and Cne niDhtK’
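Since this ciphertext keeps its word spaces, the search for THE and AND can be narrowed to whole words. The Python 3 sketch below does that with `\b` word boundaries, which is an assumption of the sketch and differs from the patterns above, which match anywhere in the text. `most_common_match` is a hypothetical helper.

```python
import re
from collections import Counter

def most_common_match(pattern, text):
    """Return (match, count) for the most frequent regex match, or None."""
    hits = Counter(re.findall(pattern, text))
    return hits.most_common(1)[0] if hits else None

sample = "LBO LBCMKXPV XPV CPO PYDBLK KBO LBO XPV"

# 'the': any letter, then the cipher letters for 'h' (B) and 'e' (O).
print(most_common_match(r"\b[A-Z]BO\b", sample))

# 'and': the cipher letter for 'a' (X), then any two letters.
print(most_common_match(r"\bX[A-Z]{2}\b", sample))
```

The first letter of the winning THE match gives ‘t’; the second and third letters of the winning AND match give ‘n’ and ‘d’.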

‘w’ and ‘g’ combinations from ‘w’h, and ‘g’ht, respectively. start with ‘g’ first
for k,v in key1.iteritems():
    if v == 'h':
        h_key = k
for k,v in key1.iteritems():
    if v == 't':
        t_key = k
ght_pattern = '[A-Z]'+h_key+t_key
ght_pattern = ght_pattern.upper()
ght_pattern = re.findall(ght_pattern,ciphertext)
ght_pattern_dict = {word:0 for word in ght_pattern}
for word in ght_pattern:
    if word in ght_pattern:
        ght_pattern_dict[word]+=1
frequencies_ght = ght_pattern_dict.values()
maxi = max(frequencies_ght)
if maxi>1:
    for k,v in ght_pattern_dict.iteritems():
        if v == maxi:
            print "likely match for 'g' is, ",k[0]
            k1 = k
            k = k[0:1].lower()
            key1[k]= 'g'
print key1
plaintext_back = call_cipher_analysis(key1,ciphertext)
plaintext_back

gives:

likely match for ‘g’ is, D
{‘b’: ‘h’, ‘d’: ‘g’, ‘l’: ‘t’, ‘o’: ‘e’, ‘p’: ‘n’, ‘v’: ‘d’, ‘y’: ‘i’, ‘x’: ‘a’}

‘nCQ dMJing thiK tiSe KhahJaWad had ZCJne Eing KhahJiUaJ thJee KCnK Cn the thCMKand and IiJKt night Qhen Khe had ended the taRe CI SaaJMI Khe JCKe and EiKKed the gJCMnd ZeICJe hiS KaUing gJeat Eing ICJ a thCMKand and Cne nightK i haNe Zeen JeACMnting tC UCM the IaZReK CI FaKt ageK and the RegendK CI anAient EingK SaU i SaEe KC ZCRd aK tC AJaNe a IaNCMJ CI UCMJ SaGeKtU eFiRCgMe taReK IJCS the thCMKand and Cne nightK’

Phase 2: at this point, automated frequency analysis becomes less feasible. Use an English-language dictionary to ascertain the missing letters…

Prepare dictionary to match longest string in ciphertext:

def init_word_length_calc(plaintext_back):
    cipher_text_list = plaintext_back.split()
    word_lengths = []
    for word in cipher_text_list:
        word_lengths.append(len(word))
    max_length = max(word_lengths)
    i = 0
    # keeps the index of the last word with the maximum length
    for num in word_lengths:
        if num == max_length:
            index = i
        i += 1
    long_cipher_pattern = cipher_text_list[index]
    return long_cipher_pattern, max_length, word_lengths

def repeat_word_length_calc(plaintext_back,word_lengths,ix):
    cipher_text_list = plaintext_back.split()
    i=0
    # keeps the index of the last word whose length is ix
    for num in word_lengths:
        if num == ix:
            index = i
        i += 1
    long_cipher_pattern = cipher_text_list[index]
    return long_cipher_pattern

create search pattern
def search_pattern_create(long_cipher_pattern):
    letter_split = []
    for letter in long_cipher_pattern:
        letter_split.append(letter)
    letter_split
    letter_convert = []
    # lowercase letters are already deciphered; uppercase ones become wildcards
    for letter in letter_split:
        if letter.islower()==True:
            letter_convert.append(letter)
        else:
            letter_convert.append('[a-z]')
    pattern = ''.join(letter_convert)
    return pattern

def create_lookup_list(max_length):
    #open dictionary and create the set of words to match pattern against.
    with open("C:\\Users\\HP\\Desktop\\words.txt") as f:
        lines = f.read().splitlines()
    word_length_match = []
    for word in lines:
        if len(word)==max_length:
            word_length_match.append(word)
    dictionary_options_string = ' '.join(word_length_match)
    return dictionary_options_string

match the cipher_letters to the plaintext letters from dictionary lookup
def return_matched_keys(outcome,long_cipher_pattern):
    outcome = str(outcome[0])
    outcome_list = []
    long_cipher_pattern_list = []
    for letter in outcome:
        outcome_list.append(letter)
    long_cipher_pattern = long_cipher_pattern.lower()
    for letter in long_cipher_pattern:
        long_cipher_pattern_list.append(letter)
    match_up_tuples = zip(outcome_list,long_cipher_pattern_list)
    dictionary_match_up = {}
    # positions where the letters differ are new cipher -> plain mappings
    for a,b in match_up_tuples:
        if a != b:
            dictionary_match_up[b] = a
    return dictionary_match_up
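The lookup helpers above boil down to two ideas: turn a partially deciphered word into a regex, and read new key entries off a matching dictionary word. A Python 3 sketch of both follows. The names `candidates` and `new_keys` are hypothetical, and the word list is passed in as a plain list rather than read from a file. Note that this sketch, like the code above, does not check that a repeated cipher letter maps to the same plain letter every time.

```python
import re

def candidates(partial, words):
    """Dictionary words that fit a partially deciphered word.

    Lowercase characters in `partial` are known plaintext; uppercase
    characters are still-unknown cipher letters and act as wildcards.
    """
    pattern = "".join(ch if ch.islower() else "[a-z]" for ch in partial)
    regex = re.compile(pattern)
    return [w for w in words if len(w) == len(partial) and regex.fullmatch(w)]

def new_keys(partial, word):
    """Map each unknown cipher letter in `partial` to its letter in `word`."""
    return {c.lower(): p for c, p in zip(partial, word) if c.isupper()}

words = ["thousand", "majesty", "first", "midst", "moist"]
print(candidates("thCMKand", words))
print(new_keys("thCMKand", "thousand"))
print(candidates("IiJst", words))
```

Matching word by word with `fullmatch` avoids a match spilling across a word boundary, which can be a concern when searching one long space-joined string.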

FROM THIS POINT ONWARDS, IT IS A CASE OF RUNNING THE PROCEDURE STEP-BY-STEP.
#first pass

long_cipher_pattern, max_length, word_lengths = init_word_length_calc(plaintext_back)
pattern = search_pattern_create(long_cipher_pattern)
dictionary_options_string = create_lookup_list(max_length)
print long_cipher_pattern
print pattern
print max_length
#run the comparison, and review result
outcome = re.findall(pattern, dictionary_options_string)
print outcome

gives:
JeACMnting
[a-z]e[a-z][a-z][a-z]nting
10
[]

if outcome is null, run again for next word-length down
ix = max_length-1
long_cipher_pattern = repeat_word_length_calc(plaintext_back,word_lengths,ix)
print long_cipher_pattern
pattern = search_pattern_create(long_cipher_pattern)
print pattern
dictionary_options_string = create_lookup_list(ix)
#run the comparison, and review result
outcome = re.findall(pattern, dictionary_options_string)
print outcome

gives:
KhahJiUaJ
[a-z]hah[a-z]i[a-z]a[a-z]
[]

if outcome is null, run again for next word-length down
ix = max_length-2
long_cipher_pattern = repeat_word_length_calc(plaintext_back,word_lengths,ix)
print long_cipher_pattern
pattern = search_pattern_create(long_cipher_pattern)
print pattern
dictionary_options_string = create_lookup_list(ix)
#run the comparison, and review result
outcome = re.findall(pattern, dictionary_options_string)
print outcome

thCMKand
th[a-z][a-z][a-z]and
[‘thousand’]

we have a match

Get the new keys from the match
dictionary_match_up = return_matched_keys(outcome,long_cipher_pattern)
print "keys to add: ",dictionary_match_up
#Add them to the main keys1 dictionary, decipher, and get new plaintext
for k,v in dictionary_match_up.iteritems():
    key1[k] = v
print key1
plaintext_back = call_cipher_analysis(key1,ciphertext)
plaintext_back

gives:

keys to add: {‘c’: ‘o’, ‘m’: ‘u’, ‘k’: ‘s’}
{‘c’: ‘o’, ‘b’: ‘h’, ‘d’: ‘g’, ‘k’: ‘s’, ‘m’: ‘u’, ‘l’: ‘t’, ‘o’: ‘e’, ‘p’: ‘n’, ‘v’: ‘d’, ‘y’: ‘i’, ‘x’: ‘a’}

‘noQ duJing this tiSe shahJaWad had ZoJne Eing shahJiUaJ thJee sons on the thousand and IiJst night Qhen she had ended the taRe oI SaaJuI she Jose and Eissed the gJound ZeIoJe hiS saUing gJeat Eing IoJ a thousand and one nights i haNe Zeen JeAounting to Uou the IaZRes oI Fast ages and the Regends oI anAient Eings SaU i SaEe so ZoRd as to AJaNe a IaNouJ oI UouJ SaGestU eFiRogue taRes IJoS the thousand and one nights’

Run again for next word-length lookup.
ix = max_length-3
long_cipher_pattern = repeat_word_length_calc(plaintext_back,word_lengths,ix)
print long_cipher_pattern
pattern = search_pattern_create(long_cipher_pattern)
print pattern
dictionary_options_string = create_lookup_list(ix)
#run the comparison, and review result
outcome = re.findall(pattern, dictionary_options_string)
print outcome

gives:

SaGestU
[a-z]a[a-z]est[a-z]
[‘majesty’]

gives:

keys to add: {‘s’: ‘m’, ‘u’: ‘y’, ‘g’: ‘j’}
{‘c’: ‘o’, ‘b’: ‘h’, ‘d’: ‘g’, ‘g’: ‘j’, ‘k’: ‘s’, ‘m’: ‘u’, ‘l’: ‘t’, ‘o’: ‘e’, ‘p’: ‘n’, ‘s’: ‘m’, ‘u’: ‘y’, ‘v’: ‘d’, ‘y’: ‘i’, ‘x’: ‘a’}

‘noQ duJing this time shahJaWad had ZoJne Eing shahJiyaJ thJee sons on the thousand and IiJst night Qhen she had ended the taRe oI maaJuI she Jose and Eissed the gJound ZeIoJe him saying gJeat Eing IoJ a thousand and one nights i haNe Zeen JeAounting to you the IaZRes oI Fast ages and the Regends oI anAient Eings may i maEe so ZoRd as to AJaNe a IaNouJ oI youJ majesty eFiRogue taRes IJom the thousand and one nights’

Now select individual cipher-patterns to try:
cipher_pattern = 'IiJst'
pattern = search_pattern_create(cipher_pattern)
print pattern
ix = len(cipher_pattern)
print ix
dictionary_options_string = create_lookup_list(ix)
#run the comparison, and review result
outcome = re.findall(pattern, dictionary_options_string)
print outcome

gives:

[a-z]i[a-z]st
5
[‘first’, ‘midst’]

Select ‘first’ as better option and get new keys from the match
dictionary_match_up = return_matched_keys(outcome[0],cipher_pattern)
print "keys to add: ",dictionary_match_up
#Add them to the main keys1 dictionary, decipher, and get new plaintext
for k,v in dictionary_match_up.iteritems():
    key1[k] = v
print key1
plaintext_back = call_cipher_analysis(key1,ciphertext)
plaintext_back

keys to add: {‘i’: ‘f’}
{‘c’: ‘o’, ‘b’: ‘h’, ‘d’: ‘g’, ‘g’: ‘j’, ‘i’: ‘f’, ‘k’: ‘s’, ‘m’: ‘u’, ‘l’: ‘t’, ‘o’: ‘e’, ‘p’: ‘n’, ‘s’: ‘m’, ‘u’: ‘y’, ‘v’: ‘d’, ‘y’: ‘i’, ‘x’: ‘a’}

‘noQ duJing this time shahJaWad had ZoJne Eing shahJiyaJ thJee sons on the thousand and fiJst night Qhen she had ended the taRe of maaJuf she Jose and Eissed the gJound ZefoJe him saying gJeat Eing foJ a thousand and one nights i haNe Zeen JeAounting to you the faZRes of Fast ages and the Regends of anAient Eings may i maEe so ZoRd as to AJaNe a faNouJ of youJ majesty eFiRogue taRes fJom the thousand and one nights’

Now select further cipher-patterns to try:

cipher_pattern = 'Zefore'
pattern = search_pattern_create(cipher_pattern)
print pattern
ix = len(cipher_pattern)
print ix
dictionary_options_string = create_lookup_list(ix)
#run the comparison, and review result
outcome = re.findall(pattern, dictionary_options_string)
print outcome

gives:

[a-z]efore
6
[‘before’]

Get new keys from the match
dictionary_match_up = return_matched_keys(outcome,cipher_pattern)
print "keys to add: ",dictionary_match_up
#Add them to the main keys1 dictionary, decipher, and get new plaintext
for k,v in dictionary_match_up.iteritems():
    key1[k] = v
print key1
plaintext_back = call_cipher_analysis(key1,ciphertext)
plaintext_back

keys to add: {‘z’: ‘b’}
{‘c’: ‘o’, ‘b’: ‘h’, ‘d’: ‘g’, ‘g’: ‘j’, ‘i’: ‘f’, ‘k’: ‘s’, ‘m’: ‘u’, ‘l’: ‘t’, ‘o’: ‘e’, ‘p’: ‘n’, ‘s’: ‘m’, ‘u’: ‘y’, ‘v’: ‘d’, ‘y’: ‘i’, ‘x’: ‘a’, ‘z’: ‘b’}

‘noQ duJing this time shahJaWad had boJne Eing shahJiyaJ thJee sons on the thousand and fiJst night Qhen she had ended the taRe of maaJuf she Jose and Eissed the gJound befoJe him saying gJeat Eing foJ a thousand and one nights i haNe been JeAounting to you the fabRes of Fast ages and the Regends of anAient Eings may i maEe so boRd as to AJaNe a faNouJ of youJ majesty eFiRogue taRes fJom the thousand and one nights’

Approaching the end, select another cipher-pattern to try…
cipher_pattern = 'eFiRogue'
pattern = search_pattern_create(cipher_pattern)
print pattern
ix = len(cipher_pattern)
print ix
dictionary_options_string = create_lookup_list(ix)
#run the comparison, and review result
outcome = re.findall(pattern, dictionary_options_string)
print outcome

gives:

e[a-z]i[a-z]ogue
8
[‘epilogue’]

Get new keys from the match
dictionary_match_up = return_matched_keys(outcome,cipher_pattern)
print "keys to add: ",dictionary_match_up
#Add them to the main keys1 dictionary, decipher, and get new plaintext
for k,v in dictionary_match_up.iteritems():
    key1[k] = v
print key1
plaintext_back = call_cipher_analysis(key1,ciphertext)
plaintext_back

gives:

keys to add: {‘r’: ‘l’, ‘f’: ‘p’}
{‘c’: ‘o’, ‘b’: ‘h’, ‘d’: ‘g’, ‘g’: ‘j’, ‘f’: ‘p’, ‘i’: ‘f’, ‘k’: ‘s’, ‘m’: ‘u’, ‘l’: ‘t’, ‘o’: ‘e’, ‘p’: ‘n’, ‘s’: ‘m’, ‘r’: ‘l’, ‘u’: ‘y’, ‘v’: ‘d’, ‘y’: ‘i’, ‘x’: ‘a’, ‘z’: ‘b’}

‘noQ duJing this time shahJaWad had boJne Eing shahJiyaJ thJee sons on the thousand and fiJst night Qhen she had ended the tale of maaJuf she Jose and Eissed the gJound befoJe him saying gJeat Eing foJ a thousand and one nights i haNe been JeAounting to you the fables of past ages and the legends of anAient Eings may i maEe so bold as to AJaNe a faNouJ of youJ majesty epilogue tales fJom the thousand and one nights’

cipher_pattern = 'anAient'
pattern = search_pattern_create(cipher_pattern)
print pattern
ix = len(cipher_pattern)
print ix
dictionary_options_string = create_lookup_list(ix)
#run the comparison, and review result
outcome = re.findall(pattern, dictionary_options_string)
print outcome

an[a-z]ient
7
[‘ancient’]

dictionary_match_up = return_matched_keys(outcome,cipher_pattern)
print "keys to add: ",dictionary_match_up
#Add them to the main keys1 dictionary, decipher, and get new plaintext
for k,v in dictionary_match_up.iteritems():
    key1[k] = v
print key1
plaintext_back = call_cipher_analysis(key1,ciphertext)
plaintext_back

gives:

keys to add: {‘a’: ‘c’}
{‘a’: ‘c’, ‘c’: ‘o’, ‘b’: ‘h’, ‘d’: ‘g’, ‘g’: ‘j’, ‘f’: ‘p’, ‘i’: ‘f’, ‘k’: ‘s’, ‘m’: ‘u’, ‘l’: ‘t’, ‘o’: ‘e’, ‘p’: ‘n’, ‘s’: ‘m’, ‘r’: ‘l’, ‘u’: ‘y’, ‘v’: ‘d’, ‘y’: ‘i’, ‘x’: ‘a’, ‘z’: ‘b’}

‘noQ duJing this time shahJaWad had boJne Eing shahJiyaJ thJee sons on the thousand and fiJst night Qhen she had ended the tale of maaJuf she Jose and Eissed the gJound befoJe him saying gJeat Eing foJ a thousand and one nights i haNe been Jecounting to you the fables of past ages and the legends of ancient Eings may i maEe so bold as to cJaNe a faNouJ of youJ majesty epilogue tales fJom the thousand and one nights’

This process can be repeated a further four or five times, to yield:
‘now duJing this time shahJaWad had boJne king shahJiyaJ thJee sons on the thousand and fiJst night when she had ended the tale of maaJuf she Jose and kissed the gJound befoJe him saying gJeat king foJ a thousand and one nights i haNe been Jecounting to you the fables of past ages and the legends of ancient kings may i make so bold as to cJaNe a faNouJ of youJ majesty epilogue tales fJom the thousand and one nights’

A third and final phase, which I did not implement, could be to infer the missing letters from a dictionary.
I decided not to do that because the keyphrase is based on the title of a book, and I am unsure how a look-up of this kind could be (simply) incorporated into a stored procedure.
