--Stay hungry, stay foolish.
--Forever young, forever weep.
Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2.
You have the following 3 operations permitted on a word:
Example 1:
Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')
Example 2:
Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')
class Solution: def minDistance(self, word1, word2): """ :type word1: str :type word2: str :rtype: int """ from itertools import chain # heuristic function from difflib import SequenceMatcher # replace charactor string generator def replaceGenerator(word1, word2): for i in range(min(len(word1), len(word2))): if word1[i] == word2[i]: continue yield word1[:i]+word2[i]+word1[i+1:] # delete charactor string generator def deleteGenerator(word1): for i in range(len(word1)): yield word1[:i]+word1[i+1:] # insert charactor string generator def insertGenerator(word1, word2): for i in range(len(word1)+1): if i < len(word2): yield word1[:i]+word2[i]+word1[i:] min_d = 0 while word1!=word2: cur_ratio = SequenceMatcher(None, word1, word2).ratio() max_improve = 0 best_next_w1 = word1 for tp_word1 in chain(replaceGenerator(word1, word2), deleteGenerator(word1), insertGenerator(word1, word2)): tp_ratio = SequenceMatcher(None, tp_word1, word2).ratio() if tp_ratio-cur_ratio > max_improve: max_improve = tp_ratio-cur_ratio best_next_w1 = tp_word1 word1 = best_next_w1 # print(word1) min_d += 1 return min_d
优点:速度快
缺点:可能找到的不是最优解
例如:
>>> s = Solution() >>> s.minDistance("dinitrophenylhydrazine", "acetylphenylhydrazine") 7
实际应该为6,可是得到的结果为7,而且因为每次都选择最优的,所以每次的结果都一样。一种解决思路是使用首优搜索,并使用多次重启爬山法。但是这个题目有点特殊,因为每次换一个字符,一定比之前的字符串匹配度高,所以首优并没有意义。
再来看看其时间效率:
>>> %time s.minDistance( "pneumonoultramicroscopicsilicovolcanoconiosis", "stereomicroscopically") CPU times: user 130 ms, sys: 2.71 ms, total: 133 ms Wall time: 132 ms 30
其时间效率还好。
class Solution: def minDistance(self, word1, word2): """ :type word1: str :type word2: str :rtype: int """ from itertools import chain from collections import namedtuple # heuristic function from difflib import SequenceMatcher # replace charactor string generator def replaceGenerator(word1, word2): for i in range(min(len(word1), len(word2))): if word1[i] == word2[i]: continue yield word1[:i]+word2[i]+word1[i+1:] # delete charactor string generator def deleteGenerator(word1, word2): for i in range(len(word1)): if i<len(word2) and word1[i]==word2[i]: continue yield word1[:i]+word1[i+1:] # insert charactor string generator def insertGenerator(word1, word2): for i in range(len(word1)+1): if i < len(word2): yield word1[:i]+word2[i]+word1[i:] min_d = 0 Beam = namedtuple('Beam', 'ratio word1') beams = [Beam(0, word1)] max_beams_cnt = 3 visited = {beams[0].word1} exploded = set() while beams[0].word1!=word2: tp_beams = beams[:] for beam in tp_beams: word1 = beam.word1 if word1 in exploded: continue else: exploded.add(word1) cur_ratio = SequenceMatcher(None, word1, word2).ratio() for tp_word1 in chain(replaceGenerator(word1, word2), deleteGenerator(word1, word2), insertGenerator(word1, word2)): if tp_word1 in visited: continue else: visited.add(tp_word1) tp_ratio = SequenceMatcher(None, tp_word1, word2).ratio() if tp_ratio-cur_ratio > 0: tp_beam = Beam(tp_ratio, tp_word1) for i in range(max_beams_cnt): if i >= len(beams): beams.append(tp_beam) break if beams[i].ratio < tp_beam.ratio: beams.insert(i, tp_beam) break min_d += 1 return min_d
正确性验证,还是用上面那个例子:
>>> s.minDistance("dinitrophenylhydrazine", "acetylphenylhydrazine") 6
正确性的问题改善了,不过并没有完全解决!
看看时间复杂度:
>>> %time s.minDistance( "pneumonoultramicroscopicsilicovolcanoconiosis", "stereomicroscopically") CPU times: user 359 ms, sys: 3.94 ms, total: 363 ms Wall time: 378 ms 30
时间复杂度不是很高,不过 LeetCode上就是通不过 Time Limit,可能是用了太多高级的类和函数吧。
最后上一个大神写的代码:
class Solution: def minDistance(self, word1, word2): """ :type word1: str :type word2: str :rtype: int """ a, b = len(word1)-1, len(word2)-1 mem = {} def dist(a, b): if a == -1: return b+1 if b == -1: return a+1 if (a,b) in mem: return mem[(a,b)] if word1[a] == word2[b]: d = dist(a-1, b-1) else: # dist(a-1,b-1)表示replace # dist(a-1,b)表示delete # dist(a,b-1)表示insert d = min(dist(a-1, b-1), dist(a-1, b), dist(a, b-1)) + 1 mem[(a,b)] = d return d return dist(a,b)
评价:都知道动态规划的问题基本上要用递归,不过关键是要知道如何分治。
测试结果:
>>> %time s.minDistance( "pneumonoultramicroscopicsilicovolcanoconiosis", "stereomicroscopically") CPU times: user 353 ms, sys: 2.6 ms, total: 356 ms Wall time: 358 ms 30
速度和上面的局部束搜索差不多,比爬山法还慢一点。不过就是通过了 Time Limit。
基本上来说,爬山法都是最快的。