知行编程网  2022-12-24 03:00
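Introduction: This article covers how to implement query spelling correction in Python, in the hope that it helps readers who are learning to program.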

Implementing Query Spelling Correction in Python

Ways to implement query spelling correction in Python:

Method 1:

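1. Take a misspelled word as input and call aspell -a to get a list of candidate corrections, then use edit distance to narrow them down to a more accurate choice. For example, running aspell -a and entering 'hella' yields the following suggestions: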

hell, Helli, hello, heal, Heall, he’ll, hells, Heller, Ella, Hall, Hill, Hull, hall, heel, hill, hula, hull, Helga, Helsa, Bella, Della, Mella, Sella, fella, Halli, Hally, Hilly, Holli, Holly, hallo, hilly, holly, hullo, Hell’s, hell’s
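In the Ispell-compatible pipe mode that `aspell -a` speaks, suggestions for a misspelled word arrive on a line of the form `& word count offset: cand1, cand2, ...`, and the list above is the part after the colon. The following is a minimal sketch of pulling the candidates out of such a line, assuming that format. `parse_suggestions` is a hypothetical helper name, and the sample line is a shortened illustration rather than captured output.

```python
def parse_suggestions(line):
    """Return the candidate list from one '& ...' line, or [] otherwise."""
    # Only lines starting with '&' carry suggestions in this format.
    if not line.startswith("& ") or ": " not in line:
        return []
    _, candidates = line.rstrip("\n").split(": ", 1)
    return candidates.split(", ")


# Illustrative, shortened sample line (not real captured output).
sample = "& hella 3 0: hell, Helli, hello\n"
print(parse_suggestions(sample))
```

Lines that report a correct word or a word with no suggestions do not start with `&`, so the helper returns an empty list for them.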

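2. What is edit distance (also known as the Levenshtein algorithm)? Given a word, enumerate the strings that can be reached by inserting, deleting, swapping, or replacing single characters; the correct spelling should be among them. (Strictly speaking, Levenshtein distance counts only insertions, deletions, and substitutions; counting a swap of two adjacent characters as one edit gives the Damerau-Levenshtein variant.) For example, applying one such operation to 'hella' produces: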

‘helkla’, ‘hjlla’, ‘hylla’, ‘hellma’, ‘khella’, ‘iella’, ‘helhla’, ‘hellag’, ‘hela’, ‘vhella’, ‘hhella’, ‘hell’, ‘heglla’, ‘hvlla’, ‘hellaa’, ‘ghella’, ‘hellar’, ‘heslla’, ‘lhella’, ‘helpa’, ‘hello’, …
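Enumerating edits is one way to use edit distance; the other is to measure the distance between two given words directly. The sketch below is a standard dynamic-programming Levenshtein distance, shown only as an illustration. It counts insertions, deletions, and substitutions, so a swap of adjacent characters costs two edits here, and the name `levenshtein` is chosen for this example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


for candidate in ("hello", "hell", "holly"):
    print(candidate, levenshtein("hella", candidate))
```

Candidates at distance 1 from the input, such as 'hello' and 'hell' for 'hella', are the ones a single-edit enumeration would also produce.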

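3. Combining the two sets of results with a few heuristics improves the accuracy of the spell check. For example, misspellings are usually unintentional slips or typos. It is unlikely that a word is entirely wrong, and the first letter of a word is rarely mistyped. So candidates in the sets above whose first letter does not match could be dropped, for example 'Sella', 'Mella', 'khella', and 'iella'. Here VPSee does not delete these words; instead, they are taken out of the queue and appended to its tail (lower priority), so words starting with other letters are considered only when no word starting with h matches.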
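The reordering described in step 3 can be sketched as a stable partition: candidates that share the input's first letter keep their relative order at the front, and the rest follow. This is a minimal illustration; `prioritize_first_letter` is a hypothetical name, and the comparison is case-sensitive, so a capitalized candidate such as 'Helli' also moves to the tail for the input 'hella'.

```python
def prioritize_first_letter(word, candidates):
    """Stable reorder: same-first-letter candidates first, the rest after."""
    same = [c for c in candidates if c[:1] == word[:1]]
    other = [c for c in candidates if c[:1] != word[:1]]
    return same + other


# Small sample drawn from the candidate lists in steps 1 and 2.
sample = ["hell", "Sella", "hello", "Mella", "khella", "hall"]
print(prioritize_first_letter("hella", sample))
```

Building two new lists avoids removing items from a list while looping over it, which can silently skip elements.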

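4. The program relies on the external tool aspell. How can a Python program capture an external program's input and output so that it can process them? Python 2.4 introduced the subprocess module, and subprocess.Popen can be used for this.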
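As a small illustration of step 4, the sketch below sends text to a child process on stdin and reads back its stdout. It uses `subprocess.run`, a convenience wrapper around `Popen` available from Python 3.5, and a Python one-liner stands in for aspell so the example does not depend on aspell being installed. `run_filter` and `child_args` are names chosen for this example.

```python
import subprocess
import sys


def run_filter(args, text):
    """Send text to the child's stdin and return what it writes to stdout."""
    result = subprocess.run(args,
                            input=text,
                            stdout=subprocess.PIPE,
                            universal_newlines=True,  # str in, str out
                            check=True)               # raise on non-zero exit
    return result.stdout


# Stand-in child process: a Python one-liner that upper-cases its input.
# Swapping in ["aspell", "-a"] assumes aspell is installed and on PATH.
child_args = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_filter(child_args, "hella\n"))
```

Passing the arguments as a list avoids going through a shell, which sidesteps quoting issues when the command line is built from user input.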

Implementation:

#!/usr/bin/env python3
# A simple spell checker
import signal
import subprocess
import sys

alphabet = 'abcdefghijklmnopqrstuvwxyz'


def found(word, args, cwd=None, shell=True):
    # run aspell, feed it the word, and collect everything it prints
    child = subprocess.Popen(args,
                             shell=shell,
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE,
                             cwd=cwd,
                             universal_newlines=True)
    (stdout, stderr) = child.communicate(word + "\n")
    for line in stdout.splitlines():
        # suggestions arrive as "& word count offset: cand1, cand2, ..."
        if line.startswith("& ") and ": " in line:
            # remove left part until :
            left, candidates = line.split(": ", 1)
            candidates = candidates.split(", ")
            # making an error on the first letter of a word is less
            # probable, so we move those candidates to the tail of
            # the queue and give them lower priority (building two
            # lists avoids mutating a list while iterating over it)
            head = [c for c in candidates if c[0] == word[0]]
            tail = [c for c in candidates if c[0] != word[0]]
            return head + tail
    return None


# copy from http://norvig.com/spell-correct.html
def edits1(word):
    n = len(word)
    # deletions, transpositions, replacements, insertions (in that order)
    return set([word[0:i] + word[i+1:] for i in range(n)] +
               [word[0:i] + word[i+1] + word[i] + word[i+2:] for i in range(n-1)] +
               [word[0:i] + c + word[i+1:] for i in range(n) for c in alphabet] +
               [word[0:i] + c + word[i:] for i in range(n+1) for c in alphabet])


def correct(word):
    candidates1 = found(word, 'aspell -a')
    if not candidates1:
        print("no suggestion")
        return
    candidates2 = edits1(word)
    # keep the aspell suggestions that are one edit away from the input
    candidates = [c for c in candidates1 if c in candidates2]
    if not candidates:
        print("suggestion: %s" % candidates1[0])
    else:
        # max() picks the lexicographically greatest match
        print("suggestion: %s" % max(candidates))


def signal_handler(signum, frame):
    sys.exit(0)


if __name__ == '__main__':
    signal.signal(signal.SIGINT, signal_handler)
    while True:
        line = input()
        correct(line)

Method 2:

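Of course, calling an existing module directly from the program is the simplest route. A library called PyEnchant supports spell checking. Once PyEnchant and Enchant are installed, it can be imported directly in a Python program: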

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>> d.suggest("Helo")
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"] 
>>>
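The session above can be wrapped in a small helper. The sketch below relies only on the `check` and `suggest` methods shown in that session and accepts any object that provides them, so it can be tried without PyEnchant installed. `first_suggestion` is a hypothetical name, and taking the first suggestion is a simplifying assumption: in the session above, the first entry for "Helo" is 'He lo', not 'Hello'.

```python
def first_suggestion(dictionary, word):
    """Return word if the dictionary accepts it, else its first suggestion
    (or None when there are no suggestions)."""
    if dictionary.check(word):
        return word
    suggestions = dictionary.suggest(word)
    return suggestions[0] if suggestions else None


# With PyEnchant installed, usage would look like:
#   import enchant
#   print(first_suggestion(enchant.Dict("en_US"), "Helo"))
```

Combining this with the first-letter or edit-distance heuristics from Method 1 would be one way to pick a better entry from the suggestion list.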
