rasa对话系统如何实现联想输入
rasa 文集
rasa文章导引(用于收藏)
联想输入
问题描述:输入关键词,从候选的问题中选择出相似度最大的前n个词。
具体示例:
候选问题如下:
- 笔记本死机了怎么办?
- 计算机死机了怎么办?
- 电脑卡死了?
- 电脑用着突然卡死了
- 手机死机了
- 电脑不能上网了
- 电脑死机了
提的问题如下:我的电脑死机了怎么办?
联想词搜索原理:https://blog.csdn.net/DusonBlog/article/details/52661237。
下面Demo使用倒排索引进行实现,参见:https://blog.csdn.net/u011239443/article/details/60604017
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
|
import jieba # 使用结巴分词器
import numpy
corpus = [
"笔记本死机了怎么办",
"计算机死机了怎么办",
"电脑卡死了",
"电脑用着突然卡死了",
"手机死机了",
"电脑不能上网了",
"电脑死机了"
]
def search_related_questions(question):
# 网上很容易就可以搜索到停用词表
stopwords = ['我', '了', '着', '的', '怎么办']
vocabulary = set()
corpus_tokens = []
for sentence in corpus:
# 分词
tokens = jieba.cut(sentence)
# 去除停用词
tokens_without_stops = [token for token in tokens if token not in stopwo rds]
corpus_tokens.append(tokens_without_stops)
for word in tokens_without_stops:
vocabulary.add(word)
vocabulary = sorted(list(vocabulary))
search_dict = {}
for word in vocabulary:
search_dict[word] = []
for i, tokens in enumerate(corpus_tokens):
count = tokens.count(word)
if count != 0:
search_dict[word].append([i, count])
# 进行检索
question_tokens = jieba.cut(question)
question_tokens_without_stops = [token for token in question_tokens if token not in stopwords]
# 相关的问题有
related_question = []
for token in question_tokens_without_stops:
if token in search_dict.keys():
related_question += search_dict[token]
# 相关问题匹配到的次数
related_question_dict = {}
for id_count_pair in related_question:
if id_count_pair[0] in related_question_dict:
related_question_dict[id_count_pair[0]] += id_count_pair[1]
else:
related_question_dict[id_count_pair[0]] = id_count_pair[1]
# 进行排序
sorted_question = sorted(related_question_dict.items(), key=lambda item:item [1], reverse=True)
# 输出相似问题排名
related_question_str = []
for question in sorted_question:
related_question_str.append(corpus[question[0]])
return related_question
if __name__ == '__main__':
question = "我的电脑死机了怎么办"
search_related_questions(question)
|
将rasa和联想输入结合
联想输出,通常情况下是针对question-answer系统。下面针对如何将联想输入结合到rasa 给出一定思路。可以在RestInput
添加相关实现,类似如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
# file: rasa/core/channels/channel.py
class RestInput(InputChannel):
def blueprint(
self, on_new_message: Callable[[UserMessage], Awaitable[None]]
) -> Blueprint:
...
@custom_webhook.route("/suggest", methods=["POST"])
async def suggest(request: Requkest) -> HTTPResponse:
sender_id = await self._extract_sender(request)
# 用户输入的消息
text = self._extract_message(request)
related_questions = search_related_questions(text)
return response.json({"suggest": related_questions})
....
|