Building a Chinese Chatbot with Rasa NLU / Rasa Core

Kiwi lee
15 min read · Apr 16, 2018


Rasa NLU: https://github.com/RasaHQ/rasa_nlu
Rasa Core: https://github.com/RasaHQ/rasa_core

0. Prepare Chinese NER

Reference website:

用Rasa NLU构建自己的中文NLU系统 (Building your own Chinese NLU system with Rasa NLU)

Train with MITIE

  1. Only the wordrep tool needs to be built
git clone https://github.com/mit-nlp/MITIE.git
cd MITIE/tools/wordrep
mkdir build && cd build
cmake ..
cmake --build . --config Release
  1. Train the model
    Training this model can take a long time
./wordrep -e /path/to/your/folder_of_cutted_text_files
  1. The main model file
total_word_feature_extractor_zh.dat
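
The input folder for wordrep holds text that has already been word-segmented. Here is a minimal sketch of preparing such a file, assuming wordrep reads plain-text files whose tokens are separated by whitespace. The helper name `write_segmented` and the injected `tokenize` callable are illustrative and not part of MITIE; in practice `jieba.lcut` (third-party) would be a natural tokenizer, since it matches the tokenizer_jieba step used later.

```python
import os
import tempfile


def write_segmented(sentences, tokenize, out_path):
    """Write one sentence per line with tokens separated by single spaces."""
    with open(out_path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            # Drop tokens that are only whitespace so the separators stay clean.
            tokens = [t for t in tokenize(sentence) if t.strip()]
            f.write(" ".join(tokens) + "\n")


# Stand-in tokenizer: list() splits a string into single characters.
# Swap in jieba.lcut for real word segmentation.
out_file = os.path.join(tempfile.mkdtemp(), "corpus_0.txt")
write_segmented(["今天天氣如何", "你好"], list, out_file)
with open(out_file, encoding="utf-8") as f:
    print(f.read())
```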

1. Create Intent corpus

1–1 Example

intent text
greet 你好
greet 早安
goodbye 再見
goodbye 掰掰
ask_weather 今天天氣如何
ask_weather 天氣如何
ask_weather 今天天氣可好嗎

1–2 Data format

{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "你好",
        "intent": "greet",
        "entities": []
      },
      {
        "text": "再見",
        "intent": "goodbye",
        "entities": []
      }
    ]
  }
}
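
The intent/text table from 1–1 maps directly onto this structure. Here is a small sketch that generates the JSON from (intent, text) pairs using only the standard library. The function name `build_nlu_data` is made up for illustration, and entities are left empty as in the example.

```python
import json


def build_nlu_data(rows):
    """Turn (intent, text) pairs into the rasa_nlu_data JSON structure."""
    examples = [
        {"text": text, "intent": intent, "entities": []}
        for intent, text in rows
    ]
    return {"rasa_nlu_data": {"common_examples": examples}}


rows = [
    ("greet", "你好"),
    ("greet", "早安"),
    ("goodbye", "再見"),
    ("goodbye", "掰掰"),
    ("ask_weather", "今天天氣如何"),
]

data = build_nlu_data(rows)
# ensure_ascii=False keeps the Chinese text readable in the output.
print(json.dumps(data, ensure_ascii=False, indent=2))
```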

1–3 Online view

Typing JSON by hand is tedious, so the Rasa NLU project offers a handy tool, rasa-nlu-trainer, that lets you add utterances, intents, and entities easily.

rasa-nlu-trainer <your-intent-corpus> 

2. Set up your Rasa NLU configuration

2–1 NLU model save path format

{project_path}/{project_name}/{model_name}
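  • project_path can contain many project_name folders, one per application, and each project can hold several models for different scenarios
  1. project_path is the path you configure (default: projects),
  2. project_name is the project name you specify (default: default),
  3. model_name is the name the model is saved under (default: model plus a timestamp)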
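
As a rough sketch, the save location can be composed as follows. The helper `model_dir` is hypothetical, and the exact timestamp format of the default model name is an assumption.

```python
import os
import time


def model_dir(project_path="projects", project_name="default", model_name=None):
    """Compose {project_path}/{project_name}/{model_name}."""
    if model_name is None:
        # Assumed default naming: "model_" plus a timestamp.
        model_name = "model_" + time.strftime("%Y%m%d-%H%M%S")
    return os.path.join(project_path, project_name, model_name)


print(model_dir(model_name="weather"))
print(model_dir("./models/nlu", "default", "weather"))
```

The second call produces the path that is loaded in the test step later, where the model folder is ./models/nlu, the project is left at its default, and the model name is fixed to weather.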

2–2 Example

The pipeline is the sequence of components that your data passes through when it is analyzed [More information]

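The pipeline also shows how Rasa NLU works: "nlp_mitie" initializes MITIE, "tokenizer_jieba" tokenizes the text with jieba, "ner_mitie" and "ner_synonyms" perform entity recognition, "intent_featurizer_mitie" extracts features for intent recognition, and "intent_classifier_sklearn" classifies the intent with sklearn.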

{
  "project": "your-project-name",
  "fixed_model_name": "<the model name you want to save>",
  "pipeline": [
    "nlp_mitie",
    "tokenizer_jieba",
    "ner_mitie",
    "ner_synonyms",
    "intent_featurizer_mitie",
    "intent_classifier_sklearn"
  ],
  "mitie_file": "<the MITIE model we pretrained>",
  "path": "<where your models are saved>",
  "data": "<where your intent corpus is>",
  "response_log": "<where your log files go>"
}
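
A mistyped key name still parses as valid JSON, so it is easy to miss. Below is a minimal sanity check for a config dict. The `EXPECTED_KEYS` set only lists the keys used in the example above and is an assumption for this sketch, not an exhaustive list of what Rasa NLU accepts. `check_config` is likewise a made-up helper.

```python
import json

# Keys used in the example config; an assumption, not Rasa NLU's full list.
EXPECTED_KEYS = {
    "project", "fixed_model_name", "pipeline",
    "mitie_file", "path", "data", "response_log",
}


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for key in config:
        if key not in EXPECTED_KEYS:
            problems.append("unexpected key: %r" % key)
    pipeline = config.get("pipeline", [])
    if "nlp_mitie" in pipeline and "mitie_file" not in config:
        problems.append("pipeline uses MITIE but mitie_file is missing")
    if "tokenizer_jieba" in pipeline and "ner_mitie" in pipeline:
        # Entity recognition works on tokens, so the tokenizer goes first.
        if pipeline.index("tokenizer_jieba") > pipeline.index("ner_mitie"):
            problems.append("tokenizer_jieba must come before ner_mitie")
    return problems


config = json.loads("""
{
  "project": "demo",
  "pipeline": ["nlp_mitie", "tokenizer_jieba", "ner_mitie"],
  "response_log:": "logs"
}
""")
for problem in check_config(config):
    print(problem)
```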

3. Train NLU and Test

3–1 Train NLU

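Once everything in the config is set and the intent corpus is ready, training can begin.

  • Method 1 (recommended): use Rasa NLU's built-in training tool; note that every setting must be specified in the config file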
python -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.json
  • Method 2: train from Python; compared with Method 1, some details may be left unhandled
from rasa_nlu.converters import load_data
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Interpreter, Trainer


def train_nlu(data, config, model_folder):
    '''Train NLU with config.json and data, then dump the model into a folder.'''
    # Load the training data from the given file
    training_data = load_data(data)
    # Build the trainer from the config file
    trainer = Trainer(RasaNLUConfig(config))
    # Train
    trainer.train(training_data)
    # Dump the model by giving the model path and name
    model_directory = trainer.persist(model_folder, fixed_model_name='weather')
    return model_directory


if __name__ == '__main__':
    train_nlu('./data/data.json', 'config_spacy.json', './models/nlu')

3–2 Test

  1. Test the model from a terminal with a Python script. More information on Rasa NLU
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Interpreter

def run_nlu():
    interpreter = Interpreter.load('./models/nlu/default/weather', RasaNLUConfig('config_spacy.json'))
    print(interpreter.parse(u"I am planning my holiday to Barcelona. I wonder what is the weather out there."))

if __name__ == '__main__':
    run_nlu()
  1. Test with Rasa NLU's built-in server. More information on Rasa NLU
  • Start the server
python -m rasa_nlu.server -c config.json
  • Check the server status
curl localhost:5000/status
  • Parse a test sentence and view the extracted result
curl -XPOST localhost:5000/parse -d '{"q": "hi"}'
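
The same request can be sent from Python, which is convenient for Chinese sentences because the JSON encoding is handled for you. This is a sketch using only the standard library. The helper names `build_parse_request` and `parse` are made up for illustration, and the host and port are taken from the curl example above.

```python
import json
from urllib.request import Request, urlopen


def build_parse_request(text, base_url="http://localhost:5000"):
    """Build the POST request that the curl example sends to /parse."""
    body = json.dumps({"q": text}).encode("utf-8")
    return Request(
        base_url + "/parse",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse(text, base_url="http://localhost:5000"):
    """Send the text to a running server and return the decoded JSON reply."""
    with urlopen(build_parse_request(text, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_parse_request("今天天氣如何")
print(request.get_method(), request.full_url)
print(request.data)
# With the server from the previous step running:
# print(parse("今天天氣如何"))
```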

4. Getting started with Rasa Core (more information on Rasa Core)

4–1 Basic vocabulary of the domain:

  • slots: the pieces of information the chatbot has to track during the conversation in order to answer
  • intents: the intents that the NLU can detect in a user message
  • entities: the variables to be extracted from user messages
  • templates: canned replies the chatbot can send immediately without running a custom action
  • actions: everything the chatbot can do, covering both the templates and custom actions that query a backend

4–2 Slots and entities:

  • entity: a useful noun or phrase found in an utterance
  • slot: an entity value that needs to be tracked throughout the dialogue
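
To make the difference concrete, here is a toy sketch of how entity values found in single utterances end up in slots that persist across turns. It is not Rasa Core's implementation: `update_slots` is a hypothetical helper, and the entity dicts only mimic the "entity"/"value" fields of an NLU result.

```python
def update_slots(slots, entities):
    """Copy entity values into slots that share the entity's name.

    Entities without a matching slot are only visible in the current turn.
    """
    updated = dict(slots)
    for entity in entities:
        if entity["entity"] in updated:
            updated[entity["entity"]] = entity["value"]
    return updated


slots = {"location": None, "deviceID": None}

# Turn 1: the NLU finds a location entity in the utterance.
slots = update_slots(slots, [{"entity": "location", "value": "台北"}])
# Turn 2: no location is mentioned, but the slot still remembers it.
slots = update_slots(slots, [{"entity": "deviceID", "value": "A123"}])
print(slots)
```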

4–3 Intents and templates:

  • intent: what the NLU recognizes as the user's intention
  • template: a default response of the chatbot

4–4 Templates and actions:

  • Templates: default responses, for example the chatbot asking the user for missing slot information.
  • Actions: include the template actions as well as custom actions that perform the backend search.

4–5 Example:

slots:
  deviceID:
    type: text
  location:
    type: text
  warranty_date:
    type: unfeaturized
entities:
  - location
  - deviceID
intents:
  - greet
  - goodbye
  - affirm
  - warranty_search
  - location_search
  - search
templates:
  utter_greet:
    - '安安'
    - '你賀'
  utter_goodbye:
    - '再見'
  utter_affirm:
    - '收到'
  utter_doing_search:
    - '好,馬上幫你查詢'
  utter_ask_for_help:
    - '請問有什麼需要協助的嗎?'
  utter_ask_location:
    - '請問你想找哪個地方?'
  utter_ask_deviceID:
    - '請問你的產品序號是?'

actions:
  - utter_greet
  - utter_goodbye
  - utter_affirm
  - utter_doing_search
  - utter_ask_for_help
  - utter_ask_location
  - utter_ask_deviceID
  - warranty_bot.Device_search_action
  - warranty_bot.Location_search_action
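
In the example, every utter_ template also appears under actions. Here is a small consistency check for that convention, assuming the domain has already been loaded into a Python dict (for instance with a YAML parser). `check_domain` is a hypothetical helper, and the rule it enforces is inferred from the example rather than taken from the Rasa Core documentation.

```python
def check_domain(domain):
    """Return a list of template/action mismatches in a domain dict."""
    problems = []
    templates = domain.get("templates", {})
    actions = set(domain.get("actions", []))
    for template in templates:
        if template not in actions:
            problems.append("template not listed in actions: " + template)
    for action in sorted(actions):
        # Only utter_ actions are expected to have a template.
        if action.startswith("utter_") and action not in templates:
            problems.append("action has no template: " + action)
    return problems


domain = {
    "templates": {"utter_greet": ["安安"], "utter_goodbye": ["再見"]},
    "actions": ["utter_greet", "warranty_bot.Location_search_action"],
}
print(check_domain(domain))
```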

4–6 Build your own domain.yml

5. Build stories (more information on Rasa Core)

5–1. Why?

Stories record conversations between a user and the chatbot so that the chatbot can learn to model the dialogues it may encounter.

5–2. Data format

Example

## story_07715946
* greet
  - action_ask_howcanhelp
* inform{"location": "rome", "price": "cheap"}
  - action_on_it
  - action_ask_cuisine
* inform{"cuisine": "spanish"}
  - action_ask_numpeople
* inform{"people": "six"}
  - action_ack_dosearch
5–3. Visualization

python -m rasa_core.visualize -d concert_domain.yml -s data/stories.md -o graph.png

6. Build action of chatbot

6–1. Why?

Actions let the chatbot do more than send default utterance replies.

6–2. Example

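  • In the example below, the class Location_search_action must be listed in the domain.yml shown earlier; otherwise the chatbot does not know that it is an action.
  • The entry point of an action is the class's run method; the chatbot passes in several arguments: dispatcher, tracker, and domain.
  • In stories, the action is referred to by the value returned from the class's name method.
  • I usually put a separate API class behind the action to handle whatever the chatbot needs to do against the backend, so that it can be run and tested independently later.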
from rasa_core.actions import Action


class LocationSearchAPI(object):
    def search(self, info):
        # Placeholder backend; the reply means "I can't find that location either QQ"
        return '我找不到據點也QQ'


class Location_search_action(Action):
    def name(self):
        return 'location_search_action'

    def run(self, dispatcher, tracker, domain):
        res_api = LocationSearchAPI()
        ress = res_api.search(tracker.get_slot('location'))
        dispatcher.utter_message(ress)
        return []
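
Keeping the backend logic in its own class pays off when testing: the run logic can be exercised with stand-in objects instead of a live chatbot. In the sketch below, `FakeDispatcher`, `FakeTracker`, and `LocationSearchActionSketch` are all hypothetical. The action sketch deliberately does not inherit from rasa_core's Action so that it runs without Rasa installed, and the backend reply is simplified.

```python
class FakeDispatcher(object):
    """Stand-in that records messages instead of sending them."""

    def __init__(self):
        self.messages = []

    def utter_message(self, message):
        self.messages.append(message)


class FakeTracker(object):
    """Stand-in that serves slot values from a plain dict."""

    def __init__(self, slots):
        self.slots = slots

    def get_slot(self, key):
        return self.slots.get(key)


class LocationSearchAPI(object):
    def search(self, info):
        if not info:
            return "no location given"
        return "searching near " + info


class LocationSearchActionSketch(object):
    # Mirrors the name/run interface of the action above.
    def name(self):
        return "location_search_action"

    def run(self, dispatcher, tracker, domain):
        result = LocationSearchAPI().search(tracker.get_slot("location"))
        dispatcher.utter_message(result)
        return []


dispatcher = FakeDispatcher()
events = LocationSearchActionSketch().run(
    dispatcher, FakeTracker({"location": "台北"}), domain=None)
print(dispatcher.messages, events)
```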

6–3. Useful calls inside run

  • Reply using a template
dispatcher.utter_template(utter_reply, my_variable='some')
  • Send a message back
dispatcher.utter_message(msg)
  • Get a slot value
tracker.get_slot(<slot key>)

6–4. What rasa_core.events is for

7. Train Core

7–1 Training

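  • Below is an example function: domain_file is domain.yml, model_path is the folder where the trained model is saved, and story_file is story.md, which holds the recorded dialogues.
  • policies are the methods used to train the model; according to the official docs, these two already work well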
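from rasa_core.agent import Agent
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.memoization import MemoizationPolicy


def train_dialogue(domain_file, model_path, story_file):
    agent = Agent(domain_file, policies=[MemoizationPolicy(), KerasPolicy()])
    agent.train(story_file,
                max_history=3,
                epochs=400,
                batch_size=10,
                validation_split=0.2)
    agent.persist(model_path)
    return agent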
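
To get an intuition for max_history, here is a toy lookup table in the spirit of a memoization policy: it remembers which action followed the last max_history events in the training stories. This is only an illustration under simplifying assumptions (events are plain strings, and `build_memo` and `predict` are made-up names); Rasa Core's real featurization and policies differ.

```python
def build_memo(stories, max_history=3):
    """Map the last max_history events to the action that followed them."""
    memo = {}
    for events in stories:
        for i, event in enumerate(events):
            if not event.startswith("action:"):
                continue
            history = tuple(events[max(0, i - max_history):i])
            memo[history] = event
    return memo


def predict(memo, events, max_history=3):
    """Return the memorized next action, or None for an unseen history."""
    return memo.get(tuple(events[-max_history:]))


stories = [[
    "intent:greet", "action:utter_greet",
    "intent:location_search", "action:utter_ask_location",
    "intent:affirm", "action:utter_doing_search",
]]
memo = build_memo(stories, max_history=3)
print(predict(memo, ["intent:greet"]))
print(predict(memo, ["intent:goodbye"]))
```

A history that never occurred in the stories returns None here, which is where a generalizing policy such as KerasPolicy would take over in the real setup.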

7–2 Training online

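  • When there is not enough training data the chatbot will not be very smart, but Rasa Core provides online training to help us add more data
  • interpreter is the Rasa NLU model we trained earlier; it extracts the intent and entities from each message.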
# agent main function and policy
from rasa_core.agent import Agent
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.policies.keras_policy import KerasPolicy
# Online
from rasa_core.channels.console import ConsoleInputChannel
from rasa_core.interpreter import RasaNLUInterpreter
# function
def train_dialogue_online(input_channel, interpreter,
                          domain_file, training_data_file):
    agent = Agent(domain_file,
                  policies=[MemoizationPolicy(), KerasPolicy()],
                  interpreter=interpreter)
    agent.train_online(training_data_file, input_channel=input_channel,
                       max_history=3,
                       batch_size=50,
                       epochs=200,
                       max_training_samples=300)
    return agent
# interpreter
nlu_interpreter = RasaNLUInterpreter(nlu_model_path)
# function usage
agent = train_dialogue_online(ConsoleInputChannel(), nlu_interpreter, domain_file, training_data_file)

8. Summary

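I have not built a real application with this yet; I have only skimmed through the framework and generated throwaway models from my own data. My impression is that the dialogue logic and the replies need a dedicated person to design them. If the design is left entirely to R&D engineers, the result will be poor and the chatbot may do more harm than good.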
