Building a Chinese chatbot with Rasa NLU / Rasa Core

Kiwi lee

Apr 16, 2018

Rasa NLU: https://github.com/RasaHQ/rasa_nlu
Rasa Core: https://github.com/RasaHQ/rasa_core

0. Prepare a Chinese MITIE model (for NER)

Reference:

Building your own Chinese NLU system with Rasa NLU (用Rasa NLU构建自己的中文NLU系统)

Training with MITIE

You only need to build the wordrep tool:

git clone https://github.com/mit-nlp/MITIE.git
cd MITIE/tools/wordrep
mkdir build && cd build 
cmake ..
cmake --build . --config Release

Train the model
Training can take a very long time. The argument is a folder of word-segmented ("cut") text files.

./wordrep -e /path/to/your/folder_of_cutted_text_files
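wordrep works on text whose words are already separated by spaces, which Chinese text is not. A minimal sketch of producing the "cut" folder, assuming jieba is used for segmentation and one raw file maps to one segmented file; `segment_line` and `segment_folder` are hypothetical helper names, and the tokenizer is injectable so the logic can be tried without jieba installed.

```python
import os
from typing import Callable, Iterable, List


def jieba_cut(line: str) -> List[str]:
    # Imported lazily so the helpers below work without jieba when another tokenizer is passed
    import jieba
    return list(jieba.cut(line))


def segment_line(line: str, cut: Callable[[str], Iterable[str]] = jieba_cut) -> str:
    """Return the line with its tokens separated by single spaces."""
    tokens = [t.strip() for t in cut(line.strip())]
    return " ".join(t for t in tokens if t)


def segment_folder(src_dir: str, dst_dir: str,
                   cut: Callable[[str], Iterable[str]] = jieba_cut) -> int:
    """Segment every file in src_dir into a file of the same name in dst_dir.

    Returns the number of files written (hypothetical helper).
    """
    os.makedirs(dst_dir, exist_ok=True)
    count = 0
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue
        with open(src, encoding="utf-8") as fin, \
                open(os.path.join(dst_dir, name), "w", encoding="utf-8") as fout:
            for line in fin:
                seg = segment_line(line, cut)
                if seg:  # skip lines that were empty after segmentation
                    fout.write(seg + "\n")
        count += 1
    return count
```

Usage, with made-up folder names: `segment_folder("raw_text", "cutted_text")`, then `./wordrep -e cutted_text`.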

The main output model:

total_word_feature_extractor_zh.dat

1. Create Intent corpus

1–1 Example

intent text
greet 你好
greet 早安
goodbye 再見
goodbye 掰掰
ask_weather 今天天氣如何
ask_weather 天氣如何
ask_weather 今天天氣可好嗎

1–2 Data format

{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "你好",
        "intent": "greet",
        "entities": []
      },
      {
        "text": "再見",
        "intent": "goodbye",
        "entities": []
      }
    ]
  }
}
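The `intent text` table in 1–1 maps mechanically onto the JSON in 1–2. A minimal sketch of that conversion with only the standard library, assuming a hypothetical plain-text corpus with one `intent text` pair per line (the first whitespace separates the intent from the utterance); entities are left empty.

```python
import json
from typing import Dict, List


def table_to_rasa_nlu(lines: List[str]) -> Dict:
    """Convert 'intent text' lines into the rasa_nlu_data JSON structure."""
    examples = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split once: intent name first, the rest of the line is the utterance
        parts = line.split(None, 1)
        if len(parts) != 2 or parts == ["intent", "text"]:
            continue  # skip the header row and malformed lines
        intent, text = parts
        examples.append({"text": text.strip(), "intent": intent, "entities": []})
    return {"rasa_nlu_data": {"common_examples": examples}}


if __name__ == "__main__":
    corpus = "intent text\ngreet 你好\ngoodbye 再見\nask_weather 今天天氣如何"
    data = table_to_rasa_nlu(corpus.splitlines())
    # ensure_ascii=False keeps the Chinese characters readable in the output
    print(json.dumps(data, ensure_ascii=False, indent=2))
```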

1–3 Online view

Typing JSON by hand is tedious, so Rasa offers a handy tool, rasa-nlu-trainer, that lets you easily add utterances, intents and entities:

rasa-nlu-trainer <your-intent-corpus>

2. Setting up your Rasa NLU configuration

2–1 Where NLU models are saved

{project_path}/{project_name}/{model_name}

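Under project_path there can be many project_name folders for different applications, and each project may contain several models for different situations.

project_path is the path you configure (default: projects),
project_name is the project name you specify (default: default),
model_name is the name the model is saved under (default: "model" plus a timestamp)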
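The layout above can be expressed as a small path-building function. A sketch only: the defaults follow the text, while the exact timestamp format `model_YYYYMMDD-HHMMSS` is an assumption for illustration, and `model_dir` is a hypothetical name.

```python
import os
from datetime import datetime
from typing import Optional


def model_dir(project_path: str = "projects",
              project_name: str = "default",
              model_name: Optional[str] = None,
              now: Optional[datetime] = None) -> str:
    """Build {project_path}/{project_name}/{model_name}."""
    if model_name is None:
        # Assumed default naming: "model_" plus a timestamp
        now = now or datetime.now()
        model_name = "model_" + now.strftime("%Y%m%d-%H%M%S")
    return os.path.join(project_path, project_name, model_name)
```

Passing `fixed_model_name` in the config corresponds to calling `model_dir(model_name="weather")`, which gives a stable path instead of a new timestamped folder per training run.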

2–2 Example

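The pipeline is the sequence of components your data is processed by [More information]

It also shows Rasa NLU's workflow: "nlp_mitie" initializes MITIE, "tokenizer_jieba" segments the text into words with jieba, "ner_mitie" recognizes entities and "ner_synonyms" maps synonymous entity values to one canonical value, "intent_featurizer_mitie" extracts features for intent classification, and "intent_classifier_sklearn" classifies the intent with sklearn.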

{
    "project": "your-project-name",
    "fixed_model_name": "<the model name you want to save>",
    "pipeline": [
        "nlp_mitie",
        "tokenizer_jieba",
        "ner_mitie",
        "ner_synonyms",
        "intent_featurizer_mitie",
        "intent_classifier_sklearn"
    ],
    "mitie_file": "<the pretrained MITIE model, e.g. total_word_feature_extractor_zh.dat>",
    "path": "<where your models are saved>",
    "data": "<where your intent corpus is>",
    "response_log": "<where your log files go>"
}
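Mistakes in this file (a missing key, a wrong path, a typo in a key name) often only show up once training starts. A minimal pre-flight check, assuming the JSON layout above; the set of keys it requires and the ordering rule are choices made for this sketch, not an official list of what Rasa NLU validates.

```python
import json
import os
from typing import Dict, List

# Keys this MITIE + jieba setup relies on (chosen for this sketch)
REQUIRED_KEYS = ["pipeline", "mitie_file", "path", "data"]


def check_nlu_config(config: Dict) -> List[str]:
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append("missing key: " + key)
    # A key ending in ':' is almost certainly a typo
    for key in config:
        if key.endswith(":"):
            problems.append("suspicious key: " + repr(key))
    # Files that must exist before training starts
    for key in ("mitie_file", "data"):
        value = config.get(key)
        if value and not os.path.exists(value):
            problems.append(key + " does not exist: " + value)
    # MITIE-based components need nlp_mitie earlier in the pipeline
    pipeline = config.get("pipeline", [])
    for i, component in enumerate(pipeline):
        if "mitie" in component and component != "nlp_mitie":
            if "nlp_mitie" not in pipeline[:i]:
                problems.append("nlp_mitie must come before " + component)
    return problems


def check_nlu_config_file(path: str) -> List[str]:
    """Load a JSON config file and check it."""
    with open(path, encoding="utf-8") as f:
        return check_nlu_config(json.load(f))
```

For example, `check_nlu_config_file("sample_configs/config_jieba_mitie_sklearn.json")` returns an empty list when nothing looks wrong.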

3. Train NLU and Test

3–1 Train NLU

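Once everything in the config is set and the intent corpus is ready, you can start training.

Method 1 (recommended): use Rasa NLU's built-in training tool. Note that every setting must be in the config file.

python -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.json

Method 2: build it in Python. Compared with Method 1, some settings may be overlooked.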

from rasa_nlu.converters import load_data
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer
from rasa_nlu.model import Metadata, Interpreter


def train_nlu(data, config, model_folder):
    '''Train NLU with config.json and data, then dump the model into model_folder.'''
    # Load the training data (Rasa NLU JSON format)
    training_data = load_data(data)
    # Build a trainer from the pipeline described in the config file
    trainer = Trainer(RasaNLUConfig(config))
    # Train every component of the pipeline
    trainer.train(training_data)
    # Persist the model, giving the output folder and a fixed model name
    model_directory = trainer.persist(model_folder, fixed_model_name='weather')
    return model_directory


if __name__ == '__main__':
    train_nlu('./data/data.json', 'config_spacy.json', './models/nlu')

3–2 Test

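Test the model from the terminal (more information in the Rasa NLU docs)

from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Interpreter

def run_nlu():
    # Path = model_folder/project/fixed_model_name used in train_nlu
    interpreter = Interpreter.load('./models/nlu/default/weather', RasaNLUConfig('config_spacy.json'))
    print(interpreter.parse(u"I am planning my holiday to Barcelona. I wonder what is the weather out there."))

if __name__ == '__main__':
    run_nlu()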

Test with Rasa NLU's built-in server (more information in the Rasa NLU docs)

Start the server

python -m rasa_nlu.server -c config.json

Check the server status

curl localhost:5000/status

Parse a test sentence

curl -XPOST localhost:5000/parse -d '{"q": "hi"}'
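The same request can be sent from Python, for example from a backend that forwards user messages to Rasa NLU. A standard-library sketch; it assumes the response carries an `intent` object with `name` and `confidence` (the shape Rasa NLU's parse output had at the time), and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, Tuple


def build_parse_request(text: str, host: str = "http://localhost:5000") -> urllib.request.Request:
    """Build the same POST /parse request as the curl example."""
    body = json.dumps({"q": text}).encode("utf-8")
    return urllib.request.Request(host + "/parse", data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def parse(text: str, host: str = "http://localhost:5000") -> Dict:
    """Send the request to a running server and decode the JSON reply."""
    with urllib.request.urlopen(build_parse_request(text, host)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def top_intent(response: Dict) -> Tuple[str, float]:
    """Extract (intent name, confidence) from a parse response."""
    intent = response.get("intent") or {}
    return intent.get("name", ""), float(intent.get("confidence", 0.0))
```

With the server running, `top_intent(parse("你好"))` would give the predicted intent and its confidence.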

4. Getting started with Rasa Core (more information in the Rasa Core docs)

4–1 Basic vocabulary of the domain:

Slots: the entities the chatbot has to keep track of in order to answer the user
intents: the intents the NLU model can assign to the user's message
entities: the variables the NLU model extracts from the message
templates: canned responses the chatbot can reply with immediately, without running a custom action
actions: the template actions plus custom actions, e.g. lookups on the backend to answer questions

4–2 Slots and entities:

entity: a useful noun (or value) in the utterance
slot: an entity (or other value) that needs to be tracked across the dialogue
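To my understanding of Rasa Core, a slot whose name matches an entity is filled automatically when the NLU model extracts that entity. A toy sketch of that bookkeeping, not Rasa's implementation: a plain dict stands in for the tracker, and the parse result is shaped like Rasa NLU output (each entry in `entities` has `entity` and `value`). `update_slots` is a hypothetical name.

```python
from typing import Dict, Optional


def update_slots(slots: Dict[str, Optional[str]], parse_result: Dict) -> Dict[str, Optional[str]]:
    """Return a copy of slots, filled from entities with the same name."""
    updated = dict(slots)
    for ent in parse_result.get("entities", []):
        name = ent.get("entity")
        # Only entities declared as slots are tracked; others are ignored
        if name in updated:
            updated[name] = ent.get("value")
    return updated
```

For example, an utterance tagged with a `location` entity fills the `location` slot, while an entity with no matching slot is simply dropped.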

4–3 Intents and templates:

intent: what the NLU model extracts from the user's message
template: a default response of the chatbot

4–4 Templates and actions:

Templates: default responses, e.g. the chatbot asking the user for missing slot information.
Actions: include the template actions, plus custom actions that search the backend.

4–5 Example:

slots:
  deviceID:
    type: text
  location:
    type: text
  warranty_date:
    type: unfeaturized

entities:
  - location
  - deviceID

intents:
  - greet
  - goodbye
  - affirm
  - warranty_search
  - location_search
  - search

templates:
  utter_greet:
    - '安安'   # "Hi" (casual)
    - '你賀'   # "Hello" (playful spelling of 你好)
  utter_goodbye:
    - '再見'   # "Goodbye"
  utter_affirm:
    - '收到'   # "Got it"
  utter_doing_search:
    - '好，馬上幫你查詢'   # "OK, I'll look it up for you right away"
  utter_ask_for_help:
    - '請問有什麼需要協助的嗎?'   # "How can I help you?"
  utter_ask_location:
    - '請問你想找哪個地方?'   # "Which location are you looking for?"
  utter_ask_deviceID:
    - '請問你的產品序號是?'   # "What is your product serial number?"

actions:
  - utter_greet
  - utter_goodbye
  - utter_affirm
  - utter_doing_search
  - utter_ask_for_help
  - utter_ask_location
  - utter_ask_deviceID
  - warranty_bot.Device_search_action
  - warranty_bot.Location_search_action

4–6 Build your own `domain.yml`
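When writing your own domain, a common slip is an `utter_` action without a matching template, or a template that was never listed under `actions`. A small consistency check over the domain once it is loaded into a dict; the two rules are choices for this sketch, not a reproduction of Rasa Core's own validation, and `check_domain` is a hypothetical name.

```python
from typing import Dict, List


def check_domain(domain: Dict) -> List[str]:
    """Report utter_ actions without templates and templates missing from actions."""
    templates = set(domain.get("templates", {}))
    actions = domain.get("actions", [])
    problems = []
    for action in actions:
        # Custom actions (e.g. warranty_bot.*) need no template
        if action.startswith("utter_") and action not in templates:
            problems.append("no template for action " + action)
    for name in sorted(templates):
        if name not in actions:
            problems.append("template not listed in actions: " + name)
    return problems
```

With PyYAML installed, `check_domain(yaml.safe_load(open("domain.yml", encoding="utf-8")))` runs it on the real file.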

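5. Build stories (more information in the Rasa Core docs)

5–1. Why

Stories are example conversations between the user and the chatbot, from which the chatbot learns to handle likely dialogues.

5–2. Data format

Example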

## story_07715946                     
* greet
    - action_ask_howcanhelp
* inform{"location": "rome", "price": "cheap"}
    - action_on_it                     
    - action_ask_cuisine
* inform{"cuisine": "spanish"}
    - action_ask_numpeople             
* inform{"people": "six"}
    - action_ack_dosearch
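Stories are plain Markdown, so their structure is easy to inspect with a script (for example, to count how often each action appears). A minimal standard-library reader for the three line types shown above: `## name` headers, `* intent{entities}` user turns and `- action` bot turns. It is not rasa_core's own parser and ignores any other line type.

```python
import json
import re
from typing import Dict, List

# "* intent" optionally followed by a JSON object of entities
USER_LINE = re.compile(r"^\*\s*([\w.]+)\s*(\{.*\})?\s*$")


def parse_stories(text: str) -> List[Dict]:
    """Parse the stories Markdown format into a list of story dicts."""
    stories: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            current = {"name": line[3:].strip(), "steps": []}
            stories.append(current)
        elif current is None:
            continue  # ignore anything before the first story header
        elif line.startswith("*"):
            m = USER_LINE.match(line)
            if m is None:
                raise ValueError("cannot parse user line: " + line)
            entities = json.loads(m.group(2)) if m.group(2) else {}
            current["steps"].append({"intent": m.group(1), "entities": entities})
        elif line.startswith("-"):
            current["steps"].append({"action": line[1:].strip()})
    return stories
```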

5–3. Visualization

python -m rasa_core.visualize -d concert_domain.yml -s data/stories.md -o graph.png

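6. Build chatbot actions

6–1. Why?

So that the chatbot can do more than reply with default utterances.

6–2. Example

In the example below, class Location_search_action has to be registered in the domain.yml shown earlier before the chatbot knows it is an action.
The entry point of the action is the class's def run; the chatbot passes in several arguments: dispatcher, tracker and domain.
In stories, the action is referred to by the value returned by the class's def name.
I usually put a separate API class inside the action that does only the backend work the chatbot needs, so that it can be run and tested on its own later.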

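from rasa_core.actions.action import Action

class LocationSearchAPI(object):
    def search(self, info):
        # Placeholder backend; the reply means "I can't find any service location QQ"
        return '我找不到據點也QQ'

class Location_search_action(Action):
    def name(self):
        # The name used for this action in stories
        return 'location_search_action'

    def run(self, dispatcher, tracker, domain):
        res_api = LocationSearchAPI()
        ress = res_api.search(tracker.get_slot('location'))
        dispatcher.utter_message(ress)
        return []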
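Because the backend logic lives in LocationSearchAPI, the action can be exercised without a running bot. A sketch under these assumptions: FakeDispatcher and FakeTracker are hypothetical stand-ins that implement only the two methods the action calls (`utter_message`, `get_slot`); LocationSearchAction mirrors Location_search_action but does not subclass rasa_core's Action, so it runs without rasa_core; the lookup table is made-up sample data.

```python
from typing import Dict, List, Optional


class LocationSearchAPI(object):
    """Backend lookup, kept separate so it can be tested on its own."""

    def __init__(self, locations: Optional[Dict[str, str]] = None):
        # Hypothetical sample data: location name -> service point address
        self.locations = locations or {}

    def search(self, info: Optional[str]) -> str:
        if info and info in self.locations:
            return self.locations[info]
        return '我找不到據點也QQ'  # "I can't find any service location QQ"


class FakeDispatcher(object):
    """Records messages instead of sending them to a user."""

    def __init__(self):
        self.messages: List[str] = []

    def utter_message(self, text: str) -> None:
        self.messages.append(text)


class FakeTracker(object):
    """Holds slot values in a dict."""

    def __init__(self, slots: Dict[str, str]):
        self.slots = slots

    def get_slot(self, key: str) -> Optional[str]:
        return self.slots.get(key)


class LocationSearchAction(object):
    """Same logic as Location_search_action, minus the rasa_core base class."""

    def __init__(self, api: Optional[LocationSearchAPI] = None):
        # No-argument construction still works; tests can inject a prepared API
        self.api = api if api is not None else LocationSearchAPI()

    def name(self) -> str:
        return 'location_search_action'

    def run(self, dispatcher, tracker, domain):
        dispatcher.utter_message(self.api.search(tracker.get_slot('location')))
        return []
```

Injecting the API through the constructor is a design choice for testing; the default keeps the no-argument construction that the original class relies on.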

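6–3. Some useful functions inside `def run`

Reply with a template defined in the domain (extra keyword arguments such as my_variable fill placeholders in the template)

dispatcher.utter_template(utter_reply, my_variable='some')

Send a plain message

dispatcher.utter_message(msg)

Get a slot's value

tracker.get_slot(<slot key>)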

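6–4. What `rasa_core.events` are for

Events are what def run returns.
By returning them you can change the chatbot's state; they are applied one by one, in order.
Some examples:
SlotSet(key, value=None, timestamp=None): sets a slot's value
Reminder: notifies the user after a given amount of time; for details see https://core.rasa.com/scheduling.html
For more examples, see https://core.rasa.com/api/events.html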
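To illustrate "applied one by one, in order", here is a toy model in plain Python, not rasa_core's tracker: a hypothetical SlotSet dataclass that mirrors only the key and value of rasa_core.events.SlotSet, and a function that applies a list of them to a slot dict. In a real action you would import SlotSet from rasa_core.events and return, e.g., `[SlotSet('warranty_date', date)]` from `run`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SlotSet:
    """Toy stand-in for rasa_core.events.SlotSet (key and value only)."""
    key: str
    value: Any = None


def apply_events(slots: Dict[str, Any], events: List[SlotSet]) -> Dict[str, Any]:
    """Apply events in order; a later SlotSet for the same key wins."""
    state = dict(slots)
    for event in events:
        state[event.key] = event.value
    return state
```

The ordering matters: if an action returns two SlotSet events for the same slot, the last one determines the final value.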

7. Train Core

7–1 Training

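Below is an example function. domain_file is domain.yml, model_path is the folder where the trained model will be saved, and story_file is story.md, which holds the example dialogues.
policies are the methods used to train the model; the official docs say these two work well.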

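from rasa_core.agent import Agent
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.policies.keras_policy import KerasPolicy


def train_dialogue(domain_file, model_path, story_file):
    # MemoizationPolicy memorizes the stories; KerasPolicy (an LSTM) generalizes beyond them
    agent = Agent(domain_file, policies=[MemoizationPolicy(), KerasPolicy()])
    agent.train(story_file,
                max_history=3,
                epochs=400,
                batch_size=10,
                validation_split=0.2
               )
    agent.persist(model_path)
    return agent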
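max_history=3 controls how many past dialogue turns the policy looks at when predicting the next action. A deliberately simplified illustration in plain Python, not how rasa_core featurizes trackers (the real featurizer encodes intents, entities, slots and previous actions as vectors): each bot action becomes a prediction target paired with at most max_history preceding events. The `*`-prefixed strings for user turns mirror the story format, and `history_windows` is a hypothetical name.

```python
from typing import List, Tuple


def history_windows(turns: List[str], max_history: int = 3) -> List[Tuple[Tuple[str, ...], str]]:
    """Pair each bot action with the (at most) max_history events before it."""
    pairs = []
    for i, event in enumerate(turns):
        if event.startswith("*"):
            continue  # user turns are inputs, not prediction targets
        start = max(0, i - max_history)
        pairs.append((tuple(turns[start:i]), event))
    return pairs
```

With a larger max_history the policy can tell apart situations that only differ further back in the conversation, at the cost of needing more stories to cover those longer contexts.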

7–2 Training online

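When there is not enough training data the chatbot is not very smart, but Rasa Core offers online training to help us collect more data.
interpreter is the Rasa NLU model we trained earlier, used to extract the intent and entities from each message.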

# agent main function and policies
from rasa_core.agent import Agent
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.policies.keras_policy import KerasPolicy
# online training: read user input from the console
from rasa_core.channels.console import ConsoleInputChannel
from rasa_core.interpreter import RasaNLUInterpreter


def train_dialogue_online(input_channel, interpreter,
                          domain_file, training_data_file):
    agent = Agent(domain_file,
                  policies=[MemoizationPolicy(), KerasPolicy()],
                  interpreter=interpreter)
    agent.train_online(training_data_file, input_channel=input_channel,
                       max_history=3,
                       batch_size=50,
                       epochs=200,
                       max_training_samples=300)
    return agent


if __name__ == '__main__':
    # Example paths (hypothetical; adjust to your project)
    nlu_model_path = './models/nlu/default/weather'
    domain_file = 'domain.yml'
    training_data_file = 'data/stories.md'
    # interpreter: the Rasa NLU model trained earlier
    nlu_interpreter = RasaNLUInterpreter(nlu_model_path)
    agent = train_dialogue_online(ConsoleInputChannel(), nlu_interpreter, domain_file, training_data_file)

8. Summary

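I haven't used this in a real application yet; I have only gone through it once, generating models from my own data somewhat haphazardly. My impression is that the dialogue logic and the replies need to be designed by dedicated people. If R&D engineers design them directly, it will go badly, and the chatbot may even backfire.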