Building Multi-AI Workflows with AutoGen — Two-Agent Chat and Group Chat
Posted on Sep 23, 2024 in AutoGen by Amo Chen ‐ 11 min read
You have probably heard of AI projects such as ChatDev and Devika. Their defining feature is that multiple AI roles collaborate to get work done, handling diverse tasks such as drafting software specifications, writing code, and testing software.
If your only experience is with ChatGPT, it may be hard to imagine how AI handles work this complex. But once the tasks are divided up the way humans divide labor, with each part handled by a dedicated AI or model, it all becomes reasonable and easy to understand.
This article shows how to use AutoGen to build workflows in which multiple AIs collaborate, so you can better harness the power of AI, build more sophisticated AI workflows, and unlock more ways for AI to assist in everyday work.
Environment
$ pip install pyautogen
Introduction to Agentic Workflows
An agentic workflow is an LLM application technique that AI expert Andrew Ng described in his talk at Sequoia Capital's AI Ascent 2024.
Since what most of us actually use is a method called zero-shot prompting, let's first look at what zero-shot prompting is.
Zero-shot prompting is a technique in which a language model completes a task without being given any examples. For instance, if we simply tell a generative AI "write an essay about Italian cuisine," the prompt provides no examples or additional context about Italian cuisine, so the model produces an answer based solely on its own understanding.
The biggest difference between zero-shot prompting and human behavior is that zero-shot prompting involves no iteration, reflection, or refinement. Taking the "write an essay about Italian cuisine" task as an example, the model simply churns out an essay and stops, whereas a human would review, iterate on, and revise the chapters and paragraphs, and even add references and supporting evidence to make the essay more complete.
An agentic workflow, by contrast, breaks a task into multiple steps so that each step can be iterated on, reflected upon, and improved. Using the same "write an essay about Italian cuisine" task, we might decompose it into:
- Decide on the essay topic
- Outline the essay's sections
- Search for relevant literature
- Read and summarize the literature
- Write the essay
- Review the essay
- Revise the essay
- Finalize the draft
Iteration or reflection steps can be added to the workflow above as needed, which makes the generative AI's behavior closer to how humans work and fixes the shortcoming of zero-shot prompting, namely that it cannot iterate, reflect, or improve.
Moreover, different steps can use different prompts and different language models, either to improve the quality of the final output or to control overall generation cost. For example, we might use a more expensive model for just one critical step and cheaper models for everything else.
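A minimal, framework-agnostic sketch of that idea (the step names and model identifiers below are purely illustrative) might look like this:

# Map each workflow step to a model: a cheap local model for routine steps,
# a more capable (and more expensive) model only where quality matters most.
# Step names and model identifiers are illustrative.
STEP_MODELS = {
    'outline_sections': 'llama3.1:8b',
    'summarize_sources': 'llama3.1:8b',
    'write_draft': 'gpt-4o',
    'review_draft': 'llama3.1:8b',
}

def model_for(step: str) -> str:
    # Fall back to the cheap model for any step not listed explicitly.
    return STEP_MODELS.get(step, 'llama3.1:8b')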
Andrew Ng also describes four design patterns for agentic workflows:
- Reflection: add a reflection step so the model can critique and revise the content it has generated.
- Tool use: add steps that can use tools (or generate and execute code, or call APIs), for example letting the model search Wikipedia.
- Planning: have the model plan the subtasks and the overall process, then execute according to that plan.
- Multi-agent collaboration: have multiple agents work together on a task. The well-known ChatDev follows this pattern: it simulates a virtual software company in which agents play programmers, test engineers, product managers, and so on, and these agents collaborate to complete tasks.
p.s. The four design patterns can be mixed and matched.
In short, an agentic workflow has a distinct divide-and-conquer flavor. Note, however, that agentic workflows still exhibit a certain amount of instability, which is one of the problems generative AI faces in general.
Introduction to AutoGen
AutoGen is an open-source programming framework for building AI agents and facilitating cooperation among multiple agents to solve tasks.
Microsoft's AutoGen is an open-source framework designed for building generative AI applications. Its strength is that it simplifies the implementation of multi-agent collaboration, letting developers build AI applications faster and more efficiently. If you want to build an agentic workflow, AutoGen is a very good fit.
p.s. LangGraph is a similar solution to AutoGen.
The figure below illustrates what AutoGen is for: with AutoGen you can not only implement multi-agent conversations and the common conversation patterns, but also customize your own conversation patterns:
Essential AutoGen Concepts
Before using AutoGen, it is worth understanding one important concept: agents.
Agents & ConversableAgent
AutoGen defines an agent as an entity that can send and receive messages.
An agent can be:
- A language model, such as GPT-4.
- A code executor, such as an IPython kernel, which receives code and sends back the execution results.
- A human, with a person taking part in some portion of the task.
- Other components.
These concepts are encapsulated in the ConversableAgent class, so when working with AutoGen you will mostly be dealing with ConversableAgent.
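For instance, a minimal sketch of the two extremes looks like this (llm_config is a placeholder here; the next section shows how to point it at a local Ollama model):

from autogen import ConversableAgent

# A sketch: the same ConversableAgent class can be backed by a language model
# or by a human participant.
llm_agent = ConversableAgent(
    'Assistant',
    llm_config=llm_config,      # replies are generated by the configured language model
    human_input_mode='NEVER',   # never pause to ask a human for input
)
human_agent = ConversableAgent(
    'Human',
    llm_config=False,           # no language model behind this agent
    human_input_mode='ALWAYS',  # every reply is typed by a human at the terminal
)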
AutoGen with Ollama
Next, let's combine Ollama with AutoGen and learn how to implement multi-agent workflows. With Ollama we can use open-source language models such as Llama without paying for API calls.
After installing Ollama, open a terminal and run the following command to download the Llama 3.1 8B language model:
$ ollama run llama3.1:8b
This way, whenever a ConversableAgent needs a language model, we only have to set its llm_config parameter to the following:
llm_config = {
'config_list': [
{
'model': 'llama3.1:8b',
'base_url': 'http://localhost:11434/v1',
'api_key': 'ollama',  # any value works
}
],
}
Here is some example code:
example.py
from autogen import ConversableAgent
llm_config = {
'config_list': [
{
'model': 'llama3.1:8b',
'base_url': 'http://localhost:11434/v1',
'api_key': 'ollama',  # any value works
'price': [0, 0],  # AutoGen tracks API-call costs; set the price to 0 for a local model
}
],
}
agent = ConversableAgent('Jack', llm_config=llm_config)
reply = agent.initiate_chat(agent, message='Tell me a joke.', max_turns=1)
print(reply)
Running the code produces the output below; you can see that we successfully used Llama 3.1 8B to have the agent tell a joke:
$ python example.py
Jack (to Jack):
Tell me a joke.
-----------------
>>>>>>>> USING AUTO REPLY...
Jack (to Jack):
Here's one:
What do you call a fake noodle?
(Wait for it...)
An impasta!
Hope that made you smile! Do you want to hear another one?
-----------------
The Jack (to Jack) in the output appears because the agent sends the message to itself (asking and answering its own question): the first argument to agent.initiate_chat(agent, message='Tell me a joke.', max_turns=1) is the same agent. If we pass in a different agent with a different name, the call starts a conversation between two different agents.
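For example, creating a second agent (Jill is a hypothetical name, reusing the llm_config above) turns the same call into a conversation between two different agents:

# A sketch: passing a different agent as the first argument starts a two-agent chat
# instead of the self-chat above.
jill = ConversableAgent('Jill', llm_config=llm_config)
reply = agent.initiate_chat(jill, message='Tell me a joke.', max_turns=1)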
A Simple Two-Agent Chat Example — Guess the Number
The previous example had an agent talking to itself; next, let's build a two-agent guess-the-number example.
In the guessing game we design two roles with the following responsibilities:
- Jack tells Cathy whether she has guessed the number. If not, he gives Cathy a hint; once she guesses it, the game stops.
- Cathy guesses the number.
Here is the code for the two-agent guessing game; you can also see each agent's prompt:
guess_number_example.py
from autogen import ConversableAgent
llm_config = {
'config_list': [
{
'model': 'llama3.1:8b',
'base_url': 'http://localhost:11434/v1',
'api_key': 'ollama',  # any value works
'price': [0, 0],
}
],
}
jack = ConversableAgent(
'Jack',
llm_config=llm_config,
system_message='''
You are playing a game called "Guess My Number."
6 is the number, and I will try to guess it.
If I guess a number that is higher than 6, respond with "Too high."
If I guess a number that is lower than 6, respond with "Too low."
No additional description is needed.
''',
is_termination_msg=lambda msg: '6' in msg['content'],
)
cathy = ConversableAgent(
'Cathy',
llm_config=llm_config,
system_message='''
I have a number in mind, and you will try to guess it.
If I respond with "Too high", you should guess a lower number.
If I respond with "Too low", you should guess a higher number.
Please only answer the number which you guess.
No additional description is needed.
''',
)
result = jack.initiate_chat(cathy, message='I have a number between 1 and 10. Guess it!')
The most important part of the code above is:
is_termination_msg=lambda msg: '6' in msg['content'],
The is_termination_msg parameter is a callable that is invoked whenever a message is received and decides whether the conversation should end. It is passed a dictionary with keys such as content, role, and name, which hold the text of the message, the role of the sender, and the name of the sending agent, respectively.
We need to set is_termination_msg because the two agents have to know when, and under what conditions, to end the conversation; otherwise they would keep talking forever. In the guessing game, the conversation can end as soon as Jack receives a message containing the number 6.
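As an aside, a common alternative (a minimal sketch, not part of the game above) is to instruct an agent to emit a fixed keyword such as TERMINATE once it considers the task finished, and to detect that keyword in the callable, so the stop condition does not depend on a specific answer:

# A sketch: terminate on a fixed keyword instead of a specific answer.
# The 'Host' agent and its prompt are hypothetical; llm_config is the same as above.
host = ConversableAgent(
    'Host',
    llm_config=llm_config,
    system_message='Run the game. When the game is over, reply with the single word TERMINATE.',
    is_termination_msg=lambda msg: 'TERMINATE' in (msg.get('content') or ''),
)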
Returning to the guessing game, here is the output. Jack opens the conversation with Cathy, and the two agents then play the game until Cathy guesses the number 6:
$ python guess_number_example.py
Jack (to Cathy):
I have a number between 1 and 10. Guess it!
------
>>>>>>>> USING AUTO REPLY...
Cathy (to Jack):
7
------
>>>>>>>> USING AUTO REPLY...
Jack (to Cathy):
Too high.
------
>>>>>>>> USING AUTO REPLY...
Cathy (to Jack):
5
------
>>>>>>>> USING AUTO REPLY...
Jack (to Cathy):
Too low.
------
>>>>>>>> USING AUTO REPLY...
Cathy (to Jack):
8
------
>>>>>>>> USING AUTO REPLY...
Jack (to Cathy):
Too high.
------
>>>>>>>> USING AUTO REPLY...
Cathy (to Jack):
6
------
This conversation pattern between two agents is what the AutoGen documentation calls a two-agent chat. It is the simplest and easiest pattern to understand, and it fits question-answering scenarios well, for example one agent playing a teacher and the other a student; we can then summarize the interaction between the teacher agent and the student agent to produce the final content.
The following code simulates a student agent asking a teacher agent about the triangle inequality; feel free to try it out:
from autogen import ConversableAgent
llm_config = {
'config_list': [
{
'model': 'llama3.1:8b',
'base_url': 'http://localhost:11434/v1',
'api_key': 'ollama',  # any value works
'price': [0, 0],
}
],
}
student_agent = ConversableAgent(
name="Student_Agent",
system_message="You are a student willing to learn.",
llm_config=llm_config,
)
teacher_agent = ConversableAgent(
name="Teacher_Agent",
system_message="You are a math teacher.",
llm_config=llm_config,
)
chat_result = student_agent.initiate_chat(
teacher_agent,
message="What is triangle inequality?",
summary_method="reflection_with_llm",
max_turns=2,
)
print(chat_result.summary)
Two parameters in the student's initiate_chat call are worth noting: summary_method and max_turns.
The summary_method parameter sets how the whole conversation is summarized. The default is last_msg, which simply uses the last message as the summary; if it is set to reflection_with_llm, the summary is produced by the language model instead. The summary can be read from the chat result's summary attribute, as in chat_result.summary in the example above.
The max_turns parameter sets how many turns (one question plus one answer counts as a turn) the conversation runs before it ends. Because a question-answering scenario lacks a clear-cut stopping condition like the one in the guessing game, AutoGen provides max_turns as another way to end a conversation.
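Besides summary, the chat result returned by initiate_chat also carries the full message history, which is handy for debugging. A minimal sketch (attribute names follow the ChatResult object returned by initiate_chat; verify them against your AutoGen version):

# Inspect the result of the two-agent chat.
print(chat_result.summary)  # the summary produced by reflection_with_llm
for message in chat_result.chat_history:
    # Each entry is a dict with keys such as 'content' and 'role'.
    print(message.get('role'), ':', message.get('content'))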
A Note on the Guessing Game
If you widen the range to 1 through 100, you will find that Llama 3.1 8B handles numbers poorly and frequently gets the comparison wrong, claiming for example that 75 is lower than 55 or that 62 is greater than 72.
When it comes to numerical reasoning, OpenAI's language models are still the strongest, which may also be why the official AutoGen examples use OpenAI's models.
A Simple Group Chat Example — Proofreading an English Email
Having covered the two-agent chat pattern, let's move on to the group chat pattern.
In a group chat, more than two agents can collaborate on a task. One agent plays the role of manager, the GroupChatManager. The manager assigns work to one of the agents, that agent reports its result back to the manager, and the manager then sends messages to the other agents to update the task's progress, so that after this step every agent knows the current state of the work. The manager then assigns the next agent to continue, and the cycle repeats until the conversation completes, as shown in the figure below:
The manager currently supports five strategies for choosing the next speaker:
- auto, the default strategy: the language model chooses the next speaker.
- random: a speaker is chosen at random.
- manual: a human manually picks the next speaker.
- round_robin: the agents take turns in a fixed order (a configuration sketch follows this list).
- A custom callable, which must accept two arguments: the agent that spoke last and the group chat the agents belong to.
Now let's build a group chat that proofreads an English email!
In this example, besides the manager, there are three agents:
- Grammar checker, the agent that finds grammatical errors
- Writer, the agent that re-edits the English email
- Reviewer, the agent that points out wording that is not plain enough
Ideally the group chat proceeds like this: the grammar checker reviews the email first, the writer re-edits it, the reviewer goes over it once more, and the writer finally edits it one last time. The word "ideally" is deliberate: the group chat manager's default speaker selection strategy is auto, which leans heavily on the language model's capability, and the Llama 3.1 8B model used in this article is unstable enough that it does not always follow the order we expect. This is a problem you will frequently run into when implementing multi-agent collaboration with language models.
To work around this instability in speaker selection, the group chat example below uses a custom callable to keep the order in which agents are assigned work consistent:
group_chat_example.py
from autogen import ConversableAgent, GroupChat, GroupChatManager
llm_config = {
'config_list': [
{
'model': 'llama3.1:8b',
'base_url': 'http://localhost:11434/v1',
'api_key': 'ollama',  # any value works
'price': [0, 0],
}
],
}
grammar_checker = ConversableAgent(
'Grammar Checker',
llm_config=llm_config,
system_message='''
You are a Grammar Checker.
Your task is to carefully review the provided English text and identify any grammatical errors.
For each error, explain what the mistake is and suggest the correct form.
''',
)
writer = ConversableAgent(
'Writer',
llm_config=llm_config,
system_message='''
You are a Writer.
Your task is to take the provided English text and re-edit it to improve its clarity, flow, and readability while maintaining the original meaning.
Ensure that the tone remains appropriate for the context.
''',
)
reviewer = ConversableAgent(
'Reviewer',
llm_config=llm_config,
system_message='''
You are a Reviewer.
Your task is to analyze the given English text and point out any words or phrases that are not simple or clear enough.
Suggest simpler alternatives that would be easier for a broader audience to understand.
''',
)
def select_speaker(last_speaker, group_chat):
    # After the grammar checker or the reviewer speaks, hand the text to the writer;
    # otherwise (i.e., after the writer speaks) send it to the reviewer.
    if last_speaker is grammar_checker:
        return writer
    if last_speaker is reviewer:
        return writer
    return reviewer
group_chat = GroupChat(
agents=[grammar_checker, writer, reviewer],
speaker_selection_method=select_speaker,
messages=[],
max_round=4,
)
group_chat_manager = GroupChatManager(
groupchat=group_chat,
llm_config=llm_config,
)
group_chat_manager.initiate_chat(
grammar_checker,
message='''
Please help me rewrite the email enclosed within the triple quotes below:
"""
Hi,
This is Amo. Nice to e-meet you. Hope you are doing well.
I am writing this email to notify you that I might have a great AI project idea you would like.
If you are intresting, please let me know.
Best,
Amo
"""
'''
)
Let's walk through the key parts of the code.
First, we define three agents: grammar_checker, writer, and reviewer.
Next, we define a speaker selection function, select_speaker, which lets the group chat manager pick the next agent: whenever the previous agent is grammar_checker or reviewer, the next agent to take over must be writer:
def select_speaker(last_speaker, group_chat):
    # After the grammar checker or the reviewer speaks, hand the text to the writer;
    # otherwise (i.e., after the writer speaks) send it to the reviewer.
    if last_speaker is grammar_checker:
        return writer
    if last_speaker is reviewer:
        return writer
    return reviewer
Then we create a group chat, register the three agents we defined together with the speaker selection strategy, and set max_round=4, meaning the agents each exchange one round with the group chat manager in the order grammar_checker > writer > reviewer > writer, for a total of four rounds:
group_chat = GroupChat(
agents=[grammar_checker, writer, reviewer],
speaker_selection_method=select_speaker,
messages=[],
max_round=4,
)
Finally, the group chat manager initiates a chat with grammar_checker to kick off the email-proofreading task. The output is shown below. You can see the interaction among grammar_checker, writer, and reviewer: grammar_checker points out the grammatical errors, reviewer suggests plainer wording, and writer produces a final email that is concise and easy to understand:
chat_manager (to Grammar Checker):
Please help me rewrite the email enclosed within the triple quotes below:
"""
Hi,
This is Amo. Nice to e-meet you. Hope you are doing well.
I am writing this email to notify you that I might have a great AI project idea you would like.
If you are intresting, please let me know.
Best,
Amo
"""
---------------
>>>>>>>> USING AUTO REPLY...
Grammar Checker (to chat_manager):
Here's a rewritten version of the email with grammatical errors corrected:
"""
Hi,
This is Amo. Nice to meet you. Hope you're doing well.
I am writing this email to inform you that I might have a great AI project idea that you'd like.
If you are interesting, please let me know.
Best,
Amo
"""
Here are the corrections made:
1. "e-meet" -> "meet": The correct spelling is just "meet", as it's an action of making someone else's acquaintance.
2. Missing article ("the") before "great AI project idea": We should add "a" or "an" depending on context, in this case, we'll stick with "a" as the first word is "I might".
3. Comma splice: Changed "If you are intresting" to "If you are interesting,". Spelling correction from "intresting" (should be "interesting") and also added a comma at the end of this sentence.
4. Punctuation mistake: Added a period after "well".
---------------
Next speaker: Writer
>>>>>>>> USING AUTO REPLY...
Writer (to chat_manager):
You've already made some great corrections, but I'll tidy up the revised email to ensure optimal clarity and flow:
"""
Hi,
This is Amo. Nice to meet you. Hope you're doing well.
I am writing this email to inform you that I might have a great AI project idea that you'd like.
If you are interested, please let me know.
Best,
Amo
"""
Here's where we made the changes:
* Swapped "e-meet" for just "meet", as it's an action of making someone else's acquaintance.
* Added the missing article ("the") before "great AI project idea" with "a", since it comes after an introductory phrase ("I might").
* Changed the original "if you are intresting," to "If you are interested," for correct spelling and punctuation.
* Added a period after "well" to complete the sentence.
Thanks for sharing your revised email!
---------------
Next speaker: Reviewer
>>>>>>>> USING AUTO REPLY...
Reviewer (to chat_manager):
Great effort in rewriting the email with corrections!
However, I have some suggestions to further improve clarity and simplicity:
1. The phrase "I might have a great AI project idea that you'd like" could be rephrased as "I think I've come up with an interesting AI project idea." This change makes it clearer what Amo is saying.
2. Instead of using the formal-sounding term "acquaintance," you could simply say "nice to meet you!" to make the greeting more conversational.
Here's the revised text:
"""
Hi,
This is Amo. Nice to meet you! Hope you're doing well.
I think I've come up with an interesting AI project idea. If you are interested, please let me know.
Best,
Amo
"""
Additional suggestions:
* Consider using a more active and descriptive opening sentence instead of "Hope you're doing well." This could be something like "How's your day going?"
* The final paragraph is mostly about describing the corrections made to the original text. You might want to omit this part and focus on explaining why certain changes were made (e.g., "I've rewritten the email for clarity").
Let me know if you'd like me to review any other aspects of the text!
---------------
Next speaker: Writer
>>>>>>>> USING AUTO REPLY...
Writer (to chat_manager):
Your suggestions are golden!
You're right, rephrasing the second sentence as "I think I've come up with an interesting AI project idea" makes it clearer and more concise. And instead of using a formal tone, keeping it simple with just "nice to meet you!" adds a touch of friendliness.
The revised text flows nicely:
"""
Hi,
This is Amo. Nice to meet you! How's your day going?
I think I've come up with an interesting AI project idea. If you are interested, please let me know.
Best,
Amo
"""
Your additional suggestions are spot on:
* Using a more active opening sentence like "How's your day going?" breaks the ice and makes the email feel more approachable.
* Omitting the paragraph about describing corrections makes the rewritten text focus on clarity and readability, which is what matters most!
Thank you so much for your input!
---------------
That wraps up the AutoGen group chat conversation pattern. Group chat is a great fit for scenarios where multiple agents need to collaborate. Beyond the examples in this article, AutoGen provides many more examples for developers to learn from; the official Group Chat example, for instance, uses the built-in UserProxyAgent class to execute Python code generated by the language model, fetch data through an API, and summarize it.
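For a rough idea of what that looks like, here is a minimal sketch of a code-executing UserProxyAgent (the settings are illustrative, not the official example itself):

from autogen import UserProxyAgent

# A sketch: this agent extracts code blocks from incoming messages, runs them
# locally, and replies with the execution result.
user_proxy = UserProxyAgent(
    name='user_proxy',
    human_input_mode='NEVER',  # run fully automatically, without asking a human
    code_execution_config={
        'work_dir': 'coding',   # generated code is written and executed here
        'use_docker': False,    # set to True to isolate execution inside a container
    },
)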
Conclusion
Due to length constraints, this article only covers the AutoGen framework and two of its conversation patterns, two-agent chat and group chat, to give readers a feel for how an agentic workflow operates.
We will cover more of AutoGen's features in the next article.
That's all!
Enjoy!
References
What’s next for AI agentic workflows ft. Andrew Ng of AI Fund
OpenAI compatibility · Ollama Blog