GPT-4.1 Prompting Guide
GPT-4.1 系列模型在编码能力、指令遵循性以及长上下文处理方面,相比 GPT-4o 实现了显著升级。在本提示词指南中,我们汇总了大量内部测试中总结出的重要提示词技巧,帮助开发者最大化利用这一全新模型家族的增强能力。
许多常见的最佳实践依然适用于 GPT-4.1,例如提供上下文示例、使指令尽可能具体清晰,以及通过提示词引导模型进行规划,以最大化模型智能。不过,若想真正发挥这一模型的潜力,建议你对原有提示词进行适应性迁移。GPT-4.1 的训练目标是更紧密、更字面地遵循指令,区别于以往更倾向于自由推测意图的前代模型。这同样意味着,只要你的提示词规范且明确,GPT-4.1 对行为引导的响应会非常灵敏——如果模型反应与预期不相符,仅需补充一句明确、坚决的说明,几乎总能让它回归你期望的轨道。
请继续阅读下文,查阅可供参考的提示词范例。请记住,这些指南虽然具有强通用性,但没有一套建议能适用于所有场景。AI 工程本质是实证科学,大语言模型也天然具有不确定性;所以,除了遵循本指南,也强烈建议你建立有效的评测体系,并不断迭代优化,以确保你的提示词工程真正带来实际成效。
1. 智能体工作流(Agentic Workflows)
GPT-4.1 特别适合构建智能体工作流。在模型训练时,我们强调为其提供丰富多样的智能体问题求解轨迹。我们为模型设计的智能体架构在 SWE-bench Verified 这一基准的非推理模型中取得了业界领先的表现,成功解决了 55% 的问题。
系统提示词提醒
为充分发挥 GPT-4.1 的智能体能力,建议在所有智能体提示词中包含三类关键提醒。以下示例针对智能体编码场景优化,同时也很容易扩展到通用智能体案例。
- Persistence(持续性):确保模型意识到它即将进入多轮对话,避免过早地把控制权交还用户。比如:
You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved.
- Tool-calling(工具调用):鼓励模型充分利用可用工具,减少凭空猜测。比如:
If you are not sure about file content or codebase structure pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.
- Planning(规划,可选):如有需要,确保模型在每次调用工具前后都以文本形式进行规划和反思,而不是单纯串联工具调用。比如:
You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.
GPT-4.1 在智能体场景下对用户指令与系统提示响应尤为敏感。模型严格遵循这三条简单提示后,我们在 SWE-bench Verified 的得分提升了近 20%——因此非常建议你在所有智能体提示词开头明确涵盖这三大类提醒。总体而言,这三条指引能让模型从“聊天机器人”变为真正“主动进取”、自主推进交互的智能体。
工具调用
相较以往模型,GPT-4.1 针对 OpenAI API 请求参数(tools 字段)中的工具有效利用能力进行过更多训练。我们建议开发者只通过 tools 字段提供工具,而不要像过去某些用法那样手动将工具描述写进 prompt 内再自建解析器。这样更能减少出错、确保模型在工具调用过程中保持稳定分布——我们的实验显示,仅用 API 方式解析工具描述可令 SWE-bench Verified 的通过率提升约 2%。
开发者应为工具合理命名,以准确表达功能,并为工具的 description 字段写明清晰详细的用途说明;工具的每个参数(param)也应采用易懂命名和高质量描述,保障合理调用。如果你的工具较复杂且需要使用示例,建议在系统提示词内创建 # Examples 区块,将用例放入此处,而不要仍旧塞进 description 字段——后者尽量保持精炼且详尽。合适的例子可以帮助模型理解工具何时可用、调用时需否加入用户文本,以及不同输入应选用哪些参数。你还可以在 Prompt Playground 里用 “Generate Anything” 功能,为你的新工具定义获得良好起点。
提示词引导的规划与思维链(chain-of-thought)
如前文所述,开发者可在 prompt 中选择引导 GPT-4.1 驱动的智能体在每次调用工具前后主动“规划-反思”,而不是静默串联工具调用。GPT-4.1 并非推理型模型——即它不会自动在内部形成 chain-of-thought(思维链/推理链);但通过提示词,比如可选的 Planning 组件,开发者完全可以促使模型明确输出分步的操作规划。这相当于让模型“边思考边表达”。我们在 SWE-bench Verified 智能体实验中发现,显式引导规划能令通过率提升约 4%。
SWE-bench Verified 提示词示例
下面我们分享的是当前在 SWE-bench Verified 任务中获得最高分的智能体 prompt 模板,其特色在于对工作流和问题解决策略的详细说明。这一套路可灵活套用于各种智能体相关任务。
from openai import OpenAI
import os
client = OpenAI(
api_key=os.environ.get(
"OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"
)
)
SYS_PROMPT_SWEBENCH = """
You will be tasked to fix an issue from an open-source repository.
Your thinking should be thorough and so it's fine if it's very long. You can think step by step before and after each action you decide to take.
You MUST iterate and keep going until the problem is solved.
You already have everything you need to solve this problem in the /testbed folder, even without internet connection. I want you to fully solve this autonomously before coming back to me.
Only terminate your turn when you are sure that the problem is solved. Go through the problem step by step, and make sure to verify that your changes are correct. NEVER end your turn without having solved the problem, and when you say you are going to make a tool call, make sure you ACTUALLY make the tool call, instead of ending your turn.
THE PROBLEM CAN DEFINITELY BE SOLVED WITHOUT THE INTERNET.
Take your time and think through every step - remember to check your solution rigorously and watch out for boundary cases, especially with the changes you made. Your solution must be perfect. If not, continue working on it. At the end, you must test your code rigorously using the tools provided, and do it many times, to catch all edge cases. If it is not robust, iterate more and make it perfect. Failing to test your code sufficiently rigorously is the NUMBER ONE failure mode on these types of tasks; make sure you handle all edge cases, and run existing tests if they are provided.
You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.
# Workflow
## High-Level Problem Solving Strategy
1. Understand the problem deeply. Carefully read the issue and think critically about what is required.
2. Investigate the codebase. Explore relevant files, search for key functions, and gather context.
3. Develop a clear, step-by-step plan. Break down the fix into manageable, incremental steps.
4. Implement the fix incrementally. Make small, testable code changes.
5. Debug as needed. Use debugging techniques to isolate and resolve issues.
6. Test frequently. Run tests after each change to verify correctness.
7. Iterate until the root cause is fixed and all tests pass.
8. Reflect and validate comprehensively. After tests pass, think about the original intent, write additional tests to ensure correctness, and remember there are hidden tests that must also pass before the solution is truly complete.
Refer to the detailed sections below for more information on each step.
## 1. Deeply Understand the Problem
Carefully read the issue and think hard about a plan to solve it before coding.
## 2. Codebase Investigation
- Explore relevant files and directories.
- Search for key functions, classes, or variables related to the issue.
- Read and understand relevant code snippets.
- Identify the root cause of the problem.
- Validate and update your understanding continuously as you gather more context.
## 3. Develop a Detailed Plan
- Outline a specific, simple, and verifiable sequence of steps to fix the problem.
- Break down the fix into small, incremental changes.
## 4. Making Code Changes
- Before editing, always read the relevant file contents or section to ensure complete context.
- If a patch is not applied correctly, attempt to reapply it.
- Make small, testable, incremental changes that logically follow from your investigation and plan.
## 5. Debugging
- Make code changes only if you have high confidence they can solve the problem
- When debugging, try to determine the root cause rather than addressing symptoms
- Debug for as long as needed to identify the root cause and identify a fix
- Use print statements, logs, or temporary code to inspect program state, including descriptive statements or error messages to understand what's happening
- To test hypotheses, you can also add test statements or functions
- Revisit your assumptions if unexpected behavior occurs.
## 6. Testing
- Run tests frequently using `!python3 run_tests.py` (or equivalent).
- After each change, verify correctness by running relevant tests.
- If tests fail, analyze failures and revise your patch.
- Write additional tests if needed to capture important behaviors or edge cases.
- Ensure all tests pass before finalizing.
## 7. Final Verification
- Confirm the root cause is fixed.
- Review your solution for logic correctness and robustness.
- Iterate until you are extremely confident the fix is complete and all tests pass.
## 8. Final Reflection and Additional Testing
- Reflect carefully on the original intent of the user and the problem statement.
- Think about potential edge cases or scenarios that may not be covered by existing tests.
- Write additional tests that would need to pass to fully validate the correctness of your solution.
- Run these new tests and ensure they all pass.
- Be aware that there are additional hidden tests that must also pass for the solution to be successful.
- Do not assume the task is complete just because the visible tests pass; continue refining until you are confident the fix is robust and comprehensive.
"""
PYTHON_TOOL_DESCRIPTION = """This function is used to execute Python code or terminal commands in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 60.0 seconds. Internet access for this session is disabled. Do not make external web requests or API calls as they will fail. Just as in a Jupyter notebook, you may also execute terminal commands by calling this function with a terminal command, prefaced with an exclamation mark.
In addition, for the purposes of this task, you can call this function with an `apply_patch` command as input. `apply_patch` effectively allows you to execute a diff/patch against a file, but the format of the diff specification is unique to this task, so pay careful attention to these instructions. To use the `apply_patch` command, you should pass a message of the following structure as "input":
%%bash
apply_patch <<"EOF"
*** Begin Patch
[YOUR_PATCH]
*** End Patch
EOF
Where [YOUR_PATCH] is the actual content of your patch, specified in the following V4A diff format.
*** [ACTION] File: [path/to/file] -> ACTION can be one of Add, Update, or Delete.
For each snippet of code that needs to be changed, repeat the following:
[context_before] -> See below for further instructions on context.
- [old_code] -> Precede the old code with a minus sign.
+ [new_code] -> Precede the new, replacement code with a plus sign.
[context_after] -> See below for further instructions on context.
For instructions on [context_before] and [context_after]:
- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first change's [context_after] lines in the second change's [context_before] lines.
- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:
@@ class BaseClass
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
- If a code block is repeated so many times in a class or function such that even a single @@ statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance:
@@ class BaseClass
@@ def method():
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
Note, then, that we do not use line numbers in this diff format, as the context is enough to uniquely identify code. An example of a message that you might pass as "input" to this function, in order to apply a patch, is shown below.
%%bash
apply_patch <<"EOF"
*** Begin Patch
*** Update File: pygorithm/searching/binary_search.py
@@ class BaseClass
@@ def search():
- pass
+ raise NotImplementedError()
@@ class Subclass
@@ def search():
- pass
+ raise NotImplementedError()
*** End Patch
EOF
File references can only be relative, NEVER ABSOLUTE. After the apply_patch command is run, python will always say "Done!", regardless of whether the patch was successfully applied or not. However, you can determine if there are issue and errors by looking at any warnings or logging lines printed BEFORE the "Done!" is output.
"""
python_bash_patch_tool = {
"type": "function",
"name": "python",
"description": PYTHON_TOOL_DESCRIPTION,
"parameters": {
"type": "object",
"strict": True,
"properties": {
"input": {
"type": "string",
"description": " The Python code, terminal command (prefaced by exclamation mark), or apply_patch command that you wish to execute.",
}
},
"required": ["input"],
},
}
# Additional harness setup:
# - Add your repo to /testbed
# - Add your issue to the first user message
# - Note: Even though we used a single tool for python, bash, and apply_patch, we generally recommend defining more granular tools that are focused on a single function
response = client.responses.create(
instructions=SYS_PROMPT_SWEBENCH,
model="gpt-4.1-2025-04-14",
tools=[python_bash_patch_tool],
input=f"Please answer the following question:\nBug: Typerror..."
)
response.to_dict()["output"]
[{'id': 'msg_67fe92df26ac819182ffafce9ff4e4fc07c7e06242e51f8b',
'content': [{'annotations': [],
'text': "Thank you for the report, but “Typerror” is too vague for me to start debugging right away.\n\n**To make progress, I need to:**\n1. Find the exact error message text (e.g. `'TypeError: ...'`).\n2. Find which file and which line/function/class the error occurred in.\n3. Figure out what triggered the error (test file, usage, reproduction steps).\n4. Find the root cause and details.\n\n**Next steps:**\n- Investigate error/log/test output files for a Python `TypeError` message.\n- Examine the relevant code sections for problematic type usage.\n- If possible, reproduce the bug locally.\n\n**Plan:**\n- First, I will search for test files and log output in the `/testbed` directory that may contain the full error message and stack trace.\n\nLet’s start by listing the contents of the `/testbed` directory to look for clues.",
'type': 'output_text'}],
'role': 'assistant',
'status': 'completed',
'type': 'message'},
{'arguments': '{"input":"!ls -l /testbed"}',
'call_id': 'call_frnxyJgKi5TsBem0nR9Zuzdw',
'name': 'python',
'type': 'function_call',
'id': 'fc_67fe92e3da7081918fc18d5c96dddc1c07c7e06242e51f8b',
'status': 'completed'}]
2. 长上下文(Long context)
GPT-4.1 拥有强大的 100 万 Token 输入上下文窗口,对于各种长上下文任务都非常实用,比如结构化文档解析、重排序、筛选关键信息(忽略无关上下文)、以及基于上下文进行多跳推理等。
最佳上下文容量
在 needle-in-a-haystack(大海捞针)测试中,即便输入长达 100 万 Token,模型依然表现出色。在包含大量相关与无关代码或文档的复杂任务上,我们同样观察到了非常强的性能。不过,随着需要检索的内容越来越多,或者任务对整段上下文状态的全面复杂推理(例如图搜索)有更高要求时,长上下文的性能有可能有所下降。
调节对上下文的依赖
考虑一下,为解题所需,究竟更需要外部世界知识,还是模型自身的内部知识。有时,模型需要利用自有知识连接概念或者进行逻辑跳跃;但在某些场景下,期望模型只依赖你给出的上下文进行回答。
# Instructions
// for internal knowledge
- Only use the documents in the provided External Context to answer the User Query. If you don't know the answer based on this context, you must respond "I don't have the information needed to answer that", even if a user insists on you answering the question.
// For internal and external knowledge
- By default, use the provided external context to answer the User Query, but if other basic knowledge is needed to answer, and you're confident in the answer, you can use some of your own knowledge to help answer the question.
提示词组织建议
在长上下文场景下,prompt 中指令和上下文的摆放顺序会影响最终效果。如果 prompt 中包含很长的上下文,建议你把指令分别放在上下文的首尾,实测这种办法效果最佳;如果只打算写一次指令,放在上下文上方的效果一般要优于下方。
3. 思维链(Chain of Thought)
如前所述,GPT-4.1 并不是推理型模型,但通过提示词引导模型进行一步步思考(即“思维链/chain of thought”),可以有效将问题拆分成更易处理的步骤,逐一求解,从而提升整体输出质量。当然,这种方法会用掉更多输出 Token,因而带来更高的成本和延迟。GPT-4.1 已训练具备较强的智能体推理与现实问题解决能力,所以通常无需过多引导,即可获得不错的推理表现。
我们推荐你在提示词末尾加入以下基础思维链指令:
……首先,请一步步认真思考,为了回答用户问题需要用到哪些文档。然后,打印每份文档的 TITLE 和 ID。最后,将所有 ID 格式化为一个列表。
接下来,你应该结合自己的样例与评测结果,对失败的案例进行复盘,并在 prompt 中通过更细化的指令,重点解决系统性规划或推理上的常见问题。非约束性的 CoT prompt 可能会带来不同的解题策略;如果你发现某种策略效果很好,可以将其写入 prompt,用于固化思路。常见出错点一般包括对用户意图理解不够、上下文收集和分析不足、或逐步推理不全面,所以需要有针对性地用更具体的指令加以完善。
下面是一个提示词示例,要求模型在回答前更加有条理地分析用户意图,并充分考虑相关上下文。
# Reasoning Strategy
1. Query Analysis: Break down and analyze the query until you're confident about what it might be asking. Consider the provided context to help clarify any ambiguous or confusing information.
2. Context Analysis: Carefully select and analyze a large set of potentially relevant documents. Optimize for recall - it's okay if some are irrelevant, but the correct documents must be in this list, otherwise your final answer will be wrong. Analysis steps for each:
a. Analysis: An analysis of how it may or may not be relevant to answering the query.
b. Relevance rating: [high, medium, low, none]
3. Synthesis: summarize which documents are most relevant and why, including all documents with a relevance rating of medium or higher.
# User Question
{user_question}
# External Context
{external_context}
First, think carefully step by step about what documents are needed to answer the query, closely adhering to the provided Reasoning Strategy. Then, print out the TITLE and ID of each document. Then, format the IDs into a list.
4. 指令遵循(Instruction Following)
GPT-4.1 展现出了卓越的指令遵循能力,开发者可以利用这一特性,精准地塑造和控制模型在特定用例下的输出表现。开发者通常会在提示词中大量明确 AI 智能体的推理步骤、回复语气和风格、工具调用信息、输出格式、需要回避的话题等内容。由于模型会更严格地按照指令执行,开发者在“该做”和“不该做”的内容上需要更明确的说明。此外,为其他模型优化过的提示词,可能在 GPT-4.1 上无法直接适用,因为它对现有指令的遵循更加精确,不再像以前那样能自动推断隐含规则。
推荐工作流程
以下是我们针对在提示词中开发和调试指令的推荐工作流程:
- 先写一个涵盖整体的“响应规则”或“指令”章节,给出总体指导以及要点。
- 如需调整更具体的行为,新增相关类别的章节,详细说明,比如
# 样例短语。 - 如果你希望模型在工作流程中遵循特定步骤,添加有序列表,并明确要求模型按照这些步骤执行。
- 如果行为仍不符合预期:
- 检查是否存在冲突、描述不明或错误的指令和示例。如果有多条冲突的指令,GPT-4.1 通常会优先遵循提示词末尾的那一条。
- 增加能体现理想行为的示例,确保所有重要行为在规则中都有对应体现。
- 一般不需要使用全部大写或奖励、提示等激励方式。建议初始时不要使用这些手段,只有在确有需要的情况下再考虑。注意,如果现有提示词包含这些技巧,可能会导致 GPT-4.1 对它们过度关注。
使用你喜欢的 AI 驱动 IDE 能有效地迭代提示词,比如检查一致性或冲突、增加示例,或对指令进行统一更新(如新增规则并同步到所有相关内容)。
常见故障模式
这些问题不只存在于 GPT-4.1,列举如下以方便调试和规避:
- 强制模型“始终”执行某项行为,有时会带来副作用。例如,如果让 AI 智能体“在回复用户前必须调用工具”,模型可能会凭空生成工具输入,或在信息不足时用空参数调用工具。加入“如果信息不足以调用工具,请向用户补充询问”这类指令可以缓解该问题。
- 提供范例短语后,模型可能直接逐字使用这些例句,让回复显得重复。可在指令中加入“适当变换措辞”等说明来减少重复。
- 缺乏具体指令时,部分模型可能会额外写很多解释性文字,或者输出过多格式化内容。通过明确指令并添加示例,可有效减少这种情况。
示例提示词:客户服务
本例展示了为虚拟客户服务 AI 智能体设计的最佳实践提示词。其规则丰富、条理分明,并针对不同内容设置了专项细则,还通过示例来呈现全部规则的应用细节。
你可以尝试运行下方的 notebook 单元格——你会看到一条用户消息以及一次工具调用。用户消息会以问候开头,然后复述其回答,并提到即将调用工具。可以通过修改指令,调整模型的具体表现,或者尝试输入不同的用户消息,检验指令的执行效果。
SYS_PROMPT_CUSTOMER_SERVICE = """You are a helpful customer service agent working for NewTelco, helping a user efficiently fulfill their request while adhering closely to provided guidelines.
# Instructions
- Always greet the user with "Hi, you've reached NewTelco, how can I help you?"
- Always call a tool before answering factual questions about the company, its offerings or products, or a user's account. Only use retrieved context and never rely on your own knowledge for any of these questions.
- However, if you don't have enough information to properly call the tool, ask the user for the information you need.
- Escalate to a human if the user requests.
- Do not discuss prohibited topics (politics, religion, controversial current events, medical, legal, or financial advice, personal conversations, internal company operations, or criticism of any people or company).
- Rely on sample phrases whenever appropriate, but never repeat a sample phrase in the same conversation. Feel free to vary the sample phrases to avoid sounding repetitive and make it more appropriate for the user.
- Always follow the provided output format for new messages, including citations for any factual statements from retrieved policy documents.
- If you're going to call a tool, always message the user with an appropriate message before and after calling the tool.
- Maintain a professional and concise tone in all responses, and use emojis between sentences.
- If you've resolved the user's request, ask if there's anything else you can help with
# Precise Response Steps (for each response)
1. If necessary, call tools to fulfill the user's desired action. Always message the user before and after calling a tool to keep them in the loop.
2. In your response to the user
a. Use active listening and echo back what you heard the user ask for.
b. Respond appropriately given the above guidelines.
# Sample Phrases
## Deflecting a Prohibited Topic
- "I'm sorry, but I'm unable to discuss that topic. Is there something else I can help you with?"
- "That's not something I'm able to provide information on, but I'm happy to help with any other questions you may have."
## Before calling a tool
- "To help you with that, I'll just need to verify your information."
- "Let me check that for you—one moment, please."
- "I'll retrieve the latest details for you now."
## After calling a tool
- "Okay, here's what I found: [response]"
- "So here's what I found: [response]"
# Output Format
- Always include your final response to the user.
- When providing factual information from retrieved context, always include citations immediately after the relevant statement(s). Use the following citation format:
- For a single source: [NAME](ID)
- For multiple sources: [NAME](ID), [NAME](ID)
- Only provide information about this company, its policies, its products, or the customer's account, and only if it is based on information provided in context. Do not answer questions outside this scope.
# Example
## User
Can you tell me about your family plan options?
## Assistant Response 1
### Message
"Hi, you've reached NewTelco, how can I help you? 😊🎉\n\nYou'd like to know about our family plan options. 🤝 Let me check that for you—one moment, please. 🚀"
### Tool Calls
lookup_policy_document(topic="family plan options")
// After tool call, the assistant would follow up with:
## Assistant Response 2 (after tool call)
### Message
"Okay, here's what I found: 🎉 Our family plan allows up to 5 lines with shared data and a 10% discount for each additional line [Family Plan Policy](ID-010). 📱 Is there anything else I can help you with today? 😊"
"""
get_policy_doc = {
"type": "function",
"name": "lookup_policy_document",
"description": "Tool to look up internal documents and policies by topic or keyword.",
"parameters": {
"strict": True,
"type": "object",
"properties": {
"topic": {
"type": "string",
"description": "The topic or keyword to search for in company policies or documents.",
},
},
"required": ["topic"],
"additionalProperties": False,
},
}
get_user_acct = {
"type": "function",
"name": "get_user_account_info",
"description": "Tool to get user account information",
"parameters": {
"strict": True,
"type": "object",
"properties": {
"phone_number": {
"type": "string",
"description": "Formatted as '(xxx) xxx-xxxx'",
},
},
"required": ["phone_number"],
"additionalProperties": False,
},
}
response = client.responses.create(
instructions=SYS_PROMPT_CUSTOMER_SERVICE,
model="gpt-4.1-2025-04-14",
tools=[get_policy_doc, get_user_acct],
input="How much will it cost for international service? I'm traveling to France.",
# input="Why was my last bill so high?"
)
response.to_dict()["output"]
[{'id': 'msg_67fe92d431548191b7ca6cd604b4784b06efc5beb16b3c5e',
'content': [{'annotations': [],
'text': "Hi, you've reached NewTelco, how can I help you? 🌍✈️\n\nYou'd like to know the cost of international service while traveling to France. 🇫🇷 Let me check the latest details for you—one moment, please. 🕑",
'type': 'output_text'}],
'role': 'assistant',
'status': 'completed',
'type': 'message'},
{'arguments': '{"topic":"international service cost France"}',
'call_id': 'call_cF63DLeyhNhwfdyME3ZHd0yo',
'name': 'lookup_policy_document',
'type': 'function_call',
'id': 'fc_67fe92d5d6888191b6cd7cf57f707e4606efc5beb16b3c5e',
'status': 'completed'}]
5. 常规提示与建议(General Advice)
推荐提示词结构
作为参考,下面提供一个建议的提示词结构起点。
# Role and Objective
# Instructions
## Sub-categories for more detailed instructions
# Reasoning Steps
# Output Format
# Examples
## Example 1
# Context
# Final instructions and prompt to think step by step
可以按需求拆分合并、反复实验,找到最适合你业务场景的结构。
分隔符(Delimiter)
以下是为你的提示词选择最佳分隔符的一些通用建议。针对长上下文类型的特殊需求,请参考“长上下文”部分的额外说明。
- Markdown:推荐优先使用。用 markdown 标题划分主要内容和各级子内容(可以分级到 H4 甚至更深)。代码建议用内联反引号或代码块包裹,列表使用标准编号或点号列表。
- XML:这种格式同样效果很好。我们已优化模型对 XML 信息的识别和遵循。XML 适合精确包裹独立区块(包含起止标签),可以在标签内添加元数据、方便内容嵌套。如下是一个在示例部分用 XML 嵌套输入/输出的例子:
<examples>
<example1 type="Abbreviate">
<input>San Francisco</input>
<output>- SF</output>
</example1>
</examples>
- JSON:结构化很强,在编码相关场景下模型理解能力很好。但 JSON 偏冗长,并需要字符转义,会增加一定复杂度。
当你需要将大量文档或文件添加到输入上下文时,建议参考以下做法:
- XML 格式在我们的长上下文测试中表现非常好。
- 示例:
<doc id='1' title='The Fox'>The quick brown fox jumps over the lazy dog</doc>
- 示例:
- Lee 等人提出的这种文本分隔格式(ref),在长上下文测试中同样效果不错。
- 示例:
ID: 1 | TITLE: The Fox | CONTENT: The quick brown fox jumps over the lazy dog
- 示例:
- JSON 格式表现相对较差。
- 示例:
[{'id': 1, 'title': 'The Fox', 'content': 'The quick brown fox jumped over the lazy dog'}]
- 示例:
GPT-4.1 已经过训练,能够稳健识别多种不同格式的结构。总体建议你结合实际任务做判断,选择那种让关键信息最清晰、最能“突出”在模型视野中的格式。例如,如果你的原始内容中本身包含大量 XML,再采用 XML 作为分隔,可能反而不利于模型准确提取关键信息,这时可以考虑选择其他分隔方式。
注意事项
-
在一些孤立的案例中,我们发现模型可能不愿意生成非常长且重复的输出,例如逐一分析数百个项目。如果你的使用场景需要这样的输出,请务必对模型做出明确指令,要求其完整输出所有信息,并建议将问题拆分,或者采用更简洁的方式来处理。
-
我们也遇到过并行工具调用出现错误的极少数情况。建议对这类调用进行测试,如果发现问题,可以考虑将 parallel_tool_calls 参数设为 false。
附录:生成与应用文件差异(diff)
开发者向我们反馈,准确且格式良好的 diff 生成能力对于支持编码相关任务至关重要。为此,GPT-4.1 系列在 diff 方面相较于以往的 GPT 模型有了显著提升。此外,虽然 GPT-4.1 在根据清晰的指令和示例生成任何格式的 diff 时表现都很出色,我们在此开源了一个推荐的 diff 格式,模型也针对该格式进行了大量训练。我们希望,这能够帮助刚入门的开发者,减少在自行创建 diff 过程中的试错和猜测。
应用补丁
请参考下方示例,了解如何用提示词正确调用我们推荐的工具进行补丁应用。
APPLY_PATCH_TOOL_DESC = """This is a custom utility that makes it more convenient to add, remove, move, or edit code files. `apply_patch` effectively allows you to execute a diff/patch against a file, but the format of the diff specification is unique to this task, so pay careful attention to these instructions. To use the `apply_patch` command, you should pass a message of the following structure as "input":
%%bash
apply_patch <<"EOF"
*** Begin Patch
[YOUR_PATCH]
*** End Patch
EOF
Where [YOUR_PATCH] is the actual content of your patch, specified in the following V4A diff format.
*** [ACTION] File: [path/to/file] -> ACTION can be one of Add, Update, or Delete.
For each snippet of code that needs to be changed, repeat the following:
[context_before] -> See below for further instructions on context.
- [old_code] -> Precede the old code with a minus sign.
+ [new_code] -> Precede the new, replacement code with a plus sign.
[context_after] -> See below for further instructions on context.
For instructions on [context_before] and [context_after]:
- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first change’s [context_after] lines in the second change’s [context_before] lines.
- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:
@@ class BaseClass
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
- If a code block is repeated so many times in a class or function such that even a single @@ statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance:
@@ class BaseClass
@@ def method():
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
Note, then, that we do not use line numbers in this diff format, as the context is enough to uniquely identify code. An example of a message that you might pass as "input" to this function, in order to apply a patch, is shown below.
%%bash
apply_patch <<"EOF"
*** Begin Patch
*** Update File: pygorithm/searching/binary_search.py
@@ class BaseClass
@@ def search():
- pass
+ raise NotImplementedError()
@@ class Subclass
@@ def search():
- pass
+ raise NotImplementedError()
*** End Patch
EOF
"""
APPLY_PATCH_TOOL = {
"name": "apply_patch",
"description": APPLY_PATCH_TOOL_DESC,
"parameters": {
"type": "object",
"properties": {
"input": {
"type": "string",
"description": " The apply_patch command that you wish to execute.",
}
},
"required": ["input"],
},
}
参考实现:apply_patch.py
这是我们用于模型训练的 apply_patch 工具的参考实现。你需要将其设置为可执行文件,并确保在模型运行命令的 shell 环境中,可以通过 apply_patch 这个名称来调用该工具。
#!/usr/bin/env python3
"""
A self-contained **pure-Python 3.9+** utility for applying human-readable
“pseudo-diff” patch files to a collection of text files.
"""
from __future__ import annotations
import pathlib
from dataclasses import dataclass, field
from enum import Enum
from typing import (
Callable,
Dict,
List,
Optional,
Tuple,
Union,
)
# --------------------------------------------------------------------------- #
# Domain objects
# --------------------------------------------------------------------------- #
class ActionType(str, Enum):
ADD = "add"
DELETE = "delete"
UPDATE = "update"
@dataclass
class FileChange:
type: ActionType
old_content: Optional[str] = None
new_content: Optional[str] = None
move_path: Optional[str] = None
@dataclass
class Commit:
changes: Dict[str, FileChange] = field(default_factory=dict)
# --------------------------------------------------------------------------- #
# Exceptions
# --------------------------------------------------------------------------- #
class DiffError(ValueError):
"""Any problem detected while parsing or applying a patch."""
# --------------------------------------------------------------------------- #
# Helper dataclasses used while parsing patches
# --------------------------------------------------------------------------- #
@dataclass
class Chunk:
orig_index: int = -1
del_lines: List[str] = field(default_factory=list)
ins_lines: List[str] = field(default_factory=list)
@dataclass
class PatchAction:
type: ActionType
new_file: Optional[str] = None
chunks: List[Chunk] = field(default_factory=list)
move_path: Optional[str] = None
@dataclass
class Patch:
actions: Dict[str, PatchAction] = field(default_factory=dict)
# --------------------------------------------------------------------------- #
# Patch text parser
# --------------------------------------------------------------------------- #
@dataclass
class Parser:
current_files: Dict[str, str]
lines: List[str]
index: int = 0
patch: Patch = field(default_factory=Patch)
fuzz: int = 0
# ------------- low-level helpers -------------------------------------- #
def _cur_line(self) -> str:
if self.index >= len(self.lines):
raise DiffError("Unexpected end of input while parsing patch")
return self.lines[self.index]
@staticmethod
def _norm(line: str) -> str:
"""Strip CR so comparisons work for both LF and CRLF input."""
return line.rstrip("\r")
# ------------- scanning convenience ----------------------------------- #
def is_done(self, prefixes: Optional[Tuple[str, ...]] = None) -> bool:
if self.index >= len(self.lines):
return True
if (
prefixes
and len(prefixes) > 0
and self._norm(self._cur_line()).startswith(prefixes)
):
return True
return False
def startswith(self, prefix: Union[str, Tuple[str, ...]]) -> bool:
return self._norm(self._cur_line()).startswith(prefix)
def read_str(self, prefix: str) -> str:
"""
Consume the current line if it starts with *prefix* and return the text
**after** the prefix. Raises if prefix is empty.
"""
if prefix == "":
raise ValueError("read_str() requires a non-empty prefix")
if self._norm(self._cur_line()).startswith(prefix):
text = self._cur_line()[len(prefix) :]
self.index += 1
return text
return ""
def read_line(self) -> str:
"""Return the current raw line and advance."""
line = self._cur_line()
self.index += 1
return line
# ------------- public entry point -------------------------------------- #
def parse(self) -> None:
while not self.is_done(("*** End Patch",)):
# ---------- UPDATE ---------- #
path = self.read_str("*** Update File: ")
if path:
if path in self.patch.actions:
raise DiffError(f"Duplicate update for file: {path}")
move_to = self.read_str("*** Move to: ")
if path not in self.current_files:
raise DiffError(f"Update File Error - missing file: {path}")
text = self.current_files[path]
action = self._parse_update_file(text)
action.move_path = move_to or None
self.patch.actions[path] = action
continue
# ---------- DELETE ---------- #
path = self.read_str("*** Delete File: ")
if path:
if path in self.patch.actions:
raise DiffError(f"Duplicate delete for file: {path}")
if path not in self.current_files:
raise DiffError(f"Delete File Error - missing file: {path}")
self.patch.actions[path] = PatchAction(type=ActionType.DELETE)
continue
# ---------- ADD ---------- #
path = self.read_str("*** Add File: ")
if path:
if path in self.patch.actions:
raise DiffError(f"Duplicate add for file: {path}")
if path in self.current_files:
raise DiffError(f"Add File Error - file already exists: {path}")
self.patch.actions[path] = self._parse_add_file()
continue
raise DiffError(f"Unknown line while parsing: {self._cur_line()}")
if not self.startswith("*** End Patch"):
raise DiffError("Missing *** End Patch sentinel")
self.index += 1 # consume sentinel
# ------------- section parsers ---------------------------------------- #
def _parse_update_file(self, text: str) -> PatchAction:
action = PatchAction(type=ActionType.UPDATE)
lines = text.split("\n")
index = 0
while not self.is_done(
(
"*** End Patch",
"*** Update File:",
"*** Delete File:",
"*** Add File:",
"*** End of File",
)
):
def_str = self.read_str("@@ ")
section_str = ""
if not def_str and self._norm(self._cur_line()) == "@@":
section_str = self.read_line()
if not (def_str or section_str or index == 0):
raise DiffError(f"Invalid line in update section:\n{self._cur_line()}")
if def_str.strip():
found = False
if def_str not in lines[:index]:
for i, s in enumerate(lines[index:], index):
if s == def_str:
index = i + 1
found = True
break
if not found and def_str.strip() not in [
s.strip() for s in lines[:index]
]:
for i, s in enumerate(lines[index:], index):
if s.strip() == def_str.strip():
index = i + 1
self.fuzz += 1
found = True
break
next_ctx, chunks, end_idx, eof = peek_next_section(self.lines, self.index)
new_index, fuzz = find_context(lines, next_ctx, index, eof)
if new_index == -1:
ctx_txt = "\n".join(next_ctx)
raise DiffError(
f"Invalid {'EOF ' if eof else ''}context at {index}:\n{ctx_txt}"
)
self.fuzz += fuzz
for ch in chunks:
ch.orig_index += new_index
action.chunks.append(ch)
index = new_index + len(next_ctx)
self.index = end_idx
return action
def _parse_add_file(self) -> PatchAction:
lines: List[str] = []
while not self.is_done(
("*** End Patch", "*** Update File:", "*** Delete File:", "*** Add File:")
):
s = self.read_line()
if not s.startswith("+"):
raise DiffError(f"Invalid Add File line (missing '+'): {s}")
lines.append(s[1:]) # strip leading '+'
return PatchAction(type=ActionType.ADD, new_file="\n".join(lines))
# --------------------------------------------------------------------------- #
# Helper functions
# --------------------------------------------------------------------------- #
def find_context_core(
lines: List[str], context: List[str], start: int
) -> Tuple[int, int]:
if not context:
return start, 0
for i in range(start, len(lines)):
if lines[i : i + len(context)] == context:
return i, 0
for i in range(start, len(lines)):
if [s.rstrip() for s in lines[i : i + len(context)]] == [
s.rstrip() for s in context
]:
return i, 1
for i in range(start, len(lines)):
if [s.strip() for s in lines[i : i + len(context)]] == [
s.strip() for s in context
]:
return i, 100
return -1, 0
def find_context(
lines: List[str], context: List[str], start: int, eof: bool
) -> Tuple[int, int]:
if eof:
new_index, fuzz = find_context_core(lines, context, len(lines) - len(context))
if new_index != -1:
return new_index, fuzz
new_index, fuzz = find_context_core(lines, context, start)
return new_index, fuzz + 10_000
return find_context_core(lines, context, start)
def peek_next_section(
lines: List[str], index: int
) -> Tuple[List[str], List[Chunk], int, bool]:
old: List[str] = []
del_lines: List[str] = []
ins_lines: List[str] = []
chunks: List[Chunk] = []
mode = "keep"
orig_index = index
while index < len(lines):
s = lines[index]
if s.startswith(
(
"@@",
"*** End Patch",
"*** Update File:",
"*** Delete File:",
"*** Add File:",
"*** End of File",
)
):
break
if s == "***":
break
if s.startswith("***"):
raise DiffError(f"Invalid Line: {s}")
index += 1
last_mode = mode
if s == "":
s = " "
if s[0] == "+":
mode = "add"
elif s[0] == "-":
mode = "delete"
elif s[0] == " ":
mode = "keep"
else:
raise DiffError(f"Invalid Line: {s}")
s = s[1:]
if mode == "keep" and last_mode != mode:
if ins_lines or del_lines:
chunks.append(
Chunk(
orig_index=len(old) - len(del_lines),
del_lines=del_lines,
ins_lines=ins_lines,
)
)
del_lines, ins_lines = [], []
if mode == "delete":
del_lines.append(s)
old.append(s)
elif mode == "add":
ins_lines.append(s)
elif mode == "keep":
old.append(s)
if ins_lines or del_lines:
chunks.append(
Chunk(
orig_index=len(old) - len(del_lines),
del_lines=del_lines,
ins_lines=ins_lines,
)
)
if index < len(lines) and lines[index] == "*** End of File":
index += 1
return old, chunks, index, True
if index == orig_index:
raise DiffError("Nothing in this section")
return old, chunks, index, False
# --------------------------------------------------------------------------- #
# Patch → Commit and Commit application
# --------------------------------------------------------------------------- #
def _get_updated_file(text: str, action: PatchAction, path: str) -> str:
if action.type is not ActionType.UPDATE:
raise DiffError("_get_updated_file called with non-update action")
orig_lines = text.split("\n")
dest_lines: List[str] = []
orig_index = 0
for chunk in action.chunks:
if chunk.orig_index > len(orig_lines):
raise DiffError(
f"{path}: chunk.orig_index {chunk.orig_index} exceeds file length"
)
if orig_index > chunk.orig_index:
raise DiffError(
f"{path}: overlapping chunks at {orig_index} > {chunk.orig_index}"
)
dest_lines.extend(orig_lines[orig_index : chunk.orig_index])
orig_index = chunk.orig_index
dest_lines.extend(chunk.ins_lines)
orig_index += len(chunk.del_lines)
dest_lines.extend(orig_lines[orig_index:])
return "\n".join(dest_lines)
def patch_to_commit(patch: Patch, orig: Dict[str, str]) -> Commit:
commit = Commit()
for path, action in patch.actions.items():
if action.type is ActionType.DELETE:
commit.changes[path] = FileChange(
type=ActionType.DELETE, old_content=orig[path]
)
elif action.type is ActionType.ADD:
if action.new_file is None:
raise DiffError("ADD action without file content")
commit.changes[path] = FileChange(
type=ActionType.ADD, new_content=action.new_file
)
elif action.type is ActionType.UPDATE:
new_content = _get_updated_file(orig[path], action, path)
commit.changes[path] = FileChange(
type=ActionType.UPDATE,
old_content=orig[path],
new_content=new_content,
move_path=action.move_path,
)
return commit
# --------------------------------------------------------------------------- #
# User-facing helpers
# --------------------------------------------------------------------------- #
def text_to_patch(text: str, orig: Dict[str, str]) -> Tuple[Patch, int]:
lines = text.splitlines() # preserves blank lines, no strip()
if (
len(lines) < 2
or not Parser._norm(lines[0]).startswith("*** Begin Patch")
or Parser._norm(lines[-1]) != "*** End Patch"
):
raise DiffError("Invalid patch text - missing sentinels")
parser = Parser(current_files=orig, lines=lines, index=1)
parser.parse()
return parser.patch, parser.fuzz
def identify_files_needed(text: str) -> List[str]:
lines = text.splitlines()
return [
line[len("*** Update File: ") :]
for line in lines
if line.startswith("*** Update File: ")
] + [
line[len("*** Delete File: ") :]
for line in lines
if line.startswith("*** Delete File: ")
]
def identify_files_added(text: str) -> List[str]:
lines = text.splitlines()
return [
line[len("*** Add File: ") :]
for line in lines
if line.startswith("*** Add File: ")
]
# --------------------------------------------------------------------------- #
# File-system helpers
# --------------------------------------------------------------------------- #
def load_files(paths: List[str], open_fn: Callable[[str], str]) -> Dict[str, str]:
return {path: open_fn(path) for path in paths}
def apply_commit(
commit: Commit,
write_fn: Callable[[str, str], None],
remove_fn: Callable[[str], None],
) -> None:
for path, change in commit.changes.items():
if change.type is ActionType.DELETE:
remove_fn(path)
elif change.type is ActionType.ADD:
if change.new_content is None:
raise DiffError(f"ADD change for {path} has no content")
write_fn(path, change.new_content)
elif change.type is ActionType.UPDATE:
if change.new_content is None:
raise DiffError(f"UPDATE change for {path} has no new content")
target = change.move_path or path
write_fn(target, change.new_content)
if change.move_path:
remove_fn(path)
def process_patch(
text: str,
open_fn: Callable[[str], str],
write_fn: Callable[[str, str], None],
remove_fn: Callable[[str], None],
) -> str:
if not text.startswith("*** Begin Patch"):
raise DiffError("Patch text must start with *** Begin Patch")
paths = identify_files_needed(text)
orig = load_files(paths, open_fn)
patch, _fuzz = text_to_patch(text, orig)
commit = patch_to_commit(patch, orig)
apply_commit(commit, write_fn, remove_fn)
return "Done!"
# --------------------------------------------------------------------------- #
# Default FS helpers
# --------------------------------------------------------------------------- #
def open_file(path: str) -> str:
with open(path, "rt", encoding="utf-8") as fh:
return fh.read()
def write_file(path: str, content: str) -> None:
target = pathlib.Path(path)
target.parent.mkdir(parents=True, exist_ok=True)
with target.open("wt", encoding="utf-8") as fh:
fh.write(content)
def remove_file(path: str) -> None:
pathlib.Path(path).unlink(missing_ok=True)
# --------------------------------------------------------------------------- #
# CLI entry-point
# --------------------------------------------------------------------------- #
def main() -> None:
import sys
patch_text = sys.stdin.read()
if not patch_text:
print("Please pass patch text through stdin", file=sys.stderr)
return
try:
result = process_patch(patch_text, open_file, write_file, remove_file)
except DiffError as exc:
print(exc, file=sys.stderr)
return
print(result)
if __name__ == "__main__":
main()
其他有效的 diff 格式
如果你想尝试使用不同的 diff 格式,我们在测试中发现,Aider 的 polyglot benchmark 所采用的 SEARCH/REPLACE diff 格式,还有一种不带内部转义的伪 XML 格式,在测试中都表现出了很高的成功率。
这些 diff 格式有两个共同点:(1)它们不使用行号;(2)它们都会提供需要被替换的精确代码内容和替换后的代码内容,并在两者之间用清晰的分隔符进行区分。
SEARCH_REPLACE_DIFF_EXAMPLE = """
path/to/file.py
```
>>>>>>> SEARCH
def search():
pass
=======
def search():
raise NotImplementedError()
<<<<<<< REPLACE
"""
PSEUDO_XML_DIFF_EXAMPLE = """
<edit>
<file>
path/to/file.py
</file>
<old_code>
def search():
pass
</old_code>
<new_code>
def search():
raise NotImplementedError()
</new_code>
</edit>
"""