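Messaging in Jupyter

This document explains the basic communication design and messaging specification that Jupyter frontends and kernels use to communicate. The ZeroMQ library provides the low-level transport layer over which these messages are sent.

Important

This document contains the authoritative description of the IPython messaging protocol. All developers are strongly encouraged to keep it updated as the implementation evolves, so that there is a single common reference for all protocol details.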

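Versioning

The Jupyter message specification is versioned independently of the packages that use it. The current version of the specification is 5.3.

Note

"New in" and "Changed in" notes in this document refer to versions of the Jupyter message specification, not to versions of jupyter_client.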

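Introduction

The basic design is explained in the following diagram:

[Figure: IPython kernel/frontend messaging architecture.]

A single kernel can be connected to one or more frontends simultaneously. The kernel has dedicated sockets for the following functions: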

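  1. Shell: this single ROUTER socket allows multiple incoming connections from frontends. It is the socket on which any frontend sends the kernel requests for code execution, object information, prompts, etc. The communication on this socket is a sequence of request/reply actions between each frontend and the kernel.

  2. IOPub: this socket is the 'broadcast channel' on which the kernel publishes all side effects (stdout, stderr, debugging events, etc.), as well as the requests coming from any client over the shell socket and its own requests on the stdin socket. Many actions in Python generate side effects: print() writes to sys.stdout, errors generate tracebacks, and so on. Additionally, in a multi-client scenario, all frontends should be able to know what each other has sent to the kernel (this can be useful in collaborative scenarios, for example). This socket makes both side effects and the information about communication taking place with one client over the shell channel available to all clients in a uniform manner.

  3. stdin: this ROUTER socket is connected to all frontends, and it allows the kernel to request input from the active frontend when raw_input() is called. The frontend that executed the code has a DEALER socket that acts as a 'virtual keyboard' for the kernel while this communication is happening (illustrated in the figure by the black outline around the central keyboard). In practice, frontends may display such kernel requests using a special input widget, or otherwise indicate that the user is to type input for the kernel rather than normal commands in the frontend.

    All messages are tagged with enough information (details below) for clients to know which messages come from their own interaction with the kernel and which come from other clients, so that they can display each type appropriately.

  4. Control: this channel is identical to Shell, but operates on a separate socket to avoid queueing behind execution requests. The control channel is used for shutdown and restart messages, as well as for debugging messages.

    For a smoother user experience, we recommend running the control channel in a separate thread from the shell channel, so that, for example, shutdown or debug messages can be processed immediately without waiting for a long-running shell message (such as an expensive execute request) to finish processing.

  5. Heartbeat: this socket allows simple bytestring messages to be sent between the frontend and the kernel to ensure that they are still connected.

The actual format of the messages allowed on each of these channels is specified below. Messages are dicts of dicts with string keys and values that are reasonably representable in JSON.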

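General Message Format

A message is composed of five dictionaries.

Message Header

The message header contains information about the message, such as unique identifiers for the originating session and for the message itself, the type of message, the version of the Jupyter protocol, and the date the message was created. In addition, there is a username field, e.g. for the process that generated the message, if applicable. This can be useful in collaborative settings where multiple users may be interacting with the same kernel simultaneously, so that frontends can label the various messages in a meaningful way.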

{
    'msg_id' : str, # typically UUID, must be unique per message
    'session' : str, # typically UUID, should be unique per session
    'username' : str,
    # ISO 8601 timestamp for when the message is created
    'date': str,
    # All recognized message type strings are listed below.
    'msg_type' : str,
    # the message protocol version
    'version' : '5.0',
}

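Note

The session id in a message header identifies a unique entity with state, such as a kernel process or client process.

A client session id, in message headers from a client, should be unique among all clients connected to a kernel. When a client reconnects to a kernel, it should use the same client session id in its message headers. When a client restarts, it should generate a new client session id.

A kernel session id, in message headers from a kernel, should identify a particular kernel process. If a kernel is restarted, the kernel session id should be regenerated.

The session id in a message header can be used to identify the sending entity. For example, if a client disconnects and reconnects to a kernel, and messages from the kernel have a different kernel session id than before the disconnect, the client should assume that the kernel was restarted.

Changed in version 5.0: version key added to the header.

Changed in version 5.1: date in the header was accidentally omitted from the spec prior to 5.1, but it has always been in the canonical implementation, so implementers are strongly encouraged to include it. It will be mandatory in 5.1.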
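As an illustration of the header fields described above, the following is a minimal sketch of building a header in Python. The helper name new_header, the SESSION_ID constant, and the default username are hypothetical and chosen for this example; they are not part of the specification.

```python
import uuid
from datetime import datetime, timezone

# One session id per client or kernel process, reused for every message it sends.
SESSION_ID = uuid.uuid4().hex


def new_header(msg_type, username="username", session=SESSION_ID):
    """Build a message header dict (hypothetical helper, not part of the spec)."""
    return {
        "msg_id": uuid.uuid4().hex,  # unique per message
        "session": session,  # unique per session
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),  # ISO 8601 timestamp
        "msg_type": msg_type,
        "version": "5.0",  # protocol version, as in the header shown above
    }


header = new_header("kernel_info_request")
```

Generating the session id once at process start and a fresh msg_id per call matches the uniqueness rules in the note above.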

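Parent header

When a message is the "result" of another message, such as a side effect (output or status) or a direct reply, the parent_header is a copy of the header of the message that "caused" the current message. _reply messages MUST have a parent_header, and side effects typically have a parent. If there is no parent, an empty dict should be used. Clients use this parent to route message handling to the right place, such as sending outputs to a cell.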

{
    # parent_header is a copy of the request's header
    'msg_id': '...',
    ...
}

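Metadata

The metadata dict contains information about the message that is not part of the content. It is not often used, but it can serve as an extra location to store information about requests and replies, such as extensions adding information about the request or execution context.

Content

The content dict is the body of the message. Its structure is dictated by the msg_type field in the header, and is described in detail for each message below.

Buffers

Finally, a list of additional binary buffers can be associated with a message. While this is part of the protocol, no official messages make use of these buffers. They are used by extension messages, such as IPython Parallel's apply and some of ipywidgets' comm messages.

A full message

Combining all of these together, a complete message can be represented as the following dictionary of dictionaries (and one list):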

{
    "header" : {
        "msg_id": "...",
        "msg_type": "...",
        ...
    },
    "parent_header": {},
    "metadata": {},
    "content": {},
    "buffers": [],
}

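Note

This dictionary structure is not part of the Jupyter protocol that must be implemented by kernels and frontends; that role belongs to the Wire Protocol, which dictates how this information is serialized over the wire. Deserialization is up to the kernel or frontend implementation, but a dict like this is a logical choice in most contexts.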
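The five parts can be assembled as in the sketch below, which also shows a reply carrying a copy of the request's header as its parent_header. The helper make_message and its default argument values are hypothetical names used only for illustration.

```python
import uuid
from datetime import datetime, timezone


def make_message(msg_type, content=None, parent=None, session="example-session"):
    """Assemble the five parts of a message (hypothetical helper)."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": "user",
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": msg_type,
        "version": "5.0",
    }
    return {
        "header": header,
        # A copy of the header of the message that caused this one, or {}.
        "parent_header": dict(parent["header"]) if parent else {},
        "metadata": {},
        "content": content or {},
        "buffers": [],
    }


request = make_message("kernel_info_request")
reply = make_message("kernel_info_reply", {"status": "ok"}, parent=request)
```

Copying the parent header rather than sharing the same dict object keeps later changes to one message from leaking into the other.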

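Compatibility

Kernels must implement the execute and kernel info messages, along with the associated busy and idle kernel status messages. All other message types are optional, although we recommend implementing completion if possible. Kernels do not need to send any reply for messages they do not handle, and frontends should provide sensible behaviour if no reply arrives (except for the required execution and kernel info messages).

stdin messages are unique in that the request comes from the kernel, and the reply from the frontend. The frontend is not required to support this, but if it does not, it must set 'allow_stdin' : False in its execute requests. In this case, the kernel may not send stdin requests. If that field is true, the kernel may send stdin requests and block waiting for a reply, so the frontend must answer.

Both sides should allow unexpected message types, and extra fields in known message types, so that additions to the protocol do not break existing code.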
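One way to tolerate unexpected message types and extra fields is sketched below. The function names and the handlers table are hypothetical; a real kernel or frontend will organize dispatch differently.

```python
def handle_message(msg, handlers):
    """Dispatch on msg_type; return None for types this side does not handle."""
    msg_type = msg["header"]["msg_type"]
    handler = handlers.get(msg_type)
    if handler is None:
        # Unexpected message type: ignore it rather than fail.
        return None
    return handler(msg["content"])


def on_stream(content):
    # Read only the fields we know about; extra fields are simply ignored.
    return content.get("name"), content.get("text")


handlers = {"stream": on_stream}
```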

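The Wire Protocol

The message format above is only a logical representation of the contents of Jupyter messages; it does not describe the actual implementation at the wire level in zeromq. This section describes the protocol that must be implemented by Jupyter kernels and clients talking to each other over zeromq.

The reference implementation of the message spec is the Session class.

Note

This section should only be relevant to non-Python consumers of the protocol. Python consumers should import and use the implementation of the wire protocol in jupyter_client.session.Session.

Every message is serialized to a sequence of at least six blobs of bytes: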

[
  b'u-u-i-d',         # zmq identity(ies)
  b'<IDS|MSG>',       # delimiter
  b'baddad42',        # HMAC signature
  b'{header}',        # serialized header dict
  b'{parent_header}', # serialized parent header dict
  b'{metadata}',      # serialized metadata dict
  b'{content}',       # serialized content dict
  b'\xf0\x9f\x90\xb1' # extra raw data buffer(s)
  ...
]

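The front of the message is the ZeroMQ routing prefix, which can be zero or more socket identities. This is every piece of the message before the delimiter key <IDS|MSG>. In the case of IOPub, there should be just one prefix component, which is the topic for IOPub subscribers, e.g. execute_result, display_data.

Note

In most cases, the IOPub topics are irrelevant and completely ignored, because frontends simply subscribe to all topics. The convention used in the IPython kernel is to use the msg_type as the topic, possibly with extra information about the message, e.g. kernel.{u-u-i-d}.execute_result or stream.stdout.

After the delimiter is the HMAC signature of the message, used for authentication. If authentication is disabled, this should be an empty string. By default, the hashing function used for computing these signatures is sha256.

Note

To disable authentication and signature checking, set the key field of a connection file to an empty string.

The signature is the HMAC hex digest computed from the following, in this order:

  • A shared key (typically the key field of a connection file), used as the HMAC key

  • The serialized header dict

  • The serialized parent header dict

  • The serialized metadata dict

  • The serialized content dict

In Python, this is implemented via: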

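import hashlib
from hmac import HMAC

# once: `key` is the shared key, as bytes
digester = HMAC(key, digestmod=hashlib.sha256)

# for each message: the four values are the serialized dicts, as bytes
d = digester.copy()
for serialized_dict in (header, parent, metadata, content):
    d.update(serialized_dict)
signature = d.hexdigest()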

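After the signature is the actual message, always in four frames of bytes. The four dictionaries that compose a message are serialized separately, in the order of header, parent header, metadata, and content. These can be serialized by any function that turns a dict into bytes. The default and most common serialization is JSON, but msgpack and pickle are common alternatives.

After the serialized dicts come zero or more raw data buffers, which can be used by message types that support binary data. These can be used in custom messages, such as comms and extensions to the protocol.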
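Putting the frame layout and the signature together, the following is a minimal sketch of serializing and deserializing a message with JSON and HMAC-SHA256 using only the standard library. The function names sign, serialize and deserialize are hypothetical and this is not the Session implementation; it assumes JSON serialization and a bytes key, with an empty key meaning that signing is disabled.

```python
import hashlib
import hmac
import json

DELIM = b"<IDS|MSG>"


def sign(key, frames):
    """HMAC-SHA256 hex digest over the four serialized dicts; empty if no key."""
    if not key:
        return b""
    d = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        d.update(frame)
    return d.hexdigest().encode("ascii")


def serialize(msg, key, identities=()):
    """Turn a message dict into the list of byte frames described above."""
    parts = [
        json.dumps(msg[k]).encode("utf-8")
        for k in ("header", "parent_header", "metadata", "content")
    ]
    return [*identities, DELIM, sign(key, parts), *parts, *msg.get("buffers", [])]


def deserialize(frames, key):
    """Split frames at the delimiter, check the signature, rebuild the dict."""
    i = frames.index(DELIM)
    identities, signature, parts = frames[:i], frames[i + 1], frames[i + 2 : i + 6]
    # Constant-time comparison avoids leaking signature bytes through timing.
    if key and not hmac.compare_digest(signature, sign(key, parts)):
        raise ValueError("invalid signature")
    header, parent, metadata, content = (json.loads(p) for p in parts)
    msg = {
        "header": header,
        "parent_header": parent,
        "metadata": metadata,
        "content": content,
        "buffers": frames[i + 6 :],
    }
    return identities, msg
```

Everything before the delimiter is treated as routing identities, and everything after the four serialized dicts is treated as raw buffers, mirroring the frame list shown earlier in this section.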

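Python API

As messages can be represented as dicts, they map naturally to a func(**kw) call form. We should develop, at a few key points, functional forms of all the requests that take arguments in this manner and automatically construct the necessary dict for sending.

In addition, for convenience, the Python implementation of the message specification extends messages upon deserialization to the following form: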

{
  'header' : dict,
  # The msg's unique identifier and type are always stored in the header,
  # but the Python implementation copies them to the top level.
  'msg_id' : str,
  'msg_type' : str,
  'parent_header' : dict,
  'content' : dict,
  'metadata' : dict,
  'buffers': list,
}

All messages sent to or received by any IPython message handler should have this extended structure.
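The extension step amounts to copying two header fields to the top level. A minimal sketch, using a hypothetical helper name:

```python
def extend_message(msg):
    """Return a copy of msg with msg_id and msg_type copied to the top level."""
    extended = dict(msg)
    extended["msg_id"] = msg["header"]["msg_id"]
    extended["msg_type"] = msg["header"]["msg_type"]
    return extended
```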

Messages on the shell (ROUTER/DEALER) channel

Request-Reply

In general, the ROUTER/DEALER sockets follow a request-reply pattern:

The client sends an <action>_request message (such as execute_request) on its shell (DEALER) socket. The kernel receives that request and immediately publishes a status: busy message on IOPub. The kernel then processes the request and sends the appropriate <action>_reply message, such as execute_reply. After processing the request and publishing associated IOPub messages, if any, the kernel publishes a status: idle message. This idle status message indicates that IOPub messages associated with a given request have all been received.

All reply messages have a 'status' field, which will have one of the following values:

  • status='ok': The request was processed successfully, and the remaining content of the reply is specified in the appropriate section below.

  • status='error': The request failed due to an error.

    When status is ‘error’, the usual content of a successful reply should be omitted, instead the following fields should be present:

    {
       'status' : 'error',
       'ename' : str,   # Exception name, as a string
       'evalue' : str,  # Exception value, as a string
       'traceback' : list(str), # traceback frames as strings
    }
    
  • status='abort': This is the same as status='error' but with no information about the error. No fields should be present other than status.

As a special case, execute_reply messages (see Execution results) have an execution_count field regardless of their status.

Changed in version 5.1: status='abort' has not proved useful, and is considered deprecated. Kernels should send status='error' instead.
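For a Python kernel, the error fields can be derived from a caught exception. The sketch below uses the standard traceback module; the helper name error_content is hypothetical, and real kernels may format tracebacks differently.

```python
import traceback


def error_content(exc):
    """Build the content of a status='error' reply from an exception."""
    return {
        "status": "error",
        "ename": type(exc).__name__,  # exception name, as a string
        "evalue": str(exc),  # exception value, as a string
        # traceback frames as a list of strings
        "traceback": traceback.format_exception(type(exc), exc, exc.__traceback__),
    }


try:
    1 / 0
except ZeroDivisionError as e:
    content = error_content(e)
```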

Execute

This message type is used by frontends to ask the kernel to execute code on behalf of the user, in a namespace reserved to the user’s variables (and thus separate from the kernel’s own internal code and variables).

Message type: execute_request:

content = {
    # Source code to be executed by the kernel, one or more lines.
    'code' : str,

    # A boolean flag which, if True, signals the kernel to execute
    # this code as quietly as possible.
    # silent=True forces store_history to be False,
    # and will *not*:
    #   - broadcast output on the IOPUB channel
    #   - have an execute_result
    # The default is False.
    'silent' : bool,

    # A boolean flag which, if True, signals the kernel to populate history
    # The default is True if silent is False.  If silent is True, store_history
    # is forced to be False.
    'store_history' : bool,

    # A dict mapping names to expressions to be evaluated in the
    # user's dict. The rich display-data representation of each will be
    # evaluated after execution.
    # See the display_data content for the structure of the representation data.
    'user_expressions' : dict,

    # Some frontends do not support stdin requests.
    # If this is true, code running in the kernel can prompt the user for input
    # with an input_request message (see below). If it is false, the kernel
    # should not send these messages.
    'allow_stdin' : True,

    # A boolean flag, which, if True, aborts the execution queue if an exception
    # is encountered. If False, queued execute_requests will execute even if
    # this request generates an exception.
    'stop_on_error' : True,
}

Changed in version 5.0: user_variables removed, because it is redundant with user_expressions.

The code field contains a single string (possibly multiline) to be executed.

The user_expressions field deserves a detailed explanation. In the past, IPython had the notion of a prompt string that allowed arbitrary code to be evaluated, and this was put to good use by many in creating prompts that displayed system status, path information, and even more esoteric uses like remote instrument status acquired over the network. But now that IPython has a clean separation between the kernel and the clients, the kernel has no prompt knowledge; prompts are a frontend feature, and it should be even possible for different frontends to display different prompts while interacting with the same kernel. user_expressions can be used to retrieve this information.

Any error in evaluating any expression in user_expressions will result in only that key containing a standard error message, of the form:

{
    'status' : 'error',
    'ename' : 'NameError',
    'evalue' : 'foo',
    'traceback' : ...
}

Note

In order to obtain the current execution counter for the purposes of displaying input prompts, frontends may make an execution request with an empty code string and silent=True.

Upon completion of the execution request, the kernel always sends a reply, with a status code indicating what happened and additional data depending on the outcome. See below for the possible return codes and associated data.
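A frontend might build the execute_request content with a small helper that applies the defaults described above, including the rule that silent=True forces store_history to False. The function name and signature below are hypothetical.

```python
def execute_request_content(code, silent=False, store_history=None,
                            user_expressions=None, allow_stdin=True,
                            stop_on_error=True):
    """Build execute_request content with the documented defaults."""
    if store_history is None:
        store_history = True  # default when silent is False
    if silent:
        store_history = False  # silent=True forces store_history to False
    return {
        "code": code,
        "silent": silent,
        "store_history": store_history,
        "user_expressions": user_expressions or {},
        "allow_stdin": allow_stdin,
        "stop_on_error": stop_on_error,
    }


# Query the current execution counter without running or recording anything.
probe = execute_request_content("", silent=True)
```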

Execution counter (prompt number)

The kernel should have a single, monotonically increasing counter of all execution requests that are made with store_history=True. This counter is used to populate the In[n] and Out[n] prompts. The value of this counter will be returned as the execution_count field of all execute_reply and execute_input messages.
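The counter rule can be sketched as follows. The class name is hypothetical, and the sketch assumes that a request stores history only when store_history is true and silent is false.

```python
class ExecutionCounter:
    """Monotonically increasing counter of requests that store history."""

    def __init__(self):
        self.count = 0

    def on_execute_request(self, content):
        """Update the counter for a request and return the value to report."""
        silent = content.get("silent", False)
        store_history = content.get("store_history", True) and not silent
        if store_history:
            self.count += 1
        # Requests that do not store history report the current value unchanged.
        return self.count
```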

Execution results

Message type: execute_reply:

content = {
  # One of: 'ok' OR 'error' OR 'aborted'
  'status' : str,

  # The global kernel counter that increases by one with each request that
  # stores history.  This will typically be used by clients to display
  # prompt numbers to the user.  If the request did not store history, this will
  # be the current value of the counter in the kernel.
  'execution_count' : int,
}

When status is ‘ok’, the following extra fields are present:

{
  # 'payload' will be a list of payload dicts, and is optional.
  # payloads are considered deprecated.
  # The only requirement of each payload dict is that it have a 'source' key,
  # which is a string classifying the payload (e.g. 'page').

  'payload' : list(dict),

  # Results for the user_expressions.
  'user_expressions' : dict,
}

Changed in version 5.0: user_variables is removed, use user_expressions instead.

Payloads (DEPRECATED)

Execution payloads

Payloads are considered deprecated, though their replacement is not yet implemented.

Payloads are a way to trigger frontend actions from the kernel. Current payloads:

page: display data in a pager.

Pager output is used for introspection, or other displayed information that’s not considered output. Pager payloads are generally displayed in a separate pane, that can be viewed alongside code, and are not included in notebook documents.

{
  "source": "page",
  # mime-bundle of data to display in the pager.
  # Must include text/plain.
  "data": mimebundle,
  # line offset to start from
  "start": int,
}

set_next_input: create a new input

Used to create new cells in the notebook, or to set the next input in a console interface. The main example is %load.

{
  "source": "set_next_input",
  # the text contents of the cell to create
  "text": "some cell content",
  # If true, replace the current cell in document UIs instead of inserting
  # a cell. Ignored in console UIs.
  "replace": bool,
}

edit_magic: open a file for editing.

Triggered by %edit. Only the QtConsole currently supports edit payloads.

{
  "source": "edit_magic",
  "filename": "/path/to/file.py", # the file to edit
  "line_number": int, # the line number to start with
}

ask_exit: instruct the frontend to prompt the user for exit

Allows the kernel to request exit, e.g. via %exit in IPython. Only for console frontends.

{
  "source": "ask_exit",
  # whether the kernel should be left running, only closing the client
  "keepkernel": bool,
}

Introspection

Code can be inspected to show useful information to the user. It is up to the Kernel to decide what information should be displayed, and its formatting.

Message type: inspect_request:

content = {
    # The code context in which introspection is requested
    # this may be up to an entire multiline cell.
    'code' : str,

    # The cursor position within 'code' (in unicode characters) where inspection is requested
    'cursor_pos' : int,

    # The level of detail desired.  In IPython, the default (0) is equivalent to typing
    # 'x?' at the prompt, 1 is equivalent to 'x??'.
    # The difference is up to kernels, but in IPython level 1 includes the source code
    # if available.
    'detail_level' : 0 or 1,
}

Changed in version 5.0: object_info_request renamed to inspect_request.

Changed in version 5.0: name key replaced with code and cursor_pos, moving the lexing responsibility to the kernel.

Changed in version 5.2: Due to a widespread bug in many frontends, cursor_pos in versions prior to 5.2 is ambiguous in the presence of "astral-plane" characters. In 5.2, cursor_pos must be the actual encoding-independent offset in unicode codepoints. See cursor_pos and unicode offsets for more.

The reply is a mime-bundle, like a display_data message, which should be a formatted representation of information about the context. In the notebook, this is used to show tooltips over function calls, etc.

Message type: inspect_reply:

content = {
    # 'ok' if the request succeeded or 'error', with error information as in all other replies.
    'status' : 'ok',

    # found should be true if an object was found, false otherwise
    'found' : bool,

    # data can be empty if nothing is found
    'data' : dict,
    'metadata' : dict,
}

Changed in version 5.0: object_info_reply renamed to inspect_reply.

Changed in version 5.0: Reply is changed from structured data to a mime bundle, allowing formatting decisions to be made by the kernel.

Completion

Message type: complete_request:

content = {
    # The code context in which completion is requested
    # this may be up to an entire multiline cell, such as
    # 'foo = a.isal'
    'code' : str,

    # The cursor position within 'code' (in unicode characters) where completion is requested
    'cursor_pos' : int,
}

Changed in version 5.0: line, block, and text keys are removed in favor of a single code for context. Lexing is up to the kernel.

Changed in version 5.2: Due to a widespread bug in many frontends, cursor_pos in versions prior to 5.2 is ambiguous in the presence of "astral-plane" characters. In 5.2, cursor_pos must be the actual encoding-independent offset in unicode codepoints. See cursor_pos and unicode offsets for more.
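The ambiguity arises when a frontend counts offsets in UTF-16 code units, where an astral-plane character occupies two units but is a single codepoint. The sketch below converts such an offset to a codepoint offset. The function name is hypothetical, and it assumes the given offset does not fall in the middle of a surrogate pair.

```python
def utf16_offset_to_codepoints(code, utf16_offset):
    """Convert an offset in UTF-16 code units to an offset in codepoints."""
    encoded = code.encode("utf-16-le")  # two bytes per UTF-16 code unit
    prefix = encoded[: utf16_offset * 2]
    return len(prefix.decode("utf-16-le"))


code = "\U0001f600.foo"  # one astral-plane character followed by ".foo"
cursor_pos = utf16_offset_to_codepoints(code, 6)  # 6 UTF-16 units into the text
```

For text without astral-plane characters the two offsets are identical, which is why the bug went unnoticed in many frontends.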

Message type: complete_reply:

content = {
    # status should be 'ok' unless an exception was raised during the request,
    # in which case it should be 'error', along with the usual error message content
    # in other messages.
    'status' : 'ok',

    # The list of all matches to the completion request, such as
    # ['a.isalnum', 'a.isalpha'] for the above example.
    'matches' : list,

    # The range of text that should be replaced by the above matches when a completion is accepted.
    # typically cursor_end is the same as cursor_pos in the request.
    'cursor_start' : int,
    'cursor_end' : int,

    # Information that frontend plugins might use for extra display information about completions.
    'metadata' : dict,
}

Changed in version 5.0:

  • matched_text is removed in favor of cursor_start and cursor_end.

  • metadata is added for extended information.
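When the user accepts a completion, the frontend replaces the text between cursor_start and cursor_end with the chosen match. A minimal sketch, with a hypothetical helper name and a reply matching the 'foo = a.isal' example above:

```python
def apply_completion(code, reply, match):
    """Replace code[cursor_start:cursor_end] with the accepted match."""
    return code[: reply["cursor_start"]] + match + code[reply["cursor_end"] :]


code = "foo = a.isal"
reply = {
    "status": "ok",
    "matches": ["a.isalnum", "a.isalpha"],
    "cursor_start": 6,  # start of 'a.isal'
    "cursor_end": 12,  # same as cursor_pos in the request
    "metadata": {},
}
completed = apply_completion(code, reply, reply["matches"][1])
```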

History

For clients to explicitly request history from a kernel. The kernel has all the actual execution history stored in a single location, so clients can request it from the kernel when needed.

Message type: history_request:

content = {

  # If True, also return output history in the resulting dict.
  'output' : bool,

  # If True, return the raw input history, else the transformed input.
  'raw' : bool,

  # So far, this can be 'range', 'tail' or 'search'.
  'hist_access_type' : str,

  # If hist_access_type is 'range', get a range of input cells. session
  # is a number counting up each time the kernel starts; you can give
  # a positive session number, or a negative number to count back from
  # the current session.
  'session' : int,
  # start and stop are line (cell) numbers within that session.
  'start' : int,
  'stop' : int,

  # If hist_access_type is 'tail' or 'search', get the last n cells.
  'n' : int,

  # If hist_access_type is 'search', get cells matching the specified glob
  # pattern (with * and ? as wildcards).
  'pattern' : str,

  # If hist_access_type is 'search' and unique is true, do not
  # include duplicated history.  Default is false.
  'unique' : bool,

}

New in version 4.0: The key unique for history_request.

Message type: history_reply:

content = {
  # 'ok' if the request succeeded or 'error', with error information as in all other replies.
  'status' : 'ok',

  # A list of 3 tuples, either:
  # (session, line_number, input) or
  # (session, line_number, (input, output)),
  # depending on whether output was False or True, respectively.
  'history' : list,
}

Note

Most of the history messaging options are not used by Jupyter frontends, and many kernels do not implement them. If you’re implementing these messages in a kernel, the ‘tail’ request is the most useful; this is used by the Qt console, for example. The notebook interface does not use history messages at all.

This interface was designed by exposing all the main options of IPython’s history interface. We may remove some options in a future version of the message spec.

Code completeness

New in version 5.0.

When the user enters a line in a console style interface, the console must decide whether to immediately execute the current code, or whether to show a continuation prompt for further input. For instance, in Python a = 5 would be executed immediately, while for i in range(5): would expect further input.

There are four possible replies:

  • complete code is ready to be executed

  • incomplete code should prompt for another line

  • invalid code will typically be sent for execution, so that the user sees the error soonest.

  • unknown - if the kernel is not able to determine this. The frontend should also handle the kernel not replying promptly. It may default to sending the code for execution, or it may implement simple fallback heuristics for whether to execute the code (e.g. execute after a blank line).

Frontends may have ways to override this, forcing the code to be sent for execution or forcing a continuation prompt.

Message type: is_complete_request:

content = {
    # The code entered so far as a multiline string
    'code' : str,
}

Message type: is_complete_reply:

content = {
    # One of 'complete', 'incomplete', 'invalid', 'unknown'
    'status' : str,

    # If status is 'incomplete', indent should contain the characters to use
    # to indent the next line. This is only a hint: frontends may ignore it
    # and use their own autoindentation rules. For other statuses, this
    # field does not exist.
    'indent': str,
}
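For a Python kernel, the standard library's codeop module can provide a first approximation of this check. The sketch below is an assumption-laden illustration rather than what any particular kernel does: the function name is hypothetical, the indent hint is a simple heuristic, and codeop's default 'single' mode treats several top-level statements in one input as invalid.

```python
import codeop


def is_complete_content(code):
    """Build is_complete_reply content for Python source using codeop."""
    try:
        compiled = codeop.compile_command(code)
    except (SyntaxError, ValueError, OverflowError):
        return {"status": "invalid"}
    if compiled is None:
        # More input is needed; suggest an indent based on the last line.
        lines = code.rstrip().splitlines()
        last_line = lines[-1] if lines else ""
        indent = last_line[: len(last_line) - len(last_line.lstrip())]
        if last_line.endswith(":"):
            indent += "    "
        return {"status": "incomplete", "indent": indent}
    return {"status": "complete"}
```

The indent field is returned only for the incomplete status, as required by the reply format above.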

Connect

Deprecated since version 5.1: connect_request/reply have not proved useful, and are considered deprecated. Kernels are not expected to implement handlers for this message.

When a client connects to the request/reply socket of the kernel, it can issue a connect request to get basic information about the kernel, such as the ports the other ZeroMQ sockets are listening on. This allows clients to only have to know about a single port (the shell channel) to connect to a kernel. The ports for any additional channels the kernel is listening on should be included in the reply. If any ports are omitted from the reply, this indicates that the channels are not running.

Message type: connect_request:

content = {}

For example, a kernel with all channels running:

Message type: connect_reply:

content = {
    'shell_port' : int,   # The port the shell ROUTER socket is listening on.
    'iopub_port' : int,   # The port the PUB socket is listening on.
    'stdin_port' : int,   # The port the stdin ROUTER socket is listening on.
    'hb_port' : int,      # The port the heartbeat socket is listening on.
    'control_port' : int,      # The port the control ROUTER socket is listening on.
}

Comm info

When a client needs the currently open comms in the kernel, it can issue a request for the currently open comms. When the optional target_name is specified, the reply only contains the currently open comms for the target.

Message type: comm_info_request:

content = {
    # Optional, the target name
    'target_name': str,
}

Message type: comm_info_reply:

content = {
    # 'ok' if the request succeeded or 'error', with error information as in all other replies.
    'status' : 'ok',

    # A dictionary of the comms, indexed by uuids.
    'comms': {
        comm_id: {
            'target_name': str,
        },
    },
}

New in version 5.1.

Kernel info

If a client needs to know information about the kernel, it can make a request of the kernel’s information. This message can be used to fetch core information of the kernel, including language (e.g., Python), language version number and IPython version number, and the IPython message spec version number.

Message type: kernel_info_request:

content = {
}

Message type: kernel_info_reply:

content = {
    # 'ok' if the request succeeded or 'error', with error information as in all other replies.
    'status' : 'ok',

    # Version of messaging protocol.
    # The first integer indicates major version.  It is incremented when
    # there is any backward incompatible change.
    # The second integer indicates minor version.  It is incremented when
    # there is any backward compatible change.
    'protocol_version': 'X.Y.Z',

    # The kernel implementation name
    # (e.g. 'ipython' for the IPython kernel)
    'implementation': str,

    # Implementation version number.
    # The version number of the kernel's implementation
    # (e.g. IPython.__version__ for the IPython kernel)
    'implementation_version': 'X.Y.Z',

    # Information about the language of code for the kernel
    'language_info': {
        # Name of the programming language that the kernel implements.
        # Kernel included in IPython returns 'python'.
        'name': str,

        # Language version number.
        # It is Python version number (e.g., '2.7.3') for the kernel
        # included in IPython.
        'version': 'X.Y.Z',

        # mimetype for script files in this language
        'mimetype': str,

        # Extension including the dot, e.g. '.py'
        'file_extension': str,

        # Pygments lexer, for highlighting
        # Only needed if it differs from the 'name' field.
        'pygments_lexer': str,

        # Codemirror mode, for highlighting in the notebook.
        # Only needed if it differs from the 'name' field.
        'codemirror_mode': str or dict,

        # Nbconvert exporter, if notebooks written with this kernel should
        # be exported with something other than the general 'script'
        # exporter.
        'nbconvert_exporter': str,
    },

    # A banner of information about the kernel,
    # which may be displayed in console environments.
    'banner': str,

    # A boolean flag which tells if the kernel supports debugging in the notebook.
    # Default is False
    'debugger': bool,

    # Optional: A list of dictionaries, each with keys 'text' and 'url'.
    # These will be displayed in the help menu in the notebook UI.
    'help_links': [
        {'text': str, 'url': str}
    ],
}

Refer to the lists of available Pygments lexers and codemirror modes for those fields.

Changed in version 5.0: Versions changed from lists of integers to strings.

Changed in version 5.0: ipython_version is removed.

Changed in version 5.0: language_info, implementation, implementation_version, banner and help_links keys are added.

Changed in version 5.0: language_version moved to language_info.version

Changed in version 5.0: language moved to language_info.name
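Since the major version changes only on backward-incompatible changes, a client can compare major versions to decide whether it can talk to a kernel. The helper names and the default client_major value below are hypothetical.

```python
def parse_protocol_version(version):
    """Split a protocol_version string such as '5.3' into (major, minor)."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def is_compatible(kernel_version, client_major=5):
    """A differing major version signals a backward-incompatible protocol."""
    return parse_protocol_version(kernel_version)[0] == client_major
```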

Messages on the Control (ROUTER/DEALER) channel

Kernel shutdown

The clients can request the kernel to shut itself down; this is used in multiple cases:

  • when the user chooses to close the client application via a menu or window control.

  • when the user types ‘exit’ or ‘quit’ (or their uppercase magic equivalents).

  • when the user chooses a GUI method (like the ‘Ctrl-C’ shortcut in the IPythonQt client) to force a kernel restart to get a clean kernel without losing client-side state like history or inlined figures.

The client sends a shutdown request to the kernel, and once it receives the reply message (which is otherwise empty), it can assume that the kernel has completed shutdown safely. The request is sent on the control channel.

Upon their own shutdown, client applications will typically execute a last minute sanity check and forcefully terminate any kernel that is still alive, to avoid leaving stray processes in the user’s machine.

Message type: shutdown_request:

content = {
    'restart' : bool # False if final shutdown, or True if shutdown precedes a restart
}

Message type: shutdown_reply:

content = {
    # 'ok' if the request succeeded or 'error', with error information as in all other replies.
    'status' : 'ok',

    'restart' : bool # False if final shutdown, or True if shutdown precedes a restart
}

Note

When the clients detect a dead kernel thanks to inactivity on the heartbeat socket, they simply send a forceful process termination signal, since a dead process is unlikely to respond in any useful way to messages.

Changed in version 5.4: Sending a shutdown_request message on the shell channel is deprecated.

Kernel interrupt

If a kernel cannot catch operating system interrupt signals (e.g. because the runtime it uses handles signals itself and does not allow a user program to define a callback), the kernel can choose to be notified with a message instead. For this to work, the kernel's kernelspec must set interrupt_mode to message. An interruption will then result in the following message on the control channel:

Message type: interrupt_request:

content = {}

Message type: interrupt_reply:

content = {
    # 'ok' if the request succeeded or 'error', with error information as in all other replies.
    'status' : 'ok'
}

New in version 5.3.

Debug request

This message type is used with debugging kernels to request specific actions to be performed by the debugger, such as adding a breakpoint or stepping into code.

Message type: debug_request:

content = {}

Message type: debug_reply:

content = {}

The content dicts of the debug_request and debug_reply messages respectively follow the specification of the Request and Response messages from the Debug Adapter Protocol (DAP) as of version 1.39 or later.

Debug requests and replies are sent over the control channel to prevent queuing behind execution requests.

Additions to the DAP

The Jupyter debugger protocol makes several additions to the DAP:

  • the dumpCell request and response messages

  • the debugInfo request and response messages

  • the inspectVariables request and response messages

  • the richInspectVariables request and response messages

In order to support the debugging of notebook cells and of Jupyter consoles, which are not based on source files, we need a message to submit code to the debugger to which breakpoints can be added.

Content of the dumpCell request:

{
    'type' : 'request',
    'command' : 'dumpCell',
    'arguments' : {
        'code' : str  # the content of the cell being submitted.
    }
}

Content of the dumpCell response:

{
     'type' : 'response',
     'success': bool,
     'body': {
         'sourcePath': str  # filename for the dumped source
     }
}
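A client could build the content of such debug requests with a small helper like the sketch below. The helper name is hypothetical. The seq field is not shown in the examples in this document; it is included here as an assumption, because the base protocol of the DAP requires a sequence number on every message.

```python
import itertools

_seq = itertools.count(1)  # assumed DAP sequence numbers, starting at 1


def dap_request(command, arguments=None):
    """Build the content dict of a debug_request for the given DAP command."""
    content = {"seq": next(_seq), "type": "request", "command": command}
    if arguments is not None:
        content["arguments"] = arguments
    return content


dump_cell = dap_request("dumpCell", {"code": "x = 1\nprint(x)"})
```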

In order to support page reloading, or a client connecting at a later stage, Jupyter kernels must store the state of the debugger (such as breakpoints, whether the debugger is currently stopped). The debugInfo request is a DAP Request with no extra argument.

Content of the debugInfo request:

{
    'type' : 'request',
    'command' : 'debugInfo'
}

Content of the debugInfo response:

{
    'type' : 'response',
    'success' : bool,
    'body' : {
        'isStarted' : bool,  # whether the debugger is started,
        'hashMethod' : str,  # the hash method for code cell. Default is 'Murmur2',
        'hashSeed' : str,  # the seed for the hashing of code cells,
        'tmpFilePrefix' : str,  # prefix for temporary file names
        'tmpFileSuffix' : str,  # suffix for temporary file names
        'breakpoints' : [  # breakpoints currently registered in the debugger.
            {
                'source' : str,  # source file
                'breakpoints' : list(source_breakpoints)  # list of breakpoints for that source file
            }
        ],
        'stoppedThreads' : list(int),  # threads in which the debugger is currently in a stopped state
        'richRendering' : bool,  # whether the debugger supports rich rendering of variables
        'exceptionPaths' : list(str),  # exception names used to match leaves or nodes in a tree of exceptions
    }
}

The source_breakpoint schema is specified by the Debug Adapter Protocol.

The inspectVariables request is meant to retrieve the values of all the variables that have been defined in the kernel. It is a DAP Request with no extra argument.

Content of the inspectVariables request:

{
    'type' : 'request',
    'command' : 'inspectVariables'
}

Content of the inspectVariables response:

{
    'type' : 'response',
    'success' : bool,
    'body' : {
        'variables' : [ # variables defined in the notebook.
            {
                'name' : str,
                'variablesReference' : int,
                'value' : str,
                'type' : str
            }
        ]
    }
}

The richInspectVariables request retrieves the rich representation of a variable that has been defined in the kernel.

Content of the richInspectVariables request:

{
    'type' : 'request',
    'command' : 'richInspectVariables',
    'arguments' : {
        'variableName' : str,
        # The frameId is used only when the debugger has hit a breakpoint.
        'frameId' : int
    }
}

Content of the richInspectVariables response:

{
    'type' : 'response',
    'success' : bool,
    'body' : {
        # Dictionary of rich representations of the variable
        'data' : dict,
        'metadata' : dict
    }
}

New in version 5.5.

Messages on the IOPub (PUB/SUB) channel

Streams (stdout, stderr, etc)

Message type: stream:

content = {
    # The name of the stream is one of 'stdout', 'stderr'
    'name' : str,

    # The text is an arbitrary string to be written to that stream
    'text' : str,
}

Changed in version 5.0: 'data' key renamed to 'text' for consistency with the notebook format.

Display Data

This type of message is used to bring back data that should be displayed (text, html, svg, etc.) in the frontends. This data is published to all frontends. Each message can have multiple representations of the data; it is up to the frontend to decide which to use and how. A single message should contain all possible representations of the same information. Each representation should be a JSON-serializable data structure, and should be keyed by a valid MIME type.

Some questions remain about this design:

  • Do we use this message type for execute_result/displayhook? Probably not, because the displayhook also has to handle the Out prompt display. On the other hand we could put that information into the metadata section.

Message type: display_data:

content = {

    # Who created the data
    # Used in V4. Removed in V5.
    # 'source' : str,

    # The data dict contains key/value pairs, where the keys are MIME
    # types and the values are the raw data of the representation in that
    # format.
    'data' : dict,

    # Any metadata that describes the data
    'metadata' : dict,

    # Optional transient data introduced in 5.1. Information not to be
    # persisted to a notebook or other documents. Intended to live only
    # during a live kernel session.
    'transient': dict,
}

The metadata contains any metadata that describes the output. Global keys are assumed to apply to the output as a whole. The metadata dict can also contain mime-type keys, which will be sub-dictionaries, which are interpreted as applying only to output of that type. Third parties should put any data they write into a single dict with a reasonably unique name to avoid conflicts.

The only metadata keys currently defined in IPython are the width and height of images:

metadata = {
  'image/png' : {
    'width': 640,
    'height': 480
  }
}

and expanded for JSON data:

metadata = {
  'application/json' : {
    'expanded': True
  }
}

The transient dict contains runtime metadata that should not be persisted to document formats and is fully optional. The only transient key currently defined in Jupyter is display_id:

transient = {
    'display_id': 'abcd'
}

Changed in version 5.0: application/json data should be unpacked JSON data, not double-serialized as a JSON string.

Changed in version 5.1: transient is a new field.
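To make the three dicts concrete, here is a minimal sketch of a display_data content with two representations of the same value, per-MIME-type metadata, and a display_id. The helper make_display_data is hypothetical. Note that the application/json value is a plain JSON object and not a JSON-encoded string, as required since 5.0.

```python
import json


def make_display_data(data, metadata=None, display_id=None):
    """Build the content dict of a display_data message (hypothetical helper)."""
    content = {
        "data": dict(data),
        "metadata": dict(metadata or {}),
        "transient": {},
    }
    if display_id is not None:
        content["transient"]["display_id"] = display_id
    return content


content = make_display_data(
    {
        "text/plain": "{'a': 1}",
        "application/json": {"a": 1},  # unpacked JSON, not a string
    },
    metadata={"application/json": {"expanded": True}},
    display_id="abcd",
)

# The whole content must survive JSON serialization on the wire.
wire = json.dumps(content)
```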

Update Display Data

New in version 5.1.

Displays can now be named with a display_id within the transient field of display_data or execute_result.

When a display_id is specified for a display, it can be updated later with an update_display_data message. This message has the same format as display_data messages and must contain a transient field with a display_id.

Message type: update_display_data:

content = {

    # The data dict contains key/value pairs, where the keys are MIME
    # types and the values are the raw data of the representation in that
    # format.
    'data' : dict,

    # Any metadata that describes the data
    'metadata' : dict,

    # Any information not to be persisted to a notebook or other environment
    # Intended to live only during a kernel session
    'transient': dict,
}

Frontends can choose how they update prior outputs (or if they regard this as a regular display_data message). Within the Jupyter and nteract notebooks, all displays that match the display_id are updated (even if there are multiple).
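The "update every matching display" behavior can be sketched as follows. The DisplayArea class and its handle method are hypothetical and only model the bookkeeping; a real frontend would also re-render the affected outputs.

```python
class DisplayArea:
    """Toy frontend model that tracks displayed outputs (hypothetical)."""

    def __init__(self):
        self.outputs = []

    def handle(self, msg_type, content):
        display_id = content.get("transient", {}).get("display_id")
        if msg_type == "display_data":
            self.outputs.append({
                "data": content["data"],
                "metadata": content["metadata"],
                "display_id": display_id,  # None for unnamed displays
            })
        elif msg_type == "update_display_data" and display_id is not None:
            # Update every prior output that shares the display_id.
            for output in self.outputs:
                if output["display_id"] == display_id:
                    output["data"] = content["data"]
                    output["metadata"] = content["metadata"]


area = DisplayArea()
named = {"data": {"text/plain": "0%"}, "metadata": {}, "transient": {"display_id": "bar"}}
area.handle("display_data", named)
area.handle("display_data", {"data": {"text/plain": "other"}, "metadata": {}})
area.handle("display_data", named)
area.handle(
    "update_display_data",
    {"data": {"text/plain": "100%"}, "metadata": {}, "transient": {"display_id": "bar"}},
)
```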

Code inputs

To let all frontends know what code is being executed at any given time, these messages contain a re-broadcast of the code portion of an execute_request, along with the execution_count.

Message type: execute_input:

content = {
    'code' : str,  # Source code to be executed, one or more lines

    # The counter for this execution is also provided so that clients can
    # display it, since IPython automatically creates variables called _iN
    # (for input prompt In[N]).
    'execution_count' : int
}

Changed in version 5.0: pyin is renamed to execute_input.

Execution results

Results of an execution are published as an execute_result. These are identical to display_data messages, with the addition of an execution_count key.

Results can have multiple simultaneous formats, depending on how the kernel is configured. A plain text representation should always be provided in the text/plain mime-type. Frontends are free to display any or all of these according to their capabilities, and should ignore mime-types they do not understand. The data itself is any JSON object and depends on the format. It is often, but not always, a string.

Message type: execute_result:

content = {

    # The counter for this execution is also provided so that clients can
    # display it, since IPython automatically creates variables called _N
    # (for prompt N).
    'execution_count' : int,

    # data and metadata are identical to a display_data message.
    # the object being displayed is that passed to the display hook,
    # i.e. the *result* of the execution.
    'data' : dict,
    'metadata' : dict,
}

Execution errors

When an error occurs during code execution, the kernel publishes an error message.

Message type: error:

content = {
   # Similar content to the execute_reply messages for the 'error' case,
   # except the 'status' and 'execution_count' fields are omitted.
}

Changed in version 5.0: pyerr renamed to error.

Kernel status

This message type is used by frontends to monitor the status of the kernel.

Message type: status:

content = {
    # When the kernel starts to handle a message, it will enter the 'busy'
    # state and when it finishes, it will enter the 'idle' state.
    # The kernel will publish state 'starting' exactly once at process startup.
    'execution_state' : ('busy', 'idle', 'starting')
}

When a kernel receives a request and begins processing it, the kernel shall immediately publish a status message with execution_state: 'busy'. When that kernel has completed processing the request and has finished publishing associated IOPub messages, if any, it shall publish a status message with execution_state: 'idle'. Thus, the outputs associated with a given execution shall generally arrive between the busy and idle status messages associated with a given request.

Note

A caveat for asynchronous output

Asynchronous output (e.g. from background threads) may be produced after the kernel has sent the idle status message that signals the completion of the request. The handling of these out-of-order output messages is currently undefined in this specification, but the Jupyter Notebook continues to handle IOPub messages associated with a given request after the idle message has arrived, as long as the output area corresponding to that request is still active.

Changed in version 5.0: Busy and idle messages should be sent before/after handling every request, not just execution.
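A client can use the busy/idle pair to decide which IOPub messages belong to a request. The sketch below assumes each message is a dict with header, parent_header and content keys, as in the general message format, and matches on the msg_id of the parent header. The function name outputs_for_request is hypothetical, and the sketch deliberately drops asynchronous output that arrives after idle, which is one of several possible policies.

```python
def outputs_for_request(iopub_messages, request_id):
    """Collect IOPub messages published between busy and idle for one request."""
    collected = []
    busy_seen = False
    for msg in iopub_messages:
        # Skip messages caused by other requests (possibly other clients).
        if msg["parent_header"].get("msg_id") != request_id:
            continue
        if msg["header"]["msg_type"] == "status":
            state = msg["content"]["execution_state"]
            if state == "busy":
                busy_seen = True
            elif state == "idle" and busy_seen:
                break  # the request is complete
        elif busy_seen:
            collected.append(msg)
    return collected


def _msg(msg_type, parent_id, content):
    """Build a minimal message dict for the example."""
    return {
        "header": {"msg_type": msg_type},
        "parent_header": {"msg_id": parent_id},
        "content": content,
    }


traffic = [
    _msg("status", "req-1", {"execution_state": "busy"}),
    _msg("stream", "req-2", {"name": "stdout", "text": "other client\n"}),
    _msg("stream", "req-1", {"name": "stdout", "text": "mine\n"}),
    _msg("status", "req-1", {"execution_state": "idle"}),
    _msg("stream", "req-1", {"name": "stdout", "text": "late\n"}),
]
mine = outputs_for_request(traffic, "req-1")
```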

Clear output

This message type is used to clear the output that is visible on the frontend.

Message type: clear_output:

content = {

    # Wait to clear the output until new output is available.  Clears the
    # existing output immediately before the new output is displayed.
    # Useful for creating simple animations with minimal flickering.
    'wait' : bool,
}

Changed in version 4.1: stdout, stderr, and display boolean keys for selective clearing are removed, and wait is added. The selective clearing keys are ignored in v4 and the default behavior remains the same, so v4 clear_output messages will be safely handled by a v4.1 frontend.
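The effect of wait is easiest to see in a frontend-side sketch. The class below is hypothetical and models only the list of visible outputs: with wait set, the clear is deferred until the next output arrives, so the old output never disappears without a replacement.

```python
class ClearableOutputArea:
    """Toy model of an output area that honors clear_output (hypothetical)."""

    def __init__(self):
        self.outputs = []
        self._clear_pending = False

    def clear_output(self, wait):
        if wait:
            # Defer: keep the old output visible until new output arrives.
            self._clear_pending = True
        else:
            self.outputs.clear()
            self._clear_pending = False

    def append(self, output):
        if self._clear_pending:
            self.outputs.clear()
            self._clear_pending = False
        self.outputs.append(output)


area = ClearableOutputArea()
area.append("frame 1")
area.clear_output(wait=True)
visible_while_waiting = list(area.outputs)
area.append("frame 2")
visible_after_new_output = list(area.outputs)
```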

Debug event

This message type is used by debugging kernels to send debugging events to the frontend.

Message type: debug_event:

content = {}

The content dict follows the specification of the Event message from the Debug Adapter Protocol (DAP).

New in version 5.5.

Messages on the stdin (ROUTER/DEALER) channel

With the stdin ROUTER/DEALER socket, the request/reply pattern goes in the opposite direction of most kernel communication. With the stdin socket, the kernel makes the request, and the single frontend provides the response. This pattern allows code to prompt the user for a line of input, which would normally be read from stdin in a terminal.

Many programming languages provide a function which displays a prompt, blocks until the user presses return, and returns the text they typed before pressing return. In Python 3, this is the input() function; in R it is called readline(). If the execute_request message has allow_stdin==True, kernels may implement these functions so that they send an input_request message and wait for a corresponding input_reply. The frontend is responsible for displaying the prompt and getting the user’s input.

If allow_stdin is False, the kernel must not send input_request. The kernel may decide what to do instead, but it’s most likely that calls to the ‘prompt for input’ function should fail immediately in this case.

Message type: input_request:

content = {
    # the text to show at the prompt
    'prompt' : str,
    # Is the request for a password?
    # If so, the frontend shouldn't echo input.
    'password' : bool
}

Message type: input_reply:

content = { 'value' : str }

When password is True, the frontend should not show the input as it is entered. Different frontends may obscure it in different ways; e.g. showing each character entered as the same neutral symbol, or not showing anything at all as the user types.

Changed in version 5.0: password key added.
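On the frontend side, answering an input_request amounts to choosing an echoing or non-echoing reader and wrapping the result in an input_reply content. The sketch below is an illustration only: handle_input_request is a hypothetical name, and the two reader callables are injected so that the example does not depend on a particular user interface. Treating a missing password key as False is an assumption made to tolerate kernels older than 5.0.

```python
def handle_input_request(content, read_line, read_secret):
    """Return the input_reply content for an input_request content.

    read_line and read_secret are callables taking the prompt and
    returning the text typed by the user (with and without echo).
    """
    # Assumption: kernels older than 5.0 omit 'password'.
    is_password = content.get("password", False)
    reader = read_secret if is_password else read_line
    return {"value": reader(content["prompt"])}


# Stand-in readers for the example; a terminal frontend could pass
# the built-in input and getpass.getpass instead.
reply = handle_input_request(
    {"prompt": "Name: ", "password": False},
    read_line=lambda prompt: "Ada",
    read_secret=lambda prompt: "hidden",
)
secret_reply = handle_input_request(
    {"prompt": "Password: ", "password": True},
    read_line=lambda prompt: "Ada",
    read_secret=lambda prompt: "hunter2",
)
```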

Note

The stdin socket of the client is required to have the same zmq IDENTITY as the client’s shell socket. Because of this, the input_request must be sent with the same IDENTITY routing prefix as the execute_reply in order for the frontend to receive the message.

Note

This pattern of requesting user input is quite different from how stdin works at a lower level. The Jupyter protocol does not support everything code running in a terminal can do with stdin, but we believe that this enables the most common use cases.

Heartbeat for kernels

Clients send ping messages on a REQ socket, which are echoed right back from the Kernel’s REP socket. These are simple bytestrings, not the full JSON messages described above.

Custom Messages

New in version 4.1.

Message spec 4.1 (IPython 2.0) added a messaging system for developers to add their own objects with Frontend and Kernel-side components, and allow them to communicate with each other. To do this, IPython adds a notion of a Comm, which exists on both sides, and can communicate in either direction.

These messages are fully symmetrical - both the Kernel and the Frontend can send each message, and no messages expect a reply. The Kernel listens for these messages on the Shell channel, and the Frontend listens for them on the IOPub channel.

Opening a Comm

Opening a Comm produces a comm_open message, to be sent to the other side:

{
  'comm_id' : 'u-u-i-d',
  'target_name' : 'my_comm',
  'data' : {}
}

Every Comm has an ID and a target name. The code handling the message on the receiving side is responsible for maintaining a mapping of target_name keys to constructors. After a comm_open message has been sent, there should be a corresponding Comm instance on both sides. The data key is always a dict and can be any extra JSON information used in initialization of the comm.

If the target_name key is not found on the receiving side, then it should immediately reply with a comm_close message to avoid an inconsistent state.
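The receiving side of comm_open can be sketched as a small registry. Everything here is hypothetical (the CommRegistry class, its method names, and the send callable that stands in for the real channel); the sketch only shows the mapping of target_name to constructors and the immediate comm_close reply for an unknown target.

```python
class CommRegistry:
    """Toy receiver for comm_open messages (hypothetical)."""

    def __init__(self, send):
        self._send = send      # callable(msg_type, content)
        self._targets = {}     # target_name -> constructor
        self.comms = {}        # comm_id -> comm instance

    def register_target(self, target_name, constructor):
        self._targets[target_name] = constructor

    def handle_comm_open(self, content):
        constructor = self._targets.get(content["target_name"])
        if constructor is None:
            # Unknown target: close right away to avoid inconsistent state.
            self._send("comm_close", {"comm_id": content["comm_id"], "data": {}})
            return None
        comm = constructor(content["comm_id"], content["data"])
        self.comms[content["comm_id"]] = comm
        return comm


sent = []
registry = CommRegistry(send=lambda msg_type, content: sent.append((msg_type, content)))
registry.register_target("my_comm", lambda comm_id, data: {"id": comm_id, "state": data})

opened = registry.handle_comm_open(
    {"comm_id": "u-u-i-d", "target_name": "my_comm", "data": {"x": 1}}
)
rejected = registry.handle_comm_open(
    {"comm_id": "other-id", "target_name": "unknown", "data": {}}
)
```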

Comm Messages

Comm messages are one-way communications to update comm state, used for synchronizing widget state, or simply requesting actions of a comm’s counterpart.

Essentially, each comm pair defines their own message specification implemented inside the data dict.

There are no expected replies (of course, one side can send another comm_msg in reply).

Message type: comm_msg:

{
  'comm_id' : 'u-u-i-d',
  'data' : {}
}

Tearing Down Comms

Since comms live on both sides, when a comm is destroyed the other side must be notified. This is done with a comm_close message.

Message type: comm_close:

{
  'comm_id' : 'u-u-i-d',
  'data' : {}
}

Output Side Effects

Since comm messages can execute arbitrary user code, handlers should set the parent header and publish status busy / idle, just like an execute request.

Notes

cursor_pos and unicode offsets

Many frontends, especially those implemented in JavaScript, reported cursor_pos as the interpreter’s string index. This is not the same as the unicode character offset if the interpreter uses UTF-16 (e.g. JavaScript, or Python 2 on macOS), which stores “astral-plane” characters such as 𝐚 (U+1D41A) as surrogate pairs. Each such character takes up two indices instead of one, causing a unicode offset drift of one per astral-plane character. Not all frontends have this behavior, however, and information about which encoding was used to calculate the offset is lost after JSON serialization. Assuming that cursor_pos is calculated in UTF-16 could therefore result in a similarly incorrect offset for frontends that did the right thing.

For this reason, in protocol versions prior to 5.2, cursor_pos is officially ambiguous in the presence of astral plane unicode characters. Frontends claiming to implement protocol 5.2 MUST identify cursor_pos as the encoding-independent unicode character offset. Kernels may choose to expect the UTF-16 offset from requests implementing protocol 5.1 and earlier, in order to behave correctly with the most popular frontends. But they should know that doing so introduces the inverse bug for the frontends that do not have this bug.

As an example, use a python3 kernel and evaluate 𨭎𨭎𨭎𨭎𨭎 = 10. Then type 𨭎𨭎 followed by the tab key and see if it properly completes.
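The drift can be reproduced and corrected with a few lines of Python 3, whose strings are indexed by code point. The two helper functions below are hypothetical and are meant as a sketch of the conversion a kernel or frontend might apply, not as part of the protocol.

```python
def utf16_to_codepoint_offset(text, utf16_offset):
    """Convert an offset counted in UTF-16 code units to a code point offset."""
    units = 0
    for index, char in enumerate(text):
        if units >= utf16_offset:
            return index
        # Astral-plane characters take two UTF-16 code units.
        units += 2 if ord(char) > 0xFFFF else 1
    return len(text)


def codepoint_to_utf16_offset(text, offset):
    """Convert a code point offset to an offset in UTF-16 code units."""
    return sum(2 if ord(char) > 0xFFFF else 1 for char in text[:offset])


# "a", then U+1D41A (an astral-plane character), then "b".
code = "a\U0001D41Ab"

# A UTF-16 frontend reports the end of the text as 4, not 3.
utf16_end = codepoint_to_utf16_offset(code, len(code))
unicode_end = utf16_to_codepoint_offset(code, utf16_end)
```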

Known affected frontends (as of 2017-06):

  • Jupyter Notebook < 5.1

  • JupyterLab < 0.24

  • nteract < 0.2.0

  • Jupyter Console and QtConsole with Python 2 on macOS and Windows

Known not affected frontends:

  • QtConsole and Jupyter Console with Python 3, or with Python 2 on Linux

  • CoCalc