box_coder(prior_box, prior_box_var, target_box, code_type='encode_center_size', box_normalized=True, axis=0, name=None)

Encode or decode target bounding boxes using prior box (anchor) information.

The encoding scheme is as follows:

\[ \begin{align}\begin{aligned}ox &= (tx - px) / pw / pxv\\oy &= (ty - py) / ph / pyv\\ow &= \log(\lvert tw / pw \rvert) / pwv\\oh &= \log(\lvert th / ph \rvert) / phv\end{aligned}\end{align} \]
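The encoding formulas can be evaluated for every target/prior pair at once. The sketch below is a NumPy illustration, not Paddle's kernel: the function name `encode_center_size` is hypothetical, it assumes normalized boxes (`box_normalized=True`) and a per-box `[M, 4]` variance array, and it converts both corner-form inputs to center form (tx, ty, tw, th and px, py, pw, ph) before applying the four formulas.

```python
import numpy as np


def encode_center_size(prior_box, prior_box_var, target_box):
    """Hypothetical NumPy sketch of 'encode_center_size' for normalized boxes.

    prior_box:     [M, 4] prior boxes as [xmin, ymin, xmax, ymax]
    prior_box_var: [M, 4] variances (pxv, pyv, pwv, phv), one row per prior box
    target_box:    [N, 4] target boxes as [xmin, ymin, xmax, ymax]
    Returns an [N, M, 4] array of (ox, oy, ow, oh) for every target/prior pair.
    """
    # Prior boxes in center form; each array has shape [M].
    pw = prior_box[:, 2] - prior_box[:, 0]
    ph = prior_box[:, 3] - prior_box[:, 1]
    px = prior_box[:, 0] + pw / 2
    py = prior_box[:, 1] + ph / 2
    # Target boxes in center form; shape [N, 1] so they broadcast against [M].
    tw = (target_box[:, 2] - target_box[:, 0])[:, np.newaxis]
    th = (target_box[:, 3] - target_box[:, 1])[:, np.newaxis]
    tx = target_box[:, [0]] + tw / 2
    ty = target_box[:, [1]] + th / 2
    pxv, pyv, pwv, phv = (prior_box_var[:, k] for k in range(4))
    # The four encoding formulas, evaluated for all N x M pairs.
    ox = (tx - px) / pw / pxv
    oy = (ty - py) / ph / pyv
    ow = np.log(np.abs(tw / pw)) / pwv
    oh = np.log(np.abs(th / ph)) / phv
    return np.stack([ox, oy, ow, oh], axis=-1)


prior = np.array([[0.0, 0.0, 0.5, 0.5]])
target = np.array([[0.1, 0.1, 0.6, 0.6]])
# x/y offsets come out close to 0.2, width/height offsets close to 0.
print(encode_center_size(prior, np.ones((1, 4)), target))
```

The target here is the prior shifted by 0.1 in both directions with the same size, so only the center offsets are non-zero.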

The decoding scheme is as follows:

\[ \begin{align}\begin{aligned}ox &= pw \cdot pxv \cdot tx + px\\oy &= ph \cdot pyv \cdot ty + py\\ow &= \exp(pwv \cdot tw) \cdot pw\\oh &= \exp(phv \cdot th) \cdot ph\\x_{min} &= ox - ow / 2\\y_{min} &= oy - oh / 2\\x_{max} &= ox + ow / 2\\y_{max} &= oy + oh / 2\end{aligned}\end{align} \]

where tx, ty, tw, th denote the target box's center coordinates, width, and height, and px, py, pw, ph denote the prior box's (anchor's) center coordinates, width, and height. The prior box variances are pxv, pyv, pwv, phv, and ox, oy, ow, oh denote the encoded values or the decoded center coordinates, width, and height. For decoding, two broadcast modes are supported: if the target box has shape [N, M, 4], the prior box may have shape [M, 4] (axis=0) or [N, 4] (axis=1), and it is broadcast to the shape of the target box along the given axis.
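Decoding inverts the encoding formulas (center = prior size × variance × offset + prior center, size = exp(variance × offset) × prior size) and converts the result back to corner form, with the prior boxes broadcast along `axis`. The following NumPy sketch illustrates both steps; the name `decode_center_size` is hypothetical, it assumes normalized boxes and an explicit variance array with the same leading dimension as `prior_box`, and it is not Paddle's implementation.

```python
import numpy as np


def decode_center_size(prior_box, prior_box_var, target_box, axis=0):
    """Hypothetical NumPy sketch of 'decode_center_size' for normalized boxes.

    prior_box, prior_box_var: [M, 4] when axis == 0, or [N, 4] when axis == 1.
    target_box: [N, M, 4] encoded values (tx, ty, tw, th).
    Returns an [N, M, 4] array of decoded boxes [xmin, ymin, xmax, ymax].
    """
    # Insert the broadcast dimension: axis 0 -> [1, M, 4], axis 1 -> [N, 1, 4].
    if axis == 0:
        prior, var = prior_box[np.newaxis], prior_box_var[np.newaxis]
    elif axis == 1:
        prior, var = prior_box[:, np.newaxis], prior_box_var[:, np.newaxis]
    else:
        raise ValueError("axis must be 0 or 1")
    # Prior boxes in center form; each array has shape [1, M] or [N, 1].
    pw = prior[..., 2] - prior[..., 0]
    ph = prior[..., 3] - prior[..., 1]
    px = prior[..., 0] + pw / 2
    py = prior[..., 1] + ph / 2
    pxv, pyv, pwv, phv = (var[..., k] for k in range(4))
    tx, ty, tw, th = (target_box[..., k] for k in range(4))  # each [N, M]
    # Recover the center and size of every decoded box.
    ox = pw * pxv * tx + px
    oy = ph * pyv * ty + py
    ow = np.exp(pwv * tw) * pw
    oh = np.exp(phv * th) * ph
    # Convert back to corner form.
    return np.stack([ox - ow / 2, oy - oh / 2, ox + ow / 2, oy + oh / 2], axis=-1)


prior = np.array([[0.0, 0.0, 0.5, 0.5]])
encoded = np.array([[[0.2, 0.2, 0.0, 0.0]]])
# Decodes back to a box close to [0.1, 0.1, 0.6, 0.6].
print(decode_center_size(prior, np.ones((1, 4)), encoded))
```

With axis=0, output element [i, j] is decoded against prior box j; with axis=1, it is decoded against prior box i.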

Parameters

  • prior_box (Tensor) – A 2-D Tensor of shape [M, 4] holding M prior boxes, with data type float32 or float64. Each box is represented as [xmin, ymin, xmax, ymax], where [xmin, ymin] is the top-left corner of the anchor box and [xmax, ymax] is its bottom-right corner. For an image feature map, the top-left corner is the one closest to the origin of the coordinate system.

  • prior_box_var (List|Tensor|None) – The prior box variances, given in one of three forms: a Tensor of shape [M, 4] holding one group of variances per prior box, with data type float32 or float64; a list of 4 floats shared by all boxes; or None, in which case no variance is used in the calculation.

  • target_box (Tensor) – A 2-D LoDTensor of shape [N, 4] when code_type is ‘encode_center_size’, or a 3-D Tensor of shape [N, M, 4] when code_type is ‘decode_center_size’. For encoding, each box is represented as [xmin, ymin, xmax, ymax]; for decoding, each entry holds encoded values (tx, ty, tw, th). The data type is float32 or float64.

  • code_type (str, optional) – The coding type applied to the target box: either ‘encode_center_size’ or ‘decode_center_size’. Defaults to ‘encode_center_size’.

  • box_normalized (bool, optional) – Whether to treat the prior box as a normalized box. Defaults to True.

  • axis (int, optional) – The axis of prior_box along which to broadcast for box decoding. For example, if axis is 0, target_box has shape [N, M, 4], and prior_box has shape [M, 4], then prior_box is broadcast to [N, M, 4] for decoding. Only valid when code_type is ‘decode_center_size’. Defaults to 0.

  • name (str, optional) – For detailed information, please refer to Name. Usually there is no need to set name; defaults to None.
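The three accepted forms of prior_box_var can be reduced to one per-box array, which keeps the formulas uniform. The NumPy sketch below uses a hypothetical helper name, `expand_variance`, and is not Paddle's internal code; treating None as all-ones reflects that a variance of 1 leaves the formulas unchanged.

```python
import numpy as np


def expand_variance(prior_box_var, num_priors):
    """Hypothetical helper: normalize the three prior_box_var forms to an [M, 4] array."""
    if prior_box_var is None:
        # No variance: the formulas behave as if every variance were 1.
        return np.ones((num_priors, 4))
    var = np.asarray(prior_box_var, dtype=np.float64)
    if var.shape == (4,):
        # A 4-element list is shared by every prior box.
        return np.tile(var, (num_priors, 1))
    if var.shape == (num_priors, 4):
        # A per-box tensor is used as-is.
        return var
    raise ValueError(f"expected None, 4 values, or shape ({num_priors}, 4); got {var.shape}")


print(expand_variance([0.1, 0.1, 0.2, 0.2], 3).shape)  # (3, 4)
```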


Returns

The output boxes. When code_type is ‘encode_center_size’, the output is a Tensor of shape [N, M, 4] holding the encodings of N target boxes with respect to M prior boxes and their variances. When code_type is ‘decode_center_size’, the output also has shape [N, M, 4], where N is the batch size and M is the number of decoded boxes.

Return type

Tensor



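```python
import paddle

# For encode: 80 prior boxes and 20 target boxes in corner form
prior_box_encode = paddle.rand((80, 4), dtype=paddle.float32)
prior_box_var_encode = paddle.rand((80, 4), dtype=paddle.float32)
target_box_encode = paddle.rand((20, 4), dtype=paddle.float32)
output_encode = paddle.vision.ops.box_coder(
    prior_box=prior_box_encode,
    prior_box_var=prior_box_var_encode,
    target_box=target_box_encode,
    code_type="encode_center_size",
)
# output_encode has shape [20, 80, 4]

# For decode: prior boxes of shape [80, 4] are broadcast along axis 0 to [20, 80, 4]
prior_box_decode = paddle.rand((80, 4), dtype=paddle.float32)
prior_box_var_decode = paddle.rand((80, 4), dtype=paddle.float32)
target_box_decode = paddle.rand((20, 80, 4), dtype=paddle.float32)
output_decode = paddle.vision.ops.box_coder(
    prior_box=prior_box_decode,
    prior_box_var=prior_box_var_decode,
    target_box=target_box_decode,
    code_type="decode_center_size",
    axis=0,
)
# output_decode has shape [20, 80, 4]
```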