shard_index

paddle.shard_index(input, index_num, nshards, shard_id, ignore_value=-1)

Resets each value of input according to the shard it belongs to. Every value in input must be a non-negative integer, and the parameter index_num is an integer greater than the maximum value of input (an exclusive upper bound). Thus, all values in input must be in the range [0, index_num), and each value can be regarded as an offset from the beginning of that range. The range is further split into nshards shards. Specifically, we first compute shard_size according to the following formula; it is the number of integers each shard can hold. The i-th shard therefore holds values in the range [i*shard_size, (i+1)*shard_size).

shard_size = (index_num + nshards - 1) // nshards
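To make the ceiling division concrete, here is a minimal sketch in plain Python that computes shard_size and the half-open range of each shard. The helper name shard_ranges is hypothetical and is not part of the Paddle API; it only mirrors the formula above.

```python
def shard_ranges(index_num, nshards):
    """Return (shard_size, ranges), where ranges[i] is the half-open
    interval [start, end) covered by the i-th shard.

    Hypothetical helper for illustration only; not a Paddle function.
    """
    # Ceiling division: the smallest shard_size with nshards * shard_size >= index_num.
    shard_size = (index_num + nshards - 1) // nshards
    ranges = [(i * shard_size, (i + 1) * shard_size) for i in range(nshards)]
    return shard_size, ranges


size, ranges = shard_ranges(index_num=20, nshards=3)
print(size)    # 7
print(ranges)  # [(0, 7), (7, 14), (14, 21)]
```

When index_num is not divisible by nshards, the last shard's nominal range extends past index_num (here up to 21), although valid input values still stop at index_num - 1.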

For each value v in input, we reset it to a new value according to the following formula:

v = v - shard_id * shard_size if shard_id * shard_size <= v < (shard_id+1) * shard_size else ignore_value

That is, the value v is set to its new offset within the range represented by shard shard_id if it falls in that range. Otherwise, it is reset to ignore_value.
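The two formulas above can be combined into a small pure-Python reference, which may help when checking expected results by hand. The function name shard_index_reference is hypothetical and operates on a flat list of integers rather than a Tensor; it is a sketch of the documented rule, not Paddle's implementation.

```python
def shard_index_reference(values, index_num, nshards, shard_id, ignore_value=-1):
    """Apply the documented remapping rule to a flat list of integers.

    Hypothetical helper for illustration only; not a Paddle function.
    """
    shard_size = (index_num + nshards - 1) // nshards
    lo = shard_id * shard_size        # first value owned by this shard
    hi = (shard_id + 1) * shard_size  # one past the last value owned
    # Values inside [lo, hi) become offsets within the shard; others are ignored.
    return [v - lo if lo <= v < hi else ignore_value for v in values]


print(shard_index_reference([16, 1], index_num=20, nshards=2, shard_id=0))  # [-1, 1]
print(shard_index_reference([16, 1], index_num=20, nshards=2, shard_id=1))  # [6, -1]
```

With index_num=20 and nshards=2, shard_size is 10, so shard 0 owns [0, 10) and shard 1 owns [10, 20). The value 16 therefore maps to offset 6 on shard 1 and to ignore_value on shard 0.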

Parameters
  • input (Tensor) – Input tensor with data type int64 or int32. Its last dimension must be 1.

  • index_num (int) – An integer greater than the maximum value of input; it is the exclusive upper bound of the index range.

  • nshards (int) – The number of shards.

  • shard_id (int) – The index of the current shard.

  • ignore_value (int, optional) – An integer outside the sharded index range, assigned to values that do not belong to the current shard. The default value is -1.

Returns

Tensor.

Examples

>>> import paddle
>>> label = paddle.to_tensor([[16], [1]], "int64")
>>> shard_label = paddle.shard_index(input=label,
...                                  index_num=20,
...                                  nshards=2,
...                                  shard_id=0)
>>> print(shard_label.numpy())
[[-1]
 [ 1]]