Embedding

class paddle.fluid.dygraph.Embedding(size, is_sparse=False, is_distributed=False, padding_idx=None, param_attr=None, dtype='float32')[source]

Embedding Layer

This interface is used to construct a callable object of the Embedding class. For specific usage, refer to the code examples. It implements the function of the Embedding layer, which is used to look up the embedding vectors of the ids provided by input. It automatically constructs a 2D embedding matrix based on the input size (vocab_size, emb_size) and dtype.

The shape of the output Tensor is generated by appending an emb_size dimension after the last dimension of the input Tensor's shape.

Note: Every id in input must satisfy \(0 \leq id < size[0]\), otherwise the program will throw an exception and exit.

Case 1:

input is a Tensor. padding_idx = -1
    input.data = [[1, 3], [2, 4], [4, 127]]
    input.shape = [3, 2]
Given size = [128, 16]
output is a Tensor:
    out.shape = [3, 2, 16]
    out.data = [[[0.129435295, 0.244512452, ..., 0.436322452],
                [0.345421456, 0.524563927, ..., 0.144534654]],

                [[0.345249859, 0.124939536, ..., 0.194353745],
                [0.945345345, 0.435394634, ..., 0.435345365]],

                [[0.945345345, 0.435394634, ..., 0.435345365],
                [0.0,         0.0,         ..., 0.0        ]]]  # padding data
Since the given padding_idx is less than 0, it is automatically converted to padding_idx = -1 + 128 = 127,
so all-zero padding data is output wherever the id is 127.
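
A minimal runnable sketch of Case 1 in dygraph mode (the embedding weights are randomly initialized, so only the output shape and the all-zero padding row are reproducible):

import numpy as np
import paddle.fluid as fluid

with fluid.dygraph.guard():
    ids = np.array([[1, 3], [2, 4], [4, 127]]).astype('int64')
    emb = fluid.dygraph.Embedding(size=[128, 16], padding_idx=-1)
    out = emb(fluid.dygraph.to_variable(ids))
    print(out.shape)          # [3, 2, 16]
    print(out.numpy()[2, 1])  # all zeros: id 127 equals the converted padding_idx
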
Parameters
  • size (tuple|list) – The shape of the look up table parameter. It should have two elements which indicate the size of the dictionary of embeddings and the size of each embedding vector respectively.

  • is_sparse (bool) – The flag indicating whether to use sparse update. This parameter only affects the performance of the backward gradient update. It is recommended to set it to True, because sparse update is faster. However, some optimizers do not support sparse update, such as AdadeltaOptimizer , AdamaxOptimizer , DecayedAdagradOptimizer , FtrlOptimizer , LambOptimizer and LarsMomentumOptimizer ; in these cases, is_sparse must be False. Default: False. See code example 3 below for a sparse-update sketch.

  • is_distributed (bool) – Whether to store the embedding matrix in a distributed manner. Only used in multi-machine distributed CPU training. Default: False.

  • padding_idx (int|long|None) – padding_idx needs to be in the interval [-vocab_size, vocab_size). If \(padding\_idx < 0\), it will automatically be converted to \(vocab\_size + padding\_idx\) . All-zero padding data is output whenever the lookup encounters \(padding\_idx\) in id, and the padding data is not updated during training (see the sketch under Case 1 above). If set to None, it has no effect on the output. Default: None.

  • param_attr (ParamAttr) – To specify the weight parameter property. Default: None, which means the default weight parameter property is used. See usage for details in ParamAttr . In addition, user-defined or pre-trained word vectors can be loaded with the param_attr parameter. The local word vectors need to be converted to numpy format, and their shape should be consistent with size . NumpyArrayInitializer is then used to load the custom or pre-trained word vectors. See code example 2 for details.

  • dtype (np.dtype|core.VarDesc.VarType|str) – The data type of the output Tensor. It must be “float32” or “float64”. Default: “float32”.

Attribute:

weight (Parameter): the learnable weights of this layer.

Returns

The embedding Tensor or LoDTensor mapped from input, with the same data type as dtype .

Return type

Variable

Examples

import paddle.fluid as fluid
import paddle.fluid.dygraph.base as base
import numpy as np

# example 1
inp_word = np.array([[2, 3, 5], [4, 2, 1]]).astype('int64')
inp_word.shape  # [2, 3]
dict_size = 20
with fluid.dygraph.guard():
    emb = fluid.dygraph.Embedding(
        size=[dict_size, 32],
        param_attr='emb.w',
        is_sparse=False)
    static_rlt3 = emb(base.to_variable(inp_word))
    static_rlt3.shape  # [2, 3, 32]

# example 2: load custom or pre-trained word vectors
weight_data = np.random.random(size=(128, 100)).astype('float32')  # word vectors in numpy format; float32 matches the layer's default dtype
w_param_attrs = fluid.ParamAttr(
    name="emb_weight",
    learning_rate=0.5,
    initializer=fluid.initializer.NumpyArrayInitializer(weight_data),
    trainable=True)
with fluid.dygraph.guard():
    emb = fluid.dygraph.Embedding(
        size=[128, 100],
        param_attr=w_param_attrs,
        is_sparse=False)
    static_rlt3 = emb(base.to_variable(inp_word))
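
As noted in the is_sparse parameter above, some optimizers support sparse gradient update. A minimal sketch, assuming SGDOptimizer (which supports sparse update) and a 1.x release whose dygraph optimizers accept the parameter_list argument:

# example 3: sparse gradient update (a sketch; assumes SGDOptimizer and
# reuses dict_size and inp_word from example 1)
with fluid.dygraph.guard():
    emb = fluid.dygraph.Embedding(
        size=[dict_size, 32],
        is_sparse=True)
    sgd = fluid.optimizer.SGDOptimizer(
        learning_rate=0.1, parameter_list=emb.parameters())
    out = emb(base.to_variable(inp_word))
    loss = fluid.layers.reduce_mean(out)
    loss.backward()
    sgd.minimize(loss)  # only the rows of emb.weight hit by inp_word are updated
    emb.clear_gradients()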

forward(input)

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters
  • input (Variable) – A Tensor or LoDTensor containing the ids to look up. Its data type must be int64.
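
Calling the layer object invokes forward, so the two calls below are equivalent. A minimal sketch, reusing the imports, dict_size and inp_word from the examples above:

with fluid.dygraph.guard():
    emb = fluid.dygraph.Embedding(size=[dict_size, 32])
    x = base.to_variable(inp_word)
    out1 = emb(x)          # __call__ dispatches to forward
    out2 = emb.forward(x)  # direct call, same result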