Conll05st

class paddle.text.datasets.Conll05st(data_file=None, word_dict_file=None, verb_dict_file=None, target_dict_file=None, emb_file=None, download=True)

Implementation of the Conll05st test dataset.

Note: only the test dataset can be downloaded automatically, because only the test dataset of Conll05st is publicly available.

Parameters
  • data_file (str) – path to the data tar file. Can be None if download is True. Default: None.

  • word_dict_file (str) – path to the word dictionary file. Can be None if download is True. Default: None.

  • verb_dict_file (str) – path to the verb dictionary file. Can be None if download is True. Default: None.

  • target_dict_file (str) – path to the target dictionary file. Can be None if download is True. Default: None.

  • emb_file (str) – path to the embedding dictionary file, used only by get_embedding. Can be None if download is True. Default: None.

  • download (bool) – whether to download the dataset automatically if data_file, word_dict_file, verb_dict_file, or target_dict_file is not set. Default: True.
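The rule shared by these parameters (an explicit path is used as given; a None path is only acceptable when download is True) can be sketched in plain Python. This is a hypothetical illustration of the documented behavior, not Paddle's internal code: resolve_path, the fetch callback, and the example paths are made-up names, and the real dataset class performs its own download logic.

```python
from typing import Callable, Optional


def resolve_path(path: Optional[str], download: bool,
                 fetch: Callable[[], str]) -> str:
    """Hypothetical helper mirroring the documented path/download rule.

    An explicit path is returned unchanged; a None path is only allowed
    when download is True, in which case fetch() supplies a local path.
    """
    if path is not None:
        return path
    if not download:
        raise ValueError("path is None and automatic download is disabled")
    return fetch()


# Usage with a stub fetcher standing in for a real download (paths are made up).
print(resolve_path("/data/conll05st.tar.gz", download=False, fetch=lambda: "unused"))
print(resolve_path(None, download=True, fetch=lambda: "/cache/conll05st.tar.gz"))
```

Passing the fetcher as a callback keeps the sketch free of any network code, so the decision logic can be checked on its own.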

Returns

An instance of the Conll05st dataset.

Return type

Dataset

Examples

import paddle
from paddle.text.datasets import Conll05st


class SimpleNet(paddle.nn.Layer):
    def __init__(self):
        super(SimpleNet, self).__init__()

    def forward(self, pred_idx, mark, label):
        # Reduce each input tensor to a scalar sum.
        return paddle.sum(pred_idx), paddle.sum(mark), paddle.sum(label)


# Run in dynamic-graph (eager) mode.
paddle.disable_static()

# With download=True (the default), the test dataset files are fetched automatically.
conll05st = Conll05st()
# Build the model once, outside the loop.
model = SimpleNet()

for i in range(10):
    # The last three fields of each sample are pred_idx, mark and label.
    pred_idx, mark, label = conll05st[i][-3:]
    pred_idx = paddle.to_tensor(pred_idx)
    mark = paddle.to_tensor(mark)
    label = paddle.to_tensor(label)

    pred_idx, mark, label = model(pred_idx, mark, label)
    print(pred_idx.numpy(), mark.numpy(), label.numpy())
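The example above only uses the last three fields of each sample, selected with [-3:]. The same access pattern in framework-free Python, using a hypothetical in-memory list of toy samples in place of Conll05st (the numbers are placeholders, not real dataset values) and the built-in sum() in place of paddle.sum:

```python
from typing import List, Sequence, Tuple

# Hypothetical toy samples: each is a tuple of integer-id sequences whose
# last three fields play the roles of pred_idx, mark and label.
toy_samples: List[Tuple[Sequence[int], ...]] = [
    ([5, 8, 2], [3, 3, 3], [0, 1, 1], [4, 0, 2]),
    ([7, 1], [9, 9], [1, 1], [2, 3]),
]


def sum_last_three(sample: Tuple[Sequence[int], ...]) -> Tuple[int, int, int]:
    """Unpack the last three fields and reduce each one to a scalar sum,
    as SimpleNet.forward does with paddle.sum."""
    pred_idx, mark, label = sample[-3:]
    return sum(pred_idx), sum(mark), sum(label)


for sample in toy_samples:
    print(sum_last_three(sample))  # (9, 2, 6), then (18, 2, 5)
```

Slicing from the end means the code does not depend on how many leading fields a sample has, only on the last three being in this order.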

get_dict()

Get the word, verb, and label dictionaries of the Wikipedia corpus, returned as three values in that order.

Examples


from paddle.text.datasets import Conll05st

conll05st = Conll05st()
word_dict, predicate_dict, label_dict = conll05st.get_dict()
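The returned dictionaries are typically used to translate between tokens and integer ids. Assuming each one maps a string to an integer id (an assumption about the return type, which this page does not state), a reverse lookup for decoding a predicted label sequence might look like this; invert_dict, decode, and the toy label dictionary are hypothetical:

```python
from typing import Dict, List


def invert_dict(mapping: Dict[str, int]) -> Dict[int, str]:
    """Hypothetical helper: build an id -> string lookup from a string -> id dict."""
    return {idx: key for key, idx in mapping.items()}


def decode(ids: List[int], mapping: Dict[str, int]) -> List[str]:
    """Turn a sequence of ids back into strings using the inverted dictionary."""
    reverse = invert_dict(mapping)
    return [reverse[i] for i in ids]


# Toy stand-in for label_dict; real label names and ids come from get_dict().
toy_label_dict = {"O": 0, "B-A0": 1, "I-A0": 2, "B-V": 3}
print(decode([1, 2, 3, 0], toy_label_dict))  # ['B-A0', 'I-A0', 'B-V', 'O']
```

Inverting once and reusing the result is cheaper than searching the forward dictionary for every id when decoding many sequences.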

get_embedding()

Get the path to the embedding dictionary file.

Examples


from paddle.text.datasets import Conll05st

conll05st = Conll05st()
emb_file = conll05st.get_embedding()