HDFSClient

class paddle.distributed.fleet.utils.HDFSClient(hadoop_home, configs, time_out=300000, sleep_inter=1000) [source]

A tool for HDFS operations.

Parameters
  • hadoop_home (str) – Hadoop home.

  • configs (dict) – Hadoop config. It is a dictionary and needs to contain the keys: “fs.default.name” and “hadoop.job.ugi”.

  • time_out (int) – Command time out, in milliseconds. Default is 300000.

  • sleep_inter (int) – Retry sleep interval, in milliseconds. Default is 1000.

Examples

from paddle.distributed.fleet.utils import HDFSClient
hadoop_home = "/home/client/hadoop-client/hadoop/"

configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
client.ls_dir("hdfs:/test_hdfs_client")
list_dirs(fs_path)

list_dirs

Only list directories under fs_path.

Parameters

fs_path (str) – The HDFS file path.

Returns

A list of all its subdirectories, e.g. [subdirname1, subdirname2, …].

Return type

List

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
subdirs = client.list_dirs("hdfs:/test_hdfs_client")
ls_dir(fs_path)

ls_dir

List directories and files under fs_path.

Parameters

fs_path (str) – The HDFS file path.

Returns

Return a 2-tuple; the first element is the list of all its subdirectories, and the second is the list of all its subfiles, e.g. ([subdirname1, subdirname2, …], [filename1, filename2, …]).

Return type

Tuple

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
subdirs, files = client.ls_dir("hdfs:/test_hdfs_client")
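Because ls_dir returns subdirectories and files separately, it can serve as the building block for a recursive traversal. A minimal sketch, analogous to os.walk (walk_hdfs is an illustrative helper and not part of this API; client is assumed to be an HDFSClient constructed as in the example above):

```python
def walk_hdfs(client, fs_path):
    """Recursively yield (path, subdirs, files) for fs_path and every
    directory below it, using only HDFSClient.ls_dir."""
    subdirs, files = client.ls_dir(fs_path)
    yield fs_path, subdirs, files
    for d in subdirs:
        # ls_dir returns bare names, so rebuild the full child path.
        yield from walk_hdfs(client, fs_path.rstrip("/") + "/" + d)
```

Note that each HDFS directory level costs one ls_dir call, so deep trees may be slow to traverse with this pattern.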
is_dir(fs_path)

is_dir

Whether the remote HDFS path is a directory.

Parameters

fs_path (str) – The HDFS file path.

Returns

Return true if the path exists and it’s a directory, otherwise return false.

Return type

Bool

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
ret = client.is_dir("hdfs:/test_hdfs_client")
is_file(fs_path)

is_file

Whether the remote HDFS path is a file.

Parameters

fs_path (str) – The HDFS file path.

Returns

Return true if the path exists and it’s a file, otherwise return false.

Return type

Bool

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
ret = client.is_file("hdfs:/test_hdfs_client")
is_exist(fs_path)

is_exist

Whether the remote HDFS path exists.

Parameters

fs_path (str) – The HDFS file path.

Returns

Whether it is a file or a directory, return true if the path exists; otherwise return false.

Return type

Bool

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
ret = client.is_exist("hdfs:/test_hdfs_client")
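The three predicates is_dir, is_file, and is_exist compose naturally when a caller needs to classify a remote path before acting on it. A sketch (classify_path is an illustrative helper, not part of this API; client is assumed to be an HDFSClient constructed as in the example above):

```python
def classify_path(client, fs_path):
    """Return 'missing', 'dir', or 'file' for a remote HDFS path,
    using only the documented HDFSClient predicates."""
    if not client.is_exist(fs_path):
        return "missing"
    return "dir" if client.is_dir(fs_path) else "file"
```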
upload(local_path, fs_path)

upload

Upload the local path to remote HDFS.

Parameters
  • local_path (str) – The local path.

  • fs_path (str) – The HDFS path.

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
client.upload("test_hdfs_client", "hdfs:/test_hdfs_client")
download(fs_path, local_path)

download

Download a remote HDFS path to the local file system.

Parameters
  • fs_path (str) – The HDFS path.

  • local_path (str) – The local path.

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
client.download("hdfs:/test_hdfs_client", "./")
mkdirs(fs_path)

mkdirs

Create a remote HDFS directory.

Parameters

fs_path (str) – The HDFS directory path.

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
client.mkdirs("hdfs:/test_hdfs_client")
mv(fs_src_path, fs_dst_path, overwrite=False, test_exists=True)

mv

Move a remote HDFS file or directory from fs_src_path to fs_dst_path.

Parameters
  • fs_src_path (str) – The path of the file or directory to be moved.

  • fs_dst_path (str) – The destination path.

  • overwrite (bool) – Whether to overwrite fs_dst_path if it exists. Default is False.

  • test_exists (bool) – Whether to check the existence of fs_src_path and fs_dst_path. When test_exists is set true, an Exception is thrown if fs_src_path does not exist or fs_dst_path already exists. Default is True.

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
client.mv("hdfs:/test_hdfs_client", "hdfs:/test_hdfs_client2")
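The checks performed by test_exists=True can also be reproduced explicitly when custom error handling is wanted before the move runs. A sketch (safe_mv is an illustrative helper, not part of this API; client is assumed to be an HDFSClient constructed as in the example above):

```python
def safe_mv(client, fs_src_path, fs_dst_path):
    """Mirror mv's test_exists=True checks in user code so that
    clearer, typed errors can be raised before the move runs."""
    if not client.is_exist(fs_src_path):
        raise FileNotFoundError("source does not exist: " + fs_src_path)
    if client.is_exist(fs_dst_path):
        raise FileExistsError("destination already exists: " + fs_dst_path)
    # The existence checks were already done above, so skip the
    # client's own check to avoid redundant HDFS round trips.
    client.mv(fs_src_path, fs_dst_path, overwrite=False, test_exists=False)
```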
delete(fs_path)

delete

Delete a remote HDFS path, whether it’s a file or directory.

Parameters

fs_path (str) – The HDFS file path.

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
client.delete("hdfs:/test_hdfs_client")
touch(fs_path, exist_ok=True)

touch

Create a remote HDFS file.

Parameters
  • fs_path (str) – The HDFS file path.

  • exist_ok (bool) – When fs_path exists, if exist_ok is set false, the program will throw an Exception. Default is True.

Examples

from paddle.distributed.fleet.utils import HDFSClient

hadoop_home = "/home/client/hadoop-client/hadoop/"
configs = {
    "fs.default.name": "hdfs://xxx.hadoop.com:54310",
    "hadoop.job.ugi": "hello,hello123"
}

client = HDFSClient(hadoop_home, configs)
client.touch("hdfs:/test_hdfs_client")