HDFSClient¶

class paddle.distributed.fleet.utils.HDFSClient(hadoop_home, configs, time_out=300000, sleep_inter=1000)

    A tool for operating on HDFS.

    Parameters

        - hadoop_home (str) – The Hadoop home directory.
        - configs (dict) – The Hadoop configuration. It must contain the keys "fs.default.name" and "hadoop.job.ugi".

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        client.ls_dir("hdfs:/test_hdfs_client")
            
list_dirs(fs_path)¶

    List only the directories under fs_path.

    Parameters

        fs_path (str) – The HDFS file path.

    Returns

        A list of all its subdirectories, e.g. [subdirname1, subdirname2, …].

    Return type

        list

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        subdirs = client.list_dirs("hdfs:/test_hdfs_client")

ls_dir(fs_path)¶

    List the directories and files under fs_path.

    Parameters

        fs_path (str) – The HDFS file path.

    Returns

        A 2-tuple: the first element is the list of all its subdirectories, and the second is the list of all its files, e.g. ([subdirname1, subdirname2, …], [filename1, filename2, …]).

    Return type

        tuple

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        subdirs, files = client.ls_dir("hdfs:/test_hdfs_client")

is_dir(fs_path)¶

    Whether the remote HDFS path is a directory.

    Parameters

        fs_path (str) – The HDFS file path.

    Returns

        True if the path exists and is a directory, otherwise False.

    Return type

        bool

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        ret = client.is_dir("hdfs:/test_hdfs_client")

is_file(fs_path)¶

    Whether the remote HDFS path is a file.

    Parameters

        fs_path (str) – The HDFS file path.

    Returns

        True if the path exists and is a file, otherwise False.

    Return type

        bool

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        ret = client.is_file("hdfs:/test_hdfs_client")

is_exist(fs_path)¶

    Whether the remote HDFS path exists.

    Parameters

        fs_path (str) – The HDFS file path.

    Returns

        True if the path exists, whether it is a file or a directory; otherwise False.

    Return type

        bool

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        ret = client.is_exist("hdfs:/test_hdfs_client")

upload_dir(local_dir, dest_dir, overwrite=False)¶

    Upload a local directory to HDFS.

    Parameters

        - local_dir (str) – The local directory.
        - dest_dir (str) – The destination directory on HDFS.
        - overwrite (bool) – Whether to overwrite an existing destination. Default is False.

    Returns

        The return code.
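The mapping from local files to their destinations on HDFS can be sketched as follows. This is illustrative only: `plan_upload` is a hypothetical helper, not part of HDFSClient, and `upload_dir` performs the actual transfer internally.

```python
import os

def plan_upload(local_dir, dest_dir):
    # Hypothetical helper: map every file under local_dir to the HDFS
    # path it would occupy under dest_dir, preserving the relative
    # directory layout. Only illustrates the source-to-destination
    # correspondence; no data is transferred here.
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, local_dir).replace(os.sep, "/")
            plan.append((src, dest_dir.rstrip("/") + "/" + rel))
    return sorted(plan)
```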
 

upload(local_path, fs_path, multi_processes=5, overwrite=False)¶

    Upload the local path to the remote HDFS.

    Parameters

        - local_path (str) – The local path.
        - fs_path (str) – The HDFS path.
        - multi_processes (int) – The number of processes used to upload data concurrently. Default is 5.
        - overwrite (bool) – Whether to overwrite the file on HDFS. Default is False.

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        client.upload("test_hdfs_client", "hdfs:/test_hdfs_client")

download(fs_path, local_path, multi_processes=5, overwrite=False)¶

    Download the remote HDFS path to the local machine.

    Parameters

        - fs_path (str) – The HDFS path.
        - local_path (str) – The local path.
        - multi_processes (int) – The number of processes used to download data concurrently. Default is 5.
        - overwrite (bool) – Whether to overwrite the local path. Default is False.

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        client.download("hdfs:/test_hdfs_client", "./")
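The multi_processes argument splits the transfer across worker processes. How the work is divided is an implementation detail of the client, but the general idea can be sketched as a simple round-robin partition (`partition` below is a hypothetical illustration, not HDFSClient API):

```python
def partition(files, n):
    # Hypothetical sketch: deal the file list out round-robin across
    # n worker slots, the way multi_processes-style parallel transfers
    # typically divide their work. Not the actual HDFSClient internals.
    return [files[i::n] for i in range(max(1, n))]
```

Each resulting sublist would be handled by one worker process; together the sublists cover every file exactly once.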

mkdirs(fs_path)¶

    Create a remote HDFS directory.

    Parameters

        fs_path (str) – The HDFS directory path.

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        client.mkdirs("hdfs:/test_hdfs_client")

mv(fs_src_path, fs_dst_path, overwrite=False, test_exists=True)¶

    Move a remote HDFS file or directory from fs_src_path to fs_dst_path.

    Parameters

        - fs_src_path (str) – The file or directory to be moved.
        - fs_dst_path (str) – The destination to move to.
        - overwrite (bool) – Whether to overwrite fs_dst_path if it exists. Default is False.
        - test_exists (bool) – Check the existence of fs_src_path and fs_dst_path. When test_exists is set to true, the program throws an Exception if fs_src_path does not exist or fs_dst_path exists.

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        client.mv("hdfs:/test_hdfs_client", "hdfs:/test_hdfs_client2")

delete(fs_path)¶

    Delete a remote HDFS path, whether it is a file or a directory.

    Parameters

        fs_path (str) – The HDFS file path.

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        client.delete("hdfs:/test_hdfs_client")

touch(fs_path, exist_ok=True)¶

    Create a remote HDFS file.

    Parameters

        - fs_path (str) – The HDFS file path.
        - exist_ok (bool) – When fs_path exists, if exist_ok is set to false, the program will throw an Exception. Default is true.

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        client.touch("hdfs:/test_hdfs_client")

cat(fs_path=None)¶

    Cat a remote HDFS file.

    Parameters

        fs_path (str) – The HDFS file path.

    Returns

        The file content.

    Examples

        from paddle.distributed.fleet.utils import HDFSClient

        hadoop_home = "/home/client/hadoop-client/hadoop/"
        configs = {
            "fs.default.name": "hdfs://xxx.hadoop.com:54310",
            "hadoop.job.ugi": "hello,hello123"
        }

        client = HDFSClient(hadoop_home, configs)
        client.cat("hdfs:/test_hdfs_client")

list_files_info(path_list)¶

    Return the path and size of each file in path_list.

    Parameters

        path_list (list) – The list of file paths.

    Returns

        A list of files, each with its file path and size.

    Return type

        list
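The file information ultimately comes from the Hadoop command-line client. As an illustration of the result shape (path plus size), here is a hedged sketch of parsing one line of `hadoop fs -ls` output; the field layout is an assumption about the usual `-ls` format, and `parse_ls_line` is not part of HDFSClient:

```python
def parse_ls_line(line):
    # Hypothetical parser for a standard `hadoop fs -ls` entry, e.g.
    #   -rw-r--r--   3 user group       1234 2021-01-01 10:00 /path/file
    # Field 4 is the size in bytes and field 7 the path (an assumption
    # about the usual -ls layout). Returns None for non-entry lines
    # such as "Found N items".
    fields = line.split()
    if len(fields) < 8:
        return None
    return {"path": fields[7], "size": int(fields[4])}
```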
 
 
