Commonly Used HDFS Commands

The Hadoop Distributed File System (HDFS) is the default storage layer for Hadoop applications. HDFS can be accessed through the Java API or the WebHDFS REST API, but the HDFS shell is the most commonly used option. Below is a list of ten commonly used HDFS shell commands.

1. Invoking the File System

The Hadoop file system shell supports several file systems, not just HDFS: Local FS, HFTP FS, S3 FS, and others. The generic form below works with any of them. The hdfs dfs form used in the rest of this list is the equivalent that is normally used when working with HDFS:

hadoop fs
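As a rough sketch of what "generic" means here, the same -ls subcommand can be pointed at different file systems by changing the URI scheme of the path. The helper name ls_on and the NameNode address namenode:8020 are assumptions made up for illustration; only hadoop fs -ls comes from Hadoop itself.

```shell
# ls_on is a hypothetical helper: list a path on whichever file system
# the URI scheme of the argument selects.
ls_on() {
  hadoop fs -ls "$1"
}

# Example calls (these need a working Hadoop client; the host is a placeholder):
#   ls_on file:///tmp                        # local file system
#   ls_on hdfs://namenode:8020/user/hadoop   # HDFS through an explicit NameNode address
```

A path without a scheme is resolved against the default file system configured for the client, which is typically HDFS on a cluster node.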

2. Listing Directory Contents

hdfs dfs -ls /user/hadoop

3. Creating Directories

hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2

4. Copying Local Files to HDFS

hdfs dfs -put localfile /user/hadoop/hadoopfile
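By default -put refuses to overwrite an existing destination. If you prefer to check first and get a clear message, a minimal sketch could look like the following. The function name put_if_absent is an assumption for illustration; -test -e and -put are standard shell subcommands.

```shell
# put_if_absent is a hypothetical helper: upload a local file only when
# the HDFS destination does not exist yet.
put_if_absent() {
  local src="$1" dst="$2"
  # -test -e exits with status 0 when the path exists
  if hdfs dfs -test -e "$dst"; then
    echo "skip: $dst already exists" >&2
    return 1
  fi
  hdfs dfs -put "$src" "$dst"
}

# Example call (needs a working Hadoop client):
#   put_if_absent localfile /user/hadoop/hadoopfile
```

If overwriting is what you want, -put -f replaces the destination instead.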

5. Copying From HDFS to Local FS

hdfs dfs -get /user/hadoop/file1 localfile

6. Renaming or Moving Files

hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2

7. Copying Files Within HDFS

hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2

Tip: To copy data between HDFS clusters, use the hadoop distcp command instead.
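For the inter-cluster case, a hedged sketch of a distcp call is shown below. The wrapper name copy_between_clusters and the NameNode addresses nn1:8020 and nn2:8020 are placeholders, not values from this article.

```shell
# copy_between_clusters is a hypothetical wrapper around hadoop distcp,
# which runs the copy as a distributed job rather than through one client.
copy_between_clusters() {
  if [ "$#" -ne 2 ]; then
    echo "usage: copy_between_clusters <src-uri> <dst-uri>" >&2
    return 2
  fi
  hadoop distcp "$1" "$2"
}

# Example call (cluster addresses are placeholders):
#   copy_between_clusters hdfs://nn1:8020/user/hadoop/dir1 hdfs://nn2:8020/user/hadoop/dir1
```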

8. Deleting Files From HDFS

hdfs dfs -rm /user/hadoop/file1

Tip: Add the -skipTrash option to delete a file immediately instead of moving it to the trash.

Tip: Use -rm -r for recursive deletion. The older -rmr command is deprecated.
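Recursive deletes are easy to get wrong, so here is a small sketch that guards against an empty or root path before calling -rm -r. The function name rm_tree is an assumption for illustration.

```shell
# rm_tree is a hypothetical helper: recursive delete that refuses to run
# on an empty path or on the file system root.
rm_tree() {
  local target="$1"
  case "$target" in
    ""|"/")
      echo "refusing to delete '$target'" >&2
      return 1
      ;;
  esac
  # -r recurses into directories; adding -skipTrash would bypass the trash
  hdfs dfs -rm -r "$target"
}

# Example call (needs a working Hadoop client):
#   rm_tree /user/hadoop/dir1
```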

9. Displaying File Contents

hdfs dfs -cat /user/hadoop/file1

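Tip: To see only the first lines of a file, pipe the cat output to the native head command. Older Hadoop releases have no head command of their own (newer Hadoop 3 releases add -head, which prints the first kilobyte of a file).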
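The pipe approach can be wrapped in a small function. This is only a sketch; the name hdfs_head and the default of 10 lines are assumptions.

```shell
# hdfs_head is a hypothetical helper: print the first N lines (default 10)
# of an HDFS file by piping -cat into the local head command.
hdfs_head() {
  local file="$1" lines="${2:-10}"
  hdfs dfs -cat "$file" | head -n "$lines"
}

# Example call (needs a working Hadoop client):
#   hdfs_head /user/hadoop/file1 20
```

Because head exits after reading the requested lines, the cat side of the pipe stops early rather than streaming the whole file.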

10. Emptying the Trash

hdfs dfs -expunge
