Set Up Trash in Hadoop
To set up trash in Hadoop, you only need to set fs.trash.interval and fs.trash.checkpoint.interval in the /usr/local/hadoop/conf/core-site.xml file. Once that is done, you can test it:
- First, upload a file to your HDFS (e.g. hadoop fs -put shakespeare shakee)
- Delete the file from HDFS (e.g. hadoop fs -rmr shakee)
- Reload the HDFS page in your browser and you will see a .Trash directory
Below is an example configuration:
<property>
  <name>fs.trash.interval</name>
  <value>3</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>1</value>
</property>
- Both values are in minutes. Here fs.trash.interval is 3, which means deleted files stay in the .Trash folder for 3 minutes before they are removed permanently.
- fs.trash.checkpoint.interval is 1, which means a check runs every minute and deletes all files that have been in the trash for more than 3 minutes.
- fs.trash.checkpoint.interval should be less than or equal to fs.trash.interval.
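To make the relationship between the two settings concrete, here is a small shell sketch. The function name effective_checkpoint_interval is my own and not part of Hadoop. It assumes the fallback that, as far as I know, the default trash policy applies: a checkpoint interval of 0, or one larger than fs.trash.interval, is replaced by fs.trash.interval. Please verify this against your Hadoop version.

```shell
# Hypothetical helper, not a Hadoop command.
# $1 = fs.trash.interval (minutes), $2 = fs.trash.checkpoint.interval (minutes)
# Prints the checkpoint interval that would effectively be used, assuming
# that 0 or a too-large value falls back to fs.trash.interval.
effective_checkpoint_interval() {
    if [ "$2" -eq 0 ] || [ "$2" -gt "$1" ]; then
        # Assumed fallback: use the trash interval itself
        echo "$1"
    else
        # Valid setting: used as configured
        echo "$2"
    fi
}
```

With the example configuration, effective_checkpoint_interval 3 1 prints 1, so the setting is used as configured.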
When trash is enabled, each user has her own trash directory called .Trash in her home
directory. File recovery is simple: you look for the file in a subdirectory of .Trash and
move it out of the trash subtree.
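
The recovery step can be sketched in shell as follows. The helper names trash_path and restore_from_trash are made up for this post. The sketch assumes the usual layout, where a user's home directory is /user/<name> and a freshly deleted file sits under .Trash/Current followed by its original absolute path. Your cluster may be set up differently.

```shell
# Hypothetical helper: print where a deleted HDFS path is expected in trash.
# $1 = user name, $2 = original absolute HDFS path
# Assumes the layout /user/<name>/.Trash/Current/<original path>.
trash_path() {
    echo "/user/$1/.Trash/Current$2"
}

# Hypothetical helper: move a deleted path back to its original location.
# Uses the standard "hadoop fs -mv" shell command.
restore_from_trash() {
    hadoop fs -mv "$(trash_path "$1" "$2")" "$2"
}
```

For example, restore_from_trash alice /user/alice/shakee would move the file back, where alice is a placeholder user. Once a checkpoint has run, the file sits under a timestamped subdirectory instead of Current, so it is worth running hadoop fs -ls on the .Trash directory first to see where it actually is.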
HDFS will automatically delete files in trash folders, but other filesystems will not, so
you have to arrange for this to be done periodically. You can expunge the trash, which
will delete files that have been in the trash longer than their minimum period, using
the filesystem shell:
% hadoop fs -expunge