Thursday, June 18, 2009

Using HDFS in Java (0.20.0)

Below are code samples showing how to read from and write to HDFS in Java.

1. Creating a configuration object: To read from or write to HDFS, you need to create a Configuration object and pass configuration parameters to it from the Hadoop configuration files.
 
    // The conf object will read the HDFS configuration parameters from these
    // XML files. You may also set parameters on the object yourself.


    Configuration conf = new Configuration();
    conf.addResource(new Path("/opt/hadoop-0.20.0/conf/core-site.xml"));
    conf.addResource(new Path("/opt/hadoop-0.20.0/conf/hdfs-site.xml"));

    If you do not load the Hadoop XML files into the conf object, your file operations will be performed on the local file system instead of HDFS.
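For reference, core-site.xml and hdfs-site.xml are plain XML lists of `<property>` name/value pairs (for example, fs.default.name is the 0.20-era key that points the client at the NameNode). The sketch below parses such a file with standard JAXP just to show the structure that addResource() consumes; it is an illustration, not Hadoop's actual parser, and the class name ConfSketch is made up.

```java
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfSketch {
    // Illustration only: extract the <property><name>..</name><value>..</value>
    // pairs found in core-site.xml / hdfs-site.xml. Hadoop's Configuration
    // class does this (and much more) internally on addResource().
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            Map<String, String> props = new HashMap<String, String>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                props.put(
                    p.getElementsByTagName("name").item(0).getTextContent(),
                    p.getElementsByTagName("value").item(0).getTextContent());
            }
            return props;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
            + "<name>fs.default.name</name>"
            + "<value>hdfs://localhost:9000</value>"
            + "</property></configuration>";
        System.out.println(parse(xml).get("fs.default.name"));
        // prints "hdfs://localhost:9000"
    }
}
```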

2. Adding file to HDFS:
Create a FileSystem object and use a file stream to add a file.

    FileSystem fileSystem = FileSystem.get(conf);
   
    // Check if the file already exists

    Path path = new Path("/path/to/file.ext");
    if (fileSystem.exists(path)) {
        System.out.println("File " + path + " already exists");
        return;
    }

    // Create a new file and write data to it.
    FSDataOutputStream out = fileSystem.create(path);
    InputStream in = new BufferedInputStream(new FileInputStream(
        new File("/local/path/to/file.ext")));


    byte[] b = new byte[1024];
    int numBytes = 0;
    while ((numBytes = in.read(b)) > 0) {
        out.write(b, 0, numBytes);
    }

    // Close all the file descriptors
    in.close();
    out.close();
    fileSystem.close();
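The 1 KB copy loop is ordinary java.io code: FSDataInputStream and FSDataOutputStream extend the standard stream classes, so the exact same loop moves bytes whether the endpoints are local files or HDFS. A self-contained version using in-memory streams (the class and method names are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class CopyLoop {
    // The same 1 KB buffered copy loop as above, factored out so it can be
    // exercised without a cluster.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] b = new byte[1024];
        long total = 0;
        int numBytes;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
            total += numBytes;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[3000];          // more than one buffer's worth
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(data), sink);
        System.out.println(n + " bytes, intact=" + Arrays.equals(data, sink.toByteArray()));
        // prints "3000 bytes, intact=true"
    }
}
```

Hadoop itself ships a packaged form of this loop as org.apache.hadoop.io.IOUtils.copyBytes(), which you may prefer in real code.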

3. Reading a file from HDFS: Open an input stream to a file in HDFS and read from it.

    FileSystem fileSystem = FileSystem.get(conf);

    Path path = new Path("/path/to/file.ext");

    if (!fileSystem.exists(path)) {
        System.out.println("File does not exist");
        return;
    }

    FSDataInputStream in = fileSystem.open(path);


    // Derive the local file name from the HDFS path
    String filename = path.getName();


    OutputStream out = new BufferedOutputStream(new FileOutputStream(
        new File(filename)));


    byte[] b = new byte[1024];
    int numBytes = 0;
    while ((numBytes = in.read(b)) > 0) {
        out.write(b, 0, numBytes);
    }

    in.close();
    out.close();
    fileSystem.close();
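Deriving the local file name from an HDFS path string is plain string handling; the full code listing below does it with substring/lastIndexOf, and Path.getName() returns the same final component directly. A small self-contained sketch (the helper name is made up):

```java
public class FilenameUtil {
    // Same derivation the read code performs with substring/lastIndexOf;
    // works for both absolute HDFS paths and bare file names
    // (lastIndexOf returns -1 when there is no '/', so +1 yields index 0).
    static String basename(String file) {
        return file.substring(file.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(basename("/user/hadoop/data/file.ext")); // prints "file.ext"
        System.out.println(basename("file.ext"));                   // prints "file.ext"
    }
}
```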

4. Deleting a file from HDFS: Get a FileSystem object and delete the file; no stream is needed.

    FileSystem fileSystem = FileSystem.get(conf);

    Path path = new Path("/path/to/file.ext");
    if (!fileSystem.exists(path)) {
        System.out.println("File does not exist");
        return;
    }

    // Delete the file; the second argument enables recursive deletion
    fileSystem.delete(path, true);


    fileSystem.close();
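The boolean passed to delete() is the recursive flag: with true, a directory and everything under it is removed. As a rough local analogue of that behavior (HDFS's actual semantics differ in details, e.g. it refuses to delete a non-empty directory when the flag is false), here is a java.io.File sketch with a made-up class name:

```java
import java.io.File;
import java.io.IOException;

public class RecursiveDelete {
    // Descend into directories first, then remove the entry itself.
    // Without the recursive flag, File.delete() on a non-empty directory
    // simply returns false and leaves it in place.
    static boolean delete(File f, boolean recursive) {
        if (f.isDirectory() && recursive) {
            for (File child : f.listFiles()) {
                delete(child, true);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "hdfsclient-demo");
        new File(dir, "sub").mkdirs();
        new File(dir, "sub" + File.separator + "file.txt").createNewFile();
        System.out.println(delete(dir, true) + " " + dir.exists());
        // prints "true false"
    }
}
```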

5. Creating a directory in HDFS: Get a FileSystem object and call mkdirs().

    FileSystem fileSystem = FileSystem.get(conf);

    Path path = new Path("/path/to/dir");
    if (fileSystem.exists(path)) {
        System.out.println("Dir " + path + " already exists");
        return;
    }

    // Create directories
    fileSystem.mkdirs(path);


    fileSystem.close();

Code:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSClient {
    public HDFSClient() {

    }

    public void addFile(String source, String dest) throws IOException {
        Configuration conf = new Configuration();

        // Conf object will read the HDFS configuration parameters from these
        // XML files.
        conf.addResource(new Path("/opt/hadoop-0.20.0/conf/core-site.xml"));
        conf.addResource(new Path("/opt/hadoop-0.20.0/conf/hdfs-site.xml"));

        FileSystem fileSystem = FileSystem.get(conf);

        // Get the filename out of the file path
        String filename = source.substring(source.lastIndexOf('/') + 1);


        // Create the destination path including the filename.
        if (dest.charAt(dest.length() - 1) != '/') {
            dest = dest + "/" + filename;
        } else {
            dest = dest + filename;
        }

        // Check if the file already exists
        Path path = new Path(dest);
        if (fileSystem.exists(path)) {
            System.out.println("File " + dest + " already exists");
            return;
        }

        // Create a new file and write data to it.
        FSDataOutputStream out = fileSystem.create(path);
        InputStream in = new BufferedInputStream(new FileInputStream(
            new File(source)));


        byte[] b = new byte[1024];
        int numBytes = 0;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
        }

        // Close all the file descriptors
        in.close();
        out.close();
        fileSystem.close();
    }

    public void readFile(String file) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/opt/hadoop-0.20.0/conf/core-site.xml"));

        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(file);
        if (!fileSystem.exists(path)) {
            System.out.println("File " + file + " does not exist");
            return;
        }

        FSDataInputStream in = fileSystem.open(path);

        // Derive the local file name from the HDFS path
        String filename = file.substring(file.lastIndexOf('/') + 1);


        OutputStream out = new BufferedOutputStream(new FileOutputStream(
            new File(filename)));


        byte[] b = new byte[1024];
        int numBytes = 0;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
        }

        in.close();
        out.close();
        fileSystem.close();
    }

    public void deleteFile(String file) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/opt/hadoop-0.20.0/conf/core-site.xml"));

        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(file);
        if (!fileSystem.exists(path)) {
            System.out.println("File " + file + " does not exist");
            return;
        }

        fileSystem.delete(path, true);

        fileSystem.close();
    }

    public void mkdir(String dir) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/opt/hadoop-0.20.0/conf/core-site.xml"));

        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(dir);
        if (fileSystem.exists(path)) {
            System.out.println("Dir " + dir + " already exists");
            return;
        }

        fileSystem.mkdirs(path);

        fileSystem.close();
    }

    public static void main(String[] args) throws IOException {

        if (args.length < 1) {
            System.out.println("Usage: hdfsclient add/read/delete/mkdir" +
                " [<local_path> <hdfs_path>]");

            System.exit(1);
        }

        HDFSClient client = new HDFSClient();
        if (args[0].equals("add")) {
            if (args.length < 3) {
                System.out.println("Usage: hdfsclient add <local_path> " +
                "<hdfs_path>");

                System.exit(1);
            }

            client.addFile(args[1], args[2]);
        } else if (args[0].equals("read")) {
            if (args.length < 2) {
                System.out.println("Usage: hdfsclient read <hdfs_path>");
                System.exit(1);
            }

            client.readFile(args[1]);
        } else if (args[0].equals("delete")) {
            if (args.length < 2) {
                System.out.println("Usage: hdfsclient delete <hdfs_path>");
                System.exit(1);
            }

            client.deleteFile(args[1]);
        } else if (args[0].equals("mkdir")) {
            if (args.length < 2) {
                System.out.println("Usage: hdfsclient mkdir <hdfs_path>");
                System.exit(1);
            }

            client.mkdir(args[1]);
        } else {  
            System.out.println("Usage: hdfsclient add/read/delete/mkdir" +
                " [<local_path> <hdfs_path>]");
            System.exit(1);

        }

        System.out.println("Done!");
    }
}
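The destination-path handling in addFile() (append the source file name to the HDFS destination, inserting a '/' only when the destination lacks a trailing one) is easy to verify in isolation. A self-contained sketch; the class and method names are made up:

```java
public class DestPath {
    // Reproduces addFile()'s destination construction: take the base name
    // of the local source path and append it to the HDFS destination
    // directory, adding a '/' only when one is missing.
    static String join(String dest, String source) {
        String filename = source.substring(source.lastIndexOf('/') + 1);
        if (dest.charAt(dest.length() - 1) != '/') {
            return dest + "/" + filename;
        }
        return dest + filename;
    }

    public static void main(String[] args) {
        System.out.println(join("/user/hadoop", "/tmp/data.txt"));  // prints "/user/hadoop/data.txt"
        System.out.println(join("/user/hadoop/", "/tmp/data.txt")); // prints "/user/hadoop/data.txt"
    }
}
```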

Comments:

Anonymous said...

Very informative blog.

It would be useful to the configure the SyntaxHighlighter in your blogspot and then just use the pre tags to highlight the source code / bash scripts.

Would be very useful to read through realms of code / copy paste and try the same.

SATHISH said...

Hi,

Thanks for your post. How can I list all the files within a directory in HDFS using Java?


Regards,
Sathish

sadak rathod said...

Hello Rajeev,

It's a great thing that you have posted. Just one doubt: should we use only this Java code to upload files, or should we use MapReduce as well? Is MapReduce needed to distribute the data into blocks?

tiru said...

Hi,
I am trying to write and read a file from HDFS, and before that I am trying to create a directory using Java code, but it is not working and I am not able to create the directory inside HDFS. Can you please give some instructions on that?

Arpith said...

Thanks, this helped a lot with my project on query evaluation using MapReduce.

Anonymous said...

Dude, lifesaver!! Thanks a lot. I wasn't putting the core-site/hdfs-site XML files in the config and had been getting 'file doesn't exist' forever.
- karthik -

DJ said...

Very useful.. thanks :)

akanksha said...

Your blog turned out to be very useful for me.
Can you also tell how to compile and run this code? I have tried everything, like including the Hadoop JAR during compilation, but I am still getting a NoClassDefFoundError. Please help.

Laxmikant said...

Hi Rajiv,
I am trying to read images from HDFS and pass them to MapReduce. I wrote the code but it is not working. Can anyone please help me with the code? How can images be read from HDFS? Link: http://stackoverflow.com/questions/10885039/reading-images-from-hdfs-using-mapreduce

Anonymous said...

I was trying to extend your program to do an append, and I got an exception:

org.apache.hadoop.ipc.RemoteException: java.io.IOException: Append to hdfs not supported. Please refer to dfs.support.append configuration parameter.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1153)


How can I fix this?

mohitgoyal said...

How can I check from Java code whether the HDFS cluster is up and not in safe mode?

Anju Singh said...

Hello Rajeev Sharma, I am getting the following error, please help me.



Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.hadoop.conf.Configuration.(Configuration.java:153)
at pkgHdfs.HDFSClient.main(HDFSClient.java:192)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 2 more
Java Result: 1

Ankur Shanbhag said...

Very nice blog. I had some doubts about configuration. I would like to know other options for adding configuration parameters (like core-site.xml) to the Configuration object. How can we make the paths relative rather than specifying the complete path? My program may be deployed on another machine where Hadoop is at a different location.

Hiranya Jayathilaka said...

Thanks Rajeev. It was very useful.

Rohana Rajapakse said...

Hi,

I am trying to do the same, but from a Java program running on a different machine on a different platform (Windows). My Hadoop/MapReduce installation is on a Linux machine. I think I need to remotely log in to the Linux box (using SSH?) from within my Java code. Should I? Any idea how to do this?

Thanks Rohana

Aswathaman Balasubramanian said...

Sir,
Please upload the steps to run a Java program in Hadoop... I don't know how to run these programs.

doubter said...

Very informative. Is it possible to delete multiple files by pattern?

Anonymous said...

Hi, this is with reference to the query above about appending to an existing file; please let me know if this is possible. I have a file already uploaded to HDFS. After the upload, the file on my local system got updated, and I want to apply the same update to the copy in HDFS. Is that possible, and if yes, how?

Tom frnd of jerry said...

Can we manually create regions after creating tables in HBase? And how can we add a newly created region to the meta table?

I am using the Java API for HBase.

KHATHUTSHELO PRINCE said...

Hi,

Can anyone help me on how to move files from Hue to HDFS every hour using an Oozie coordinator?

I currently know how to create a working Oozie coordinator; my main problem is moving files/data from Hue to HDFS.


Thanks.
