
Write to MAPRFS with the hadoop CLI fails inside Docker when running on a data node.

Question asked by coderfi on Nov 4, 2014
Latest reply on Nov 6, 2014 by najmuddin_chirammal
I created a Debian-based (Ubuntu) Docker image containing the Hadoop client tools.

When I am inside the docker container:

Read operations seem to work fine, for example:

    hadoop fs -cat 'maprfs://maprfs.example.com/tmp/xyz.csv'

However, write operations return an error:

    hadoop fs -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'

    2014-11-05 01:21:08,7669 ERROR Client fs/client/fileclient/cc/writebuf.cc:154 Thread: 240 FlushWrite failed: File 1GB.img, error: Invalid argument(22), pfid 4484.66.266002, off 65536, fid 5189.87.131376
    14/11/05 01:21:08 ERROR fs.Inode: Write failed for file: /tmp/1GB.img, error: Invalid argument
    14/11/05 01:21:08 ERROR fs.Inode: Marking failure for: /tmp/1GB.img, error: Invalid argument
    14/11/05 01:21:08 ERROR fs.Inode: Throwing exception for: /tmp/1GB.img, error: Invalid argument
    14/11/05 01:21:08 ERROR fs.Inode: Flush failed for file: /tmp/1GB.img, error: Invalid argument
    14/11/05 01:21:08 ERROR fs.Inode: Marking failure for: /tmp/1GB.img, error: Invalid argument
    14/11/05 01:21:08 ERROR fs.Inode: Throwing exception for: /tmp/1GB.img, error: Invalid argument
    copyFromLocal: 4484.66.266002 /tmp/1GB.img (Invalid argument)

After it fails, if I do a `hadoop fs -ls /`, I can see the file on MAPRFS; however, it has zero bytes.

The interesting things about my setup are:

   a) The docker container is running on one of the data nodes of the MapR cluster.
       This may be the biggest clue... perhaps the hadoop driver is trying to do some sort of low-level disk access and is failing.

   b) The host is CentOS.


I can't seem to find any additional logs. Does anyone know how to enable more verbose logging, or maybe suggest where to look?
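For what it's worth, here are the knobs I am planning to try next for more verbose output. HADOOP_ROOT_LOGGER is the standard Hadoop 1.x log4j override; the fs.mapr.trace property is something I have only seen mentioned for the MapR native client, so treat that one as an assumption:

    # Raise the Hadoop CLI's own log4j level (standard Hadoop 1.x environment variable)
    export HADOOP_ROOT_LOGGER=DEBUG,console

    # Ask the MapR native client for trace output while reproducing the failure.
    # I'm assuming fs.mapr.trace is the right property name here.
    hadoop fs -Dfs.mapr.trace=debug -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'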

Other interesting notes:

- When I run the write command 'natively' on the host itself (remember, the host is one of the data nodes), I am able to write to maprfs just fine.

- When I run the docker on a physically separate host (i.e. not on the data node), it also works fine.

- The host is apparently running its MAPR services using OpenJDK, while my docker container has the Oracle JDK.

Here is my docker info as reported on the host:

    docker info
    Containers: 10
    Images: 124
    Storage Driver: devicemapper
     Pool Name: docker-8:3-14419152-pool
     Pool Blocksize: 64 Kb
     Data file: /var/lib/docker/devicemapper/devicemapper/data
     Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
     Data Space Used: 4128.2 Mb
     Data Space Total: 102400.0 Mb
     Metadata Space Used: 7.2 Mb
     Metadata Space Total: 2048.0 Mb
    Execution Driver: native-0.2
    Kernel Version: 2.6.32-431.29.2.el6.x86_64
    Operating System: <unknown>

I do not have any special /etc/sysconfig/docker configurations.

The Docker Container OS is:

    Ubuntu 14.04.1 LTS

On the host, I have installed:

    docker-io.x86_64      1.2.0-3.el6       @os-epel
    (and lxc-docker)
    lua-lxc.x86_64        1.0.5-3.el6       @os-epel
    lxc.x86_64            1.0.5-3.el6       @os-epel
    lxc-libs.x86_64       1.0.5-3.el6       @os-epel

Host OS:

    CentOS release 6.5 (Final)

MapRBuildVersion:

    3.1.1.26113.GA
    
    Hadoop 1.0.3
    Subversion http://mapr.com -r 862f0747a63bf5e8e0b42dcf6ed4a56e3aa07d84
    Compiled by root on Thu Jun 12 09:55:40 PDT 2014
    From source with checksum c7516e768e82b19f432a4b15b013404e
    This command was run using /opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar

BTW, I run the container with the --net=host parameter because the MapR libraries seem to refuse to run without it (perhaps they need the container's hostname or IP address...).

I also tried running with the --privileged=true parameter, but it did not help.
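For completeness, this is roughly how I launch the container and reproduce the failure (the image name is illustrative, not my exact tag):

    # Sketch of the docker run invocation; the image name is illustrative.
    # --net=host is needed or the MapR client refuses to start;
    # --privileged=true was an extra attempt that made no difference.
    docker run -it --net=host --privileged=true my-debian-hadoop-client:latest bash
    # ...then inside the container:
    hadoop fs -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'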

Again, the big clues are that I can:

- write while outside the docker, or from a totally separate machine.

- read whether inside the docker or not.

But I cannot write while inside the docker on the same data node; instead, I get an error on the client and a zero-byte file on MAPRFS.
