AnsweredAssumed Answered

Drill returns no rows even if they exist

Question asked by masaharo on Jan 7, 2016
Latest reply on Jan 7, 2016 by masaharo
I am running MapR on a VirtualBox image with a drill, downloaded from the MapR website.
My Java software creates any missing MapR tables by using the MapR-specific version of the HBase library, and by calling createTable() function of the HBaseAdmin class.

The software then writes some rows into the newly created tables. Writing and querying works fine within the Java software and within hbase shell in the VirtualBox. I can verify this by running hbase shell within the VirtualBox image. First check the table existance and column families (compared to the example products table that was already created and populated in the VirtualBox MapR image):

    hbase(main):001:0> desc 'CP_DATA'
    Table CP_DATA is ENABLED
    CP_DATA, {TABLE_ATTRIBUTES => {MAX_FILESIZE => '4294967296', METADATA => {'AUTOSPLIT' => 'true', 'MAPR_UUID' => '18c6221e-3163-0c59-714e-0949e68d5600', 'MAX_VALUE_SIZE_IN_MEM' => '100'}}
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'A', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '8192', IN_MEMORY
     => 'false', BLOCKCACHE => 'true', METADATA => {'compression_raw' => '31'}}
    1 row(s) in 1.3090 seconds
    
    hbase(main):002:0> desc 'products'
    Table products is ENABLED
    products, {TABLE_ATTRIBUTES => {MAX_FILESIZE => '4294967296', METADATA => {'AUTOSPLIT' => 'true', 'MAPR_UUID' => '2c4ee5c3-46a0-581f-10d2-07b538185600', 'MAX_VALUE_SIZE_IN_MEM' => '100'}}
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'details', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZ4', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '8192', IN_M
    EMORY => 'false', BLOCKCACHE => 'true', METADATA => {'compression_raw' => '2'}}
    {NAME => 'pricing', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZ4', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '8192', IN_M
    EMORY => 'false', BLOCKCACHE => 'true', METADATA => {'compression_raw' => '2'}}
    2 row(s) in 0.0170 seconds

Then checking the number of rows in the tables:

    hbase(main):003:0> count 'CP_DATA',{INTERVAL=>10000}
    412 row(s) in 0.0840 seconds
    
    => 412
    
    hbase(main):006:0> count 'products',{INTERVAL=>10000}
    965 row(s) in 0.0590 seconds
    
    => 965
    
Both return correct number of rows. The example table (existing in the downloaded MapR VirtualBox image) has 965 rows, and my table has 412 rows.

But, when checking with sqlline Drill tool, the other table seems to be empty, even if it's not:

    0: jdbc:drill:> use maprdb;
    +-------+-------------------------------------+
    |  ok   |               summary               |
    +-------+-------------------------------------+
    | true  | Default schema changed to [maprdb]  |
    +-------+-------------------------------------+
    
    
    0: jdbc:drill:> show tables;
    +---------------+------------------------------+
    | TABLE_SCHEMA  |          TABLE_NAME          |
    +---------------+------------------------------+
    | maprdb        | CP_DATA                      |
    | maprdb        | customers                    |
    | maprdb        | embeddedclicks               |
    | maprdb        | products                     |
    +---------------+------------------------------+
    
    0: jdbc:drill:> describe CP_DATA;
    +--------------+------------+--------------+
    | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
    +--------------+------------+--------------+
    | row_key      | ANY        | NO           |
    | A            | MAP        | NO           |
    +--------------+------------+--------------+
    2 rows selected (0.213 seconds)
    
    0: jdbc:drill:> describe products;
    +--------------+------------+--------------+
    | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
    +--------------+------------+--------------+
    | row_key      | ANY        | NO           |
    | details      | MAP        | NO           |
    | pricing      | MAP        | NO           |
    +--------------+------------+--------------+
    
    
    
    0: jdbc:drill:> select count(*) from CP_DATA;
    +--+
    |  |
    +--+
    +--+
    No rows selected (0.152 seconds)
    0: jdbc:drill:> select count(*) from products;
    +---------+
    | EXPR$0  |
    +---------+
    | 965     |
    +---------+
    1 row selected (0.163 seconds)


When I check the tables in the Hadoop file system, it all seems OK to me:

    [root@maprdemo ~]# hadoop fs -ls /tables
    Found 4 items
    trwxrwxrwx   1 root root          2 2016-01-06 20:15 /tables/CP_DATA
    trwxrwxrwx   1 root root          2 2015-10-09 15:00 /tables/customers
    trwxrwxrwx   1 root root          2 2015-10-09 15:02 /tables/embeddedclicks
    trwxrwxrwx   1 root root          2 2015-10-09 14:59 /tables/products   
    
For some reason, the 412 rows in the CP_DATA table are not found by Drill.   
Any help is appreciated to solve this problem of mine :-)


Outcomes