Can't read files in mapr for some time after initially written?

Question asked by reedv on Feb 22, 2018
Latest reply on Mar 21, 2018 by maprcommunity

We are having a problem where (it seems) files written to maprFS are not visible until some time (20+ seconds) after they have been written. What could be happening here? Is there some underlying mechanism by which mapr needs to propagate new data across the cluster before it can be accessed (taking more than a few milliseconds)? If so, is there documentation about this (and a technical term for it)? Thanks.

 

Details:

We run a script in which sqoop imports json files from multiple DB tables into the cluster (creating an individual directory for each set of files), and we then use drill CREATE TABLE (with the store.format='tsv' option) to convert the json files into tsv files. This process normally runs fine, but occasionally, for certain tables, the drill file conversion fails because drill complains that the directory for the json files does not exist. Yet, looking in the cluster via NFS, we see that the directory does exist and is in fact populated. So far we have been working around this problem by adding a wait between the sqoop import and the conversion of the imported files (which is why I suspect some kind of timing issue like the one I ask about in this question).
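For reference, the fixed wait could be replaced by polling until the directory is reported as present. The following is only a minimal sketch of that workaround, not a diagnosis of the underlying cause. The function names (`wait_until`, `maprfs_dir_exists`, `wait_for_import_dir`), the timeout values, and the choice of `hadoop fs -test -d` as the visibility check are all assumptions for illustration. A check issued through the Hadoop client may not reflect what drill sees, so the predicate may need to be swapped for one that goes through drill itself.

```python
import subprocess
import time


def wait_until(predicate, timeout_s=60.0, interval_s=2.0):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True as soon as the predicate succeeds, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def maprfs_dir_exists(path):
    """Return True if `hadoop fs -test -d` reports that path is a directory."""
    # `hadoop fs -test -d` exits with status 0 when the path is a directory.
    result = subprocess.run(["hadoop", "fs", "-test", "-d", path])
    return result.returncode == 0


def wait_for_import_dir(path, timeout_s=120.0):
    """Hypothetical helper: block until the sqoop output directory is visible."""
    return wait_until(lambda: maprfs_dir_exists(path), timeout_s=timeout_s)
```

In the ingest script this would sit between the sqoop import and the drill conversion, for example `wait_for_import_dir("/path/to/json_dir")` with a placeholder path, failing the run if it returns False instead of proceeding to the conversion.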

 

cluster type
[mapr@mapr002 ingest_scripts]$ cat /opt/mapr/conf/mapr-clusters.conf
my.cluster.local secure=true mapr001.my.local:7222
OS version
[mapr@mapr002 ingest_scripts]$ cat /etc/*release
CentOS Linux release 7.4.1708 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.4.1708 (Core)
CentOS Linux release 7.4.1708 (Core)
 
mapr version
[root@mapr002 ingest_scripts]# cat /opt/mapr/MapRBuildVersion
 
MEP
[mapr@mapr001 ingest_scripts]$ clush -ab 'rpm -qa | grep mapr'
---------------
mapr[005-006] (2)
---------------
mapr-core-6.0.0.20171109191718.GA-1.x86_64
mapr-kafka-0.9.0.201711121504-1.noarch
mapr-hive-2.1.201711121515-1.noarch
mapr-hbase-rest-1.1.8.201711121557-1.noarch
mapr-sqoop2-client-2.0.0.201711021417-1.noarch
mapr-spark-2.1.0.201711121518-1.noarch
mapr-core-internal-6.0.0.20171109191718.GA-1.x86_64
mapr-fileserver-6.0.0.20171109191718.GA-1.x86_64
mapr-hbase-1.1.8.201711121557-1.noarch
mapr-tez-0.8.201711121459-1.noarch
mapr-kafka-connect-jdbc-2.0.1.201711121809-1.noarch
mapr-flume-1.7.0.201703242113-1.noarch
mapr-pig-0.16.201707251429-1.noarch
mapr-hadoop-core-2.7.0.20171109191718.GA-1.x86_64
mapr-nodemanager-2.7.0.20171109191718.GA-1.x86_64
mapr-drill-internal-1.11.0.201711161142-1.noarch
mapr-collectd-5.7.2.201711022055-1.x86_64
mapr-kafka-rest-2.0.1.201711092348-1.noarch
mapr-asynchbase-1.7.0.201711021603-1.noarch
mapr-mapreduce2-2.7.0.20171109191718.GA-1.x86_64
mapr-librdkafka-0.9.1.201711121604-1.noarch
mapr-drill-1.11.0.201711161142-1.noarch
mapr-kafka-connect-hdfs-2.0.1.201711121950-1.noarch
mapr-sqoop-1.4.6.201711121615-1.noarch
---------------
mapr001
---------------
mapr-kafka-connect-hdfs-2.0.1.201711121950-1.noarch
mapr-kafka-0.9.0.201711121504-1.noarch
mapr-asynchbase-1.7.0.201711021603-1.noarch
mapr-resourcemanager-2.7.0.20171109191718.GA-1.x86_64
mapr-tez-0.8.201711121459-1.noarch
mapr-fileserver-6.0.0.20171109191718.GA-1.x86_64
mapr-hbase-rest-1.1.8.201711121557-1.noarch
mapr-mapreduce2-2.7.0.20171109191718.GA-1.x86_64
mapr-librdkafka-0.9.1.201711121604-1.noarch
mapr-drill-1.11.0.201711161142-1.noarch
mapr-nfs-6.0.0.20171109191718.GA-1.x86_64
mapr-kafka-rest-2.0.1.201711092348-1.noarch
mapr-core-internal-6.0.0.20171109191718.GA-1.x86_64
mapr-cldb-6.0.0.20171109191718.GA-1.x86_64
mapr-hbase-1.1.8.201711121557-1.noarch
mapr-kafka-connect-jdbc-2.0.1.201711121809-1.noarch
mapr-pig-0.16.201707251429-1.noarch
mapr-hadoop-core-2.7.0.20171109191718.GA-1.x86_64
mapr-webserver-6.0.0.20171108133112.GA-1.noarch
mapr-drill-internal-1.11.0.201711161142-1.noarch
mapr-sqoop2-client-2.0.0.201711021417-1.noarch
mapr-installer-definitions-1.8.0.201801312110-1.noarch
mapr-zk-internal-6.0.0.20171109191718.GA-1.x86_64
mapr-zookeeper-6.0.0.20171109191718.GA-1.x86_64
mapr-collectd-5.7.2.201711022055-1.x86_64
mapr-apiserver-6.0.0.20171108133112.GA-1.noarch
mapr-flume-1.7.0.201703242113-1.noarch
mapr-installer-1.8.0.201801312110-1.noarch
mapr-nodemanager-2.7.0.20171109191718.GA-1.x86_64
mapr-spark-2.1.0.201711121518-1.noarch
mapr-gateway-6.0.0.20171109191718.GA-1.x86_64
mapr-sqoop-1.4.6.201711121615-1.noarch
mapr-core-6.0.0.20171109191718.GA-1.x86_64
mapr-hive-2.1.201711121515-1.noarch
---------------
mapr002
---------------
mapr-kafka-connect-hdfs-2.0.1.201711121950-1.noarch
mapr-spark-2.1.0.201711121518-1.noarch
mapr-core-6.0.0.20171109191718.GA-1.x86_64
mapr-webserver-6.0.0.20171108133112.GA-1.noarch
mapr-resourcemanager-2.7.0.20171109191718.GA-1.x86_64
mapr-drill-internal-1.11.0.201711161142-1.noarch
mapr-tez-0.8.201711121459-1.noarch
mapr-hbase-rest-1.1.8.201711121557-1.noarch
mapr-kafka-rest-2.0.1.201711092348-1.noarch
mapr-sqoop-1.4.6.201711121615-1.noarch
streamsets-datacollector-mapr_5_2-lib-3.0.1.0-1.noarch
streamsets-datacollector-mapr_6_0-lib-3.0.1.0-1.noarch
mapr-core-internal-6.0.0.20171109191718.GA-1.x86_64
mapr-apiserver-6.0.0.20171108133112.GA-1.noarch
mapr-gateway-6.0.0.20171109191718.GA-1.x86_64
mapr-asynchbase-1.7.0.201711021603-1.noarch
mapr-collectd-5.7.2.201711022055-1.x86_64
mapr-kafka-connect-jdbc-2.0.1.201711121809-1.noarch
mapr-flume-1.7.0.201703242113-1.noarch
streamsets-datacollector-mapr_5_1-lib-3.0.1.0-1.noarch
streamsets-datacollector-mapr_spark_2_1_mep_3_0-lib-3.0.1.0-1.noarch
mapr-hadoop-core-2.7.0.20171109191718.GA-1.x86_64
mapr-zk-internal-6.0.0.20171109191718.GA-1.x86_64
mapr-fileserver-6.0.0.20171109191718.GA-1.x86_64
mapr-kafka-0.9.0.201711121504-1.noarch
mapr-librdkafka-0.9.1.201711121604-1.noarch
mapr-opentsdb-2.4.0.201711021846-1.noarch
mapr-sqoop2-client-2.0.0.201711021417-1.noarch
mapr-pig-0.16.201707251429-1.noarch
streamsets-datacollector-mapr_6_0-mep4-lib-3.0.1.0-1.noarch
mapr-mapreduce2-2.7.0.20171109191718.GA-1.x86_64
mapr-zookeeper-6.0.0.20171109191718.GA-1.x86_64
mapr-nodemanager-2.7.0.20171109191718.GA-1.x86_64
mapr-hbase-1.1.8.201711121557-1.noarch
mapr-hive-2.1.201711121515-1.noarch
mapr-drill-1.11.0.201711161142-1.noarch
---------------
mapr003
---------------
mapr-kafka-connect-hdfs-2.0.1.201711121950-1.noarch
mapr-spark-2.1.0.201711121518-1.noarch
mapr-core-6.0.0.20171109191718.GA-1.x86_64
mapr-webserver-6.0.0.20171108133112.GA-1.noarch
mapr-resourcemanager-2.7.0.20171109191718.GA-1.x86_64
mapr-drill-internal-1.11.0.201711161142-1.noarch
mapr-tez-0.8.201711121459-1.noarch
mapr-hbase-rest-1.1.8.201711121557-1.noarch
mapr-kafka-rest-2.0.1.201711092348-1.noarch
mapr-sqoop-1.4.6.201711121615-1.noarch
mapr-core-internal-6.0.0.20171109191718.GA-1.x86_64
mapr-apiserver-6.0.0.20171108133112.GA-1.noarch
mapr-gateway-6.0.0.20171109191718.GA-1.x86_64
mapr-asynchbase-1.7.0.201711021603-1.noarch
mapr-collectd-5.7.2.201711022055-1.x86_64
mapr-kafka-connect-jdbc-2.0.1.201711121809-1.noarch
mapr-flume-1.7.0.201703242113-1.noarch
mapr-hadoop-core-2.7.0.20171109191718.GA-1.x86_64
mapr-zk-internal-6.0.0.20171109191718.GA-1.x86_64
mapr-fileserver-6.0.0.20171109191718.GA-1.x86_64
mapr-kafka-0.9.0.201711121504-1.noarch
mapr-librdkafka-0.9.1.201711121604-1.noarch
mapr-opentsdb-2.4.0.201711021846-1.noarch
mapr-sqoop2-client-2.0.0.201711021417-1.noarch
mapr-pig-0.16.201707251429-1.noarch
mapr-mapreduce2-2.7.0.20171109191718.GA-1.x86_64
mapr-zookeeper-6.0.0.20171109191718.GA-1.x86_64
mapr-nodemanager-2.7.0.20171109191718.GA-1.x86_64
mapr-hbase-1.1.8.201711121557-1.noarch
mapr-hive-2.1.201711121515-1.noarch
mapr-drill-1.11.0.201711161142-1.noarch
---------------
mapr004
---------------
mapr-drill-internal-1.11.0.201711161142-1.noarch
mapr-oozie-4.3.0.201711121534-1.noarch
mapr-sqoop2-server-2.0.0.201711021417-1.noarch
mapr-hbase-rest-1.1.8.201711121557-1.noarch
mapr-hivemetastore-2.1.201711121515-1.noarch
mapr-flume-1.7.0.201703242113-1.noarch
mapr-pig-0.16.201707251429-1.noarch
mapr-core-6.0.0.20171109191718.GA-1.x86_64
mapr-timelineserver-2.7.0.20171109191718.GA-1.x86_64
mapr-spark-2.1.0.201711121518-1.noarch
mapr-asynchbase-1.7.0.201711021603-1.noarch
mapr-collectd-5.7.2.201711022055-1.x86_64
mapr-drill-1.11.0.201711161142-1.noarch
mapr-hbasethrift-1.1.8.201711121557-1.noarch
mapr-kafka-connect-jdbc-2.0.1.201711121809-1.noarch
mapr-hiveserver2-2.1.201711121515-1.noarch
mapr-httpfs-1.0.201711121521-1.noarch
mapr-core-internal-6.0.0.20171109191718.GA-1.x86_64
mapr-fileserver-6.0.0.20171109191718.GA-1.x86_64
mapr-hive-2.1.201711121515-1.noarch
mapr-sqoop2-client-2.0.0.201711021417-1.noarch
mapr-librdkafka-0.9.1.201711121604-1.noarch
mapr-hue-3.12.0.201711121551-1.noarch
mapr-spark-historyserver-2.1.0.201711121518-1.noarch
mapr-kafka-rest-2.0.1.201711092348-1.noarch
mapr-hivewebhcat-2.1.201711121515-1.noarch
mapr-grafana-4.4.2.201711101631-1.x86_64
mapr-hadoop-core-2.7.0.20171109191718.GA-1.x86_64
mapr-historyserver-2.7.0.20171109191718.GA-1.x86_64
mapr-kafka-0.9.0.201711121504-1.noarch
mapr-oozie-internal-4.3.0.201711121534-1.noarch
mapr-opentsdb-2.4.0.201711021846-1.noarch
mapr-spark-thriftserver-2.1.0.201711121518-1.noarch
mapr-kafka-connect-hdfs-2.0.1.201711121950-1.noarch
mapr-tez-0.8.201711121459-1.noarch
mapr-sqoop-1.4.6.201711121615-1.noarch
mapr-mapreduce2-2.7.0.20171109191718.GA-1.x86_64
mapr-nodemanager-2.7.0.20171109191718.GA-1.x86_64
mapr-hbase-1.1.8.201711121557-1.noarch
