
Best Practice for Data Ingestion into HDFS folders

Question asked by elloyd79 on Mar 26, 2018
Latest reply on Apr 11, 2018 by Ted Dunning

Hello, we are new to MapR (installing it this week) and are coming from Hortonworks, where we used NiFi.

 

Our system currently works as follows: logs are ingested via NiFi -> pushed into folders on HDFS -> read by Splunk Analytics for Hadoop (HUNK) for analytics.

 

In order to use HUNK effectively, we need our HDFS folders to be organized in this kind of pattern:

/logs/year/month/day/hour/sourcetype/file.log


This is currently possible with NiFi: we can create new folders in a hierarchy and route files to where they belong based on each event's actual timestamp.

 

As we move to MapR now, I want to know whether I will be able to duplicate this functionality. I know MapR can ingest files directly through an NFS mount into its filesystem, so in my (purely theoretical) picture these files would just be dumped in without any organization of their location, and without any ability to read the timestamp of each event and place the data appropriately.

 

What I am asking is whether MapR has capabilities to:

(1) read the timestamp of each event and extract the year, month, day, and hour, and

(2) push the data into folders based on those timestamps.

 

And if so, how is it done?    By the way, this data isn't just key-value pairs. It contains key-value pairs, but also various unstructured data such as plain text and some Java errors.
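
To make the second question concrete, here is a rough sketch of the kind of routing I have in mind, written as a plain Python script run against the NFS mount. The mount path, timestamp format, and the idea that the sourcetype can be taken from the filename are just assumptions for illustration, not our actual setup:

#!/usr/bin/env python3
# Hypothetical sketch: route incoming log files into time-partitioned folders
# over the MapR NFS mount. Paths, timestamp format, and sourcetype naming
# below are assumptions for illustration only.
import os
import re
import shutil
from datetime import datetime

INCOMING = "/mapr/my.cluster.com/incoming"   # assumed landing directory on the NFS mount
DEST_ROOT = "/mapr/my.cluster.com/logs"      # assumed destination root
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")  # assumed event timestamp format

def first_event_timestamp(path):
    # Return the datetime of the first event found in the file, or None.
    with open(path, "r", errors="replace") as f:
        for line in f:
            match = TS_PATTERN.search(line)
            if match:
                return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
    return None

def route(path, sourcetype):
    # Move one log file into /logs/year/month/day/hour/sourcetype/.
    ts = first_event_timestamp(path)
    if ts is None:
        return  # leave unroutable files in place for manual inspection
    dest_dir = os.path.join(
        DEST_ROOT, ts.strftime("%Y"), ts.strftime("%m"),
        ts.strftime("%d"), ts.strftime("%H"), sourcetype)
    os.makedirs(dest_dir, exist_ok=True)
    shutil.move(path, os.path.join(dest_dir, os.path.basename(path)))

if __name__ == "__main__":
    for name in os.listdir(INCOMING):
        full = os.path.join(INCOMING, name)
        if os.path.isfile(full):
            # Assume the sourcetype is encoded in the filename, e.g. "app1_2018-03-26.log".
            route(full, sourcetype=name.split("_")[0])

If something along these lines (or an equivalent built-in MapR tool) is the recommended approach, that is exactly the kind of guidance I am after.
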


Thank you for any advice.
