mahdi62b

Transmission of a large amount of web logs to a MapR cluster

Discussion created by mahdi62b on Feb 13, 2017
Latest reply on Feb 22, 2017 by maprcommunity

Hi, I need to find the most efficient way to transmit log files from a remote web server to a MapR cluster every hour. I then need to process those log files on the cluster with Spark Streaming (file streams).

 

What is the most efficient and reliable way to do this?

Should I install a POSIX client on my web server and then write a Kafka producer to publish the logs as Kafka messages? Can I copy the files to my cluster periodically with some utility? Or could I use Amazon services such as S3 instead, provided they are not too slow or inefficient?
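
To make the periodic-copy option concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes the cluster filesystem is mounted on the web server (for example through NFS or a POSIX client), so the cluster looks like a local directory. The function name `deliver_atomically` and both directory arguments are hypothetical. Because Spark Streaming file streams expect each file to appear in the monitored directory atomically, the file is first copied to a staging directory on the same mount and then renamed into place.

```python
import os
import shutil


def deliver_atomically(src_path, watched_dir, staging_dir):
    """Copy src_path into staging_dir, then rename it into watched_dir.

    Assumption: staging_dir and watched_dir are on the same filesystem
    (e.g. the same mounted cluster volume), so os.rename is a single
    atomic step and the stream never sees a half-written file.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(watched_dir, exist_ok=True)

    name = os.path.basename(src_path)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(watched_dir, name)

    # Slow part: the bytes land outside the monitored directory.
    shutil.copyfile(src_path, staged)
    # Fast part: the complete file becomes visible in one step.
    os.rename(staged, final)
    return final
```

An hourly cron job on the web server could call this once per rotated log file; the staging directory should sit next to the monitored directory rather than inside it.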
