Hi, I need to find the most efficient way to transfer log files from a remote web server to a MapR cluster every hour. Once the files are on the cluster, I need to process them there with Spark Streaming (file streams).
What is the most efficient and reliable way to do this?
Should I install a POSIX client on my web server and then write a Kafka producer to publish the logs as Kafka messages? Can I copy the files to my cluster periodically with some utility? Or could I use Amazon services such as S3 instead, provided they are not too slow or inefficient?