We'll be deploying a Spark Streaming app on YARN soon.
During testing, I've noticed that the logs in YARN grow absolutely huge, to the point where checking something from yesterday can take a very long time to load in the browser. Clearly, if this keeps up, the logs will eventually not be loadable at all.
How do people manage YARN logs for Spark Streaming?
- Can I enable log rolling?
- Can I set a retention time?
- Anything else useful to set?
- Are there external tools people tend to use? (I'd probably prefer to avoid that.)