I keep hearing about people serving query results interactively from Spark SQL. Given what I know about Spark, though, it sounds complex to implement and maintain. Still, I'd like to explore this route before settling on an architecture.
I'm also a little confused about how it could be performant.
- Do people do this with a normal spark-submitted job on YARN?
- How do they keep the job alive for long periods of time, and how do they manage it?
- How do they interact with it to execute queries? Is a web server launched inside the driver, or something along those lines?
- Given "big data", it's unlikely you could cache enough of the data for caching to be very useful. Reading a Parquet file in Spark, for example, sounds like it would be just as slow as reading it in Drill. So how can this approach be more interactive than just using Drill?
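To make the "web server in the driver" idea concrete, here is a minimal sketch of what I imagine that pattern looks like: a long-lived driver process embedding a tiny HTTP server that accepts SQL text and returns rows. This is an assumption on my part, not a known implementation; `run_query` is a stub standing in for a call like `spark.sql(sql)` on a long-lived SparkSession, so the skeleton runs without a cluster.

```python
# Hypothetical sketch: an HTTP endpoint inside a long-running driver process.
# run_query is a stub; in a real driver it would delegate to a SparkSession,
# e.g. spark.sql(sql).toJSON().collect().
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_query(sql: str):
    # Stub for illustration only -- replace with the real Spark SQL call.
    return [{"echo": sql}]


class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the SQL text from the request body.
        length = int(self.headers.get("Content-Length", 0))
        sql = self.rfile.read(length).decode("utf-8")
        body = json.dumps(run_query(sql)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def serve(port: int = 0) -> HTTPServer:
    # port=0 lets the OS pick a free port; serve in a background thread so
    # the driver's main thread stays free (as a real Spark driver would need).
    server = HTTPServer(("127.0.0.1", port), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

If this is roughly what people do, I'd still want to know how the process is supervised on YARN and how concurrent queries are handled, since `spark.sql` calls from multiple request threads would all share one SparkSession.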
I know there are a few COTS products that do this, but I'm more interested in how people here are implementing it themselves.