We need to extend our streaming system to ingest events that carry their own timestamps (which may not be current) and aggregate them by event time at the minute level and at similar granularities.
We're currently on Spark 2.1.0, which makes this difficult. To quote a "Structured Streaming" article written for newer versions of Spark:
Earlier Spark Streaming DStream APIs made it hard to express such event-time windows as the API was designed solely for processing-time windows (that is, windows on the time the data arrived in Spark). In Structured Streaming, expressing such windows on event-time is simply performing a special grouping using the window() function. For example, counts over 5 minute tumbling (non-overlapping) windows on the eventTime column in the event is as following.
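The article's own snippet is not reproduced here. For reference, this is roughly what I understand that grouping to compute, written as plain Python rather than Spark code. The names `window_start` and `count_by_window` are mine and are not part of any Spark API; the sketch assumes timezone-aware timestamps.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(event_time, size=WINDOW):
    """Floor a timezone-aware event timestamp to the start of its tumbling window."""
    return event_time - (event_time - EPOCH) % size


def count_by_window(event_times, size=WINDOW):
    """Count events per window, keyed by window start.

    The key comes from the timestamp inside the event, not from the
    time the event arrived, so arrival order does not matter.
    """
    return Counter(window_start(t, size) for t in event_times)
```

This only works on a finite batch. The hard part in a stream is that a window's count can keep changing as late events arrive.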
- Has MapR released a newer version of Spark yet where I can use a stable version of structured streaming?
- If so, can I use the newer version of Spark without upgrading other parts of the system? I assume yes, as Spark tends to be fairly self-contained.
- If not, is there a technique for aggregating streaming data by event time on Spark 2.1.0? How do I know when I have received all of the data points for a given minute?
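To make the last question concrete, here is a minimal sketch of the kind of technique I have in mind, in plain Python rather than Spark code. It assumes a watermark approach: a minute is treated as complete once the highest event timestamp seen so far is more than some allowed lateness past the end of that minute. The class name `MinuteAggregator` and the 120-second default are illustrative assumptions, not anything from our system or from Spark.

```python
from collections import defaultdict


class MinuteAggregator:
    """Counts events per event-time minute and closes a minute once the
    watermark (max event time seen minus allowed lateness) has passed its end."""

    def __init__(self, allowed_lateness_s=120):
        self.allowed_lateness_s = allowed_lateness_s  # assumed value, tune per source
        self.max_event_ts = None                      # highest event timestamp seen
        self.open_counts = defaultdict(int)           # minute start (epoch s) -> count
        self.dropped = 0                              # events that arrived too late

    def add(self, event_ts):
        """Add one event (epoch seconds). Returns the (minute_start, count)
        pairs that this event caused to be finalized, oldest first."""
        if self.max_event_ts is None or event_ts > self.max_event_ts:
            self.max_event_ts = event_ts
        watermark = self.max_event_ts - self.allowed_lateness_s

        minute = int(event_ts // 60) * 60
        if minute + 60 <= watermark:
            # This minute was already finalized, so the event is discarded.
            self.dropped += 1
        else:
            self.open_counts[minute] += 1

        closed = sorted(
            (m, c) for m, c in self.open_counts.items() if m + 60 <= watermark
        )
        for m, _ in closed:
            del self.open_counts[m]
        return closed
```

With this approach "all of the data points" really means "all that arrived within the allowed lateness". Anything later is counted in `dropped` instead of changing an already emitted result, so the lateness value is a trade-off between completeness and delay.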