This presentation explores how SQL developers can deliver powerful machine learning applications by leveraging Spark's SQL and MLlib libraries. A brief overview of Spark's components and architecture kicks things off, and then we dive right in with a live demonstration of loading and querying data using Spark SQL. Next, we'll examine the basics of machine learning algorithms and workflows before getting under the hood of a Spark MLlib-based recommendation engine. Our final demonstration shows how familiar tools can be used to query our recommendation data, and we wrap up with a survey of real-world use cases.



• Spark Background/Overview - Brief Spark background, the Spark+Hadoop team, Spark's five main components, how to download a ready-to-use sandbox VM

• Spark SQL Architecture - Features, Languages, How DataFrames work, The SQLContext, Data sources

• Demo #1: Loading And Querying a Dataset with Spark SQL - Live demonstration of setting up a SQLContext, loading it with data, and running queries against it

• Machine Learning with Spark MLlib - Collaborative filtering basics, Alternating Least Squares (ALS) algorithm, General machine learning workflow

• Demo #2: Under The Hood With A Spark MLlib Recommendation Engine - Recommender model code review and live demonstration of training-test loop iterations

• Demo #3: Putting It All Together - Live demonstration of how to leverage Spark SQL ODBC/JDBC connectivity to query recommendation data using familiar tools

• Some Real-World Use Cases - Answers the questions "What's it good for?" and "Who's using this?"


Prerequisite Knowledge

• Basic knowledge of typical object-oriented programming languages and concepts is helpful

• Basic understanding of databases, filesystems, and SQL


Audience Type



