Some comments about DEV 361 course

Discussion created by alexott on Apr 30, 2016
Latest reply on May 2, 2016 by onelson

Hello all


Here are my comments on the DEV 361 course, including some errors that I found. I hope they are useful:

  • Lesson 5, slide 10: "Import necessary classes". The code on this slide doesn't work until it is changed to:

import org.apache.spark.sql._

// sqlContext has to exist before its implicits can be imported
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

import sqlContext.implicits._

  • Lesson 5, slide 14: "Import classes" has the same problem. It should be mentioned that sqlContext must be created before the implicits are imported from it (as shown in the Spark documentation);
  • Lesson 5, slide 16: "Create schema separately" - there are unnecessary backslash ('\') characters in the code example. It looks like they were copied from spark-shell?
  • Lesson 5, slide 31: groupBy is called with the argument "address", but when the data is read from JSON the field is called "Address", so I get an error unless I use the capitalized name. When I execute the SQL query, however, the lower-case name works.
  • Lesson 5, slide 32: the result shown in the lecture differs from the result I get with the provided data;
  • Lesson 5, slides 45, 47, 48: in the definition of getStr, we can simply return the result of substring directly, without assigning it to an intermediate val that is immediately returned;
  • Lesson 5, slide 58 (quiz): in the lecture (slide 59), the lecturer says that the number of partitions can be determined with "df.rdd.partitions.size", but later, in the quiz, the answer with this value is marked as incorrect, and the correct answer is "rdd.partitions.size". Maybe the question should be rephrased?
  • Lesson 6, slide 36: there is no sound.
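
To illustrate the getStr comment (slides 45, 47, 48), here is a minimal sketch in plain Scala. The slide's actual code is not reproduced here, so the method bodies and the substring arguments below are hypothetical; only the shape matters (an intermediate val that is returned right away versus returning the expression directly).

```scala
// Hypothetical reconstruction: the real getStr from the slides is not quoted in this post,
// and the substring arguments here are made up for illustration.
object GetStrSketch {
  // Style described above: the result is stored in a val that is immediately returned.
  def getStrVerbose(s: String): String = {
    val result = s.substring(s.lastIndexOf(' ') + 1)
    result
  }

  // Equivalent shorter form: the last expression of a method is its return value.
  def getStr(s: String): String = s.substring(s.lastIndexOf(' ') + 1)
}
```

Both versions should behave identically, so the shorter one can probably be passed to udf.register or a similar call the same way as the original.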


Thank you for a good course!