Big Data Science Meetup @ Hadoop Summit - June 27, 2016 - CA

Document created by aalvarez on Jun 1, 2016. Last modified by aalvarez on Jun 27, 2016 (version 5).



Date: June 27, 2016

San Jose Convention Center, Room LL21C

150 W San Carlos St, San Jose, CA

Time: 18:00 - 20:00
Registration Link:

Ticket Price: Free



There is demand for good mathematicians who can write algorithms that churn through billions or trillions of data points and show where patterns emerge. The Economist's issue on data raised this point as follows: "During the recent financial crisis it became clear that banks and rating agencies had been relying on models which, although they required a vast amount of information to be fed in, failed to reflect financial risk in the real world. This was the first crisis to be sparked by big data—and there will be more." With proper management, big data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. The Big Data Science Meetup is dedicated to big data science; data scientists are welcome to join the group and exchange ideas.



6:00 P.M. - 6:30 P.M.   Introduction and Networking

6:30 P.M. - 7:10 P.M.   Mathematical Bridges Between Old and New - Ted Dunning - Chief Application Architect at MapR

     The computing world seems lately to be all aquiver about the novelty of deep learning models and how mysterious they seem. In fact, the basic ideas behind these systems are closely related to commonly known algorithms like k-means clustering.

     I will present a simple example of an anomaly detector built using k-means clustering and show how it provides insight into how much more advanced models, such as neural networks and recurrent networks, work.
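As a rough illustration of the idea in this abstract (not the speaker's own code), the sketch below builds a k-means anomaly detector in plain Python: Lloyd's algorithm learns centroids from "normal" data, and a point's anomaly score is its distance to the nearest centroid. The farthest-first seeding, the function names and the 0.99-quantile threshold are all assumptions chosen for this example; `math.dist` requires Python 3.8 or later.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic seeding (assumed): start at points[0], then add the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns a list of k centroids as tuples."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids


def anomaly_score(point, centroids):
    """Distance to the nearest centroid; large values suggest an anomaly."""
    return min(math.dist(point, c) for c in centroids)


def fit_threshold(points, centroids, quantile=0.99):
    """Hypothetical rule: flag anything scoring above this quantile of training scores."""
    scores = sorted(anomaly_score(p, centroids) for p in points)
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "normal" data: two tight clusters of 2-D measurements.
    normal = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
    normal += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(200)]
    centroids = kmeans(normal, k=2)
    threshold = fit_threshold(normal, centroids)
    for probe in [(0.1, -0.2), (5.2, 4.9), (2.5, 2.5), (10.0, -3.0)]:
        score = anomaly_score(probe, centroids)
        print(probe, round(score, 2), "ANOMALY" if score > threshold else "ok")
```

Here the centroids act as a compact model of normal data, so "distance from anything the model has seen" becomes the anomaly score; the quantile threshold is an arbitrary illustrative choice.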

7:10 P.M. - 7:15 P.M.   Q/A

7:15 P.M. - 7:55 P.M.   Mathematical Model to unify IoT, Big Data and Artificial Intelligence in the Cloud - Dr. S. Sarkar, Organizer of Big Data Science Meetup, and A. Sarkar, Software Engineer, AyushNet

     The insights hidden in the vast and growing oceans of data available from IoT devices are extremely valuable, but current approaches don’t scale to IoT volumes. Realizing the promise of IoT will depend on machine learning to find patterns and correlations that can be stored and made available as “AI in the cloud”. Companies will be able to infuse their own services with this cloud-hosted intelligence, which can improve almost every aspect of our daily lives. Machine learning generally uses models based on statistics. However, derivatives from calculus provide valuable rate-of-change information over volumes of data, which leads to the use of anti-derivatives for stronger predictions.
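The abstract does not spell out how derivatives and anti-derivatives are applied, so the following is only a hypothetical sketch of the general idea on evenly sampled sensor data: a finite-difference derivative extracts the rate of change, a cumulative trapezoidal integral (a discrete anti-derivative) rebuilds a series from rates, and a naive forecaster integrates the recent average rate forward. The function names, the `window` parameter and the constant-rate extrapolation are assumptions, not the speakers' model.

```python
def derivative(values, dt=1.0):
    """Finite-difference rate of change: central differences inside, one-sided at the ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    d = [0.0] * n
    d[0] = (values[1] - values[0]) / dt
    d[-1] = (values[-1] - values[-2]) / dt
    for i in range(1, n - 1):
        d[i] = (values[i + 1] - values[i - 1]) / (2 * dt)
    return d


def antiderivative(rates, initial, dt=1.0):
    """Cumulative trapezoidal integral: rebuilds a series from its rate of change."""
    out = [initial]
    for a, b in zip(rates, rates[1:]):
        out.append(out[-1] + 0.5 * (a + b) * dt)
    return out


def forecast(values, steps, dt=1.0, window=3):
    """Hypothetical predictor: hold the mean recent rate constant and integrate it forward."""
    recent = derivative(values, dt)[-window:]
    slope = sum(recent) / len(recent)
    # Anti-derivative step: integrate the assumed future rates starting from the last value.
    return antiderivative([slope] * (steps + 1), values[-1], dt)[1:]


if __name__ == "__main__":
    readings = [20.0, 20.5, 21.1, 21.8, 22.4]  # hypothetical temperature samples
    print("rates:", derivative(readings))
    print("next 3:", forecast(readings, 3))
```

Holding the rate constant is the simplest possible choice; a real model would fit the rate itself (for example statistically) before integrating it.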

     This presentation describes a model that iterates over a sequence of computing stages based on Calculus (CAL), Statistics (STAT) and database normalization (DN) in order to (a) seamlessly combine IoT, big data and “AI in the cloud”, (b) perform joins over information components and (c) reduce overall processing time while improving predictive power. An example implementation of the model using Spark and Mathematica will be presented.
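The talk's implementation uses Spark and Mathematica; the plain-Python sketch below is only a hypothetical stand-in showing how CAL, STAT and DN stages might be chained over flat IoT records and then joined. The record layout `(device_id, location, t, value)`, the stage contents and all function names are assumptions for illustration. The DN stage splits repeated device attributes into their own relation, CAL computes per-device rates of change, STAT summarizes them, and a final inner join on `device_id` recombines the components.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize(records):
    """DN stage (hypothetical): split flat rows into a devices relation and a readings relation."""
    devices, readings = {}, defaultdict(list)
    for device_id, location, t, value in records:
        devices[device_id] = {"location": location}  # device attributes stored once
        readings[device_id].append((t, value))
    for series in readings.values():
        series.sort()  # order each device's readings by timestamp
    return devices, dict(readings)


def rates_of_change(readings):
    """CAL stage: forward differences dv/dt; assumes distinct timestamps per device."""
    return {
        dev: [(v2 - v1) / (t2 - t1) for (t1, v1), (t2, v2) in zip(series, series[1:])]
        for dev, series in readings.items()
        if len(series) > 1  # a single reading has no rate of change
    }


def summarize(rates):
    """STAT stage: mean and population standard deviation of each device's rates."""
    return {dev: {"mean_rate": mean(r), "std_rate": pstdev(r)} for dev, r in rates.items()}


def join(devices, stats):
    """Inner join of the two information components on device_id."""
    return {dev: {**attrs, **stats[dev]} for dev, attrs in devices.items() if dev in stats}


def run_pipeline(records):
    devices, readings = normalize(records)
    return join(devices, summarize(rates_of_change(readings)))


if __name__ == "__main__":
    sample = [
        ("s1", "lab", 0, 10.0), ("s1", "lab", 1, 12.0), ("s1", "lab", 2, 14.0),
        ("s2", "roof", 0, 5.0), ("s2", "roof", 2, 1.0),
    ]
    print(run_pipeline(sample))
```

Keeping each stage a pure function over plain dictionaries makes it easy to see where a distributed engine such as Spark would partition the work by `device_id`.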

7:55 P.M. - 8:00 P.M.   Q/A