I am excited to be giving a talk on Application Architectures with Apache Hadoop on Wednesday, June 25th, at East Bay JUG. This talk is, of course, inspired by the same motivations behind the book, but has a fun twist. I had promised Chris Richardson, the organizer of the JUG, that I would make this talk more Java-friendly. So I wrote some MapReduce code for sessionizing clicks, a very common algorithm in clickstream analytics, along with many slides describing the clickstream analytics use case and walking through the sessionization code. In the same code repo, you will also find code for doing the same sessionization in Hive and in Spark (thanks to my talented co-author Gwen Shapira).
The talk starts off with a quick introduction to the clickstream analytics case study: what the status quo has been without Hadoop, and how Hadoop has changed it. It then walks through the high-level design and goes into the details of doing sessionization in MapReduce. Depending on how much time we have left, we will talk about various other architectural considerations for ingesting, storing and processing data in Hadoop, continuing to use clickstream analytics as an example. The presentation, as of now, has about 100 slides. The intent is definitely not to cover all of those (thank God!) but to talk more about the sections that interest the audience and dive deep into those areas, leaving the rest for another time.
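If you have not come across sessionization before, here is a minimal sketch of the core idea in plain Java. It is not the code from the talk's repo, just an illustration under a few assumptions: clicks are already grouped per user (in a MapReduce job, one common approach is to do this grouping in the shuffle and run logic like this in the reducer), timestamps are epoch milliseconds, and a new session starts after more than 30 minutes of inactivity. The class name `SessionizeSketch`, the method `sessionize`, and the 30-minute timeout are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SessionizeSketch {
    // Assumed inactivity timeout (30 minutes, in milliseconds); not taken from the talk.
    static final long SESSION_TIMEOUT_MS = 30L * 60L * 1000L;

    /**
     * Assigns a session number to each click of a single user.
     * Input: click timestamps in epoch milliseconds, in any order.
     * Output: session numbers (starting at 1) in timestamp order.
     */
    static List<Integer> sessionize(List<Long> clickTimesMs) {
        // Sort a copy so the caller's list is left untouched.
        List<Long> sorted = new ArrayList<>(clickTimesMs);
        Collections.sort(sorted);

        List<Integer> sessions = new ArrayList<>();
        int session = 0;
        long previous = 0L;
        for (long t : sorted) {
            // The first click, or a gap longer than the timeout, opens a new session.
            if (session == 0 || t - previous > SESSION_TIMEOUT_MS) {
                session++;
            }
            sessions.add(session);
            previous = t;
        }
        return sessions;
    }

    public static void main(String[] args) {
        // Clicks at 0 min, 1 min, and roughly 67 min.
        List<Long> clicks = Arrays.asList(0L, 60_000L, 4_000_000L);
        System.out.println(sessionize(clicks)); // prints [1, 1, 2]
    }
}
```

The first two clicks are a minute apart and share a session; the third arrives after more than 30 minutes of silence and starts a new one.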
If you are in the San Francisco Bay Area this Wednesday, come join us!
- Mark (Follow me on Twitter)
And, here are the presentation slides: