About
I'm currently a software engineer at LightStep, where we're making it easy to understand the performance of complex distributed systems. We're hiring!
I recently graduated from the UC Berkeley Ph.D. program, where I was a member of the NetSys Lab and advised by Sylvia Ratnasamy. My Ph.D. thesis work focused on understanding performance in data analytics frameworks.
I am also a committer and PMC member for Apache Spark. My work on Spark has focused on improving scheduler performance, and I currently help maintain and review pull requests for the scheduler code. I have worked on two high-throughput schedulers for Spark, Sparrow and Drizzle, and I have also written and talked about how users can better understand the performance of their Spark workloads.
My Ph.D. research was supported by a Google PhD Fellowship, a Hertz Foundation Graduate Fellowship, a UC Berkeley Chancellor's Fellowship, and a Google Anita Borg Memorial Scholarship.
I graduated from Princeton University in 2011 with a B.S.E. in Computer Science. At Princeton, I was advised by Jennifer Rexford and Michael J. Freedman.
Thesis Work
The first component of my thesis work focused on characterizing the performance of large-scale data analytics frameworks like Spark. As part of that project, I added instrumentation to Spark to measure how much time is spent doing network and disk I/O. Most of that instrumentation is now part of Spark, and can be visualized in the Spark UI by clicking the "Event Timeline" link on the stage detail page. More information about that project is available here; that page includes links to some detailed traces we collected.
One takeaway from my work measuring performance in current systems is that today's systems make it difficult to reason about performance. In Spark, for example, pervasive pipelining and parallelism make it difficult (even with extensive instrumentation and metrics) for users to model performance and understand how changing the software or hardware configuration would impact performance. Today's users have many choices in how to configure their workloads (e.g., what type of EC2 instance should they use to run their job?); without the ability to reason about performance, they cannot configure for the best performance.
The second part of my Ph.D. research focused on a new system, Monotasks, that we designed with the singular goal of making it easy for users to reason about performance. Monotasks is a replacement for the execution layer of Apache Spark, and is fully API-compatible with Spark. For more information about Monotasks, refer to our SOSP paper (linked below).
Publications
Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks Kay Ousterhout, Christopher Canel, Sylvia Ratnasamy, Scott Shenker SOSP 2017
Drizzle: Fast and Adaptable Stream Processing at Scale Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael J. Franklin, Benjamin Recht, Ion Stoica SOSP 2017
Performance clarity as a first-class design principle Kay Ousterhout, Christopher Canel, Max Wolffe, Sylvia Ratnasamy, Scott Shenker HotOS 2017
Making Sense of Performance in Data Analytics Frameworks Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, Byung-Gon Chun NSDI 2015
Sparrow: Distributed, Low Latency Scheduling Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica SOSP 2013
The Case for Tiny Tasks in Compute Clusters Kay Ousterhout, Aurojit Panda, Joshua Rosen, Shivaram Venkataraman, Reynold Xin, Sylvia Ratnasamy, Scott Shenker, Ion Stoica HotOS 2013
Other Writing
Generating Flame Graphs for Apache Spark
Talks
Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks at SOSP 2017 pdf pptx
Drizzle: Fast and Adaptable Stream Processing at Scale at SOSP 2017 pdf pptx
Re-Architecting Apache Spark for Performance Understandability at Spark Summit 2016 pdf pptx video
Making Sense of Spark Performance at Spark Summit 2015 pdf pptx video
Making Sense of Performance in Data Analytics Frameworks at NSDI 2015 pdf pptx video
Making Sense of Spark Performance O'Reilly Webcast pdf pptx webcast
Next-Generation Spark Scheduling with Sparrow at Spark Summit 2013 pdf pptx video
Sparrow: Distributed, Low Latency Scheduling at SOSP 2013 pdf pptx video
The Case for Tiny Tasks in Compute Clusters at HotOS 2013 pdf pptx
Teaching
In Spring 2016, I was a TA for CS61B. As part of that work, I wrote the Editor Project. I also maintained a list of weekly programming tips, which is available here.
I also developed the course materials for the "Scaling Up Analytics" portion of the Spring 2014 Introduction to Data Science course, including a lab on how Apache Spark works.