Kay Ousterhout

kayousterhout (at) gmail (dot) com
Twitter: @kayousterhout
GitHub: @kayousterhout


About

I'm currently a software engineer at LightStep, where we're making it easy to understand the performance of complex distributed systems. We're hiring!

I recently graduated from the UC Berkeley Ph.D. program, where I was a member of the NetSys Lab and was advised by Sylvia Ratnasamy. My Ph.D. thesis work focused on understanding performance in data analytics frameworks.

I am also a committer and PMC member for Apache Spark. My work on Spark has focused on improving scheduler performance, and I currently help maintain the scheduler code and review pull requests for it. I have worked on two high-throughput schedulers for Spark, Sparrow and Drizzle, and I have also written and talked about how users can better understand the performance of their Spark workloads.

My Ph.D. research was supported by a Google PhD Fellowship, a Hertz Foundation Graduate Fellowship, a UC Berkeley Chancellor's Fellowship, and a Google Anita Borg Memorial Scholarship.

I graduated from Princeton University in 2011 with a B.S.E. in Computer Science. At Princeton, I was advised by Jennifer Rexford and Michael J. Freedman.


Thesis Work

The first component of my thesis work focused on characterizing the performance of large-scale data analytics frameworks like Spark. As part of that project, I added instrumentation to Spark to measure how much time is spent doing network and disk I/O. Most of that instrumentation is now part of Spark, and can be visualized in the Spark UI by clicking the "Event Timeline" link on the stage detail page. More information about that project is available here; that page includes links to some detailed traces we collected.

One takeaway from my work measuring performance in current systems is that today's systems make it difficult to reason about performance. In Spark, for example, pervasive pipelining and parallelism make it difficult (even with extensive instrumentation and metrics) for users to model performance and understand how changing the software or hardware configuration would impact performance. Today's users have many choices in how to configure their workloads (e.g., what type of EC2 instance should they use to run their job?); without the ability to reason about performance, they cannot configure for the best performance.

The second part of my Ph.D. research focused on a new system, Monotasks, that we designed with the singular goal of making it easy for users to reason about performance. Monotasks is a replacement for the execution layer of Apache Spark, and is fully API-compatible with Spark. For more information about Monotasks, refer to our SOSP paper (linked below).


Publications

Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks
Kay Ousterhout, Christopher Canel, Sylvia Ratnasamy, Scott Shenker
SOSP 2017

Drizzle: Fast and Adaptable Stream Processing at Scale
Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael J. Franklin, Benjamin Recht, Ion Stoica
SOSP 2017

Performance clarity as a first-class design principle
Kay Ousterhout, Christopher Canel, Max Wolffe, Sylvia Ratnasamy, Scott Shenker
HotOS 2017

Making Sense of Performance in Data Analytics Frameworks
Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, Byung-Gon Chun
NSDI 2015

Sparrow: Distributed, Low Latency Scheduling
Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
SOSP 2013

The Case for Tiny Tasks in Compute Clusters
Kay Ousterhout, Aurojit Panda, Joshua Rosen, Shivaram Venkataraman, Reynold Xin, Sylvia Ratnasamy, Scott Shenker, Ion Stoica
HotOS 2013


Other Writing

Generating Flame Graphs for Apache Spark


Talks

Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks at SOSP 2017 pdf pptx

Drizzle: Fast and Adaptable Stream Processing at Scale at SOSP 2017 pdf pptx

Re-Architecting Apache Spark for Performance Understandability at Spark Summit 2016 pdf pptx video

Making Sense of Spark Performance at Spark Summit 2015 pdf pptx video

Making Sense of Performance in Data Analytics Frameworks at NSDI 2015 pdf pptx video

Making Sense of Spark Performance O'Reilly Webcast pdf pptx webcast

Next-Generation Spark Scheduling with Sparrow at Spark Summit 2013 pdf pptx video

Sparrow: Distributed, Low Latency Scheduling at SOSP 2013 pdf pptx video

The Case for Tiny Tasks in Compute Clusters at HotOS 2013 pdf pptx


Teaching

In the Spring 2016 semester, I was a TA for CS61B. As part of my work as a TA, I wrote the Editor Project. I also maintained a list of weekly programming tips, which is available here.

I also developed the course materials for the "Scaling Up Analytics" portion of the Spring 2014 Introduction to Data Science course, including a lab on how Apache Spark works.