[GSoC 2018] Extending Envoy’s fuzzing coverage
Program: GSoC 2018 (https://summerofcode.withgoogle.com)
Organization: Envoy Proxy (https://www.envoyproxy.io/) under Cloud Native Computing Foundation [CNCF] (https://www.cncf.io/)
Project: Extending Envoy’s fuzzing coverage (https://summerofcode.withgoogle.com/projects/#4573515492098048)
Student: Anirudh M
Mentors: Matt Klein, Harvey Tuch, Constance Caramanolis
GSoC ’18 Experience
On February 12, Google posted the list of organisations participating in Google Summer of Code 2018. I had previously interned at Hasura, a platform for building full-stack applications quickly. They used Kubernetes extensively, and that is where I came to know about the Cloud Native Computing Foundation.
Having heard about CNCF’s success from people who had participated in Google Summer of Code with CNCF the previous year, I was interested in sending a proposal to CNCF, but had no specific project or domain in mind. The CNCF projects participating in GSoC (Kubernetes, Prometheus, Envoy, CoreDNS and a few more) had huge codebases, which overwhelmed me at first. Though most codebases had parts written in different languages, the majority of the code was in Go. I wanted to learn Go, find low-hanging issues, fix them and then work on a proposal. But I realised it was too late to learn a new language with just a few weeks left before the program began, and it would be even more difficult to write code for these projects, since they are not beginner friendly.
Contributing to such a codebase requires an understanding of the project and confidence with its programming language. The list of ideas for CNCF’s Summer of Code was posted here: https://github.com/cncf/soc. I spent hours going through the ideas listed there; they made no sense to me initially, but after a while I was able to get their context.
I did notice Envoy listed there, but I failed to notice that it was written in C++. I had assumed the codebase was in Go, like the other CNCF projects. I had a phone call with Amit Kumar Jaiswal, who had previously done GSoC with Kubernetes, and got my queries on how CNCF worked clarified. I told him about my previous GSoC with Haiku, where I had gained experience working in a complex C++ codebase, and said I was looking for a similar codebase. He recommended Envoy, mentioning that this was Envoy’s first time participating in GSoC.
You can read about Envoy here: https://www.envoyproxy.io/. Envoy is a service proxy initially created at Lyft and written in C++, commonly used as the data plane of a service mesh. It is designed to minimize memory and CPU footprint while offering load balancing, network tracing and visibility into database activity in a microservice architecture. As of now, Envoy is used by a number of organisations.
I spent a few days learning what Envoy was and took part in one of Envoy’s community meetings. Though I didn’t understand the technical discussion, it was a good experience to see how people stayed in touch remotely and worked collaboratively on a project. I spoke to Matt Klein on Slack and confirmed once more that Envoy was participating in GSoC, since I was planning to commit fully to Envoy and not submit proposals to any other CNCF project or any other organisation participating in GSoC.
There were two ideas listed under Envoy for GSoC:
- Replace evbuffer buffer implementation with custom rewrite
- Add more fuzzing
Both required C++, so language-wise I had no problems. I took the next steps, following the Getting Started guide on Envoy’s site. I tried building Envoy locally and faced a lot of problems; it became clear that the problem was not with Envoy’s build but with my messy Homebrew gcc toolchain. Here’s the issue thread: https://github.com/envoyproxy/envoy/issues/2639
I could have spent time resolving it, but it was already late, so I started working on an Ubuntu VM in VirtualBox, and the environment setup went well. With just a few days left to submit the proposal on the GSoC dashboard, I talked with Harvey Tuch, who was already working on fuzzing for Envoy. Matt told me that I’d be working with Harvey on extending the fuzzing support for Envoy. After I got a quick overview of what was expected from the summer’s work, and since I had previous experience writing a GSoC proposal, I was able to draft one quickly and submit it on the GSoC dashboard. The results were set to be announced on April 23. It was a long month’s wait, but I had my academic exams to concentrate on, so time flew.
On April 23, the dashboard showed that my proposal had been accepted.
I was on cloud nine seeing this. Not everyone opts for a second GSoC, but I had no big plans for the summer, so I was depending on GSoC, and Envoy gave me the opportunity.
The community bonding period began. I had a few more exams to concentrate on, so I was not able to immerse myself in the project fully, but I found time to learn about the coding style guidelines, code of conduct, contributing guidelines and other fundamentals.
My last exam and the community bonding period both ended in the second week of May. With no exams left, I was able to spend more time on GSoC. I officially had three mentors, but I was mainly in discussion with Harvey since he had experience with the ongoing fuzzing work. We had a video call to get things started. There was an open issue tracking all the fuzzing support needed for Envoy: https://github.com/envoyproxy/envoy/issues/508
I became part of the Envoy GitHub organisation and was assigned the issue. Harvey gave me a quick intro to OSS-Fuzz, the fuzzing service Envoy was using, and there was a guide on getting started with writing fuzz tests.
I tried building the fuzzers locally on my MacBook Pro. It did work, but the build took at least 40-50 minutes. It was clear that I needed more computing power. I had student credits on Amazon Web Services, and Google Cloud Platform gave $300 worth of credits to any new user. I just needed a virtual machine instance for the build, which takes a few minutes on a powerful instance, so the credits were more than enough for me. I went ahead with Google Cloud Platform, set up a beefy Linux instance, connected to it over ssh, and worked on the code with vim and tmux. Building and running fuzzers was a breeze on such a powerful setup.
With the hurdles solved, I worked on fuzzing utility functions in Envoy, such as the string functions, and then opened a pull request against the repository: https://github.com/envoyproxy/envoy/pull/3493
After a series of code reviews and suggestions, followed by fixes, the pull request got merged, marking my first contribution to Envoy. 😀
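As a rough illustration of what a fuzz test for a string utility looks like, here is a minimal libFuzzer-style sketch. It is not the code from the pull request: `trimWhitespace` is a hypothetical stand-in for a utility function, and the two invariants checked are assumptions chosen for the example. Only the `LLVMFuzzerTestOneInput` entry point is the standard libFuzzer interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical utility under test (a stand-in, not Envoy's actual API):
// strips leading and trailing ASCII whitespace.
std::string trimWhitespace(const std::string& input) {
  const char* kWhitespace = " \t\r\n\f\v";
  const size_t first = input.find_first_not_of(kWhitespace);
  if (first == std::string::npos) {
    return "";  // empty or all-whitespace input
  }
  const size_t last = input.find_last_not_of(kWhitespace);
  return input.substr(first, last - first + 1);
}

// Standard libFuzzer entry point: the fuzzing engine calls this repeatedly
// with mutated byte buffers.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  const std::string input(reinterpret_cast<const char*>(data), size);
  const std::string trimmed = trimWhitespace(input);
  // Invariants assumed for this sketch: trimming never grows the string,
  // and trimming an already trimmed string changes nothing.
  assert(trimmed.size() <= input.size());
  assert(trimWhitespace(trimmed) == trimmed);
  return 0;
}
```

When a target like this is built with clang's `-fsanitize=fuzzer,address`, libFuzzer supplies `main` and drives the entry point; any crash, sanitizer report or failed invariant is saved as a reproducer input.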
The first evaluation results came out, and they were positive. My mentors told me to start making progress on the proposed idea. I picked up the configuration validation fuzzing task. It was supposed to be a small fix to the existing server fuzz test, but it took me time to figure out. As before, after plenty of feedback and nits, it got merged as well: https://github.com/envoyproxy/envoy/pull/3770
This PR lifecycle took most of the second month of coding, after which I started working on the H1 capture fuzz test. The existing H1 fuzzing covers both the request and response paths, i.e. fuzzing happens both downstream and upstream. With direct response enabled, fuzzing happens only on the downstream side: Envoy returns a response directly, such as the contents of a file, instead of connecting to an upstream.
The second evaluation came and the result was positive again; the feedback was to improve my response time and to fix and commit faster. With a few classes happening at college, it was difficult to manage both, but thanks to weekends and late nights, I was able to improve and do better work than in the previous month.
OSS-Fuzz is a project by Google that helps open-source projects become more secure and stable by fuzzing their codebases with fuzzing engines. I got access to Envoy’s OSS-Fuzz dashboard and started working on the issues reported there. I picked up simple tests that were failing because of assertion failures, and fixed them with proper error handling or by adding Bazel build constraints. Some PRs needed changes to other repositories, so I raised issues there as well. As of now, 5 OSS-Fuzz tickets have been fixed.
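The error-handling kind of fix can be sketched roughly as follows. This is an assumption-laden illustration rather than any of the actual Envoy patches: `ConfigError` and `parsePort` are hypothetical names. The idea is that malformed input is rejected with an exception the fuzz target expects and catches, so the fuzzer only reports genuine bugs instead of tripping an assert on every invalid input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical error type for rejected input (not an Envoy class).
class ConfigError : public std::runtime_error {
public:
  using std::runtime_error::runtime_error;
};

// Hypothetical parser: invalid input is reported by throwing, where a
// buggy version might have asserted that the input was well formed.
int parsePort(const std::string& text) {
  if (text.empty() || text.size() > 5) {
    throw ConfigError("port must have 1 to 5 digits");
  }
  int port = 0;
  for (const char c : text) {
    if (c < '0' || c > '9') {
      throw ConfigError("port must be numeric");
    }
    port = port * 10 + (c - '0');
  }
  if (port == 0 || port > 65535) {
    throw ConfigError("port out of range");
  }
  return port;
}

// Standard libFuzzer entry point.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  const std::string input(reinterpret_cast<const char*>(data), size);
  try {
    const int port = parsePort(input);
    // Accepted input must satisfy the parser's own contract.
    assert(port >= 1 && port <= 65535);
  } catch (const ConfigError&) {
    // Expected rejection of malformed input: not a crash.
  }
  return 0;
}
```

Any other exception type, or a failed invariant on accepted input, still surfaces as a crash for the fuzzer to report.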
The work on the H1 capture fuzz test has now been merged: https://github.com/envoyproxy/envoy/pull/3787
Work done during the summer:
- fuzz: utility fuzz test – https://github.com/envoyproxy/envoy/pull/3493 [Merged]
- fuzz: server config validation fuzz test – https://github.com/envoyproxy/envoy/pull/3770 [Merged]
- fuzz: fixes oss-fuzz: 8363 – https://github.com/envoyproxy/envoy/pull/3905 [Merged]
- fuzz: fixes oss-fuzz: 9204 – https://github.com/envoyproxy/envoy/pull/3935 [Merged]
- fuzz: fixes oss-fuzz: 9599, 9600 – https://github.com/envoyproxy/envoy/pull/3979 [Merged]
- fuzz: fixes oss-fuzz: 9621 – https://github.com/envoyproxy/envoy/pull/3988 [Merged]
- fuzz: h1_capture_fuzz with direct response – https://github.com/envoyproxy/envoy/pull/3787 [Merged]
- envoy: modified identification of corpus path – https://github.com/google/oss-fuzz/pull/1607 [Merged]
- fuzz: fixes oss-fuzz: 9809 – https://github.com/envoyproxy/envoy/pull/4145 [Closed]
- fuzz: fixes oss-fuzz: 9895 – https://github.com/envoyproxy/envoy/pull/4189 [Merged]
Blog post/Final work submission:
Things I learnt during the summer:
- Navigating and working with large codebases
- No room for bad/broken code: test everything and aim for maximum coverage
- Writing tests becomes harder if you don’t understand the code under test
- The complete PR lifecycle is important, not just pushing commits 😛
- Communicating effectively with people remotely across different timezones
- Technically: Envoy, Bazel builds, OSS-Fuzz, protobuf and a lot more
Even after GSoC ends, I look forward to staying with the community and helping the project with my tiny contributions.
I owe my biggest thanks to:
- Envoy open-source community (Specifically, my kind mentors for taking time to clarify doubts, and helping me learn).
- Nikhita, Chris Aniszczyk (CNCF organisation admins) and Amit, for introducing me to Envoy.
- People at my university for the permission and time to pursue GSoC.