Git Intern Week 6: Getting Started with Large Codebases: Tips for Open-Source Contribution

A question I often get is: how did you get started with X’s huge codebase? If you know me well, you’ll know that I have contributed to several very large projects, often in programming languages that some people consider complex (though, in reality, those languages are often the simplest). Some of the major projects I’ve contributed to, like Git, LLVM, systemd, and LibreOffice, have hundreds of thousands or even millions of lines of code. As a new contributor, it’s natural to feel intimidated by such projects, but there’s no need to be scared. In this blog post, I’ll share some tips and insights that can help you contribute to big projects, and to almost any other open-source project.

You Don’t Need to Understand the Entire Codebase

First and foremost, you don’t need to understand or read the entire codebase before you start contributing. Most of the time, it’s simply impossible to do so. I remember speaking with a core contributor to the LLVM project who’d been involved for years. They told me that there were parts of the codebase outside their expertise where they couldn’t offer much help. Similarly, in projects like the Linux kernel, which has many maintainers for different subsystems, it’s common for a maintainer of one subsystem to have little knowledge of what’s happening in another.

As a beginner, accept that you won’t know every part of the codebase. Over time, as you engage more with the community and the project, your understanding will grow. Eventually, if the project is not too large, you might even become familiar with the entire codebase. So, give yourself permission to focus on specific parts of the project and be okay with not knowing everything at the start.

Focus on the Relevant Parts

When contributing to a codebase or fixing an issue as a beginner, focus on understanding the part of the codebase that’s relevant to your work. If you’re unsure where to begin, don’t hesitate to ask the maintainers for help or pointers. They can guide you, making it easier to navigate the codebase and contribute effectively.
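One way to narrow down where to look, before or alongside asking for pointers, is to search the source tree for a keyword taken from the issue and see which files mention it most. The sketch below is only an illustration in Python, not a tool that any of the projects above ships: the function name `rank_files_by_keyword` and the default extension list are hypothetical choices made for this example. In a Git checkout, `git grep` gives you the same kind of answer much faster.

```python
from pathlib import Path


def rank_files_by_keyword(root, keyword, extensions=(".c", ".h")):
    """Return (relative_path, hit_count) pairs, most hits first.

    A "hit" is a line that contains `keyword` as a case-sensitive
    substring. Only files whose suffix is in `extensions` are scanned.
    """
    results = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in extensions:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        hits = sum(1 for line in text.splitlines() if keyword in line)
        if hits:
            results.append((path.relative_to(root).as_posix(), hits))
    # Highest hit count first; ties are broken by path for a stable order.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results
```

Calling `rank_files_by_keyword("path/to/checkout", "some_function_name")` returns a list you can read from the top down. The files with the most hits are usually a reasonable place to start reading, though a maintainer’s pointer will often be more precise.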

Setting Up the Project

When you join a community or start working on a project, one of the first and most critical steps is setting up the project codebase on your machine. This means getting the project to compile, build, install, and pass its tests locally. A working setup is the foundation of your contributions, because it lets you check your changes before you send them.

Start with Beginner-Friendly Issues

Most large projects have beginner-friendly issues or “good first issues” specifically designed to help new contributors get started. These issues are typically simpler to resolve than others. By working on them, you’ll familiarize yourself with the codebase and learn how to work within its structure. Once you’ve completed several good first issues, challenge yourself by tackling more complex ones. This progression will help you grow and improve as a contributor.

Don’t Be Afraid to Ask for Help

It’s perfectly fine to ask for help. Whether it’s from maintainers or other contributors, seeking guidance can save you time and effort. The open-source community is generally welcoming and supportive, so take advantage of the resources and expertise available.

Summary

In summary, don’t be intimidated by large projects. At the beginning, you won’t understand the entire codebase; even core contributors often don’t, and that’s okay. Set up the project on your machine, focus on the parts relevant to your work, and start with beginner-friendly issues. Gradually take on more complex tasks as you gain confidence and experience. And remember: asking for help is not a sign of weakness but a step toward becoming a better contributor.

My Git Internship Update

In my last blog post, I mentioned that I would be working through reviews and iterating on the patches I sent to the mailing list. I followed through on that last week. Specifically, I split some single commits into two patches each, so that every commit fixes or adds exactly one thing in the codebase. I also renamed some of the test files I created so that their names better reflect their content. I have now sent the second iteration of these patches to the Git mailing list. You can view my patches here.

What I Will Be Doing Next Week

After submitting the second iteration of my patches, I received reviews and comments from the Git community. Next week, I will continue refining and iterating on my patches based on this feedback until they are accepted and merged into the Git codebase.

Thank you.