A Peek at Software Engineering in Practice
Mostly with Google as an Example
Managing massive codebases
Where to put the code?
A first thought might be to create a new repo for each project.
That is a common source control policy: a repo for each project, application, tool, etc.
So a company’s shared codebase is managed across multiple repositories. This is a polyrepo architecture.
Is there another option? Yes, you could maintain the entire codebase in a single repository. That repository contains the code for multiple projects. This is a monorepo architecture.
The monorepo would contain all the company’s code.
If the company also takes a trunk-based approach, all its code would be in a single branch.
Would it be crazy?! How could it possibly scale?
Would anyone really do that?
COMP3297: Software Engineering
Who uses a monorepo?
google3/Piper
The monorepo at Google
Google stores almost all its code – over 2 billion LOC – in a single shared source repo.
50,000 engineers share that repo.
The code it contains can be accessed by any Google developer. Any developer can pull code, work on it, and submit changes.
Developers and semi-automated processes make around 70,000 commits per day.
Google uses true trunk-based development in this repo. That means developers work at the head of the single trunk. Branching is rare except for releases. No feature branches. They commit directly to trunk.
The internal name for this repo is google3 and the source control system that hosts it is Piper.
How can it work?
The workflow is simple, but it takes good practices and extensive support tools to keep the code from breaking when tens of thousands of changes are committed each day.
Sync user workspace to repo
Write code
Code review
Commit
Developers can browse and edit files anywhere across the repo.
To work on a file, a developer makes a local copy in their workspace in the repo.
This is kept safe by a strict review process.
The codebase has a tree structure. Each level identifies a set of owners who are allowed to approve changes in that subtree of the repo (along with those above them).
All code is reviewed before commit and the commit must be approved by an owner. This is enforced by Piper.
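The ownership rule above can be sketched as a lookup that walks from a file's directory up to the root of the repo, collecting the owners listed at each level. This is a minimal sketch, assuming owners are recorded per directory in a plain dictionary; `OWNERS_BY_DIR`, `owners_for`, and `change_is_approved` are hypothetical names and do not describe Piper's actual implementation.

```python
from pathlib import PurePosixPath

# Hypothetical ownership data: directory -> owners of that subtree.
# The empty string stands for the root of the repo.
OWNERS_BY_DIR: dict[str, set[str]] = {
    "": {"root-admin"},
    "search": {"alice"},
    "search/ranking": {"bob"},
}


def owners_for(path: str, owners_by_dir: dict[str, set[str]]) -> set[str]:
    """Collect the owners of every directory from the file's parent up to the root."""
    owners: set[str] = set()
    directory = PurePosixPath(path).parent
    while True:
        key = "" if str(directory) == "." else str(directory)
        owners |= owners_by_dir.get(key, set())
        if key == "":          # reached the root of the repo
            break
        directory = directory.parent
    return owners


def change_is_approved(files: list[str], approvers: set[str],
                       owners_by_dir: dict[str, set[str]]) -> bool:
    """True only if every touched file has at least one approving owner."""
    return all(owners_for(f, owners_by_dir) & approvers for f in files)
```

With this data, `alice` can approve a change to `search/ranking/score.cc` (she owns the enclosing `search` subtree) but not one that also touches `maps/tiles.cc`, which only `root-admin` owns.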
Automated analysis and tests are run during development, during review, before commit, and after commit.
New features are developed on the trunk. With good test support, reviews and Continuous Integration, we don’t need to develop in feature branches to protect against causing regressions.
Use feature flags to control new and old code paths that may exist at the same time. Eventually, flags are retired and old code can be deleted.
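A minimal sketch of a feature flag guarding two coexisting code paths. It assumes flags live in an in-memory dictionary; real systems typically read flag values from configuration at runtime, and `FLAGS`, `use_new_ranker`, and the ranking functions are hypothetical.

```python
# Hypothetical flag store; real systems usually load flag values at runtime.
FLAGS: dict[str, bool] = {"use_new_ranker": False}


def flag_enabled(name: str) -> bool:
    # Unknown flags default to off, so the old code path remains the default.
    return FLAGS.get(name, False)


def old_ranker(results: list[str]) -> list[str]:
    # Existing behaviour: alphabetical order.
    return sorted(results)


def new_ranker(results: list[str]) -> list[str]:
    # New behaviour being developed on trunk: shortest results first.
    return sorted(results, key=lambda r: (len(r), r))


def rank_results(results: list[str]) -> list[str]:
    # Both code paths are committed to trunk; the flag decides which one runs.
    if flag_enabled("use_new_ranker"):
        return new_ranker(results)
    return old_ranker(results)
```

Retiring the flag means deleting the `flag_enabled` check together with `old_ranker`, leaving only the new path on trunk.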
As we’ll see when we cover workflows in the other slide set, release branches are common:
o If not doing Continuous Deployment, releases branch from some specific revision, typically the head.
o Bug fixes and enhancements are developed on the trunk and cherry-picked from head into the release branch if needed. This is kept to a minimum.
o The branch represents the exact code that is out in Production. There is no development done on the branch and so it is not remerged into Trunk.
Release branches only
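The release-branch rules above can be modelled with a toy history in which each commit is a labelled string and a revision is its index on trunk. The branch is cut from one trunk revision (here the head), fixes land on trunk first, and only selected fixes are cherry-picked onto the branch. `ReleaseBranch`, `cut`, and `cherry_pick` are hypothetical names for illustration, not a real version-control API.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseBranch:
    name: str
    base_revision: int                      # trunk revision the branch was cut from
    commits: list[str] = field(default_factory=list)


def cut(trunk: list[str], revision: int, name: str) -> ReleaseBranch:
    """Create a release branch holding trunk commits 0..revision inclusive."""
    return ReleaseBranch(name, revision, list(trunk[: revision + 1]))


def cherry_pick(branch: ReleaseBranch, trunk: list[str], revision: int) -> None:
    """Copy a single commit that has already landed on trunk onto the branch."""
    if revision <= branch.base_revision:
        raise ValueError("commit is already part of the release branch")
    commit = trunk[revision]
    if commit not in branch.commits:
        branch.commits.append(commit)


if __name__ == "__main__":
    trunk = ["r0 init", "r1 feature A", "r2 feature B"]
    release = cut(trunk, len(trunk) - 1, "release-1")  # branch from head
    trunk.append("r3 fix crash")           # the fix is developed on trunk first
    trunk.append("r4 feature C")           # new work continues on trunk
    cherry_pick(release, trunk, 3)         # only the fix goes to the branch
    print(release.commits)  # ['r0 init', 'r1 feature A', 'r2 feature B', 'r3 fix crash']
```

Nothing in the model merges the branch back into trunk, matching the rule that no development happens on the release branch.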
Benefits
o “One-Version, One Source of Truth”. Unified versioning. Developers never have a choice about which version to depend on, so they can safely depend on other code directly. No forking without repackaging/renaming, and it must be safe to mix the original and the fork.
o Easier dependency management. Changes trigger a rebuild/retest of dependent code – push dependency management.
No problems with independent versioning of dependencies.
o Wide code-sharing and reuse.
o Atomic changes. Large cross-cutting changes done together.
o Cross-team collaboration – everything is visible.
o Large-scale refactoring and code modernization, centrally managed.
o Flexible team boundaries.
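The one-version benefit can be checked mechanically. This minimal sketch assumes each project declares its third-party dependencies as `(package, version)` pairs; the data layout, the package names, and `one_version_violations` are hypothetical, not Google's tooling.

```python
from collections import defaultdict


def one_version_violations(
    declared: dict[str, list[tuple[str, str]]],
) -> dict[str, set[str]]:
    """Return each third-party package that is used at more than one version.

    `declared` maps a project name to its (package, version) dependencies.
    """
    versions: defaultdict[str, set[str]] = defaultdict(set)
    for deps in declared.values():
        for package, version in deps:
            versions[package].add(version)
    return {pkg: vs for pkg, vs in versions.items() if len(vs) > 1}


if __name__ == "__main__":
    declared = {
        "maps": [("libjson", "1.2"), ("liblog", "0.9")],
        "search": [("libjson", "1.2")],
        "ads": [("libjson", "2.0")],   # a second version sneaks in
    }
    print(sorted(one_version_violations(declared)["libjson"]))  # ['1.2', '2.0']
```

In a repo that follows the one-version rule, the function returns an empty dictionary.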
Costs
o Tooling costs.
Cost of tool development.
Cost of tool execution. A change may trigger every test to run.
o Codebase complexity – too easy to add dependencies.
o Need to keep code healthy.
o If different projects have different legal, compliance, regulatory, secrecy and privacy requirements, they could be managed more easily in separate repos.
A possible future direction is to move towards virtual monorepos.
Support tools to make it work
CodeSearch: Code searching and browsing, exploration of history and file evolution. Where is it, What does it do, How to use, Why does it behave like this, Who added it and When?
Critique: code review support
Tricorder: Static code analysis – bug pattern matching, linting, suggested fixes, checking naming conventions, checking use of APIs, etc. Integrated with Critique.
Presubmits: Run when sending a change for review and ahead of commit – a suite of tests that must pass, no Tricorder issues, etc.
TAP (Test Automation Platform): Automated test infrastructure. Runs all associated tests when an engineer attempts to submit code. 4 billion test cases a day.
Blaze: Build system. (Open-sourced as Bazel). Most builds are triggered automatically.
Rosie: Manages large-scale changes, for example big code clean-ups, replacing deprecated library features, and low-level infrastructure changes. Shards a change into smaller atomic changes, tests them via TAP, sends them for review, and submits them.
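Rosie's sharding step can be illustrated by splitting one large cross-cutting change into smaller changes grouped by project directory, so that each shard can be tested, reviewed, and submitted on its own. This sketch assumes shards are formed from the leading `depth` directory components of each path; Rosie's real sharding policy is not described here, and `shard_change` is a hypothetical name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def shard_change(files: list[str], depth: int = 1) -> dict[str, list[str]]:
    """Group the files of one large change by their leading directories.

    Each returned group is a candidate shard: a smaller, self-contained change
    that can be tested, reviewed, and submitted independently.
    """
    shards: defaultdict[str, list[str]] = defaultdict(list)
    for f in files:
        dirs = PurePosixPath(f).parts[:-1]          # drop the file name
        key = "/".join(dirs[:depth]) or "<root>"   # top-level files share a shard
        shards[key].append(f)
    return {key: sorted(group) for key, group in sorted(shards.items())}


if __name__ == "__main__":
    change = [
        "search/ranking/score.cc",
        "search/index/build.cc",
        "maps/tiles.cc",
        "BUILD",
    ]
    for key, group in shard_change(change).items():
        print(key, group)
```

Raising `depth` produces more, smaller shards (for example, `search/index` and `search/ranking` become separate changes), which keeps each review within one team's code at the cost of more submissions.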
SE at Google and others
The most common title at Google is Software Engineer. New hires on the tech side enter the company as Software Engineers, and the title stays in some form through the career development levels:
Senior Software Engineer (L5), Staff Software Engineer (L6), Senior Staff Software Engineer (L7), Principal Engineer (L8), Distinguished Engineer (L9), …
And everyone working in development really is doing software engineering because Google’s practices demand it.
We saw how those practices and excellent support tools are necessary for a monorepo to be an effective strategy – to avoid build-breaking changes and manage churn in the single trunk.
SE Discipline
Automated testing: Where possible, testing is triggered by any change to a file in a module’s transitive dependencies. The author and reviewers are automatically informed of any failure caused by the change.
Code review: Engineers are encouraged to develop in the main section of the repo, where the rule is that every change must be reviewed (there is an “experimental” section where review rules are looser).
Unit tests expected: Certainly in all production code. The code review tool will alert if source files are added without tests.
Load testing: Required before deployment. Run automatically.
Defect tracking: All issues tracked in detail.
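Triggering tests from a module's transitive dependencies amounts to a reverse-dependency search: invert the build graph, walk outward from the changed targets, and keep every test reached. This minimal sketch assumes the graph is a dictionary from each target to its direct dependencies and that test targets are named with a `_test` suffix; both are illustrative conventions, not the data model of Blaze or TAP.

```python
from collections import defaultdict, deque


def affected_tests(deps: dict[str, list[str]], changed: set[str]) -> set[str]:
    """Return test targets that transitively depend on any changed target."""
    # Invert the graph: target -> targets that depend on it directly.
    rdeps: defaultdict[str, set[str]] = defaultdict(set)
    for target, direct in deps.items():
        for d in direct:
            rdeps[d].add(target)

    # Breadth-first walk from the changed targets over their dependents.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rdeps[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return {t for t in seen if t.endswith("_test")}


if __name__ == "__main__":
    deps = {
        "strings": [],
        "http": ["strings"],
        "search": ["http"],
        "maps": [],
        "strings_test": ["strings"],
        "search_test": ["search"],
        "maps_test": ["maps"],
    }
    print(sorted(affected_tests(deps, {"strings"})))  # ['search_test', 'strings_test']
```

Changing `strings` selects `strings_test` directly and `search_test` through `http` and `search`, but leaves `maps_test` alone, which is what keeps the cost of running tests on every change manageable.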
SE Discipline
Style Guides: One for each language, to keep a common style, layout, and naming across the codebase. A few languages are encouraged, to make reuse and collaboration easier: C++, Python, Java, JavaScript, and Go, although, of course, other languages are used too.
Release Engineering: Typically the responsibility of the software engineers. Short-cycle – daily to two-weekly, mostly automated. Releases go to a staging server first, which can be exposed to traffic running on production servers without responses going to live users. Then comes a gradual roll-out via canary servers.
Rewrites: Products are rewritten every few years.
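The gradual canary roll-out can be sketched as a loop that widens the share of traffic served by a new release only while a health check keeps passing. This assumes a fixed schedule of traffic percentages and a caller-supplied health check; `staged_rollout` and the example schedule are hypothetical and do not describe Google's release tooling.

```python
from typing import Callable


def staged_rollout(
    stages: list[int],
    healthy: Callable[[int], bool],
) -> tuple[bool, int]:
    """Widen the new release's share of traffic one stage at a time.

    Returns (True, final_percentage) if every stage passes its health check,
    or (False, failed_percentage) if the roll-out is halted at some stage.
    """
    reached = 0
    for percent in stages:
        if not healthy(percent):
            # Halt here; a real system would also shift traffic back to the
            # previous release.
            return False, percent
        reached = percent
    return True, reached
```

For example, `staged_rollout([1, 5, 25, 100], lambda p: p < 25)` returns `(False, 25)`: the roll-out halts at the 25% stage and never reaches all users.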
Example of rules. Who reviews?
When an author makes a change to the codebase in their workspace, they upload details to Critique.
When satisfied with the results of Critique’s analysis, the code review request is mailed to reviewers suggested by the author.
There are three aspects that must be approved:
Correctness and comprehension: An engineer must confirm that the code is appropriate and does what the author claims, i.e. that it is correct and valid.
Appropriate for this part of the codebase: A code owner must approve it, generally looking at maintainability and whether the change adds to technical debt.
Readability: An engineer with “Readability” for the language must confirm that the code conforms to Google’s style standards and best practices, and that it is written well.
To get readability certification, you must demonstrate that you consistently write clear, idiomatic, and maintainable code that exemplifies the company’s best practices and coding style for that language. You may then volunteer as a readability reviewer. Around 1-2% of engineers are readability reviewers.
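The three required approvals can be expressed as a submit gate. A minimal sketch, assuming the sets of code owners and readability holders for the affected code are already known; `Change`, `can_submit`, and the example names are hypothetical. In this sketch one reviewer who holds several roles satisfies several requirements at once, and the author's own approval is ignored; both are simplifying assumptions, not a statement of Critique's exact rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    author: str
    approvers: frozenset[str]   # reviewers who have approved the change


def can_submit(change: Change, owners: set[str],
               readability: set[str]) -> tuple[bool, list[str]]:
    """Check the three approvals; return (ok, list of missing requirements)."""
    # Assumption for this sketch: the author cannot approve their own change.
    reviewers = change.approvers - {change.author}
    missing: list[str] = []
    if not reviewers:
        missing.append("correctness: no approving reviewer")
    if not reviewers & owners:
        missing.append("ownership: no approving code owner")
    if not reviewers & readability:
        missing.append("readability: no approver with readability")
    return (not missing, missing)


if __name__ == "__main__":
    owners, readability = {"alice"}, {"carol"}
    print(can_submit(Change("dev", frozenset({"alice"})), owners, readability))
    print(can_submit(Change("dev", frozenset({"alice", "carol"})), owners, readability))
```

The first call reports only the missing readability approval; adding `carol` as an approver satisfies all three requirements.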
The People Side at Google
Two broad career tracks, Engineering and Management, with progression through a series of levels. Not listed here: Product Managers and Program Managers.
Engineering Managers: Manage teams but don’t necessarily lead them, although they can do that as well. Engineering background, technically skilled.
Tech Leads: Software Engineers (or Engineering Managers) who lead projects. Responsible for technical decisions and project management for the project (possibly with the help of a Program Manager). Selected by the Engineering Manager. Tech Lead Managers are also common.
Software Engineers: Developers. Don’t need to become an Engineering Manager to progress. Flexible track: can develop new products and technologies, and can do research.
Research Scientists: Talented researchers who can code as well. Evaluated differently from Software Engineers, but have much the same flexibility.
Site Reliability Engineers: Maintain operational systems. Responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Also do development.
The Culture
Environment
Innovation (Old 100%/20%?)
Career
Paradise? Lookalikes?
Competition
Burnout
Is it for you?