Project 3 | CS 61B Spring 2022
Due 29 April 2022
Project 3: Gitlet, your own version-control system
Useful Links
Listed below are many high-quality resources compiled across multiple semesters to help you get started/unstuck on Gitlet.
These videos and resources will be linked in the relevant portions of the spec, but they are here as well for your
convenience. More resources may be created throughout the duration of the project as needed; if so, they will be linked here as well.
Git Intros: These should mostly be review at this point since you have been using Git throughout the semester, but it is
vital that you have a strong understanding of Git itself before trying to implement Gitlet. Be sure you understand the contents
of these videos thoroughly before proceeding.
Gitlet Intros: The introduction to our mini-version of Git, Gitlet.
OH Presentations (from Fall 2021)
Getting Started: Recording, Slides
Testing and Debugging: Recording, Slides
Merge: Recording, Slides
Understanding Branch
Understanding Merge
Gitlet FAQ/Help Doc
Overview of Gitlet
In this project you’ll be implementing a version-control system that
mimics some of the basic features of the
popular system Git. Ours is
smaller and simpler, however, so we have named it Gitlet.
A version-control system is essentially a backup system for related collections
of files. The main functionality that Gitlet supports is:
Saving the contents of entire directories of files.
In Gitlet, this is called committing, and the saved contents themselves are
called commits.
Restoring a version of one or more files or entire commits.
In Gitlet, this is called checking out those files or that commit.
Viewing the history of your backups. In Gitlet, you view this
history in something called the log.
Maintaining related sequences of commits, called branches.
Merging changes made in one branch into another.
The point of a version-control system is to help you when creating
complicated (or even not-so-complicated) projects, or when collaborating
with others on a project.
You save versions of the project periodically.
If at some later point in time you
accidentally mess up your code, then you can restore your source to
a previously committed version (without losing any of the changes
you made since then). If your collaborators make changes embodied in a commit,
you can incorporate (merge) these changes into your own version.
In Gitlet, you don’t just commit individual files at a time. Instead,
you can commit a coherent set of files at the same time. We like to think of
each commit as a snapshot of your entire project at one point
in time. However, for simplicity, many of the examples in the
remainder of this document involve changes to just one file at a time.
Just keep in mind you could change multiple files in each commit.
In this project, it will be helpful for us to visualize the commits we
make over time. Suppose we have a project consisting just of the
file wug.txt, we add some text to
it, and commit it. Then we modify the file and commit these changes.
Then we modify the file again, and commit the changes again. Now we
have saved three total versions of this file, each one later
in time than the previous. We can visualize these commits like so:
Here we’ve drawn an arrow indicating that each commit contains some
kind of reference to the commit that came before it. We call the
commit that came before it the parent commit—this will be important
later. But for now, does this drawing look familiar? That’s right;
it’s a linked list!
The big idea behind Gitlet is that we can visualize the history of the
different versions of our files in a list like this. Then it’s easy
for us to restore old versions of files. You can imagine making a
command like: “Gitlet, please revert to the state of the files at
commit #2”, and it would go to the second node in the linked list and
restore the copies of files found there, while removing any files that are in the
first node, but not the second.
If we tell Gitlet to revert to an old commit, the front of the linked
list will no longer reflect the current state of your files, which
might be a little misleading. In order to fix this problem, we
introduce something called the head pointer. The head pointer keeps
track of where in the linked list we currently are. Normally, as
we make commits, the head pointer will stay at the front of the linked
list, indicating that the latest commit reflects the current state of
the files:
However, let’s say we revert to the state of the files at commit #2
(technically, this is the reset command, which you’ll see later in
the spec). We move the head pointer back to show this:
All right, now, if this were all Gitlet could do, it would be a pretty
simple system. But Gitlet has one more trick up its sleeve: it doesn’t
just maintain older and newer versions of files, it can maintain
differing versions. Imagine you’re coding a project, and you have
two ideas about how to proceed: let’s call one Plan A, and the other
Plan B. Gitlet allows you to save both versions, and switch between
them at will. Here’s what this might look like, in our pictures:
It’s not really a linked list anymore. It’s more like a tree. We’ll
call this thing the commit tree. Keeping with this metaphor, each of
the separate versions is called a branch of the tree. You can
develop each version separately:
There are two pointers into the tree, representing the furthest
point of each branch. At any given time, only one of these is the
currently active pointer, and this is what’s called the head pointer. The
head pointer is the pointer at the front of the current branch.
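To make the picture concrete, here is a minimal in-memory sketch of a commit tree with branch pointers and a head pointer. Everything in it (the class name CommitTreeSketch, the methods branch and switchTo, the branch name plan-b) is hypothetical and chosen only for illustration; a real Gitlet keeps this information on disk in .gitlet rather than in memory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the commit tree; not part of the skeleton. */
class CommitTreeSketch {
    /** A node in the tree. All fields are final: nodes never change. */
    static final class Node {
        final String message;
        final Node parent; // null only for the very first commit

        Node(String message, Node parent) {
            this.message = message;
            this.parent = parent;
        }
    }

    /** Branch name -> commit at the front of that branch. */
    private final Map<String, Node> branches = new HashMap<>();
    /** Name of the currently active branch. */
    private String current = "master";

    CommitTreeSketch() {
        branches.put(current, new Node("initial commit", null));
    }

    /** The head pointer: the front of the current branch. */
    Node head() {
        return branches.get(current);
    }

    /** Adds a new node in front of the head; old nodes are left untouched. */
    void commit(String message) {
        branches.put(current, new Node(message, head()));
    }

    /** Creates a second pointer into the tree at the current head. */
    void branch(String name) {
        branches.put(name, head());
    }

    /** Makes another branch the active one, which moves the head pointer. */
    void switchTo(String name) {
        if (!branches.containsKey(name)) {
            throw new IllegalArgumentException("no such branch: " + name);
        }
        current = name;
    }
}
```

Committing on one branch only moves that branch's pointer, so Plan A and Plan B can each grow from the same shared parent without disturbing one another.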
That’s it for our brief overview of the Gitlet system! Don’t worry if
you don’t fully understand it yet; the section above was just to give
you a high-level picture of what it’s meant to do. A detailed spec of
what you’re supposed to do for this project follows this section.
But a last word here: commit trees are
immutable: once a commit node has been created, it can
never be destroyed (or changed at all). We can only add new things to
the commit tree, not modify existing things. This is an important
feature of Gitlet! One of Gitlet’s
goals is to allow us to save things that we worked on in the past so we
don’t delete them accidentally; this functionality would be jeopardized if
we were allowed to edit past commits.
Internal Structures
Real Git distinguishes several different kinds of objects. For
our purposes, the important ones are
blobs: Essentially the contents of files.
trees: Directory structures mapping names to references to blobs and
other trees (subdirectories).
commits: Combinations of log messages,
other metadata (commit date, author,
etc.), a reference to a tree, and references to
parent commits.
The repository also maintains a mapping from branch heads (in this course,
we’ve used names
like master, proj2, etc.) to references to commits, so that
certain important commits have symbolic names.
We will simplify from Git still
further by
Incorporating trees into commits and not dealing with subdirectories (so
there will be one
“flat” directory of plain files for each repository).
Limiting ourselves to merges that reference two parents (in real Git, there
can be any number of parents.)
Having our metadata consist only of a timestamp and log message.
A commit, therefore, will consist of a log message,
timestamp, a mapping of file names to blob references, a parent
reference, and (for merges) a second parent reference.
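One possible shape for such a commit is sketched below. This is only an illustration under our own assumptions: the class name CommitSketch, the field names, and the choice of String ids and a TreeMap are not prescribed by this spec, and you are free to design your commit class differently.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.Date;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical commit record; class and field names are illustrative only. */
class CommitSketch implements Serializable {
    private final String message;
    private final Date timestamp;
    /** File name -> id of the blob holding that file's contents. */
    private final TreeMap<String, String> blobs;
    /** Id of the parent commit, or null for the initial commit. */
    private final String parent;
    /** Id of the second parent, or null unless this is a merge commit. */
    private final String secondParent;

    CommitSketch(String message, Date timestamp, Map<String, String> blobs,
                 String parent, String secondParent) {
        this.message = message;
        this.timestamp = new Date(timestamp.getTime()); // defensive copy
        this.blobs = new TreeMap<>(blobs);              // copied and kept sorted
        this.parent = parent;
        this.secondParent = secondParent;
    }

    String message() { return message; }
    Date timestamp() { return new Date(timestamp.getTime()); }
    Map<String, String> blobs() { return Collections.unmodifiableMap(blobs); }
    String parent() { return parent; }
    String secondParent() { return secondParent; }
    boolean isMerge() { return secondParent != null; }
}
```

The fields are final and the map is copied on the way in and wrapped on the way out, so a commit cannot be altered after it is built. Keeping the file names sorted is a design choice that makes the commit's contents come out in the same order every time.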
Every object—every blob and every commit in our case—has a
unique integer id that serves as a reference to the object. An
interesting feature of Git is that these ids are universal: unlike a
typical Java implementation, two objects with exactly the same content
will have the same id on all systems (i.e. my computer, your computer,
and anyone else’s computer will compute this same exact id). In the
case of blobs, “same content” means the same file contents. In the
case of commits, it means the same metadata, the same mapping of names
to references, and the same parent reference. The objects in a
repository are thus said to be content addressable.
Both Git and Gitlet accomplish this the same way: by using a cryptographic
hash function called SHA-1 (Secure Hash 1), which produces a 160-bit integer
hash from any sequence of bytes. Cryptographic hash functions have the property
that it is extremely difficult to find two different byte streams with the
same hash value (or indeed to find any byte stream given just its hash value),
so that essentially, we may assume that the probability
that any two objects with different contents have the same SHA-1 hash value is
$2^{-160}$, or about $10^{-48}$. Basically, we simply ignore the
possibility of a hashing collision, so that the system has, in principle,
a fundamental bug that in practice never occurs!
Fortunately, there are library classes for computing SHA-1 values, so you won’t
have to deal with the actual algorithm.
All you have to do is to make sure that you
correctly label all your objects. In particular, this involves
Including all metadata and references when hashing a commit.
Distinguishing somehow between hashes for commits and hashes for blobs. A
good way of doing this involves a well-thought-out directory structure
within the .gitlet directory. Another way to do so is to hash in an extra
word for each object that has one value for blobs and another for commits.
By the way, the SHA-1 hash value, rendered as a 40-character
hexadecimal string, makes
a convenient file name for storing your data in your .gitlet
directory (more on that below). It also gives you a convenient way to
compare two files (blobs) to see if they have the same contents: if their
SHA-1s are the same, we simply assume the files are the same.
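As a rough illustration of the hashing described above, the sketch below uses the standard library class java.security.MessageDigest to compute a SHA-1 value and render it as a 40-character hexadecimal string. The class name Sha1Sketch, the helpers blobId and commitId, and the tag words "blob" and "commit" are our own assumptions, showing just one way to hash in an extra word per object type.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for computing object ids; names are illustrative only. */
class Sha1Sketch {
    /** Returns the SHA-1 of the concatenation of PARTS as 40 hex digits. */
    static String sha1Hex(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            for (byte[] part : parts) {
                md.update(part);
            }
            StringBuilder hex = new StringBuilder(40);
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /** Id for a blob: a tag word is hashed in ahead of the file contents. */
    static String blobId(byte[] contents) {
        return sha1Hex("blob".getBytes(StandardCharsets.UTF_8), contents);
    }

    /** Id for a commit: same idea, with a different tag word. */
    static String commitId(byte[] serializedCommit) {
        return sha1Hex("commit".getBytes(StandardCharsets.UTF_8), serializedCommit);
    }
}
```

For example, sha1Hex applied to the three bytes of "abc" gives a9993e364706816aba3e25717850c26c9cd0d89e on every machine, which is exactly the universality property described above. Because of the tag word, a blob and a commit that happen to consist of the same bytes still receive different ids.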
For remotes (like origin and shared, which we’ve been using all semester),
we’ll simply use other Gitlet repositories. Pushing simply means copying all
commits and blobs that the remote repository does not yet have to the remote
repository, and resetting a branch reference. Pulling is the same, but in the
other direction. Remotes are extra credit in this project and not required for
full credit.
Reading and writing your internal objects from and to files is actually pretty
easy, thanks to Java’s serialization facilities. The interface
java.io.Serializable has no methods,
but if a class implements it, then the Java
runtime will automatically provide a way to convert to and from a stream of
bytes, which you can then write to a file using the I/O class
java.io.ObjectOutputStream and read back (and deserialize) with
java.io.ObjectInputStream.
The term “serialization” refers to the conversion from some arbitrary structure
(array, tree, graph, etc.) to a serial sequence of bytes. You should have seen
and gotten practice with serialization in Lab 11. You’ll be using a very similar
approach here, so do use your Lab 11 work as a resource when it comes to persistence
and serialization.
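A minimal sketch of this round trip, using only java.io, might look like the following. The class name PersistenceSketch and its two method names are hypothetical, and wrapping the checked exceptions in unchecked ones is simply a choice made to keep the example short.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical persistence helper; names are illustrative only. */
class PersistenceSketch {
    /** Serializes OBJ and writes the resulting bytes to FILE. */
    static void writeObject(File file, Serializable obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads FILE and deserializes its contents as an object of class TYPE. */
    static <T extends Serializable> T readObject(File file, Class<T> type) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Any class that implements java.io.Serializable, and whose fields are themselves serializable, can be saved and restored this way.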
Here is a summary example of the structures discussed in this section.
As you can see, each commit (rectangle) points to some blobs (circles), which
contain file contents. The commits contain the file names and references to
these blobs, as well as a parent link. These references, depicted as arrows,
are represented in the .gitlet directory using their SHA-1 hash values (the
small hexadecimal numerals above the commits and below the blobs). The newer
commit contains an updated version of wug1.txt, but shares the same version
of wug2.txt as the older commit. Your commit class will somehow store all of
the information that this diagram shows: a careful selection of internal data
structures will make the implementation easier or harder, so it behooves you to
spend time planning and thinking about the best way to store everything.
Detailed Spec of Behavior
Overall Spec
The only structure requirement we’re giving you is that you have a
class named gitlet.Main and that it has a main method. Here’s your skeleton
code for this project (in package gitlet):
public class Main {
    public static void main(String[] args) {
        // FILL IN
    }
}
We are also giving you some utility methods for performing a number of
mostly file-system-related tasks, so that you can concentrate on the logic
of the project rather than the peculiarities of dealing with the OS.
You may, of course, write additional Java classes to support your
project—in fact, please do. But don’t use any external code (aside
from JUnit), and don’t use any programming language other than Java.
You can use all of the Java Standard Library that you wish, plus utilities we provide.
The majority of this spec will describe how Main.java’s main
method must react when it receives various Gitlet commands as
command-line arguments. But before we break down command-by-command,
here are some overall guidelines the whole project should satisfy:
In order for Gitlet to work, it will need a place to store old
copies of files and other
metadata. All of this stuff must be stored in a directory called
.gitlet, just as this information is stored in directory .git for the
real Git system. (Files whose names start with a . are hidden files. You will
not be able to see them by default on most operating systems. On Unix,
the command ls -a will show them.) A
Gitlet system is considered “initialized” in a particular location if
it has a .gitlet directory there. Most Gitlet commands (except for the
init command) only need to work when used from a directory where a
Gitlet system has been initialized—i.e. a directory that has a
.gitlet directory. The files that aren’t in your .gitlet
directory (which are copies of files from the repository that you are
using and editing, as well as files you plan to add to the repository) are
referred to as the files in your working directory.
Most commands have runtime or memory usage requirements. You must
follow these. Some of the runtimes are described as constant
“relative to any significant measure”. The significant measures are:
any measure of number or size of files, any measure of number of
commits. You can ignore time required to serialize or deserialize,
with the one caveat that your serialization time cannot depend in
any way on the total size of files that have been added, committed,
etc (what is serialization? You’ll see later in the spec). You can
also assume that getting from a hash table is constant time.
Some commands have failure cases with a specified error message. The
exact formats of these are specified later in the spec. All error
messages end with a period; since our autograding is literal, be
sure to include it. If your
program ever encounters one of these failure cases, it must print
the error message and not change anything else. You don’t need to
handle any other error cases except the ones listed as failure cases.
There are some failure cases you need to handle that don’t apply to
a particular command. Here they are:
If a user doesn’t input any arguments, print the message
Please enter a command. and exit.
If a user inputs a command that doesn’t exist, print the
message No command with that name exists. and exit.
If a user inputs a command with the wrong number or format of
operands, print the message Incorrect operands. and exit.
If a user inputs a command that requires being in an initialized
Gitlet working directory (i.e., one containing a .gitlet subdirectory),
but is not in such a directory, print the message Not in an initialized
Gitlet directory.
Some of the commands have their differences from real Git
listed. The spec is not exhaustive in listing all differences from
git, but it does list some of the bigger or potentially confusing
and misleading ones.
Do NOT print out anything except for what the spec says. Some of
our autograder tests will break if you print anything more than
necessary.
Always exit with exit code 0, even in the presence of errors. This allows
us to use other exit codes as an indication that something blew up.
The spec classifies some commands as “dangerous”. Dangerous commands
are ones that potentially overwrite files (that aren’t just
metadata)—for example, if a user tells Gitlet to restore files to
older versions, Gitlet may overwrite the current versions of the
files. Just FYI.
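The general failure cases listed above could be checked in one place before any command runs. The sketch below is one hypothetical way to do so: the class name DispatchSketch, the method check, and the operand table are our own inventions, the table covers only init and add, and the order in which the operand check and the initialization check are applied is an assumption rather than something this spec fixes.

```java
import java.io.File;
import java.util.Map;

/** Hypothetical argument checker; names and check order are illustrative. */
class DispatchSketch {
    /** Command name -> number of operands it takes (only two commands shown). */
    private static final Map<String, Integer> OPERANDS =
        Map.of("init", 0, "add", 1);

    /** Returns the error message for ARGS run in directory CWD, or null if OK. */
    static String check(String[] args, File cwd) {
        if (args.length == 0) {
            return "Please enter a command.";
        }
        Integer expected = OPERANDS.get(args[0]);
        if (expected == null) {
            return "No command with that name exists.";
        }
        if (args.length - 1 != expected) {
            return "Incorrect operands.";
        }
        boolean initialized = new File(cwd, ".gitlet").isDirectory();
        if (!args[0].equals("init") && !initialized) {
            return "Not in an initialized Gitlet directory.";
        }
        return null;
    }

    public static void main(String[] args) {
        String error = check(args, new File(System.getProperty("user.dir")));
        if (error != null) {
            System.out.println(error);
        }
        // A real implementation would run the requested command here.
        System.exit(0); // exit code 0 even when an error was reported
    }
}
```

Returning the message instead of printing it inside check keeps the validation easy to test, and main prints nothing beyond that one message.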
The Commands
We now go through each command you must support in detail. Remember that good
programmers always care about their data structures: as you read these commands,
you should think first about how you should store your data to easily support
these commands and second about if there is any opportunity to reuse commands
that you’ve already implemented (hint: there is ample opportunity in this
project to reuse code you’ve already written).
Usage: java gitlet.Main init
Description: Creates a new Gitlet version-control system in the
current directory. This system will automatically start with one
commit: a commit that contains no files and has the commit message
initial commit (just like that, with no punctuation).
It will have a single branch: master, which
initially points to this initial commit, and master will be the
current branch. The timestamp for this initial commit will be
00:00:00 UTC, Thursday, 1 January 1970 in whatever format you
choose for dates (this is
called “The (Unix) Epoch”, represented internally by the time 0.)
Since the initial commit in all repositories
created by Gitlet will have exactly the same content,
it follows that all repositories will automatically share
this commit (they will all have the same UID)
and all commits in all repositories will trace back to it.
Runtime: Should be constant relative to any significant measure.
Failure cases: If there is already a Gitlet version-control
system in the current directory, it should abort. It should NOT
overwrite the existing system with a new one. It should print the error
message A Gitlet version-control system already exists in the
current directory.
Dangerous?: No
Our line count: ~25
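Two pieces of init are easy to sketch in isolation: the failure case and the epoch timestamp. In the hypothetical code below, the class name InitSketch, its method names, and the date pattern are all assumptions (the spec leaves the date format up to you), and creating the initial commit and the master branch is deliberately left out.

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical fragment of init; names and date format are illustrative. */
class InitSketch {
    /** Creates .gitlet in CWD. Returns an error message, or null on success. */
    static String init(File cwd) {
        File gitlet = new File(cwd, ".gitlet");
        if (gitlet.exists()) {
            return "A Gitlet version-control system already exists in the "
                + "current directory.";
        }
        gitlet.mkdir();
        // Writing the initial commit and the master branch is omitted here.
        return null;
    }

    /** The Unix Epoch, represented internally by the time 0. */
    static Date epoch() {
        return new Date(0);
    }

    /** Formats D in UTC using one possible pattern. */
    static String formatUtc(Date d) {
        SimpleDateFormat f =
            new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy Z", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(d);
    }
}
```

With this pattern, formatUtc(epoch()) yields Thu Jan 1 00:00:00 1970 +0000. Fixing the time zone and locale explicitly keeps the rendered date the same on every machine.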
Usage: java gitlet.Main add [file name]
Description: Adds a copy of the file as it currently exists to
the staging area (see the description of the commit
command). For this reason, adding a file is also
called staging the file for addition.
Staging an already-staged file overwrites the previous entry
in the staging area with the new contents.
The staging area should be somewhere in
.gitlet. If the current working version of the file is identical to
the version in the current commit, do not stage it to be added,
and remove it from the staging area if it is already there (as
can happen when a file is changed, added, and then changed back).
The file will no longer be staged for removal (see gitlet rm), if it
was at the time of the command.
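The staging rules in this description can be sketched with two in-memory collections, as below. This is a simplified model under our own assumptions: the class name StagingSketch and its fields are hypothetical, files are represented only by their blob ids (for example SHA-1 strings), and a real implementation would persist the staging area somewhere in .gitlet.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory staging area; names are illustrative only. */
class StagingSketch {
    /** File name -> blob id of the contents staged for addition. */
    final Map<String, String> stagedForAddition = new TreeMap<>();
    /** Names of files staged for removal. */
    final Set<String> stagedForRemoval = new TreeSet<>();

    /**
     * Stages FILENAME, whose working contents hash to WORKINGID.
     * TRACKED maps file names to blob ids in the current commit.
     */
    void add(String fileName, String workingId, Map<String, String> tracked) {
        // Adding a file always cancels a pending removal.
        stagedForRemoval.remove(fileName);
        if (workingId.equals(tracked.get(fileName))) {
            // Identical to the current commit: nothing to stage, and any
            // earlier staged version is dropped.
            stagedForAddition.remove(fileName);
        } else {
            // New or changed: overwrite any previous staged entry.
            stagedForAddition.put(fileName, workingId);
        }
    }
}
```

Comparing blob ids rather than file contents relies on the earlier observation that two files with the same SHA-1 are assumed to be identical.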
Runtime: In the worst case, s