CS 61BL Summer 2021
About Beacon Ed Gradescope Queue Resources Staff
Project 2: Gitlet
A note on this spec Overview of Gitlet Internal Structures Detailed Spec of Behavior The Commands
Skeleton
Design Document
Grader Details
Miscellaneous Things to Know about the Project Dealing with Files
Serialization Details
Testing
Understanding Integration Tests
Debugging Integration Tests
Merge (Extra Credit)
I. Things to Avoid
J. Acknowledgments
A note on this spec
We encourage you all to read the spec and watch before starting.
This spec is fairly long. The rst half is a verbose and detailed description of every command you’ll support, and the other half is the testing details and some words of advice. To help you digest this, we’ve prepared many high quality videos describing portions of the spec and giving advice on how and where to begin. All videos are linked throughout this spec in the relevant location, but we’ll also list them right here for your convenience. Note: some of these videos were created in Spring 2020 when Gitlet was Project 3 and Capers was Lab 12, and some videos briefly mention Professor Hil nger’s CS 61B setup (including a remote called shared , a repository called repo , etc). Please ignore these as they do not provide any useful information for you this semester. The actual content of the assignment is unchanged.
Git intro – Part 1 Git intro – Part 2 Live lecture 12 Gitlet intro playlist
Part 1
Part 2
Part 3
Part 4
Slides that Itai used
Merge overview and example
Designing Persistence (written notes)
As more resources are created, we’ll add them here, so refresh often!
Overview of Gitlet
Note: We highly recommend completing Lab 9: Persistence before this project. Lab 9 is intended to be an introduction to this project and will be very helpful in getting you started and ensure you’re all set up. If you are eager to start this project before the release of this lab, you de nitely can, but it may be easier to complete the lab rst.
In this project you’ll be implementing a version-control system that mimics some of the basic features of the popular system Git. Ours is smaller and simpler, however, so we have named it Gitlet.
A version-control system is essentially a backup system for related collections of les. The main functionality that Gitlet supports is:
1. Saving the contents of entire directories of les. In Gitlet, this is called committing, and the saved contents themselves are called commits.
2. Restoring a version of one or more les or entire commits. In Gitlet, this is called checking out those les or that commit.
3. Viewing the history of your backups. In Gitlet, you view this history in something called the log.
4. Maintaining related sequences of commits, called branches.
5. Merging changes made in one branch into another.
The point of a version-control system is to help you when creating complicated (or even not-so-complicated) projects, or when collaborating with others on a project. You save versions of the project periodically. If at some later point in time you accidentally mess up your
code, then you can restore your source to a previously committed version (without losing any of the changes you made since then). If your collaborators make changes embodied in a commit, you can incorporate (merge) these changes into your own version.
In Gitlet, you don’t just commit individual les at a time. Instead, you can commit a coherent set of les at the same time. We like to think of each commit as a snapshot of your entire project at one point in time. However, for simplicity, many of the examples in the remainder of this document involve changes to just one le at a time. Just keep in mind you could change multiple les in each commit.
In this project, it will be helpful for us to visualize the commits we make over time. Suppose we have a project consisting just of the le wug.txt, we add some text to it, and commit it. Then we modify the le and commit these changes. Then we modify the le again, and commit the changes again. Now we have saved three total versions of this le, each one later in time than the previous. We can visualize these commits like so:
Here we’ve drawn an arrow indicating that each commit contains some kind of reference to the commit that came before it. We call the commit that came before it the parent commit–this will be important later. But for now, does this drawing look familiar? That’s right; it’s a linked list!
The big idea behind Gitlet is that we can visualize the history of the different versions of our les in a list like this. Then it’s easy for us to restore old versions of les. You can imagine making a command like:
“Gitlet, please revert to the state of the les at commit #2”, and it would go to the second node in the linked list and restore the copies of les found there, while removing any les that are in the rst node, but not the second.
If we tell Gitlet to revert to an old commit, the front of the linked list will no longer reflect the current state of your les, which might be a little misleading. In order to x this problem, we introduce something called the head pointer (also called the HEAD pointer). The head pointer keeps track of where in the linked list we currently are. Normally, as we make commits, the head pointer will stay at the front of the linked list, indicating that the latest commit reflects the current state of the les:
However, let’s say we revert to the state of the les at commit #2 (technically, this is the reset command, which you’ll see later in the spec). We move the head pointer back to show this:
Here we say that we are in a detached head state which you may have encountered yourself before. This is what it means!
EDIT 7/9: Note that in Gitlet, there is no way to be in a detached head state since there is no command that will move the HEAD pointer to a speci c commit. The command will do that, though it also moves the branch pointer. Thus, in Gitlet, you will never be in a detached HEAD state.
All right, now, if this were all Gitlet could do, it would be a pretty simple system. But Gitlet has one more trick up its sleeve: it doesn’t just maintain older and newer versions of les, it can maintain differing versions. Imagine you’re coding a project, and you have two ideas about how to proceed: let’s call one Plan A, and the other Plan B. Gitlet allows you to save both versions, and switch between them at will. Here’s what this might look like, in our pictures:
checkout
reset
It’s not really a linked list anymore. It’s more like a tree. We’ll call this thing the commit tree. Keeping with this metaphor, each of the separate versions is called a branch of the tree. You can develop each version separately:
There are two pointers into the tree, representing the furthest point of each branch. At any given time, only one of these is the currently active pointer, and this is what’s called the head pointer. The head pointer is the pointer at the front of the current branch.
That’s it for our brief overview of the Gitlet system! Don’t worry if you don’t fully understand it yet; the section above was just to give you a high level picture of what its meant to do. A detailed spec of what you’re supposed to do for this project follows this section.
But a last word here: commit trees are immutable: once a commit node has been created, it can never be destroyed (or changed at all). We can only add new things to the commit tree, not modify existing things. This is an important feature of Gitlet! One of Gitlet’s goals is to allow us to save things so we don’t delete them accidentally.
Internal Structures
Real Git distinguishes several different kinds of objects. For our purposes, the important ones are
blobs: The saved contents of les. Since Gitlet saves many versions of les, a single le might correspond to multiple blobs: each being tracked in a different commit.
trees: Directory structures mapping names to references to blobs and other trees (subdirectories).
commits: Combinations of log messages, other metadata (commit date, author, etc.), a reference to a tree, and references to parent commits. The repository also maintains a mapping from branch heads to references to commits, so that certain important commits have symbolic names.
Gitlet simpli es from Git still further by
Incorporating trees into commits and not dealing with subdirectories (so there will be one “flat” directory of plain les for each repository).
Limiting ourselves to merges that reference two parents (in real Git, there can be any number of parents.)
Having our metadata consist only of a timestamp and log message. A commit, therefore, will consist of a log message, timestamp, a mapping of le names to blob references, a parent reference, and (for merges) a second parent reference.
Note that merging is only applicable for those seeking the 2 extra credit points.
Every object–every blob and every commit in our case–has a unique integer id that serves as a reference to the object. An interesting feature of Git is that these ids are universal: unlike a typical Java implementation, two objects with exactly the same content will have the same id on all systems (i.e. my computer, your computer, and anyone else’s computer will compute this same exact id). In the case of blobs, “same content” means the same le contents. In the case of commits, it means the same metadata, the same mapping of names to references, and the same parent reference. The objects in a repository are thus said to be content addressable.
Both Git and Gitlet accomplish this the same way: by using a cryptographic hash function called SHA-1 (Secure Hash 1), which produces a 160-bit integer hash from any sequence of bytes. Cryptographic hash functions have the property that it is extremely dif cult to nd two different byte streams with the same hash value (or indeed to nd any byte stream given just its hash value), so that essentially, we may assume that the probability that any two objects with different contents have the same SHA-1 hash value is 2-160 or about 10-48. Basically, we simply ignore the possibility of a hashing collision, so that the system has, in principle, a fundamental bug that in practice never occurs!
Fortunately, there are library classes for computing SHA-1 values, so you won’t have to deal with the actual algorithm. All you have to do is to make sure that you correctly label all your objects. In particular, this involves
Including all metadata and references when hashing a commit. Distinguishing somehow between hashes for commits and hashes for blobs. A good way of doing this involves a well-thought out directory structure within the .gitlet directory. Another way to do so is to hash in an extra word for each object that has one value for blobs and another for commits.
By the way, the SHA-1 hash value, rendered as a 40-character hexadecimal string, makes a convenient le name for storing your data in your .gitlet directory (more on that below). It also gives you a convenient way to compare two les (blobs) to see if they have the same contents: if their SHA-1s are the same, we simply assume the les are the same.
Reading and writing your internal objects from and to les is actually pretty easy, thanks to Java’s serialization facilities. The interface
java.io.Serializable has no methods, but if a class implements it, then the Java runtime will automatically provide a way to convert to and from a stream of bytes, which you can then write to a le using the I/O class and read back (and deserialize) with . The term “serialization” refers to the conversion from some arbitrary structure (array, tree, graph, etc.) to a serial sequence of bytes. You should have seen and gotten practice with serialization in Lab 9. You’ll be using a very similar approach here, so do use your lab6 as a resource when it comes to persistence and serialization.
Here is a summary example of the structures discussed in this section. As you can see, each commit (rectangle) points to some blobs (circles), which contain le contents. The commits contain the le names and references to these blobs, as well as a parent link. These references, depicted as arrows, are represented in the .gitlet directory using their SHA-1 hash values (the small hexadecimal numerals above the commits and below the blobs). The newer commit contains an updated version of wug1.txt , but shares the same version of wug2.txt as
java.io.ObjectOutputStream
java.io.ObjectInputStream
the older commit. Your commit class will somehow store all of the information that this diagram shows: a careful selection of internal data structures will make the implementation easier or harder, so it behooves you to spend time planning and thinking about the best way to store everything.
Detailed Spec of Behavior Overall Spec
The only structure requirement we’re giving you is that you have a class named gitlet.Main and that it has a main method.
We are also giving you some utility methods for performing a number of mostly le-system-related tasks, so that you can concentrate on the logic of the project rather than the peculiarities of dealing with the OS.
We have also added two suggested classes: Commit , and Repository to get you started. You may, of course, write additional
Java classes to support your project or remove our suggested classes if you’d like. But don’t use any external code (aside from JUnit), and don’t
use any programming language other than Java. You can use all of the Java Standard Library that you wish, plus utilities we provide.
The majority of this spec will describe how gitlet.Main ’s main method must react when it receives various gitlet commands as command-line arguments. But before we break down command-by- command, here are some overall guidelines the whole project should satisfy:
In order for Gitlet to work, it will need a place to store old copies of les and other metadata. All of this stuff must be stored in a directory called , just as this information is stored in directory for the real git system ( les with a . in front are hidden les. You will not be able to see them by default on most operating systems. On Unix, the command ls -a will show them.) A Gitlet system is considered “initialized” in a particular location if it has a directory there. Most Gitlet commands (except for the command) only need to work when used from a directory where a Gitlet system has been initialized–i.e. a directory that has a .gitlet directory. The les that aren’t in your .gitlet directory (which are copies of les from the repository that you are using and editing, as well as les you plan to add to the repository) are referred to as the les in your working directory.
Most commands have runtime or memory usage requirements. You must follow these. Some of the runtimes are described as constant “relative to any signi cant measure”. The signi cant measures are: any measure of number or size of les, any measure of number of commits. You can ignore time required to serialize or deserialize, with the one caveat that your serialization time cannot depend in any way on the total size of les that have been added, committed, etc (what is serialization? Revisit Lab 9 if you don’t know!). You can also pretend that getting from a hash table is constant time.
.gitlet
.git
.gitlet
init
Some commands have failure cases with a speci ed error message. The exact formats of these are speci ed later in the spec. All error message end with a period; since our autograding is literal, be sure to include it. If your program ever encounters one of these failure cases, it must print the error message and not change anything else. You don’t need to handle any other error cases except the ones listed as failure cases.
There are some failure cases you need to handle that don’t apply to a particular command. Here they are:
If a user doesn’t input any arguments, print the message Please enter a command. and exit.
If a user inputs a command that doesn’t exist, print the message No command with that name exists. and exit.
If a user inputs a command with the wrong number or format of operands, print the message Incorrect operands. and exit.
If a user inputs a command that requires being in an initialized Gitlet working directory (i.e., one containing a
.gitlet subdirectory), but is not in such a directory, print the message
Some of the commands have their differences from real Git listed. The spec is not exhaustive in listing all differences from Git, but it does list some of the bigger or potentially confusing and misleading ones.
Do NOT print out anything except for what the spec says. Some of our autograder tests will break if you print anything more than necessary.
Not in an initialized Gitlet
directory.
Always exit with exit code 0, even in the presence of errors. This allows us to use other exit codes as an indication that something blew up. This means do not let your code throw any errors, as your program will automatically exit with code 0 if it ran with no errors.
The spec classi es some commands as “dangerous”. Dangerous commands are ones that potentially overwrite les (that aren’t just metadata)–for example, if a user tells Gitlet to restore les to older versions, Gitlet may overwrite the current versions of the les. Just FYI. So put a helmet on before you test these commands 🙂
The Commands
We now go through each command you must support in detail. Remember that good programmers always care about their data structures: as you read these commands, you should think rst about how you should store your data to easily support these commands and second about if there is any opportunity to reuse commands that you’ve already implemented (hint: there is ample opportunity in this project to reuse code you’ve already written). We have listed lectures in some methods that we have found useful, but you are not required to use concepts from these lectures. There are conceptual quizzes on some of the more confusing commands that you should de nately use to check your understanding. The quizzes are not for a grade, they are only there to help you check your understanding before trying to implement the command.
init
Usage: java gitlet.Main init
Description: Creates a new Gitlet version-control system in the current directory. This system will automatically start with one commit: a commit that contains no les and has the commit message initial commit (just like that, with no punctuation).
It will have a single branch: , which initially points to this initial commit, and will be the current branch. The timestamp for this initial commit will be 00:00:00 UTC, Thursday, 1 January 1970 in whatever format you choose for dates (this is called “The (Unix) Epoch”, represented internally by the time 0.) Since the initial commit in all repositories created by Gitlet will have exactly the same content, it follows that all repositories will automatically share this commit (they will all have the same UID) and all commits in all repositories will trace back to it.
Runtime: Should be constant relative to any signi cant measure.
Failure cases: If there is already a Gitlet version-control system in the current directory, it should abort. It should NOT overwrite the existing system with a new one. Should print the error message A
Dangerous?: No Our line count: ~25
add
Usage: java gitlet.Main add [file name]
Description: Adds a copy of the le as it currently exists to the staging area (see the description of the commit command). For this reason, adding a le is also called staging the le for addition. Staging an already-staged le overwrites the previous entry in the staging area with the new contents. The staging area should be somewhere in .gitlet . If the current working version of the le is identical to the version in the current commit, do not stage it to be added, and remove it from the staging area if it is already there (as can happen when a le is changed, added, and then changed
master
master
Gitlet version-control system already exists in
the current directory.
back to it’s original version). The le will no longer be staged for removal (see gitlet rm ), if it was at the time of the command.
Runtime: In the worst case, should run in linear time relative to the size of the le being added and , for the number of les in the commit.
Failure cases: If the le does not exist, print the error message File does not exist. and exit without changing anything.
Dangerous?: No Our line count: ~20
Suggested Lecture(s): Lecture 19 (Sets, Maps, ADTs), Lecture 19 (Hashing)
commit
Usage: java gitlet.Main commit [message]
Description: Saves a snapshot of tracked les in the current commit and staging area so they can be restored at a later time, creating a new commit. The commit is said to be tracking the saved les. By default, each commit’s snapshot of les will be exactly the same as its parent commit’s snapshot of les; it will keep versions of les exactly as they are, and not update them. A commit will only update the contents of les it is tracking that have been staged for addition at the time of commit, in which case the commit will now include the version of the le that was staged instead of the version it got from its parent. A commit will save and start tracking any les that were staged for addition but weren’t tracked by its parent. Finally, les tracked in the current commit may be untracked in the new commit as a result being staged for removal by the rm command (below).
N Ngl
The bottom line: By default a commit has the same le contents as its parent. Files staged for addition and removal are the updates to the commit. Of course, the date (and likely the mesage) will also different from the parent.
Some additional points about commit:
The staging area is cleared after a commit.
The commit command never adds, changes, or removes les in the working directory (other than those in the .gitlet directory). The rm command will remove such les, as well as staging them for removal, so that they will be untracked after a commit .
Any changes made to les after staging for addition or removal are ignored by the command, which only modi es the contents of the directory. For example, if you remove a tracked le using the Unix rm command (rather than Gitlet’s command of the same name), it has no effect on the next commit, which will still contain the (now deleted) version of the le.
After the commit command, the new commit is added as a new node in the commit tree.
The commit just made becomes the “current commit”, and the head pointer now points to it. The previous head commit is this commit’s parent commit.
Each commit should contain the date and time it was made.
Each commit has a log message associated with it that describes the changes to the les in the commit. This is speci ed by the user. The entire message should take up only one entry in the array args that is passed to main .
commit
.gitlet
To include multiword messages, you’ll have to surround them in quotes.
Each commit is identi ed by its SHA-1 id, which must include the le (blob) references of its les, parent reference, log message, and commit time.
Runtime: Runtime should be constant with respect to any measure of number of commits. Runtime must be no worse than linear with respect to the total size of les the commit is tracking. Additionally, this command has a memory requirement: Committing must increase the size of the .gitlet directory by no more than the total size of the les staged for addition at the time of commit, not including additional metadata. This means don’t store redundant copies of versions of les that a commit receives from its parent (hint: remember that blobs are content addressable and use the SHA1 to your advantage). You are allowed to save whole additional copies of les; don’t worry about only saving diffs, or anything like that.
Failure cases: If no les have been staged, abort. Print the message No changes added to the commit. Every commit must have a non-blank message. If it doesn’t, print the error message Please enter a commit message. It is not a failure for tracked les to be missing from the working directory or changed in the working directory. Just ignore everything outside the
.gitlet directory entirely. Dangerous?: No
Differences from real git: In real git, commits may have multiple parents (due to merging) and also have considerably more metadata.
Our line count: ~35
Suggested Lecture(s): Lecture 19 (Sets, Maps, ADTs), Lecture 19 (Hashing)
Here’s a picture of before-and-after commit:
rm
Usage: java gitlet.Main rm [file name]
Description: Unstage the le if it is currently staged for addition. If the le is tracked in the current commit, stage it for removal and remove the le from the working directory if the user has not already done so (do not remove it unless it is tracked in the current commit).
Runtime: Should run in constant time relative to any signi cant measure.
Failure cases: If the le is neither staged nor tracked by the head commit, print the error message No reason to remove the file.
Dangerous?: Yes (although if you use our utility methods, you will only hurt your repository les, and not all the other les in your directory.)
Our line count: ~20 log
Usage: java gitlet.Main log
Description: Starting at the current head commit, display information about each commit backwards along the commit tree until the initial commit, following the rst parent commit links, ignoring any second parents found in merge commits (this only applicable to those implementing the extra credit).
This set of commit nodes is called the commit’s history. For every node in this history, the information it should display is the commit id, the time the commit was made, and the commit message. Here is an example of the exact format it should follow:
===
commit a0da1ea5a15ab613bf9961fd86f010cf74c7ee48
Date: Thu Nov 9 20:00:05 2017 -0800
A commit message.
===
commit 3e8bf1d794ca2e9ef8a4007275acf3751c7170ff
Date: Thu Nov 9 17:01:33 2017 -0800
Another commit message.
===
commit e881c9575d180a215d1a636545b8fd9abfb1d2bb
Date: Wed Dec 31 16:00:00 1969 -0800
initial commit
There is a === before each commit and an empty line after it. As in real Git, each entry displays the unique SHA-1 id of the commit object. The timestamps displayed in the commits reflect the current timezone, not UTC; as a result, the timestamp for the initial commit does not read Thursday, January 1st, 1970, 00:00:00, but rather the equivalent Paci c Standard Time. Display commits with the most recent at the top. By the way, you’ll nd that the Java classes java.util.Date and
java.util.Formatter are useful for getting and formatting times. Look into them instead of trying to construct it manually yourself!
Of course, the SHA1 identi ers are going to be different, so don’t worry about those. Our tests will ensure that you have something that “looks like” a SHA1 identi er (more on that in the testing section below).
For merge commits (those that have two parent commits), add a line just below the rst, as in
===
commit 3e8bf1d794ca2e9ef8a4007275acf3751c7170ff
Merge: 4975af1 2c1ead1
Date: Sat Nov 11 12:30:00 2017 -0800
Merged development into master.
where the two hexadecimal numerals following “Merge:” consist of the rst seven digits of the rst and second parents’ commit ids, in that order. The rst parent is the branch you were on when you did the merge; the second is that of the merged-in branch. This is as in regular
Git. Note that adding the merge functionality is only required for extra credit.
Runtime: Should be linear with respect to the number of nodes in head’s history.
Failure cases: None Dangerous?: No Our line count: ~20
Here’s a picture of the history of a particular commit. If the current branch’s head pointer happened to be pointing to that commit, log would print out information about the circled commits:
The history ignores other branches and the future. Now that we have the concept of history, let’s re ne what we said earlier about the commit tree being immutable. It is immutable precisely in the sense that the history of a commit with a particular id may never change, ever. If you think of the commit tree as nothing more than a collection of histories, then what we’re really saying is that each history is immutable.
global-log
Usage: java gitlet.Main global-log
Description: Like log, except displays information about all commits ever made. The order of the commits does not matter. Hint: there is a useful method in gitlet.Utils that will help you iterate over les within a directory.
Runtime: Linear with respect to the number of commits ever made. Failure cases: None
Dangerous?: No
Our line count: ~10 –>
nd
Usage: java gitlet.Main find [commit message]
Description: Prints out the ids of all commits that have the given commit message, one per line. If there are multiple such commits, it prints the ids out on separate lines. The commit message is a single operand; to indicate a multiword message, put the operand in quotation marks, as for the commit command below. Hint: the hint for this command is the same as the one for global-log .
Runtime: Should be linear relative to the number of commits. Failure cases: If no such commit exists, prints the error message
Found no commit with that message.
Dangerous?: No
Differences from real git: Doesn’t exist in real git. Similar effects
can be achieved by grepping the output of log. Our line count: ~15
status
Usage: java gitlet.Main status
Description: Displays what branches currently exist, and marks the current branch with a * . Also displays what les have been staged for addition or removal. An example of the exact format it should follow is as follows.
=== Branches ===
*master
other-branch
=== Staged Files ===
wug.txt
wug2.txt
=== Removed Files ===
goodbye.txt
=== Modifications Not Staged For Commit ===
junk.txt (deleted)
wug3.txt (modified)
=== Untracked Files ===
random.stuff
There is an empty line between sections, and the entire status ends in an empty line as well. Entries should be listed in lexicographic order, using the Java string-comparison order (the asterisk doesn’t
count). A le in the working directory is “modi ed but not staged” if it is
Tracked in the current commit, changed in the working directory, but not staged; or
Staged for addition, but with different contents than in the working directory; or
Staged for addition, but deleted in the working directory; or Not staged for removal, but tracked in the current commit and deleted from the working directory.
The nal category (“Untracked Files”) is for les present in the working directory but neither staged for addition nor tracked. This includes les that have been staged for removal, but then re- created without Gitlet’s knowledge. Ignore any subdirectories that may have been introduced, since Gitlet does not deal with them.
The last two sections (modi cations not staged and untracked les) are optional. Feel free to leave them blank (leaving just the headers).
Runtime: Make sure this depends only on the amount of data in the working directory plus the number of les staged to be added or deleted plus the number of branches.
Failure cases: None
Dangerous?: No
Our line count: ~45
Conceptual Quiz (without branching) Conceptual Quiz (with branching)
checkout
Checkout is a kind of general command that can do a few different things depending on what its arguments are. There are 3 possible use cases. In each section below, you’ll see 3 numbered points. Each corresponds to the respective usage of checkout.
Usages:
1. java gitlet.Main checkout — [file name] 2.
3. java gitlet.Main checkout [branch name] Descriptions:
1. Takes the version of the le as it exists in the head commit and puts it in the working directory, overwriting the version of the le that’s already there if there is one. The new version of the le is not staged.
2. Takes the version of the le as it exists in the commit with the given id, and puts it in the working directory, overwriting the version of the le that’s already there if there is one. The new version of the le is not staged.
3. Takes all les in the commit at the head of the given branch, and puts them in the working directory, overwriting the versions of the les that are already there if they exist. Also, at the end of this command, the given branch will now be considered the current branch (HEAD). Any les that are tracked in the current branch but are not present in the checked-out branch are deleted. The staging area is cleared, unless the checked-out branch is the current branch (see Failure cases below).
java gitlet.Main checkout [commit id] —
[file name]
Runtimes:
1. Should be linear relative to the size of the le being checked out.
2. Should be linear with respect to the total size of the les in the commit’s snapshot. Should be constant with respect to any measure involving number of commits. Should be constant with respect to the number of branches.
Failure cases:
1. If the le does not exist in the previous commit, abort, printing the error message File does not exist in that commit. Do not change the CWD.
2. If no commit with the given id exists, print
with that id exists. Otherwise, if the le does not exist in the given commit, print the same message as for failure case 1. Do not change the CWD.
3. If no branch with that name exists, print No such branch If that branch is the current branch, print No
Ifa working le is untracked in the current branch and would be
overwritten by the checkout, print
and exit; perform this check before doing anything else. Do not change the CWD.
Differences from real git: Real git does not clear the staging area and stages the le that is checked out. Also, it won’t do a checkout that would overwrite or undo changes (additions or removals) that you have staged.
A [commit id] is, as described earlier, a hexadecimal numeral. A convenient feature of real Git is that one can abbreviate commits with a unique pre x. For example, one can abbreviate
exists.
need to checkout the current branch.
There is an
untracked file in the way; delete it, or add
and commit it first.
No commit
as
in the (likely) event that no other object exists with a SHA-1 identi er that starts with the same six digits. You should arrange for the same thing to happen for commit ids that contain fewer than 40 characters. Unfortunately, using shortened ids might slow down the nding of objects if implemented naively (making the time to nd a le linear in the number of objects), so we won’t worry about timing for commands that use shortened ids. We suggest, however, that you poke around in a
.git directory (speci cally, .git/objects ) and see how it manages to speed up its search. You will perhaps recognize a familiar data structure implemented with the le system rather than pointers.
Only version 3 (checkout of a full branch) modi es the staging area: otherwise les scheduled for addition or removal remain so.
Dangerous?: Yes! Our line counts:
~15 ~5 ~15
Conceptual Quiz (without branching) Conceptual Quiz (with branching)
branch
Usage: java gitlet.Main branch [branch name]
a0da1e
a0da1ea5a15ab613bf9961fd86f010cf74c7ee48
Description: Creates a new branch with the given name, and points it at the current head commit. A branch is nothing more than a name for a reference (a SHA-1 identi er) to a commit node. This command does NOT immediately switch to the newly created branch (just as in real Git). Before you ever call branch, your code should be running with a default branch called “master”.
Runtime: Should be constant relative to any signi cant measure.
Failure cases: If a branch with the given name already exists, print the error message A branch with that name already exists.
Dangerous?: No
Our line count: ~10
All right, let’s see what branch does in detail. Suppose our state looks
like this:
Now we call java gitlet.Main branch cool-beans . Then we get this:
Hmm… nothing much happened. Let’s switch to the branch with java gitlet.Main checkout cool-beans :
Nothing much happened again?! Okay, say we make a commit now. Modify some les, then then java
java gitlet.Main add…
gitlet.Main commit…
I was told there would be branching. But all I see is a straight line. What’s going on? Maybe I should go back to my other branch with
java gitlet.Main checkout master :
Now I make a commit…
Phew! So that’s the whole idea of branching. Did you catch what’s going on? All that creating a branch does is to give us a new pointer. At any given time, one of these pointers is considered the currently active pointer, also called the HEAD pointer (indicated by *). We can switch the currently active head pointer with checkout [branch name] . Whenever we commit, it means we add a child commit to the currently active HEAD commit even if there is already a child commit. This
naturally creates branching behavior as a commit can now have multiple children.
Make sure that the behavior of your branch , checkout , and commit match what we’ve described above. This is pretty core
functionality of Gitlet that many other commands will depend upon. If any of this core functionality is broken, very many of our autograder tests won’t work!
rm-branch
Usage: java gitlet.Main rm-branch [branch name]
Description: Deletes the branch with the given name. This only means to delete the pointer associated with the branch; it does not mean to delete all commits that were created under the branch, or anything like that.
Runtime: Should be constant relative to any signi cant measure.
Failure cases: If a branch with the given name does not exist, aborts. Print the error message A branch with that name does not exist. If you try to remove the branch you’re currently on, aborts, printing the error message Cannot remove the current branch.
Dangerous?: No Our line count: ~15
reset
Usage: java gitlet.Main reset [commit id]
Description: Checks out all the les tracked by the given commit. Removes tracked les that are not present in that commit. Also moves the current branch’s head to that commit node. See the intro for an example of what happens to the head pointer after using
reset. The may be abbreviated as for
. The staging area is cleared. The command is
checkout
[commit id]
essentially
the current branch head.
of an arbitrary commit that also changes
checkout
Runtime: Should be linear with respect to the total size of les tracked by the given commit’s snapshot. Should be constant with respect to any measure involving number of commits.
Failure case: If no commit with the given id exists, print No commit with that id exists. If a working le is untracked in the current branch and would be overwritten by the reset, print `There is an untracked le in the way; delete it, or add and commit it rst.`
and exit; perform this check before doing anything else. Dangerous?: Yes!
Differences from real git: This command is closest to using the — hard option, as in git reset –hard [commit hash] .
Our line count: ~10 How did we get such a small line count? Recall that you should reuse your code 🙂
Skeleton
The skeleton is fairly bare bones with mostly empty classes. We’ve provided helpful javadoc comments hinting at what you might want to include in each le. You should follow a similar approach to Capers where your Main class doesn’t do a whole lot of work by itself, but rather simply calls other methods depending on the args . You’re absolutely welcome to delete the other classes or add your own, but the
Main class should remain otherwise our tests won’t be able to nd your code.
If you’re confused on where to start, we suggest looking over Lab 9: Persistence and Gitlet.
Design Document
Since you are not working from a substantial skeleton this time, we are asking that everybody submit a design document describing their implementation strategy. It is not graded, but you must have an up-to- date and completed design document before we help you in Lab or on a Gitbug. If you do not have one or it’s not up-to-date/not complete, we cannot help you. This is for both of our sakes: by having a design doc, you have written out a road map for how you will tackle the assignment. If you need help creating a design document, we can de nately help with that 🙂 Here are some guidelines, as well as an example from the Capers lab.
Grader Details
We have two graders for Gitlet: the checkpoint grader and the full grader. Out of the 42 total points available, 10 will come from the checkpoint grader and 32 will come from the full grader.
Checkpoint Grader
Due July 16 at 11:59 PM for 10 points. Submission to the checkpoint autograder follows the grading for a usual project, that is:
Within 24 hours of the deadline: 10% penalty
Within 48 hours of the deadline: 50% penalty
More than 48 hours after the deadline: 100% penalty (no credit)
Submit to the Project 2: Gitlet Checkpoint autograder on Gradescope.
It will test:
Your program compiles.
You pass the sample tests from the skeleton:
testing/samples/*.in
init
add
commit
checkout — [file name]
checkout [commit id] — [file name]
log
. These require you to implement:
,
In addition, it will comment on (but not score):
, and
Whether you pass style checks (it will ignore TODO -type comments for now; we won’t in the nal submission.) Whether there are compiler warning messages.
We will score these in your nal submission.
You’ll have a maximum capacity of 1 token which will refresh every 20 minutes. You will not get full logs on these failures (i.e. you will be told what test you failed but not any additional message), though since you have the tests themselves you can simply debug it locally.
Full Grader
Due July 23 at 11:59 PM for 32 points.
The full grader is a more substantial and comprehensive test suite. Like the checkpoint, you’ll have a maximum capacity of 1 token which will refresh every 20 minutes.
You’ll see that, like Project 1, there is limited access to the grader. Please be kind to yourself and write tests along the way so you do not become too reliant on the autograder for checking your work.
Similar to the checkpoint, the full grader will have English hints on what each test does but not the actual .in le.
Extra credit
There are a total of 2 extra credit points possible, for implementing merge.
The rest of this spec is lled resources for you that you should read to get you started. The section on testing/debugging will be extremely helpful to you as testing and debugging in this project will be different than previous projects, but not so complicated.
Miscellaneous Things to Know about the Project
Phew! That was a lot of commands to go over just now. But don’t worry, not all commands are of the same dif culty. You can see for each command the approximate number of lines we took to do each part (this only counts code speci c to that command – it doesn’t double-count code reused in multiple commands). You shouldn’t worry about matching our solution exactly, but hopefully it gives you an idea about the relative time consumed by each command.
This is an ambitious project, and it would not be surprising for you to feel lost as to where to begin. Therefore, feel free to collaborate with others a little more closely than usual, with the following caveats:
Acknowledge all collaborators in comments near the beginning of your gitlet/Main.java le.
Don’t share speci c code; all collaborators must produce their own versions of the algorithms they come up with, so that we can see they differ.
The Ed megathreads typically get very long for Gitlet, but they are full of very good conversation and discussion on the approach for particular commits. In this project more than any you should take advantage of the size of the class and see if you can nd someone with a similar question to you on the megathread. It’s very unlikely that your question is so unique to you that nobody else has had it (unless it is a bug that relates to your design, in which case you should submit a Gitbug).
By now this spec has given you enough information to get working on the project. But to help you out some more, there are a couple of things you should be aware of:
Dealing with Files
This project requires reading and writing of les. In order to do these operations, you might nd the classes java.io.File and
helpful. Actually, you may nd various things in the and java.nio packages helpful. Be sure to
read the package for other things we’ve written for you. If you do a little digging through all of these, you might nd a couple of methods that will make the io portion of this project much easier! One warning: If you nd yourself using readers, writers, scanners, or streams, you’re making things more complicated than need be.
Serialization Details
If you think about Gitlet, you’ll notice that you can only run one command every time you run the program. In order to successfully complete your version-control system, you’ll need to remember the commit tree across commands. This means you’ll have to design not just a set of classes to represent internal Gitlet structures during execution, but you’ll need an analogous representation as les within
java.nio.file.Files
java.io
gitlet.Utils
your directories, which will carry across multiple runs of your program.
As indicated earlier, the convenient way to do this is to serialize the runtime objects that you will need to store permanently in les. The Java runtime does all the work of guring out what elds need to be converted to bytes and how to do so.
You’ve already done serialization in lab6 and so we will not repeat the information here. If you are still confused on some aspect of serialization, re-read the relevant portion of the lab6 spec and also look over your code.
There is, however, one annoying subtlety to watch out for: Java serialization follows pointers. That is, not only is the object you pass into
writeObject serialized and written, but any object it points to as well. If your internal representation of commits, for example, represents the parent commits as pointers to other commit objects, then writing the head of a branch will write all the commits (and blobs) in the entire subgraph of commits into one le, which is generally not what you want. To avoid this, don’t use Java pointers to refer to commits and blobs in your runtime objects, but instead use SHA-1 hash strings. Maintain a runtime map between these strings and the runtime objects they refer to. You create and ll in this map while Gitlet is running, but never read or write it to a le.
You might nd it convenient to have (redundant) pointers commits as well as SHA-1 strings to avoid the bother and execution time required to look them up each time. You can store such pointers in your objects while still avoiding having them written out by declaring them “transient”, as in
private transient MyCommitType parent1;
.gitlet
Such elds will not be serialized, and when back in and deserialized, will be set to their default values (null for reference types). You must be careful when reading the objects that contain transient elds back in to set the transient elds to appropriate values.
Unfortunately, looking at the serialized les your program has produced with a text editor (for debugging purposes) would be rather unrevealing; the contents are encoded in Java’s private serialization encoding. We have therefore provided a simple debugging utility program you might nd useful: . See the Javadoc comment on
for details.
As usual, testing is part of the project. Be sure to provide your own integration tests for each of the commands, covering all the speci ed functionality. Also, feel free add any unit tests you’d like. We don’t provide any unit tests since unit tests are highly dependent on your implementation.
We have provided a testing program that makes it relatively easy to write integration tests: testing/tester.py . This interprets testing les with an .in extension. You may run all of the tests with the command
If you’d like additional information on the failed tests, such as what your program is outputting, run:
Testing
gitlet.DumpObj
gitlet/DumpObj.java
make check
make check TESTER_FLAGS=”–verbose”
If you’d like to run a single test, within the subdirectory, run the command
where FILE.in … is a list of speci c .in les you want to check.
CAREFUL RUNNING THIS COMMAND as it does not recompile your code. Every time you run a python command, you must rst compile your code (via make ).
The command
will, in addition, keep around the directory that tester.py produces so that you can examine its les at the point the tester script detected an error. If your test did not error, then the directory will still remain there with the nal contents of everything.
In effect, the tester implements a very simple domain-speci c language (DSL) that contains commands to
Set up or remove les from a testing directory;
Run java gitlet.Main ;
Check the output of Gitlet against a speci c output or a regular expression describing possible outputs;
Check the presence, absence, and contents of les. Running the command
python3 tester.py –verbose FILE.in …
python3 tester.py –verbose –keep FILE.in
python3 testing/tester.py
testing
(with no operands, as shown) will provide a message documenting this language. We’ve provided some examples in the directory
testing/samples . Don’t put your own tests in that subdirectory; place them somewhere distinct so you don’t get confused with our tests vs your tests (which may be buggy!). Put all your les in another folder called student_tests within the directory. In the skeleton, this folder is blank.
We’ve added a few things to the Make le to adjust for differences in people’s setups. If your system’s command for invoking Python 3 is simply python , you can still use our make le unchanged by using
You can pass additional flags to tester.py with, for example:
Understanding Integration Tests
The rst thing we’ll ask for in Gitbugs and when you come to receive help in Lab is a test that you’re failing, so it’s paramount that you learn to write tests in this project. We’ve done a lot of work to make this as painless as possible, so please take the time to read through this section so you can understand the provided tests and write good tests yourself.
The integration tests are of similar format to those from Capers. If you don’t know how the Capers integration tests (i.e. the .in les) work, then read that section from the capers spec rst.
The provided tests are hardly comprehensive, and you’ll de nitely need to write your own tests to get a full score on the project. To write a test,
.in
testing
make PYTHON=python check
make TESTER_FLAGS=”–keep –verbose”
let’s rst understand how this all works.
Here is the structure of the testing directory:
.
Makefile
test02-basic-checkout.in
test03-basic-log.in
test04-prev-checkout.in
definitions.inc
notwug.txt
wug.txt
student_tests <==== Your .i
samples <==== Sample
test01-init.in <==== An exa
src <==== Contain
runner.py <==== Script
tester.py <==== Script
Just like Capers, these tests work by creating a temporary directory within the testing directory and running the commands speci ed by a .in le. If you use the --keep flag, this temporary directory will remain after the test nishes so you can inspect it.
Unlike Capers, we’ll need to deal with the contents of les in our working directory. So in this testing folder, we have an additional folder called src . This directory stores many pre- lled .txt les that have particular contents we need. We’ll come back to this later, but for now just know that src stores actual le contents. samples has the .in les of the sample tests (which are the checkpoint tests).
n m
s
t t
──└
──├ ──└ │ ──├ │ ──├ ──└ │
──├ │
──├ │
──├ │
──├ │ ──├ ──├ ──├
When you create your own tests, you should add them to the student_tests folder which is initially empty in the skeleton.
The .in les have more functions in Gitlet. Here is the explanation straight from the tester.py le:
# ... A comment, producing no effect.
back to the default directory.
seconds.
+ NAME F
- NAME
Delete the file named NAME.
> COMMAND OPERANDS
LINE1
LINE2
…
<<<
I FILE Include. Replace this statement with the con
interpreted relative to the directory containi
C DIR Create, if necessary, and switch to a subdire
the main directory for this test. If DIR is m
T N Set the timeout for gitlet commands in the re
Copy the contents of src/F into a file named N
Run gitlet.Main with COMMAND ARGUMENTS as its
its output with LINE1, LINE2, etc., reporting
"sufficient" discrepency. The <<< delimiter m
an asterisk (*), in which case, the preceding
Python regular expressions and matched accordi
or JAR file containing the gitlet.Main program
in directory DIR specifed by --progdir (defaul
t n c i
s
A
p a a l n
t
= NAME F
Check that the file named NAME is identical to
error if not.
* NAME
does. E NAME
does not.
D VAR "VALUE"
first applied to VALUE.
Check that the file NAME does not exist, and r
Check that file or directory NAME exists, and
Defines the variable VAR to have the literal v
taken to be a raw Python string (as in r"VALUE
Don’t worry about the Python regular expressions thing mentioned in the above description: we’ll show you that it’s fairly straightforward and even go through an example of how to use it.
Let’s walk through a test to see what happens from start to nish. Let’s examine test02-basic-checkout.in .
Example test
When we rst run this test, a temporary directory gets created that is initially empty. Our directory structure is now:
.
Makefile
student_tests
samples
test01-init.in
e
r
a "
──├ │ ──├ ──├ ──├
This temporary directory is the Gitlet repository that will be used for this execution of the test, so we will add things there and run all of our Gitlet commands there as well. If you ran the test a second time without deleting the directory, it’ll create a new directory called test02- basic-checkout_1 , and so on. Each execution of a test uses it’s own directory, so don’t worry about tests interfering with each other as that cannot happen.
The rst line of the test is a comment, so we ignore it. The next section is:
This shouldn’t have any output as we can tell by this section not having any text between the rst line with > and the line with <<< . But, as we know, this should create a .gitlet folder. So our directory structure is now:
> init <<<
test02-basic-checkout.in
test03-basic-log.in
test04-prev-checkout.in
definitions.inc
src
notwug.txt
wug.txt
runner.py
tester.py
test02-basic-checkout_0 <==== Just cr
e
──└ ──├ ──├
──└ │ ──├ │ ──├ ──└ │
──├ │
──├ │
──├ │
The next section is:
This line uses the + command. This will take the le on the right-hand side from the src directory and copy its contents to the le on the left-hand side in the temporary directory (creating it if it doesn’t exist). They happen to have the same name, but that doesn’t matter since they’re in different directories. After this command, our directory structure is now:
+ wug.txt wug.txt
.
Makefile
student_tests
samples
test01-init.in
test02-basic-checkout.in
test03-basic-log.in
test04-prev-checkout.in
definitions.inc
src
notwug.txt
wug.txt
test02-basic-checkout_0
runner.py
tester.py
.gitlet <==== Just cr
e
──└
──├ ──└ │ ──├ ──└ │ ──├ │ ──├ ──└ │
──├ │
──├ │
──├ │
──├ │ ──├ ──├ ──├
Now we see what the src directory is used for: it contains le contents that the tests can use to set up the Gitlet repository however you wants. If you want to add special contents to a le, you should add those contents to an appropriately named le in src and then use the same + command as we have here. It’s easy to get confused with the order of arguments, so make sure the right-hand side is referencing the le in the src directory, and the right-hand side is referencing the le in the temporary directory.
The next section is:
.
Makefile
student_tests
samples
test01-init.in
test02-basic-checkout.in
test03-basic-log.in
test04-prev-checkout.in
definitions.inc
src
notwug.txt
wug.txt
test02-basic-checkout_0
.gitlet
runner.py
tester.py
wug.txt <==== Just cr
e
──└
──├ ──└ │ ──├ │ ──├ ──└ │ ──├ │ ──├ ──└ │
──├ │
──├ │
──├ │
──├ │ ──├ ──├ ──├
As you can see, there should be no output. The wug.txt le is now staged for addition in the temporary directory. At this point, your directory structure will likely change within the test02-basic-
directory since you’ll need to somehow persist the fact that is staged for addition.
The next section is:
And, again, there is no output, and, again, your directory strcuture within .gitlet might change.
The next section is:
Since wug.txt already exists in our temporary directory, its contents changes to be whatever was in src/notwug.txt .
The next section is
Which, again, has no output. However, it should change the contents of wug.txt in our temporary directory back to its original contents
checkout_0/.gitlet
wug.txt
> commit “added wug”
<<<
+ wug.txt notwug.txt
> checkout — wug.txt
<<<
> add wug.txt
<<<
which is exactly the contents of . The next command is what asserts that:
This is an assertion: if the le on the left-hand side (again, this is in the temporary directory) doesn’t have the exact contents of the le on the right-hand side (from the src directory), the testing script will error and say your le contents are not correct.
There are two other assertion commands available to you:
Will assert that there exists a le/folder named NAME in the temporary directory. It doesn’t check the contents, only that it exists. If no le/folder with that name exists, the test will fail.
Will assert that there does NOT exist a le/folder named NAME in the temporary directory. If there does exist a le/folder with that name, the test will fail.
That happened to be the last line of the test, so the test nishes. If the --keep flag was provided, the temporary directory will remain,
otherwise it will be deleted. You might want to keep it if you suspect your .gitlet directory is not being properly setup or there is some issue with persistence.
Setup for a test
= wug.txt wug.txt
E NAME
* NAME
src/wug.txt
As you’ll soon discover, there can be a lot of repeated setup to test a particular command: for example, if you’re testing the checkout command you need to:
1. Initialize a Gitlet Repository
2. Create a commit with a le in some version (v1)
3. Create another commit with that le in some other version (v2) 4. Checkout that le to v1
And perhaps even more if you want to test with les that were untracked in the second commit but tracked in the rst.
So the way you can save yourself time is by adding all that setup in a le and using the I command. Say we do that here:
# Initialize, add, and commit a file.
> init
<<<
+ a.txt wug.txt
> add a.txt
<<<
> commit “a is a wug”
<<<
We should place this le with the rest of the tests in the samples directory, but with a le extension , so maybe we name it
. If we gave it the le extension , our testing script will mistake it for a test and try to run it
individually. Now, in our actual test, we simply use the command:
.inc
samples/commit_setup.inc
.in
I commit_setup.inc
This will have the testing script run all of the commands in that le and keep the temporary directory it creates. This keeps your tests relatively short and thus easier to read.
We’ve included one .inc le called definitions.inc that will set up patterns for your convenience. Let’s understand what patterns are.
Pattern matching output
The most confusing part of testing is the output for something like log . There are a few reasons why:
1. The commit SHA will change as you modify your code and hash more things, so you would have to continually modify your test to keep up with the changes to the SHA.
2. Your date will change every time since time only moves forwards.
3. It makes the tests very long.
We also don’t really care the exact text: just that there is some SHA there and something with the right date format. For this reason, our tests use pattern matching.
This is not a concept you will need to understand, but at a high level we de ne a pattern for some text (i.e. a commit SHA) and then just check that the output has that pattern (without caring about the actual letters and numbers).
Here is how you’d do that for the output of log and check that it matches the pattern:
# First "import" the pattern defintions from our set
I definitions.inc
# You would add your lines here that create commits
# specified messages. We'll omit this for this examp
u
w l
The section we see is the same as a normal Gitlet command, except it ends in <<<* which tells the testing script to use patterns. The patterns are enclosed in ${PATTERN_NAME} .
All the patterns are de ned in samples/definitions.inc . You don’t need to understand the actual pattern, just the thing it matches. For example, HEADER matches the header of a commit which should look something like:
That’s just some random commit SHA.
So when we create the expected output for this test, we’ll need to know how many entries are in this log and what the commit messages are.
You can do similar things for the status command:
commit fc26c386f550fc17a0d4d359d70bae33c47c54b9
> log
===
${COMMIT_HEAD}
${DATE}
added wug
===
${COMMIT_HEAD}
${DATE}
initial commit
<<<*
The pattern we used here is ARBLINES which is arbitrary lines. If you actually care what is untracked, then you can add that here without the pattern, but perhaps we’re more interested in seeing g.txt staged for addition.
Notice the \* on the branch master . Recall that in the status command, you should pre x the HEAD branch with a * . If you use a pattern, you’ll need to replace this * with a \* in the expected output. The reason is out of the scope of the class, but it is called “escaping” the asterisk. If you don’t use a pattern (i.e. your command ends in <<< not <<<* , then you can use the * without the \ ).
I definitions.inc
> status
=== Branches ===
\*master
=== Staged Files ===
g.txt
=== Removed Files ===
=== Modifications Not Staged For Commit ===
=== Untracked Files ===
${ARBLINES}
<<<*
# Add commands here to setup the status. We'll omit
t
The nal thing you can do with these patterns is “save” a matched portion. Warning: this seems like magic and we don’t care at all if you understand how this works, just know that it does and it is available to you. You can copy and paste the relevant part from our provided tests so you don’t need to worry too much about making these from scratch. With that out of the way, let’s see what this is.
If you’re doing a checkout command, you need to use the SHA identi er to specify which commit to checkout to/from. But remember we used patterns, so we don’t actually know the SHA identi er at the time of creating the test. That is problematic. We’ll use test04-prev- checkout.in to see how you can “capture” or “save” the SHA:
I definitions.inc
# Each ${COMMIT_HEAD} captures its commit UID.
> log
===
${COMMIT_HEAD}
${DATE}
version 2 of wug.txt
===
${COMMIT_HEAD}
${DATE}
version 1 of wug.txt
===
${COMMIT_HEAD}
${DATE}
initial commit
This will set up the UID (SHA) to be captured after the log command. So right after this command runs, we can use the D command to de ne the UIDs to variables:
# UID of second version
D UID2 “${1}”
# UID of first version
D UID1 “${2}”
Notice how the numbering is backwards: the numbering begins at 1 and starts at the top of the log. That is why the current version (i.e. second version) is de ned as “${1}” . We don’t care about the initial commit, so we don’t bother capturing it’s UID.
Now we can use that de nition to checkout to that captured SHA:
And now you can make your assertions to ensure the checkout was successful.
Testing conclusion
There are many more complex things you can do with our testing script, but this is enough to write very good tests. You should use our provided tests as an example to get started, and also feel free to discuss on Ed high level ideas of how to test things. You may also share your .in
> checkout ${UID1} — wug.txt
<<<
<<<*
les, but please make sure they’re correct before posting them and add comments so other students and staff can see what is going on.
Debugging Integration Tests
Recall from Lab 9 that debugging integration tests is a bit different with the new setup. The runner.py script will work just as it did for Capers, so you should read through that section in the Lab 9 spec and watch the video linked there. Here we describe strategies to debug:
Finding the right execution to debug
Each test runs your program multiple times, and each one of them has the potential to introduce a bug. The rst priority is to identify the right execution of the program that introduces the bug. What we mean by this: imagine you’re failing a test that checks the status command. Say that the output differs by just one le: you say it’s untracked, but the test says it should be staged for addition. This does not mean the
status command has a bug. It’s possible that the
command is buggy, but not guaranteed. It could be that your command didn’t properly persist the fact that a le has been staged for addition! If that is the case, then even with a fully functioning status command, your program would error.
So nding the right (i.e. buggy) execution of the program is very important: how do we do that? You step through every single execution of the program using the runner.py script, and after every execution you look at your temporary directory to make sure everything has been written to a le correctly. This will be harder for serialized objects since, as we know, their contents will be a stream of unintelligable bytes: for serialized objects you can simply check that at the time of serialization they have the correct contents. You may even nd that you never serialized it!
status
add
Eventually, you’ll nd the bug. If you cannot, then that is when you can come to Lab or post a Gitbug. Be warned: we can only spend 10 minutes with each student in Lab, so if you have a nasty bug that you think would take a TA more than 10 minutes, then you should instead submit a Gitbug with as much information as possible. The better your Gitbug, the better/faster your response will be. Don’t forget to update your design doc: remember we will reject Gitbugs that do not have an up-to- date or complete design document.
Merge (Extra Credit)
Depending on how flexibly you have designed the rest of the project, 2 extra-credit points may not be worth the amount of effort it takes to do this section. We’re certainly not expecting everyone to do it. Our priority will be in helping students complete the main project; if you’re doing the extra credit, we expect you to be able to stand on your own a little bit more than most students.
merge
Usage: java gitlet.Main merge [branch name]
Description: Merges les from the given branch into the current branch. This method is a bit complicated, so here’s a more detailed description:
First consider what we call the split point of the current branch and the given branch. For example, if master is the current branch and branch is the given branch:
The split point is a latest common ancestor of the current and given branch heads: - A common ancestor is a commit to which there is a path (of 0 or more parent pointers) from both branch heads. - A latest common ancestor is a common ancestor that is not an ancestor of any other common ancestor. For example, although the leftmost commit in the diagram above is a common ancestor of master and
branch , it is also an ancestor of the commit immediately to its right, so it is not a latest common ancestor. If the split point is the same commit as the given branch, then we do nothing; the merge is complete, and the operation ends with the message
If the split point is the current branch, then the effect is to check out the given branch, and the
operation ends after printing the message Current branch fast-forwarded. Otherwise, we continue with the steps below.
1. Any les that have been modi ed in the given branch since the split point, but not modi ed in the current branch since the split point should be changed to their versions in the given branch (checked out from the commit at the front of the given branch). These les should then all be automatically staged. To clarify, if a le is “modi ed in the
Given branch is an ancestor of the
current branch.
given branch since the split point” this means the version of the le as it exists in the commit at the front of the given branch has different content from the version of the le at the split point. Remember: blobs are content addressable!
2. Any les that have been modi ed in the current branch but not in the given branch since the split point should stay as they are.
3. Any les that have been modi ed in both the current and given branch in the same way (i.e., both les now have the same content or were both removed) are left unchanged by the merge. If a le was removed from both the current and given branch, but a le of the same name is present in the working directory, it is left alone and continues to be absent (not tracked nor staged) in the merge.
4. Any les that were not present at the split point and are present only in the current branch should remain as they are.
5. Any les that were not present at the split point and are present only in the given branch should be checked out and staged.
6. Any les present at the split point, unmodi ed in the current branch, and absent in the given branch should be removed (and untracked).
7. Any les present at the split point, unmodi ed in the given branch, and absent in the current branch should remain absent.
8. Any les modi ed in different ways in the current and given branches are in conflict. “Modi ed in different ways” can mean that the contents of both are changed and different from other, or the contents of one are changed and the other le is deleted, or the le was absent at the split point and has
different contents in the given and current branches. In this case, replace the contents of the conflicted le with
<<<<<<< HEAD
contents of file in current branch
=======
contents of file in given branch
>>>>>>>
(replacing “contents of…” with the indicated fi
and stage the result.
Treat a deleted file in a branch
something like this:
<<<<<<< HEAD
contents of file in current branch=======
contents of file in given branch>>>>>>>
and a line separator deserve what they get.
as an empty file. Use straight concatenation her
of a file with no newline at the end, you might w
This is fine; people who produce non-standard, pa
because they don’t know the difference between a
Once les have been updated according to the above, and the split point was not the current branch or the given branch, merge automatically commits with the log message
Merged [given branch name] into [current
branch name].
Then, if the merge encountered a conflict,
print the message
on the terminal (not the log). Merge commits differ from other commits: they record as parents both the head of the current branch (called the rst parent) and the head of the branch given on the command line to be merged in.
A video walkthrough of this command can be found here.
By the way, we hope you’ve noticed that the set of commits has progressed from a simple sequence to a tree and now, nally, to a full directed acyclic graph.
Runtime: , where is the total number of ancestor commits for the two branches and is the total amount of data in all the les under these commits.
Failure cases: If there are staged additions or removals present, print the error message You have uncommitted changes. and exit. If a branch with the given name does not exist, print the error message A branch with that name does not exist. If attempting to merge a branch with itself, print the error message Cannot merge a branch with itself. If merge would generate an error because the commit that it does has no changes in it, just let the normal commit error message for this go through. If an untracked le in the current commit would be overwritten or deleted by the merge, print
and exit; perform this check before doing
anything else. Dangerous?: Yes!
Differences from real git: Real Git does a more subtle job of merging les, displaying conflicts only in places where both les have changed since the split point.
There is an
untracked file in the way; delete it, or add and
commit it first.
Encountered a merge conflict.
D
N )D + N gl N(O
Real Git has a different way to decide which of multiple possible split points to use.
Real Git will force the user to resolve the merge conflicts before committing to complete the merge. Gitlet just commits the merge, conflicts and all, so that you must use a separate commit to resolve problems.
Real Git will complain if there are unstaged changes to a le that would be changed by a merge. You may do so as well if you want, but we will not test that case.
Our line count: ~70 Conceptual Quiz
I. Things to Avoid
There are few practices that experience has shown will cause you endless grief in the form of programs that don’t work and bugs that are very hard to nd and sometimes not repeatable (“Heisenbugs”).
1. Since you are likely to keep various information in les (such as commits), you might be tempted to use apparently convenient le- system operations (such as listing a directory) to sequence through all of them. Be careful. Methods such as File.list and
File.listFiles produce le names in an unde ned order. If you use them to implement the log command, in particular, you can get random results.
2. Windows users especially should beware that the le separator character is / on Unix (or MacOS) and ‘\’ on Windows. So if you form le names in your program by concatenating some directory names and a le name together with explicit / s or \ s, you can be sure that it won’t work on one system or the other. Java provides
a system-dependent le separator character
( ), or you can use the multi-argument constructors to .
J. Acknowledgments
Thanks to Alicia Luengo, Josh Hug, Sarah Kim, Austin Chen, Andrew Huang, Yan Zhao, Matthew Chow, especially Alan Yao, Daniel Nguyen, and Armani Ferrante for providing feedback on this project. Thanks to git for being awesome.
This project was largely inspired by [this][Nilsson Article] excellent article by Philip Nilsson.
This project was created by Joseph Moghadam. Modi cations for Fall 2015, Fall 2017, and Fall 2019 by Paul Hil nger.
System.getProperty(“file.separator”)
File