Due 3 December 2021
Useful Links
Overview of Gitlet
Internal Structures
Detailed Spec of Behavior
The Commands
Miscellaneous Things to Know about the Project
Dealing with Files
Serialization Details
Design Document and Checkpoint
Grader Details
Going Remote (Extra Credit)
The Commands
Things to Avoid
Acknowledgments
Project 3: Gitlet, your own version-control system
Due 3 December 2021
Useful Links
Listed below are many high quality resources compiled across multiple semesters to help you get started/unstuck on Gitlet. These videos and resources will be linked in the relevant portions of the spec, but they are here as well for your convenience. More resources may be created throughout the duration of the project as needed; if so, they will be linked here as well.
Git Intros: These should mostly be review at this point since you have been using Git throughout the semester, but it is vital that you have a strong understanding of Git itself before trying to implement Gitlet. Be sure you understand the contents of these videos thoroughly before proceeding.
Gitlet Intros: The introduction to our mini-version of Git, Gitlet.
Part 1 Part 2 Part 3 Part 4
Understanding Branch Understanding Merge Testing
Overview of Gitlet
In this project you’ll be implementing a version-control system that mimics some of the basic features of the popular system Git. Ours is smaller and simpler, however, so we have named it Gitlet.
A version-control system is essentially a backup system for related collections of files. The main functionality that Gitlet supports is:
1. Saving the contents of entire directories of files. In Gitlet, this is called committing, and the saved contents themselves are called commits.
2. Restoring a version of one or more files or entire commits. In Gitlet, this is called checking out those files or that commit.
3. Viewing the history of your backups. In Gitlet, you view this history in something called the log.
4. Maintaining related sequences of commits, called branches.
5. Merging changes made in one branch into another.
The point of a version-control system is to help you when creating complicated (or even not-so-complicated) projects, or when collaborating with others on a project. You save versions of the project periodically. If at some later point in time you accidentally mess up your code, then you can restore your source to a previously committed version (without losing any of the changes you made since then). If your collaborators make changes embodied in a commit, you can incorporate (merge) these changes into your own version.
In Gitlet, you don’t just commit individual files at a time. Instead, you can commit a coherent set of files at the same time. We like to think of each commit as a snapshot of your entire project at one point in time. However, for simplicity, many of the examples in the remainder of this document involve changes to just one file at a time. Just keep in mind you could change multiple files in each commit.
In this project, it will be helpful for us to visualize the commits we make over time. Suppose we have a project consisting just of the file wug.txt, we add some text to it, and commit it. Then we modify the file and commit these changes. Then we modify the file again, and commit the changes again. Now we have saved three total versions of this file, each one later in time than the previous. We can visualize these commits like so:
Here we’ve drawn an arrow indicating that each commit contains some kind of reference to the commit that came before it. We call the commit that came before it the parent commit; this will be important later. But for now, does this drawing look familiar? That’s right; it’s a linked list!
The big idea behind Gitlet is that we can visualize the history of the different versions of our files in a list like this. Then it’s easy for us to restore old versions of files. You can imagine making a command like: “Gitlet, please revert to the state of the files at commit #2”, and it would go to the second node in the linked list and restore the copies of files found there, while removing any files that are in the first node, but not the second.
If we tell Gitlet to revert to an old commit, the front of the linked list will no longer reflect the current state of your files, which might be a little misleading. In order to fix this problem, we introduce something called the head pointer. The head pointer keeps track of where in the linked list we currently are. Normally, as we make commits, the head pointer will stay at the front of the linked list, indicating that the latest commit reflects the current state of the files:
However, let’s say we revert to the state of the files at commit #2 (technically, this is the reset command, which you’ll see later in the spec). We move the head pointer back to show this:
All right, now, if this were all Gitlet could do, it would be a pretty simple system. But Gitlet has one more trick up its sleeve: it doesn’t just maintain older and newer versions of files, it can maintain differing versions. Imagine you’re coding a project, and you have two ideas about how to proceed: let’s call one Plan A, and the other Plan B. Gitlet allows you to save both versions, and switch between them at will. Here’s what this might look like, in our pictures:
It’s not really a linked list anymore. It’s more like a tree. We’ll call this thing the commit tree. Keeping with this metaphor, each of the separate versions is called a branch of the tree. You can develop each version separately:
There are two pointers into the tree, representing the furthest point of each branch. At any given time, only one of these is the currently active pointer, and this is what’s called the head pointer. The head pointer is the pointer at the front of the current branch.
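The pictures described above can be approximated in code. The following is only a minimal in-memory sketch, assuming hypothetical names (CommitTreeSketch, Node, switchTo) that are not part of the skeleton; a real solution must persist this information in the .gitlet directory rather than hold it in memory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the commit tree; all names are illustrative. */
class CommitTreeSketch {
    /** A commit node: never modified once created, linked to its parent. */
    static final class Node {
        final String message;
        final Node parent; // null for the very first commit

        Node(String message, Node parent) {
            this.message = message;
            this.parent = parent;
        }
    }

    /** Branch name -> commit at the front of that branch. */
    final Map<String, Node> branches = new HashMap<>();
    /** Name of the currently active branch. */
    String activeBranch;

    CommitTreeSketch(String firstBranch, String firstMessage) {
        branches.put(firstBranch, new Node(firstMessage, null));
        activeBranch = firstBranch;
    }

    /** The head pointer: the front of the active branch. */
    Node head() {
        return branches.get(activeBranch);
    }

    /** Adds a new node in front of the head; existing nodes are untouched. */
    void commit(String message) {
        branches.put(activeBranch, new Node(message, head()));
    }

    /** Starts a new branch pointing at the current head. */
    void branch(String name) {
        branches.put(name, head());
    }

    /** Makes another (existing) branch the active one. */
    void switchTo(String name) {
        activeBranch = name;
    }

    public static void main(String[] args) {
        CommitTreeSketch tree = new CommitTreeSketch("master", "version 1");
        tree.commit("version 2");
        tree.branch("plan-b");
        tree.commit("plan A work");
        tree.switchTo("plan-b");
        tree.commit("plan B work");
        System.out.println(tree.head().message);        // plan B work
        System.out.println(tree.head().parent.message); // version 2
    }
}
```

Because each node only points backward to its parent, adding a commit or a branch never touches an existing node, which matches the immutability of the commit tree.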
That’s it for our brief overview of the Gitlet system! Don’t worry if you don’t fully understand it yet; the section above was just to give you a high-level picture of what it’s meant to do. A detailed spec of what you’re supposed to do for this project follows this section.
But a last word here: commit trees are immutable. Once a commit node has been created, it can never be destroyed (or changed at all). We can only add new things to the commit tree, not modify existing things. This is an important feature of Gitlet! One of Gitlet’s goals is to allow us to save things so we don’t delete them accidentally.
Internal Structures
Real Git distinguishes several different kinds of objects. For our purposes, the important ones are
blobs: Essentially the contents of files.
trees: Directory structures mapping names to references to blobs and other trees (subdirectories).
commits: Combinations of log messages, other metadata (commit date, author, etc.), a reference to a tree, and references to parent commits.
The repository also maintains a mapping from branch heads (in this course, we’ve used names like master, proj2, etc.) to references to commits, so that certain important commits have symbolic names.
We will simplify from Git still further by
Incorporating trees into commits and not dealing with subdirectories (so there will be one “flat” directory of plain files for each repository).
Limiting ourselves to merges that reference two parents (in real Git, there can be any number of parents.)
Having our metadata consist only of a timestamp and log message. A commit, therefore, will consist of a log message, timestamp, a mapping of file names to blob references, a parent reference, and (for merges) a second parent reference.
Every object (every blob and every commit in our case) has a unique integer id that serves as a reference to the object. An interesting feature of Git is that these ids are universal: unlike a typical Java implementation, two objects with exactly the same content will have the same id on all systems (i.e. my computer, your computer, and anyone else’s computer will compute this same exact id). In the case of blobs, “same content” means the same file contents. In the case of commits, it means the same metadata, the same mapping of names to references, and the same parent reference. The objects in a repository are thus said to be content addressable.
Both Git and Gitlet accomplish this the same way: by using a cryptographic hash function called SHA-1 (Secure Hash Algorithm 1), which produces a 160-bit integer hash from any sequence of bytes. Cryptographic hash functions have the property that it is extremely difficult to find two different byte streams with the same hash value (or indeed to find any byte stream given just its hash value), so that essentially, we may assume that the probability that any two objects with different contents have the same SHA-1 hash value is $2^{-160}$, or about $10^{-48}$. Basically, we simply ignore the possibility of a hashing collision, so that the system has, in principle, a fundamental bug that in practice never occurs!
Fortunately, there are library classes for computing SHA-1 values, so you won’t have to deal with the actual algorithm. All you have to do is to make sure that you correctly label all your objects. In particular, this involves
Including all metadata and references when hashing a commit.
Distinguishing somehow between hashes for commits and hashes for blobs. A good way of doing this involves a well-thought-out directory structure within the .gitlet directory. Another way to do so is to hash in an extra word for each object that has one value for blobs and another for commits.
By the way, the SHA-1 hash value, rendered as a 40-character hexadecimal string, makes a convenient file name for storing your data in your .gitlet directory (more on that below). It also gives you a convenient way to compare two files (blobs) to see if they have the same contents: if their SHA-1s are the same, we simply assume the files are the same.
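As an illustration of the labeling advice above, here is a minimal sketch that uses the standard java.security.MessageDigest class to compute a 40-character hexadecimal SHA-1 and that hashes in an extra word to tell blobs from commits. The class name Sha1Sketch, the method sha1, and the tag words "blob" and "commit" are assumptions made for this example; the provided utilities may already offer an equivalent helper.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; the class, method, and tag names are illustrative. */
class Sha1Sketch {
    /** Returns the SHA-1 of TYPE followed by CONTENTS as 40 hex characters. */
    static String sha1(String type, byte[] contents) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            // The extra word: one value for blobs, another for commits.
            md.update(type.getBytes(StandardCharsets.UTF_8));
            md.update(contents);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] wug = "This is a wug.".getBytes(StandardCharsets.UTF_8);
        String blobId = sha1("blob", wug);
        System.out.println(blobId.length());                    // 40
        System.out.println(blobId.equals(sha1("blob", wug)));   // true
        System.out.println(blobId.equals(sha1("commit", wug))); // false
    }
}
```

Two files with identical contents produce the same id, so comparing ids is enough to decide whether two blobs are the same.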
For remotes (like origin and shared, which we’ve been using all semester), we’ll simply use other Gitlet repositories. Pushing simply means copying all commits and blobs that the remote repository does not yet have to the remote repository, and resetting a branch reference. Pulling is the same, but in the other direction. Remotes are extra credit in this project and not required for full credit.
Reading and writing your internal objects from and to files is actually pretty easy, thanks to Java’s serialization facilities. The interface java.io.Serializable has no methods, but if a class implements it, then the Java runtime will automatically provide a way to convert to and from a stream of bytes, which you can then write to a file using the I/O class java.io.ObjectOutputStream and read back (and deserialize) with java.io.ObjectInputStream. The term “serialization” refers to the conversion from some arbitrary structure (array, tree, graph, etc.) to a serial sequence of bytes. You should have seen and gotten practice with serialization in lab 11. You’ll be using a very similar approach here, so do use your lab11 as a resource when it comes to persistence and serialization.
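To make the serialization round trip concrete, here is a short sketch that uses only java.io.ObjectOutputStream and java.io.ObjectInputStream. The class Note and the helper names serialize and deserialize are hypothetical, chosen for this example; they are not the course-provided utilities.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helpers showing a serialize/deserialize round trip. */
class SerializationSketch {
    /** A tiny serializable object; Serializable itself declares no methods. */
    static class Note implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;

        Note(String text) {
            this.text = text;
        }
    }

    /** Converts OBJ to a serial sequence of bytes. */
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an object of class TYPE from DATA. */
    static <T extends Serializable> T deserialize(byte[] data, Class<T> type) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("note", ".ser");
        Files.write(file, serialize(new Note("wug")));  // persist to disk
        Note restored = deserialize(Files.readAllBytes(file), Note.class);
        System.out.println(restored.text);              // wug
        Files.delete(file);
    }
}
```

The same bytes that get written to the file could also be fed to a hash function, which is one possible way to derive an id for a serialized object.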
Here is a summary example of the structures discussed in this section. As you can see, each commit (rectangle) points to some blobs (circles), which contain file contents. The commits contain the file names and references to these blobs, as well as a parent link. These references, depicted as arrows, are represented in the .gitlet directory using their SHA-1 hash values (the small hexadecimal numerals above the commits and below the blobs). The newer commit contains an updated version of wug1.txt, but shares the same version of wug2.txt as the older commit. Your commit class will somehow store all of the information that this diagram shows: a careful selection of internal data structures will make the implementation easier or harder, so it behooves you to spend time planning and thinking about the best way to store everything.
Detailed Spec of Behavior
Overall Spec
The only structure requirement we’re giving you is that you have a class named gitlet.Main and that it has a main method. Here’s your skeleton code for this project (in package gitlet):
We are also giving you some utility methods for performing a number of mostly file-system-related tasks, so that you can concentrate on the logic of the project rather than the peculiarities of dealing with the OS.
You may, of course, write additional Java classes to support your project; in fact, please do. But don’t use any external code (aside from JUnit), and don’t use any programming language other than Java. You can use all of the Java Standard Library that you wish, plus utilities we provide.
The majority of this spec will describe how gitlet.Main’s main method must react when it receives various gitlet commands as command-line arguments. But before we break down command-by-command, here are some overall guidelines the whole project should satisfy:
In order for Gitlet to work, it will need a place to store old copies of files and other metadata. All of this stuff must be stored in a directory called .gitlet, just as this information is stored in directory .git for the real git system (files with a . in front are hidden files. You will not be able to see them by default on most operating systems. On Unix, the command ls -a will show them.) A Gitlet system is considered “initialized” in a particular location if it has a .gitlet directory there. Most Gitlet commands (except for the init command) only need to work when used from a directory where a Gitlet system has been initialized, i.e., a directory that has a .gitlet directory. The files that aren’t in your .gitlet directory (which are copies of files from the repository that you are using and editing, as well as files you plan to add to the repository) are referred to as the files in your working directory.
Most commands have runtime or memory usage requirements. You must follow these. Some of the runtimes are described as constant “relative to any significant measure”. The significant measures are: any measure of number or size of files, any measure of number of commits. You can ignore time required to serialize or deserialize, with the one caveat that your serialization time cannot depend in any way on the total size of files that have been added, committed, etc (what is serialization? You’ll see later in the spec). You can also pretend that getting from a hash table is constant time.
Some commands have failure cases with a specified error message. The exact formats of these are specified later in the spec. All error messages end with a period; since our autograding is literal, be sure to include it. If your program ever encounters one of these failure cases, it must print the error message and not change anything else. You don’t need to handle any other error cases except the ones listed as failure cases.
There are some failure cases you need to handle that don’t apply to a particular command. Here they are:
If a user doesn’t input any arguments, print the message Please enter a command. and exit.
If a user inputs a command that doesn’t exist, print the message No command with that name exists. and exit.
If a user inputs a command with the wrong number or format of operands, print the message Incorrect operands. and exit.
If a user inputs a command that requires being in an initialized Gitlet working directory (i.e., one containing a .gitlet subdirectory), but is not in such a directory, print the message Not in an initialized Gitlet directory.
Some of the commands have their differences from real Git listed. The spec is not exhaustive in listing all differences from git, but it does list some of the bigger or potentially confusing and misleading ones.
Do NOT print out anything except for what the spec says. Some of our autograder tests will break if you print anything more than necessary.
Always exit with exit code 0, even in the presence of errors. This allows us to use other exit codes as an indication that something blew up.
The spec classifies some commands as “dangerous”. Dangerous commands are ones that potentially overwrite files (that aren’t just metadata); for example, if a user tells Gitlet to restore files to older versions, Gitlet may overwrite the current versions of the files. Just FYI.
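The general failure cases and the exit-code rule above might be handled along the following lines. This is only a sketch under stated assumptions: the class name MainSketch, the helper check, the two-command subset, the operand counts, and the order in which the checks run are all choices made for this example and are not dictated by the spec.

```java
import java.io.File;
import java.util.Set;

/** Hypothetical argument checking for a main method; names are illustrative. */
class MainSketch {
    /** Assumed subset of commands; a real solution supports all of them. */
    private static final Set<String> COMMANDS = Set.of("init", "commit");

    /** Returns the error message for ARGS run from CWD, or null if none. */
    static String check(String[] args, File cwd) {
        if (args.length == 0) {
            return "Please enter a command.";
        }
        String command = args[0];
        if (!COMMANDS.contains(command)) {
            return "No command with that name exists.";
        }
        // Assumed operand counts: init takes none, commit takes one message.
        int expected = command.equals("commit") ? 2 : 1;
        if (args.length != expected) {
            return "Incorrect operands.";
        }
        boolean initialized = new File(cwd, ".gitlet").isDirectory();
        if (!command.equals("init") && !initialized) {
            return "Not in an initialized Gitlet directory.";
        }
        return null;
    }

    public static void main(String[] args) {
        File cwd = new File(System.getProperty("user.dir"));
        String error = check(args, cwd);
        if (error != null) {
            System.out.println(error); // print only the message
            System.exit(0);            // exit code 0 even on errors
        }
        // Dispatch to the individual command implementations here.
    }
}
```

Keeping the checks in a method that returns the message, rather than printing directly, makes them easy to unit test without capturing standard output.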
The Commands
We now go through each command you must support in detail. Remember that good programmers always care about their data structures: as you read these commands, you should think first about how you should store your data to easily support these commands and second about if there is any opportunity to reuse commands that you’ve already implemented (hint: there is ample opportunity in this project to reuse code you’ve already written).
Usage: java gitlet.Main commit [message]
Description: Saves a snapshot of tracked files in the current commit and staging area so they can be restored at a later time, creating a new commit. The commit is said to be tracking the saved files. By default, each commit’s snapshot of files will be exactly the same as its parent commit’s snapshot of files; it will keep versions of files exactly as they are, and not update them. A commit will only update the contents of files it is tracking that have been staged for addition at the time of commit, in which case the commit will now include the version of the file that was staged instead of the version it got from its parent. A commit will save and start tracking any files that were staged for addition but weren’t tracked by its parent. Finally, files tracked in the current commit may be untracked in the new commit as a result of being staged for removal by the rm command (below).
The bottom line: By default a commit is the same as its parent. Files staged for addition and removal are the updates to the commit. Of course, the date (and likely the message) will also differ from the parent.
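The bottom line can be sketched as a small map computation. The example below assumes a commit's snapshot is represented as a map from file name to blob id and that the staging area is split into a map of additions and a set of removals; these representations and the names SnapshotSketch and nextSnapshot are illustrative assumptions, not requirements.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical computation of a new commit's file mapping. */
class SnapshotSketch {
    /** Returns file name -> blob id for the new commit. */
    static Map<String, String> nextSnapshot(Map<String, String> parentFiles,
                                            Map<String, String> stagedForAddition,
                                            Set<String> stagedForRemoval) {
        // By default, the new commit is the same as its parent.
        Map<String, String> files = new TreeMap<>(parentFiles);
        // Staged versions replace tracked ones; new files become tracked.
        files.putAll(stagedForAddition);
        // Files staged for removal are untracked in the new commit.
        files.keySet().removeAll(stagedForRemoval);
        return files;
    }

    public static void main(String[] args) {
        Map<String, String> parent = Map.of("wug1.txt", "a1", "wug2.txt", "b1");
        Map<String, String> added = Map.of("wug1.txt", "a2", "notwug.txt", "c1");
        Set<String> removed = Set.of("wug2.txt");
        // Prints {notwug.txt=c1, wug1.txt=a2}
        System.out.println(nextSnapshot(parent, added, removed));
    }
}
```

Only blob ids are copied from the parent, not file contents, so unchanged files are shared between commits rather than stored again.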
Some additional points about commit:
The staging area is cleared after a commit.
The commit command never adds, changes, or removes files in the working directory (other than those in the .gitlet directory). The rm command will remove such files, as well as staging them for removal, so that they will be untracked after a commit.
Any changes made to files after staging for addition or removal are ignored by the commit command, which only modifies the contents of the .gitlet directory. For example, if you remove a tracked file using the Unix rm command (rather than Gitlet’s command of the same name), it has no effect on the next commit, which will still contain the deleted version of the file.
After the commit command, the new commit is added as a new node in the commit tree.
The commit just made becomes the “current commit”, and the head pointer now points to it. The previous head commit is this commit’s parent commit.
Each commit should contain the date and time it was made.
Each commit has a log message associated with it that describes the changes to the files in the commit. This is specified by the user. The entire message should take up only one entry in the array args that is passed to main. To include multiword messages, you’ll have to surround them in quotes.
Each commit is identified by its SHA-1 id, which must include the file (blob) references of its files, parent reference, log message, and commit time.
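One way to satisfy the last point is to feed every identifying field into the hash in a fixed order. The sketch below is an assumption-laden illustration: the class CommitIdSketch, the method commitId, the "commit" tag word, the NUL separators, and the millisecond timestamp are all choices made for this example, and a merge commit would also need to include its second parent.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical commit id computation; field layout is illustrative. */
class CommitIdSketch {
    /** Hashes the message, time, parent id, and file -> blob id mapping. */
    static String commitId(String message, long timestampMillis,
                           String parentId, Map<String, String> files) {
        StringBuilder text = new StringBuilder("commit").append('\0');
        text.append(message).append('\0');
        text.append(timestampMillis).append('\0');
        text.append(parentId == null ? "" : parentId).append('\0');
        // Sort by file name so equal mappings always hash identically.
        for (Map.Entry<String, String> entry : new TreeMap<>(files).entrySet()) {
            text.append(entry.getKey()).append('\0');
            text.append(entry.getValue()).append('\0');
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(text.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of("wug.txt", "a1");
        String first = commitId("add wug", 0L, null, files);
        String second = commitId("edit wug", 1000L, first, Map.of("wug.txt", "a2"));
        System.out.println(first.length());       // 40
        System.out.println(first.equals(second)); // false
    }
}
```

Because the parent id is part of the input, each commit id indirectly depends on the entire history behind it.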
Runtime: Runtime should be constant with respect to any measure of number of commits. Runtime must be no worse than linear with respect to the total size of files the commit is tracking. Additionally, this command has a memory requirement: Committing must increase the size of the .gitlet directory by no more than the total size of the files staged for addition at the time of commit, not including additional metadata. This means don’t store redundant copies of versions of files that a commit receives from its parent. You are allowed to save whole additional copies of files; don’t worry about only saving diffs, o