ECS 160 Assignment 4
Due by Friday, Dec 3rd at 11:59 pm
v1.0.3
(Total Possible, Base: 100 points, With bonus: 123 points)
For this assignment we will explore Java reflection and ways of thinking about data and
programs. You will also get more practice implementing and testing a set of non-trivial software
requirements.
More specifically, we will write a tool to automatically serialize Java objects. Data serialization is
the process of converting data in memory to a format that can be stored or transmitted. There
are several popular data serialization formats such as CSV, XML, protobuf, YAML, and JSON.
While we could reimplement one of these formats, if you look online (say search for “json
serializer for java github”) there are countless implementations. Thus we find this rather bland.
Instead we will implement a serializer for CSON: “A spicy data format for more modern times”.
Our tool will allow us to perform operations like:
import java.nio.file.Paths;
import java.util.List;

public class App {
    public static class Book {
        public String bookTitle;
        public boolean isPaperback;
        public float rating;

        public Book(String title, boolean isPaperback, float therating) {
            this.bookTitle = title;
            this.isPaperback = isPaperback;
            this.rating = therating;
        }
    }

    public static void main(String[] args) {
        // Build the list of objects to serialize
        List<Book> data = List.of(
            new Book("Foo", true, 2.5f)
        );
        CsonObjectSerializer serializer = new CsonObjectSerializer();
        String csonText = serializer.serialize(data);
        System.out.println(csonText);
        // 🍽Book🥣bookTitle🧂string🥣isPaperback🧂boolean🥣rating🧂float32🔥🍲Book🌶Foo🌶👍🌶2.5
        Util.writeTextToFile(csonText, Paths.get("./data.cson"));
    }
}
https://en.wikipedia.org/wiki/Serialization
You should do the homework by yourself. We will be using tools to look for plagiarism. The
assignment grade will be the weighted average of the score on the homework, and the
(possible) related pop quiz, which would be given in the lecture.
The signatures of all public methods provided in the starter files shall remain unchanged.
Additional methods and classes are permitted (and encouraged when it aids
readability/code-reuse!).
This file gives you the API you are required to implement as well as a “mostly formal”
specification of the CSON format.
In order to keep this assignment reasonably simple, we are only focusing on the serialization
part, and not the deserialization.
JavaCson
90 points
We will implement a package called javacson. The following public classes are required:
CsonObjectSerializer
A CsonObjectSerializer handles the automatic conversion of an object into CSON text.
By default, the CsonObjectSerializer attempts to serialize all instance fields which are public.
Attempting to serialize a field which is not of a type supported by CSON results in a
CsonSerializationError being thrown. Both primitives and boxed primitives should be supported if
they have a corresponding representation in CSON.
The serializer should not mutate any of the objects or their classes.
The public API shall be exactly:
1. public String serialize(List
– Maps a list of objects into CSON text. Might throw an unchecked
CsonSerializationError if passed in an unsupported object. The resulting text is
influenced by the annotations and settings described elsewhere in this document.
– The schema section of the resulting CSON should include a type definition for every unique
object type present in the passed-in list.
– Object names should be unqualified names (like what you get from
Class.getSimpleName()). If the passed-in objects have distinct types but share an
unqualified name, behavior is undefined.
– Field names will match the names used in the objects.
– Both type names and field names will appear in lexicographic order (using the
names as they appear in the schema with any annotation renamings).
– The output should not contain any non-semantic whitespace.
– If no objects are supplied, an empty string is returned.
– Behavior is undefined when serializing a method-local class or non-public class.
2. public String serialize(Object object)
– Same as above just with a single object
Attempting to serialize a null object or an array results in a CsonSerializationError.
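The null and array rule above can be checked up front with plain reflection. The following is a minimal sketch, not the required implementation: the class and method names are hypothetical, and IllegalArgumentException is only a stand-in for the CsonSerializationError provided with the starter files.

```java
class InputGuardSketch {
    /** Rejects inputs the spec says cannot be serialized: null and arrays. */
    static void requireSerializable(Object candidate) {
        if (candidate == null) {
            // Stand-in exception; the assignment expects CsonSerializationError here.
            throw new IllegalArgumentException("cannot serialize null");
        }
        if (candidate.getClass().isArray()) {
            throw new IllegalArgumentException("cannot serialize an array");
        }
    }

    /** Convenience wrapper that reports the outcome as a boolean. */
    static boolean isRejected(Object candidate) {
        try {
            requireSerializable(candidate);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

Class.isArray() is true for both primitive arrays (int[]) and object arrays (String[]), so one check covers both.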
Annotations
7 bonus points each
Controlling the serialization process is a good use of annotations. Javacson supports the
following annotations:
@CsonIgnore – Can be applied to a field. Stops the field from being serialized/deserialized.
Supersedes other annotations.
public class Person {
    public int age;
    @CsonIgnore
    public int numberOfSecrets; // will not appear in schema or data
}
@CsonName(String newName) – Can be applied to a field. Changes the name as represented
in the CSON schema. Values which are not valid CSON identifiers or values which lead to
duplicate field names will cause a CsonSerializationError to be thrown during a call to
CsonObjectSerializer.serialize.
public class Person {
    @CsonName("myAge")
    public int age;
}
All annotations are placed in the javacson.Annotations package.
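To see how field annotations like these can be read back through reflection, here is a small self-contained sketch. It declares its own stand-in annotations (SketchIgnore and SketchName, both hypothetical) so that it compiles without the starter files; the real @CsonIgnore and @CsonName may be declared differently. An annotation is only visible to reflection at run time if it has RUNTIME retention, which is the assumption made here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class AnnotationSketch {
    // Stand-in for an "ignore this field" marker annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SketchIgnore { }

    // Stand-in for a "rename this field" annotation with a single value.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SketchName {
        String value();
    }

    public static class Person {
        @SketchName("myAge")
        public int age;
        @SketchIgnore
        public int numberOfSecrets;
        public String nickname;
    }

    /** Returns the schema-visible field names, after renaming, in sorted order. */
    static List<String> schemaFieldNames(Class<?> type) {
        TreeSet<String> names = new TreeSet<>();
        for (Field field : type.getFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue; // only instance fields are considered
            }
            if (field.isAnnotationPresent(SketchIgnore.class)) {
                continue; // ignore is checked first, so it wins over a rename
            }
            SketchName rename = field.getAnnotation(SketchName.class);
            String name = (rename != null) ? rename.value() : field.getName();
            if (!names.add(name)) {
                // Stand-in exception for a duplicate name after renaming.
                throw new IllegalStateException("duplicate field name: " + name);
            }
        }
        return new ArrayList<>(names);
    }
}
```

For the Person class in this sketch, the result is the list [myAge, nickname]: the ignored field is gone and sorting happens on the renamed identifiers.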
Other Thoughts
Our serializer is rather restricted. Most notably, only supporting public fields on the classes we
serialize is unrealistic for classes following good OOP and encapsulation.
Some serialization libraries will handle private fields by overriding the protection level using
reflection. We don’t support this, mostly because in this assignment we don’t want to imply this is a
good practice (though something like serialization might be one of the few good uses of this
functionality).
Additionally, many serialization libraries will automatically detect getters and setters of private
fields. We also do not do this. This is mostly just because we are trying to keep this assignment
fairly simple, and precisely defining and implementing this adds some complexity.
🧭 Where to start?
With reading this specification. You’re already off to a great start! 🎉
Next take a look at the starter files. The provided test cases might help give you a better sense
of what a CSON file looks like.
We don’t define many methods leaving you free to design things as you wish. We recommend
you perhaps start with class(es) to represent and create a schema and test that until it is at least
partially working. You should be able to iterate through all objects and use
Object.getClass().getSimpleName() and Object.getClass().getFields() to
find what you need.
Then after that, you can add more features and work to create other class(es) to take in a
schema object and build the data section.
However, many other approaches can work too.
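As one possible first experiment with the reflection calls mentioned above, the sketch below collects the public instance fields of an object into a sorted map of name to value. The class names and the choice of a TreeMap are assumptions made for illustration, not part of the starter files.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

class FieldListerSketch {
    public static class Book {
        public String bookTitle = "Foo";
        public boolean isPaperback = true;
        public float rating = 2.5f;
        public static int instances = 0; // static, so it is skipped below
        private int secret = 1;          // not public, so getFields() never returns it
    }

    /** Maps each public instance field name to its current value, sorted by name. */
    static Map<String, Object> publicInstanceFields(Object obj) {
        Map<String, Object> result = new TreeMap<>();
        for (Field field : obj.getClass().getFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            try {
                // Primitive values come back boxed (for example float as Float).
                result.put(field.getName(), field.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + field.getName(), e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Book.class.getSimpleName());
        System.out.println(publicInstanceFields(new Book()));
    }
}
```

A TreeMap keeps its keys in String natural order, which gives a lexicographic ordering of field names regardless of the order in which getFields() returns them.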
https://docs.oracle.com/javase/tutorial/reflect/class/classNew.html
https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/lang/Class.html#getFields()
CSON v1.0.0
This gives the specification of the CSON serialization format.
CSON-Base
CSON is a plain text, multi-record data serialization format. A CSON file is canonically named
with a .cson extension, and is encoded as UTF-8 text.
A CSON file has two sections: first a schema section and then a data section. These two sections
shall be separated by one 🔥 emoji.
Whitespace
All whitespace codepoints are ignored unless they appear in a string value. A whitespace
codepoint is defined as identical to codepoints matched by the `\s` regex group in Java 16.
This is `[ \t\n\x0B\f\r]`.
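The whitespace definition above can be tried out directly with java.util.regex. This is a small sketch with a hypothetical helper name; it relies on the default behavior of `\s` (no UNICODE_CHARACTER_CLASS flag), which matches only the six characters listed.

```java
import java.util.regex.Pattern;

class WhitespaceSketch {
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    /** True if the codepoint counts as CSON whitespace under the definition above. */
    static boolean isCsonWhitespace(int codePoint) {
        String asText = new String(Character.toChars(codePoint));
        return WHITESPACE.matcher(asText).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCsonWhitespace(' '));    // space
        System.out.println(isCsonWhitespace(0x000B)); // vertical tab
        System.out.println(isCsonWhitespace(0x00A0)); // no-break space
    }
}
```

Whitespace-like Unicode characters such as the no-break space (U+00A0) or the em space (U+2003) are not matched, so they are not ignorable whitespace under this definition.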
Schema Section
The first part of a CSON file describes the types of objects which might be encoded in the file. It
lists a type identifier name, and the type and name of all the object’s fields.
A type definition begins with a 🍽 emoji and lists zero or more fields. Each field definition
begins with a 🥣 emoji. The definition gives an identifier, followed by 🧂, followed by its type.
CSON-base only supports a few basic types. CSON-base does not support nested objects or
arrays. The supported types are a 32-bit integer (“int32”), 64-bit integer (“int64”), 32-bit
floating point (“float32”), 64-bit floating point (“float64”), a boolean (“boolean”), and an escaped
string value (“string”).
The ordering of type definitions and fields is undefined. Duplicate identifier names for types or for a
given object’s fields are disallowed. The grammar for CSON is given below, in EBNF. (Make
SURE YOU READ the FAQ, there are some useful hints there.)
https://www.twilio.com/docs/glossary/what-utf-8
https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/regex/Pattern.html
https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
EBNF Grammar:
cson = [ schema , "🔥" , data ] ;
schema = { type_def } ;
type_def = "🍽" , identifier , { field_def } ;
field_def = "🥣" , identifier , "🧂" , type_name ;
type_name = "int32" | "int64" | "float32" | "float64" | "boolean" | "string" ;
identifier = (* regex "[a-zA-Z_][a-zA-Z\d_]*" *) ;
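The identifier rule in the grammar is a regular expression, so it can be checked with java.util.regex. The helper below is a hypothetical sketch of such a check, for instance for validating a name before it is written into the schema.

```java
import java.util.regex.Pattern;

class IdentifierSketch {
    // Same regex as the identifier rule in the grammar above.
    private static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z_][a-zA-Z\\d_]*");

    /** True if the whole name is a valid CSON identifier. */
    static boolean isValidIdentifier(String name) {
        return name != null && IDENTIFIER.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("myAge"));  // letters only
        System.out.println(isValidIdentifier("1abc"));   // starts with a digit
        System.out.println(isValidIdentifier("my-age")); // contains a hyphen
    }
}
```

Matcher.matches() requires the entire string to match, so names with a stray character anywhere (or the empty string) are rejected.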
Data Section
The second part of a CSON file is the data section. Here the name of encoded objects is listed,
followed by all of their field values.
Each object in the data section starts with a 🍲 emoji. Then an identifier, which must match a
type definition name in the schema, must be provided.
Next zero or more field values are provided. The quantity and ordering of fields exactly matches
the ordering provided in the schema. Each field value begins with a 🌶 emoji.
Implementation note: this is the “Hot Pepper Emoji” codepoint sequence U+1F336 U+FE0F,
including the variation selector-16. Without this selector one gets the bare U+1F336, which most
(but not all) viewers render identically to the full sequence. To avoid such difficulties, we
strongly suggest you use the provided constants in CsonCodepoints.java. For example:
```
// Figure out the schema and data
// …
return schema + CsonCodepoints.CSON_SCHEMA_DATA_SEP + data;
```
Numeric values are encoded as base-ten text. Non-semantic leading or trailing zeros are
disallowed. A non-semantic negative sign is disallowed with a signed zero being considered
non-semantic. Floating-point values only have a “.” decimal point if a fractional part is present.
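For integers, Integer.toString and Long.toString already produce base-ten text without leading zeros. Floating-point values need more care, because Float.toString(2.0f) yields "2.0" and -0.0f yields "-0.0". The following is one possible sketch (the class and method names are hypothetical) that goes through BigDecimal to drop trailing zeros and the sign of a signed zero. It keeps a leading "0" before the decimal point (as in "0.5") and does not handle NaN or infinities, both of which are assumptions you should check against the provided tests.

```java
import java.math.BigDecimal;

class NumberTextSketch {
    /** Formats a float using its own shortest decimal form, not the widened double. */
    static String format(float value) {
        return plain(Float.toString(value), Float.isNaN(value) || Float.isInfinite(value));
    }

    static String format(double value) {
        return plain(Double.toString(value), Double.isNaN(value) || Double.isInfinite(value));
    }

    private static String plain(String shortestDecimal, boolean notFinite) {
        if (notFinite) {
            throw new IllegalArgumentException("NaN and infinities are not handled in this sketch");
        }
        // BigDecimal has no negative zero, and toPlainString() never uses an exponent.
        return new BigDecimal(shortestDecimal).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(format(2.5f));  // prints 2.5
        System.out.println(format(2.0f));  // prints 2
        System.out.println(format(-0.0f)); // prints 0
        System.out.println(format(100.0)); // prints 100
    }
}
```

The float overload starts from Float.toString rather than casting to double, since widening 2.1f to a double would expose digits such as 2.0999999046325684.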
Boolean values are encoded as “👍” for true, and “❌” for false.
String values are represented using only printable ASCII characters. Line feed (U+000A) is
encoded as “\n”. Values U+0020 to U+007E, inclusive, are encoded as-is, with the exception of
backslash (U+005C), which is always escaped as a double backslash “\\”. All other codepoints are
encoded as “\u” followed by the hexadecimal Unicode codepoint, using a minimum of four hex digits.
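The escaping rules above can be sketched as a single pass over the string's codepoints. This is an illustrative version only, with hypothetical names; the provided Util.java already contains most of the real code. The spec text does not say whether hex digits are upper or lower case, so the uppercase output here is an assumption.

```java
class CsonEscapeSketch {
    /** Escapes a string value following the rules described above. */
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        value.codePoints().forEach(cp -> {
            if (cp == '\n') {
                out.append("\\n");               // line feed becomes backslash + n
            } else if (cp == '\\') {
                out.append("\\\\");              // backslash becomes two backslashes
            } else if (cp >= 0x20 && cp <= 0x7E) {
                out.append((char) cp);           // printable ASCII is kept as-is
            } else {
                // Everything else: backslash, the letter u, then at least four hex digits.
                out.append(String.format("\\u%04X", cp));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("line1\nline2"));
        System.out.println(escape("C:\\temp"));
        System.out.println(escape("caf\u00E9"));
    }
}
```

Iterating with codePoints() rather than over chars matters for emoji: a codepoint above U+FFFF is stored as two chars in a Java String but is emitted here as one escape with five hex digits.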
EBNF Grammar:
data = { "🍲" , identifier , { field_value } } ;
field_value = "🌶" , ( integer_value | floating_point_val | boolean_val | string_val ) ;
integer_value = [ "-" ] , { digit }- ;
floating_point_val = [ "-" ] , { digit } , [ "." , { digit }- ] ;
boolean_val = "👍" | "❌" ;
digit = (* regex "[0-9]" *) ;
string_val = (* regex `[ -~]*` all printable ascii chars *) ;
https://emojipedia.org/hot-pepper/
https://emojipedia.org/variation-selector-16/
https://en.wikipedia.org/wiki/Signed_zero
Testcase Submission
10 points
It is important to understand how to write tests when writing software. The supplied test cases
are not completely comprehensive.
You are required to submit exactly two test cases for us to grade. Please place these in
`StudentTest.java`, a sibling of the other test files, which contains a class named `StudentTest`.
This class should have exactly two JUnit 5 tests. You may write other tests for yourself in other
files, but if more than two tests are supplied in StudentTest.java, we will choose two arbitrarily
and apply a penalty.
In true unit tests you would likely test individual methods that are used during the serialization
process. However, this API spec only defines the interface of the final serialize method. Thus in
order to work for any valid implementation, your submission is confined to integration tests. You
are still encouraged to submit tests that test a specific assertion you have about a specific part
of the assignment.
To receive full credit your tests should be non-trivial and ideally detect at least some bugs in
incorrect implementations (so testing exactly the same thing as the supplied tests might not
work).
Submitted tests for this part should not depend on bonus-points features. Additionally please do
not depend on being able to read or write to the file system.
Code Style
9 point bonus
Style conventions are an important part of code quality.
We will adopt a relaxed version of the Google Java Style guide. Unlike in HW2 and HW3 we
enforce an indentation requirement of 4 spaces (some people submitted some weird indentations
previously). Google Style requirements around package name, import order, or public javadocs
are ignored.
We provide a Checkstyle xml file with the starter files (google_checks_custom2.xml).
Submissions that pass a run of Checkstyle without any warnings will be awarded bonus points.
Partial bonus points might be given if you only have a few warnings.
https://google.github.io/styleguide/javaguide.html
https://checkstyle.org/
The submission must get at least 50 other points to be eligible for code quality bonuses.
Submission
Please zip your submission together. The format should be in similar form to the starter files in
that it contains the entire project structure and unzips into a single directory.
Late Policy
We are coming up on the end of the quarter, so we don’t have any room for late submissions.
For this assignment we use a late policy of:
1 Day Late: -5 percentage points
2 Days Late: -20 percentage points (and no option for bonus points)
Submissions cannot be accepted after Dec 5th 11:59PM.
FAQ/minutia
Why won’t my code build?
You should probably be pretty comfortable with this by now. If you run into issues that you think
might be project specific, feel free to post on Piazza.
I don’t understand the CSON grammars. What’s going on with this?
We provide these in order to be precise. However, because we are only doing serialization (and
thus doing no parsing) you don’t really need to understand this too well. The main focus of this
class isn’t formal languages, so don’t worry about it much. Take a look at the examples provided
in the test files and hopefully it should be fairly clear.
What’s a “codepoint”?
We use Unicode because a CSON file contains non-ASCII data such as emojis. A codepoint is just a
generalized name for a character.
What are these things like “U+0020”?
Those represent Unicode codepoints. The number is the hexadecimal value of the
codepoint. So U+0020 means a byte with the decimal value of 32 will be used, which
corresponds to a space character.
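In Java, a String is a sequence of 16-bit chars, and a codepoint above U+FFFF (which includes most emoji) takes two chars. The sketch below, with hypothetical helper names, shows the difference using the fire emoji (U+1F525), written as escapes so that it does not depend on the editor's encoding.

```java
class CodePointSketch {
    // The fire emoji U+1F525 written as its two UTF-16 chars.
    static final String FIRE = "\uD83D\uDD25";

    static int charCount(String text) {
        return text.length();
    }

    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    /** Formats the first codepoint of the text in the usual U+XXXX notation. */
    static String firstCodePoint(String text) {
        return String.format("U+%04X", text.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(charCount(FIRE));      // prints 2
        System.out.println(codePointCount(FIRE)); // prints 1
        System.out.println(firstCodePoint(FIRE)); // prints U+1F525
        System.out.println(firstCodePoint(" "));  // prints U+0020
    }
}
```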
https://en.wikipedia.org/wiki/Code_point
https://en.wikipedia.org/wiki/Hexadecimal
https://www.compart.com/en/unicode/U+0020
What is the stuff about whitespace in the CSON spec?
As in many languages such as Java or C and serialization formats such as XML or JSON, non-semantic
whitespace is ignored. This allows outputs to be potentially printed in a more human-readable
form. There are a surprising number of whitespace-like characters in Unicode, which is why
whitespace needs to be precisely defined for use in a specification. Because we are just
focusing on serialization for this assignment, you likely don’t need to specifically handle this.
What emojis are those?
You can copy and paste them or refer to the provided CsonCodepoints.java. Be careful with the
pepper in particular that you include all codepoints. While the provided tests give string values
of the expected CSON for aesthetic reasons, a more reliable method would be to use the
constants given in CsonCodepoints.java.
My strings look exactly identical but the .equals() is different. What is going on?
This might be an issue with a lingering control codepoint such as an extra variation selector-16
(if you look at the string bytes you can see this). Try recopying the emojis or using the provided
constants in CsonCodepoints.java.
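One way to look inside two strings that render the same is to print their codepoints. The helper below is a hypothetical debugging sketch; it compares the bare pepper (U+1F336) with the pepper followed by the variation selector-16 (U+FE0F), both written as escapes.

```java
import java.util.stream.Collectors;

class CodePointDumpSketch {
    /** Lists every codepoint of the text in U+XXXX form, separated by spaces. */
    static String dump(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String barePepper = "\uD83C\uDF36";
        String pepperWithSelector = "\uD83C\uDF36\uFE0F";
        System.out.println(dump(barePepper));         // prints U+1F336
        System.out.println(dump(pepperWithSelector)); // prints U+1F336 U+FE0F
        System.out.println(barePepper.equals(pepperWithSelector)); // prints false
    }
}
```

Dumping both the expected and the actual string from a failing test this way usually makes the extra or missing codepoint easy to spot.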
When I view the test files and CsonCodepoints.java I just see a bunch of weird boxes.
What’s going on?
It’s likely your editor does not support rendering of emojis. This should not be an issue for
correctness, but might make it a bit of a pain. Editors like VSCode should render it fine.
Wait, are the types and fields in the schema supposed to be ordered or not?
The CSON spec explicitly leaves this undefined (a valid CSON file can have any order), but in
order to make things easier to test, we more narrowly define an ordering for our CSON
implementation (note we use lexicographic ordering because the getFields reflection API
doesn’t guarantee any ordering). The differences between CSON and javacson’s serializer do
not matter much if we aren’t deserializing anything.
Can I use Java Beans introspect?
This is a good idea. Some Java serialization libraries require that the objects conform to Java Beans.
However, javacson does not. We’d prefer you get some practice with the base reflection library
for this homework.
Google Style says use 2 spaces indent, but you say 4 spaces. Why?
There is a mix of reasons for this. 4 spaces is common across languages (in particular in
Python, which we are biased towards) and is also not uncommon in the Java community (as in
the Android Open Source Project guide). Well-structured Java code should not be deeply
indented, so we find the reduction to the less-visually-distinct 2-spaces to be unnecessary. This
is purely opinion though. When coding you just have to adapt to the style of your organization.
https://en.wikipedia.org/wiki/Template:Whitespace_(Unicode)
https://emojipedia.org/variation-selector-16/
https://en.wikipedia.org/wiki/JavaBeans
https://www.python.org/dev/peps/pep-0008/#indentation
https://source.android.com/setup/contribute/code-style
What’s a 64-bit integer or a 64-bit float?
These are a long and a double in Java, respectively. CSON doesn’t use terms like “long” in order to be a bit more
language-agnostic (and it’s a teaching tool to make sure you know what the primitive types mean).
Will numeric types be “upcast” to a version with higher bits?
No. Values are never converted to a CSON type with more bits (so trying to serialize a char or
short results in a CsonSerializationError, rather than it being represented as a 32-bit
int in CSON).
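Putting the last two answers together, the mapping from Java field types to CSON type names can be written as an exact lookup table with no widening. This is a sketch under assumptions: the class name is hypothetical, and IllegalArgumentException stands in for the CsonSerializationError from the starter files.

```java
import java.util.HashMap;
import java.util.Map;

class CsonTypeNameSketch {
    private static final Map<Class<?>, String> TYPE_NAMES = new HashMap<>();

    static {
        // Each primitive and its boxed form map to the same CSON type name.
        TYPE_NAMES.put(int.class, "int32");
        TYPE_NAMES.put(Integer.class, "int32");
        TYPE_NAMES.put(long.class, "int64");
        TYPE_NAMES.put(Long.class, "int64");
        TYPE_NAMES.put(float.class, "float32");
        TYPE_NAMES.put(Float.class, "float32");
        TYPE_NAMES.put(double.class, "float64");
        TYPE_NAMES.put(Double.class, "float64");
        TYPE_NAMES.put(boolean.class, "boolean");
        TYPE_NAMES.put(Boolean.class, "boolean");
        TYPE_NAMES.put(String.class, "string");
    }

    /** Returns the CSON type name for a Java field type, or throws if unsupported. */
    static String csonTypeName(Class<?> javaType) {
        String name = TYPE_NAMES.get(javaType);
        if (name == null) {
            // Stand-in exception; short, char, byte, arrays and objects all end up here.
            throw new IllegalArgumentException("unsupported field type: " + javaType.getName());
        }
        return name;
    }
}
```

The argument would typically come from Field.getType(), which returns int.class for an int field and Integer.class for a boxed one. Because the lookup is exact, short.class and char.class are rejected instead of being silently widened to int32.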
Why do we have to escape strings?
We need some way to know whether 🌶 is a separator or a value in a string. We choose to just
escape all the non-printable characters. Note, we give most of the code to do this in Util.java.
What if reflection is disabled and I get a security exception?
This behavior is undefined and will not be tested.
How do I run checkstyle?
We give you a jar for this. You can run it with something like:
java -jar ./checkstyle-9.1-all.jar -c ./google_checks_custom.xml src/
Note how we are using v9.1 here. With seemingly perfect timing for this assignment, some kind
person contributed a bugfix to checkstyle that fixes a bug with its handling of emojis.
You also may be able to configure your editor to show linter warnings inside the editor.
I passed the supplied test cases, am I good?
Probably for the most part. However, we will also check correctness with additional tests.
Can I add new files?
You may add new source files. Additional test files are allowed, but only StudentTest.java will be
evaluated. Please do not add any new jars or maven dependencies.
How can I ask other questions?
Please ask on Piazza, during the Discussion section, or in office hours (check on Canvas for
times and locations). Unless needed, do not email us questions about the assignment which
would be better placed on Piazza so that everyone can benefit from the question/answer.
What should I do if CSON seems a little too spicy for me to handle?
Take a sip of 🥛. You can do it! 😀
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
https://github.com/checkstyle/checkstyle/issues/10837
Version History
● V1.0.3 (11/23/21)
○ Typo. Submitted test should not depend on the file system
● V1.0.2 (11/23/21)
○ Typo. schama -> schema
● V1.0.1 (11/23/21)
○ More explicitly say that annotations do not change the fact that fields appear in
lexicographic order (piazza 200)
○ Add reference to Util.java so easier to find
○ Clean up some grammar/formatting things
○ Explain a bit more of the reasoning behind the ordering requirements in the FAQ
○ More explicitly define behavior when renaming would result in an invalid CSON.
● v1.0.0
○ First release