COMP2100/COMP6442
Persistent Data [Lecture 9]
Sid Chi-Kin Chau

Goals of This Lecture
• What is Persistent Data? And How?
• Bespoke
• Serialization
• XML
• JSON
• Compare Pros and Cons

What is Persistent Data?
• A critical task for applications is to save/retrieve data
• Permanent data (storage of data from working memory)
• It can be updated, but not as frequently as transient/volatile data
• It is stored in a database or on SSD, hard disk, or magnetic tape
• Why do we want permanent data?
• Volatile data has disadvantages: it is lost when the program exits or the power fails
• To be used and reused (save and load), and to be fault tolerant
• To be checked and validated for authentication
• How can we store data persistently?
• The choice of the persistence method is part of the design of an application
• Files (JSON, XML, images, …)
• Databases

Uses of Data and Storage
| Types | Use cases | Formats |
|---|---|---|
| Text files (unstructured data) | Word processing | Raw text (ASCII, UTF-8), proprietary word-processing formats such as .doc (generally unstructured) |
| Structured text files | Spreadsheets, sensor data, simple structured data | csv, tsv, bespoke, XML, JSON |
| Graphics | Images | png, jpeg (lossy), gif, bmp |
| Audio/Movie | Lecture recordings, music | mp3, mp4 (lossy) |
| Data compression | Large file storage | zip, tar, rar, … |

Which is the Best Data Format?
• Use case
• What does your application do?
• What kind of data do you have?
• Are there any restrictions to meet?
• Software licenses
• Storage limitation
• Rapid access to data
• Rapid development

Aspects to Consider
• Programming Agility
• Easy to develop (no overhead) and code
• Extensibility
• Can data be easily extended? (e.g., add new fields, attributes, …)
• Is it easy to add new fields in a CSV file?
• Is it easy to add new attributes in a database?
• Portability
• Will other applications access the data?
• Will it run on other hardware?

Aspects to Consider
• Robustness
• Bespoke vs XML vs JSON
• Well-designed and structured format
• Use of schema (how do you verify that your data is correctly formatted?)
• Lack of schema and interoperability problems
• Size vs Completeness
• Lossy vs Lossless
• Audio/Image vs financial data/scientific data
• Internationalization
• ASCII vs UTF-8
• Who will use the data (audience)?

Bespoke and Serialization
• Bespoke data files
• Define your own persistent data format
• Write your own data formatting and checking methods
• Not often used in industry
• Not robust and may incur extra bugs
• Serialization
• Directly storing binary class data (and even a whole executable class)
• Serialization presents technical issues
• Programming language dependent and platform dependent (big- or little-endian)
• Loss of object references
• Security issues
• Deserialization: convert persistent data back into a copy of the class object

Bespoke and Serialization
• Bespoke
• Implement a simple logging application
• Save/load log errors to/from a text file
• Java Serialization
• Implement a simple application
• Terminal command: od -c data.ser
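The bespoke logging idea above could be sketched roughly as follows. This is only an illustration, not the lecture's demo code: the line format (`code;message`), the `Entry` class, and all method names are assumptions made for this example. It shows why a bespoke format means writing your own formatting and checking methods.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BespokeLog {
    /** One log error; the fields are hypothetical. */
    static final class Entry {
        final int code;
        final String message;
        Entry(int code, String message) { this.code = code; this.message = message; }
    }

    // Our own formatting method: "<code>;<message>", one entry per line.
    static String format(Entry e) {
        if (e.message.contains("\n") || e.message.contains("\r")) {
            throw new IllegalArgumentException("message must fit on one line");
        }
        return e.code + ";" + e.message;
    }

    // Our own checking method: malformed lines are rejected, not guessed at.
    static Entry parse(String line) {
        int sep = line.indexOf(';');
        if (sep < 0) {
            throw new IllegalArgumentException("missing ';' in line: " + line);
        }
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on a bad code.
        return new Entry(Integer.parseInt(line.substring(0, sep)), line.substring(sep + 1));
    }

    static void save(Path file, List<Entry> entries) {
        List<String> lines = new ArrayList<>();
        for (Entry e : entries) {
            lines.add(format(e));
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static List<Entry> load(Path file) {
        try {
            List<Entry> entries = new ArrayList<>();
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                entries.add(parse(line));
            }
            return entries;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("errors", ".log");
        save(file, Arrays.asList(new Entry(404, "page not found"), new Entry(500, "disk full; retrying")));
        for (Entry e : load(file)) {
            System.out.println(e.code + " -> " + e.message);
        }
        Files.delete(file);
    }
}
```

Only the first `;` on a line is treated as the separator, so a message may itself contain semicolons. Every such rule has to be decided, coded, and tested by hand, which is where the extra bugs of bespoke formats tend to come from.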

Serialization in Java
• Java Serialization
• Class must implement Serializable
• public class MyClass implements Serializable
• Load serialized data by creating an ObjectInputStream and casting the object read from the stream to the appropriate class type
• Save serialized data by creating an ObjectOutputStream and writing the object to the stream
• ArrayLists are serializable by default and are commonly used for serializing a data collection (many other classes, such as HashMap, are also serializable; check the documentation)
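A minimal sketch of the save and load steps listed above. The `ErrorEntry` class and the method names are hypothetical, and the example writes to a byte array rather than a file such as data.ser so that it leaves nothing on disk; wrapping a `FileOutputStream` or `FileInputStream` instead would work the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical data class; it must implement Serializable to be written.
class ErrorEntry implements Serializable {
    private static final long serialVersionUID = 1L;
    final int code;
    final String message;
    ErrorEntry(int code, String message) { this.code = code; this.message = message; }
}

class SerializationDemo {
    // Save: wrap an output stream in an ObjectOutputStream and write the object.
    static byte[] save(ArrayList<ErrorEntry> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(entries);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Load: read an Object back and cast it to the expected type.
    // Only do this with data the application wrote itself.
    @SuppressWarnings("unchecked")
    static ArrayList<ErrorEntry> load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ArrayList<ErrorEntry>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<ErrorEntry> entries = new ArrayList<>();
        entries.add(new ErrorEntry(404, "page not found"));
        ArrayList<ErrorEntry> copy = load(save(entries));
        System.out.println(copy.get(0).code + " -> " + copy.get(0).message);
    }
}
```

The loaded list is a copy: it has the same contents as the original but is a different object, which is what "loss of object references" refers to.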

Serialization in Java
• Deserialization of untrusted data is inherently dangerous and not recommended
• https://www.oracle.com/java/technologies/javase/seccodeguide.html

XML
• XML (eXtensible Markup Language)
• An open standard for general-purpose data formatting
• Cross-platform and cross-language
• Wide industry support (W3C)
• Plenty of tools and programming libraries
• Long history of deployment
• Examples
• HTML (in its XML form, XHTML)
• .docx (Word document) is represented using XML

XML
• XML Structure / Tree
• XML is case sensitive!







XML Example
• XML example:


Homer Simpson

Johnny
Goodman



XML
• XML parser error!
• Use &lt; instead of “<”
• https://www.w3schools.com/xml/xml_syntax.asp

Incorrect: 10 < x < 100
Correct: 10 &lt; x &lt; 100




Two Options for XML in Java
• Two approaches:
• SAX
• Simple API for XML
• SAX treats XML as a stream and allows data to be extracted as the stream is read; preferable for very large documents (gigabytes)
• DOM
• Document Object Model (structured around the XML standard)
• Java DOM reads in the entire XML tree and generates node objects
• SAX is faster and more memory-efficient than DOM
• DOM exposes more structure than SAX
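To illustrate the streaming style of SAX described above, here is a small sketch using the JDK's `SAXParserFactory` and `DefaultHandler`. The element name `person`, the attribute `id`, and the class and method names are assumptions made for this example. The handler is called once per element as the input is read, so the whole tree is never held in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    // Collects the id attribute of every <person> element while the stream is read.
    static List<String> personIds(String xml) {
        final List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("person")) {
                    ids.add(attrs.getValue("id"));
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            // ParserConfigurationException, SAXException (malformed XML) or IOException
            throw new IllegalStateException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical document; the tag names are made up for illustration.
        String xml = "<people>"
                + "<person id=\"1\"><firstname>Homer</firstname></person>"
                + "<person id=\"2\"><firstname>Johnny</firstname></person>"
                + "</people>";
        System.out.println(personIds(xml));
    }
}
```

For a large file, an `InputStream` or `File` would be passed to `parse` instead of a string held in memory.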

XML DOM
[Figure: DOM tree. The root element has two child elements, each with an attribute id. Each of these has two child elements containing text nodes: "Homer" and "Simpson" under the first, "Johnny" and "Goodman" under the second.]

XML DOM
• DOM requires a number of steps to save data to file:
• Create a DocumentBuilder (uses DocumentBuilderFactory)
• Document created from a DocumentBuilder object
• Create and append elements
• Transform the XML to a Result (output file)
• Similar series of steps for loading XML/DOM:
• DocumentBuilderFactory
• DocumentBuilder
• Document
• Class data structures
import javax.xml.parsers.*;
import javax.xml.transform.*;
import org.w3c.dom.*;
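The save and load steps listed above might look like the following sketch. The tag names (`people`, `person`, `firstname`, `lastname`) and the method names are hypothetical, and the result is written to a `String` rather than an output file to keep the example self-contained; a `StreamResult` built from a `File` would be used to write to disk.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomDemo {
    // Save: factory -> builder -> document -> elements -> transform to a Result.
    static String save(String id, String first, String last) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();

            Element root = doc.createElement("people");
            doc.appendChild(root);
            Element person = doc.createElement("person");
            person.setAttribute("id", id);
            root.appendChild(person);

            Element firstName = doc.createElement("firstname");
            firstName.setTextContent(first); // special characters are escaped on output
            person.appendChild(firstName);
            Element lastName = doc.createElement("lastname");
            lastName.setTextContent(last);
            person.appendChild(lastName);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Load: factory -> builder -> document -> copy values into class data.
    static String loadFullName(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            String first = doc.getElementsByTagName("firstname").item(0).getTextContent();
            String last = doc.getElementsByTagName("lastname").item(0).getTextContent();
            return first + " " + last;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = save("1", "Homer", "Simpson");
        System.out.println(xml);
        System.out.println(loadFullName(xml));
    }
}
```

Because the text is set through the DOM API rather than by string concatenation, a value such as `10 < x` is escaped by the transformer and parses back unchanged.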



Pros and Cons of XML
Pros
• Robust, extendable
• More human readable
• Platform independent
• Programming language independent
• XML supports Unicode (international encoding)
• Easy format verification
• Can represent many data structures (trees, lists, …)
• Native support in Java
Cons
• XML syntax is verbose and redundant
• XML files are usually large as a result
• No built-in array type

JSON
• JavaScript Object Notation (JSON)
• Like XML, is also an open standard for data format that is widely used
• Originally designed for sending data between web client and server, but also very useful for data storage
• Built around attribute-value pairs
• Produces smaller and more readable documents than XML
• JSON example:
[{"age":11,"name":"Bart"},
{"age":40,"name":"Homer"}]
{"attribute-name":{JSON object}}
{"attribute-name":"string"}
{"attribute-name":[array]}
{"attribute-name":1} (number)
{"attribute-name":true} (boolean)
{"attribute-name":null}
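Since the Java standard library has no JSON API, applications normally use a third-party library such as Gson or Jackson. To stay dependency-free, the sketch below instead writes the example array by hand. The `Person` class and the method names are hypothetical, and the code only produces JSON; it does not parse it.

```java
import java.util.Arrays;
import java.util.List;

class JsonWriterDemo {
    /** Hypothetical data class matching the example above. */
    static final class Person {
        final int age;
        final String name;
        Person(int age, String name) { this.age = age; this.name = name; }
    }

    // Wraps a string in double quotes and escapes characters JSON does not allow raw.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // remaining control characters become four-digit hex escapes
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Writes a JSON array of objects; numbers are unquoted, strings are quoted.
    static String toJson(List<Person> people) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < people.size(); i++) {
            Person p = people.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"age\":").append(p.age)
              .append(",\"name\":").append(quote(p.name)).append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(Arrays.asList(new Person(11, "Bart"), new Person(40, "Homer"))));
    }
}
```

Note that JSON requires straight double quotes around names and strings, and that the number and string types are distinguishable in the output without any schema.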

Pros and Cons of JSON
Pros
• More lightweight
• Human readable
• Straightforward to implement
• Supports arrays and null
• Can easily distinguish boolean, number, and string types
• Data is available as JSON objects
Cons
• Lacks some XML-specific features (e.g., XML attributes)
• No native support in Java
• No display capabilities (not a markup language)

Database
• Database management systems (DBMS) are commonly used for storage of large volumes of data
• Fast and efficient large data retrieval and processing
• Parallel and distributed data retrieval and processing
• Relational databases
• Linking tables through unique identifiers to avoid problems of duplicating data entries
• Standardized data retrieval and processing commands (e.g., SQL)

Database Example
• Represent a person in a bespoke/csv file:
id, FullName, HomePhone, MobilePhone, WorkPhone
1, Alice, 555-555, 123-321, ?
2, Bob, ?, 123-222, ?
• Relational Database (RDB)
• SQL (Structured Query Language) is designed for data query and manipulation

Person
| id | FullName |
|---|---|
| 1 | Alice |
| 2 | Bob |

ContactPhone
| id | PhoneNumber |
|---|---|
| 1 | 555-555 |
| 2 | 123-222 |
| 1 | 123-321 |
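To make the linking by id concrete, the sketch below models the two tables with plain Java collections and joins them in memory. This is only an illustration of the relational idea, not how a DBMS is accessed from Java (that would normally go through JDBC and a database driver); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RelationalDemo {
    // Person table: id -> FullName
    static Map<Integer, String> person() {
        Map<Integer, String> table = new LinkedHashMap<>();
        table.put(1, "Alice");
        table.put(2, "Bob");
        return table;
    }

    // ContactPhone table: rows of (id, PhoneNumber); one person may have many rows.
    static List<String[]> contactPhone() {
        return Arrays.asList(
                new String[] {"1", "555-555"},
                new String[] {"2", "123-222"},
                new String[] {"1", "123-321"});
    }

    // In-memory equivalent of:
    //   SELECT PhoneNumber FROM Person JOIN ContactPhone
    //   ON Person.id = ContactPhone.id WHERE FullName = ?
    static List<String> phonesOf(String fullName) {
        List<String> phones = new ArrayList<>();
        for (Map.Entry<Integer, String> p : person().entrySet()) {
            if (!p.getValue().equals(fullName)) {
                continue;
            }
            for (String[] row : contactPhone()) {
                if (Integer.parseInt(row[0]) == p.getKey()) {
                    phones.add(row[1]);
                }
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        System.out.println("Alice: " + phonesOf("Alice"));
        System.out.println("Bob: " + phonesOf("Bob"));
    }
}
```

Compared with the CSV layout, no "?" placeholders are needed for missing phone numbers, and a person can have any number of phones without adding columns.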

Reference
• IBM developerWorks: 5 things you need to know about serialization
• https://developer.ibm.com/technologies/java/articles/j-5things1/
• Oracle serialization FAQ
• https://www.oracle.com/technetwork/java/javase/tech/serializationfaq-jsp-136699.html
• W3C XML standards pages
• https://www.w3.org/standards/
• JSON
• https://www.json.org/
• https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf