CS1-FULL-FALL2021
CS1
Week 10
Unicode, Bits and Bytes
Maíra Marques Samary
Decimal Numbering System
• Ten symbols: 0,1,2,3,4,5,6,7,8,9
• Represent larger numbers as a sequence of digits
• Each digit is one of the available symbols
• Example: 7061 in decimal (base 10)
$7061_{10} = (7 \times 10^3) + (0 \times 10^2) + (6 \times 10^1) + (1 \times 10^0)$
Binary
• Binary is base 2
• Symbols: 0,1
• Convention: $2_{10} = 10_2$
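• Example: 7061 (decimal, base 10) written in binary (base 2)
$7061_{10} = (1 \times 2^{12}) + (1 \times 2^{11}) + (0 \times 2^{10}) + (1 \times 2^9) + (1 \times 2^8) + (1 \times 2^7) + (0 \times 2^6) + (0 \times 2^5) + (1 \times 2^4) + (0 \times 2^3) + (1 \times 2^2) + (0 \times 2^1) + (1 \times 2^0)$
$= 4096 + 2048 + 0 + 512 + 256 + 128 + 0 + 0 + 16 + 0 + 4 + 0 + 1$
$= (1101110010101)_2$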
Converting from Decimal to Binary
• Given a decimal number N:
• List increasing powers of 2 from right to left
until >= N
• Then from left to right, ask if that (power of 2)
<= N
• If YES, put a 1 below and subtract that
power from N
• If NO, put a 0 below and keep going
$2^4 = 16$   $2^3 = 8$   $2^2 = 4$   $2^1 = 2$   $2^0 = 1$
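The steps above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `to_binary` is made up for this example, and it stops listing powers at the largest one that still fits in N (a small shortcut compared to the slide, which lists one power further).

```python
def to_binary(n):
    """Convert a non-negative integer to a binary string using the
    powers-of-2 subtraction method (hypothetical helper for illustration)."""
    if n == 0:
        return "0"
    # List increasing powers of 2, stopping at the largest one that fits in n.
    powers = [1]
    while powers[-1] * 2 <= n:
        powers.append(powers[-1] * 2)
    digits = ""
    # Walk from the largest power down to 1.
    for power in reversed(powers):
        if power <= n:
            digits += "1"   # YES: write a 1 and subtract the power
            n -= power
        else:
            digits += "0"   # NO: write a 0 and keep going
    return digits


print(to_binary(7061))  # 1101110010101
print(bin(7061))        # 0b1101110010101 (Python's built-in, for comparison)
```

Python's built-in `bin()` gives the same digits with a `0b` prefix, so in practice you would use that; the hand-written version only mirrors the procedure on the slide.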
Hexadecimal
• Base 16
• Symbols: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
• Convention: $16_{10} = 10_{16}$ = 0x10
• Example: What is 0xA5 in base 10?
• 0xA5 = $A5_{16} = (10 \times 16^1) + (5 \times 16^0) = 165_{10}$
Converting from Decimal to Hexadecimal
• Given a decimal number N:
• List increasing powers of 16 from right to left
until >= N
• Then from left to right, ask if that (power of 16)
<= N
• If YES, write below how many times that power
goes into N, and subtract that amount from N
• If NO, put a 0 below and keep going
$16^4 = 65536$   $16^3 = 4096$   $16^2 = 256$   $16^1 = 16$   $16^0 = 1$
Hexadecimal
• 7061?
• $(1B95)_{16}$
$16^4 = 65536$   $16^3 = 4096$   $16^2 = 256$   $16^1 = 16$   $16^0 = 1$
Too much         1               11 (B)         9             5
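The same worked example can be reproduced in Python. This is a sketch only: `to_hex` and `HEX_DIGITS` are names invented for this illustration, and the loop starts at the largest power of 16 that fits in N.

```python
HEX_DIGITS = "0123456789ABCDEF"  # digit symbol for each value 0..15


def to_hex(n):
    """Convert a non-negative integer to a hexadecimal string using the
    powers-of-16 method (hypothetical helper for illustration)."""
    if n == 0:
        return "0"
    # Find the largest power of 16 that is <= n.
    power = 1
    while power * 16 <= n:
        power *= 16
    digits = ""
    while power >= 1:
        count = n // power           # how many times this power goes into n
        digits += HEX_DIGITS[count]  # 10..15 become A..F
        n -= count * power
        power //= 16
    return digits


print(to_hex(7061))   # 1B95
print(hex(7061))      # 0x1b95 (built-in, lowercase with a 0x prefix)
print(int("A5", 16))  # 165 (built-in conversion back to decimal)
```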
• Why does all of this matter?
• Humans think about numbers in base 10,
but computers “think” about numbers in
base 2
• Binary encoding is what allows computers
to do all of the amazing things that they
do!
Why Base 2?
• Electronic implementation
• Easy to store with bi‐stable elements
• Reliably transmitted on noisy and inaccurate
wires
Numerical Encoding
• AMAZING FACT: You can represent anything
countable using numbers!
• Need to agree on an encoding
• Kind of like learning a new language
• A binary digit is known as a bit
• A group of 4 bits (1 hex digit) is called a nibble
• A group of 8 bits (2 hex digits) is called a byte
More Encodings…
• Old-school programmers know about ASCII
• Each character has its own integer byte code
• Text strings are sequences of character codes
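A short sketch of the idea that text is a sequence of character codes, using Python's built-in `ord()` and `chr()`. The sample string `"CS1"` is chosen only for illustration.

```python
text = "CS1"

# ord() gives the integer code of a single character.
codes = [ord(ch) for ch in text]
print(codes)  # [67, 83, 49]

# chr() goes the other way, from code back to character.
rebuilt = "".join(chr(code) for code in codes)
print(rebuilt)  # CS1

# Encoding to ASCII produces one byte per character.
ascii_bytes = text.encode("ascii")
print(list(ascii_bytes))  # [67, 83, 49]
```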
Text Representation
• Unicode is the same idea only extended
• It defines a standard integer code for every
character used in all languages (except for
fictional ones such as Klingon, Elvish, etc.)
• The numeric value is known as a "code point"
• Typically denoted U+HHHH in conversation
Unicode
Unicode Charts
• A major problem: there are a lot of codes
• Largest supported code point U+10FFFF
• Code points are organized into charts
http://www.unicode.org/charts
• Go there and you will find charts organized by
language or topic (e.g., Greek, math, music,
etc.)
Unicode String Literals
• Strings can now contain any Unicode character
• Example:
t = "That's a spicy jalapeño!"
• Problem: How do you indicate such characters?
Unicode
• If you are using a Unicode-aware editor, you can
type the characters in source code (save as
UTF-8)
t = "That's a spicy Jalapeño!"
http://unicode.org/charts/PDF/U0080.pdf
Unicode
t = "That's a spicy Jalapeño!"
t = "That's a spicy Jalape\u00f1o!"
Unicode
• \uxxxx - Embeds a Unicode code point in a
string
Unicode
• Code points also have descriptive names
• \N{name} - Embeds a named character
t = "Spicy Jalape\N{LATIN SMALL LETTER N WITH TILDE}o!"
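As a quick check, the three ways of writing the character shown on these slides should produce the same string. This sketch assumes the source file is saved as UTF-8; the variable names are arbitrary.

```python
import unicodedata

typed = "Jalapeño"                                       # typed directly
escaped = "Jalape\u00f1o"                                # \uxxxx code point escape
named = "Jalape\N{LATIN SMALL LETTER N WITH TILDE}o"     # \N{name} escape

print(typed == escaped == named)  # True

# Going from the character back to its code point and name.
print(hex(ord("ñ")))              # 0xf1
print(unicodedata.name("ñ"))      # LATIN SMALL LETTER N WITH TILDE
```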
Unicode Comments
• Don't overthink Unicode
• Unicode strings are mostly like ASCII strings
except that there is a greater range of codes
Unicode Representation
• Internally, Unicode character codes are stored
as fixed-width integers (CPython 3.3 and later
uses 8, 16, or 32 bits per character, chosen per
string)
Text Encoding
• The internal representation of characters is now
almost never the same as how text is transmitted
or stored in files
Text Encoding
• There are also many possible file encodings for
text (especially for non-ASCII)
• But, they are only related to how text is stored in
files, not stored in memory
Input/Output
• All text is encoded and decoded
• If reading text, it must be decoded from its
source format into Python strings
• If writing text, it must be encoded into some kind
of well-known output format
Reading/Writing in Python
• Built-in open() function has an optional encoding
parameter
f = open("somefile.txt","rt",encoding="latin-1")
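• If you omit the encoding, most Python 3 versions
use a platform-dependent default (often UTF-8, but
not always), so it is safest to pass it explicitly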
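A minimal round trip through a file, to show encoding on write and decoding on read. The temporary directory and the sample text are assumptions made for this example; only `open()` and its `encoding` parameter come from the slide.

```python
import os
import tempfile

# Create a throwaway file path so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "somefile.txt")

# Writing: the string is encoded to Latin-1 bytes.
with open(path, "wt", encoding="latin-1") as f:
    f.write("Jalape\u00f1o")

# Reading: the bytes are decoded back into a Python string.
with open(path, "rt", encoding="latin-1") as f:
    text = f.read()

# Binary mode skips decoding and shows the raw bytes in the file.
with open(path, "rb") as f:
    raw = f.read()

print(text)  # Jalapeño
print(raw)   # b'Jalape\xf1o'
```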
Important Encodings
• If you're not doing anything with Unicode (e.g.,
just processing ASCII files), there are still three
encodings you should know
• ASCII
• Latin-1
• UTF-8
ASCII Encoding
• Text that is restricted to 7-bit ASCII (0-127)
• Any characters outside of that range produce
an encoding error
Latin-1 Encoding
• Text that is restricted to 8-bit bytes (0-255)
• Byte values are left "as-is"
UTF-8 Encoding
• A multibyte encoding that can represent all
Unicode characters
• Main feature of UTF-8 is that ASCII is embedded
within it
• If you're never working with international
characters, UTF-8 will work transparently
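The differences between the three encodings can be seen by encoding the same string with each. This is an illustrative sketch; the sample words are arbitrary.

```python
word = "Jalape\u00f1o"  # contains ñ (U+00F1), which is outside ASCII

latin1_bytes = word.encode("latin-1")  # ñ becomes the single byte 0xF1
utf8_bytes = word.encode("utf-8")      # ñ becomes the two bytes 0xC3 0xB1

print(latin1_bytes)  # b'Jalape\xf1o'
print(utf8_bytes)    # b'Jalape\xc3\xb1o'

# ASCII cannot represent ñ, so encoding raises an error.
try:
    word.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False

# For pure ASCII text, the ASCII and UTF-8 bytes are identical.
plain = "Jalapeno"
same_bytes = plain.encode("ascii") == plain.encode("utf-8")
print(same_bytes)  # True
```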
CS1
Week 10
Tuples
Maíra Marques Samary
maira.
Tuples are like lists
• Tuples are another kind of sequence that
functions much like a list - they have elements
which are indexed starting at 0
>>> x = ('Glenn', 'Sally', 'Joseph')
>>> print(x[2])
Joseph
>>> y = (1, 9, 2)
>>> print(y)
(1, 9, 2)
>>> print(max(y))
9
>>> for item in y:
...     print(item)
...
1
9
2
>>>
...but tuples are "immutable"
• Unlike a list, once you create a tuple, you
cannot alter its contents – similar to a string
>>> x = [9, 8, 7]
>>> x[2] = 6
>>> print(x)
[9, 8, 6]
>>>
>>> y = 'ABC'
>>> y[2] = 'D'
Traceback: 'str' object does not support item assignment
>>>
>>> z = (5, 4, 3)
>>> z[2] = 0
Traceback: 'tuple' object does not support item assignment
>>>
Things not to do with tuples
>>> x = (3, 2, 1)
>>> x.sort()
Traceback: AttributeError: 'tuple' object has no attribute 'sort'
>>> x.append(5)
Traceback: AttributeError: 'tuple' object has no attribute 'append'
>>> x.reverse()
Traceback: AttributeError: 'tuple' object has no attribute 'reverse'
>>>
Tuples are more efficient
• Since Python does not have to build tuple
structures to be modifiable, they are simpler
and more efficient in terms of memory use and
performance than lists
• So in our programs, when we are making
“temporary variables”, we prefer tuples over
lists.
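One rough way to see the memory difference is `sys.getsizeof()`. This is only a sketch: the exact byte counts depend on the Python implementation and version, and the comparison below is an observation about CPython rather than a language guarantee.

```python
import sys

as_list = [1, 2, 3]
as_tuple = (1, 2, 3)

# getsizeof() reports the size of the container itself in bytes
# (not including the integers it refers to).
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)

print("list: ", list_size, "bytes")
print("tuple:", tuple_size, "bytes")
print(tuple_size < list_size)
```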
Tuples and Assignment
• We can also put a tuple on the left-hand side of
an assignment statement
• We can even omit the parentheses
>>> (x, y) = (4, 'fred')
>>> print(y)
fred
>>> (a, b) = (99, 98)
>>> print(a)
99
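A short sketch of tuple assignment without parentheses, plus two common uses: swapping values and returning more than one result from a function. The function `min_max` is a hypothetical helper written for this example.

```python
# The parentheses are optional on both sides.
x, y = 4, "fred"
print(x, y)  # 4 fred

# Swapping two variables in one statement.
a, b = 99, 98
a, b = b, a
print(a, b)  # 98 99


def min_max(values):
    """Return the smallest and largest value as a tuple."""
    return min(values), max(values)


# The returned tuple is unpacked into two names.
low, high = min_max([1, 9, 2])
print(low, high)  # 1 9
```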
Tuples are Comparable
• The comparison operators work with tuples and
other sequences. If the first items are equal, Python
goes on to the next element, and so on, until it
finds elements that differ.
>>> (0, 1, 2) < (5, 1, 2)
True
>>> (0, 1, 2000000) < (0, 3, 4)
True
>>> ('Jones', 'Sally') < ('Jones', 'Sam')
True
>>> ('Jones', 'Sally') > ('Adams', 'Sam')
True
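Because tuples compare element by element, a list of tuples can be sorted directly. The sketch below reuses the names from the comparisons above; the list `people` is made up for illustration.

```python
# (last name, first name) pairs, in no particular order.
people = [("Jones", "Sally"), ("Adams", "Sam"), ("Jones", "Sam")]

# sorted() compares the first elements, then the second on ties.
ordered = sorted(people)
print(ordered)
# [('Adams', 'Sam'), ('Jones', 'Sally'), ('Jones', 'Sam')]

# max() uses the same comparison rule.
print(max(people))  # ('Jones', 'Sam')
```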