Programming as Problem Solving
COMP 1100/1130 Semester 1, 2021
The Australian National University
Acknowledgement of Country
I wish to acknowledge the traditional custodians of the land we are meeting on, the Ngunnawal people. I wish to acknowledge and respect their continuing culture and the contribution they make to the life of this city and this region. I would also like to acknowledge and welcome any other Aboriginal and Torres Strait Islander people who are enrolled in our courses.
You can learn more about Acknowledgements of Country here.
A Note on Online Study
• ANU courses, including this one, are traditionally held in person.
• However given the pandemic (and the resulting rules about gathering in large groups) large courses like this one cannot be held this way.
• We also do not have lab space to accommodate all students while maintaining social distancing.
• We now have a bit of experience with holding this course online, so we should have a reasonably smooth and successful semester!
Part One What is this course?
COMP1100 and COMP1130
• The entry point for study in Computer Science.
• A broad course aimed at many different backgrounds and
destinations.
• Teaching the fundamentals of algorithms and programming as means of solving problems.
• Introduces many concepts that will be revisited in more depth later in the CS curriculum.
• Uses the programming language Haskell.
COMP1130
• Extends COMP1100 with topics on the mathematical foundations of programming.
• More depth, more work, more ‘curiosity-driven’.
• Required for students in BAC(R&D).
• Assumes ACT Specialist Maths or NSW Maths Extension 1, or equivalent.
• Aimed at students who are mathematically elite, and interested in putting in extra work for intellectual rewards!
• Advanced hurdle – 60% in midterm exam
Programming Experience
• Many of you will have programmed before, e.g. in high school, or as a hobby.
• Many of you will not have.
• Prior programming is not expected in this course.
• Nor, in our experience, is it a major advantage: we focus on the principles and foundations of programming in a way that is very different from most students’ experience.
Mathematical Maturity
• We do expect students to be comfortable with mathematical thinking – abstraction, algorithms, attention to detail
• Building these mathematical skills at the same time as picking up programming skills can be a challenge for some students.
• Students who lack confidence in their mathematics should consider delaying COMP1100 by one semester, to take a course like MATH1005 (Discrete Mathematical Models).
Ongoing vs One-Off Study
• COMP1100/1130 is designed to be an introduction to a broader course of learning in computer science.
• May also be used as a ‘taster’ of what computer science is like.
• But not recommended for students who intend to only do one
computer science course to get useful programming skills.
• We recommend COMP1730 (Programming for Scientists) for such students.
Programming vs Other IT Skills
• COMP1100/1130 completely focused on programming.
• Other information technology skills (spreadsheets, presentation tools,
word processing, website design…) are not covered.
• We offer COMP1710 (Web Development and Design) and COMP1720 (Art & Interaction in New Media).
Exiting the Course
• Easy to do in the first week, picking another course up.
• Moving from 1130 to 1100 possible until a very late stage, but any
marks received as an 1130 student will stand.
• 31 March ‘Census date’ deadline for withdrawal without fees.
• 7 May deadline for withdrawal without failure.
• Visas – it is not always true that withdrawal from a course without replacement will violate your visa; talk to Student Services about this if you need information.
Part Two
How do I interact with this course?
2A. How do I get basic information about the course? 2B. How do I get the content of the course?
2C. How do I practise my skills?
2D. How do I get answers to my questions?
Getting Information: Course Website
https://cs.anu.edu.au/courses/comp1100
• Announcements in lectures and on Piazza forum also.
Getting Content: Lectures
• Introduces the core concepts of the course, and ‘live coding’ to show you programming in action.
• Monday, 11:00am–12:00pm
• Tuesday, 1:00pm–2:00pm
• Wednesday, 4:00pm–5:00pm
• Thursday, 1130 only, 12:00pm–1:00pm
Lecture recordings
https://echo360.org.au
These recordings are critical for students who are not able to attend the live lectures, or who wish to review the lectures again.
Please inform me urgently if the recording is inadequate.
Getting Content: The Textbook
Thompson, Simon
HASKELL: the craft of functional programming Third edition
• Not required
• To buy or not to buy? Depends on the student. • Copies in the library.
Practising Skills: Labs
• You will learn more, and have more fun, in labs than any other part of the course!
• Tutors there to answer questions and guide your learning.
• Sign up now at https://cs.anu.edu.au/streams.
• 3% of the course linked to attendance and participation in the labs, and to submission of attempts at lab exercises.
• Based on ten best labs
• Engagement hurdle for 1100 can be passed by participating in the labs across the early part of the course.
Practising Skills: Beyond Labs
• Assignments.
• Hundreds of exercises in the textbook.
• Past exam questions on the website.
• Warning: the course changes over time. Past exams may contain material you are not expected to know, and may miss material you are expected to know.
Getting Questions Answered
• Live questions in lectures extremely welcome! – Use the chat function.
• Tutors also there to answer questions in labs.
– But these questions should be related to the current content. – How to get your questions answered?
• Piazza Forum
• Consultations
• Email (in rare circumstances)
Piazza Forum
https://piazza.com/anu.edu.au/other/comp11001130/home
Drop-In Consultations
• Start in week 2
• Online and in-person
• Times and dates to be advised soon
Getting Questions Answered: Email
• Email your tutor only if you have questions specific to your interactions with that tutor (about lab attendance e.g. inability to attend, assignment marking etc.)
• Email the lecturers, instead of using Piazza, only if you have an issue that tutors could not help with or should not see.
– Use comp1100@anu.edu.au (read by Ranald, Jooyoung, Tony)
– Use your ANU address
• For all enquiries not directly related to this course, contact Student Services in person or by email (studentadmin.cecs@anu.edu.au).
Part Three Miscellaneous Resources
InstallFest
• Get a Haskell installation, and the other software you will need, up and running on your computer!
• Not completely necessary; you can use ANU’s Virtual Desktop Environment • But many students will prefer their own installation
• Coming this Thursday • Online
• Time and method to be advised
Study Sessions
•Organized by ANU Computer Science Student Association (CSSA)
•Two sessions:
– Preparation for mid-term exam – Preparation for final exam
https://services.anu.edu.au/business-units/division-of-student-life
Academic Skills
https://www.anu.edu.au/students/academic-skills
• Many useful services offered to students
• We’ll give an overview of this in a lecture soon
In particular, anu.edu.au/english a great resource for ESL students
• Access and Inclusion – supporting students, including those with a disability, a medical condition, or those involved with elite sport, to participate fully in their program of study.
• Accommodation Services – helping students find accommodation on and off campus.
• Counselling Centre – providing free and confidential counselling to students and running group wellbeing programs like “Get Up and Go.”
• Health Service – providing health check-ups, travel immunisations, minor surgical procedures, and antenatal nurse care.
• Student Experience and Career Development – supporting students’ transition to higher education study and working with students to develop their employability and ensure they have the skills and knowledge to navigate their future careers.
Part Four Course Representatives
CECS Course Representatives
Why become a course representative?
• Develop skills sought by employers, including interpersonal, dispute
resolution, leadership and communication skills.
• Become empowered. Play an active role in determining the direction of your education.
• Become more aware of issues influencing your University and current issues in higher education.
• Ensure students have a voice to their course convener, lecturer, tutors, and college.
32
CECS Course Representatives
Roles and responsibilities:
• Act as the official liaison between your peers and convener.
• Be creative, available and proactive in gathering feedback from your classmates.
• Attend regular meetings, and provide reports on course feedback to your course convener and the Associate Director (Education).
• Close the feedback loop by reporting back to the class the outcomes of your meetings.
For more information regarding roles and responsibilities, contact: ANUSA CECS representatives
Sandy Ma and Swatantra Roy: sa.cecs@anu.edu.au ANUSA President
Madhumitha Janagaraja: sa.president@anu.edu.au
33
CECS Course Representatives
Want to be a course representative? Nominate today!
Please nominate yourself via the CECS Course Representative EOI form by midday 1st March 2021. You are free to nominate yourself whether you are currently on-campus or overseas.
You will be contacted by CECS Student Services, Employability and Experience by 5th March with the outcome of your self-nomination.
All course representative meetings will be held via Zoom in Semester One 2021. There will be three meetings this semester, meeting details will be provided to course representatives shortly.
34
Course Representatives in COMP1100/1130
• We are looking for 4-5 course representatives!
• Looking for diverse representation – 1100 and 1130, male and female, Australian and international, different degrees etc.
• And at least 1 who is not yet able to travel to Australia
Sets and Functions
Sets
• A set is a collection of things, called its elements.
• Not in any particular order. (unordered)
• Each element appears only once. (distinct)
• A set is uniquely determined by its elements.
• Examples:
• {1,2,3}={2,3,1}={3,1,2} • {1,1,1,2}={1,2}
Finite Sets
Some sets have only finitely many elements, so can be listed: A singleton: {∗}
The Booleans 𝔹: 𝐹𝑎𝑙𝑠𝑒, 𝑇𝑟𝑢𝑒 Very big finite sets may not be practical to list out:
We can sometimes use ranges: 1, 2, … , 100 There are more than 137,000 valid Unicode characters!
Some Important Infinite Sets
• The natural numbers N : 0, 1, 2, …
• The integers Z : {… , −2, −1, 0, 1, 2, … } • The real numbers R
Note the use of triple dots, ‘…’, to indicate the continuation of the list
• The strings: lists of characters of any finite length.
Combining Sets: Sum
Given two sets 𝐴 and 𝐵, we can form a new set 𝐴 + 𝐵 by combining:
• Every element 𝑎 of 𝐴 along with a tag to remind us that it comes from
the left: 𝑎, left
• Every element b of 𝐵, tagged right: 𝑏, right
Combining Sets: Sum
The tags are necessary because we want the two sets to remain entirely disjoint (separate).
e.g. 𝔹 + 𝔹 is { 𝐹𝑎𝑙𝑠𝑒, left , 𝑇𝑟𝑢𝑒, left , 𝐹𝑎𝑙𝑠𝑒, right , 𝑇𝑟𝑢𝑒, right } • Different from the unions you may be used to!
Summing extends to more than two sets:
•𝐴+𝐵+𝐶+𝐷
Although we have to choose better tag names than 𝑙𝑒𝑓𝑡 and 𝑟𝑖𝑔h𝑡!
Combining Sets: Product
Given two sets 𝐴 and 𝐵, we can form a new set 𝐴 × 𝐵 by including every pair (𝑎, 𝑏) where 𝑎 is an element of 𝐴, and 𝑏 is an element of 𝐵.
e.g. 𝔹 × {Red, Green, Blue} is
{ 𝐹𝑎𝑙𝑠𝑒,𝑅𝑒𝑑 , 𝐹𝑎𝑙𝑠𝑒,𝐺𝑟𝑒𝑒𝑛 , 𝐹𝑎𝑙𝑠𝑒,𝐵𝑙𝑢𝑒 , 𝑇𝑟𝑢𝑒,𝑅𝑒𝑑 , 𝑇𝑟𝑢𝑒,𝐺𝑟𝑒𝑒𝑛 ,(𝑇𝑟𝑢𝑒,𝐵𝑙𝑢𝑒)}
Product extends to more than two sets: •𝐴×𝐵×𝐶×𝐷
Whose members have form (𝑎, 𝑏, 𝑐, 𝑑).
Functions
A function is defined by
• A set 𝐴, called the domain;
• A set 𝐵, called the codomain;
• An assignment from each element of 𝐴 (input) to one element of 𝐵 (output).
Ifwegivesuchafunctionaname𝑓,wewritethis 𝑓 ∷𝐴⟶B If𝑓assignsoutput𝑏toinput𝑎,wewrite 𝑓 𝑎 =𝑏
Functions – Examples and Non-Examples
✔✖
✔
✖
Defining Functions – Finite Domains
If the domain is finite, we can define a function by listing its action on every element.
e.g., define𝑓 ∷ 𝔹⟶Zby •𝑓 𝐹𝑎𝑙𝑠𝑒 =−6
•𝑓 𝑇𝑟𝑢𝑒 =26
Defining Functions – Infinite Domains
If the domain is infinite, we need to explain how the function is calculated:
•e.g.,𝑚𝑖𝑛𝑢𝑠 ∷Z⟶Zdefinedas𝑚𝑖𝑛𝑢𝑠 𝑥 =−𝑥
• e.g., 𝑖𝑠𝑃𝑜𝑠 ∷ Z ⟶ 𝔹 defined as 𝑖𝑠𝑃𝑜𝑠 𝑦 =
𝑇𝑟𝑢𝑒 if y is positive
𝐹𝑎𝑙𝑠𝑒 if y is zero or negative
Here 𝑥 and 𝑦 are variables, standing for elements of Z, which allow us to talk about how the output is generated from the input.
Polymorphic functions
A function that can be defined simultaneously for many different sets is called polymorphic.
•Identity𝑖𝑑 ∷A⟶Adefinedas𝑖𝑑 𝑎 =𝑎
• 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡𝑧𝑒𝑟𝑜 ∷ B ⟶ Z defined as 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡𝑧𝑒𝑟𝑜 𝑏 = 0
Here A and B are variables which could stand for any set – and 𝑎 and 𝑏 are variables which stand for arbitrary elements of those sets.
Polymorphic functions
Returning to sums and products:
• Left injection 𝑖𝑛𝑗𝑙𝑒𝑓𝑡 ∷ A ⟶ A + B defined as 𝑖𝑛𝑗𝑙𝑒𝑓𝑡 𝑎 = (𝑎, 𝑙𝑒𝑓𝑡)
• Right injection 𝑖𝑛𝑗𝑟𝑖𝑔h𝑡 ∷ B ⟶ A + B is 𝑖𝑛𝑗𝑟𝑖𝑔h𝑡 𝑏 = (𝑏, 𝑟𝑖𝑔h𝑡)
•Leftprojection𝑝𝑟𝑜𝑗𝑙𝑒𝑓𝑡 ∷A×B ⟶Ais𝑝𝑟𝑜𝑗𝑙𝑒𝑓𝑡 𝑎,𝑏 =𝑎 • Right projection you can work out for yourself!
Polymorphic functions
Is this function polymorphic?
•𝑚𝑖𝑛𝑢𝑠 ∷A⟶Adefinedas𝑚𝑖𝑛𝑢𝑠 𝑥 =−𝑥
Not defined for all sets, e.g. 𝔹, N.
But defined for more than one set, e.g. Z, R.
Precisely – polymorphic over any set for which ‘−’ is defined.
Combining Functions – Composition
•Ifwehavefunctions𝑓 ∷𝐴⟶Band𝑔 ∷𝐵⟶C,wedefineanew function𝑔·𝑓 ∷𝐴⟶Cby
𝑔·𝑓 𝑎 =𝑔(𝑓 𝑎)
Combining Functions – Composition
Twoslidesagowedefined𝑚𝑖𝑛𝑢𝑠 ∷Z⟶Zand𝑖𝑠𝑃𝑜𝑠 ∷Z⟶𝔹.
• Can we define 𝑚𝑖𝑛𝑢𝑠 · 𝑖𝑠𝑃𝑜𝑠 or 𝑖𝑠𝑃𝑜𝑠 · 𝑚𝑖𝑛𝑢𝑠?
• For the one we can define, what is its domain and codomain? • Can you describe in words what this combined function does?
Combining Sets – Set of Functions
Until now have thought of elements of sets, and functions, as different things.
But given sets 𝐴 and 𝐵, we can form a new set 𝐴⟶ 𝐵 – the set of all possible functions from 𝐴 to 𝐵.
There is no difference between writing 𝑓 ∷ 𝐴 ⟶ B and saying “𝑓 is an element of the set 𝐴⟶ 𝐵”.
Combining Sets – Sets of Functions
We can extend the ⟶ symbol to more than two sets via right associativity:
• e.g. 𝐴 ⟶ 𝐵 ⟶ C is 𝐴 ⟶ (𝐵 ⟶ C), the set of functions from 𝐴 to 𝐵 ⟶ C.
• Differentfrom(𝐴⟶𝐵)⟶C
Sogiven𝑓 ∷ 𝐴⟶𝐵⟶C,and𝑎anelementof𝐴,then𝑓(𝑎)isa
functionfrom𝐵to𝐶,andso 𝑓 𝑎 (𝑏)isanelementof𝐶.
We can also think of 𝑓 as a function with two inputs, from 𝐴 and 𝐵, and output to set 𝐶.
Combining Sets – Sets of Functions
(Polymorphic) examples:
•𝑝𝑎𝑖𝑟 ∷𝐴⟶B⟶A×Bisdefinedas(𝑝𝑎𝑖𝑟 𝑎 ) 𝑏 =(𝑎,𝑏)
• compose ∷ (B ⟶ C) ⟶ (A ⟶ B) ⟶ A⟶ C is defined as comp𝑜𝑠𝑒 𝑔 𝑓 𝑎 =𝑔(𝑓 𝑎)
We usually write comp𝑜𝑠𝑒 𝑔 𝑓 as 𝑔 · 𝑓, of course.
Quiz: can you define a sensible polymorphic function from the set A + B ⟶ (A ⟶ C) ⟶ (B ⟶ C) ⟶ C?
From Mathematics to Functional Programming
All of the mathematics in this lecture relates to concepts you will see (soon!) introduced for programming.
But there are some differences in outlook to keep in mind.
From Mathematics to Functional Programming
• Syntax differs between standard mathematics and any given programming language, e.g. f x in Haskell instead of 𝑓 (𝑥).
• Mathematicians are comfortable with infinite constructions, such as real numbers, but programmers restricted to finite memory.
• Mathematical functions assign an output to every input – programmed functions may crash, or get stuck in a loop.
• Mathematical functions are defined entirely by their association of output to input; in programming one mathematically identical function may be better than another, e.g. it may run faster.
Programming
What is a computer?
Photo c/o Early Office Museum Archives
Dorothy Vaughan 1910-2008
• Trained as a mathematician, worked as teacher
• Joined Langley Memorial Aeronautical Laboratory
(later part of NASA) in 1943 as a computer
• Rose to head of West Area Computing by 1949,
making important contributions to data analysis
for the development of space travel
• Reinvented herself as a programmer of electronic
computers, using FORTRAN, in early 1960s
• Background reading: Margot Lee Shetterly, `Hidden Figures’ (2016) – also a film!; Short NASA biography
Human computers to electronic computers
• Human computers provided with an algorithm written in natural language (e.g. English) and/or mathematics, along with data
• Human computers able to use common sense, domain knowledge, mathematical knowledge
• Electronic computers far faster at accurately process large amounts of data but…
• Only understand very limited, inflexible, notion of `language’
• No common sense or domain knowledge
• So more burden on (and fun for!) the programmer – writing and testing
Memory Address
Machine Code (hex)
Assembler Instructions
(Poorly coded) C program
0x100000e9b 0x100000ea2
0x100000eb9 0x100000ebc 0x100000ebf 0x100000ec5 0x100000ec9 0x100000ecd 0x100000ed0 0x100000ed3 0x100000ed6 0x100000ed9 0x100000ede 0x100000ee1
0x100000ee6 0x100000ee9 0x100000eea
c745f0 0000 0000 c7 45 ec 00 00 00 00
8b 45 ec
3b 45 f4
0f 8d 21 00 00 00 48 63 45 ec
48 8b 4d f8
8b 14 81
03 55 f0
89 55 f0
8b 45 ec
0501 0000 00 89 45 ec
e9 d3 ff ff ff
8b 45 f0 99
f7 7d f4
movl $0x0, -0x10(%rbp) movl $0x0, -0x14(%rbp)
movl -0x14(%rbp), %eax cmpl -0xc(%rbp), %eax jge 0x100000ee6
movslq -0x14(%rbp), %rax movq -0x8(%rbp), %rcx movl (%rcx,%rax,4), %edx addl -0x10(%rbp), %edx
movl
movl
addl
movl
jmp 0x100000eb9
movl -0x10(%rbp), %eax cltd
idivl -0xc(%rbp)
%edx, -0x10(%rbp)
-0x14(%rbp), %eax $0x1, %eax
%eax, -0x14(%rbp)
int sum = 0;
int i = 0; loop:
if (i >= length(values))
goto done;
sum += values[i];
i = i + 1;
goto loop; done:
mean = sum / i;
Machine level programming
Not portable: depends on specific CPU model
Error prone
Hard to handle complex problems
Machine dictates the language in which the programmer thinks Full control
Need more problem-oriented languages. Need more abstract and safer languages. Abstraction versus control is a tradeoff!
High level programming languages
Written in a style more appealing to humans
Text needs to be transformed into machine code (or into something that can in turn be transformed)
Interpreter: executes code line-by-line Compiler: transforms whole program
You will see both in this course
Photo c/o guru99.com
Programming paradigms
Control flow: imperative/declarative
Declarative: functional/logic/finite state machines
Allocations and bindings: static/dynamic
Time: event-driven/discrete/synchronous/continuous
Focus: control flow/data flow
Concurrency: sequential/concurrent/distributed
Structure: modular/generics/templates/objects/aspects/agents Determinism: deterministic/non-deterministic
You don’t need to understand all of these; just many ways to program
Imperative Programming
The oldest and most common control flow paradigm
Statements that give the computer orders
In particular commands that change the program’s state
• State corresponds roughly to physical memory of the computer
Machine / Assembly languages are imperative Supported by e.g. Java, C++, JavaScript, Python…
Declarative Programming
Focus on definition / specification of the program you want to write For example functional programs resemble mathematical definitions
Machine level details, e.g. how memory is used, handled by programming language
But not all definitions correspond to good programs, so we must always keep computation in mind.
Dominant paradigm of e.g. Haskell, SQL, Prolog…
But supported by most modern programming languages
Choosing a programming language
• Almost any reasonable programming language can be used to write almost any algorithm
• But languages differ greatly in their approach, strengths, and weaknesses
• It is fairly likely that your first job in programming will involve using a language you never used at university!
• So our job as teachers is to introduce a range of different languages across your education so that you are adaptable
Haskell
The essence of functional programming
Correspondence to mathematics: • Types play the role of sets
• Sets of functions, Sums, Products important
• Programs inhabit types, so are like elements of sets
• Programs usually have function type, i.e. are functions from input to output
• But are really ‘partial’, i.e. can crash, or fail to return an output
• But functions remain pure — they map inputs to outputs, so their result does not depend on the state of memory, external events, etc. If you wish to include such dependencies, they must explicitly be inputs to the function
Functional programming in practice
• Types model objects in the problem domain.
• Programming means defining types and writing functions over types. • Computing with functions means evaluation (reduction).
• Variables name values and cannot vary.
Haskell
• Functional programming language • Launched in 1990
• By Paul Hudak, Philip Wadler, Arvind, Brain Boutel, Jon Fairbairn, Joseph Fasel, Kevin Hammond, John Hughes, Thomas Johnsson, Dick Kieburtz, Rishiyur Nikhil, Simon Peyton Jones, Mike Reeve, David Wise, Jonathan Young
• Named after logician Haskell Curry (1900-1982) • Photo c/o Wikipedia
Haskell is Functional
• Functions are first-class, that is, functions are values which can be used in exactly the same ways as any other sort of value.
• The meaning of Haskell programs is centered around evaluating expressions rather than executing instructions.
Haskell is Pure
Haskell expressions are always referentially transparent:
• no mutations; everything (variables, data structures …) is immutable • expressions are side-effect free
• programs are deterministic – calling the same function with the same arguments results in the same output
Haskell is Lazy
Expressions are not evaluated until their results are needed.
• It is possible to define and work with infinite data structures.
• It enables a more compositional programming style
• BUT it makes reasoning about time and space usage more difficult!
Why we Choose to Teach Haskell
Reasoning about correctness – Close to mathematics, pure so can reason compositionally (piece by piece)
Well supported – Good compiler errors, documentation Important ideas – Types, functional programming
Scientifically important – New language developments often use Haskell (or similar languages)
Level Playing Field – Few students get an unfair advantage from familiarity with the language
Expressions and evaluation (reduction)
Evaluating any Haskell expression works much like a calculator:
42.0 → 42.0
3 + 4 / (1234 – 1) → 3 + 4 / 1233
→ 3 + 3.2441200324412004e-3 → 3.0032441200324413
Expressions and evaluation (reduction)
(3+4)*(5–2)
Or alternatively:
(3+4)*(5–2)
→7*(5–2) →7*3
→ 21
→(3+4)*3 →7*3
→ 21
Can even compute 3+4 and 5-2 concurrently (in parallel)!
Expressions and evaluation (reduction)
double :: Integer –> Integer double n = 2 * n ———————————- double(3+1) → double(3+1)
→ 2*(3+1) →2*4
→8
Applying a function
invert :: Picture -> Picture knight :: Picture
invert knight :: Picture
♞ invert ♘
Applying a function
scale:: Picture -> Integer -> Picture knight :: Picture
scale knight 2 :: Picture
♞ scale ♞ 2
Applying a function
scale:: Picture -> Integer -> Picture knight :: Picture
scale knight knight
♞
♞
scale
Basic Types
Booleans Characters Strings Numeric types
Various Types
Built-in:
Bool Char String Integer Double
☛ True, False ☛ ‘h’
☛ “hello” ☛42,-69
☛ 3.14 Picture ☛ ♞
Custom-defined:
Types
A type is the programming equivalent of the mathematical notion of set • numbers
• pictures
•…
• even functions!
A function accepts inputs from particular types and gives a result of a
particular type too.
For example, function *, accepts two inputs of some numerical type, and outputs a value of the same numerical type.
Static typing
• Helps clarify thinking and express program structure. • Serves as a form of documentation.
• Turns run-time errors into compile-time errors.
Haskell’s expressions are Statically Typed
• Every Haskell expression has a type
• Types are all checked at compile-time.
• Programs with type errors will not compile!
Basic Types
Some basic types and constructions are available in the Prelude: • http://hackage.haskell.org/package/base-4.12.0.0/docs/Prelude.html
• Look for the keywords ‘data’ and ‘type’
• Each comes with its own defined functions
Even more types and defined functions in the basic libraries:
• http://hackage.haskell.org/package/base-4.12.0.0
• These need to explicitly ‘imported’ if you want to use them
Booleans
Named after logician George Boole The Haskell type is called Bool.
Boolean operators:
&& logical “and”
|| logical “or” (inclusive) not logical “negation”
Bool
data Bool = False | True
Boolean Operators (logical connectives)
Operator
Description
&&
and
||
or
not
not (negation)
Truth Tables
&&
True
False
True
True
False
False
False
False
||
True
False
True
True
True
False
True
False
not
True
False
False
True
Truth tables (alternative representation)
t1
t2
t1 && t2
t1 ||t2
not t1
True
True
True
True
False
False
True
False
True
True
True
False
False
True
False
False
False
False
Boolean function definition: “exclusive or”
exOr :: Bool -> Bool -> Bool
exOr x y = (x || y) && not (x && y)
t1
t2
exOr t1 t2
True
True
False
False
True
True
True
False
True
False
False
False
Char: character
Literal characters are written inside single quotes:
‘a’, …, ‘z’, ‘A’, …, ‘Z’, etc.
Escape characters:
‘\t’ tab
‘\n’ newline
‘\\’ backslash (\) ‘\” single quote (‘) ‘\”‘ double quote (“)
String
Prelude> “This is a string!” “This is a string!”
Prelude> “cat” ++ “fish” “catfish”
Prelude> head “cat”
‘c’
Integer
Integer represents whole numbers (positive, zero and negative) of any size (up to the limit of your machine’s memory).
Operation
Description
Example
+, *, –
Add, subtract, multiply two integers
2+2
^
Raise an integer to the power
2^3
div
Whole number division (rounded down)
div 11 5
mod
The remainder from whole number division
mod 11 5
abs
The absolute value of an integer
abs (-5)
negate
Change the sign of an integer
negate (-5)
Int
The Int type represents integers in a fixed amount of space, i.e. Int is bounded.
Thus Int only represents a finite range of integers and the range is guaranteed to be at least
[−2$% …2$% −1]
However, the range can actually be bigger, depending on the compiler and your machine. To find its lowest and greatest bounds on your machine, enter in your GHCi prompt:
minBound :: Int maxBound :: Int
Arithmetic operations applicable to Integer are also applicable to Int.
However, one should take care that the result stays within minBound and maxBound limits to prevent arithmetic overflow.
Prelude> (maxBound :: Int) + 1 -9223372036854775808
Double
• Type Double can be used to represent numbers with fractional parts
(i.e., double-precision floating-point numbers).
• However, there is a fixed amount of space allocated to representing each value of type Double. Therefore, not all fractions can be represented by floating-point numbers. This may result in imprecise arithmetic results:
https://wiki.haskell.org/Performance/Floating_point
Prelude> (3.3)^2 :: Double 10.889999999999999
Operations applicable to Floating-point Numbers
Operation
Description
Example
+, *, –
Add, subtract, multiply two integers
2+2
/
Fractional division
453.3 / 1346.6
^
Exponentiation x^n for a natural number n
2^3
**
Exponentiation x^y for a floating-point number n
3.2**4.5
sqrt
Square root
sqrt 2.6
abs
Absolute value
abs (-5.442)
negate
Change the sign of a number
negate (-5.882)
cos, sin, tan
Cosine, sine and tangent
cos 43
Rational
Type Rational allows full-precision numbers and calculations on them. Its drawback is that arithmetic on numbers of type Rational is slower than on numbers of types Float and Double.
Prelude> (3.3)^2 :: Rational
1089 % 100
Numbers are represented as a ratio of two integer values.
A rational number may be constructed using the % operator.
Beware of the following
• non-numerical results
Prelude> 1 / 0 Infinity
• no automatic conversion from Integral to Float
Prelude> (floor 5.6) + 6.7
…
Prelude> fromIntegral (floor 5.6) + 6.7 11.7
Floating point ⇔ Integral Conversion
fromIntegral converts from any integral type (Int or Integer) to any other numeric type.
round, floor, ceiling convert floating point numbers to Int or Integer.
Strings and values
Prelude> show (2+3)
“5”
Prelude> show (True || False) “True”
Prelude> read “True” :: Bool True
Prelude> read “3” :: Int
3
Relational Operators
Operator
Description
==
equal to
/=
not equal to
>
greater than (and not equal to)
>=
greater than or equal to
<
less than (and not equal to)
<=
less then or equal to
type
We can give existing types new names with the type keyword type IdNumber = Int
This has no computational significance but can make programs more readable.
Algebraic Datatypes
Algebraic Datatypes
The type keyword gave us new names for old types. But what if we wish to define our own type?
We can use the keyworddata
(This is a little confusing, because both define ‘datatypes’. But learning how to use basically arbitrary syntax is part of learning a programming language)
Defining a singleton
data Singleton = LonelyElement
This is a new type, called Singleton, with one element, called
LonelyElement.
We can even write a (not very interesting) program:
lonelyToZero :: Singleton -> Int lonelyToZero LonelyElement = 0
Warnings and Errors
You might imagine a function going the other way:
allToLonely :: Int -> Singleton allToLonely x = LonelyElement
If you run ghci with warnings on (-W) you get a warning – why? If you try to test your code it throws an error – why?
(Finite) sum types
Singletons are not very interesting. How about types with finitely many elements?
We could build any finite set out of singletons using the sum +.
We use the same trick in Haskell using the symbol |.
(Indeed, any type whose main constructor is | is called a sum type)
Bool
This is how Bool is defined in the standard library: data Bool = False | True
built-in
More than Two elements
data Color = Red | Green | Blue | Indigo | Violet deriving (Show, Eq)
user-defined
data Move = Rock | Paper | Scissors
user-defined
Sum types – beyond combining singletons
Last week we saw sets like Z + 𝔹 and Z + Z + 𝔹.
We can do this with | also; we just need to choose tag names: data IntPlusBool = Left Int | Right Bool
data IntPlusIntPlusBool =
First Int | Second Int | Third Bool
What about products?
There is some special purpose notation for products:
• Types: (Int,Bool)and(Int,Int,Bool)
• Meaning: Z × 𝔹 and Z × Z × 𝔹
• Typical elements:(0,True) and (0,1,False)
• It is weird that the round brackets are used for the types – again, just syntax!
For the binary case, e.g. (Int,Bool), there are built-in functions: fst :: (Int,Bool) -> Int snd :: …
We can also use tags for products
data IntBoolPair = Combine Int Bool
Here Combine is just a name, so the meaning is Z × 𝔹
How would you write a fst function?
How about a pairing function Int -> Bool -> IntBoolPair?
Products via Tag combine well with Sums
data Accommodation = Hotel String Double Double | Hostel String Int Double
| Airbnb String Double | Friend String
user-defined
Maybe
This is how Maybe is defined in the standard library:
data Maybe a = Just a | Nothing deriving (Eq, Ord) For example
Maybe Int = Just Int | Nothing
built-in
Still to learn in the future lectures and labs
•polymorphic datatypes •recursive datatypes
Case expressions
Pattern matching
• Sum types list the different ways that a type can be constructed, each with a unique tag name (or constructor) separated by the symbol |.
• Functions that take sum types as input, or that construct something of sum type during computation, almost always need to act differently on the different constructors.
• There is a Haskell expression case that allows us to pattern match to say how each different constructor should be dealt with (also works on values!).
• Where constructors have arguments, we can give names to each argument so that we can refer to them on the right side of the function definition.
data Bool = True | False myNot :: Bool -> Bool
myNot b = case b of
True -> False False -> True
data Bool = True | False myAbs :: Int -> Int
myAbs x =
case x >= 0 of
True -> x False -> -x
data Bool = True | False myAnd :: Bool -> Bool -> Bool
myAnd b c = case (b,c) of
(True,True) -> True (True,False) -> False (False,True) -> False (False,False)-> False
data Bool = True | False myAnd :: Bool -> Bool -> Bool
myAnd b c = case (b,c) of
(True,True) -> True _ -> False
Note that the cases are run through in order from the top, so (True,True) will never trigger the _ case.
data Animal = Cat | Dog | Cow says :: Animal -> String
says x = case x of
Cat -> “meeow” Dog -> “woof” Cow -> “moo”
data Animal = Cat | Dog | Cow | Parrot String says :: Animal -> String
says x = case x of
Cat
Dog
Cow
Parrot name
-> “meeow”
-> “woof”
-> “moo”
-> “pretty ” ++ name
data Animal = Cat | Dog | Cow | Parrot String says :: Animal -> String
says x = case x of
Cat
Dog
Cow
Parrot “”
Parrot name
-> “meeow”
-> “woof”
-> “moo”
-> “caw”
-> “pretty ” ++ name
data Animal = Cat | Dog | Cow | Parrot String | Fox says :: Animal -> String
says x = case x of
Cat
Dog
Cow
Parrot “”
Parrot name -> “pretty ” ++ name
Fox -> error “what does the fox say?”
-> “meeow”
-> “woof”
-> “moo”
-> “caw”
Guarded expressions
• Often the type we are using case on is a Bool.
• There is special notation just for this, called guarded expressions, or just
guards.
• You never need to use guards, but they can make programs easier to write and read, especially when you want to subject your data to lots of different tests with Boolean results.
• Commands that are not essential, but make life easier, are generally known as syntactic sugar.
myAbs :: Int -> Int
myAbs x =
case x >= 0 of
True -> x False -> -x
myAbs x
| x >= 0 = x
| otherwise = -x
approxSize :: Int -> String
approxSize x
| abs(x) >
| abs(x) >
| abs(x) >
| abs(x) >
| abs(x) >
| abs(x) >
| otherwise = “small”
1000000000000000000 = “quintillions” 1000000000000000 = “quadrillions” 1000000000000 = “trillions” 1000000000 = “billions”
1000000 = “millions” 1000 = “thousands”
How would you write this with case?
Evaluated in order until the top-most guard that evaluates to True.
foo x1 x2 … xk
| g1 = e1 | g2 = e2 …
| otherwise
First g1, then g2, etc.
= e
More ‘Sugar’: Piecewise definitions
• Often the type(s) we are using case on are inputs.
• There is then still another way to do our case expression, using piecewise
definition.
• Unlike guards you can live without these, but some people prefer them, so you should at least see them.
myNot :: Bool -> Bool
myNot b = case b of
True -> False False -> True
versus
myNot True = False myNot False = True
myAnd :: Bool -> Bool -> Bool
myAnd b c = case (b,c) of
(True,True) -> True _ -> False
versus
myAnd True True = True myAnd _ _ = False
Introduction to Lists
Recursive Types
• A recursive type is one whose definition makes mention of the type itself.
• This sounds circular — and sometimes it is!
• But also very useful.
• Rather than starting by looking at recursive types in general, we will try to get you comfortable with one example – lists.
Empty Line
Line
Line
Line
Line
Line
…
List of People
Haskell has a built-in type of lists for a situation like this: data [Person] = [] | Person : [Person]
So a line of Persons is either
• empty, or
• A Person followed by a line of Persons.
List of People
Haskell has a built-in type of lists for a situation like this:
data [Person] = [] | Person : [Person]
It is new to see a name we are defining appear on both sides of the = sign!
But otherwise just like definitions we have seen
• [Person] is the name of the new type
• [] is a constructor with no arguments
• : is a constructor with two arguments (written infix), pronounced cons
Special Notation
A [Person] of length one can be written e.g. • Alice : []
• [Alice]
A [Person] of length three can be written e.g. • Alice : (Bob : (Charlie : []))
• Alice : Bob : Charlie : []
• [Alice, Bob, Charlie]
(syntactic sugar!)
(right associativity) (syntactic sugar!)
Lists of other types
There is nothing special about Person – lists are polymorphic.
data [Int] = [] | Int : [Int]
data [Char] = [] | Char : [Char]
data [Int -> Bool] = [] | (Int -> Bool) : [Int -> Bool]
And so on…
Lists of other types: examples
• For every type t, there is a Haskell type [t] of lists of elements from t
• [1,2,3,4,2] :: [Integer] • [True] :: [Bool]
• [‘a’, ‘a’, ‘b’] :: String
• We can build lists of items of any particular type • [[1,2,3], [], [2,6]] :: [[Integer]]
Special Notation for [Char]
type String = [Char]
We can write [‘h’,’e’,’l’,’l’,’o’]
But we would usually write “hello”
We can use all the standard list operations for Strings, including :
Polymorphic list functions in Prelude
:
a -> [a] -> [a]
++
[a] -> [a] -> [a]
!!
[a] -> Int -> a
length
[a] -> Int
head
[a] -> a
last
[a] -> a
tail
[a] -> [a]
replicate
Int -> a -> [a]
take
Int -> [a] -> [a]
drop
Int -> [a] -> [a]
reverse
[a] -> [a]
Recursion
The Droste effect
Advertisement for Droste cocoa, c. 1900
Recursion properties (informally) as observed from the pictures
Each instance size
• may be the same • may change
Termination
• the chain of instances of the problem may go forever (unbounded recursion) • the chain of instances of the problem may terminate
What are (often) the desirable properties from a programming point of view?
Why is recursion important?
• Recursive programs can perform actions (perhaps slightly different each
time) repeatedly
• This is what electronic computers are for!
• Fundamental in functional programming (no iteration)
• Recursive programs are often more succinct and easier to understand
• Recursion is often used as a method of traversing (or navigating) over complex data structures. Linked lists, binary trees, …
• Program verification is easier (induction)
Solving a problem with recursion
• Identify the smallest version(s) of the problem
• e.g. solving the problem on a very small piece of data
• This is called the base case
• Write code that solves that problem – don’t use recursion yet!
• See how solutions at certain scales can be used to solve slightly bigger versions of the problem.
• This is called the step case
• Involves one or more recursive calls, where we just trust that the right
answer will be given!
• Write a case statement that checks if we are currently looking at the base case, and acts accordingly
Fundamental rules for creating good recursive functions
• There must be a base case (or cases)
• Each recursive call must lead towards a base case
Base case – defining a value for which the function is evaluated without recursivelly calling itself
Multiplication via Addition
When you first learned multiplication, you did so via addition:
1×𝑥=𝑥
2×𝑥=𝑥+𝑥 3×𝑥=𝑥+𝑥+𝑥 4×𝑥=𝑥+𝑥+𝑥+𝑥 5×𝑥=𝑥+𝑥+𝑥+𝑥+𝑥
𝑛×𝑥=𝑥+…+𝑥 𝑛 times
Multiplication via Addition
Identify the base case, and how small solutions help to get bigger ones:
1×𝑥=𝑥 2×𝑥=𝑥+1×𝑥 3×𝑥=𝑥+2×𝑥 4×𝑥=𝑥+3×𝑥 5×𝑥=𝑥+4×𝑥
𝑛 × 𝑥 = 𝑥 + (𝑛 − 1) × 𝑥
Base case!
Step case!
Multiplication via Addition, as code
multiply :: Int -> Int -> Int
multiply n x = case n of 1 -> x
m -> x + multiply (m – 1) x
This doesn’t work for 0 or negative numbers. How could it be fixed?
Mutual Recursion
isEven :: Integer -> Bool isEven n
|n==0 =True
| otherwise = isOdd (n – 1)
isOdd :: Integer -> Bool isOdd n
|n==0 =False
| otherwise = isEven (n – 1)
Recursion with Lists
Lists
Recall the definition(s) of lists:
data [Int] = [] | Int : [Int]
data [Char] = [] | Char : [Char]
data [Int -> Bool] = [] | (Int -> Bool) : [Int -> Bool]
(actual definition is polymorphic)
This is an example of a recursive type, where the type [Int]that we are defining is defined by reference to itself.
Recursion
Recursion works by breaking up a problem into repetitive tasks:
• Base case(s) to avoid infinite repetition
• Step case(s) which use ‘smaller’ solutions to build bigger ones, where ‘smaller’ always moves us closer to a base case
• A check (with case or a guard) whether we should call a base or step case.
We saw this work with integers; this approach works especially well with recursive data types.
take
Before we do recursion on a list, let’s do a recursion on an Int that
outputs a list. From the Prelude
take :: Int -> [Double] -> [Double] • (where Double could be replaced by any other type)
take n l will return the first n elements of the list l, or the whole list if n is greater than the length of l, or [] if n is not positive.
List ranges
[1..5] will output the list [1,2,3,4,5] Also defined by recursion on numbers (Lab 5!) In fact the .. notation is quite flexible:
[0,4..20] will output the list [0,4,8,12,16,20] [1..] will output the infinite list of all positive integers [0.1,0.3..1.1] behaves very strangely!
Recursion on Lists
Lists are defined via a base case [] and a step case (:)
So recursive functions on lists often (not always!) use these as their base case and step case.
This is a general fact – recursive functions on recursive types often follow the structure of the type.
length
There is a Prelude function length on lists
Type is e.g. [Double] -> Int
• Actually it has type Foldable t => t a -> Int, but we haven’t learned
much of what that definition means yet!
Base case: an empty list has length 0.
Step case: a non-empty list is 1 longer that its tail.
A common pattern
Many recursions on lists have a similar `shape’ to length
• We choose a behaviour for the empty list
• We define how to combine the head of a list with the recursive call on the tail
squareAll :: [Int] -> [Int]
*Main> squareAll [1,2,3,4] [1,4,9,16]
A common pattern, continued
But there are examples that do not quite fit in our pattern:
minimum :: [Double] -> Double
Finds the least element of a list, or throws an error on the empty list. What are the base case(s)?
Left vs Right
In the examples so far we have done our computations ‘from the right’ of the list – first on the empty list, then the rightmost element, etc…
myProductRight :: [Double] -> Double myProductRight l = case l of
[] -> 1.0
x:xs -> x * product xs
Left vs Right
myProductRight (1.5 : 2.0 : [])
= 1.5
= 1.5
= 1.5
= 1.5
= 3.0
* myProductRight (2.0 : []) * (2.0 * myProductRight []) * (2.0 * 1.0)
* 2.0
But one can often write a recursion that computes from the left.
Left vs Right
myProductLeft :: [Double] -> Double myProductLeft = productAcc 1.0
productAcc :: Double -> [Double] -> Double productAcc x l = case l of
[] -> x
y:ys -> productAcc (x*y) ys
Acc is short for Accumulator.
Left vs Right
myProductLeft (1.5 : 2.0 : [])
= productAcc 1.0 (1.5 : 2.0 : []) = productAcc (1.0 * 1.5) (2.0 : []) = productAcc ((1.0 * 1.5) * 2.0) [] = ((1.0 * 1.5) * 2.0)
= 1.5 * 2.0
= 3.0
Acc is short for Accumulator.
Left vs Right
So which is better?
For computing products, it makes no real difference – both compute the same answer in roughly the same time.
But this is because * is associative:
1.5 * (2.0 * 1.0) == (1.5 * 2.0) * 1.0
If it were not, the order we compute in would matter very much!
Left vs Right – Acronym
acronymR :: [String] -> String acronymR l = case l of
[] -> “”
x:xs -> head x : acronymL xs
acronymL :: [String] -> String acronymL = acronymAcc “”
acronymAcc :: String -> [String] -> String acronymAcc soFar l = case l of
[] -> soFar
x:xs -> acronymAcc (head x : soFar) xs
Zip
zip :: [Int] -> [Char] -> [(Int,Char)]
Runs through two lists, pairing them as far as it can:
>>> zip [1,5] [‘c’,’d’] [(1,’c’),(5,’d’)]
>>> zip [1,5] [‘c’,’d’, ‘e’] [(1,’c’),(5,’d’)]
>>> zip [1,5] []
[]
>>> zip [] [1,4]
[]
Parametric Polymorphism
Parametric Polymorphism
• is a way to make a language more expressive, while still maintaining full static type-safety (every Haskell expression has a type, and types are all checked at compile-time; programs with type errors will not even compile)
• using parametric polymorphism, a function or a data type can be written generically so that it can handle values identically without depending on their type
• such functions and data types are called generic functions and generic datatypes
Polymorphism in Haskell
• Two kinds of polymorphism in Haskell – parametric and ad hoc (coming later!)
• Both kinds involve type variables, standing for arbitrary types. • Easy to spot because they start with lower case letters
• Usually we just use one letter on its own, e.g. a, b, c
• When we use a polymorphic function we will usually do so at a specific type, called an instance. The process is called instantiation.
Identity function
Consider the identity function:
id x = x
Prelude> :t id id :: a -> a
It does not do anything with the input other than returning it, therefore it places no constraints on the input’s type.
Prelude> :t id id :: a -> a
Prelude> id 3 3
Prelude> id “hello” “hello”
Prelude> id ‘c’ ‘c’
Polymorphic datatypes
• The identity function is the simplest possible polymorphic function but not very interesting
• Most useful polymorphic functions involve polymorphic types
• Notation – write the name(s) of the type variable(s) used to the left of the = symbol
Maybe
data Maybe a = Nothing | Just a
• a is the type variable
• When we instantiate a to something, e.g. String, the second part of the sum type will be e.g. Just String
data Maybe String = Nothing | Just String
Pairs
data (,) a b = (,) a b
Note that we usually write (,) a b as (a,b) More than one type variable
Now let’s write a polymorphic function with it…
Pairs
fst :: (a, b) -> a fst (x,_) = x
Prelude> fst (“hello”,’c’) “hello”
Prelude> fst (“hello”,”world”) “hello”
Lists
data [] a = [] | a : [a]
Note that we usually write [] a as [a], and that we have lots of
other syntactic sugar to make list programming more pleasant
When we are designing a function that manipulates lists, how do we decide whether to be parametric polymorphic?
Q: Does my function depend on the type of the list elements?
(Some) Parametric Polymorphic list functions in the Prelude
:
a-> [a] -> [a]
Add a single element to the front: 3:[2,3] ➝ [3,2,3]
++
[a] -> [a] -> [a]
Join two lists together: “Joo”++”young” ➝ “Jooyoung”
!!
[a] -> Int -> a
Return the nth element, counting from 0: [14,3,7]!!1 ➝ 3
concat
[[a]] -> [a]
Concatenate a list of lists into a single list: [[2,3],[],[4]] ➝ [2,3,4]
length
[a] -> Int
The length of a list: length “word” ➝ 4
head, last
[a] -> a
The first/last element: head “word” ➝ ‘w’, last “word” ➝ ‘d’
tail,init
[a] -> [a]
All but the first/last element: tail “word” ➝ “ord”, init “word” ➝ “wor”
replicate
Int -> a -> [a]
Make a list of n copies of an item: replicate 3 ‘c’ ➝ “ccc”
take
Int -> [a] -> [a]
Take n elements from the front: take 3 “Peccary” ➝ “Pec”
drop
Int -> [a] -> [a]
Drop n elements from the front: drop 3 “Peccary” ➝ “cary”
splitAt
Int -> [a] -> ([a],[a])
Split at a given position: splitAt 3 “Peccary” ➝ (“Pec”,”cary”)
reverse
[a] -> [a]
Reverse the order of the elements: [2,1,3] ➝ [3,1,2]
zip
[a] -> [b] -> [(a,b)]
Take a pair of lists into a list of pairs: zip [1,2] [3,4,5] ➝ [(1,3),(2,4)]
unzip
[(a,b)] -> ([a],[b])
Take a list of pairs into a pair of lists: unzip [(1,5),(3,6)] ➝ ([1,3],[5,6])
concat
concat :: [[a]] -> [a]
Prelude> concat [[1,2,3],[4,5],[],[6,7,8]] [1,2,3,4,5,6,7,8]
Prelude> concat [“hello”,” “,”world”]
“hello world”
Prelude> concat [[1,2,3],[False,True]]
– ?????
(Some) Monomorphic list functions in the Prelude
and
[Bool] -> Bool
The conjunction of a list of Booleans: and [True,False] ➝ False
or
[Bool] -> Bool
The disjunction of a list of Booleans: and [True,False] ➝ True
There are also ad hoc polymorphic list functions in the Prelude, to be met later.
Caution: List functions in the Prelude
Some Prelude list functions do not actually look like
Prelude> :t length length :: [a] -> Int
But rather
Prelude> :t length
length :: Foldable t => t a -> Int
For now, wherever you see Foldable, replace every t a by [a].
Code Quality
Testing and Style
What makes for ‘Good Code’?
• Correct
• Fast
• Readable
Ensuring Correctness
Assuming that you’ve dealt with all errors and warnings… • Reading your code
• Testing your code
• Mathematical proof about your code
The Value of Testing
Testing shows the presence, not the absence of bugs
– Edsger W. Dijkstra (1930-2002) From the same source, I also like:
The question of whether Machines Can Think… is about as relevant as the question of whether Submarines Can Swim
Photo c/o http://www.cs.utexas.edu/users/EWD/
Black Box vs White box
Black box testing involves writing tests without looking at your code • Motivated by the specification
• Indeed, can be written before you write a line of code
White box testing involves writing tests motivated by your code • Be your own enemy – how can I break this code?
What makes a good ‘black box’ test?
Identify testing groups of inputs, and test a ‘typical’ representative of each of them
These groups should collectively cover all possibilities
The division should correspond to different choices the program will need to make
• Should behave somehow similarly on inputs from the same group
• Should behave somehow differently on inputs from different groups
What makes a good ‘black box’ test?
Pay particular attention to special cases
• e.g. on the ‘boundaries’ between groups
• Input that is somehow ‘extreme’, e.g. empty input.
These special cases should be tested individually
What makes a good ‘black box’ test?
maxThree :: Int -> Int -> Int -> Int
• The group of inputs where the first is greatest • The group where the second is greatest
• The group where the third is greatest
• Boundary case: situations where some inputs are equal
What makes a good ‘white box’ test?
Identify points where your program `makes a choice’ on input, and test a ‘typical’ input for each possible choice
• case, guards, etc.
• base case vs step case of recursions
Focus particularly on inputs on the boundary between choices, or on any overlapping situations
Pay extra attention to _ and otherwise, which can sweep up many different situations
What makes a good ‘white box’ test?
maxThree :: Int -> Int -> Int -> Int maxThree x y z
| x > y && x > z = x | y > x && y > z = y | otherwise = z
The boundary of the first two tests suggests testing e.g. 2 2 1
Recording your tests
Throwing tests at the ghci prompt is probably how you start, but is not a robust approach to testing big or difficult programs
• You will forget what you’ve tested so far, so miss possible errors • If you change your code, all your old testing is now useless
Instead, write your unit tests in your code, including the expected output
• You can run the same tests repeatedly as your code changes
doctest
A Haskell program that checks all the unit tests in your code, provided they are written in the right format
— | Compute Fibonacci numbers — >>> fib 10
— 55
— >>> fib 5
— 5
fib :: Int -> Int
doctest – practicalities
Call doctest MyFileName.hs from outside ghci
Format: The | is essential
— | Compute Fibonacci numbers ✓ — Compute Fibonacci numbers ✖
Format: Indenting, other spacing, has to be perfect
–5 ✓ –5 ✖
Randomised testing
Many tests with little effort via randomised testing
We won’t spend lecture time on this, but feel free to read up about Haskell’s QuickCheck library for property-based testing in the textbook or online
Randomised testing is powerful but special cases still need to be thought about
• e.g. a program that works on any random Int – except 0
Style
You may have heard people talk about code style What does this mean?
‘Style’ Sometimes Mean This…
Photos c/o Harper’s Bazaar, Rod Authority, Francesco Giusti & World Press Photo
Style
Style (for us) does not mean flamboyance or expressing individuality
It means writing code in a way that is easy to read
This lecture will give some ideas about achieving this • Worth marks in assignments 2 and 3, and the exam!
Our Style Guide on the website (from UPenn) gives more tips
Who reads your Code?
• You
• Colleagues
• Including on a joint project at university!
• Future programmers you’ve never met
• Code reviewers
• In this course, your tutor
Comments
Haskell comments syntax
• — at the start of each line
• {- my comments here… -}
Say what the code does, not how it does it
• Your audience is not people who don’t know Haskell
A lack of comments is bad – but so are too many!
Comments
There are things you might not think of as comments, but can help to play that role
Type declarations for (top-level) functions give vital information
• Never leave them off, even where Haskell can infer them
• The type keyword can help to make type declarations more clear
Unit tests can help to explain what a function is for
Names
The names of functions, variables, constructors, and modules are, for the machine, completely arbitrary
Not so for the reader!
Descriptive names turn your code from gibberish to something readable
• Not the shortest names possible
Use standard naming conventions – camelCase
Miscellaneous
• No dead code
• Minimise repeated code
• Use helper functions (or where) to break up big definitions
• Avoid unnecessarily long code e.g. by using Prelude functions
• Avoid warnings: use _ for unused input
• Consistent indentation (spaces not tabs)
• Lines 80 characters or less wide
• etc…
Anonymous Functions
Anonymous functions
• An anonymous function is a function without a name.
• Anonymous functions will be (sometimes) useful when we use
higher order functions
• So worth getting to know even if their usefulness is not immediately obvious.
• Related to the mathematical notion of lambda calculus.
addOne:: Int -> Int
addOne x = 1 + x
can be rewritten as
addOne = \x -> 1 + x
We can now do without the name addOne…
lambda abstraction
Prelude> (\x -> 1 + x) 2
3
Prelude> (\x -> 1 + x) 100
101
Multiple arguments
add :: Int -> Int -> Int
add x y = x + y
can be rewritten as
add = \x y -> x + y
Prelude> (\x y -> x + y) 3 5 8
Higher Order Functions
Chapter 11 of Thompson
http://learnyouahaskell.com/higher-order-functions
Higher order functions
A function that takes another function as input, or returns a function as output, is called a higher-order function.
Mastering higher order functions will make you a more productive programmer.
A first example
We met the parametric polymorphic identity function:
id :: a -> a
id x = x
Function types are just like any other type, so…
id :: (Int -> Char) -> (Int -> Char)
Reminder about the associativity of -> (Int -> Char) -> (Int -> Char)
Could be written as
(Int -> Char) -> Int -> Char
Butnotas Int->Char->(Int->Char) or Int->Char->Int->Char
Functions as output
You have already seen many functions in this course with functions as output – any function with more than one ‘input’!
f :: Int -> (String -> Char)
• Two inputs (an Int and a String), returning a Char ✓
• One input (an Int), returning a function of type String -> Char ✓ The second point of view is called partial application.
multiply :: Int -> Int -> Int
multiply x y = x * y
multiply :: Int -> Int -> Int
multiply x y = x * y
multiply 2 3
2 3
6
multiply :: Int -> (Int -> Int)
(multiply x) y = x * y
(multiply 2) 3
2 3
6
Partial application
multiply :: Int -> Int -> Int
multiply x y = x * y
2
multiply 2 is a function Int -> Int that doubles its input!
So just as
multiply :: Int -> Int -> Int
is convenient shorthand for
multiply :: Int -> (Int -> Int)
We have
multiply 2 3
as convenient shorthand for (multiply 2) 3
In general:
fe1 e2 …ek
t1 ->t2 ->…tn ->t
are shorthands for:
((…((f e1) e2)…) ek)
t1 -> (t2 -> (…(tn -> t)…))
i.e., function application is left-associative -> is right-associative
Functions as input
applyTwice:: (a -> a) -> a -> a
applyTwice f x = f (f x)
> applyTwice sqrt 16 2.0
> applyTwice (+3) 10 16
Functions as input
applyTwice:: (a -> a) -> a -> a
applyTwice f x = f (f x)
> applyTwice (++ ” HAHA”) “HEY” “HEY HAHA HAHA”
> applyTwice (“HAHA ” ++) “HEY” “HAHA HAHA HEY”
> applyTwice (3:) [1]
[3,3,1]
Functions as input and output – Composition Recall from the very first week that, given functions
f :: a -> b and g :: b -> c We can define their composition, a function
g . f :: a -> c
So composition itself is a higher order function, with functions as both inputs and output.
Composition
(.) :: (b -> c) -> (a -> b) -> (a -> c) (.) g f x = g (f x)
We usually write(.) g f infix as g . f
> ((+1) . (*2)) 1 3
> ((*2) . (+1)) 1 4
We have built-in functions
even :: Int -> Bool
not :: Bool -> Bool
Assume we want to define
more succinctly:
myOdd :: Int -> Bool
myOdd x = not (even x)
myOdd’ = not . even
Fold (also called “reduce”) functions
Often we wish to traverse a list once, element by element, building up an output as we go.
This is called a fold.
There are ‘right folds’ and ‘left folds’; we will try to understand right
folds before we look at the left.
These are the most useful higher order function and are worth understanding – they will make you a more productive programmer!
Folds
Often we wish to traverse a list once, element by element, building up an output as we go.
• Add each number in a list to get the sum of all the elements;
• Multiply each number in a list to get the product of all the elements; • Increment a number by 1 for each element to get the list’s length;
• Turn a list of strings into an acronym;
• Map a function over a list element by element;
• etc…
mySum :: [Int] -> Int mySum list = case list of
[] -> 0
x:xs -> x + mySum xs
myLen :: [a] -> Int
myLen list = case list of
[] -> 0
x:xs -> 1 + myLen xs
myProd :: [Int] -> Int myProd list = case list of
[] -> 1
x:xs -> x * myProd xs
acro :: [String] -> String
acro list = case list of
[] -> “”
x:xs -> head x : acro xs
How are these definitions different? How are they the same?
mySum :: [Int] -> Int mySum list = case list of
[] -> 0
x:xs->x+ mySum xs
myLen :: [a] -> Int myLen list = case list of
[] -> 0
x:xs->1+ myLen xs
myProd :: [Int] -> Int myProd list = case list of
[] -> 1
x:xs->x* myProd xs
acro :: [String] -> String acro list = case list of
[] -> “” x:xs->headx: acro xs
The names are different – but Haskell doesn’t really care about that
mySum :: [Int] -> Int mySum list = case list of
[] -> 0
x:xs -> x + mySum xs
myLen :: [a] -> Int
myLen list = case list of
[] -> 0
x:xs -> 1 + myLen xs
myProd :: [Int] -> Int myProd list = case list of
[] -> 1
x:xs -> x * myProd xs
acro :: [String] -> String acro list = case list of
[] -> “”
x:xs -> head x : acro xs
The types are different – suggests folds are polymorphic
mySum :: [Int] -> Int mySum list = case list of
[] -> 0
x:xs -> x + mySum xs
myLen :: [a] -> Int
myLen list = case list of
[] -> 0
x:xs -> 1 + myLen xs
The base cases are different
myProd :: [Int] -> Int myProd list = case list of
[] -> 1
x:xs -> x * myProd xs
acro :: [String] -> String
acro list = case list of
[] -> “”
x:xs -> head x : acro xs
mySum :: [Int] -> Int mySum list = case list of
[] -> 0
x:xs->x + mySumxs
myLen :: [a] -> Int
myLen list = case list of
[] -> 0
x:xs-> 1 + myLenxs
myProd :: [Int] -> Int myProd list = case list of
[] -> 1
x:xs-> x * myProdxs
acro :: [String] -> String
acro list = case list of
[] -> “”
x:xs-> head x : acroxs
The step cases are different – but only a little! – different recipes to combine the head with the recursive call on the tail
mySum :: [Int] -> Int mySum list = case list of
[] -> 0
x:xs -> x+mySum xs
myLen :: [a] -> Int myLen list = case list of
[] -> 0
x:xs -> 1+ myLen xs
Everything else is the same!
myProd :: [Int] -> Int myProd list = case list of
[] -> 1
x:xs -> x*myProd xs
acro :: [String] -> String acro list = case list of
[] -> “”
x:xs -> head x : acro xs
Fold Right
This suggests that we could define a polymorphic higher order function that takes as input
• a base case;
• a recipe for combining the head with a recursive call on the tail and then does all of the rest of the work for us!
No more time-consuming error-prone recursions to write
• Except for all the ones that don’t follow the format of the previous slide
foldr
foldr in action
foldr::(a -> b -> b)-> b ->[a]-> b
foldr :: (Int -> Int -> Int) -> Int -> [Int] -> Int mySum = foldr (+) 0
myProd = foldr (*) 1
foldr in action
foldr::(a -> b -> b)-> b ->[a]-> b foldr::(a -> Int -> Int)-> Int ->[a]-> Int myLen = foldr (\x y -> y + 1) 0
Or, to avoid a warning:
myLen = foldr (\_ y -> y + 1) 0
foldr in action
foldr::(a -> b -> b)-> b ->[a]-> b
foldr :: (String -> String -> String) -> String -> [String] -> String acro = foldr (\x y -> head x : y) “”
How might you write a ‘safe’ version of acro that ignores empty strings instead of crashing on them?
foldr in action
foldr is just another function, and can be used as a helper anywhere:
myMaximum :: [Int] -> Int
myMaximum list = case list of
[] -> error “No maximum of an empty list” x:xs -> foldr max x xs
Take some time to understand all the types here!
Defining foldr
myFoldr :: (a -> b -> b) -> b -> [a] -> b myFoldr combine base list = case list of
[] -> base
x:xs -> combine x (myFoldr combine base xs)
Why is it fold right?
foldr (+) 0 [1,2,3]
= 1 + foldr (+) 0 [2,3]
= 1 + (2 + foldr (+) 0 [3])
= 1 + (2 + (3 + foldr (+) 0 [])) = 1 + (2 + (3 + 0))
So the combining operation associates to the right.
• i.e. start with the base case, combine it with the rightmost element of list, then continue until we reach the leftmost element of the list.
foldr and the structure of lists A list [1,2,3]
is really 1 : (2 : (3 : []) )
foldr replaces [] with a base case and : with a combining function e.g. 1 + (2 + (3 + 0) )
So folding right is very natural because lists themselves associate right
Left folds
mySuml :: [Int] -> Int
mySuml = mySumAcc 0
where
mySumAcc acc list = case list of
[] -> acc
x:xs -> mySumAcc (acc + x) xs
In the above, mySumAcc has type Int -> [Int] -> Int
Left folds
mySuml [1,2,3]
= mySumAcc 0 [1,2,3]
= mySumAcc (0 + 1) [2,3]
= mySumAcc ((0 + 1) + 2) [3]
= mySumAcc (((0 + 1) + 2) + 3) [] = (((0 + 1) + 2) + 3)
No difference for +, but not all combining operations are associative!
It folds the list up from the left side
Defining foldl
myFoldl :: (b -> a -> b) -> b -> [a] -> b myFoldl combine acc list = case list of
[] -> acc
x:xs -> myFoldl combine (combine acc x) xs
Compare to the final line of the right fold:
x:xs -> combine x (myFoldr combine base xs)
foldl, folding left
foldl (+) 0 [1,2,3]
= foldl (+) (0 + 1) [2,3]
= foldl (+) ((0 + 1) + 2) [3]
= foldl (+) (((0 + 1) + 2) + 3) [] = (((0 + 1) + 2) + 3)
So the combining operation associates to the left.
• i.e. start with the accumulator, combine it with the leftmost element of list, then continue until we reach the empty list and return the accumulator.
foldr versus foldl
An example where they give different answers:
> foldr (-) 0 [1,2,3] 2
> foldl (-) 0 [1,2,3]
-6
Because ((0 – 1) – 2) – 3 ≠ 1 – (2 – (3 – 0))
foldr versus foldl
More generally, if we think of lists as built up from the empty list by
using cons repeatedly, then lists are constructed from the right. Therefore foldr tends to follow the way lists are constructed. e.g. foldr (:) [] :: [a] -> [a] is the identity!
foldl goes in the reverse direction from the list’s construction • What happens if you use (:) with foldl?
Ad hoc polymorphism via Type classes
A reminder about parametric polymorphism
Parametric polymorphism refers to functions with some inputs or outputs that could be of any type
• We can also have parametric polymorphic types, but that is not the focus of today
id :: a -> a
length :: [a] -> Int
foldr :: (a -> b -> b) -> b -> [a] -> b
But is this polymorphic?
Is the function
(+) :: a -> a -> a
polymorphic?
Surely not! There are many types for which this does not make sense.
And yet, it does work for more than one type: Double, Int, Integer, Rational, and many more.
Ad hoc polymorphism
(+) is an example of ad hoc polymorphism.
It is defined for certain types, namely though those that belong to a
certain type class (called Num).
(+) has a different definition for e.g. Int vs Double.
The act of giving a function name different definitions for different types is called overloading.
Overloading
A symbol is overloaded if it has two (or more) meanings, distinguished by type.
There is completely different code written in the Prelude to define (+) for Int and for Double
In Haskell overloading is resolved at compile time.
If a function is overloaded, then the compiler must choose between the possible algorithms– this process is called overload resolution.
Ad hoc versus Parametric Polymorphism
• Parametric polymorphic functions use one algorithm to operate on arguments of any types
• Ad hoc polymorphism (may) use a different algorithm for different types
Overloading is an important language feature because many useful functions are not parametric polymorphic: they work only for types that support certain operations.
sum :: Num a => [a] -> a
sum only works for types that have a (+) defined.
Type classes
• Type classes are a language mechanism in Haskell designed to support general overloading in a principled way.
• They allow users to express which overloaded operations they expect to be able to use on their data, and to define their own overloaded operations.
Components of the Haskell type class mechanism
•Type class declaration
• Type class instance declaration
• or deriving • Context
Type class Declaration
defines a set of operations and their types and gives the set a name
class Eq a where
(==) :: a -> a -> Bool
(/=) :: a -> a -> Bool
Any type a that belongs to the Eq type class has two operators == and /= that return a Bool when applied to two values of that type.
Therefore is you want your type a to be an instance of the class Eq then you have to write your own code for == and /=
Instance Declaration
When you use the instance keyword, you then must define all the required functions for that type class.
instance Eq Bool where
b == c = case b of
True -> c
False -> not c
b /= c = not (b == c)
Default Method
Some classes offer default behaviour for functions, which will be used unless it is overwritten.
class Eq a where
(==), (/=)
x /= y x == y
:: a -> a -> Bool
= not (x == y)
= not (x /= y)
So it is actually only necessary to define one of == and /= for your instance.
Deriving
In fact instance Eq Bool is not defined as in the previous slides, but rather something more like
data Bool = False | True deriving Eq
We have already seen this many times in this course!
Can derive multiple type classes by e.g. deriving (Eq,Show)
Deriving
Deriving only works with six type classes
• Eq, Ord, Enum, Bounded, Show, Read
• Significant restrictions on use with Enum and Bounded
It works only with data definitions built up with products and sums (not functions!) and types that are already part of the type class, and assumes any type variables will also be part of the same type class.
It gives a default definition for your type class functions, which may or may not be what you want.
Deriving
data Maybe a = Nothing | Just a
deriving Eq
• Maybe a will be an instance of Eq only if the type instantiating a is. • The default behaviour will be something like:
m==n =case(m,n)of (Nothing, Nothing) -> True (Just x, Just y) -> (x == y) _ -> False
Context
The context of a function’s type expresses a constraint on polymorphism, i.e. which group of overloaded definitions we assume will exist.
elem :: Eq a => a -> [a] -> Bool
elem will work for any type which is an instance of Eq.
We can have multiple constraints in a context:
elem :: (Eq a, Foldable t) => a -> t a -> Bool
Class Extensions (Class Inclusions)
The class Ord inherits all of the operations in Eq,
but in addition has a set of comparison operations and minimum and maximum functions, e.g.:
class Eq a => Ord a where
(<), (<=), (>=), (>) :: a -> a -> Bool
max, min :: a -> a -> a
Eq is a superclass of Ord. Ord is a subclass of Eq.
Benefits of class inclusions
• shorter contexts: a type expression for a function that uses operations from both the Eq and Ord classes can use the context (Ord a), rather than (Eq a, Ord a), since Ord “implies” Eq
• functions for subclass operations can assume the existence of default functions for superclass operations.
Multiple Inheritance of Classes
classes may have more than one superclass
For example, the declaration
class (Eq a, Show a) => C a where …
creates a class C which inherits operations from both Eq and Show.
Type classes of the Prelude
•Eq
• Ord
• Enum
• Bounded
• Num
• Real
• Integral
• Fractional • Floating
• RealFrac
• RealFloat • Semigroup • Monoid
• Functor
• Applicative • Monad
• Foldable
• Traversable • Show
• Read
Eq
• Eq type class provides an interface for testing for equality.
• Any type where it makes sense to test for equality between two
values of that type should be a member of the Eq class.
• If there’s an Eq class constraint for a type variable in a function, the function uses == or /= somewhere inside its definition.
Ord
• Ord is for types that have an ordering
• Ord covers all the standard comparing functions such as >, <, >= and <=
ghci> :t (>)
(>) :: (Ord a) => a -> a -> Bool
Ord is is a subclass of Eq, i.e. any type that is an instance of Ord is also a subclass of Eq.
Enum
• Enum members are sequentially ordered types — they can be enumerated.
• The main advantage of the Enum type class is that we can use its types in list ranges.
• They also have defined successors and predecesors, which you can get with the succ and pred functions.
• Instances of Enum: Bool, Char, Int, Integer, Double…
ghci> [‘a’..’e’]
“abcde”
ghci> [3 .. 5]
[3,4,5]
ghci> succ ‘B’
‘C’
Bounded
Bounded members have an upper and a lower bound.
ghci> minBound :: Int
-2147483648
ghci> maxBound :: Bool
True
ghci> :t minBound
minBound :: Bounded a => a
Num
Its members have the property of being able to act like numbers.
ghci> :t 20
20 :: (Num t) => t
ghci> :t (*)
(*) :: (Num a) => a -> a -> a
Show
• Members of Show can be presented as strings.
• The most used function that deals with the Show type class is show. It takes a value whose type is a member of Show and presents it to us as a string.
ghci> show 3
“3”
ghci> show True
“True”
Read
• The read function takes a string and returns a type which is a member of Read.
ghci> read “True” || False
True
ghci> read “8.2” + 3.8
12.0
ghci> read “[1,2,3,4]” ++ [3]
[1,2,3,4,5]
Read
ghci> read “5”
*** Exception: Prelude.read: no parse
ghci> :t read
read :: (Read a) => String -> a
Type Annotations
Type annotations are a way of explicitly saying what the type of an expression should be.
We do that by adding :: at the end of the expression and then specifying a type.
ghci> read ”5” :: Int
5
ghci> read ”5” :: Double
5.0
Trees
Recap of Lists
Recall the standard definition of lists:
data [] a = [] | a : [a]
This is great, but uses things like infix notation that are slightly unusual How would we defined lists ‘from scratch’?
Defining our own Lists
• Lists are polymorphic, so the definition should include a type variable
• Each list is either • Empty; or
• The product of an element with a list (recursion!) This suggests the definition on the next slide…
User-defined Polymorphic Type for List
data CustomList a = Empty | Cons a (CustomList a)
Name of the datatype we are defining
Unconstrained type variable
Constructor for an empty list
Recursion
unconstrained type
Constructor for a non-empty list
unconstrained type
Defining our own Lists
The definition
data CustomList a = Empty | Cons a (CustomList a) is not as nice as the built-in lists, e.g. we must write [1,2,3] as Cons 1 (Cons 2 (Cons 3 Empty))
but the principle is no different.
From Lists to Trees
data CustomList a = Empty | Cons a (CustomList a)
Lists have one recursive construction attached to the non-empty case. Trees are just the same, except with more than one, for example: data BinaryTree a = Null |
But.. why?
Node (BinaryTree a) a (BinaryTree a)
Family (Binary) Tree
Mother
Father
Paternal grandmother
You
Maternal Maternal grandmother grandfather
Paternal grandfather
Organisation Hierarchy Tree
Parse Tree
Sentence
Noun Phrase
Verb Phrase
Article
Noun Phrase
bit
the
Noun Verb
The man
Article Noun
The man bit the dog
dog
List vs Tree
List is a linear structure
• One item follows the other
Tree is a non-linear structure
• More than one item can follow another.
• In some cases, the number of items that follow can vary from one item to another.
Why Trees?
• More structure
• Faster traversing
• Fits problems like • ”Keep it sorted”
• “Find fast”
• “Express hierarchies” • “Build decision paths”
Applications of Tree Data Structures
• Parsing a natural language sentence or a programming language expression
• Storing the boards in e.g. a Chess or Backgammon program which emulates many potential move combinations
• Index management of databases
• …
Terminology
Nodes
• root (the topmost node, with no incoming edges)
• leaves (the bottom nodes, with no outgoing edges) • parent node
• child nodes
a bc
defg
Edges
Any node is the root of a subtree
💡💡 consisting of it and the nodes below it. Any subtree is a tree.
leaf
leaf leaf
leaf
root
Node
• The depth of a node is the number of edges from the node to the tree’s root node. A root node has a depth of 0.
• The height of a node is the number of edges on the longest path from the node to a leaf. Any leaf node has a height of 0.
Tree
• The height of a tree is the height of its root node
• The height of a tree is equal to the depth of its deepest node
Properties of Trees
• There is exactly one path connecting any two nodes in a tree • Every node, except the root, has a unique parent
a bc
d
• AtreewithNnodeshas(N–1)edges
Trees can be classified
according to
• the precise form of the branching structure
• the location of information within the tree
• the nature of the relationships between the information in different
parts of the tree
Binary Trees
A binary tree is a tree data structure in which each node has
two children, which are referred to as the left child and the right child.
A binary tree is an ordered data structure!
5
37
5
73
is different from
Binary Trees in Haskell
An integer binary tree is either a Null or is given by combining a value and two sub-trees:
data BinaryTree a = Null |
Node (BinaryTree a) a (BinaryTree a)
data BinaryTree a = Null |
Node (BinaryTree a) a (BinaryTree a)
Node (Node Null 8 Null) 5 Null Node 5
Node 8
Null
(We would usually draw the tree without the Nulls)
Null
Null
Binary Search Trees
The type of Binary Search Trees are no different from standard Binary Trees:
type BSTree a = BinaryTree a
But we will use them differently by requiring that elements are sorted.
Binary Search Trees
A binary tree is a binary search tree if • It is Null, or
• It has form Node left val right and
• all values in left are smaller than or equal to val,
• all values in right are larger than or equal to val, and • the trees left and right are also binary search trees
The value of Binary Search Trees
Is an element in a Binary Tree?
• The only way to know is look at every node
Is an element in a Binary Search Tree?
• At each step we can ignore half our tree!
• (We lose this advantage if the tree is not balanced)
Nothing in life is free of course – inserting elements correctly is now more complicated, and slower.
Rose Trees
data Rose a = Rose a [Rose a]
This is a recursive data type which uses another recursive data type in its
definition! example:
Rose 5 [Rose 3 [Rose 1 [], Rose 4 []], Rose 7 [], Rose 4 []]
Rose tree can have an arbitrary and internally varying branching factor (0,1,2, or more).
Rose 5 [Rose 3 [Rose 1 [], Rose 4 []], Rose 7 [], Rose 4 []]
5 374
14
Game Trees
Game Tree
• A game tree is a tree, whose nodes are positions in a game, and edges are moves.
• The complete game tree for a game is the game tree starting at the initial position and containing all possible moves from each position.
The number of leaves in the complete game tree is the number of possible different ways the game can be played.
For example, the game tree for tic-tac-toe has 255,168 leaf nodes.
https://en.wikipedia.org/wiki/Game_tree
Example: Nim
• Start with 15 pebbles
• To make a move you pick up 1, 2, or 3 pebbles
• Whoever picks up the last pebble loses
• e.g.,
1. You pick up 3 (12 left) 2. Ipickup3(9left)
3. You pick up 2 (7 left) 4. Ipickup2(5left)
5. You pick up 1 (4 left) 6. Ipickup3(1left)
7. You pick 1 (0 left)
8. I win!
Example: Nim
• Nim actually has a winning strategy for the player going first!
• Consider Nim with 5 pebbles.
• If it’s my turn and there are 5 pebbles left then I lose: • if I take 1, you take 3, and there is 1 left;
• ifItake2,youtake2;
• ifItake3youtake1.
• Similarly: 9…5, 13…9, etc. • The pattern:
• if it’s my turn and #pebbles mod 4 ≅ 1 then you have a winning strategy: otherwise always leave the number of pebbles congruent to 1 mod 4.
Example: Nim
• Nim is special: a quick calculation tells who will win.
• For a game like chess (or Kalaha or ConnectX) you can’t tell (in
constant time) just by looking at a board who will win.
• So, we must use computation to search possible future states.
• Build a game tree where nodes are states and edges are moves.
• Each row is labelled with the player whose turn it is.
Minimax (for 2-player games)
Minimax is a standard AI strategy for (2-player) games that are • alternating: players take turns
• deterministic: each move has a well-defined outcome (there is no randomness)
• perfect-information: each player knows the complete state of the game (there is no hidden information)
• zero-sum: if I win you lose, and vice versa — what’s good for me is bad for you (but draws are allowed)
Minimax
A minimax algorithm is a recursive algorithm choosing the next move in a two-player game.
• A value is associated with each final state of the game (leaf of the game tree). It indicates how good it would be for me to finish with that position.
• Then for any state whose children all have values, if it is my move I will choose the move with the best value, so I give this state the maximum value from its children.
• If it is my opponent’s move, I assume they will choose the move with the worst value (for me), so I give this state the minimum value.
• This principle propagates recursively up the tree, until we reach the root. I then choose the child of the root with the best value.
Example: Nim
• Starting with 3 pebbles:
Example: Nim
• Next assign each leaf a value, saying who wins.
• Maxie wins if the value is 1.
• Minnie wins if the value is -1.
-1
11
-1
Example: Nim
• Then propagate the labels up the tree.
• If it’s Maxie’s turn the value is the maximum of the children (because Maxie will choose the maximising move).
• If it’s Minnie’s it is the minimum.
• First level.
-1
11 -1
-1
Example: Nim
• Next level.
• For example, on the rightmost subtree if there are two pebbles left then Minnie should take 1, leaving 1, rather than 2, leaving 0.
-1 1 -1 11 -1
-1
Example: Nim
• Next level.
• Maxie should take 2, leaving 1, rather than taking 3 or 1.
1
-1 1 -1 11 -1
-1
Minimax algorithm – strength and limitations
• Minimax computes the value of a game state assuming both players will play optimally.
• It does not account for things like
• “that chess board is confusing, so it will be easy to make a mistake”
• For a game with a bigger search space than Nim we can’t draw out the whole tree!
• Instead, a heuristic must approximate the value of non-final states.
• Nim has a perfect heuristic: number of pebbles congruent to 1 mod 4, i.e. # pebbles `mod` 4 = 1.
Heuristic
https://en.wikipedia.org/wiki/Heuristic_(computer_science)
• A heuristic is a technique designed for solving a problem more quickly when classic methods are too slow, or for finding an approximate solution when classic methods fail to find any exact solution.
• A heuristic function is a function that ranks alternatives in search at each branching step based on available information to decide which branch to follow.
Trade-off criteria for deciding a heuristic
• Optimality: When several solutions exist for a given problem, does the heuristic guarantee that the best solution will be found? Is it actually necessary to find the best solution?
• Completeness: When several solutions exist for a given problem, can the heuristic find them all? Do we actually need all solutions? Many heuristics are only meant to find one solution.
• Accuracy and precision: Can the heuristic provide a confidence interval for the purported solution? Is the error bar on the solution unreasonably large?
• Execution time: Is this the best known heuristic for solving this type of problem? Some heuristics converge faster than others. Some heuristics are only marginally quicker than classic methods.
Heuristics for games
• A heuristic for a game is an assignment of a value to the state of an unfinished game, indicating how well the game is going for you.
• Many games (e.g. most team sports) have a running score. Are these always a good heuristic?
• For chess, a good heuristic would include which pieces are left, where they are positioned, etc.
• So designing a heuristic requires a strong understanding of the game, and much experimentation.
Greedy algorithm
A greedy algorithm does not bother to construct a game tree; it simply chooses the move that has the best value according to your heuristic.
The better your heuristic, the better your greedy algorithm will play • With Nim, the heuristic is perfect so the greedy algorithm is optimal.
With many games, designing a perfect heuristic is impossible, and looking through the game tree, even just for a few steps, will improve performance.
Minimax algorithm
The overall algorithm is:
1. explore the game tree up to a certain depth (lookahead)
2. use the heuristic to give a value to all leaves, where the game has finished or the depth has been reached
3. assign any node whose children all have values the maximum or minimum value of its children, depending on whose turn it is
4. recursively propagate values up to the children of the root, then choose the child of the root with the best value
Alpha-beta Pruning
The trouble with Minimax
• The full Minimax search of all possible ways a game could be played takes time.
• If there is any time constraint – and there always is! – then it may not be possible to explore the whole game tree.
• A lookahead limit solves this, but fast heuristics tend to be imperfect – a game state might be evaluated as better or worse than it turns out to be when actually playing.
• Another solution is to prune some of the branches –never look at some possible moves.
Alpha-Beta Pruning
Alpha-beta pruning is an extension of Minimax which allows you to avoid searching subtrees of moves which do not lead to the optimal minimax solution.
It will always return the same solution as Minimax, but will usually do so faster.
5 2 7 6 1 3 2 2 6 11 4 7 9 2
7
527
5 7 3 2 11 7 9
5 2 7 6 1 3 2 2 6 11 4 7 9 2
path we are currently exploring
path to currently best-known solution explored but not the best path
pruned (not explored) branches
5 2 7 6 1 3 2 2 6 11 4 7 9 2
≥𝟓𝟓
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
5 2 7 6 1 3 2 2 6 11 4 7 9 2
≤𝟓𝟓 =𝟓𝟓
5 2 7 6 1 3 2 2 6 11 4 7 9 2
≤𝟓𝟓 =𝟓𝟓
5 2 7 6 1 3 2 2 6 11 4 7 9 2
≤𝟓𝟓
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
≤𝟓𝟓
≥𝟕𝟕
❗️
=𝟓𝟓
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
=𝟓𝟓
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
≥𝟓𝟓 =𝟓𝟓
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
≥𝟓𝟓 =𝟓𝟓
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
≥𝟓𝟓 =𝟓𝟓
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
≥𝟏𝟏
≥𝟓𝟓 =𝟓𝟓
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
=𝟑𝟑
≥𝟓𝟓 =𝟓𝟓 ≤𝟑𝟑
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
=𝟑𝟑
≥𝟓𝟓 =𝟓𝟓 ≤𝟑𝟑
❗️ ≥𝟕𝟕
=𝟑𝟑
=𝟓𝟓
5 2 7 6 1 3 2 2 6 11 4 7 9 2
≥𝟓𝟓 =𝟓𝟓 ≤𝟑𝟑
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
=𝟑𝟑
≥𝟓𝟓 =𝟓𝟓 ≤𝟑𝟑
≥4
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
≥𝟕𝟕
=𝟑𝟑
≥𝟓𝟓 =𝟓𝟓 ≤𝟑𝟑
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
=𝟑𝟑
=7
≥𝟓𝟓 =𝟓𝟓 ≤𝟑𝟑
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
=𝟑𝟑
≤7 =7
≥𝟓𝟓 =𝟓𝟓 ≤𝟑𝟑
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
=𝟑𝟑
≤7 =7
≥𝟓𝟓 =𝟓𝟓 ≤𝟑𝟑
≤7
=𝟑𝟑 =7≥𝟗𝟗
≥𝟕𝟕
5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
≥𝟓𝟓
=𝟓𝟓 ≤𝟑𝟑 ≤7 ❗️
≥𝟕𝟕 =𝟑𝟑 =7 ≥𝟗𝟗 5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
≥𝟓𝟓 =𝟓𝟓 ≤𝟑𝟑
≥𝟕𝟕
=7
=𝟑𝟑 =7≥𝟗𝟗 5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
=7 =𝟓𝟓 ≤𝟑𝟑
≥𝟕𝟕
=7
=𝟑𝟑 =7≥𝟗𝟗 5 2 7 6 1 3 2 2 6 11 4 7 9 2
=𝟓𝟓
Why “Alpha-beta” ?
Any new node is considered as a game-state of a potential solution only if the following equation holdsα ≤ 𝑣𝑣 ≤ 𝛽𝛽
where 𝑣𝑣 is the current (estimate of the) value of the node. Therefore α ≤ 𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
5
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
5
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
52
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
5
52
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼 = -∞
52
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼 = -∞
527
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼=7
527
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5 5
α≰𝛽𝛽
𝛽𝛽=5 𝛼𝛼=7
527
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼=7
5
α≰𝛽𝛽
527
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼=7
5
α≰𝛽𝛽
527
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼=7
5
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5
527
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
𝛼𝛼=7 527
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5
𝛼𝛼=7 527 1
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5
𝛼𝛼=7 527 13
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5
𝛼𝛼=7 3 527 13
𝛽𝛽=5 𝛼𝛼 = -∞
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
α≰𝛽𝛽
𝛽𝛽=3 𝛼𝛼=5
𝛼𝛼=7 3 527 13
𝛽𝛽=∞ 𝛼𝛼=5 5
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
α≰𝛽𝛽
𝛽𝛽=3 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽
𝛼𝛼=7 3 527 13
α≰𝛽𝛽
𝛽𝛽=3 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5 5
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
𝛼𝛼=7 3 527 13
α≰𝛽𝛽
𝛽𝛽=3 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=∞ 𝛼𝛼=5
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
𝛼𝛼=7 3 527 13
α≰𝛽𝛽
𝛽𝛽=3 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=∞ 𝛼𝛼=5
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
𝛼𝛼=7 3 527134
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽
α≰𝛽𝛽
𝛽𝛽=3 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=∞ 𝛼𝛼=5
47
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
𝛼𝛼=7 3 527 13
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽
α≰𝛽𝛽
𝛽𝛽=3 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=∞ 𝛼𝛼=7
47
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
𝛼𝛼=7 3 527 13
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=7 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽
α≰𝛽𝛽
𝛽𝛽=3 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=∞ 𝛼𝛼=77
47
5
𝛽𝛽=∞ 𝛽𝛽=5 𝛼𝛼=5
𝛼𝛼=7 3 527 13
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=7 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5 5
5
𝛽𝛽=3 α≰𝛽𝛽 𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=7 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼=5
527 13 47
𝛽𝛽=∞ 𝛼𝛼=7 3 𝛼𝛼=77
𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=7 𝛼𝛼=5
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽
𝛽𝛽=∞ 𝛼𝛼=5 5
5
𝛽𝛽=3 α≰𝛽𝛽 𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=7 𝛼𝛼=9
𝛽𝛽=5 𝛼𝛼=5
527 13 479
𝛽𝛽=∞ 𝛼𝛼=7 3 𝛼𝛼=77
𝛽𝛽=∞
𝛼𝛼=5 𝛽𝛽=7
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼=5 𝛽𝛽=∞
𝛼𝛼=73 𝛼𝛼=77 α≰𝛽𝛽
5
𝛽𝛽=3 α≰𝛽𝛽 𝛽𝛽=∞ 𝛼𝛼=5
𝛽𝛽=7 𝛼𝛼=9
527 13 479
𝛽𝛽=∞
𝛼𝛼=5 𝛽𝛽=7
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼=5 𝛽𝛽=∞
𝛼𝛼=73 𝛼𝛼=77 α≰𝛽𝛽
5
𝛽𝛽=3
α≰𝛽𝛽 𝛽𝛽=∞ 𝛼𝛼=5 7 𝛽𝛽=7
𝛼𝛼=9
527 13 479
𝛽𝛽=∞
7 𝛼𝛼=5 𝛽𝛽=7
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼=5 𝛽𝛽=∞
𝛼𝛼=73 𝛼𝛼=77 α≰𝛽𝛽
5
𝛽𝛽=3
α≰𝛽𝛽 𝛽𝛽=∞ 𝛼𝛼=5 7 𝛽𝛽=7
𝛼𝛼=9
527 13 479
𝛽𝛽=∞
7 𝛼𝛼=5 𝛽𝛽=7
𝛽𝛽=5 𝛼𝛼 = -∞
α≰𝛽𝛽 𝛼𝛼=5
𝛽𝛽=∞ 𝛼𝛼=5 5
𝛽𝛽=5 𝛼𝛼=5 𝛽𝛽=∞
𝛼𝛼=73 𝛼𝛼=77 α≰𝛽𝛽
5
𝛽𝛽=3
α≰𝛽𝛽 𝛽𝛽=∞ 𝛼𝛼=5 7 𝛽𝛽=7
𝛼𝛼=9
527 13 479
Alpha-Beta Pruning Practice
http://inst.eecs.berkeley.edu/~cs61b/fa14/ta-materials/apps/ab_tree_practice/
Optimizing even more…
• Make the heuristic better
• Explore children in a different order – most ‘promising’ first
• Avoid exploring some children altogether?
And remember – make the rest of the code as fast as possible!
• Hint – if you find yourself calculating the same thing multiple times in the same definition, use a variable defined via a where clause: Haskell will then calculate it at most once.
Complexity:
Time and space behaviour
Chapter 20 of the textbook
Evaluation
How fast is my code?
In ghci we can test this by writing
> :set +s
We can undo this command with
> :unset +s
Limitations with timing code
Timing your code can be a great idea! But it has some limitations. • This is testing, and testing is always limited
• What if there is some ‘corner case’ I didn’t test where my code runs slower?
• Particular to the computer I’m working on
• (Unless you’re very careful) sensitive to other things running on my computer which could slow things down unexpectedly
What would a principled, reliable, portable, approach to describing the speed of code look like?
Algorithmic complexity
How many steps does it take to execute an algorithm given an input of size n?
Complexity of functions
Consider:
fn=2×n2 +4×n+13
This function has three components:
• a constant, 13
• aterm4×n
• aterm2×n2
As the value of n increases:
• the constant component is unchanged
• the 4 × n component grows linearly
• the 2 × n2 component grows quadratically
Complexity of functions
Input
Quadratic part
Linear part
Constant part
1
11%
21%
68%
10
79%
16%
5%
100
98%
2%
0.06%
1,000
99.8%
0.2%
0.0006%
10,000
99.98%
0.02%
0.000006%
100,000
99.998%
0.002%
0.00000006%
1,000,000
99.9998%
0.0002%
0.0000000006%
Domination
In the function
f n = 2 × n2 + 4 × n + 13
The quadratic part dominates the other parts: for big inputs the other parts hardly matter at all.
Moreover the 2 attached to the quadratic part is also not very significant: the domination would happen for any constant multiplier
So if we are interested in big inputs, the n2 part is the most significant information
Complexity of functions: “big-Oh” O For large n, the quadratic term dominates in f.
So,wesaythatfisof“ordern2” andwewriteO(n2)
A function f::Integer->Integer is O(g) if there are positive integers m and d such that for all n≥m:
f n ≤ d × (g n)
i.e., that f is bounded above by g: for sufficiently large n the value of f
is no larger than a multiple of the function g.
e.g., ffromthepreviousslideisO(n2),sinceforn≥1: 2×n2 +4×n+13≤2×n2 +4×n2 +13×n2 ≤19×n2
Tight bounds
By the definition of the previous slide, f is also in O(n3).
When we ask the big-Oh class of a function, unless otherwise stated we want a tight bound – the slowest growing function that works for f
• Technical: f is in O(n2), but n2 is also in O(f). This does not hold of O(n3)
y=x2 y=x log x
y=x1
y=log x y=x0
y=10x y=10×2
y= 10x
y=10
Significance to computing
Suppose you measure the speed of your program at various output sizes n, and determine that its speed is expressed by the function
fn=2×n2 +4×n+13
The precise constants, 2, 4, and 13, almost certainly depend on your computer, and would differ from mine.
But the big O class – O(n2) – of your code is very unlikely to differ between computers.
Therefore communicating the big O class is the most relevant and important piece of information you could give me.
Best vs Average vs Worst case
Sometimes different inputs of the same size perform very differently.
Finding a specified element in a list might be very fast if it is at the front of a list, but slow if it is at the end, or not in the list at all
Therefore we may have different functions for the best case, the worst case, and (if we have some reasonable distribution) the average case
Convention – unless otherwise stated, big O notation refers to the worst case.
Space efficiency
Time is not the only relevant dimension to measure performance on.
If your code wastes space – working memory – then it is more likely to run out of space and crash, and will also run somewhat slower
Space can also measured (e.g. in bytes) and expressed as a function of the input size
Again, big O notation is the right tool for expressing this function
Search
Search
One of the most common problems for an algorithm to solve is to search for some data in some data structure.
We will discuss the (time) complexity of search in • a list
• a binary search tree
Simplifying assumptions
• Search can be defined so long as the underlying data type is in Eq. But (==) might be an expensive operation
• However this is not relevant for the complexity analysis of the ad hoc polymorphic search algorithm on its own
• We assume that the data structure we are searching in is completely computed before we start searching
• Not necessarily true due to laziness, but again, not relevant to the search algorithm on its own
Search in a list
elem :: Eq a => a -> [a] -> Bool
elem x list = case list of
[] -> False
y:ys | x == y -> True
| otherwise -> elem x ys
(This is not identical to the definition of elem in the Prelude, but it gives the idea)
Best case analysis
Given a finite list of length n, what is the best case scenario for elem? • We can assume n is large, and so not empty
Best case – the searched-for element is the head of the list • Check if the list is empty
• Check if its main constructor is cons
• Checkifx == y
• Return True
Best case analysis
These operations take time
• maybe a lot of time if (x == y) is expensive
But the time does not depend on the length of the list. Therefore in the best case elem is constant – O(1)
Worst case analysis
The worst case – the searched-for element is not in the list at all
• n+1 checks if the list is empty
• n checks if main constructor is cons • nchecksifx == y
• n recursive calls
• Return False
Worst case analysis
Remember that anything that increases as the input size increases will eventually dominate anything that is constant, so remove all constants
• n+1 checks if the list is empty
• n checks if main constructor is cons • nchecksifx == y
• n recursive calls
• Return False
Worst case analysis
Remember that anything that increases as the input size increases will eventually dominate anything that is constant, so remove all constants
• O(n) checks if the list is empty
• n checks if main constructor is cons • nchecksifx == y
• n recursive calls
Worst case analysis
The function describing the time spent searching the list will then be
n * (cost of empty check + cost of cons check + cost of (==) + cost of recursive call) + some constant
But complexity analysis asks us to ignore constants, whether they are multiplied or added, so…
In the worst case, elem is linear – O(n).
Note that the case where the element is at the end of the list is also O(n).
Average case analysis
Average cases can be tricky – on average, how likely is it that an element will be in a list?
But across the cases that the element is in the list, we can obviously define an average case – where the element is halfway through.
But if searching the whole list takes time O(n), then searching half a list takes time proportional to n/2 – and this is also O(n)
Search in a binary search tree
data BSTree a = Null | Node (BSTree a) a (BSTree a)
bSElem::Ord a => a -> BSTree a -> Bool bSElem x tree = case tree of
Null -> False
Node left node right
|x
| otherwise -> bSElem x right
Search in a binary search tree
In the best case, we are again in O(1). What about worst case? Simplifying assumptions:
• The BSTree really is a binary search tree – all operations on it have kept it correctly ordered
• The tree is balanced – Node Null 1 (Node Null 2 (Node Null 3 (Node Null 4))) is technically a BSTree, but it is not any faster to search than a list!
Logarithms
Logarithms are the inverse of exponentiation: Ifbc = a thenlogba = c
•log21=0 •Log22=1 •log24=2 •log28=3 •log216=4
because 20 = 1 because 21 = 2 because 22 = 4 because 23 = 8
…
Height of a binary search tree
In a balanced BStree the height h as a function of its size n is approximately
h = log2(n + 1)
e.g. a tree of height 4 contains (at most) 15 elements.
We can ignore the ‘+1’, and, for large n, log2n will dominate the number of any missing elements at the bottom layer.
In fact the base 2 is also irrelevant for big O analysis – we can say that h is in O(log n).
Worst and average case analyses of BSElem In the worst case BSElem looks at one element for every layer of the
tree, so it is in O(log n).
The average height of an element in a tree can be approximated as log2(n-1), so the average case analysis for search is again O(log n).
Comparison
n
log2n
multiplier
2
1
2
16
4
4
128
7
18
1,024
10
102
8,192
13
630
65,536
16
4,096
524,288
19
27,594
Sorting
Sorting
Data that is unorganised is a pain to use
It is hence important to know how to sort (put things in order) We will look at sorting lists, with a focus on complexity
Insertion Sort
Problem: sort a list of numbers into ascending order. Given a list [7,3,9,2]
1. Suppose the tail is sorted somehow to get [2,3,9]
2. Insert the head element 7 at the proper position in the sorted tail
to get [2,3,7,9]
The tail is sorted ‘somehow’ by recursion, of course!
Insertion
insert :: Ord a => a -> [a] -> [a]
insert x list = case list of
[] -> [x] y:ys
| x <= y -> x : y : ys
| otherwise -> y : insert x ys
If the input list is sorted, the output list will be also
Complexity of Insertion
Best case: compare only to the head and insert immediately – O(1) Worst case: look through the whole list and insert at the end – O(n) Average case: insert after half the list – O(n)
Insertion Sort
iSort :: Ord a => [a] -> [a]
iSort = foldr insert []
Make sure that you understand this definition • What are the types of its components?
• What does it do on the base case?
• What does it do on the step case?
Most importantly, halfway through the algorithm, which part of the list is sorted, and which is not?
Complexity of Insertion Sort
Best case: the list is already sorted, so each element is put onto the head of the list immediately – O(n)
Worst case: the list is sorted in reverse order!
Each element needs to be compared to the tail of the list. What is the cost of this?
Worst Case Complexity of Insertion Sort
Say sorting the last element of the list takes 1 step
Then sorting the next element takes 2 steps, and so on, with n steps for the first element of the list
So we have the sum 1 + 2 + 3 + … + (n-2) + (n-1) + n
= (n/2) * (n + 1)
Delete the constants and dominated parts of the expression… = O(n2)
Average Case Complexity of Insertion Sort
The average case complexity of insert was the same (with respect to big-O notation) as the worst case.
So we again have 1 + 2 + 3 + … + (n-2) + (n-1) + n = O(n2)
Merge Sort
Merge Sort is a ‘divide and conquer’ algorithm. Intuition: sorting a list is hard…
But sorting a list of half the size would be easier (Divide)
But it is easy to combine two sorted lists into one sorted list (Conquer)
Merge
merge :: Ord a => [a] -> [a] -> [a]
merge list1 list2 = case (list1,list2) of
(list,[]) -> list
([],list) -> list
(x:xs,y:ys)
| x <= y -> x : merge xs (y:ys)
| otherwise -> y : merge (x:xs) ys
If we have two lists sorted somehow, their merge is also sorted Somehow = recursion, again!
Complexity of Merge
Assuming that the two lists are of equal length, let n be the list of the two combined.
Best case: one list contains elements all smaller than the other. Only need roughly n/2 comparisons – O(n)
Worst case: Every element gets looked at, but we are always ‘dealing with’ one element per step – O(n)
Average case: Where best case = worst case, this is obvious!
Merge Sort
mSort :: Ord a => [a] -> [a] mSort list = case list of
[] ->[]
[x] -> [x]
_ -> merge (mSort firstHalf) (mSort secondHalf)
where
firstHalf = take half list secondHalf = drop half list half = div (length list) 2
Complexity of Merge Sort
Much like Insertion Sort, at every step Merge Sort is O(n)
• Length, Taking the first half, Taking the second half, Merging all O(n)
• At the next step there are twice as many mSorts, but each is working with a list only half the length, so still O(n)!
But how many steps are there?
We half the list lengths repeatedly, i.e. call on n, n/2, …,16, 8, 4, 2 So if we started with length 16, there would 4 steps = log2(16)
Cost of O(n) per step * O(log n) steps = O(n log n) • Best, worst, and average the same!
Comparison
n
n log2n
2
n2
multiplier
2
4
2
16
64
256
4
128
896
16,384
18
1,024
10,240
1,048,576
102
8,192
106,496
67,108,864
630
65,536
1,048,576
4,294,967,296
4,096
Comparison of Sorting Algorithms
In the best case, Insertion Sort outperforms Merge Sort. But on average, and in the worst case, merge sort is superior
• In the standard library Data.List, sort :: Ord a => [a] -> [a] is Merge Sort (somewhat optimised versus the one in these slides)
But merge sort is quite space inefficient, so other algorithms, such as ‘Quick Sort‘, can be preferred sometimes
• Time complexity of average case O(n log n), but worst case O(n2)
Other algorithms can outperform these generic approaches if you know a lot about the nature of your inputs
Lazy evaluation
Evaluation
• Evaluation refers to the algorithm that runs your program
• In Haskell evaluation is best understood as ‘replacing definitions by
their right hand sides’
• But most programs contain more than one definition, raising the
question of which order your program will evaluate in
• This can be significant for termination, for speed, and for the timing
of any side effects that happen during your program’s run
These slides give an informal description of Haskell’s approach to evaluation.
A simple example
Consider the program
addOne :: Int -> Int
addOne x = x + 1
How should we evaluate
addOne (2 * 3)
A simple example
Option one: evaluate the input to the function first:
addOne (2 * 3) = addOne 6 =6+1
=7
Option two: put the input expression into the function, evaluating it later:
addOne (2 * 3) = (2 * 3) + 1 =6+1
=7
Eager vs Lazy
The first approach, where we evaluate all inputs before we touch the function, is called eager
The second approach, where we insert the unevaluated input into the function and only evaluate it when needed (if at all!) is called lazy
Haskell is, by default, lazy
• You can force eager evaluation if you wish
• More languages are the other way around – eager by default, but with the capacity to force lazy evaluation
const and laziness Consider the Prelude function
const :: a -> b -> a constx_= x
const never uses its second input. In combination with laziness, this creates some interesting behaviour.
Moral – inputs that are not used are never evaluated
Programming with streams
A stream is an infinite list
In Haskell infinite lists have the same list type we’re used to using
I.e. while we have usually intended our lists to be finite with base case [], there is nothing stopping us defining a list of form
x1 : x2 : x3 : x4 : x5 : …
with no base case!
zeros zeros :: [Int]
zeros
= 0 : zeros (definition) = 0 : 0 : zeros
= 0 : 0 : 0 : zeros
= 0 : 0 : 0 : 0 : zeros
= 0 : 0 : 0 : 0 : 0 : zeros =…
Using streams
If you run zeros it will not terminate • Click Ctrl+C to kill the process
Other programs like sum zeros will also fail to terminate. But…
> const True zeros
True
> take 10 zeros
[0,0,0,0,0,0,0,0,0,0]
> zeros!!1000000000
0
A more interesting example: nats
nats :: [Integer]
nats = 0 : map (+1) nats (definition)
= 0 : map (+1) (0 : map (+1) nats)
= 0 : 1 : map (+1) (map (+1) nats)
= 0 : 1 : map (+1) (map (+1) (0 : map (+1) nats)) = 0 : 1 : map (+1) (1 : map (+1) (map (+1) nats)) = 0 : 1 : 2 : map (+1) (map (+1) (map (+1) nats))
Even more interesting
mystery :: [Integer]
mystery = 0 : 1 : zipWith (+) mystery (tail mystery)
What might this do?
A problem with lazy evaluation?
Lazy evaluation could be very costly if an input is used many times:
replicate :: Int -> a -> [a]
replicate n x
|n<=0 =[]
| otherwise = x : replicate (n-1) x
With eager evaluation replicate n foo will calculate foo once. With lazy, it might calculate it n times!
One more clever Haskell trick
In fact Haskell does not fall into this trap: if it can see that a variable is used many times, it will calculate it once and refer to that value every time it is used.
But this only works for variables (e.g. x in the last example). Haskell will not detect that you have repeated larger amounts of code!
Therefore you need to avoid repeating code, e.g. using variables defined by where clauses.
Summary
• You need to understand what order programs evaluates in to completely understand the termination behaviour and speed of your code
• Haskell is lazy – doesn’t evaluate inputs until it has to
• Laziness has advantages
• More likely to terminate
• Can avoid expensive computations except when you need them • Allows working with infinite data
• But also disadvantages
• More unpredictable with respect to side effects
• Harder to reason about space efficiency (working memory), and sometimes time efficiency
• Sometimes wastes space
• ‘Clever tricks’ can be expensive in their own right