Algorithmic thinking
When you come to write a computer program, the first thing to do is to develop a step-by-step solution to the problem:
break it down into a series of small, more manageable problems
consider how similar problems have been solved previously
focus only on the important information, not the details
design simple steps or rules to solve each of the smaller problems
Here’s an example
Literally, an example:
http://www.example.com/
We’re going to extract data from this website, and find out how many paragraphs there are in it, and what the text of the first paragraph says.
How do you do this?
1. break it down into a series of small, more manageable problems
Each problem will have several steps to it:
Get the information from the website
Set up the Python program to use a GET request
Point it to the desired website
Get the information back and store it in a variable
Query the information to see how many paragraphs it has
Check the number of paragraphs by counting the <p> tags
Find out what the text of the first paragraph says
Use our knowledge of lists to ask for information about a specific paragraph
2. consider how similar problems have been solved previously
Look at online resources. This is vital: there’s no point doing all the work all over again when someone has done it before.
https://realpython.com/python-web-scraping-practical-introduction/
https://dev.to/ayushsharma/a-guide-to-web-scraping-in-python-using-beautifulsoup-1kgo
3. focus only on the important information, not the details
This is abstraction. Ignore the things that don’t apply to your situation. Focus on the overall process instead. Filter out the parts that are irrelevant.
For example, you’re not interested in anything other than getting the data and looking at the paragraphs.
4. design simple steps or rules to solve each of the smaller problems
This is your algorithm. Take the smaller problems from step 1 and think about how Python needs the instructions.
You can write it out in pseudocode, which is a way of describing the problem in coding-type language but at a higher level.
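As one possible sketch, the smaller problems from step 1 might be written in pseudocode like this. The keywords (SET, SEND, STORE, PRINT) are just an illustrative convention, not a standard, and the name paragraphs is an assumption made for the example.

```
SET url TO "http://www.example.com/"
SEND a GET request TO url
STORE the response IN webdata
SET paragraphs TO the list of all <p> elements IN webdata
PRINT the number of items IN paragraphs
PRINT the text of the first item IN paragraphs
```

Each line is small enough to turn into one or two lines of Python later.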
Let’s look at the website:
This is the <head> section. The browser doesn’t display the information in here. The head section contains information telling the browser how to interpret and display the page. The information (the black text) between these two blue lines is CSS, which describes the appearance of a page.
This is the part we’re interested in. This is the body of the page, and everything between <body> and </body> is the page content.
The body of the page:
Let’s get the whole page:
Get the information from the website:
Set up the Python program to use a GET request
Point it to the desired website URL
Get the information back and store it in a variable
# set up Python to use the requests library:
import requests
# create a variable to store the url of the website:
url = "http://www.example.com/"
# Tell Python to go to the URL of the website and return the data to a variable called webdata:
webdata = requests.get(url)
# Now, print webdata.content to get the raw HTML content of the webpage. It is of ‘bytes’ type (webdata.text gives the same content as a string):
print(webdata.content)
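In the requests library, content holds the raw bytes of the response and text holds the decoded string. The short sketch below shows the difference. It uses a hard-coded value named raw as a stand-in for webdata.content (an assumption, so that it runs without a network connection), and it assumes the page is encoded as UTF-8.

```python
# Stand-in for webdata.content: a hard-coded bytes value (an assumption,
# used here so the sketch runs without a network connection).
raw = b"<p>Hello</p>"

# Bytes are raw data; decoding turns them into an ordinary string.
# UTF-8 is assumed here; a real page may use a different encoding.
html = raw.decode("utf-8")

print(type(raw).__name__)   # the type of the raw data
print(type(html).__name__)  # the type after decoding
print(html)
```

With a real response, webdata.text does this decoding for you.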
How do you do this?
Query the information to see how many paragraphs it has
Check the number of paragraphs by counting the <p> tags
What are the contents of the first paragraph?
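Here is one possible way to answer both questions, as a minimal sketch rather than the only approach. The guides linked earlier use the BeautifulSoup library; this sketch swaps in Python's built-in html.parser module so it runs with nothing extra installed. The names SAMPLE_HTML, ParagraphCollector and get_paragraphs are made up for this example, and SAMPLE_HTML is a small stand-in page, not the real content of the website.

```python
from html.parser import HTMLParser

# Stand-in for the HTML returned by the GET request (an assumption, so the
# sketch runs without a network connection).
SAMPLE_HTML = """
<html>
<head><title>Sample page</title></head>
<body>
<h1>Sample page</h1>
<p>This is the first paragraph.</p>
<p>This is the <a href="#">second</a> paragraph.</p>
</body>
</html>
"""


class ParagraphCollector(HTMLParser):
    """Collects the text of every <p> element into a list."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []       # one string per <p> tag found
        self.in_paragraph = False  # True while we are inside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside nested tags (such as a link) is still part of the paragraph.
        if self.in_paragraph:
            self.paragraphs[-1] += data


def get_paragraphs(html):
    """Return a list with the text of each paragraph in the HTML string."""
    parser = ParagraphCollector()
    parser.feed(html)
    return [text.strip() for text in parser.paragraphs]


paragraphs = get_paragraphs(SAMPLE_HTML)
print(len(paragraphs))  # how many paragraphs there are
print(paragraphs[0])    # the text of the first paragraph (lists start at 0)
```

To try it on the real page, pass webdata.text to get_paragraphs instead of SAMPLE_HTML.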
Questions?