Lab 05
Preprocessing
Text preprocessing is an important step for natural language processing (NLP) tasks. It transforms text into a more digestible form so that machine learning algorithms can perform better. It is important to understand what each preprocessing method does in order to help decide if it is appropriate for your particular task.
Text Wrangling
Text wrangling is the process of converting, gathering, and extracting formatted text from raw data.
For example, HTML does not include only text. Even when you extract only the text from HTML, it is not all meaningful (e.g. advertisements).
Have a look at the news article below. We might only be interested in getting the headline and body of the article.
The following code removes some irrelevant tags (e.g. script, style, link) and displays the remaining tags. We will mainly utilize two packages:
• urllib: is a package that collects several modules for working with URLs. We will use urllib.request for opening and reading URLs (See details at urllib.request).
• BeautifulSoup: Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree (See details at BeautifulSoup).
In [ ]:
import urllib.request
from bs4 import BeautifulSoup

url = "https://www.smh.com.au/national/nsw/macquarie-uni-suspends-teaching-for-10-days-to-move-learning-online-20200317-p54avs.html"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html)

# remove irrelevant tags (script, style, link, etc.)
for script in soup(["script", "style", "link", "head", "noscript"]):
    script.extract()  # rip it out, i.e. remove the tag from the tree

# get_text() returns all the human-readable text beneath the tag as a string
text = soup.get_text()
#print(text)  # you can uncomment this to have a look at the returned text

# The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string
print(soup.prettify())
We’re sorry, this service is currently unavailable. Please try again later.
Advertisement
This was published
1
year
ago
University of Sydney to move fully online while Macquarie cancels classes
By
Natassia Chrysanthos
and
Anna Patty
Updated
first published
at
The University of Sydney will suspend all face-to-face teaching from Monday and move fully online while Macquarie University has cancelled classes altogether in order to make the digital transition, revealing one of its students tested positive for COVID-19.
The University of Sydney’s 10,000 staff members have been encouraged to work remotely to slow the spread of coronavirus, but the campus Wi-Fi network and facilities will remain open with enhanced cleaning protocols and social distancing measures.
Courses with labs and practical components will be adapted for online or suspended until later in the semester while clinical placements for health students will go ahead under strict guidelines, Vice-Chancellor Michael Spence wrote to staff on Tuesday afternoon.
“We are anticipating this will be for the whole of semester and we’re planning on that,” Dr Spence told the
Herald.
He said some business school courses had been designed from scratch, while teachers would adapt other courses throughout semester based on student feedback.
“We’ve put a lot of effort and thought into how to do it. I think this is a tremendous opportunity. This could be an interesting pedagogical experiment,” he said.
The university
has already projected $200 million losses due to coronavirus
. Dr Spence said the expense of adapting courses, bolstering IT systems and student support would “cost us more overall” than regular teaching.
“At the moment we have to spend money to put education online to make sure our students’ experience is uninterrupted as possible. There has been an overwhelming response from students that [is what they want].”
Universities Australia deputy chief executive officer Anne-Marie Lansdown said 39 universities were providing online learning where possible amid the coronavirus pandemic. She said the challenge was increasing the amount of content that can be put online in a very short period of time.
“For some courses it will be much harder – especially those with significant practical or technical requirements,” Ms Lansdown said. “In the case of practical-based learning, where online may not be possible, universities are offering maximum flexibility, including delaying or deferring those components.”
Advertisement
Loading
Macquarie Vice-Chancellor Professor Bruce Dowton said in an email to staff on Tuesday morning that face-to-face and online teaching will be suspended for 12 days from Wednesday while the university transitions to online delivery of lectures and seminars.
“It will also allow us to redesign campus-based delivery of our units to modes that support social distancing and remote support,” he said.
Hours later, Macquarie confirmed a student had tested positive for COVID-19 the day before, and that several locations on campus had been cleaned overnight.
“The current advice is that the rest of campus can continue to operate as normal after the completion of intensive cleaning operations and in line with … moving to increase online delivery of educational programs,” a spokesperson said.
But students were concerned they had not been told where the infected student had been on campus and whether some needed to self-isolate.
“Many of the students are shocked that we could be left in the dark and that the university has not been able to make decisions for weeks about how to handle the virus situation,” one student, who requested anonymity, said.
The surrounding area of Macquarie Park was
the first hotspot for coronavirus community transmission in Sydney
. Macquarie’s mid-semester Easter break, due to take place between April 13-26, will now be a normal teaching period. Staff have been encouraged to work from home and non-essential events will been cancelled.
The University of NSW on Tuesday confirmed a third student tested positive for COVID-19 and had exhibited mild symptoms while in a three-hour evening science class last week.
UNSW is
making a quick transition to online learning
and its law school, which does not routinely record its lectures, will cease face-to-face lessons from Wednesday to move all classes online.
“We are currently working to finalise details around classes and, down the track, assessment… There will be some changes to the way that courses are delivered, including to involve more online activities as a replacement for classroom face-to-face discussions. Those are likely to evolve over the term,” acting head of the law school Melanie Schwartz wrote to students on Monday.
Unis try to avoid infection in share accommodation
Universities around the country are also seeking advice from public health experts on how to minimise the spread of COVID-19 among students living in campus accommodation and other share housing.
The Australian National University said many students congregated in residential halls “so across our student residences we’re implementing considered social distancing measures for our dining halls, kitchens and self-catering”.
“We’ve also reformatted social and academic support events and activities so that they have smaller numbers (25 people or less) and in some cases, these will be offered as online connections – we’re being creative to ensure pastoral care and community wellbeing are maintained,” a spokesperson said.
Loading
University of Sydney-owned student accommodation and residential colleges have also started additional cleaning and sanitation of common areas. Students are being provided with hand sanitisers, tissues and face masks and advised to keep a physical distance from each other. Housing
is being provided
to students that live in university-owned accommodation who need to self isolate.
Dining times have also been extended to allow students to stagger their meals and restrict access to communal utensils.
The University of NSW has asked students who need to self-isolate to avoid communal areas and avoid sharing utensils and tea towels. They have been advised to use separate bathroom and kitchen facilities and to regularly clean shared facilities. They should also wear a surgical mask while in the same room with any other people.
Natassia is the education reporter for The Sydney Morning Herald.
Anna Patty is a Senior Writer for The Sydney Morning Herald with a focus on higher education. She is a former Workplace Editor, Education Editor, State Political Reporter and Health Reporter.
Most Viewed in National
Loading
Try the <p> tag
Using the <p> tag is a common way to extract the main content of online news articles. BUT, do not expect this to always give you what you want.
In [ ]:
# The findAll() method returns all the specified tags, it is the same as find_all()
# Set text=True will return only the specified tags with the text inside, you can try to set text=False to compare the difference
p_tags = soup.findAll('p', text=True)
for i, p_tag in enumerate(p_tags):
print(str(i) + str(p_tag))
0
We’re sorry, this service is currently unavailable. Please try again later.
1
The University of Sydney will suspend all face-to-face teaching from Monday and move fully online while Macquarie University has cancelled classes altogether in order to make the digital transition, revealing one of its students tested positive for COVID-19.
2
The University of Sydney’s 10,000 staff members have been encouraged to work remotely to slow the spread of coronavirus, but the campus Wi-Fi network and facilities will remain open with enhanced cleaning protocols and social distancing measures.
3
Courses with labs and practical components will be adapted for online or suspended until later in the semester while clinical placements for health students will go ahead under strict guidelines, Vice-Chancellor Michael Spence wrote to staff on Tuesday afternoon.
4
“We’ve put a lot of effort and thought into how to do it. I think this is a tremendous opportunity. This could be an interesting pedagogical experiment,” he said.
5
“At the moment we have to spend money to put education online to make sure our students’ experience is uninterrupted as possible. There has been an overwhelming response from students that [is what they want].”
6
Universities Australia deputy chief executive officer Anne-Marie Lansdown said 39 universities were providing online learning where possible amid the coronavirus pandemic. She said the challenge was increasing the amount of content that can be put online in a very short period of time.
7
“For some courses it will be much harder – especially those with significant practical or technical requirements,” Ms Lansdown said. “In the case of practical-based learning, where online may not be possible, universities are offering maximum flexibility, including delaying or deferring those components.”
8
Macquarie Vice-Chancellor Professor Bruce Dowton said in an email to staff on Tuesday morning that face-to-face and online teaching will be suspended for 12 days from Wednesday while the university transitions to online delivery of lectures and seminars.
9
“It will also allow us to redesign campus-based delivery of our units to modes that support social distancing and remote support,” he said.
10
Hours later, Macquarie confirmed a student had tested positive for COVID-19 the day before, and that several locations on campus had been cleaned overnight.
11
“The current advice is that the rest of campus can continue to operate as normal after the completion of intensive cleaning operations and in line with … moving to increase online delivery of educational programs,” a spokesperson said.
12
But students were concerned they had not been told where the infected student had been on campus and whether some needed to self-isolate.
13
“Many of the students are shocked that we could be left in the dark and that the university has not been able to make decisions for weeks about how to handle the virus situation,” one student, who requested anonymity, said.
14
The University of NSW on Tuesday confirmed a third student tested positive for COVID-19 and had exhibited mild symptoms while in a three-hour evening science class last week.
15
Universities around the country are also seeking advice from public health experts on how to minimise the spread of COVID-19 among students living in campus accommodation and other share housing.
16
The Australian National University said many students congregated in residential halls “so across our student residences we’re implementing considered social distancing measures for our dining halls, kitchens and self-catering”.
17
“We’ve also reformatted social and academic support events and activities so that they have smaller numbers (25 people or less) and in some cases, these will be offered as online connections – we’re being creative to ensure pastoral care and community wellbeing are maintained,” a spokesperson said.
18
Dining times have also been extended to allow students to stagger their meals and restrict access to communal utensils.
19
The University of NSW has asked students who need to self-isolate to avoid communal areas and avoid sharing utensils and tea towels. They have been advised to use separate bathroom and kitchen facilities and to regularly clean shared facilities. They should also wear a surgical mask while in the same room with any other people.
20
Natassia is the education reporter for The Sydney Morning Herald.
21
Anna Patty is a Senior Writer for The Sydney Morning Herald with a focus on higher education. She is a former Workplace Editor, Education Editor, State Political Reporter and Health Reporter.
Punctuation removal
First, let’s try to remove punctuation by using an exhaustive list of symbols!
In [ ]:
puncts = [',', '.', '"', ':', ')', '(', '-', '!', '?', '|', ';', "'", '$', '&', '/', '[', ']', '>', '%', '=', '#', '*', '+', '\\', '•', '~', '@', '£',
'·', '_', '{', '}', '©', '^', '®', '`', '<', '→', '°', '€', '™', '›', '♥', '←', '×', '§', '″', '′', 'Â', '█', '½', 'à', '…',
'“', '★', '”', '–', '●', 'â', '►', '−', '¢', '²', '¬', '░', '¶', '↑', '±', '¿', '▾', '═', '¦', '║', '―', '¥', '▓', '—', '‹', '─',
'▒', ':', '¼', '⊕', '▼', '▪', '†', '■', '’', '▀', '¨', '▄', '♫', '☆', 'é', '¯', '♦', '¤', '▲', 'è', '¸', '¾', 'Ã', '⋅', '‘', '∞',
'∙', ')', '↓', '、', '│', '(', '»', ',', '♪', '╩', '╚', '³', '・', '╦', '╣', '╔', '╗', '▬', '❤', 'ï', 'Ø', '¹', '≤', '‡', '√', ]
def remove_punctuation(x):
    x = str(x)
    for punct in puncts:
        if punct in x:
            x = x.replace(punct, '')
    return x

text = "It's a nice day[]"
print(remove_punctuation(text))
Its a nice day
Alternatively, what about using regular expressions (re package)?
In [ ]:
import re
def remove_punctuation_re(x):
    x = re.sub(r'[^\w\s]', '', x)
    return x
text = "It's a nice day[]"
print(remove_punctuation_re(text))
Its a nice day
OK. Then what about emoticons such as :) or :D or :( ? Some tasks may require you to keep emoticons, e.g. sentiment analysis on tweets.
In [ ]:
#you can find the solution from the TweetTokenizer https://www.nltk.org/_modules/nltk/tokenize/casual.html#TweetTokenizer (search "EMOTICONS" in the page)
EMOTICONS = r"""
(?:
[<>]?
[:;=8] # eyes
[\-o\*\']? # optional nose
[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
|
[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
[\-o\*\']? # optional nose
[:;=8] # eyes
[<>]?
|
<3 # heart
)"""
Have a look at some contractions of words! Contractions include punctuation - how would you handle them?
In [ ]:
# These are just common English contractions. There are many edge cases, e.g. "University's working on it."
contraction_dict = {"ain't": "is not", "aren't": "are not","can't": "cannot", "'cause": "because", "could've": "could have",
"couldn't": "could not", "didn't": "did not", "doesn't": "does not", "don't": "do not", "hadn't": "had not",
"hasn't": "has not", "haven't": "have not", "he'd": "he would","he'll": "he will", "he's": "he is", "how'd": "how did",
"how'd'y": "how do you", "how'll": "how will", "how's": "how is", "I'd": "I would", "I'd've": "I would have",
"I'll": "I will", "I'll've": "I will have","I'm": "I am", "I've": "I have", "i'd": "i would", "i'd've": "i would have",
"i'll": "i will", "i'll've": "i will have","i'm": "i am", "i've": "i have", "isn't": "is not", "it'd": "it would",
"it'd've": "it would have", "it'll": "it will", "it'll've": "it will have","it's": "it is", "let's": "let us",
"ma'am": "madam", "mayn't": "may not", "might've": "might have","mightn't": "might not","mightn't've": "might not have",
"must've": "must have", "mustn't": "must not", "mustn't've": "must not have", "needn't": "need not", "needn't've": "need not have",
"o'clock": "of the clock", "oughtn't": "ought not", "oughtn't've": "ought not have", "shan't": "shall not", "sha'n't": "shall not",
"shan't've": "shall not have", "she'd": "she would", "she'd've": "she would have", "she'll": "she will", "she'll've": "she will have",
"she's": "she is", "should've": "should have", "shouldn't": "should not", "shouldn't've": "should not have", "so've": "so have",
"so's": "so as", "this's": "this is","that'd": "that would", "that'd've": "that would have", "that's": "that is", "there'd": "there would",
"there'd've": "there would have", "there's": "there is", "here's": "here is","they'd": "they would", "they'd've": "they would have",
"they'll": "they will", "they'll've": "they will have", "they're": "they are", "they've": "they have", "to've": "to have", "wasn't": "was not",
"we'd": "we would", "we'd've": "we would have", "we'll": "we will", "we'll've": "we will have", "we're": "we are", "we've": "we have",
"weren't": "were not", "what'll": "what will", "what'll've": "what will have", "what're": "what are", "what's": "what is", "what've": "what have",
"when's": "when is", "when've": "when have", "where'd": "where did", "where's": "where is", "where've": "where have", "who'll": "who will",
"who'll've": "who will have", "who's": "who is", "who've": "who have", "why's": "why is", "why've": "why have", "will've": "will have",
"won't": "will not", "won't've": "will not have", "would've": "would have", "wouldn't": "would not", "wouldn't've": "would not have",
"y'all": "you all", "y'all'd": "you all would","y'all'd've": "you all would have","y'all're": "you all are","y'all've": "you all have",
"you'd": "you would", "you'd've": "you would have", "you'll": "you will", "you'll've": "you will have", "you're": "you are", "you've": "you have"}
Stopwords removal
Stopwords are the most common words in any natural language. For the purpose of analyzing text data and building NLP models, these stopwords might not add much value to the meaning of the document. Generally, the most common words used in a text are “the”, “is”, “in”, “for”, “where”, “when”, “to”, “at” etc.
In [ ]:
# You must be familiar with it already since we've tried this in Lab 1
import nltk
nltk.download('punkt')
nltk.download('stopwords')
from nltk.corpus import stopwords as sw
from nltk.tokenize import word_tokenize
my_sent = "Natural Language Processing is fun but challenging."
tokens = word_tokenize(my_sent)
stop_words = sw.words()
filtered_sentence = [w for w in tokens if not w in stop_words]
print(filtered_sentence)
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
['Natural', 'Language', 'Processing', 'fun', 'challenging', '.']
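A small variant worth knowing about (assuming your text is English): sw.words() with no argument concatenates the stopword lists for every language NLTK ships, which is slower and may remove words you want to keep, so you can pass the language explicitly.
In [ ]:
# Same filtering as above, but restricted to the English stopword list
stop_words_en = set(sw.words('english'))
filtered_sentence_en = [w for w in tokens if w.lower() not in stop_words_en]
print(filtered_sentence_en)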
Case-folding
How would you handle case? A common strategy is case-folding: reducing all letters to lower case.
In [ ]:
text = "Hello there!"
#Returns the result of converting all characters in text to lowercase.
print(text.lower())
#do we need to reduce all letters to lower case?
text2 = "I love University of Sydney :D"
print(text2.lower())
hello there!
i love university of sydney :d
Stemming
Stemming is a process of removing and replacing word suffixes to arrive at a common root form of the word.
• Try various types of NLTK stemmer in demo
• A comparative study of stemming algorithm: Paper Link
In [ ]:
#let's try to test with porter algorithm
from nltk.stem.porter import *
stemmer = PorterStemmer()
plurals = ['caresses', 'flies', 'dies', 'mules', 'denied',
'died', 'agreed', 'owned', 'humbled', 'sized',
'meeting', 'stating', 'siezing', 'itemization',
'sensational', 'traditional', 'reference', 'colonizer',
'plotted']
singles = [stemmer.stem(plural) for plural in plurals]
print(singles)
['caress', 'fli', 'die', 'mule', 'deni', 'die', 'agre', 'own', 'humbl', 'size', 'meet', 'state', 'siez', 'item', 'sensat', 'tradit', 'refer', 'colon', 'plot']
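To see how much the choice of algorithm matters, the short sketch below (our own comparison, using a subset of the words above) runs a few of NLTK's stemmers side by side.
In [ ]:
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer
# Each stemmer uses different suffix-stripping rules, so the outputs differ
words = ['flies', 'denied', 'itemization', 'sensational', 'reference', 'colonizer']
stemmers = {'porter': PorterStemmer(), 'lancaster': LancasterStemmer(), 'snowball': SnowballStemmer('english')}
for name, st in stemmers.items():
    print(name, [st.stem(w) for w in words])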
Lemmatisation
Lemmatisation is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.
In [ ]:
#by NLTK Wordnet
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("rocks"))
print(lemmatizer.lemmatize("python"))
print(lemmatizer.lemmatize("better", pos="a"))
print(lemmatizer.lemmatize("best", pos="a"))
print(lemmatizer.lemmatize("run"))
print(lemmatizer.lemmatize("run",'v'))
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Unzipping corpora/wordnet.zip.
cat
cactus
goose
rock
python
good
best
run
run
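Notice that without a pos argument the lemmatizer treats every word as a noun. One way to supply the tag automatically (a sketch, not the only way; the penn_to_wordnet helper and the averaged_perceptron_tagger download are our additions) is to run nltk.pos_tag first and map the Penn Treebank tags onto WordNet's categories.
In [ ]:
import nltk
from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
nltk.download('averaged_perceptron_tagger')
def penn_to_wordnet(tag):
    # Map Penn Treebank tags (from pos_tag) onto the four WordNet POS categories
    if tag.startswith('J'):
        return wordnet.ADJ
    if tag.startswith('V'):
        return wordnet.VERB
    if tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN
lemmatizer = WordNetLemmatizer()
tagged = pos_tag(word_tokenize("The striped bats are hanging on their feet"))
print([lemmatizer.lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged])
# e.g. ['The', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'foot']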
Tokenisation
Given a character sequence and a defined document unit (word, sentence etc.), tokenisation is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.
Try various types of NLTK Tokenizer in demo.
NLTK Tokeniser API Doc
TweetTokenizer: Twitter-aware tokeniser
In [ ]:
from nltk.tokenize import TweetTokenizer
tknzr = TweetTokenizer()
s0 = "I am so happy :) ;)"
print(tknzr.tokenize(s0))
s0 = "I am so sad :("
print(tknzr.tokenize(s0))
['I', 'am', 'so', 'happy', ':)', ';)']
['I', 'am', 'so', 'sad', ':(']
TreebankWordTokenizer
The Treebank tokenizer uses regular expressions to tokenize text as in Penn Treebank.
In [ ]:
from nltk.tokenize import TreebankWordTokenizer
tknzr = TreebankWordTokenizer()
s0 = "I am so happy :) ;)"
print(tknzr.tokenize(s0))
s0 = "I am so sad :("
print(tknzr.tokenize(s0))
['I', 'am', 'so', 'happy', ':', ')', ';', ')']
['I', 'am', 'so', 'sad', ':', '(']
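Tokenisation is not only word-level. If the document unit is a sentence, NLTK's sent_tokenize can be applied first; a small sketch (the example text is made up):
In [ ]:
from nltk.tokenize import sent_tokenize, word_tokenize
text = "Macquarie Uni suspends teaching for 10 days. Learning will move online!"
sentences = sent_tokenize(text)
print(sentences)
print([word_tokenize(s) for s in sentences])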
Word Cloud
• Word Cloud
• Wikipedia Python
In [ ]:
!pip install wikipedia
Collecting wikipedia
Downloading https://files.pythonhosted.org/packages/67/35/25e68fbc99e672127cc6fbb14b8ec1ba3dfef035bf1e4c90f78f24a80b7d/wikipedia-1.4.0.tar.gz
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.7/dist-packages (from wikipedia) (4.6.3)
Requirement already satisfied: requests<3.0.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from wikipedia) (2.23.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (1.24.3)
Building wheels for collected packages: wikipedia
Building wheel for wikipedia (setup.py) … done
Created wheel for wikipedia: filename=wikipedia-1.4.0-cp37-none-any.whl size=11686 sha256=5f4a9b115f1e9b3cb35d1cc16dcb9369699cbf061156531aa26081c73a11592c
Stored in directory: /root/.cache/pip/wheels/87/2a/18/4e471fd96d12114d16fe4a446d00c3b38fb9efcb744bd31f4a
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0
In [ ]:
from wordcloud import WordCloud
import wikipedia
# Getting wikipedia contents of “COVID-19_pandemic”
text = wikipedia.page(“COVID-19_pandemic”).content
# Generate a word cloud image
wordcloud = WordCloud().generate(text)
# Display the generated image:
# the matplotlib way:
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation=’bilinear’)
plt.axis(“off”)
plt.show()

Try more word cloud examples: Link
Saving and Loading Models
Saving model
In [ ]:
# Let's train a model first
import torch
import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, input):
        output = self.linear(input)
        return output

no_of_epochs = 500
display_interval = 20
learning_rate = 0.01

# training data
x_training = np.asarray([[1],[2],[5],[8],[9],[12],[14],[16],[18],[20]])
y_training = np.asarray([100,200,501,780,901,1201,1399,1598,1800,2000])
x_data_torch = torch.from_numpy(x_training).float()
y_data_torch = torch.from_numpy(y_training).float()

model = TheModelClass()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(no_of_epochs):
    # zero the parameter gradients
    optimizer.zero_grad()
    # forward + backward + optimize
    outputs = model(x_data_torch)
    loss = torch.sum(torch.pow(outputs - y_data_torch.view(-1,1), 2)) / (2 * x_training.shape[0])
    loss.backward()
    optimizer.step()
    if epoch % display_interval == display_interval - 1:
        print('Epoch: %d, loss: %.3f' % (epoch + 1, loss.item()))
Epoch: 20, loss: 29.082
Epoch: 40, loss: 28.009
Epoch: 60, loss: 27.043
Epoch: 80, loss: 26.173
Epoch: 100, loss: 25.390
Epoch: 120, loss: 24.684
Epoch: 140, loss: 24.048
Epoch: 160, loss: 23.476
Epoch: 180, loss: 22.960
Epoch: 200, loss: 22.496
Epoch: 220, loss: 22.077
Epoch: 240, loss: 21.700
Epoch: 260, loss: 21.361
Epoch: 280, loss: 21.056
Epoch: 300, loss: 20.780
Epoch: 320, loss: 20.532
Epoch: 340, loss: 20.309
Epoch: 360, loss: 20.108
Epoch: 380, loss: 19.927
Epoch: 400, loss: 19.764
Epoch: 420, loss: 19.617
Epoch: 440, loss: 19.485
Epoch: 460, loss: 19.366
Epoch: 480, loss: 19.258
Epoch: 500, loss: 19.162
In [ ]:
# Now we save the trained model to the file named 'filename.pt'
torch.save(model, 'filename.pt')
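torch.save(model, ...) above pickles the whole model object. An alternative you will often see (a sketch, not required for this lab; the file name filename_state.pt is just an example) is to save only the parameters via the state_dict and rebuild the architecture before loading:
In [ ]:
# Save only the learned parameters (state_dict) instead of the whole pickled object
torch.save(model.state_dict(), 'filename_state.pt')
# To load, re-create the architecture first, then copy the parameters in
model_from_state = TheModelClass()
model_from_state.load_state_dict(torch.load('filename_state.pt'))
model_from_state.eval()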
Loading model
In [ ]:
# Please note that you have to run the code defining TheModelClass in the section above before you can load the weights from the saved model file
the_saved_model = torch.load('filename.pt')
the_saved_model.eval()
Out[ ]:
TheModelClass(
(linear): Linear(in_features=1, out_features=1, bias=True)
)
In [ ]:
prediction = the_saved_model(x_data_torch).detach().numpy()
for i in range(len(y_training)):
    print('X: %d, Y_true: %d, Y_predict: %.3f' % (x_training[i][0], y_training[i], prediction[i][0]))
X: 1, Y_true: 100, Y_predict: 99.670
X: 2, Y_true: 200, Y_predict: 199.565
X: 5, Y_true: 501, Y_predict: 499.250
X: 8, Y_true: 780, Y_predict: 798.935
X: 9, Y_true: 901, Y_predict: 898.831
X: 12, Y_true: 1201, Y_predict: 1198.516
X: 14, Y_true: 1399, Y_predict: 1398.306
X: 16, Y_true: 1598, Y_predict: 1598.097
X: 18, Y_true: 1800, Y_predict: 1797.887
X: 20, Y_true: 2000, Y_predict: 1997.677
How to Save (Upload) the model to your Google Drive
There are various ways to upload files to Google Drive.
This tutorial will guide you through saving files to your Google Drive; a minimal Colab sketch for option 1 is given after the list below.
1. Mounting Google Drive locally
2. Create a new Drive file
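For option 1, a minimal sketch in Colab might look like the following (the target folder and file name are just examples):
In [ ]:
# Mount your Google Drive inside the Colab VM (you will be asked to authorise access)
from google.colab import drive
drive.mount('/content/drive')
# Once mounted, Drive behaves like a normal folder, so the model can be saved there
torch.save(model, '/content/drive/My Drive/filename.pt')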
Bi-LSTM with Hidden State Extraction
The following image represents a Bi-LSTM for an N-to-1 task. In an N-to-1 task, it is usually required to extract the last hidden states of the forward and backward LSTMs and combine (concatenate) them. (Please check the Lecture 5 recording!) A small shape-check sketch follows the diagram below.
Bi-LSTM: Bidirectional LSTM, which means the signal propagates backward as well as forward in time.

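As a quick shape check before the full model below, here is a tiny sketch (the sizes are toy values we picked) of pulling the last forward and backward hidden states out of a bidirectional nn.LSTM and concatenating them:
In [ ]:
import torch
import torch.nn as nn
# One-layer bidirectional LSTM over a toy batch
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(4, 10, 8)                         # (batch, seq_len, emb_dim)
output, (h_n, c_n) = lstm(x)                      # h_n: (num_layers * 2, batch, hidden)
last_hidden = torch.cat((h_n[0], h_n[1]), dim=1)  # (batch, 2 * hidden)
print(output.shape, h_n.shape, last_hidden.shape)
# torch.Size([4, 10, 32]) torch.Size([2, 4, 16]) torch.Size([4, 32])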
nn.Embedding
In lab4 E2, we provide the embeddings for each token in each sentence. These are constructed by the pre-trained word embedding model. For example, if the sequence length of the corpus is 8 (think about why we want a uniform sequence length for the whole dataset), the embedding for the sentence “i am crazy in love” should be $[W_{i}, W_{am}, W_{crazy},W_{in}, W_{love}, W_{[PAD]}, W_{[PAD]}, W_{[PAD]}]$(if you choose post-padding) or $[W_{[PAD]}, W_{[PAD]},W_{[PAD]}, W_{i}, W_{am}, W_{crazy},W_{in}, W_{love}]$(if you choose pre-padding).
Therefore, after getting the embedding of each sentence, you will have a tensor of shape (train_size, seq_length, emb_dimension). However, if these three values are large, you might run into an Out-Of-Memory (OOM) problem due to limited CPU/GPU memory.
One solution is using nn.Embedding as a lookup table to get the embedding for each token/word during the training process instead of generating them all beforehand. (You should have already seen it before in the lab4 E2 sample solution).
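Before the full example, a minimal sketch (toy sizes of our own choosing) of nn.Embedding acting purely as a lookup table from token indices to vectors:
In [ ]:
import torch
import torch.nn as nn
# A vocabulary of 10 tokens, each mapped to a 4-dimensional vector
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
token_ids = torch.tensor([[4, 6, 9, 0]])  # (batch=1, seq_len=4) of word indices
vectors = emb(token_ids)                  # shape (1, 4, 4): one vector per token id
print(vectors.shape)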
In [ ]:
# Toy Data for sentiment analysis
sentences = [[‘i’,’like’,’that’],
[‘i’,’love’,’it’],
[‘i’,’hate’,’that’],
[‘i’,’do’,’not’,’like’,’it’]]
labels = [“Positive”,”Positive”,”Negative”,”Negative”]
In [ ]:
# Set is a hashtable in python
word_set = set()
for sent in sentences:
    for word in sent:
        word_set.add(word)

# Sometimes you can use the same token to represent PAD and UNKNOWN if you just want to set them both to all zeros
word_set.add('[PAD]')
word_set.add('[UNKNOWN]')

word_list = list(word_set)
# Although in some Python versions converting a set to a list returns an ordered result,
# it is still highly recommended to sort this list to ensure the reproducibility of your code
word_list.sort()
print(word_list)

word_index = {}
ind = 0
for word in word_list:
    word_index[word] = ind
    ind += 1
print(word_index)
['[PAD]', '[UNKNOWN]', 'do', 'hate', 'i', 'it', 'like', 'love', 'not', 'that']
{'[PAD]': 0, '[UNKNOWN]': 1, 'do': 2, 'hate': 3, 'i': 4, 'it': 5, 'like': 6, 'love': 7, 'not': 8, 'that': 9}
In [ ]:
# Convert the sentences to the word index
len_list = [len(s) for s in sentences]
seq_length = max(len_list)

def encode_and_add_padding(sentences, seq_length, word_index):
    sent_encoded = []
    for sent in sentences:
        temp_encoded = [word_index[word] for word in sent]
        if len(temp_encoded) < seq_length:
            temp_encoded += [word_index['[PAD]']] * (seq_length - len(temp_encoded))
        sent_encoded.append(temp_encoded)
    return sent_encoded

sent_encoded = encode_and_add_padding(sentences, seq_length, word_index)
print(sent_encoded)
[[4, 6, 9, 0, 0], [4, 7, 5, 0, 0], [4, 3, 9, 0, 0], [4, 2, 8, 6, 5]]
In [ ]:
# Download Pre-trained Embedding
import gensim.downloader as api
word_emb_model = api.load("glove-twitter-25")
[==================================================] 100.0% 104.8/104.8MB downloaded
In [ ]:
# Create the Embedding lookup table
import numpy as np
emb_dim = word_emb_model.vector_size
emb_table = []
for i, word in enumerate(word_list):
    if word in word_emb_model:
        emb_table.append(word_emb_model[word])
    else:
        emb_table.append([0]*emb_dim)
emb_table = np.array(emb_table)
# print(emb_table)
In [ ]:
# LabelEncoder can help us encode target labels with value between 0 and n_classes-1.
# Details can be found from: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
from sklearn.preprocessing import LabelEncoder
lEnc = LabelEncoder()
lEnc.fit(labels)
label_encoded= lEnc.transform(labels)
print(label_encoded)
[1 1 0 0]
In [ ]:
vocab_size = len(word_list)
unique_labels = np.unique(labels)
n_class = len(unique_labels)
n_hidden = 32
learning_rate = 0.01
total_epoch = 10
In [ ]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import accuracy_score

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Bi_LSTM_Emb(nn.Module):
    def __init__(self):
        super(Bi_LSTM_Emb, self).__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Initialize the Embedding layer with the lookup table we created
        self.emb.weight.data.copy_(torch.from_numpy(emb_table))
        # Optional: set requires_grad = False to make this lookup table untrainable
        self.emb.weight.requires_grad = False
        self.lstm = nn.LSTM(emb_dim, n_hidden, batch_first=True, bidirectional=True)
        self.linear = nn.Linear(n_hidden*2, n_class)

    def forward(self, x):
        # Get the embedded tensor
        x = self.emb(x)
        # we will use the returned h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len.
        # details of the outputs from nn.LSTM can be found at: https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
        lstm_out, (h_n, c_n) = self.lstm(x)
        # concat the last hidden states from the two directions
        hidden_out = torch.cat((h_n[0,:,:], h_n[1,:,:]), 1)
        z = self.linear(hidden_out)
        return z

# Move the model to GPU
model = Bi_LSTM_Emb().to(device)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Preparing input
input_torch = torch.from_numpy(np.array(sent_encoded)).to(device)
targe_torch = torch.from_numpy(np.array(label_encoded)).view(-1).to(device)

for epoch in range(total_epoch):
    # Set the flag to training
    model.train()
    # forward + backward + optimize
    optimizer.zero_grad()
    outputs = model(input_torch)
    loss = criterion(outputs, targe_torch)
    loss.backward()
    optimizer.step()

    predicted = torch.argmax(outputs, -1)
    acc = accuracy_score(predicted.cpu().numpy(), targe_torch.cpu().numpy())
    print('Epoch: %d, loss: %.5f, train_acc: %.2f' % (epoch + 1, loss.item(), acc))

print('Finished Training')
Epoch: 1, loss: 0.70972, train_acc: 0.50
Epoch: 2, loss: 0.66151, train_acc: 0.50
Epoch: 3, loss: 0.62344, train_acc: 0.50
Epoch: 4, loss: 0.57242, train_acc: 1.00
Epoch: 5, loss: 0.51628, train_acc: 0.75
Epoch: 6, loss: 0.45956, train_acc: 0.75
Epoch: 7, loss: 0.39498, train_acc: 0.75
Epoch: 8, loss: 0.33047, train_acc: 1.00
Epoch: 9, loss: 0.27055, train_acc: 1.00
Epoch: 10, loss: 0.19957, train_acc: 1.00
Finished Training
In [ ]:
# You can check whether model.emb.weight changed
# You can also try to comment self.emb.weight.requires_grad = False and then train the model and check again
# print(model.emb.weight)
Exercise
E1. Briefly describe the difference between Stemming and Lemmatisation.
Please write down your answer below in your own words with examples
Your answer:
E2. Preprocessing and Model Saving
In this exercise, you are to preprocess the train and test data, and apply different pre-trained embeddings.
Note: We won't mark your exercise based on the test set performance; we will only check whether the preprocessing part and embedding part are correct.
In [ ]:
import torch
#You can enable GPU here (cuda); or just CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Download Dataset
In [ ]:
# Code to download file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
id = '1gNfBqguzBu8cHKMPc8C44GbvD443dNC5'
downloaded = drive.CreateFile({'id':id})
downloaded.GetContentFile('twitter.csv')
import pandas as pd
df = pd.read_csv("twitter.csv")
df_pick = df.sample(400,random_state=24)
raw_text = df_pick["Text"].tolist()
raw_label = df_pick["Label"].tolist()
from sklearn.model_selection import train_test_split
text_train,text_test,label_train,label_test = train_test_split(raw_text,raw_label,test_size=0.25,random_state=42)
Preprocessing [Complete this section]
Case Folding
In [ ]:
text_train = [s.lower() for s in text_train]
text_test = [s.lower() for s in text_test]
Remove punctuations [Please complete this section]
In [ ]:
import re
def remove_punctuation_re(x):
# Please complete this
return x
text_train = [remove_punctuation_re(s) for s in text_train]
text_test = [remove_punctuation_re(s) for s in text_test]
Tokenization [Please complete this section]
In [ ]:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
#Please complete this
text_train =
text_test =
Remove stopwords [Please complete this section]
In [ ]:
nltk.download('stopwords')
from nltk.corpus import stopwords as sw
stop_words = sw.words()
text_train_ns=[]
for tokens in text_train:
filtered_sentence = [w for w in tokens if not w in stop_words]
text_train_ns.append(filtered_sentence)
text_test_ns=[]
for tokens in text_test:
#Please complete this
Lemmatisation [Please complete this section]
In [ ]:
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
text_train_le = []
for tokens in text_train_ns:
lemma_sentence = [lemmatizer.lemmatize(w) for w in tokens ]
text_train_le.append(lemma_sentence)
text_test_le = []
for tokens in text_test_ns:
#Please complete this
Label Encoding [Please complete this section]
In [ ]:
import numpy as np
from sklearn.preprocessing import LabelEncoder
unique_labels = np.unique(label_train)
lEnc = LabelEncoder()
# Please encode the labels (Do NOT add new lines of code in this section)
# Hint: Try to understand the difference between fit_transform and transform
label_train_encoded =
label_test_encoded =
n_class = len(unique_labels)
print(unique_labels)
print(lEnc.transform(unique_labels))
Embeddings [Complete this section]
Get Word List
In [ ]:
word_set = set()
for sent in text_train_le:
    for word in sent:
        word_set.add(word)

word_set.add('[PAD]')
word_set.add('[UNKNOWN]')

word_list = list(word_set)
word_list.sort()
print(word_list)

word_index = {}
ind = 0
for word in word_list:
    word_index[word] = ind
    ind += 1
print(word_index)
padding and encoding [Please complete this section]
In [ ]:
# The sequence length is pre-defined, you can't change this value for this exercise
seq_length = 16
# Please Complete this function
# Hint: You should pay attention to: (1) if the sentence length > seq_length (2) if the word not in word_index dictionary
def encode_and_add_padding(sentences, seq_length, word_index):
sent_encoded = []
return sent_encoded
train_pad_encoded = encode_and_add_padding(text_train_le, seq_length, word_index )
test_pad_encoded = encode_and_add_padding(text_test_le, seq_length, word_index )
Download Embeddings [Please complete this section]
You can find the details from https://github.com/RaRe-Technologies/gensim-data
In [ ]:
import gensim.downloader as api
word_emb_model = api.load("xxx") # Download an embedding other than glove-twitter-25
Get embeddings
In [ ]:
# Get the Embedding lookup table
import numpy as np
emb_dim = word_emb_model.vector_size
emb_table = []
for i, word in enumerate(word_list):
    if word in word_emb_model:
        emb_table.append(word_emb_model[word])
    else:
        emb_table.append([0]*emb_dim)
emb_table = np.array(emb_table)
Model
In [ ]:
vocab_size = len(word_list)
n_hidden = 50
total_epoch = 100
learning_rate = 0.01
In [ ]:
import torch
# You can enable GPU here (cuda); or just CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import accuracy_score

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.emb.weight.data.copy_(torch.from_numpy(emb_table))
        self.emb.weight.requires_grad = False
        self.lstm = nn.LSTM(emb_dim, n_hidden, num_layers=2, batch_first=True, dropout=0.2)
        self.linear = nn.Linear(n_hidden, n_class)

    def forward(self, x):
        x = self.emb(x)
        x, _ = self.lstm(x)
        x = self.linear(x[:,-1,:])
        return x

model = Model().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

input_torch = torch.from_numpy(np.array(train_pad_encoded)).to(device)
target_torch = torch.from_numpy(np.array(label_train_encoded)).view(-1).to(device)

for epoch in range(total_epoch):
    model.train()
    optimizer.zero_grad()
    outputs = model(input_torch)
    loss = criterion(outputs, target_torch)
    loss.backward()
    optimizer.step()

    if epoch % 10 == 9:
        predicted = torch.argmax(outputs, -1)
        acc = accuracy_score(predicted.cpu().numpy(), target_torch.cpu().numpy())
        print('Epoch: %d, loss: %.5f, train_acc: %.2f' % (epoch + 1, loss.item(), acc))

print('Finished Training')
Save and Load the model [Complete this section]
Save the model [Complete this part]
In [ ]:
Load the model
In [ ]:
model2 = torch.load('lab5.pt')
model2.eval()
Testing
In [ ]:
input_torch = torch.from_numpy(np.array(test_pad_encoded)).to(device)
outputs = model2(input_torch)
predicted = torch.argmax(outputs, -1)
from sklearn.metrics import classification_report
print(classification_report(label_test_encoded,predicted.cpu().numpy()))
Sample Solution for E2
Download Dataset
In [ ]:
# Code to download file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
id = '1gNfBqguzBu8cHKMPc8C44GbvD443dNC5'
downloaded = drive.CreateFile({'id':id})
downloaded.GetContentFile('twitter.csv')
import pandas as pd
df = pd.read_csv("twitter.csv")
df_pick = df.sample(400,random_state=24)
raw_text = df_pick["Text"].tolist()
raw_label = df_pick["Label"].tolist()
from sklearn.model_selection import train_test_split
text_train,text_test,label_train,label_test = train_test_split(raw_text,raw_label,test_size=0.25,random_state=42)
Preprocessing
Case-Folding
In [ ]:
text_train = [s.lower() for s in text_train]
text_test = [s.lower() for s in text_test]
Remove punctuations
In [ ]:
import re
def remove_punctuation_re(x):
    x = re.sub(r'[^\w\s]', '', x)
    return x
text_train = [remove_punctuation_re(s) for s in text_train]
text_test = [remove_punctuation_re(s) for s in text_test]
Tokenization
In [ ]:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
text_train = [word_tokenize(s) for s in text_train]
text_test = [word_tokenize(s) for s in text_test]
[nltk_data] Downloading package punkt to /root/nltk_data…
[nltk_data] Package punkt is already up-to-date!
Remove stopwords
In [ ]:
nltk.download('stopwords')
from nltk.corpus import stopwords as sw
stop_words = sw.words()

text_train_ns = []
for tokens in text_train:
    filtered_sentence = [w for w in tokens if not w in stop_words]
    text_train_ns.append(filtered_sentence)

text_test_ns = []
for tokens in text_test:
    filtered_sentence = [w for w in tokens if not w in stop_words]
    text_test_ns.append(filtered_sentence)
[nltk_data] Downloading package stopwords to /root/nltk_data…
[nltk_data] Package stopwords is already up-to-date!
Lemmatisation
In [ ]:
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

text_train_le = []
for tokens in text_train_ns:
    lemma_sentence = [lemmatizer.lemmatize(w) for w in tokens]
    text_train_le.append(lemma_sentence)

text_test_le = []
for tokens in text_test_ns:
    lemma_sentence = [lemmatizer.lemmatize(w) for w in tokens]
    text_test_le.append(lemma_sentence)
[nltk_data] Downloading package wordnet to /root/nltk_data…
[nltk_data] Package wordnet is already up-to-date!
Label Encoding
In [ ]:
import numpy as np
from sklearn.preprocessing import LabelEncoder
unique_labels = np.unique(label_train)
lEnc = LabelEncoder()
# Please encode the labels (Do NOT add new lines of code in this section)
label_train_encoded = lEnc.fit_transform(label_train)
label_test_encoded = lEnc.transform(label_test)
n_class = len(unique_labels)
print(unique_labels)
print(lEnc.transform(unique_labels))
['none' 'racism' 'sexism']
[0 1 2]
Embeddings
Get Word list
In [ ]:
# Set is a hashtable in python
word_set = set()
for sent in text_train_le:
    for word in sent:
        word_set.add(word)

# Sometimes you can use the same token to represent PAD and UNKNOWN if you just want to set them both to all zeros
word_set.add('[PAD]')
word_set.add('[UNKOWN]')

word_list = list(word_set)
# Although in some Python versions converting a set to a list returns an ordered result,
# it is still highly recommended to sort this list to ensure the reproducibility of your code
word_list.sort()
print(word_list)

word_index = {}
ind = 0
for word in word_list:
    word_index[word] = ind
    ind += 1
print(word_index)
[‘0′, ’06jank’, ‘0xjared’, ‘1’, ’11’, ’12’, ’14’, ‘1400’, ’15’, ’17’, ‘1shadeofritch’, ‘2’, ‘2027279099’, ‘22000’, ‘2ndbestidiot’, ‘3’, ‘3outof10’, ‘4’, ’44’, ’47’, ‘4x’, ‘5’, ‘6’, ‘7’, ’80’, ‘800’, ’90’, ‘911’, ’98halima’, ’99’, ‘[PAD]’, ‘[UNKOWN]’, ‘__chris33__’, ‘_marisajane’, ‘abducted’, ‘abdul_a95’, ‘aberration’, ‘ability’, ‘ablahad’, ‘able’, ‘absolutely’, ‘abuse’, ‘ac360’, ‘accept’, ‘acceptable’, ‘accepted’, ‘accessorizing’, ‘account’, ‘achieve’, ‘across’, ‘actoractress’, ‘actually’, ‘adjective’, ‘admits’, ‘adult’, ‘afar’, ‘afraid’, ‘ago’, ‘agree’, ‘ahahahaha’, ‘air’, ‘airstrikes’, ‘aisle’, ‘ajwatamr’, ‘akheemv’, ‘aledthomas22’, ‘alihadi68’, ‘alihashem_tv’, ‘alive’, ‘all_hailcaesar’, ‘allegedly’, ‘allstatejackie’, ‘ally’, ‘along’, ‘already’, ‘alternet’, ‘amaze’, ‘amazing’, ‘amazingly’, ‘amberhasalamb’, ‘ameliagreenhall’, ‘american’, ‘amohedin’, ‘amp’, ‘amymek’, ‘anasmechch’, ‘andcamping’, ‘andre’, ‘angelemichelle’, ‘anitaingle’, ‘annoying’, ‘another’, ‘answer’, ‘anti’, ‘antiharassment’, ‘antizholim’, ‘anyone’, ‘anything’, ‘apartheid’, ‘arab_fury’, ‘arabic’, ‘arabthomness’, ‘archangel_dux’, ‘arena’, ‘argh’, ‘argonblue’, ‘argument’, ‘armedyoure’, ‘armenian’, ‘armpit’, ‘army’, ‘arquette’, ‘article’, ‘asad’, ‘asem_1994’, ‘ask’, ‘asked’, ‘askgoog’, ‘askhermore’, ‘asshole’, ‘astounding’, ‘athlete’, ‘attack’, ‘attacked’, ‘attacking’, ‘attracted’, ‘attractive’, ‘attributing’, ‘attrocities’, ‘auntysoapbox’, ‘australia’, ‘away’, ‘awful’, ‘awkward’, ‘awww’, ‘b’, ‘babbling’, ‘back’, ‘backing’, ‘backwards’, ‘bad’, ‘bag’, ‘baghdad’, ‘bahai144’, ‘bamboozled’, ‘banter’, ‘barackobama’, ‘basically’, ‘bastendorfgames’, ‘batchelorshow’, ‘bcz’, ‘beadsland’, ‘beat’, ‘beautiful’, ‘beautifula’, ‘beavis’, ‘beckles’, ‘begun’, ‘behaving’, ‘behead’, ‘behind’, ‘belief’, ‘believe’, ‘benkuchera’, ‘best’, ‘better’, ‘beyond’, ‘bgs’, ‘bhamdailynews’, ‘biebervalue’, ‘big’, ‘bigot’, ‘bigotry’, ‘bigtime’, ‘bilalighumman’, ‘bill’, ‘bimbo’, ‘bimbolines’, ‘bit’, ‘bitch’, ‘bixs’, ‘blabber’, ‘blackopal80’, ‘block’, ‘blocked’, ‘bloke’, ‘blonde’, ‘blondemoment’, ‘blow’, ‘blumenthal’, ‘body’, ‘book’, ‘booted’, ‘boring’, ‘bos’, ‘bottle’, ‘bought’, ’bout’, ‘bowl’, ‘boy’, ‘bq281473’, ‘breakfast’, ‘brekky’, ‘bringing’, ‘bristolben’, ‘british’, ‘broke’, ‘bronny25’, ‘bruciebabe’, ‘bruh’, ‘brushyblues’, ‘bsilverstrim77’, ‘budlightbro’, ‘built’, ‘bullied’, ‘burcucekmece’, ‘burn’, ‘burning’, ‘burqua’, ‘bus’, ‘business’, ‘buttercupashby’, ‘butthead’, ‘button’, ‘buying’, ‘c2e2’, ‘call’, ‘cambrian_man’, ‘campagnebds’, ‘canned’, ‘cant’, ‘captive’, ‘car’, ‘card’, ‘case’, ‘cashing’, ‘castrating’, ‘catwalk’, ’cause’, ‘caved’, ‘ccot’, ‘cdnkhadija’, ‘celebrating’, ‘celine’, ‘cemcfarland’, ‘century’, ‘challenge’, ‘chance’, ‘change’, ‘changed’, ‘changing’, ‘channel’, ‘channel7’, ‘chaos’, ‘character’, ‘cheeseplus’, ‘chef’, ‘cheney’, ‘chicken’, ‘child’, ‘chloeandkelly’, ‘choice’, ‘christ’, ‘christian’, ‘christiansyazidis’, ‘christmas’, ‘christophheer52’, ‘chriswarcraft’, ‘chuck’, ‘chuckle’, ‘chuckpfarrer’, ‘churner’, ‘cia’, ‘cityofmandurah’, ‘civilian’, ‘cjsajulga’, ‘claim’, ‘claiming’, ‘clarify’, ‘clearly’, ‘close’, ‘colin’, ‘colins’, ‘colonelkickhead’, ‘come’, ‘comedian’, ‘coming’, ‘comment’, ‘committed’, ‘compensation’, ‘competition’, ‘complains’, ‘completely’, ‘compromise’, ‘concerned’, ‘conclusion’, ‘conducting’, ‘configuration’, ‘confuse’, ‘conserv_miss’, ‘consider’, ‘conspiracy’, ‘constant’, ‘constantly’, ‘constructed’, ‘content’, ‘contradicted’, ‘contributing’, ‘cook’, ‘cooked’, ‘cooking’, ‘cordial’, ‘cornflakes’, ‘could’, ‘country’, 
‘couple’, ‘court’, ‘coworkers’, ‘coz’, ‘crabfest15’, ‘crap’, ‘crash’, ‘created’, ‘creates’, ‘creating’, ‘credibility’, ‘cretin’, ‘cringing’, ‘critiquing’, ‘culture’, ‘cunt’, ‘cup’, ‘curd’, ‘curious’, ‘customer’, ‘cut’, ‘cuz’, ‘cytheria’, ‘d20’, ‘daesh’, ‘damn’, ‘damnitscloudy’, ‘danhickey2199’, ‘danis’, ‘dankmtl’, ‘darchmare’, ‘darrenkopp’, ‘data’, ‘dave’, ‘davidjo52951945’, ‘davidsgallant’, ‘dc’, ‘dead’, ‘deal’, ‘death’, ‘declared’, ‘deconstruct’, ‘deconstructing’, ‘deduction’, ‘defend’, ‘definitely’, ‘delicate’, ‘demanded’, ‘dentist’, ‘describes’, ‘desertfox899’, ‘desire’, ‘desk’, ‘desperately’, ‘dessert’, ‘destined’, ‘destroys’, ‘deusexjuice’, ‘devops’, ‘dianh4’, ‘dick’, ‘dictatorship’, ‘didazahra’, ‘didnt’, ‘difference’, ‘different’, ‘direct’, ‘directhex’, ‘direction’, ‘directly’, ‘discerningmumin’, ‘discrimination’, ‘disgrace’, ‘disgusted’, ‘disgusting’, ‘dish’, ‘disheartened’, ‘disturbing’, ‘divisiveness’, ‘dkim’, ‘doammuslims’, ‘doctor’, ‘doesnt’, ‘dogging’, ‘doh’, ‘dolly’, ‘domestic’, ‘done’, ‘dont’, ‘double’, ‘draskos’, ‘drdisco_’, ‘dream’, ‘dreamer’, ‘dreaminpng’, ‘drive’, ‘drivemaneuveroperate’, ‘driver’, ‘driving’, ‘dropped’, ‘dry’, ‘dubhe80’, ‘duckiemcphee’, ‘dude’, ‘due’, ‘dumb’, ‘dumbest’, ‘dummy’, ‘dye’, ‘dying’, ‘ear’, ‘earth’, ‘easier’, ‘easy’, ‘eat’, ‘ebola’, ‘ebooks’, ‘economics’, ‘econtried’, ‘edgeofthesandbx’, ‘edible’, ‘efficient’, ‘effort’, ‘egypt’, ‘egyptian’, ‘either’, ‘eloisepeace’, ‘else’, ’email’, ’emilie’, ’empty’, ‘endless’, ‘enemy’, ‘engaging’, ‘enough’, ‘enslaved’, ‘entertaining’, ‘entree’, ‘ep’, ‘episode’, ‘equal’, ‘equalpay’, ‘espn’, ‘eternity’, ‘etsho127’, ‘evacuated’, ‘evans’, ‘even’, ‘eventually’, ‘ever’, ‘every’, ‘everyone’, ‘everything’, ‘evidence’, ‘evilsunbro’, ‘ew’, ‘except’, ‘excited’, ‘excuse’, ‘executed’, ‘expect’, ‘expedition’, ‘expertise’, ‘explain’, ‘explained’, ‘exposefalsehood’, ‘expression’, ‘exterminate’, ‘extreme’, ‘eye’, ‘eyesmkr’, ‘ezidi’, ‘ezidipress’, ‘ezidis’, ‘f3ew’, ‘face’, ‘faced’, ‘fact’, ‘faded’, ‘fail’, ‘fair’, ‘fan’, ‘fanatic’, ‘farbenstau’, ‘fascist’, ‘fat’, ‘feardept’, ‘feel’, ‘feeling’, ‘female’, ‘femfreefriday’, ‘feminism’, ‘feminist’, ‘fetch’, ‘fetish’, ‘fewer’, ‘fight’, ‘fighting’, ‘filthy’, ‘finalbroadcast’, ‘finally’, ‘find’, ‘fine’, ‘finger’, ‘first’, ‘flip’, ‘floss’, ‘flying’, ‘folk’, ‘followed’, ‘food’, ‘foodie_ben’, ‘foot’, ‘forbidden’, ‘force’, ‘forced’, ‘ford’, ‘form’, ‘formula’, ‘fought’, ‘foxnews’, ‘foxnewspolitics’, ‘free’, ‘freebsd’, ‘freebsdgirl’, ‘freebsdglri’, ‘freedom’, ‘freezer’, ‘fried’, ‘friend’, ‘fuck’, ‘fucking’, ‘fucktards’, ‘fun’, ‘funding’, ‘fyoudbag’, ‘gailsimone’, ‘gal’, ‘game’, ‘gamergate’, ‘gangraped’, ‘gap’, ‘garbage’, ‘garydlum’, ‘gaters’, ‘gatery’, ‘gator’, ‘gbabeuf’, ‘gbazov’, ‘gel’, ‘genius’, ‘genocide’, ‘get’, ‘getting’, ‘gg’, ‘ggautoblocker’, ‘ggreenwald’, ‘gilmore’, ‘girl’, ‘girlziplocked’, ‘give’, ‘given’, ‘giving’, ‘glad’, ‘glennf’, ‘glhf’, ‘glove’, ‘go’, ‘god’, ‘going’, ‘gon’, ‘gone’, ‘good’, ‘goodluck’, ‘goosenetworkusa’, ‘gosh’, ‘got’, ‘govt’, ‘grafana’, ‘grahamdavida’, ‘great’, ‘greater’, ‘greenlinerzjm’, ‘grin’, ‘gross’, ‘ground’, ‘group’, ‘grow’, ‘grown’, ‘guardian’, ‘gueensland’, ‘guess’, ‘guilt’, ‘gulf’, ‘gumboots’, ‘guy’, ‘hadith’, ‘haha’, ‘hair’, ‘halalflaws’, ‘hamas’, ‘hand’, ‘handle’, ‘happen’, ‘happened’, ‘happens’, ‘happy’, ‘happycampers’, ‘harassed’, ‘harassment’, ‘haroonstyles’, ‘harshly’, ‘hate’, ‘hated’, ‘hatefilled’, ‘hating’, ‘hatred’, ‘havent’, ‘hawaiinshirts’, ‘hayles_comet’, ‘hdmovieus’, ‘he’, ‘head’, ‘heart’, ‘hell’, ‘hello’, ‘help’, ‘helping’, ‘here’, 
‘hide’, ‘hilarious’, ‘himat’, ‘hit’, ‘hitler’, ‘hockey’, ‘holder’, ‘home’, ‘homophobe’, ‘honestly’, ‘hope’, ‘hoping’, ‘hostage’, ‘however’, ‘hows’, ‘howtogetawaywithmurder’, ‘ht’, ‘http’, ‘httpstco6kgw1lejfr’, ‘httpstcoum5svjgazu’, ‘httpstcoxufwsigxfk’, ‘httpt’, ‘httptco’, ‘httptco1pl9gqrdp7’, ‘httptco4u’, ‘httptco5vsf5jroi6’, ‘httptco8xldnwbvzx’, ‘httptcobwr6ap0ooo’, ‘httptcocaxxus108l’, ‘httptcocbcr9u4fc9’, ‘httptcodajgdn1wy3’, ‘httptcoddecobanzx’, ‘httptcofdylhlkdcv’, ‘httptcoganrh4k87a’, ‘httptcogbvojnmbcv’, ‘httptcoglncgkuukp’, ‘httptcoh8f7n04q5o’, ‘httptcoltoxypkwww’, ‘httptcom4jcka5ir0’, ‘httptcom5j2tpksm5’, ‘httptcomdb4iu9whd’, ‘httptcomxuw3hz4tb’, ‘httptconleyqfnkyp’, ‘httptcontojwo4lnt’, ‘httptcoocsy7crghf’, ‘httptcopfs5zlkt07’, ‘httptcopnhzjrhhqr’, ‘httptcoq95ei17sua’, ‘httptcoqaa6bwi4pm’, ‘httptcoqmdsdtfvya’, ‘httptcoqv’, ‘httptcorbthvmh9jj’, ‘httptcospmvzcjj6o’, ‘httptcot65iytpvdk’, ‘httptcotfsodcowbx’, ‘httptcoumkitlb5h9’, ‘httptcouq19q6pnaq’, ‘httptcovanp6y7clr’, ‘httptcowhy3a8o33z’, ‘httptcoyspbfitztb’, ‘httptcozjbwagvnrg’, ‘hugged’, ‘human’, ‘humanistfury’, ‘humanity’, ‘hungrycampers’, ‘hw’, ‘hypatiadotca’, ‘hypocrite’, ‘ice’, ‘id’, ‘idea’, ‘ideaology’, ‘ideaor’, ‘ideology’, ‘idiot’, ‘idiotim’, ‘idontneedfeminism’, ‘ied’, ‘ignorance’, ‘ignorant’, ‘ignoring’, ‘ihatethiskid’, ‘ilivundrurbed’, ‘illegal’, ‘ilovebreakfast’, ‘iloveobama’, ‘imagine’, ‘imperialism’, ‘implies’, ‘important’, ‘info’, ‘information’, ‘input’, ‘instant’, ‘instead’, ‘insufferable’, ‘insulted’, ‘insulting’, ‘integration’, ‘intel’, ‘interact’, ‘interest’, ‘interesting’, ‘international’, ‘internet’, ‘intersection’, ‘intolerance’, ‘invented’, ‘iraq’, ‘iron’, ‘isi’, ‘isisutterly’, ‘islam’, ‘islamdefense’, ‘islamic’, ‘islamist’, ‘isnt’, ‘israel’, ‘israeliregime’, ‘issue’, ‘itll’, ‘itsbariecool’, ‘itsfact’, ‘ive’, ‘ivyexec’, ‘izrinhariri’, ‘jac’, ‘jealous’, ‘jeffreygoldberg’, ‘jennykutner’, ‘jeremiahfelt’, ‘jew’, ‘jhamby’, ‘jihadi_11’, ‘jihadis’, ‘jimcramer’, ‘jncatron’, ‘job’, ‘johncantile’, ‘johnnygjokaj’, ‘johnnyrejection’, ‘joke’, ‘journalist’, ‘judge’, ‘juliet777777’, ‘justdavidvideos’, ‘justhonest’, ‘justkelly_ok’, ‘kaitlynburnell’, ‘kamaluf’, ‘kardashian’, ‘kat’, ‘katampandre’, ‘katandandre’, ‘katie’, ‘katieandnikki’, ‘keep’, ‘keyboard’, ‘keynote’, ‘kid’, ‘kidding’, ‘kill’, ‘killa’, ‘killed’, ‘killerblondes’, ‘killing’, ‘kind’, ‘kirkuk’, ‘kmactane’, ‘knew’, ‘know’, ‘knowingly’, ‘kobane’, ‘kuffir’, ‘kurd’, ‘kurdish’, ‘lactualaloupe’, ‘lad’, ‘lady’, ‘lajouetreine’, ‘large’, ‘last’, ‘latest’, ‘laugh’, ‘launcher’, ‘law’, ‘lazy’, ‘lb’, ‘ldstarr18’, ‘le’, ‘lead’, ‘leaning’, ‘led’, ‘left’, ‘legitimately’, ‘lemon’, ‘lesson’, ‘letting’, ‘level’, ‘liar’, ‘liberate’, ‘libya’, ‘license’, ‘licking’, ‘lie’, ‘lied’, ‘life’, ‘light’, ‘like’, ‘likely’, ‘lilbeastunleash’, ‘lime’, ‘line’, ‘link’, ‘lipstick’, ‘lisamromano’, ‘lissasauras’, ‘listen’, ‘lithobolos’, ‘little’, ‘live’, ‘liver’, ‘lol’, ‘long’, ‘loo’, ‘look’, ‘looked’, ‘looking’, ‘lose’, ‘loser’, ‘lost’, ‘lot’, ‘love’, ‘low’, ‘lt3’, ‘lucaswj’, ‘luck’, ‘lunatic’, ‘lynnemcgranger’, ‘m_m_myers’, ‘maajidnawaz’, ‘mad’, ‘madasahatter_17’, ‘maddr11’, ‘made’, ‘magazine’, ‘magnus919’, ‘maja_stina’, ‘major’, ‘majority’, ‘make’, ‘making’, ‘mami_mermelada’, ‘man’, ‘manbabies’, ‘manu’, ‘manure’, ‘many’, ‘map’, ‘marc_leibowitz’, ‘market’, ‘markimbriaco’, ‘maroon’, ‘marriage’, ‘married’, ‘masontillidie’, ‘math’, ‘matter’, ‘mattstratton’, ‘mattybboi83’, ‘maxblumenthal’, ‘maxcaras’, ‘may’, ‘maybe’, ‘mccheesy904’, ‘mean’, ‘meaning’, ‘meatball’, ‘meatgirls’, ‘mechasauce’, ‘medium’, 
‘meh’, ‘mehdirhasan’, ‘mellym09’, ‘melting’, ‘mention’, ‘menu’, ‘messed’, ‘microbrain’, ‘middle’, ‘mikeage’, ‘military’, ‘militia’, ‘minasmith64’, ‘minister’, ‘minority’, ‘misfitinchains’, ‘miskelayla’, ‘miss’, ‘missed’, ‘missing’, ‘mistertodd’, ‘mkr’, ‘mkr2015’, ‘mkrkat’, ‘mmmm’, ‘model’, ‘moderate’, ‘modern’, ‘mohammed’, ‘monday’, ‘month’, ‘moron’, ‘mosul’, ‘mouth’, ‘moving’, ‘much’, ‘mugnezee’, ‘multiple’, ‘murde’, ‘murder’, ‘murdered’, ‘murtaza’, ‘muslim’, ‘muslimtwo’, ‘mutilated’, ‘myersnfl’, ‘mykitchenrules’, ‘mystrongstate’, ‘nader_haq’, ‘naga’, ‘nainfidels’, ‘naminglisting’, ‘narîn’, ‘nasty’, ‘naturally’, ‘nazi’, ‘near’, ‘necessarily’, ‘need’, ‘needarethinkinformat’, ‘negated’, ‘negotiate’, ‘neilasaurus’, ‘never’, ‘new’, ‘new_babylonia’, ‘newscoverup’, ‘next’, ‘nice’, ‘nigelbigmeech’, ‘night’, ‘nikki’, ‘nobody’, ‘noise’, ‘nomcookiesnom’, ‘none’, ‘nooo’, ‘nope’, ‘notchrissmith’, ‘note’, ‘nothing’, ‘notsexist’, ‘novorossiyan’, ‘nscottg’, ‘number’, ‘number10gov’, ‘nytimes’, ‘obamacare’, ‘obamas’, ‘obsurfer84’, ‘obviously’, ‘occasion’, ‘offense’, ‘offensive’, ‘offering’, ‘oh’, ‘oil’, ‘oktar’, ‘old’, ‘oldgfatherclock’, ‘one’, ‘open’, ‘opener’, ‘opinion’, ‘opponent’, ‘opposed’, ‘optional’, ‘oreilly’, ‘org’, ‘origin’, ‘others’, ‘outside’, ‘overweight’, ‘owais00’, ‘p’, ‘p8952_’, ‘page’, ‘painful’, ‘pakistan’, ‘paknsave’, ‘palestine’, ‘pancake’, ‘paraketa’, ‘pardusxy’, ‘parent’, ‘paris’, ‘participate’, ‘passport’, ‘past’, ‘pastor’, ‘patrickosgood’, ‘pawarnhoff’, ‘pay’, ‘paying’, ‘pc’, ‘peace’, ‘peacenothate_’, ‘pedophile’, ‘pedophilia’, ‘peerworker’, ‘penalty’, ‘people’, ‘peopleschoice’, ‘perfect’, ‘period’, ‘perk’, ‘perl’, ‘personality’, ‘pervious’, ‘peymaneh123’, ‘phxken’, ‘pie’, ‘piece’, ‘pile’, ‘pilgars’, ‘pilot’, ‘pissing’, ‘pjnet’, ‘plane’, ‘planning’, ‘playing’, ‘playstations’, ‘please’, ‘pleasing’, ‘pnibbler’, ‘point’, ‘pole’, ‘police’, ‘political’, ‘politicalant’, ‘politics_pr’, ‘poor’, ‘poorly’, ‘population’, ‘portland’, ‘possible’, ‘possibly’, ‘post’, ‘posting’, ‘power’, ‘present’, ‘pressure’, ‘pretend’, ‘pretty’, ‘previous’, ‘price’, ‘prime’, ‘prisonersofwar’, ‘pro’, ‘probably’, ‘problem’, ‘problematic’, ‘producer’, ‘production’, ‘profile’, ‘project’, ‘promise’, ‘promo’, ‘promogirls’, ‘promoted’, ‘proof’, ‘propaganda’, ‘prophet’, ‘prospect’, ‘protecting’, ‘proudpatriot101’, ‘prove’, ‘provide’, ‘provision’, ‘psog’, ‘psogeco’, ‘psychbarakat’, ‘public’, ‘punch’, ‘purse’, ‘put’, ‘question’, ‘questionsformen’, ‘quietly’, ‘quit’, ‘quite’, ‘quran’, ‘r’, ‘race’, ‘racist’, ‘raised’, ‘random’, ‘randomhero30’, ‘raniakhalek’, ‘ransom’, ‘rape’, ‘raped’, ‘rapper’, ‘rapperguydmv’, ‘raqqa’, ‘raqqa_sl’, ‘rate’, ‘rather’, ‘ratio’, ‘ravenhuwolf’, ‘raw’, ‘rayyoosheh’, ‘react’, ‘read’, ‘readable’, ‘real’, ‘really’, ‘realryansipple’, ‘realtalk’, ‘reason’, ‘reasonably’, ‘rebel’, ‘recall’, ‘reckless’, ‘recognize’, ‘recommends’, ‘record’, ‘recruit’, ‘recuperate’, ‘redux’, ‘reevaluate’, ‘reference’, ‘referring’, ‘refine’, ‘regarding’, ‘regulation’, ‘rejected’, ‘relationship’, ‘release’, ‘religion’, ‘religious’, ‘relisha’, ‘reload’, ‘remember’, ‘reminded’, ‘rennie93’, ‘repeatedly’, ‘repetition’, ‘replacement’, ‘report’, ‘reputation’, ‘request’, ‘resorting’, ‘respond’, ‘restaurant’, ‘retreat’, ‘revolting’, ‘reza_rahman’, ‘rigged’, ‘right’, ‘rinehart33’, ‘rip’, ‘rjennromao’, ‘rkhayer’, ‘rkinglive2dance’, ‘rob’, ‘robbed’, ‘robert’, ‘robinriedstra’, ‘roll’, ‘roof’, ‘room’, ‘rooshv’, ‘rose’, ‘rotherham’, ‘rougek68′, ’round’, ‘routinely’, ‘rt’, ‘rts’, ‘rudawenglish’, ‘rudd’, ‘rude’, ‘rudoren’, ‘ruin’, ‘run’, 
‘running’, ‘russian’, ‘said’, ‘saifullah666’, ‘sajid_fairooz’, ‘sake’, ‘salmon’, ‘salon’, ‘saltnburnem’, ‘salty’, ‘samkitsengupta’, ‘santa’, ‘sarah_jane666’, ‘sas’, ‘satire’, ‘saudi’, ‘sausage’, ‘save’, ‘saw’, ‘say’, ‘scared’, ‘schmeezi’, ‘school’, ‘score’, ‘scratch’, ‘screencaps’, ‘script’, ‘scripted’, ‘scroll’, ‘scum’, ‘season’, ‘see’, ‘seen’, ‘segment’, ‘self’, ‘selfies’, ‘selling’, ‘sellout’, ‘semite’, ‘semzyxx’, ‘sensitive’, ‘sent’, ‘serious’, ‘seriously’, ‘serlasco’, ‘serve’, ‘served’, ‘service’, ‘serving’, ‘set’, ‘setup’, ‘sevilzadeh’, ‘sex’, ‘sexhonest’, ‘sexism’, ‘sexist’, ‘sexually’, ‘shami_is_back’, ‘shaz’, ‘shell’, ‘shermertron’, ‘sherri’, ‘shia’, ‘shingal’, ‘shirt’, ‘shit’, ‘shoe0nhead’, ‘short’, ‘shovel’, ‘show’, ‘shower’, ‘shred’, ‘shut’, ‘sick’, ‘side’, ‘sighhhh’, ‘simpson’, ‘since’, ‘singer’, ‘sinjar’, ‘sirgoldenrod’, ‘six’, ‘skank’, ‘slagkick’, ‘slap’, ‘slave’, ‘slaved’, ‘sleep’, ‘sleeping’, ‘slide’, ‘slightly’, ‘sloshedtrain2’, ‘slow’, ‘smack’, ‘smackem’, ‘small’, ‘smarter’, ‘smash’, ‘sold’, ‘soldier’, ‘someone’, ‘something’, ‘sometimes’, ‘somewhat’, ‘soon’, ‘sorbent’, ‘sorbet’, ‘sorry’, ‘sorrynotsorry’, ‘sound’, ‘source’, ‘space’, ‘spacekatgal’, ‘spacequeentbh’, ‘spam’, ‘spatchcock’, ‘speak’, ‘speaking’, ‘speech’, ‘spiritual’, ‘spoiled’, ‘sport’, ‘sports2inflatio’, ‘sputnik’, ‘srhbutts’, ‘stalin’, ‘stand’, ‘standard’, ‘standing’, ‘starius’, ‘started’, ‘starting’, ‘state’, ‘statistic’, ‘stats’, ‘stay’, ‘stayed’, ‘staying’, ‘step’, ‘steve’, ‘stiff’, ‘still’, ‘stood’, ‘stop’, ‘stopping’, ‘stopwadhwa2015’, ‘story’, ‘strategically’, ‘streaming’, ‘stretch’, ‘strike’, ‘strong’, ‘struggle’, ‘student’, ‘stuff’, ‘stupid’, ‘stylist’, ‘subject’, ‘subtle’, ‘success’, ‘suck’, ‘sudixitca’, ‘suicide’, ‘sumersloan’, ‘super’, ‘superior’, ‘support’, ‘supported’, ‘sure’, ‘surgery’, ‘swallow’, ‘swiftonsecurity’, ‘switching’, ‘syazlicious’, ‘syria’, ‘systemic’, ‘tacky’, ‘taken’, ‘taking’, ‘tal’, ‘taliban’, ‘talk’, ‘talladega’, ‘taqiyya’, ‘tarah’, ‘tart’, ‘tasteless’, ‘tatibresolin’, ‘tbh’, ‘tbielawa’, ‘tcot’, ‘teach’, ‘teaching’, ‘team’, ‘tell’, ‘telling’, ‘tempting’, ‘terrible’, ‘terror’, ‘terrorism’, ‘terrorist’, ‘testicle’, ‘texasarlington’, ‘thanks’, ‘thatll’, ‘thats’, ‘theckman’, ‘thedoubleclicks’, ‘thegeek_chick’, ‘thegoodguysau’, ‘thelindsayellis’, ‘thelmasleaze’, ‘themirai’, ‘themselvespffft’, ‘themuslimguy’, ‘thequinnspiracy’, ‘there’, ‘theyre’, ‘thing’, ‘think’, ‘thinking’, ‘third’, ‘thought’, ‘threw’, ‘throw’, ‘tied’, ‘tim’, ‘time’, ‘timespan’, ‘tiny’, ‘tip’, ‘tnr’, ‘tobyrobertbull’, ‘today’, ‘todayreal’, ‘told’, ‘tolerate’, ‘tomato’, ‘tonight’, ‘toodles’, ‘tool’, ‘top’, ‘total’, ‘train’, ‘transic_nyc’, ‘translator’, ‘treating’, ‘trend’, ‘tried’, ‘tripple’, ‘trolley’, ‘troop’, ‘trophy’, ‘truaemusic’, ‘truly’, ‘try’, ‘trying’, ‘turf’, ‘turk’, ‘tv’, ‘tw’, ‘tweet’, ‘twist’, ‘twista202’, ‘twitter’, ‘two’, ‘typed’, ‘typically’, ‘typo’, ‘u’, ‘ugly’, ‘ukraine’, ‘ukrainian’, ‘ultrafundamentalist’, ‘unacceptable’, ‘unapologetic’, ‘unashamed’, ‘uncalled’, ‘understand’, ‘understands’, ‘unfair’, ‘uninvolved’, ‘university’, ‘update’, ‘uplay’, ‘upon’, ‘use’, ‘user’, ‘username’, ‘usually’, ‘valenti’, ‘value’, ‘vandaliser’, ‘vc’, ‘vcs’, ‘venereveritas13’, ‘venomous9’, ‘versa’, ‘verse’, ‘vex0rian’, ‘via’, ‘vice’, ‘victim’, ‘victorymonk’, ‘video’, ‘videobeautiful’, ‘violence’, ‘voice’, ‘vonta624’, ‘vote’, ‘voted’, ‘w’, ‘wadhwa’, ‘wait’, ‘waiting’, ‘wakeuplibsgtjoenbc’, ‘walk’, ‘wan’, ‘wanted’, ‘warriorsialkot’, ‘washed’, ‘washingtonpost’, ‘wasnt’, ‘watan71969’, ‘watch’, ‘watched’, ‘watching’, 
‘way’, ‘week’, ‘weekly’, ‘well’, ‘went’, ‘werent’, ‘west’, ‘western’, ‘wetsprocket’, ‘wheat’, ‘wheel’, ‘whereisyourdignity’, ‘whether’, ‘whiny’, ‘white’, ‘whiteblack’, ‘whitening’, ‘whole’, ‘wi’, ‘wife’, ‘win’, ‘wing’, ‘wish’, ‘witch_sniffer’, ‘without’, ‘witty’, ‘wizardryofozil’, ‘wks’, ‘wnba’, ‘wnyc’, ‘wocracial’, ‘woman’, ‘womenagainstfeminism’, ‘womeninterpret’, ‘word’, ‘work’, ‘worker’, ‘working’, ‘worse’, ‘worst’, ‘would’, ‘wouldnt’, ‘wouldve’, ‘wow’, ‘wrecking’, ‘write’, ‘writer’, ‘writing’, ‘wrong’, ‘xmjee’, ‘yall’, ‘yawn’, ‘yeah’, ‘year’, ‘yes’, ‘yesallwomen’, ‘yesyouresexist’, ‘yet’, ‘yield’, ‘youd’, ‘youll’, ‘young’, ‘youre’, ‘yousufpoosuf’, ‘youtube’, ‘ypg’, ‘yum’, ‘zaibatsunews’, ‘zene55’, ‘zero’, ‘zython86’]
{‘0′: 0, ’06jank’: 1, ‘0xjared’: 2, ‘1’: 3, ’11’: 4, ’12’: 5, ’14’: 6, ‘1400’: 7, ’15’: 8, ’17’: 9, ‘1shadeofritch’: 10, ‘2’: 11, ‘2027279099’: 12, ‘22000’: 13, ‘2ndbestidiot’: 14, ‘3’: 15, ‘3outof10’: 16, ‘4’: 17, ’44’: 18, ’47’: 19, ‘4x’: 20, ‘5’: 21, ‘6’: 22, ‘7’: 23, ’80’: 24, ‘800’: 25, ’90’: 26, ‘911’: 27, ’98halima’: 28, ’99’: 29, ‘[PAD]’: 30, ‘[UNKOWN]’: 31, ‘__chris33__’: 32, ‘_marisajane’: 33, ‘abducted’: 34, ‘abdul_a95’: 35, ‘aberration’: 36, ‘ability’: 37, ‘ablahad’: 38, ‘able’: 39, ‘absolutely’: 40, ‘abuse’: 41, ‘ac360’: 42, ‘accept’: 43, ‘acceptable’: 44, ‘accepted’: 45, ‘accessorizing’: 46, ‘account’: 47, ‘achieve’: 48, ‘across’: 49, ‘actoractress’: 50, ‘actually’: 51, ‘adjective’: 52, ‘admits’: 53, ‘adult’: 54, ‘afar’: 55, ‘afraid’: 56, ‘ago’: 57, ‘agree’: 58, ‘ahahahaha’: 59, ‘air’: 60, ‘airstrikes’: 61, ‘aisle’: 62, ‘ajwatamr’: 63, ‘akheemv’: 64, ‘aledthomas22’: 65, ‘alihadi68’: 66, ‘alihashem_tv’: 67, ‘alive’: 68, ‘all_hailcaesar’: 69, ‘allegedly’: 70, ‘allstatejackie’: 71, ‘ally’: 72, ‘along’: 73, ‘already’: 74, ‘alternet’: 75, ‘amaze’: 76, ‘amazing’: 77, ‘amazingly’: 78, ‘amberhasalamb’: 79, ‘ameliagreenhall’: 80, ‘american’: 81, ‘amohedin’: 82, ‘amp’: 83, ‘amymek’: 84, ‘anasmechch’: 85, ‘andcamping’: 86, ‘andre’: 87, ‘angelemichelle’: 88, ‘anitaingle’: 89, ‘annoying’: 90, ‘another’: 91, ‘answer’: 92, ‘anti’: 93, ‘antiharassment’: 94, ‘antizholim’: 95, ‘anyone’: 96, ‘anything’: 97, ‘apartheid’: 98, ‘arab_fury’: 99, ‘arabic’: 100, ‘arabthomness’: 101, ‘archangel_dux’: 102, ‘arena’: 103, ‘argh’: 104, ‘argonblue’: 105, ‘argument’: 106, ‘armedyoure’: 107, ‘armenian’: 108, ‘armpit’: 109, ‘army’: 110, ‘arquette’: 111, ‘article’: 112, ‘asad’: 113, ‘asem_1994’: 114, ‘ask’: 115, ‘asked’: 116, ‘askgoog’: 117, ‘askhermore’: 118, ‘asshole’: 119, ‘astounding’: 120, ‘athlete’: 121, ‘attack’: 122, ‘attacked’: 123, ‘attacking’: 124, ‘attracted’: 125, ‘attractive’: 126, ‘attributing’: 127, ‘attrocities’: 128, ‘auntysoapbox’: 129, ‘australia’: 130, ‘away’: 131, ‘awful’: 132, ‘awkward’: 133, ‘awww’: 134, ‘b’: 135, ‘babbling’: 136, ‘back’: 137, ‘backing’: 138, ‘backwards’: 139, ‘bad’: 140, ‘bag’: 141, ‘baghdad’: 142, ‘bahai144’: 143, ‘bamboozled’: 144, ‘banter’: 145, ‘barackobama’: 146, ‘basically’: 147, ‘bastendorfgames’: 148, ‘batchelorshow’: 149, ‘bcz’: 150, ‘beadsland’: 151, ‘beat’: 152, ‘beautiful’: 153, ‘beautifula’: 154, ‘beavis’: 155, ‘beckles’: 156, ‘begun’: 157, ‘behaving’: 158, ‘behead’: 159, ‘behind’: 160, ‘belief’: 161, ‘believe’: 162, ‘benkuchera’: 163, ‘best’: 164, ‘better’: 165, ‘beyond’: 166, ‘bgs’: 167, ‘bhamdailynews’: 168, ‘biebervalue’: 169, ‘big’: 170, ‘bigot’: 171, ‘bigotry’: 172, ‘bigtime’: 173, ‘bilalighumman’: 174, ‘bill’: 175, ‘bimbo’: 176, ‘bimbolines’: 177, ‘bit’: 178, ‘bitch’: 179, ‘bixs’: 180, ‘blabber’: 181, ‘blackopal80’: 182, ‘block’: 183, ‘blocked’: 184, ‘bloke’: 185, ‘blonde’: 186, ‘blondemoment’: 187, ‘blow’: 188, ‘blumenthal’: 189, ‘body’: 190, ‘book’: 191, ‘booted’: 192, ‘boring’: 193, ‘bos’: 194, ‘bottle’: 195, ‘bought’: 196, ’bout’: 197, ‘bowl’: 198, ‘boy’: 199, ‘bq281473’: 200, ‘breakfast’: 201, ‘brekky’: 202, ‘bringing’: 203, ‘bristolben’: 204, ‘british’: 205, ‘broke’: 206, ‘bronny25’: 207, ‘bruciebabe’: 208, ‘bruh’: 209, ‘brushyblues’: 210, ‘bsilverstrim77’: 211, ‘budlightbro’: 212, ‘built’: 213, ‘bullied’: 214, ‘burcucekmece’: 215, ‘burn’: 216, ‘burning’: 217, ‘burqua’: 218, ‘bus’: 219, ‘business’: 220, ‘buttercupashby’: 221, ‘butthead’: 222, ‘button’: 223, ‘buying’: 224, ‘c2e2’: 225, ‘call’: 226, ‘cambrian_man’: 227, ‘campagnebds’: 228, 
‘canned’: 229, ‘cant’: 230, ‘captive’: 231, ‘car’: 232, ‘card’: 233, ‘case’: 234, ‘cashing’: 235, ‘castrating’: 236, ‘catwalk’: 237, ’cause’: 238, ‘caved’: 239, ‘ccot’: 240, ‘cdnkhadija’: 241, ‘celebrating’: 242, ‘celine’: 243, ‘cemcfarland’: 244, ‘century’: 245, ‘challenge’: 246, ‘chance’: 247, ‘change’: 248, ‘changed’: 249, ‘changing’: 250, ‘channel’: 251, ‘channel7’: 252, ‘chaos’: 253, ‘character’: 254, ‘cheeseplus’: 255, ‘chef’: 256, ‘cheney’: 257, ‘chicken’: 258, ‘child’: 259, ‘chloeandkelly’: 260, ‘choice’: 261, ‘christ’: 262, ‘christian’: 263, ‘christiansyazidis’: 264, ‘christmas’: 265, ‘christophheer52’: 266, ‘chriswarcraft’: 267, ‘chuck’: 268, ‘chuckle’: 269, ‘chuckpfarrer’: 270, ‘churner’: 271, ‘cia’: 272, ‘cityofmandurah’: 273, ‘civilian’: 274, ‘cjsajulga’: 275, ‘claim’: 276, ‘claiming’: 277, ‘clarify’: 278, ‘clearly’: 279, ‘close’: 280, ‘colin’: 281, ‘colins’: 282, ‘colonelkickhead’: 283, ‘come’: 284, ‘comedian’: 285, ‘coming’: 286, ‘comment’: 287, ‘committed’: 288, ‘compensation’: 289, ‘competition’: 290, ‘complains’: 291, ‘completely’: 292, ‘compromise’: 293, ‘concerned’: 294, ‘conclusion’: 295, ‘conducting’: 296, ‘configuration’: 297, ‘confuse’: 298, ‘conserv_miss’: 299, ‘consider’: 300, ‘conspiracy’: 301, ‘constant’: 302, ‘constantly’: 303, ‘constructed’: 304, ‘content’: 305, ‘contradicted’: 306, ‘contributing’: 307, ‘cook’: 308, ‘cooked’: 309, ‘cooking’: 310, ‘cordial’: 311, ‘cornflakes’: 312, ‘could’: 313, ‘country’: 314, ‘couple’: 315, ‘court’: 316, ‘coworkers’: 317, ‘coz’: 318, ‘crabfest15’: 319, ‘crap’: 320, ‘crash’: 321, ‘created’: 322, ‘creates’: 323, ‘creating’: 324, ‘credibility’: 325, ‘cretin’: 326, ‘cringing’: 327, ‘critiquing’: 328, ‘culture’: 329, ‘cunt’: 330, ‘cup’: 331, ‘curd’: 332, ‘curious’: 333, ‘customer’: 334, ‘cut’: 335, ‘cuz’: 336, ‘cytheria’: 337, ‘d20’: 338, ‘daesh’: 339, ‘damn’: 340, ‘damnitscloudy’: 341, ‘danhickey2199’: 342, ‘danis’: 343, ‘dankmtl’: 344, ‘darchmare’: 345, ‘darrenkopp’: 346, ‘data’: 347, ‘dave’: 348, ‘davidjo52951945’: 349, ‘davidsgallant’: 350, ‘dc’: 351, ‘dead’: 352, ‘deal’: 353, ‘death’: 354, ‘declared’: 355, ‘deconstruct’: 356, ‘deconstructing’: 357, ‘deduction’: 358, ‘defend’: 359, ‘definitely’: 360, ‘delicate’: 361, ‘demanded’: 362, ‘dentist’: 363, ‘describes’: 364, ‘desertfox899’: 365, ‘desire’: 366, ‘desk’: 367, ‘desperately’: 368, ‘dessert’: 369, ‘destined’: 370, ‘destroys’: 371, ‘deusexjuice’: 372, ‘devops’: 373, ‘dianh4’: 374, ‘dick’: 375, ‘dictatorship’: 376, ‘didazahra’: 377, ‘didnt’: 378, ‘difference’: 379, ‘different’: 380, ‘direct’: 381, ‘directhex’: 382, ‘direction’: 383, ‘directly’: 384, ‘discerningmumin’: 385, ‘discrimination’: 386, ‘disgrace’: 387, ‘disgusted’: 388, ‘disgusting’: 389, ‘dish’: 390, ‘disheartened’: 391, ‘disturbing’: 392, ‘divisiveness’: 393, ‘dkim’: 394, ‘doammuslims’: 395, ‘doctor’: 396, ‘doesnt’: 397, ‘dogging’: 398, ‘doh’: 399, ‘dolly’: 400, ‘domestic’: 401, ‘done’: 402, ‘dont’: 403, ‘double’: 404, ‘draskos’: 405, ‘drdisco_’: 406, ‘dream’: 407, ‘dreamer’: 408, ‘dreaminpng’: 409, ‘drive’: 410, ‘drivemaneuveroperate’: 411, ‘driver’: 412, ‘driving’: 413, ‘dropped’: 414, ‘dry’: 415, ‘dubhe80’: 416, ‘duckiemcphee’: 417, ‘dude’: 418, ‘due’: 419, ‘dumb’: 420, ‘dumbest’: 421, ‘dummy’: 422, ‘dye’: 423, ‘dying’: 424, ‘ear’: 425, ‘earth’: 426, ‘easier’: 427, ‘easy’: 428, ‘eat’: 429, ‘ebola’: 430, ‘ebooks’: 431, ‘economics’: 432, ‘econtried’: 433, ‘edgeofthesandbx’: 434, ‘edible’: 435, ‘efficient’: 436, ‘effort’: 437, ‘egypt’: 438, ‘egyptian’: 439, ‘either’: 440, ‘eloisepeace’: 441, ‘else’: 442, ’email’: 
443, ’emilie’: 444, ’empty’: 445, ‘endless’: 446, ‘enemy’: 447, ‘engaging’: 448, ‘enough’: 449, ‘enslaved’: 450, ‘entertaining’: 451, ‘entree’: 452, ‘ep’: 453, ‘episode’: 454, ‘equal’: 455, ‘equalpay’: 456, ‘espn’: 457, ‘eternity’: 458, ‘etsho127’: 459, ‘evacuated’: 460, ‘evans’: 461, ‘even’: 462, ‘eventually’: 463, ‘ever’: 464, ‘every’: 465, ‘everyone’: 466, ‘everything’: 467, ‘evidence’: 468, ‘evilsunbro’: 469, ‘ew’: 470, ‘except’: 471, ‘excited’: 472, ‘excuse’: 473, ‘executed’: 474, ‘expect’: 475, ‘expedition’: 476, ‘expertise’: 477, ‘explain’: 478, ‘explained’: 479, ‘exposefalsehood’: 480, ‘expression’: 481, ‘exterminate’: 482, ‘extreme’: 483, ‘eye’: 484, ‘eyesmkr’: 485, ‘ezidi’: 486, ‘ezidipress’: 487, ‘ezidis’: 488, ‘f3ew’: 489, ‘face’: 490, ‘faced’: 491, ‘fact’: 492, ‘faded’: 493, ‘fail’: 494, ‘fair’: 495, ‘fan’: 496, ‘fanatic’: 497, ‘farbenstau’: 498, ‘fascist’: 499, ‘fat’: 500, ‘feardept’: 501, ‘feel’: 502, ‘feeling’: 503, ‘female’: 504, ‘femfreefriday’: 505, ‘feminism’: 506, ‘feminist’: 507, ‘fetch’: 508, ‘fetish’: 509, ‘fewer’: 510, ‘fight’: 511, ‘fighting’: 512, ‘filthy’: 513, ‘finalbroadcast’: 514, ‘finally’: 515, ‘find’: 516, ‘fine’: 517, ‘finger’: 518, ‘first’: 519, ‘flip’: 520, ‘floss’: 521, ‘flying’: 522, ‘folk’: 523, ‘followed’: 524, ‘food’: 525, ‘foodie_ben’: 526, ‘foot’: 527, ‘forbidden’: 528, ‘force’: 529, ‘forced’: 530, ‘ford’: 531, ‘form’: 532, ‘formula’: 533, ‘fought’: 534, ‘foxnews’: 535, ‘foxnewspolitics’: 536, ‘free’: 537, ‘freebsd’: 538, ‘freebsdgirl’: 539, ‘freebsdglri’: 540, ‘freedom’: 541, ‘freezer’: 542, ‘fried’: 543, ‘friend’: 544, ‘fuck’: 545, ‘fucking’: 546, ‘fucktards’: 547, ‘fun’: 548, ‘funding’: 549, ‘fyoudbag’: 550, ‘gailsimone’: 551, ‘gal’: 552, ‘game’: 553, ‘gamergate’: 554, ‘gangraped’: 555, ‘gap’: 556, ‘garbage’: 557, ‘garydlum’: 558, ‘gaters’: 559, ‘gatery’: 560, ‘gator’: 561, ‘gbabeuf’: 562, ‘gbazov’: 563, ‘gel’: 564, ‘genius’: 565, ‘genocide’: 566, ‘get’: 567, ‘getting’: 568, ‘gg’: 569, ‘ggautoblocker’: 570, ‘ggreenwald’: 571, ‘gilmore’: 572, ‘girl’: 573, ‘girlziplocked’: 574, ‘give’: 575, ‘given’: 576, ‘giving’: 577, ‘glad’: 578, ‘glennf’: 579, ‘glhf’: 580, ‘glove’: 581, ‘go’: 582, ‘god’: 583, ‘going’: 584, ‘gon’: 585, ‘gone’: 586, ‘good’: 587, ‘goodluck’: 588, ‘goosenetworkusa’: 589, ‘gosh’: 590, ‘got’: 591, ‘govt’: 592, ‘grafana’: 593, ‘grahamdavida’: 594, ‘great’: 595, ‘greater’: 596, ‘greenlinerzjm’: 597, ‘grin’: 598, ‘gross’: 599, ‘ground’: 600, ‘group’: 601, ‘grow’: 602, ‘grown’: 603, ‘guardian’: 604, ‘gueensland’: 605, ‘guess’: 606, ‘guilt’: 607, ‘gulf’: 608, ‘gumboots’: 609, ‘guy’: 610, ‘hadith’: 611, ‘haha’: 612, ‘hair’: 613, ‘halalflaws’: 614, ‘hamas’: 615, ‘hand’: 616, ‘handle’: 617, ‘happen’: 618, ‘happened’: 619, ‘happens’: 620, ‘happy’: 621, ‘happycampers’: 622, ‘harassed’: 623, ‘harassment’: 624, ‘haroonstyles’: 625, ‘harshly’: 626, ‘hate’: 627, ‘hated’: 628, ‘hatefilled’: 629, ‘hating’: 630, ‘hatred’: 631, ‘havent’: 632, ‘hawaiinshirts’: 633, ‘hayles_comet’: 634, ‘hdmovieus’: 635, ‘he’: 636, ‘head’: 637, ‘heart’: 638, ‘hell’: 639, ‘hello’: 640, ‘help’: 641, ‘helping’: 642, ‘here’: 643, ‘hide’: 644, ‘hilarious’: 645, ‘himat’: 646, ‘hit’: 647, ‘hitler’: 648, ‘hockey’: 649, ‘holder’: 650, ‘home’: 651, ‘homophobe’: 652, ‘honestly’: 653, ‘hope’: 654, ‘hoping’: 655, ‘hostage’: 656, ‘however’: 657, ‘hows’: 658, ‘howtogetawaywithmurder’: 659, ‘ht’: 660, ‘http’: 661, ‘httpstco6kgw1lejfr’: 662, ‘httpstcoum5svjgazu’: 663, ‘httpstcoxufwsigxfk’: 664, ‘httpt’: 665, ‘httptco’: 666, ‘httptco1pl9gqrdp7’: 667, ‘httptco4u’: 668, 
‘httptco5vsf5jroi6’: 669, ‘httptco8xldnwbvzx’: 670, ‘httptcobwr6ap0ooo’: 671, ‘httptcocaxxus108l’: 672, ‘httptcocbcr9u4fc9’: 673, ‘httptcodajgdn1wy3’: 674, ‘httptcoddecobanzx’: 675, ‘httptcofdylhlkdcv’: 676, ‘httptcoganrh4k87a’: 677, ‘httptcogbvojnmbcv’: 678, ‘httptcoglncgkuukp’: 679, ‘httptcoh8f7n04q5o’: 680, ‘httptcoltoxypkwww’: 681, ‘httptcom4jcka5ir0’: 682, ‘httptcom5j2tpksm5’: 683, ‘httptcomdb4iu9whd’: 684, ‘httptcomxuw3hz4tb’: 685, ‘httptconleyqfnkyp’: 686, ‘httptcontojwo4lnt’: 687, ‘httptcoocsy7crghf’: 688, ‘httptcopfs5zlkt07’: 689, ‘httptcopnhzjrhhqr’: 690, ‘httptcoq95ei17sua’: 691, ‘httptcoqaa6bwi4pm’: 692, ‘httptcoqmdsdtfvya’: 693, ‘httptcoqv’: 694, ‘httptcorbthvmh9jj’: 695, ‘httptcospmvzcjj6o’: 696, ‘httptcot65iytpvdk’: 697, ‘httptcotfsodcowbx’: 698, ‘httptcoumkitlb5h9’: 699, ‘httptcouq19q6pnaq’: 700, ‘httptcovanp6y7clr’: 701, ‘httptcowhy3a8o33z’: 702, ‘httptcoyspbfitztb’: 703, ‘httptcozjbwagvnrg’: 704, ‘hugged’: 705, ‘human’: 706, ‘humanistfury’: 707, ‘humanity’: 708, ‘hungrycampers’: 709, ‘hw’: 710, ‘hypatiadotca’: 711, ‘hypocrite’: 712, ‘ice’: 713, ‘id’: 714, ‘idea’: 715, ‘ideaology’: 716, ‘ideaor’: 717, ‘ideology’: 718, ‘idiot’: 719, ‘idiotim’: 720, ‘idontneedfeminism’: 721, ‘ied’: 722, ‘ignorance’: 723, ‘ignorant’: 724, ‘ignoring’: 725, ‘ihatethiskid’: 726, ‘ilivundrurbed’: 727, ‘illegal’: 728, ‘ilovebreakfast’: 729, ‘iloveobama’: 730, ‘imagine’: 731, ‘imperialism’: 732, ‘implies’: 733, ‘important’: 734, ‘info’: 735, ‘information’: 736, ‘input’: 737, ‘instant’: 738, ‘instead’: 739, ‘insufferable’: 740, ‘insulted’: 741, ‘insulting’: 742, ‘integration’: 743, ‘intel’: 744, ‘interact’: 745, ‘interest’: 746, ‘interesting’: 747, ‘international’: 748, ‘internet’: 749, ‘intersection’: 750, ‘intolerance’: 751, ‘invented’: 752, ‘iraq’: 753, ‘iron’: 754, ‘isi’: 755, ‘isisutterly’: 756, ‘islam’: 757, ‘islamdefense’: 758, ‘islamic’: 759, ‘islamist’: 760, ‘isnt’: 761, ‘israel’: 762, ‘israeliregime’: 763, ‘issue’: 764, ‘itll’: 765, ‘itsbariecool’: 766, ‘itsfact’: 767, ‘ive’: 768, ‘ivyexec’: 769, ‘izrinhariri’: 770, ‘jac’: 771, ‘jealous’: 772, ‘jeffreygoldberg’: 773, ‘jennykutner’: 774, ‘jeremiahfelt’: 775, ‘jew’: 776, ‘jhamby’: 777, ‘jihadi_11’: 778, ‘jihadis’: 779, ‘jimcramer’: 780, ‘jncatron’: 781, ‘job’: 782, ‘johncantile’: 783, ‘johnnygjokaj’: 784, ‘johnnyrejection’: 785, ‘joke’: 786, ‘journalist’: 787, ‘judge’: 788, ‘juliet777777’: 789, ‘justdavidvideos’: 790, ‘justhonest’: 791, ‘justkelly_ok’: 792, ‘kaitlynburnell’: 793, ‘kamaluf’: 794, ‘kardashian’: 795, ‘kat’: 796, ‘katampandre’: 797, ‘katandandre’: 798, ‘katie’: 799, ‘katieandnikki’: 800, ‘keep’: 801, ‘keyboard’: 802, ‘keynote’: 803, ‘kid’: 804, ‘kidding’: 805, ‘kill’: 806, ‘killa’: 807, ‘killed’: 808, ‘killerblondes’: 809, ‘killing’: 810, ‘kind’: 811, ‘kirkuk’: 812, ‘kmactane’: 813, ‘knew’: 814, ‘know’: 815, ‘knowingly’: 816, ‘kobane’: 817, ‘kuffir’: 818, ‘kurd’: 819, ‘kurdish’: 820, ‘lactualaloupe’: 821, ‘lad’: 822, ‘lady’: 823, ‘lajouetreine’: 824, ‘large’: 825, ‘last’: 826, ‘latest’: 827, ‘laugh’: 828, ‘launcher’: 829, ‘law’: 830, ‘lazy’: 831, ‘lb’: 832, ‘ldstarr18’: 833, ‘le’: 834, ‘lead’: 835, ‘leaning’: 836, ‘led’: 837, ‘left’: 838, ‘legitimately’: 839, ‘lemon’: 840, ‘lesson’: 841, ‘letting’: 842, ‘level’: 843, ‘liar’: 844, ‘liberate’: 845, ‘libya’: 846, ‘license’: 847, ‘licking’: 848, ‘lie’: 849, ‘lied’: 850, ‘life’: 851, ‘light’: 852, ‘like’: 853, ‘likely’: 854, ‘lilbeastunleash’: 855, ‘lime’: 856, ‘line’: 857, ‘link’: 858, ‘lipstick’: 859, ‘lisamromano’: 860, ‘lissasauras’: 861, ‘listen’: 862, ‘lithobolos’: 863, 
‘little’: 864, ‘live’: 865, ‘liver’: 866, ‘lol’: 867, ‘long’: 868, ‘loo’: 869, ‘look’: 870, ‘looked’: 871, ‘looking’: 872, ‘lose’: 873, ‘loser’: 874, ‘lost’: 875, ‘lot’: 876, ‘love’: 877, ‘low’: 878, ‘lt3’: 879, ‘lucaswj’: 880, ‘luck’: 881, ‘lunatic’: 882, ‘lynnemcgranger’: 883, ‘m_m_myers’: 884, ‘maajidnawaz’: 885, ‘mad’: 886, ‘madasahatter_17’: 887, ‘maddr11’: 888, ‘made’: 889, ‘magazine’: 890, ‘magnus919’: 891, ‘maja_stina’: 892, ‘major’: 893, ‘majority’: 894, ‘make’: 895, ‘making’: 896, ‘mami_mermelada’: 897, ‘man’: 898, ‘manbabies’: 899, ‘manu’: 900, ‘manure’: 901, ‘many’: 902, ‘map’: 903, ‘marc_leibowitz’: 904, ‘market’: 905, ‘markimbriaco’: 906, ‘maroon’: 907, ‘marriage’: 908, ‘married’: 909, ‘masontillidie’: 910, ‘math’: 911, ‘matter’: 912, ‘mattstratton’: 913, ‘mattybboi83’: 914, ‘maxblumenthal’: 915, ‘maxcaras’: 916, ‘may’: 917, ‘maybe’: 918, ‘mccheesy904’: 919, ‘mean’: 920, ‘meaning’: 921, ‘meatball’: 922, ‘meatgirls’: 923, ‘mechasauce’: 924, ‘medium’: 925, ‘meh’: 926, ‘mehdirhasan’: 927, ‘mellym09’: 928, ‘melting’: 929, ‘mention’: 930, ‘menu’: 931, ‘messed’: 932, ‘microbrain’: 933, ‘middle’: 934, ‘mikeage’: 935, ‘military’: 936, ‘militia’: 937, ‘minasmith64’: 938, ‘minister’: 939, ‘minority’: 940, ‘misfitinchains’: 941, ‘miskelayla’: 942, ‘miss’: 943, ‘missed’: 944, ‘missing’: 945, ‘mistertodd’: 946, ‘mkr’: 947, ‘mkr2015’: 948, ‘mkrkat’: 949, ‘mmmm’: 950, ‘model’: 951, ‘moderate’: 952, ‘modern’: 953, ‘mohammed’: 954, ‘monday’: 955, ‘month’: 956, ‘moron’: 957, ‘mosul’: 958, ‘mouth’: 959, ‘moving’: 960, ‘much’: 961, ‘mugnezee’: 962, ‘multiple’: 963, ‘murde’: 964, ‘murder’: 965, ‘murdered’: 966, ‘murtaza’: 967, ‘muslim’: 968, ‘muslimtwo’: 969, ‘mutilated’: 970, ‘myersnfl’: 971, ‘mykitchenrules’: 972, ‘mystrongstate’: 973, ‘nader_haq’: 974, ‘naga’: 975, ‘nainfidels’: 976, ‘naminglisting’: 977, ‘narîn’: 978, ‘nasty’: 979, ‘naturally’: 980, ‘nazi’: 981, ‘near’: 982, ‘necessarily’: 983, ‘need’: 984, ‘needarethinkinformat’: 985, ‘negated’: 986, ‘negotiate’: 987, ‘neilasaurus’: 988, ‘never’: 989, ‘new’: 990, ‘new_babylonia’: 991, ‘newscoverup’: 992, ‘next’: 993, ‘nice’: 994, ‘nigelbigmeech’: 995, ‘night’: 996, ‘nikki’: 997, ‘nobody’: 998, ‘noise’: 999, ‘nomcookiesnom’: 1000, ‘none’: 1001, ‘nooo’: 1002, ‘nope’: 1003, ‘notchrissmith’: 1004, ‘note’: 1005, ‘nothing’: 1006, ‘notsexist’: 1007, ‘novorossiyan’: 1008, ‘nscottg’: 1009, ‘number’: 1010, ‘number10gov’: 1011, ‘nytimes’: 1012, ‘obamacare’: 1013, ‘obamas’: 1014, ‘obsurfer84’: 1015, ‘obviously’: 1016, ‘occasion’: 1017, ‘offense’: 1018, ‘offensive’: 1019, ‘offering’: 1020, ‘oh’: 1021, ‘oil’: 1022, ‘oktar’: 1023, ‘old’: 1024, ‘oldgfatherclock’: 1025, ‘one’: 1026, ‘open’: 1027, ‘opener’: 1028, ‘opinion’: 1029, ‘opponent’: 1030, ‘opposed’: 1031, ‘optional’: 1032, ‘oreilly’: 1033, ‘org’: 1034, ‘origin’: 1035, ‘others’: 1036, ‘outside’: 1037, ‘overweight’: 1038, ‘owais00’: 1039, ‘p’: 1040, ‘p8952_’: 1041, ‘page’: 1042, ‘painful’: 1043, ‘pakistan’: 1044, ‘paknsave’: 1045, ‘palestine’: 1046, ‘pancake’: 1047, ‘paraketa’: 1048, ‘pardusxy’: 1049, ‘parent’: 1050, ‘paris’: 1051, ‘participate’: 1052, ‘passport’: 1053, ‘past’: 1054, ‘pastor’: 1055, ‘patrickosgood’: 1056, ‘pawarnhoff’: 1057, ‘pay’: 1058, ‘paying’: 1059, ‘pc’: 1060, ‘peace’: 1061, ‘peacenothate_’: 1062, ‘pedophile’: 1063, ‘pedophilia’: 1064, ‘peerworker’: 1065, ‘penalty’: 1066, ‘people’: 1067, ‘peopleschoice’: 1068, ‘perfect’: 1069, ‘period’: 1070, ‘perk’: 1071, ‘perl’: 1072, ‘personality’: 1073, ‘pervious’: 1074, ‘peymaneh123’: 1075, ‘phxken’: 1076, ‘pie’: 1077, ‘piece’: 1078, ‘pile’: 
1079, ‘pilgars’: 1080, ‘pilot’: 1081, ‘pissing’: 1082, ‘pjnet’: 1083, ‘plane’: 1084, ‘planning’: 1085, ‘playing’: 1086, ‘playstations’: 1087, ‘please’: 1088, ‘pleasing’: 1089, ‘pnibbler’: 1090, ‘point’: 1091, ‘pole’: 1092, ‘police’: 1093, ‘political’: 1094, ‘politicalant’: 1095, ‘politics_pr’: 1096, ‘poor’: 1097, ‘poorly’: 1098, ‘population’: 1099, ‘portland’: 1100, ‘possible’: 1101, ‘possibly’: 1102, ‘post’: 1103, ‘posting’: 1104, ‘power’: 1105, ‘present’: 1106, ‘pressure’: 1107, ‘pretend’: 1108, ‘pretty’: 1109, ‘previous’: 1110, ‘price’: 1111, ‘prime’: 1112, ‘prisonersofwar’: 1113, ‘pro’: 1114, ‘probably’: 1115, ‘problem’: 1116, ‘problematic’: 1117, ‘producer’: 1118, ‘production’: 1119, ‘profile’: 1120, ‘project’: 1121, ‘promise’: 1122, ‘promo’: 1123, ‘promogirls’: 1124, ‘promoted’: 1125, ‘proof’: 1126, ‘propaganda’: 1127, ‘prophet’: 1128, ‘prospect’: 1129, ‘protecting’: 1130, ‘proudpatriot101’: 1131, ‘prove’: 1132, ‘provide’: 1133, ‘provision’: 1134, ‘psog’: 1135, ‘psogeco’: 1136, ‘psychbarakat’: 1137, ‘public’: 1138, ‘punch’: 1139, ‘purse’: 1140, ‘put’: 1141, ‘question’: 1142, ‘questionsformen’: 1143, ‘quietly’: 1144, ‘quit’: 1145, ‘quite’: 1146, ‘quran’: 1147, ‘r’: 1148, ‘race’: 1149, ‘racist’: 1150, ‘raised’: 1151, ‘random’: 1152, ‘randomhero30’: 1153, ‘raniakhalek’: 1154, ‘ransom’: 1155, ‘rape’: 1156, ‘raped’: 1157, ‘rapper’: 1158, ‘rapperguydmv’: 1159, ‘raqqa’: 1160, ‘raqqa_sl’: 1161, ‘rate’: 1162, ‘rather’: 1163, ‘ratio’: 1164, ‘ravenhuwolf’: 1165, ‘raw’: 1166, ‘rayyoosheh’: 1167, ‘react’: 1168, ‘read’: 1169, ‘readable’: 1170, ‘real’: 1171, ‘really’: 1172, ‘realryansipple’: 1173, ‘realtalk’: 1174, ‘reason’: 1175, ‘reasonably’: 1176, ‘rebel’: 1177, ‘recall’: 1178, ‘reckless’: 1179, ‘recognize’: 1180, ‘recommends’: 1181, ‘record’: 1182, ‘recruit’: 1183, ‘recuperate’: 1184, ‘redux’: 1185, ‘reevaluate’: 1186, ‘reference’: 1187, ‘referring’: 1188, ‘refine’: 1189, ‘regarding’: 1190, ‘regulation’: 1191, ‘rejected’: 1192, ‘relationship’: 1193, ‘release’: 1194, ‘religion’: 1195, ‘religious’: 1196, ‘relisha’: 1197, ‘reload’: 1198, ‘remember’: 1199, ‘reminded’: 1200, ‘rennie93’: 1201, ‘repeatedly’: 1202, ‘repetition’: 1203, ‘replacement’: 1204, ‘report’: 1205, ‘reputation’: 1206, ‘request’: 1207, ‘resorting’: 1208, ‘respond’: 1209, ‘restaurant’: 1210, ‘retreat’: 1211, ‘revolting’: 1212, ‘reza_rahman’: 1213, ‘rigged’: 1214, ‘right’: 1215, ‘rinehart33’: 1216, ‘rip’: 1217, ‘rjennromao’: 1218, ‘rkhayer’: 1219, ‘rkinglive2dance’: 1220, ‘rob’: 1221, ‘robbed’: 1222, ‘robert’: 1223, ‘robinriedstra’: 1224, ‘roll’: 1225, ‘roof’: 1226, ‘room’: 1227, ‘rooshv’: 1228, ‘rose’: 1229, ‘rotherham’: 1230, ‘rougek68′: 1231, ’round’: 1232, ‘routinely’: 1233, ‘rt’: 1234, ‘rts’: 1235, ‘rudawenglish’: 1236, ‘rudd’: 1237, ‘rude’: 1238, ‘rudoren’: 1239, ‘ruin’: 1240, ‘run’: 1241, ‘running’: 1242, ‘russian’: 1243, ‘said’: 1244, ‘saifullah666’: 1245, ‘sajid_fairooz’: 1246, ‘sake’: 1247, ‘salmon’: 1248, ‘salon’: 1249, ‘saltnburnem’: 1250, ‘salty’: 1251, ‘samkitsengupta’: 1252, ‘santa’: 1253, ‘sarah_jane666’: 1254, ‘sas’: 1255, ‘satire’: 1256, ‘saudi’: 1257, ‘sausage’: 1258, ‘save’: 1259, ‘saw’: 1260, ‘say’: 1261, ‘scared’: 1262, ‘schmeezi’: 1263, ‘school’: 1264, ‘score’: 1265, ‘scratch’: 1266, ‘screencaps’: 1267, ‘script’: 1268, ‘scripted’: 1269, ‘scroll’: 1270, ‘scum’: 1271, ‘season’: 1272, ‘see’: 1273, ‘seen’: 1274, ‘segment’: 1275, ‘self’: 1276, ‘selfies’: 1277, ‘selling’: 1278, ‘sellout’: 1279, ‘semite’: 1280, ‘semzyxx’: 1281, ‘sensitive’: 1282, ‘sent’: 1283, ‘serious’: 1284, ‘seriously’: 1285, ‘serlasco’: 1286, 
‘serve’: 1287, ‘served’: 1288, ‘service’: 1289, ‘serving’: 1290, ‘set’: 1291, ‘setup’: 1292, ‘sevilzadeh’: 1293, ‘sex’: 1294, ‘sexhonest’: 1295, ‘sexism’: 1296, ‘sexist’: 1297, ‘sexually’: 1298, ‘shami_is_back’: 1299, ‘shaz’: 1300, ‘shell’: 1301, ‘shermertron’: 1302, ‘sherri’: 1303, ‘shia’: 1304, ‘shingal’: 1305, ‘shirt’: 1306, ‘shit’: 1307, ‘shoe0nhead’: 1308, ‘short’: 1309, ‘shovel’: 1310, ‘show’: 1311, ‘shower’: 1312, ‘shred’: 1313, ‘shut’: 1314, ‘sick’: 1315, ‘side’: 1316, ‘sighhhh’: 1317, ‘simpson’: 1318, ‘since’: 1319, ‘singer’: 1320, ‘sinjar’: 1321, ‘sirgoldenrod’: 1322, ‘six’: 1323, ‘skank’: 1324, ‘slagkick’: 1325, ‘slap’: 1326, ‘slave’: 1327, ‘slaved’: 1328, ‘sleep’: 1329, ‘sleeping’: 1330, ‘slide’: 1331, ‘slightly’: 1332, ‘sloshedtrain2’: 1333, ‘slow’: 1334, ‘smack’: 1335, ‘smackem’: 1336, ‘small’: 1337, ‘smarter’: 1338, ‘smash’: 1339, ‘sold’: 1340, ‘soldier’: 1341, ‘someone’: 1342, ‘something’: 1343, ‘sometimes’: 1344, ‘somewhat’: 1345, ‘soon’: 1346, ‘sorbent’: 1347, ‘sorbet’: 1348, ‘sorry’: 1349, ‘sorrynotsorry’: 1350, ‘sound’: 1351, ‘source’: 1352, ‘space’: 1353, ‘spacekatgal’: 1354, ‘spacequeentbh’: 1355, ‘spam’: 1356, ‘spatchcock’: 1357, ‘speak’: 1358, ‘speaking’: 1359, ‘speech’: 1360, ‘spiritual’: 1361, ‘spoiled’: 1362, ‘sport’: 1363, ‘sports2inflatio’: 1364, ‘sputnik’: 1365, ‘srhbutts’: 1366, ‘stalin’: 1367, ‘stand’: 1368, ‘standard’: 1369, ‘standing’: 1370, ‘starius’: 1371, ‘started’: 1372, ‘starting’: 1373, ‘state’: 1374, ‘statistic’: 1375, ‘stats’: 1376, ‘stay’: 1377, ‘stayed’: 1378, ‘staying’: 1379, ‘step’: 1380, ‘steve’: 1381, ‘stiff’: 1382, ‘still’: 1383, ‘stood’: 1384, ‘stop’: 1385, ‘stopping’: 1386, ‘stopwadhwa2015’: 1387, ‘story’: 1388, ‘strategically’: 1389, ‘streaming’: 1390, ‘stretch’: 1391, ‘strike’: 1392, ‘strong’: 1393, ‘struggle’: 1394, ‘student’: 1395, ‘stuff’: 1396, ‘stupid’: 1397, ‘stylist’: 1398, ‘subject’: 1399, ‘subtle’: 1400, ‘success’: 1401, ‘suck’: 1402, ‘sudixitca’: 1403, ‘suicide’: 1404, ‘sumersloan’: 1405, ‘super’: 1406, ‘superior’: 1407, ‘support’: 1408, ‘supported’: 1409, ‘sure’: 1410, ‘surgery’: 1411, ‘swallow’: 1412, ‘swiftonsecurity’: 1413, ‘switching’: 1414, ‘syazlicious’: 1415, ‘syria’: 1416, ‘systemic’: 1417, ‘tacky’: 1418, ‘taken’: 1419, ‘taking’: 1420, ‘tal’: 1421, ‘taliban’: 1422, ‘talk’: 1423, ‘talladega’: 1424, ‘taqiyya’: 1425, ‘tarah’: 1426, ‘tart’: 1427, ‘tasteless’: 1428, ‘tatibresolin’: 1429, ‘tbh’: 1430, ‘tbielawa’: 1431, ‘tcot’: 1432, ‘teach’: 1433, ‘teaching’: 1434, ‘team’: 1435, ‘tell’: 1436, ‘telling’: 1437, ‘tempting’: 1438, ‘terrible’: 1439, ‘terror’: 1440, ‘terrorism’: 1441, ‘terrorist’: 1442, ‘testicle’: 1443, ‘texasarlington’: 1444, ‘thanks’: 1445, ‘thatll’: 1446, ‘thats’: 1447, ‘theckman’: 1448, ‘thedoubleclicks’: 1449, ‘thegeek_chick’: 1450, ‘thegoodguysau’: 1451, ‘thelindsayellis’: 1452, ‘thelmasleaze’: 1453, ‘themirai’: 1454, ‘themselvespffft’: 1455, ‘themuslimguy’: 1456, ‘thequinnspiracy’: 1457, ‘there’: 1458, ‘theyre’: 1459, ‘thing’: 1460, ‘think’: 1461, ‘thinking’: 1462, ‘third’: 1463, ‘thought’: 1464, ‘threw’: 1465, ‘throw’: 1466, ‘tied’: 1467, ‘tim’: 1468, ‘time’: 1469, ‘timespan’: 1470, ‘tiny’: 1471, ‘tip’: 1472, ‘tnr’: 1473, ‘tobyrobertbull’: 1474, ‘today’: 1475, ‘todayreal’: 1476, ‘told’: 1477, ‘tolerate’: 1478, ‘tomato’: 1479, ‘tonight’: 1480, ‘toodles’: 1481, ‘tool’: 1482, ‘top’: 1483, ‘total’: 1484, ‘train’: 1485, ‘transic_nyc’: 1486, ‘translator’: 1487, ‘treating’: 1488, ‘trend’: 1489, ‘tried’: 1490, ‘tripple’: 1491, ‘trolley’: 1492, ‘troop’: 1493, ‘trophy’: 1494, ‘truaemusic’: 1495, ‘truly’: 1496, 
‘try’: 1497, ‘trying’: 1498, ‘turf’: 1499, ‘turk’: 1500, ‘tv’: 1501, ‘tw’: 1502, ‘tweet’: 1503, ‘twist’: 1504, ‘twista202’: 1505, ‘twitter’: 1506, ‘two’: 1507, ‘typed’: 1508, ‘typically’: 1509, ‘typo’: 1510, ‘u’: 1511, ‘ugly’: 1512, ‘ukraine’: 1513, ‘ukrainian’: 1514, ‘ultrafundamentalist’: 1515, ‘unacceptable’: 1516, ‘unapologetic’: 1517, ‘unashamed’: 1518, ‘uncalled’: 1519, ‘understand’: 1520, ‘understands’: 1521, ‘unfair’: 1522, ‘uninvolved’: 1523, ‘university’: 1524, ‘update’: 1525, ‘uplay’: 1526, ‘upon’: 1527, ‘use’: 1528, ‘user’: 1529, ‘username’: 1530, ‘usually’: 1531, ‘valenti’: 1532, ‘value’: 1533, ‘vandaliser’: 1534, ‘vc’: 1535, ‘vcs’: 1536, ‘venereveritas13’: 1537, ‘venomous9’: 1538, ‘versa’: 1539, ‘verse’: 1540, ‘vex0rian’: 1541, ‘via’: 1542, ‘vice’: 1543, ‘victim’: 1544, ‘victorymonk’: 1545, ‘video’: 1546, ‘videobeautiful’: 1547, ‘violence’: 1548, ‘voice’: 1549, ‘vonta624’: 1550, ‘vote’: 1551, ‘voted’: 1552, ‘w’: 1553, ‘wadhwa’: 1554, ‘wait’: 1555, ‘waiting’: 1556, ‘wakeuplibsgtjoenbc’: 1557, ‘walk’: 1558, ‘wan’: 1559, ‘wanted’: 1560, ‘warriorsialkot’: 1561, ‘washed’: 1562, ‘washingtonpost’: 1563, ‘wasnt’: 1564, ‘watan71969’: 1565, ‘watch’: 1566, ‘watched’: 1567, ‘watching’: 1568, ‘way’: 1569, ‘week’: 1570, ‘weekly’: 1571, ‘well’: 1572, ‘went’: 1573, ‘werent’: 1574, ‘west’: 1575, ‘western’: 1576, ‘wetsprocket’: 1577, ‘wheat’: 1578, ‘wheel’: 1579, ‘whereisyourdignity’: 1580, ‘whether’: 1581, ‘whiny’: 1582, ‘white’: 1583, ‘whiteblack’: 1584, ‘whitening’: 1585, ‘whole’: 1586, ‘wi’: 1587, ‘wife’: 1588, ‘win’: 1589, ‘wing’: 1590, ‘wish’: 1591, ‘witch_sniffer’: 1592, ‘without’: 1593, ‘witty’: 1594, ‘wizardryofozil’: 1595, ‘wks’: 1596, ‘wnba’: 1597, ‘wnyc’: 1598, ‘wocracial’: 1599, ‘woman’: 1600, ‘womenagainstfeminism’: 1601, ‘womeninterpret’: 1602, ‘word’: 1603, ‘work’: 1604, ‘worker’: 1605, ‘working’: 1606, ‘worse’: 1607, ‘worst’: 1608, ‘would’: 1609, ‘wouldnt’: 1610, ‘wouldve’: 1611, ‘wow’: 1612, ‘wrecking’: 1613, ‘write’: 1614, ‘writer’: 1615, ‘writing’: 1616, ‘wrong’: 1617, ‘xmjee’: 1618, ‘yall’: 1619, ‘yawn’: 1620, ‘yeah’: 1621, ‘year’: 1622, ‘yes’: 1623, ‘yesallwomen’: 1624, ‘yesyouresexist’: 1625, ‘yet’: 1626, ‘yield’: 1627, ‘youd’: 1628, ‘youll’: 1629, ‘young’: 1630, ‘youre’: 1631, ‘yousufpoosuf’: 1632, ‘youtube’: 1633, ‘ypg’: 1634, ‘yum’: 1635, ‘zaibatsunews’: 1636, ‘zene55’: 1637, ‘zero’: 1638, ‘zython86’: 1639}
Padding
In [ ]:
# The sequence length is pre-defined; you can't change this value for this exercise
seq_length = 16

# Please complete this function
def encode_and_add_padding(sentences, seq_length, word_index):
    sent_encoded = []
    for sent in sentences:
        # Map each word to its index; unknown words fall back to the '[UNKOWN]' token
        # (the key really is spelled '[UNKOWN]' in this lab's vocabulary)
        temp_encoded = [word_index[word] if word in word_index else word_index['[UNKOWN]'] for word in sent]
        if len(temp_encoded) < seq_length:
            # Pad short sentences up to seq_length with the '[PAD]' index
            temp_encoded += [word_index['[PAD]']] * (seq_length - len(temp_encoded))
        else:
            # Truncate long sentences to seq_length
            temp_encoded = temp_encoded[:seq_length]
        sent_encoded.append(temp_encoded)
    return sent_encoded

train_pad_encoded = encode_and_add_padding(text_train_le, seq_length, word_index)
test_pad_encoded = encode_and_add_padding(text_test_le, seq_length, word_index)
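As a quick illustration (the tiny vocabulary and sentence below are invented for this example and are not part of the lab data), out-of-vocabulary words should map to the '[UNKOWN]' index and short sentences should be padded with '[PAD]':
In [ ]:
toy_index = {'[PAD]': 0, '[UNKOWN]': 1, 'feminist': 2, 'mkr': 3}
toy_sentences = [['feminist', 'mkr', 'wordnotinvocab']]
print(encode_and_add_padding(toy_sentences, 5, toy_index))
# expected: [[2, 3, 1, 0, 0]]  -- known words, then [UNKOWN], then [PAD] up to length 5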
Embedding lookup table
In [ ]:
import gensim.downloader as api
word_emb_model = api.load("glove-twitter-25")
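Before building the lookup table, it can be helpful to poke at the pre-trained model a little. The check below is purely illustrative (the query word is arbitrary): glove-twitter-25 provides 25-dimensional vectors, and gensim's KeyedVectors interface supports item lookup and nearest-neighbour queries.
In [ ]:
print(word_emb_model.vector_size)                        # 25
print(word_emb_model['twitter'][:5])                     # first few components of one word vector
print(word_emb_model.most_similar('twitter', topn=3))    # nearest neighbours in embedding space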
In [ ]:
# Get the embedding lookup table
import numpy as np

emb_dim = word_emb_model.vector_size
emb_table = []
for i, word in enumerate(word_list):
    if word in word_emb_model:
        # Copy the pre-trained GloVe vector for words it covers
        emb_table.append(word_emb_model[word])
    else:
        # Words without a pre-trained vector (e.g. [PAD], [UNKOWN], user handles) get a zero vector
        emb_table.append([0] * emb_dim)
emb_table = np.array(emb_table)
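A quick optional sanity check: the lookup table should have one row per vocabulary entry and emb_dim columns, i.e. (1640, 25) here, which matches the embedding layer printed in the model summary further below.
In [ ]:
print(emb_table.shape)   # expected: (len(word_list), emb_dim) = (1640, 25)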
Model
In [ ]:
vocab_size = len(word_list)
n_hidden = 50
total_epoch = 100
learning_rate = 0.01
In [ ]:
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import accuracy_score

# You can enable GPU here (cuda) or just use the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # Embedding layer initialised from the pre-trained GloVe lookup table and frozen
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.emb.weight.data.copy_(torch.from_numpy(emb_table))
        self.emb.weight.requires_grad = False
        # Two-layer LSTM followed by a linear classifier over n_class labels
        self.lstm = nn.LSTM(emb_dim, n_hidden, num_layers=2, batch_first=True, dropout=0.2)
        self.linear = nn.Linear(n_hidden, n_class)

    def forward(self, x):
        x = self.emb(x)
        x, _ = self.lstm(x)
        # Use the hidden state at the last time step for classification
        x = self.linear(x[:, -1, :])
        return x

model = Model().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

input_torch = torch.from_numpy(np.array(train_pad_encoded)).to(device)
target_torch = torch.from_numpy(np.array(label_train_encoded)).view(-1).to(device)

for epoch in range(total_epoch):
    model.train()
    optimizer.zero_grad()
    outputs = model(input_torch)
    loss = criterion(outputs, target_torch)
    loss.backward()
    optimizer.step()
    if epoch % 10 == 9:
        predicted = torch.argmax(outputs, -1)
        acc = accuracy_score(target_torch.cpu().numpy(), predicted.cpu().numpy())
        print('Epoch: %d, loss: %.5f, train_acc: %.2f' % (epoch + 1, loss.item(), acc))
print('Finished Training')
Epoch: 10, loss: 0.76369, train_acc: 0.63
Epoch: 20, loss: 0.62751, train_acc: 0.70
Epoch: 30, loss: 0.58569, train_acc: 0.66
Epoch: 40, loss: 0.41178, train_acc: 0.84
Epoch: 50, loss: 0.33358, train_acc: 0.88
Epoch: 60, loss: 0.23153, train_acc: 0.91
Epoch: 70, loss: 0.14260, train_acc: 0.93
Epoch: 80, loss: 0.26337, train_acc: 0.89
Epoch: 90, loss: 0.15540, train_acc: 0.95
Epoch: 100, loss: 0.06726, train_acc: 0.97
Finished Training
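The loop above feeds the entire training set to the model as a single batch each epoch, which is workable for a dataset this small. If you wanted to train with mini-batches instead, a sketch along the following lines (using PyTorch's TensorDataset and DataLoader; the batch size is an arbitrary choice) would do it. This is an optional variant, not part of the lab exercise, and running it would continue training the already-fitted model.
In [ ]:
from torch.utils.data import TensorDataset, DataLoader

train_ds = TensorDataset(input_torch, target_torch)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)

for epoch in range(total_epoch):
    model.train()
    for batch_x, batch_y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()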
Save and Load the model
Save the model
In [ ]:
torch.save(model,'lab5.pt')
Load the model
In [ ]:
model2 = torch.load('lab5.pt')
model2.eval()
Out[ ]:
Model(
  (emb): Embedding(1640, 25)
  (lstm): LSTM(25, 50, num_layers=2, batch_first=True, dropout=0.2)
  (linear): Linear(in_features=50, out_features=3, bias=True)
)
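torch.save(model, ...) pickles the whole model object, so loading it later requires the Model class to be importable under the same name. A commonly recommended alternative, sketched below (the filename is arbitrary), is to save and restore only the state_dict:
In [ ]:
# Save only the learned parameters
torch.save(model.state_dict(), 'lab5_state.pt')

# Restore them into a freshly constructed model with the same architecture
model3 = Model().to(device)
model3.load_state_dict(torch.load('lab5_state.pt'))
model3.eval()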
Testing
In [ ]:
input_torch = torch.from_numpy(np.array(test_pad_encoded)).to(device)
outputs = model2(input_torch)
predicted = torch.argmax(outputs, -1)
from sklearn.metrics import classification_report
print(classification_report(label_test_encoded,predicted.cpu().numpy()))
              precision    recall  f1-score   support

           0       0.77      0.77      0.77        65
           1       0.83      0.50      0.62        10
           2       0.48      0.56      0.52        25

    accuracy                           0.69       100
   macro avg       0.70      0.61      0.64       100
weighted avg       0.70      0.69      0.69       100
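The report shows that class 2 is the weakest (precision 0.48). To see where the misclassifications go, a confusion matrix is a natural companion to the report; this is an optional extra step, not required by the lab.
In [ ]:
from sklearn.metrics import confusion_matrix

# rows = true labels, columns = predicted labels
print(confusion_matrix(label_test_encoded, predicted.cpu().numpy()))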
In [ ]: