Introduction to Computer Science II
Homework 8
Due by 11:50am on Tuesday, March 8
Reading
Read chapter 11 as well as the week 9 lecture notes.
Problems
Solve the following by implementing the corresponding functions in homework8.py. When done, submit that file
through D2L.
Problems 11.7, 11.8, 11.9(e)(f)(g), 11.16, and 11.19, as well as problem B [Lab]
You will need the file frankenstein.txt for problem 11.9 and the file lists.html for problem 11.16.
Usage to test your solution for 11.19:
>>> url = 'http://reed.cs.depaul.edu/lperkovic/test1.html'
>>> content = urlopen(url).read().decode('utf-8')
>>> emails(content)
set()
>>> url = 'http://reed.cs.depaul.edu/lperkovic/test2.html'
>>> content = urlopen(url).read().decode('utf-8')
>>> emails(content)
{'lperkovic@cs.depaul.edu'}
>>> url = 'http://reed.cs.depaul.edu/lperkovic/test3.html'
>>> content = urlopen(url).read().decode('utf-8')
>>> emails(content)
{'nobody@xyz.com'}
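The sessions above call urlopen without showing where it comes from. A minimal sketch, assuming the standard library function urllib.request.urlopen is the one intended; the helper name fetch is hypothetical and not part of the assignment:

```python
from urllib.request import urlopen

def fetch(url):
    """Download the resource at url and return its content as a string.

    fetch is a hypothetical helper name used only for illustration.
    """
    with urlopen(url) as response:
        # read() returns bytes; decode() converts them to a str,
        # exactly as done by hand in the sessions above
        return response.read().decode('utf-8')
```

With such a helper, each test becomes content = fetch(url) followed by emails(content).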
B. Write a recursive function spam(url, n) that takes the URL of a web page
and a non-negative integer n as input, collects all the email addresses
contained in the web page, adds them to a global set variable spam_set,
and then recursively calls itself on every HTTP hyperlink contained in the
web page. Because spam_set is a set, only one copy of every email address is
saved. Each recursive call should use the argument n-1 instead of n. If
n = 0, no recursive calls should be made, so the parameter n limits the
recursion to depth at most n. I recommend you use the Collector class
developed in class and your solution to problem 11.19.
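The depth limit described above can be illustrated without touching the network. The following is only a sketch of the recursion pattern on a made-up, in-memory "web"; the names PAGES, found, and crawl are hypothetical, and the sketch uses neither the Collector class nor real URLs:

```python
# Hypothetical in-memory "web": page name -> (emails on the page, outgoing links)
PAGES = {
    'a': (set(), ['b', 'c']),
    'b': ({'x@example.com'}, ['d']),
    'c': ({'y@example.com'}, []),
    'd': ({'z@example.com'}, ['a']),   # links back to 'a', forming a cycle
}

found = set()   # global set, playing the role that spam_set plays in problem B

def crawl(page, n):
    """Collect the emails on page, then follow its links to depth at most n."""
    page_emails, links = PAGES[page]
    found.update(page_emails)      # sets silently ignore duplicates
    if n == 0:
        return                     # base case: make no recursive calls
    for link in links:
        crawl(link, n - 1)         # each level of recursion uses n-1
```

Calling crawl('a', 0) leaves found empty because page 'a' itself has no addresses, crawl('a', 1) also visits 'b' and 'c', and crawl('a', 2) reaches 'd' as well. The cycle from 'd' back to 'a' cannot cause infinite recursion because n decreases with every call.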
NOTES:
- Running spam() directly will produce no output on
the screen; to see the collected addresses, you will need to
read the value of spam_set, and you will also need to
reset it to the empty set before every run of spam().
- Recall how global variables are used.
Usage:
>>> url = 'http://reed.cs.depaul.edu/lperkovic/test1.html'
>>> spam_set = set()
>>> spam(url,0)
>>> spam_set
set()
>>> spam_set = set()
>>> spam(url,1)
>>> spam_set
{'nobody@xyz.com', 'lperkovic@cs.depaul.edu'}
>>> spam_set = set()
>>> spam(url,2)
>>> spam_set
{'nobody@xyz.com', 'lperkovic@cs.depaul.edu'}