Introduction to Computer Science II

Homework 8

Due by 11:50am on Tuesday, March 8

Reading

Read chapter 11 as well as the week 9 lecture notes.

Problems

Solve the following by implementing the corresponding functions in homework8.py. When done, submit that file through D2L.

Problems 11.7, 11.8, 11.9(e)(f)(g), 11.16, 11.19, as well as problem B [Lab]
You will need file frankenstein.txt for problem 11.9 and lists.html for problem 11.16.
Usage to test your solution for 11.19:
>>> from urllib.request import urlopen
>>> url = 'http://reed.cs.depaul.edu/lperkovic/test1.html'
>>> content = urlopen(url).read().decode('utf-8')
>>> emails(content)
set()
>>> url = 'http://reed.cs.depaul.edu/lperkovic/test2.html'
>>> content = urlopen(url).read().decode('utf-8')
>>> emails(content)
{'lperkovic@cs.depaul.edu'}
>>> url = 'http://reed.cs.depaul.edu/lperkovic/test3.html'
>>> content = urlopen(url).read().decode('utf-8')
>>> emails(content)
{'nobody@xyz.com'}


B.
    Write a recursive function spam(url, n) that takes the URL of a web page and a non-negative integer n as input. The function collects all the email addresses contained in the web page, adds them to a global set variable spam_set, and then recursively calls itself on every HTTP hyperlink contained in the web page. Because spam_set is a set, only one copy of each email address is saved. Each recursive call should use the argument n-1 instead of n; if n = 0, no recursive calls should be made. The parameter n thus limits the recursion to depth at most n. I recommend you use the Collector class developed in class and your solution to problem 11.19.
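
The depth limit described above can be sketched without touching the network. The example below is only an illustration, not a solution: the dictionary PAGES, the set found, and the function crawl are hypothetical stand-ins for real web pages, spam_set, and spam(). Note that the emails on the starting page are collected even when n is 0, and that the depth limit is what stops the loop between pages 'a' and 'b'.

```python
# Hypothetical in-memory "web": page name -> (emails on that page, links on that page).
# These names are assumptions made for illustration only.
PAGES = {
    'a': ({'a@x.com'}, ['b']),
    'b': ({'b@x.com'}, ['c', 'a']),   # links back to 'a', forming a cycle
    'c': ({'c@x.com'}, []),
}

found = set()   # global set, playing the role that spam_set plays in problem B


def crawl(page, n):
    """Collect the emails on page, then follow its links to depth at most n."""
    emails_here, links = PAGES[page]
    found.update(emails_here)   # the current page is always processed
    if n == 0:                  # base case: make no recursive calls
        return
    for link in links:
        crawl(link, n - 1)      # each recursive call gets a smaller depth budget
```

Calling crawl('a', 0) only processes page 'a'; crawl('a', 1) also reaches 'b'; crawl('a', 2) reaches all three pages and still terminates despite the cycle.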

NOTES:
  1. Running spam() directly will produce no output on the screen; to see the collected addresses, you will need to read the value of spam_set, and you will also need to reset it to the empty set before every run of spam().
  2. Recall how global variables are used.
Usage:
>>> url = 'http://reed.cs.depaul.edu/lperkovic/test1.html'
>>> spam_set = set()
>>> spam(url,0)
>>> spam_set
set()
>>> spam_set = set()
>>> spam(url,1)
>>> spam_set
{'nobody@xyz.com', 'lperkovic@cs.depaul.edu'}
>>> spam_set = set()
>>> spam(url,2)
>>> spam_set
{'nobody@xyz.com', 'lperkovic@cs.depaul.edu'}