Introduction to Computer Science II

Final Exam Practice 2


Solve the below problems by implementing the appropriate methods in the file FinalPractice2.py.

Problems

1.    Implement recursive function posProd() that takes a list of integers as input and returns the product of all the positive numbers in the list; non-positive numbers are ignored. If no number is positive then 1 should be returned. Your implementation must be recursive, without any loops.
Usage:
>>> posProd([])
1
>>> posProd([0, -1, -4])
1
>>> posProd([0, -1, 4])
4
>>> posProd([0, 2, 4])
8
>>> posProd([0, 2, -2, 4, 3])
24


2.    Implement recursive function dirSize() that takes a pathname (as a string) as input. If the pathname refers to a regular file, the function should return the size of the file; if the pathname is a folder, the function should return the sum of the sizes of all regular files contained in the directory, whether directly in the folder or indirectly through subfolders.

To get the size of a regular file you can use function getsize() from module os.path that takes a file pathname as input and returns its size. You will also need functions isfile() and join() from module os.path and function listdir() from module os (all of which we have used multiple times in the course). Your implementation must be recursive and not use any Python Standarad Library functions .

Test your function on 1) your file final.py that you are working on and 2) the folder test obtained by downloading the file test.zip and unziping it in the same folder as your file FinalPractice2.py
Usage:
>>> dirSize('FinalPractice2.py')
317                     # Note: this number will be different for you
>>> dirSize('test')
102


3.    Develop class ElementCounter as a subclass of HTMLParser that takes an HTML element tag as input and, when fed an HTML file, counts the number of such elements in the HTML document. The usage shown below using lists.html shows that the file contains 3 ordered lists (with tag 'ol') and 2 unordered lists (with tag 'ul'). The class ElementCounter should support method elementCount() that takes no input arguments but returns the number of elements with the given tag. Test your solution on lists.html
Usage:
>>> infile = open('lists.html')
>>> content = infile.read()
>>> infile.close()
>>> p = ElementCounter('ol')
>>> p.feed(content)
>>> p.elementCount()
3
>>> p = ElementCounter('ul')
>>> p.feed(content)
>>> p.elementCount()
2

4.    In each of the three exercises in function test, construct the regular expression that attempts to match the below described list of words in file frankenstein.txt.

Usage:
>>> test('frankenstein.txt')


Exercise i: words that start with string 'inter' or 'Inter'
['-interest', ' interview', ' intertwined', ' interchanging', '\ninterest', ' intercourse', '\ninteresting', ' internal', '\nintervening', ' interfere', ' interest', '\ninterment', '\ninterchange', ' interests', ' interruption', '\ninterpretation', ' interference', ' intercept', ' interfered', ' interpreted', ' interspersed', ' interested', ' interpretation', ' interpreter', ' intercepted', '\nintervened', ' Interpret', ' interval', ' interesting', ' interrupt', ' interrupted', ' intermixed', ' intersected', ' interchange', ' intervals', '\ninterview', ' interment']


Exercise ii: words that start with an upper case and end with letters 'ar'
[' Hear ', ' Dear ', ' Caesar ', ' Vicar ']


Exercise iii: words that contain the string 'death' as a substring and end with 'e'
[]



5.
    Write function search() that takes a string url as input and then crawls through all the web pages that can be reached from the web page with URL url. No web page should be visited more than once. Your crawler will count the number of hyperlinks in each visited web page and store the result in a global dictionary variable d. At the end of the crawl, dictionary d will contain (key: value) pairs where the keys are URLs found through the crawl and the value for each key (URL) is the number of hyperlinks in the web page associated with the URL.

Usage:
>>> search('http://reed.cs.depaul.edu/lperkovic/test1.html')
>>> d
{'http://reed.cs.depaul.edu/lperkovic/test1.html': 2, 'http://reed.cs.depaul.edu/lperkovic/test2.html': 1, 'http://reed.cs.depaul.edu/lperkovic/test4.html': 0, 'http://reed.cs.depaul.edu/lperkovic/test3.html': 1}