Extract site that HTML document came from
I have a folder full of HTML documents that are saved copies of webpages,
but I need to know what site each one came from. What function can I use to
extract the website name from the documents? I did not find anything for this
in the BeautifulSoup module. Is there a specific tag that I should be looking
for in the document?
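
For what it's worth, there is no single tag that is guaranteed to hold the source site, but a few places are worth checking. Some browsers write a "saved from url" comment when saving a page, and many sites include a canonical link, an og:url meta tag, or a base tag. Below is a rough sketch using only the standard library (the same lookups could be done with BeautifulSoup). The names SourceURLFinder and guess_site are made up for illustration, and the priority order is just an assumption:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Comment some browsers add when saving a page, e.g.
# <!-- saved from url=(0024)https://example.com/page -->
SAVED_FROM = re.compile(r"saved from url=\(\d+\)(\S+)")


class SourceURLFinder(HTMLParser):
    """Hypothetical helper: collects candidate source URLs from a saved page."""

    def __init__(self):
        super().__init__()
        self.candidates = {}

    def _add(self, key, value):
        # Keep only the first non-empty value seen for each kind of hint.
        if value and key not in self.candidates:
            self.candidates[key] = value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self._add("canonical", attrs.get("href"))
        elif tag == "meta" and attrs.get("property") == "og:url":
            self._add("og:url", attrs.get("content"))
        elif tag == "base":
            self._add("base", attrs.get("href"))

    def handle_comment(self, data):
        match = SAVED_FROM.search(data)
        if match:
            self._add("saved_from", match.group(1))


def guess_site(html_text):
    """Return the host name of the most likely source URL, or None."""
    finder = SourceURLFinder()
    finder.feed(html_text)
    # Assumed priority: the browser's own comment first, then page metadata.
    for key in ("saved_from", "canonical", "og:url", "base"):
        host = urlparse(finder.candidates.get(key, "")).netloc
        if host:
            return host
    return None
```

To run it over the folder, read each file as text and pass the contents to guess_site. If none of these hints are present in a file, the function returns None, and the source site probably cannot be recovered from the HTML alone.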