Boin: Extract site that HTML document came from

Thursday, 22 August 2013

I have a folder full of HTML documents that are saved copies of webpages,
but I need to know which site each one came from. What function can I use
to extract the website name from a document? I did not find anything in
the BeautifulSoup module. Is there a specific tag that I should be looking
for in the document?
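There is no single guaranteed tag, but a saved page often carries one of a few hints. Some browsers (Internet Explorer's "Mark of the Web", and Chrome's "Save page as") usually write a comment such as `<!-- saved from url=(0027)http://www.example.com/page -->` at the top of the file. Many sites also include `<link rel="canonical" href="...">`, an Open Graph `<meta property="og:url" content="...">`, or a `<base href="...">` tag. The sketch below checks for those hints in that order. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs. The names `SourceURLFinder`, `source_site` and `sites_in_folder` are hypothetical, and a page that carries none of these hints will simply return `None`.

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse

# Matches the "Mark of the Web" comment some browsers add when saving a page,
# e.g. "saved from url=(0027)http://www.example.com/page".
SAVED_FROM = re.compile(r"saved from url=\(\d+\)(\S+)", re.IGNORECASE)


class SourceURLFinder(HTMLParser):
    """Hypothetical helper: collects candidate source URLs from a page."""

    def __init__(self):
        super().__init__()
        self.saved_from = []  # URLs from "saved from url=" comments
        self.in_tags = []     # URLs from canonical/og:url/base tags

    def handle_comment(self, data):
        match = SAVED_FROM.search(data)
        if match:
            self.saved_from.append(match.group(1))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.in_tags.append(attrs.get("href"))
        elif tag == "meta" and (attrs.get("property") or "").lower() == "og:url":
            self.in_tags.append(attrs.get("content"))
        elif tag == "base":
            self.in_tags.append(attrs.get("href"))


def source_site(html_text):
    """Return the host name the page appears to come from, or None."""
    finder = SourceURLFinder()
    finder.feed(html_text)
    finder.close()
    # Prefer the browser's "saved from" comment, then the in-page tags.
    for url in finder.saved_from + finder.in_tags:
        if url:
            host = urlparse(url.strip()).hostname  # None for e.g. "about:internet"
            if host:
                return host
    return None


def sites_in_folder(folder):
    """Map each .htm/.html file in folder to its detected site (or None)."""
    results = {}
    for path in sorted(Path(folder).glob("*.htm*")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = source_site(text)
    return results


if __name__ == "__main__":
    sample = "<!-- saved from url=(0027)http://www.example.com/page -->\n<html></html>"
    print(source_site(sample))  # → www.example.com
```

To process the whole folder, call something like `sites_in_folder("saved_pages")`, where the folder name is only an example. Files saved without any of these hints will map to `None`. For those, the page's own absolute links are the remaining clue, which is a much weaker signal.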

Boin