Chém gió

Tuesday, May 5, 2015

What to watch out for when scraping website content

I was trying to get a list of products for a keyword on the Amazon website, and it had to be done automatically.
I used Beautiful Soup and urllib2.
But what I scraped was slightly different from what I saw on the website (in the browser I saw more items).
After googling around, I found that when we use an automated tool to scrape a website, we have to fake a browser by providing a User-agent in the request headers. We can do this as follows:
>>> import urllib2
>>> opener = urllib2.build_opener()
>>> opener.addheaders = [('User-agent', 'Mozilla/5.0')]
>>> url = ""
>>> response = opener.open(url)
>>> page = response.read()
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(page)
Works like a charm :D
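
For anyone on Python 3, where urllib2 was folded into urllib.request, the same trick might look roughly like the sketch below. The helper names make_opener and fetch are made up for illustration, and "Mozilla/5.0" is just the same placeholder User-agent as above. Treat it as one possible way to wrap the idea, not the only one.

```python
import urllib.request

# Assumed browser-like User-agent string; any realistic value could be used.
USER_AGENT = "Mozilla/5.0"


def make_opener(user_agent=USER_AGENT):
    """Return an opener that sends the given User-agent with every request."""
    opener = urllib.request.build_opener()
    # Replace the default "Python-urllib/x.y" header with our own value.
    opener.addheaders = [("User-agent", user_agent)]
    return opener


def fetch(url, opener=None, timeout=10):
    """Download url and return the raw response body as bytes."""
    opener = opener or make_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling page = fetch(url) with the search URL should then give the raw bytes of the page, which can be handed to Beautiful Soup for parsing just like before.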