Python Crawler (Part 2)
Author: 长春市隆兴伟业物流有限公司 Source: www.lxwywl.com Published: 2017-09-12 16:58:46
How do you actually put a Python crawler to use? Hopefully the example below helps.

Scraping several pages in a row: pause after each request with time.sleep(4)

from bs4 import BeautifulSoup
import requests
import time

url_saves = '#37685322'
url = 'https://cn.tripadvisor.com/Attractions-g60763-Activities-New_York_City_New_York.html'
# Follow-up listing pages: the "oa" offset in the URL grows by 30 per page (30, 60, ..., 900)
urls = ['https://cn.tripadvisor.com/Attractions-g60763-Activities-oa{}-New_York_City_New_York.html#ATTRACTION_LIST'.format(str(i)) for i in range(30, 930, 30)]

# Fill in the User-Agent and Cookie of a logged-in browser session
headers = {
    'User-Agent': '',
    'Cookie': ''
}


def get_attractions(url, data=None):
    """Print the title, thumbnail and category tags of each attraction on a listing page."""
    wb_data = requests.get(url)
    time.sleep(4)  # wait 4 seconds after each request so consecutive pages are not fetched back to back
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('div.property_title > a[target="_blank"]')
    imgs = soup.select('img[width="160"]')
    cates = soup.select('div.p13n_reasoning_v2')
    if data is None:
        for title, img, cate in zip(titles, imgs, cates):
            data = {
                'title': title.get_text(),
                'img': img.get('src'),
                'cate': list(cate.stripped_strings),
            }
            print(data)


def get_favs(url, data=None):
    """Print the saved items of the account; the request carries the headers (Cookie) defined above."""
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('a.location-name')
    imgs = soup.select('div.photo > div.sizedThumb > img.photo_image')
    metas = soup.select('span.format_address')
    if data is None:
        for title, img, meta in zip(titles, imgs, metas):
            data = {
                'title': title.get_text(),
                'img': img.get('src'),
                'meta': list(meta.stripped_strings)
            }
            print(data)


# Crawl every listing page in turn
for single_url in urls:
    get_attractions(single_url)

# from mobile web site
'''
headers = {
    'User-Agent':'', #mobile device user agent from chrome
}

mb_data = requests.get(url,headers=headers)
soup = BeautifulSoup(mb_data.text,'lxml')
imgs = soup.select('div.thumb.thumbLLR.soThumb > img')
for i in imgs:
    print(i.get('src'))
'''
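
The pacing idea in the script above (build the paginated URLs, then pause between requests) can be isolated from the parsing so it is easier to try out. The following is only a minimal sketch under stated assumptions: the helper names build_page_urls and crawl, and the injectable fetch and sleep parameters, are hypothetical and do not appear in the original script. Unlike the original, which sleeps after every request, this version sleeps only between requests.

```python
import time
from typing import Callable, Iterable, List

# Same URL pattern as the script above; only the "oa" offset changes per page.
BASE = ('https://cn.tripadvisor.com/Attractions-g60763-Activities-oa{}'
        '-New_York_City_New_York.html#ATTRACTION_LIST')


def build_page_urls(start: int = 30, stop: int = 930, step: int = 30) -> List[str]:
    """Return one listing URL per offset in range(start, stop, step)."""
    return [BASE.format(offset) for offset in range(start, stop, step)]


def crawl(urls: Iterable[str],
          fetch: Callable[[str], object],
          delay: float = 4.0,
          sleep: Callable[[float], None] = time.sleep) -> list:
    """Call fetch(url) for each URL, pausing `delay` seconds between calls.

    `fetch` and `sleep` are passed in (hypothetical design choice) so the
    loop can be exercised without touching the network or actually waiting.
    """
    results = []
    for index, page_url in enumerate(urls):
        if index > 0:
            sleep(delay)  # pause before every request except the first
        results.append(fetch(page_url))
    return results
```

With the requests library, fetch could be something like lambda u: requests.get(u, headers=headers).text, and the returned HTML strings would then be handed to BeautifulSoup as in get_attractions.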
