makerroot - Scraping City and Town Page URLs from China Weather Network (weather.com.cn)

Last edited: 2020-08-31 15:13:00
Author: makerroot

Tags: python
Category: Python backend

Approach

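1 First, collect the codes of all of China's provinces, municipalities, and special administrative regions.

2 For each of those codes, look up the codes of the cities/counties it contains.

3 For each city/county code, look up the codes of the towns beneath it.

4 Finally, store the results with pandas.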
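The three lookups above form a tree (province → city → district/town), where each level's JSON maps codes to names. Below is a minimal sketch of that walk. It assumes each endpoint returns a flat `{code: name}` object and that the 9-digit weather page code is province + city + district, except for municipalities, whose only city code is `00` and whose district code comes first. `walk`, `station_id` and the injectable `fetch` callable are hypothetical names. Passing `fetch` in lets the traversal run offline against stub data.

```python
from typing import Callable, Dict, Iterator, Tuple

BASE = 'http://www.weather.com.cn/data/city3jdata'

# Any callable that takes a URL and returns the decoded JSON object (hypothetical)
Fetch = Callable[[str], Dict[str, str]]


def station_id(prov: str, city: str, district: str) -> str:
    """Build the 9-digit page code (assumed scheme, see the note above)."""
    if city == '00':  # municipality: district code goes before the '00' city code
        return prov + district + city
    return prov + city + district


def walk(fetch: Fetch) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (province, city, district, weather page URL) for every leaf."""
    for p_code, p_name in fetch(BASE + '/china.html').items():                # step 1
        for c_code, c_name in fetch(f'{BASE}/provshi/{p_code}.html').items():  # step 2
            districts = fetch(f'{BASE}/station/{p_code}{c_code}.html')          # step 3
            for d_code, d_name in districts.items():
                page = station_id(p_code, c_code, d_code)
                yield p_name, c_name, d_name, f'http://www.weather.com.cn/weather/{page}.shtml'


# Stub responses shaped like the real endpoints (contents are illustrative only)
FAKE = {
    BASE + '/china.html': {'10101': '北京', '10119': '江苏'},
    BASE + '/provshi/10101.html': {'00': '北京'},
    BASE + '/provshi/10119.html': {'02': '无锡'},
    BASE + '/station/1010100.html': {'02': '海淀'},
    BASE + '/station/1011902.html': {'01': '无锡'},
}

for row in walk(FAKE.__getitem__):
    print(row)
```

In a live run, `fetch` would wrap `requests.get(url, timeout=10)` followed by JSON decoding.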

Code

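import json

import pandas as pd  # DataFrame.to_excel also needs an Excel engine such as openpyxl
import requests


class sctrapy_data:
    '''
    Approach:
    1 Collect the codes of all of China's provinces, municipalities and special administrative regions.
    2 For each of those codes, look up the city/county codes it contains.
    3 For each city/county code, look up the town codes.
    4 Store the results with pandas.
    '''

    def __init__(self, city=None, start_page='http://www.weather.com.cn/data/city3jdata/china.html'):
        self.start_page = start_page
        # {province_code: name}; fetched from start_page unless supplied.
        # None instead of a {} default avoids sharing one mutable dict between instances.
        self.city = city if city is not None else {}
        self.prov = []       # province name for each row
        self.city_list = []  # city name for each row
        self.town = []       # district/county/town name for each row
        self.page_list = []  # weather page URL for each row
        if not self.city:
            self.city_code_list()

    def deal_with_requests(self, page=None):
        '''Request page and return its JSON body (a {code: name} dict).'''
        if page is None:
            raise ValueError('Please pass the URL of the page to request!')
        # The timeout stops a single stalled request from hanging the whole crawl
        response = requests.get(page, timeout=10)
        if response.status_code != 200:
            raise Exception(page + ': request failed with HTTP status ' + str(response.status_code))
        return json.loads(response.content)

    def city_code_list(self):
        # Step 1: province-level codes and names
        self.city = self.deal_with_requests(self.start_page)

    def deal_with_data(self):
        for code in self.city:
            # Step 2: cities/counties under this province
            json_content = self.deal_with_requests(
                'http://www.weather.com.cn/data/city3jdata/provshi/' + str(code) + '.html')
            for city_code in json_content:
                # Step 3: towns/districts under this city
                detail_info = self.deal_with_requests(
                    'http://www.weather.com.cn/data/city3jdata/station/' + str(code) + str(city_code) + '.html')
                for detail_code in detail_info:
                    # Assumed 9-digit page code scheme: municipalities have the single
                    # city code '00' and put the district code before it; other
                    # provinces use province + city + district. Verify against the site.
                    if str(city_code) == '00':
                        page_code = str(code) + str(detail_code) + str(city_code)
                    else:
                        page_code = str(code) + str(city_code) + str(detail_code)
                    self.prov.append(self.city.get(code))
                    self.city_list.append(json_content.get(city_code))
                    self.town.append(detail_info.get(detail_code))
                    self.page_list.append('http://www.weather.com.cn/weather/' + page_code + '.shtml')
        # Step 4: one row per town, saved to Excel
        result = pd.DataFrame({'Province': self.prov, 'City': self.city_list,
                               'District/County': self.town, 'URL': self.page_list})
        result.to_excel('result.xlsx', index=False)


if __name__ == '__main__':
    sd = sctrapy_data()
    sd.deal_with_data()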
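Step 4 relies on `DataFrame.to_excel`, which needs an Excel engine such as openpyxl. If that is inconvenient, the same four columns can be written to CSV. The sketch below swaps in the standard-library `csv` module; it is an alternative, not the author's method. `write_rows` and the English column labels are illustrative names. Opening the output file with `encoding='utf-8-sig'` adds a BOM so that Excel reads the Chinese place names as UTF-8.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

# (province, city, district/county, weather page URL)
Row = Tuple[str, str, str, str]
HEADER = ('Province', 'City', 'District/County', 'URL')


def write_rows(rows: Iterable[Row], fh: TextIO) -> int:
    """Write HEADER and then one CSV line per row; return the number of data rows."""
    writer = csv.writer(fh)
    writer.writerow(HEADER)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# In-memory demo with a sample row; for a real file use:
# with open('result.csv', 'w', newline='', encoding='utf-8-sig') as fh: write_rows(rows, fh)
buf = io.StringIO()
n = write_rows([('北京', '北京', '海淀', 'http://www.weather.com.cn/weather/101010200.shtml')], buf)
print(n, buf.getvalue().splitlines()[0])
```

Passing `newline=''` to `open` is what the `csv` documentation recommends, so the writer's own `\r\n` line endings are not translated a second time.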
