requests in practice: scraping KFC store-location data

Open the browser's developer tools, switch to the XHR tab, and trigger the request again to capture it. The Headers panel shows the URL: http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword. It is a POST request with five form parameters, and the response type is text. Knowing this, the following code is not hard to write.

import requests
if __name__=="__main__":
    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
    }
    kw = input("Enter a place name: ")
    data={
        'cname':'',
        'pid':'',
        'keyword':kw,
        'pageIndex':'1',
        'pageSize':'10',
    }
    response = requests.post(url=url, headers=headers, data=data)
    page_text = response.text
    fileName = kw + '.json'
    with open(fileName, 'w', encoding='utf-8') as fp:
        fp.write(page_text)

    print("over...")
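The payload above fetches only the first page (pageIndex=1, pageSize=10). Assuming pageIndex is 1-based and pages through results as its name suggests (an assumption from the parameter names, not confirmed in the capture), a paging loop can be sketched with a helper that builds one payload per page:

```python
def build_payload(keyword, page, page_size=10):
    """Build the form data for one page of the store-list request.

    Assumes pageIndex is 1-based and pageSize caps results per page,
    as the captured parameter names suggest.
    """
    return {
        'cname': '',
        'pid': '',
        'keyword': keyword,
        'pageIndex': str(page),
        'pageSize': str(page_size),
    }

# Example: payloads for the first three pages of results for "北京"
payloads = [build_payload('北京', p) for p in range(1, 4)]
```

Each payload would then be passed to requests.post() exactly as in the script above, one request per page.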

More complete notes and files are available at the link below; for learning and exchange only.

https://github.com/jiayoudangdang/python_note_chapter_two_requests_module

 
"""
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
import json
if __name__ == "__main__":
    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
    city = input('Enter a city: ')
    data = {
        'cname': '',
        'pid': '',
        'keyword': city,
        'pageIndex': '1',
        'pageSize': '40'
    }
    # Send the request
    # GET uses get() and POST uses post(); note that the parameters differ between the two!
    response = requests.post(url=url, data=data, headers=headers)
    # Get the response data: json() returns a Python object (only use json() if the response is confirmed to be JSON)
    dic_obj = response.json()
    # Persist the data to disk
    fileName = city + '.json'
    with open(fileName, 'w', encoding='utf-8') as fp:
        json.dump(dic_obj, fp=fp, ensure_ascii=False)
"""
 
 
 
 
"""
import requests
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
city = input('Enter a city: ')
data = {
    'cname': '',
    'pid': '',
    'keyword': city,
    'pageIndex': '1',
    'pageSize': '40'
}
response = requests.post(url, data=data, headers=headers)
print(type(response))
response = response.json()
print(type(response))
for i in response['Table1']:
    store = i['storeName']
    address = i['addressDetail']
    print('store:' + store, 'address:' + address + '\n')
"""
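The commented-out variant above iterates response['Table1'] and pulls storeName / addressDetail from each entry. Under the assumption that the API returns that shape, the extraction step can be exercised on a sample payload without hitting the network (the store data below is illustrative, not a real response):

```python
import json

# Sample payload mimicking the assumed response shape
# (illustrative data, not a real API response).
sample = json.loads('''
{"Table1": [
    {"storeName": "前门", "addressDetail": "前门大街2号"},
    {"storeName": "西单", "addressDetail": "西单北大街110号"}
]}
''')

# Same extraction as the loop above: one (store, address) pair per entry.
stores = [(i['storeName'], i['addressDetail']) for i in sample['Table1']]
```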
 
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
import json
if __name__ == "__main__":
 
    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
 
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}
 
    city = input('Enter a city: ')
 
    data = {
        'cname': '',
        'pid': '',
        'keyword': city,
        'pageIndex': '1',
        'pageSize': '40'
    }
 
    # Send the request
    # GET uses get() and POST uses post(); note that the parameters differ between the two!
    response = requests.post(url=url,data=data,headers=headers)
 
 
    # Get the response data - which accessor to use depends on the content-type: .text for text, .json() for JSON
    page_text = response.text
 
    # Persist the data to disk (the body is JSON text, so save it with a .json extension)
    fileName = city + '.json'
    with open(fileName, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print(fileName, 'saved successfully!')
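The comment in the script above notes that whether to read the body with .text or .json() depends on the response's content type. One defensive pattern (a sketch, not part of the original script) is to inspect the Content-Type header and fall back to raw text when the body is not declared as JSON:

```python
import json

def parse_body(content_type, body_text):
    """Return a parsed object for JSON responses, raw text otherwise.

    content_type is the value of the Content-Type header; body_text is
    the decoded response body (what requests exposes as response.text).
    """
    if 'json' in content_type.lower():
        return json.loads(body_text)
    return body_text

# JSON body -> parsed dict; non-JSON body -> returned unchanged.
parsed = parse_body('application/json; charset=utf-8', '{"ok": true}')
raw = parse_body('text/html', '<html></html>')
```

With a real response this would be called as parse_body(response.headers.get('Content-Type', ''), response.text); note some endpoints (including this one, per the capture above) declare text while still carrying JSON, so the header check is a heuristic.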
