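Why is data valuable? Data only becomes valuable once it has been analyzed and processed, and of course you still have to present it!
Based on a tutorial shared by 大江狗:
Django in Practice: Scraping Lianjia Shanghai Second-Hand Housing Listings with Python, Storing Them in a Database and Displaying Them on the Front End (Django实战: Python爬取链家上海二手房信息,存入数据库并在前端显示)
WeChat official account: Python Web与Django开发 (Python Web and Django Development)
If you're interested, you can study it and practice on your own!
What follows is my own simple attempt.
It's fairly rough,
so please bear with me!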
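Qunar (去哪儿) attraction ticket listings: piao.qunar.com
Region selected: Beijing
Source code of the Qunar attraction scraper. Main points:
1. the fake_useragent module generates a random User-Agent request header
2. bs4 (BeautifulSoup) extracts the information from the page
3. the scraping logic is organized in a class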
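For point 1: `UserAgent().random` from fake_useragent returns a random real-browser User-Agent string. The sketch below shows the same idea using only the standard library. It assumes a small hand-picked pool. The strings are illustrative only and are not the list fake_useragent ships.

```python
import random

# Illustrative pool of browser User-Agent strings (assumption: any real
# browser UA strings would do; fake_useragent maintains a much larger list)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_headers():
    """Build request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


headers = random_headers()
print(headers["User-Agent"])
```

The scraper below picks its header once in `__init__`, so every request from one instance sends the same User-Agent. To rotate it per request, you could build the headers inside the page loop instead.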
# Qunar (去哪儿) attraction information scraper
# -*- coding: UTF-8 -*-
import time

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent


class Qner(object):
    def __init__(self, city):
        self.ua = UserAgent()
        # Random browser User-Agent so the requests look like a normal browser
        self.headers = {"User-Agent": self.ua.random}
        self.url = 'https://piao.qunar.com/ticket/list.htm?keyword='
        self.city = city
        self.pagemax = 0
        self.hrefs = []

    def get_pagemax(self):
        # The second-to-last link in the pager holds the total number of pages
        url = f'{self.url}{self.city}'
        response = requests.get(url, headers=self.headers)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'lxml')
            a = soup.find('div', class_="pager").find_all('a')
            self.pagemax = int(a[-2].get_text())

    def get_urllist(self):
        for i in range(1, self.pagemax + 1):
            url = f'{self.url}{self.city}&page={i}'
            print(url)
            response = requests.get(url, headers=self.headers)
            time.sleep(2)
            if response.status_code == 200:
                soup = BeautifulSoup(response.text, 'lxml')
                divs = soup.find_all('div', class_="sight_item_detail clrfix")
                for div in divs:
                    name = div.find('a', class_="name").get_text()
                    print(name)
                    address = div.find('p', class_="address color999").find('span').get_text()
                    print(address)
                    try:
                        price = div.find('span', class_="sight_item_price").find('em').get_text()
                        print(price)
                    except AttributeError:
                        # No price element: find() returned None (价格不详 = price unknown)
                        print("价格不详!")
                    href = div.find('h3', class_='sight_item_caption').find('a')['href']
                    self.hrefs.append(f'https://piao.qunar.com{href}')
                print(self.hrefs)
            time.sleep(5)


if __name__ == '__main__':
    spider = Qner("北京")
    spider.get_pagemax()
    spider.get_urllist()
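The scraper builds each list-page URL with an f-string and relies on requests to percent-encode the Chinese keyword. Below is a more explicit equivalent that uses the standard library's `urllib.parse.urlencode`. It reuses the `keyword` and `page` query parameters from the code above. `page_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE = "https://piao.qunar.com/ticket/list.htm"


def page_url(city, page):
    """Return the list URL for one result page; urlencode percent-encodes the city."""
    return f"{BASE}?{urlencode({'keyword': city, 'page': page})}"


urls = [page_url("北京", i) for i in range(1, 4)]
for u in urls:
    print(u)
```

Unlike the plain f-string, `urlencode` also escapes characters such as `&` or spaces inside the keyword. Those characters would otherwise be read as part of the query-string syntax.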
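Storing the data in the Django model (a method of the scraper class):
def save_data_to_model(self):
    # self.data: list of dicts with 'name', 'address', 'price' collected while scraping
    # Qner here is the Django model (from qunaer.models import Qner), not the scraper class
    for item in self.data:
        new_item = Qner()
        new_item.name = item['name']
        new_item.address = item['address']
        new_item.price = item['price']
        new_item.save()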
Step 1: Create the Django project
Option 1: in PyCharm, create a new Django project named qunaer_spider
Option 2: create it from the command line (cmd):
django-admin startproject qunaer_spider
Step 2: Create the app (run inside the qunaer_spider project directory)
python manage.py startapp qunaer
Step 3: Define the model fields in the app
App-level models.py:
from django.db import models


class Qner(models.Model):
    # verbose_name labels: 景点名字 = attraction name, 景点地址 = address, 景点价格 = price
    name = models.CharField(max_length=20, verbose_name="景点名字")
    address = models.CharField(max_length=60, verbose_name="景点地址")
    price = models.CharField(max_length=10, verbose_name="景点价格")

    def __str__(self):
        # Show the attraction name as the object's title
        return self.name
A CharField must always be given a max_length (the maximum string length)!
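SQLite is the default database for a new project, and on SQLite Django does not enforce `max_length` when `save()` is called. The limit is checked by model validation (`full_clean()`), and databases such as PostgreSQL reject over-long values. The sketch below shows a hypothetical pre-save check. It assumes the limits from the Qner model above (20/60/10), and the sample values are placeholders.

```python
# Hypothetical helper: the limits mirror max_length in the Qner model above
MAX_LENGTHS = {"name": 20, "address": 60, "price": 10}


def too_long_fields(item):
    """Return the names of fields whose value exceeds the model's max_length."""
    return [
        field
        for field, limit in MAX_LENGTHS.items()
        if len(item.get(field, "")) > limit
    ]


# Placeholder sample record, not real scraped data
item = {"name": "示例景点", "address": "示例地址", "price": "价格不详!"}
print(too_long_fields(item))  # []
print(too_long_fields({"name": "x" * 25}))  # ['name']
```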
The def __str__(self): return self.name method returns the title (the attraction name) that is displayed for each object.
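Django calls `str(obj)` wherever a model instance is shown as text, for example in the admin's object list or in the shell. Without `__str__`, you would see something like `Qner object (1)` instead. The plain-Python stand-in below shows the mechanism. `QnerStandIn` is a hypothetical class, so no Django is needed.

```python
class QnerStandIn:
    """Hypothetical plain-Python stand-in for the Qner model (no Django needed)."""

    def __init__(self, name, address, price):
        self.name = name
        self.address = address
        self.price = price

    def __str__(self):
        # Same rule as Qner.__str__: the object's text form is its name
        return self.name


spot = QnerStandIn("示例景点", "示例地址", "价格不详!")
print(spot)  # 示例景点
```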
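Step 3 (continued): Register the app in the project-level settings
If you skip this, the migration step will not work, because Django does not know about the app!
Project settings.py:
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'qunaer',
]
Just add the app name to the list!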
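Step 4: Run the migrations
This takes two operations, i.e. two commands:
First: python manage.py makemigrations
Second: python manage.py migrate
After we run python manage.py makemigrations, Django generates a 0001_initial.py file in the migrations directory of the qunaer app. Django uses this file to record the changes we made to the models.
We created the model class in models.py, and Django recorded that change in 0001_initial.py. At this point, however, we have only told Django what changed. To make Django actually create the database tables, we then run python manage.py migrate.
Django reads the files in the app's migrations directory to find out which changes we made. It then translates those changes into SQL statements and applies them to the real database.
Note: every time you change a model, run both commands again. Otherwise the database tables no longer match your models, and reads or writes can fail!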
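You can make "translated into SQL" concrete with `python manage.py sqlmigrate qunaer 0001`, which prints the SQL that a migration will run. The sketch below uses the standard library's `sqlite3` with a hand-written `CREATE TABLE`. That statement only approximates what Django generates for the Qner model on SQLite, and the exact DDL may differ by Django version. The table name `qunaer_qner` follows Django's default `<app_label>_<model_name>` naming, and the inserted row is placeholder data.

```python
import sqlite3

# In-memory database for illustration only
conn = sqlite3.connect(":memory:")

# Approximation of the table Django creates for the Qner model:
# an auto-increment primary key plus one varchar column per CharField
conn.execute(
    'CREATE TABLE "qunaer_qner" ('
    '"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, '
    '"name" varchar(20) NOT NULL, '
    '"address" varchar(60) NOT NULL, '
    '"price" varchar(10) NOT NULL)'
)

# Roughly what Qner(...).save() does for a new object: a single INSERT
conn.execute(
    'INSERT INTO "qunaer_qner" ("name", "address", "price") VALUES (?, ?, ?)',
    ("示例景点", "示例地址", "价格不详!"),
)
rows = conn.execute('SELECT "id", "name", "price" FROM "qunaer_qner"').fetchall()
print(rows)
conn.close()
```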
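Step 5: Configure the URL routes
Project-level urls.py:
from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    # Every URL starting with qunaer/ is handed over to the app's urls.py
    path('qunaer/', include('qunaer.urls')),
]
App-level urls.py
Create a new urls.py file in the app directory:
from django.urls import path
from . import views  # import the app's views

urlpatterns = [
    # routes are added in the next steps
]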
Step 6: Hello world!
First, edit the view function in the app-level views.py:
from django.shortcuts import render
from django.http import HttpResponse


def hello_world(request):
    return HttpResponse("hello world!")
Then edit the app-level urls.py:
from django.urls import path
from . import views

urlpatterns = [
    path('', views.hello_world),
]
Now start the development server with python manage.py runserver and open http://127.0.0.1:8000/qunaer/ in the browser,
and you will see "hello world!".
Finally, let's display the Qunar ticket data!
App-level views.py:
import os

import django

# Only needed when this file is run directly as a script; under runserver
# Django is already configured, so these two lines are redundant but harmless
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'qunaer_spider.settings')
django.setup()

from django.core.paginator import Paginator
from django.http import Http404, HttpResponse
from django.shortcuts import render

from qunaer.models import Qner


def hello_world(request):
    return HttpResponse("hello world!")


def index(request):
    # Explicit ordering keeps pagination stable (avoids UnorderedObjectListWarning)
    qunaer_list = Qner.objects.order_by('pk')
    paginator = Paginator(qunaer_list, 15)  # 15 attractions per page
    page = request.GET.get('page')
    page_obj = paginator.get_page(page)
    return render(request, 'qunaer/index.html', {
        'page_obj': page_obj,
        'paginator': paginator,
        'is_paginated': True,
    })


def get_page(request, qunaer_id):
    """Detail page for one attraction, with previous/next links and the 8 newest entries."""
    qunaer_list = list(Qner.objects.order_by('pk'))
    qner = None
    previous_qner = None
    next_qner = None
    for index, qunaer in enumerate(qunaer_list):
        if qunaer.pk == qunaer_id:
            qner = qunaer
            # Clamp at both ends: the first item is its own "previous",
            # the last item its own "next"
            previous_qner = qunaer_list[max(index - 1, 0)]
            next_qner = qunaer_list[min(index + 1, len(qunaer_list) - 1)]
            break
    if qner is None:
        raise Http404("Attraction not found")
    top8_qunaer_list = Qner.objects.order_by('-pk')[:8]
    return render(request, 'qunaer/page.html', {
        'qner': qner,
        'top8_qunaer_list': top8_qunaer_list,
        'next_qner': next_qner,
        'previous_qner': previous_qner,
    })
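The loop in `get_page` finds the current attraction and then picks its neighbours, clamping at both ends. The first item is its own "previous", and the last item is its own "next". The same rule appears below as a small pure-Python sketch over a list of primary keys. `neighbours` is a hypothetical name.

```python
def neighbours(ids, current):
    """Return (previous_id, next_id) for `current`, clamped at both ends.

    Mirrors the index arithmetic in get_page; returns (None, None) if
    `current` is not in the list.
    """
    try:
        i = ids.index(current)
    except ValueError:
        return None, None
    return ids[max(i - 1, 0)], ids[min(i + 1, len(ids) - 1)]


ids = [1, 2, 3, 4]
print(neighbours(ids, 1))  # (1, 2)
print(neighbours(ids, 3))  # (2, 4)
print(neighbours(ids, 4))  # (3, 4)
```

Clamping with `max`/`min` also handles a table with only one row. For large tables, loading every row just to find two neighbours is wasteful. Querying them directly, e.g. `Qner.objects.filter(pk__lt=qner.pk).order_by('-pk').first()`, is one alternative.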
The scraper as a standalone script that saves its results into the Qner model:
# Place this script in the project root (next to manage.py) so that the
# qunaer_spider settings package and the qunaer app can be imported.
import os
import time

import django
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Load Django before importing any model
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'qunaer_spider.settings')
django.setup()

from qunaer.models import Qner  # must come after django.setup()


class Qnaer(object):
    def __init__(self, city):
        self.ua = UserAgent()
        self.headers = {"User-Agent": self.ua.random}
        self.url = 'https://piao.qunar.com/ticket/list.htm?keyword='
        self.city = city
        # Fixed at 10 pages here; call get_pagemax() to read the real page count
        self.pagemax = 10
        self.hrefs = []
        self.data = []

    def get_pagemax(self):
        url = f'{self.url}{self.city}'
        response = requests.get(url, headers=self.headers)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')
            a = soup.find('div', class_="pager").find_all('a')
            self.pagemax = int(a[-2].get_text())

    def get_urllist(self):
        for i in range(1, self.pagemax + 1):
            url = f'{self.url}{self.city}&page={i}'
            print(url)
            response = requests.get(url, headers=self.headers)
            time.sleep(2)
            if response.status_code == 200:
                soup = BeautifulSoup(response.text, 'html.parser')
                divs = soup.find_all('div', class_="sight_item_detail clrfix")
                for div in divs:
                    detail = dict()
                    detail['name'] = div.find('a', class_="name").get_text()
                    detail['address'] = div.find('p', class_="address color999").find('span').get_text()
                    try:
                        detail['price'] = div.find('span', class_="sight_item_price").find('em').get_text()
                    except AttributeError:
                        # No price element: store "价格不详!" (price unknown)
                        detail['price'] = "价格不详!"
                    self.data.append(detail)
                    href = div.find('h3', class_='sight_item_caption').find('a')['href']
                    self.hrefs.append(f'https://piao.qunar.com{href}')
                print(self.hrefs)
            time.sleep(5)

    def save_data_to_model(self):
        # Write every scraped record into the Qner table
        for item in self.data:
            new_item = Qner()
            new_item.name = item['name']
            new_item.address = item['address']
            new_item.price = item['price']
            new_item.save()


if __name__ == "__main__":
    spider = Qnaer("北京")
    # spider.get_pagemax()
    spider.get_urllist()
    spider.save_data_to_model()
    # Qner.objects.all().delete()  # clears all stored data
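Remember: when you run or debug a .py file that uses Django outside of manage.py, you must load Django first. Otherwise you get an error, typically ImproperlyConfigured or AppRegistryNotReady.
import os
import django

# Replace project_name with your project package, e.g. qunaer_spider
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project_name.settings')
django.setup()
# Import models only after django.setup()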
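Superuser account/password: admin/123456 (created with python manage.py createsuperuser; the admin site is at http://127.0.0.1:8000/admin/)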
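App-level urls.py:
from django.urls import path
from . import views

urlpatterns = [
    path('', views.hello_world),
    path('index/', views.index),
    # Detail page for one attraction, e.g. /qunaer/index/5.html; matches the
    # relative "<id>.html" links in the templates and calls get_page(request, qunaer_id)
    path('index/<int:qunaer_id>.html', views.get_page),
]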
Key HTML templates. Django's default APP_DIRS setting finds templates in each app's templates directory, so save them under qunaer/templates/qunaer/ to match 'qunaer/index.html' in the views.
index.html:
{% extends "qunaer/base.html" %}
{% block title %}去哪儿景点信息{% endblock %}
{% block content %}
<h3>去哪儿景点信息</h3>
<div>
  <div>
    <table class="table table-striped">
      {% for qunaer in page_obj %}
      <tr>
        <td>景点名称:<a href="{{ qunaer.pk }}.html">{{ qunaer.name }}</a></td>
        <td>景点{{ qunaer.address }}</td>
        <td>景点价格:{{ qunaer.price }}</td>
      </tr>
      {% endfor %}
    </table>
  </div>
  <div class="text-center">
    {# Pagination links #}
    {% if is_paginated %}
    <ul>
      {% if page_obj.has_previous %}
      <li><a href="?page={{ page_obj.previous_page_number }}">Previous</a></li>
      {% else %}
      <li class="page-item disabled"><span>Previous</span></li>
      {% endif %}
      {% for i in paginator.page_range %}
      {% if page_obj.number == i %}
      <li class="page-item active"><span>{{ i }} <span>(current)</span></span></li>
      {% else %}
      <li><a href="?page={{ i }}">{{ i }}</a></li>
      {% endif %}
      {% endfor %}
      {% if page_obj.has_next %}
      <li><a href="?page={{ page_obj.next_page_number }}">Next</a></li>
      {% else %}
      <li class="page-item disabled"><span>Next</span></li>
      {% endif %}
    </ul>
    {% endif %}
  </div>
</div>
{% endblock %}
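`Paginator(qunaer_list, 15)` splits the rows into pages of 15, and `get_page()` is lenient. According to the Django documentation, a non-numeric page returns the first page, and an out-of-range page number returns the last page. The stdlib-only sketch below reproduces that arithmetic. It is not Django's actual code, and `page_slice` is a hypothetical name.

```python
import math


def page_slice(total, per_page, page):
    """Return (page_number, start, stop), resolving `page` like Paginator.get_page.

    total: number of rows; per_page: rows per page (15 in the index view);
    page: raw value from request.GET.get('page'), possibly None or junk.
    """
    num_pages = max(1, math.ceil(total / per_page))  # an empty list still has 1 page
    try:
        number = int(page)
    except (TypeError, ValueError):
        number = 1  # not a number -> first page
    if number < 1 or number > num_pages:
        number = num_pages  # out of range -> last page
    start = (number - 1) * per_page
    return number, start, min(start + per_page, total)


print(page_slice(100, 15, None))   # (1, 0, 15)
print(page_slice(100, 15, "3"))    # (3, 30, 45)
print(page_slice(100, 15, "99"))   # (7, 90, 100)
print(page_slice(100, 15, "abc"))  # (1, 0, 15)
```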
base.html
{% load static %}
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>{% block title %}去哪儿景点信息{% endblock %}</title>
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
  <style>
    ul li {
      display: inline;
      list-style-type: none;
    }
  </style>
</head>
<body>
  <!-- Page content -->
  <main>
    <div>
      {% block content %}
      {% if error_message %}<p><strong>{{ error_message }}</strong></p>{% endif %}
      {% endblock %}
    </div>
  </main>
  <footer>
    {% block footer %}{% endblock %}
  </footer>
  <!-- End of footer -->
  <!-- Bootstrap core JavaScript -->
  <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</body>
</html>
page.html
{% extends "qunaer/base.html" %}
{% block title %}{{ qner.name }}{% endblock %}
{% block content %}
<h3>去哪儿景点信息:{{ qner.name }}</h3>
<div>
  <table class="table table-striped">
    <tr>
      <td>景点名称:{{ qner.name }}</td>
      <td>景点{{ qner.address }}</td>
      <td>景点价格:{{ qner.price }}</td>
    </tr>
  </table>
</div>
<div>
  <nav aria-label="...">
    <ul>
      <li><a href="{{ previous_qner.pk }}.html">上一篇:{{ previous_qner.name }}</a></li>
      <li><a href="{{ next_qner.pk }}.html">下一篇:{{ next_qner.name }}</a></li>
    </ul>
  </nav>
</div>
<div>
  <ul>
    {% for item in top8_qunaer_list %}
    <li><a href="{{ item.pk }}.html">{{ item.name }}</a></li>
    {% endfor %}
  </ul>
</div>
{% endblock %}
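Both templates link to detail pages with relative URLs such as `5.html`. A browser resolves those against the current page's URL, so links clicked on /qunaer/index/ land under /qunaer/index/. The route for `get_page` therefore has to match that shape. The stdlib sketch below uses `urljoin` and a regex equivalent to a route of the form `path('index/<int:qunaer_id>.html', ...)`; Django's `int` converter matches `[0-9]+`. The host and ids are illustrative.

```python
import re
from urllib.parse import urljoin

# Regex equivalent to a route of the form 'index/<int:qunaer_id>.html'
DETAIL_ROUTE = re.compile(r"index/(?P<qunaer_id>[0-9]+)\.html")

list_page = "http://127.0.0.1:8000/qunaer/index/"
detail = urljoin(list_page, "5.html")   # link clicked on index.html
neighbour = urljoin(detail, "6.html")   # "next" link clicked on page.html
print(detail)
print(neighbour)

# include('qunaer.urls') strips the 'qunaer/' prefix before the app routes are tried
m = DETAIL_ROUTE.fullmatch(detail.split("/qunaer/", 1)[1])
print(int(m.group("qunaer_id")))
```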
Final result: