【项目实战】基于python+爬虫的电影数据分析及可视化系统

注意:该项目只展示部分功能,如需了解,文末咨询即可。

本文目录

在这里插入图片描述

1.开发环境

开发语言:Python
技术框架:Flask、爬虫
数据库:MySQL
开发工具:PyCharm

2 系统设计

2.1 设计背景

在数字时代,大数据技术在各行各业得到广泛应用,尤其在影视行业,通过分析大量的数据可以帮助人们更好地了解市场趋势和用户偏好。然而传统的数据分析方法难以处理如此庞大和复杂的数据集,且结果的展示不够直观,这就迫切需要开发一个既能高效处理大数据又能直观展示分析结果的系统。通过实现这一系统,不仅能够为用户提供直观的数据分析结果,帮助他们快速了解影视作品的各种统计信息,而且能够为影视行业提供决策支持,比如电影推荐和评分预测等功能,有助于提升用户体验和满意度。本系统的开发展示了大数据技术和可视化技术在影视数据分析领域的应用价值,对促进该领域的技术进步和创新发展具有重要意义。

2.2 设计内容

本课题旨在开发一个基于python的电影数据分析及可视化,该系统利用Scrapy爬虫技术从“豆瓣网站”爬取相关电影数据,然后通过Python和Spark技术进行数据处理和清理,最终存储到MySQL数据库中。系统的核心功能包括电影标签、电影星际、电影评分区间、年代、数量、词云等。所有这些功能都通过Echarts可视化框架以图形的方式展现,同时,系统还提供用户管理和登录功能,为管理员和用户提供不同的操作权限和数据展示。
1. 系统架构
系统采用分层架构设计,主要包括以下几个层次:
数据采集层: 使用Scrapy爬虫框架
数据处理层: 使用Python进行数据清洗和处理
数据存储层: 使用MySQL数据库
数据分析层: 使用Python进行数据分析
数据可视化层: 使用Echarts框架
应用层: 提供用户界面和交互功能
2. 主要功能模块
2.1 数据采集模块

使用Scrapy框架爬取豆瓣网站的电影数据
设计爬虫规则,提取电影标题、评分、标签、上映年份等信息
实现数据的增量更新和定期爬取
2.2 数据处理模块
使用Python和Spark进行数据清洗,去除重复和无效数据
数据格式化,统一数据格式
数据转换,将原始数据转换为适合分析的结构
2.3 数据存储模块
设计MySQL数据库schema,包括电影信息表、用户表等
实现数据的批量导入和更新
优化数据库查询性能
2.4 数据分析模块
实现电影标签分析功能
开发电影评分区间统计功能
实现电影年代分布分析
电影数量统计分析
词云分析功能
2.5 数据可视化模块
使用Echarts框架实现各种图表展示
包括柱状图、饼图、折线图、词云图等
实现图表的交互功能,如数据筛选、缩放等
2.6 用户管理模块
实现用户注册、登录功能
用户权限管理,区分管理员和普通用户
根据用户权限展示不同的数据和功能
3. 系统流程
数据采集: Scrapy爬虫定期从豆瓣网站爬取电影数据
数据处理: 使用PySpark进行数据清洗和转换
数据存储: 将处理后的数据存入MySQL数据库
数据分析: 使用Python进行各种统计分析
数据可视化: 使用Echarts将分析结果以图表形式展示
用户交互: 用户通过Web界面查看分析结果,管理员可进行系统管理

3 系统页面展示

3.1 用户页面

在这里插入图片描述
在这里插入图片描述

3.2 后台页面

在这里插入图片描述
在这里插入图片描述

3.3 功能展示视频

基于python+爬虫的豆瓣电影数据分析及可视化系统

4 更多推荐

计算机毕设选题精选汇总
基于Hadoop大数据电商平台用户行为分析与可视化系统
Django+Python数据分析岗位招聘信息爬取与分析
基于python爬虫的商城商品比价数据分析
基于Python的网络小说榜单信息爬取与数据可视化系统
基于Spark大数据的餐饮外卖数据分析可视化系统

5 部分功能代码

5.1 爬虫代码

# # -*- coding: utf-8 -*-# 数据爬取文件import scrapy
import pymysql
import pymssql
from ..items import DianyingxinxiItem
import time
from datetime import datetime,timedelta
import datetime as formattime
import re
import random
import platform
import json
import os
import urllib
from urllib.parse import urlparse
import requests
import emoji
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
from selenium.webdriver import ChromeOptions, ActionChains
from scrapy.http import TextResponse
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
# 电影信息
class DianyingxinxiSpider(scrapy.Spider):name = 'dianyingxinxiSpider'spiderUrl = 'https://m.douban.com/rexxar/api/v2/movie/recommend?refresh=0&start={}&count=20&selected_categories=%7B%22%E7%B1%BB%E5%9E%8B%22:%22%E5%96%9C%E5%89%A7%22%7D&uncollect=false&tags=%E5%96%9C%E5%89%A7&ck=IsFz'start_urls = spiderUrl.split(";")protocol = ''hostname = ''realtime = Falseheaders = {'Referer':'https://movie.douban.com/explore',
'Cookie':'用你的Cookie'}def __init__(self,realtime=False,*args, **kwargs):super().__init__(*args, **kwargs)self.realtime = realtime=='true'def start_requests(self):plat = platform.system().lower()if not self.realtime and (plat == 'linux' or plat == 'windows'):connect = self.db_connect()cursor = connect.cursor()if self.table_exists(cursor, 'c2xklu0u_dianyingxinxi') == 1:cursor.close()connect.close()self.temp_data()returnpageNum = 1 + 1for url in self.start_urls:if '{}' in url:for page in range(1, pageNum):next_link = url.format(page)yield scrapy.Request(url=next_link,headers=self.headers,callback=self.parse)else:yield scrapy.Request(url=url,headers=self.headers,callback=self.parse)# 列表解析def parse(self, response):_url = urlparse(self.spiderUrl)self.protocol = _url.schemeself.hostname = _url.netlocplat = platform.system().lower()if not self.realtime and (plat == 'linux' or plat == 'windows'):connect = self.db_connect()cursor = connect.cursor()if self.table_exists(cursor, 'c2xklu0u_dianyingxinxi') == 1:cursor.close()connect.close()self.temp_data()returndata = json.loads(response.body)try:list = data["items"]except:passfor item in list:fields = DianyingxinxiItem()try:fields["title"] = str( item["title"])except:passtry:fields["year"] = int( item["year"])except:passtry:fields["picture"] = str( item["pic"]["normal"])except:passtry:fields["ypxx"] = str( item["card_subtitle"])except:passtry:fields["pingfen"] = float( item["rating"]["value"])except:passtry:fields["pjs"] = int( item["rating"]["count"])except:passtry:fields["wxs"] = float( item["rating"]["star_count"])except:passtry:fields["tags"] = str( item["tags"][0]["name"])except:passtry:fields["plnr"] = str( item["comment"]["comment"])except:passtry:fields["uname"] = str( item["comment"]["user"]["name"])except:passtry:fields["xqdz"] = str('https://movie.douban.com/subject/'+ item["id"])except:passyield fields# 详情解析def detail_parse(self, response):fields = response.meta['fields']return fields# 数据清洗def pandas_filter(self):engine = create_engine('mysql+pymysql://root:123456@localhost/spiderc2xklu0u?charset=UTF8MB4')df = pd.read_sql('select * from dianyingxinxi limit 50', con = engine)# 重复数据过滤df.duplicated()df.drop_duplicates()#空数据过滤df.isnull()df.dropna()# 填充空数据df.fillna(value = '暂无')# 异常值过滤# 滤出 大于800 和 小于 100 的a = np.random.randint(0, 1000, size = 200)cond = (a<=800) & (a>=100)a[cond]# 过滤正态分布的异常值b = np.random.randn(100000)# 3σ过滤异常值,σ即是标准差cond = np.abs(b) > 3 * 1b[cond]# 正态分布数据df2 = pd.DataFrame(data = np.random.randn(10000,3))# 3σ过滤异常值,σ即是标准差cond = (df2 > 3*df2.std()).any(axis = 1)# 不满?条件的?索引index = df2[cond].index# 根据?索引,进?数据删除df2.drop(labels=index,axis = 0)# 去除多余html标签def remove_html(self, html):if html == None:return ''pattern = re.compile(r'<[^>]+>', re.S)return pattern.sub('', html).strip()# 数据库连接def db_connect(self):type = self.settings.get('TYPE', 'mysql')host = self.settings.get('HOST', 'localhost')port = int(self.settings.get('PORT', 3306))user = self.settings.get('USER', 'root')password = self.settings.get('PASSWORD', '123456')try:database = self.databaseNameexcept:database = self.settings.get('DATABASE', '')if type == 'mysql':connect = pymysql.connect(host=host, port=port, db=database, user=user, passwd=password, charset='utf8')else:connect = pymssql.connect(host=host, user=user, password=password, database=database)return connect# 断表是否存在def table_exists(self, cursor, table_name):cursor.execute("show tables;")tables = [cursor.fetchall()]table_list = re.findall('('.*?')',str(tables))table_list = [re.sub("'",'',each) for each in table_list]if table_name in table_list:return 1else:return 0# 数据缓存源def temp_data(self):connect = self.db_connect()cursor = connect.cursor()sql = '''insert into `dianyingxinxi`(id,title,year,picture,ypxx,pingfen,pjs,wxs,tags,plnr,uname,xqdz)selectid,title,year,picture,ypxx,pingfen,pjs,wxs,tags,plnr,uname,xqdzfrom `c2xklu0u_dianyingxinxi`where(not exists (selectid,title,year,picture,ypxx,pingfen,pjs,wxs,tags,plnr,uname,xqdzfrom `dianyingxinxi` where`dianyingxinxi`.id=`c2xklu0u_dianyingxinxi`.id))order by rand()limit 50;'''cursor.execute(sql)connect.commit()connect.close()

5.2 电影信息代码

# coding:utf-8
__author__ = "ila"import logging, os, json, configparser
import time
from datetime import datetimefrom flask import request, jsonify,session
from sqlalchemy.sql import func,and_,or_,case
from sqlalchemy import cast, Integer,Float
from api.models.brush_model import *
from . import main_bp
from utils.codes import *
from utils.jwt_auth import Auth
from configs import configs
from utils.helper import *
import random
import smtplib
from email.mime.text import MIMEText
from email.utils import formataddr
from email.header import Header
from utils.baidubce_api import BaiDuBce
from api.models.config_model import configfrom flask import current_app as app
from utils.spark_func import spark_read_mysql
from utils.hdfs_func import upload_to_hdfs
from utils.mapreduce1 import MRMySQLAvg# 注册接口
@main_bp.route("/python9532dr50/dianyingxinxi/register", methods=['POST'])
def python9532dr50_dianyingxinxi_register():if request.method == 'POST':msg = {'code': normal_code, 'message': 'success', 'data': [{}]}req_dict = session.get("req_dict")error = dianyingxinxi.createbyreq(dianyingxinxi, dianyingxinxi, req_dict)if error!=None:msg['code'] = crud_error_codemsg['msg'] = "注册用户已存在"return jsonify(msg)# 登录接口
@main_bp.route("/python9532dr50/dianyingxinxi/login", methods=['GET','POST'])
def python9532dr50_dianyingxinxi_login():if request.method == 'GET' or request.method == 'POST':msg = {"code": normal_code, "msg": "success", "data": {}}req_dict = session.get("req_dict")req_model = session.get("req_dict")try:del req_model['role']except:passdatas = dianyingxinxi.getbyparams(dianyingxinxi, dianyingxinxi, req_model)if not datas:msg['code'] = password_error_codemsg['msg']='密码错误或用户不存在'return jsonify(msg)req_dict['id'] = datas[0].get('id')try:del req_dict['mima']except:passreturn Auth.authenticate(Auth, dianyingxinxi, req_dict)# 登出接口
@main_bp.route("/python9532dr50/dianyingxinxi/logout", methods=['POST'])
def python9532dr50_dianyingxinxi_logout():if request.method == 'POST':msg = {"msg": "退出成功","code": 0}req_dict = session.get("req_dict")return jsonify(msg)# 重置密码接口
@main_bp.route("/python9532dr50/dianyingxinxi/resetPass", methods=['POST'])
def python9532dr50_dianyingxinxi_resetpass():''''''if request.method == 'POST':msg = {"code": normal_code, "msg": "success"}req_dict = session.get("req_dict")if req_dict.get('mima') != None:req_dict['mima'] = '123456'error = dianyingxinxi.updatebyparams(dianyingxinxi, dianyingxinxi, req_dict)if error != None:msg['code'] = crud_error_codemsg['msg'] = errorelse:msg['msg'] = '密码已重置为:123456'return jsonify(msg)# 获取会话信息接口
@main_bp.route("/python9532dr50/dianyingxinxi/session", methods=['GET'])
def python9532dr50_dianyingxinxi_session():''''''if request.method == 'GET':msg = {"code": normal_code, "data": {}}req_dict={"id":session.get('params').get("id")}msg['data']  = dianyingxinxi.getbyparams(dianyingxinxi, dianyingxinxi, req_dict)[0]return jsonify(msg)# 分类接口(后端)
@main_bp.route("/python9532dr50/dianyingxinxi/page", methods=['GET'])
def python9532dr50_dianyingxinxi_page():''''''if request.method == 'GET':msg = {"code": normal_code, "msg": "success",  "data":{"currPage":1,"totalPage":1,"total":1,"pageSize":10,"list":[]}}req_dict = session.get("req_dict")userinfo = session.get("params")try:__hasMessage__=dianyingxinxi.__hasMessage__except:__hasMessage__=Noneif __hasMessage__ and __hasMessage__!="否":tablename=session.get("tablename")if tablename!="users" and session.get("params")!=None and dianyingxinxi!='chat':req_dict["userid"]=session.get("params").get("id")tablename=session.get("tablename")if tablename=="users" :try:passexcept:passelse:mapping_str_to_object = {}for model in Base_model._decl_class_registry.values():if hasattr(model, '__tablename__'):mapping_str_to_object[model.__tablename__] = modeltry:__isAdmin__=mapping_str_to_object[tablename].__isAdmin__except:__isAdmin__=Nonetry:__authSeparate__ =mapping_str_to_object[tablename].__authSeparate__except:__authSeparate__ = Noneif __isAdmin__!="是" and __authSeparate__ == "是" and session.get("params")!=None:req_dict["userid"]=session.get("params").get("id")else:try:del req_dict["userid"]except:passclause_args = []or_clauses = or_(*clause_args)msg['data']['list'], msg['data']['currPage'], msg['data']['totalPage'], msg['data']['total'], msg['data']['pageSize']  = dianyingxinxi.page(dianyingxinxi, dianyingxinxi, req_dict, or_clauses)return jsonify(msg)# 排序接口
@main_bp.route("/python9532dr50/dianyingxinxi/autoSort", methods=['GET'])
def python9532dr50_dianyingxinxi_autosort():''''''if request.method == 'GET':msg = {"code": normal_code, "msg": "success",  "data":{"currPage":1,"totalPage":1,"total":1,"pageSize":10,"list":[]}}req_dict = session.get("req_dict")req_dict['sort']='clicktime'req_dict['order']='desc'try:__browseClick__= dianyingxinxi.__browseClick__except:__browseClick__=Noneif __browseClick__ =='是':req_dict['sort']='clicknum'elif __browseClick__ =='时长':req_dict['sort']='browseduration'else:req_dict['sort']='clicktime'msg['data']['list'], msg['data']['currPage'], msg['data']['totalPage'], msg['data']['total'], msg['data']['pageSize']  = dianyingxinxi.page(dianyingxinxi, dianyingxinxi, req_dict)return jsonify(msg)# 分页接口(前端)
@main_bp.route("/python9532dr50/dianyingxinxi/list", methods=['GET'])
def python9532dr50_dianyingxinxi_list():''''''if request.method == 'GET':msg = {"code": normal_code, "msg": "success",  "data":{"currPage":1,"totalPage":1,"total":1,"pageSize":10,"list":[]}}req_dict = session.get("req_dict")if req_dict.__contains__('vipread'):del req_dict['vipread']userinfo = session.get("params")try:__foreEndList__=dianyingxinxi.__foreEndList__except:__foreEndList__=Noneif __foreEndList__ and __foreEndList__!="否":tablename=session.get("tablename")if tablename!="users" and session.get("params")!=None:req_dict['userid']=session.get("params").get("id")try:__foreEndListAuth__=dianyingxinxi.__foreEndListAuth__except:__foreEndListAuth__=Noneif __foreEndListAuth__ and __foreEndListAuth__!="否":tablename=session.get("tablename")if tablename!="users" and session.get("params")!=None:req_dict['userid']=session.get("params").get("id")tablename=session.get("tablename")if tablename=="users" :try:del req_dict["userid"]except:passelse:mapping_str_to_object = {}for model in Base_model._decl_class_registry.values():if hasattr(model, '__tablename__'):mapping_str_to_object[model.__tablename__] = modeltry:__isAdmin__=mapping_str_to_object[tablename].__isAdmin__except:__isAdmin__=Noneif __isAdmin__!="是" and session.get("params")!=None:req_dict["userid"]=session.get("params").get("id")if 'luntan' in 'dianyingxinxi':if 'userid' in req_dict.keys():del req_dict["userid"]if 'discuss' in 'dianyingxinxi':if 'userid' in req_dict.keys():del req_dict["userid"]msg['data']['list'], msg['data']['currPage'], msg['data']['totalPage'], msg['data']['total'], msg['data']['pageSize']  = dianyingxinxi.page(dianyingxinxi, dianyingxinxi, req_dict)return jsonify(msg)# 保存接口(后端)
@main_bp.route("/python9532dr50/dianyingxinxi/save", methods=['POST'])
def python9532dr50_dianyingxinxi_save():''''''if request.method == 'POST':msg = {"code": normal_code, "msg": "success", "data": {}}req_dict = session.get("req_dict")for key in req_dict:if req_dict[key] == '':req_dict[key] = Noneerror= dianyingxinxi.createbyreq(dianyingxinxi, dianyingxinxi, req_dict)if error!=None:msg['code'] = crud_error_codemsg['msg'] = errorreturn jsonify(msg)# 添加接口(前端)
@main_bp.route("/python9532dr50/dianyingxinxi/add", methods=['POST'])
def python9532dr50_dianyingxinxi_add():''''''if request.method == 'POST':msg = {"code": normal_code, "msg": "success", "data": {}}req_dict = session.get("req_dict")try:__foreEndListAuth__=dianyingxinxi.__foreEndListAuth__except:__foreEndListAuth__=Noneif __foreEndListAuth__ and __foreEndListAuth__!="否":tablename=session.get("tablename")if tablename!="users":req_dict['userid']=session.get("params").get("id")error= dianyingxinxi.createbyreq(dianyingxinxi, dianyingxinxi, req_dict)if error!=None:msg['code'] = crud_error_codemsg['msg'] = errorreturn jsonify(msg)# 踩、赞接口
@main_bp.route("/python9532dr50/dianyingxinxi/thumbsup/<id_>", methods=['GET'])
def python9532dr50_dianyingxinxi_thumbsup(id_):''''''if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": {}}req_dict = session.get("req_dict")id_=int(id_)type_=int(req_dict.get("type",0))rets=dianyingxinxi.getbyid(dianyingxinxi, dianyingxinxi,id_)update_dict={"id":id_,}if type_==1:#赞update_dict["thumbsupnum"]=int(rets[0].get('thumbsupnum'))+1elif type_==2:#踩update_dict["crazilynum"]=int(rets[0].get('crazilynum'))+1error = dianyingxinxi.updatebyparams(dianyingxinxi, dianyingxinxi, update_dict)if error!=None:msg['code'] = crud_error_codemsg['msg'] = errorreturn jsonify(msg)# 获取详情信息(后端)
@main_bp.route("/python9532dr50/dianyingxinxi/info/<id_>", methods=['GET'])
def python9532dr50_dianyingxinxi_info(id_):''''''if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": {}}data = dianyingxinxi.getbyid(dianyingxinxi, dianyingxinxi, int(id_))if len(data)>0:msg['data']=data[0]#浏览点击次数try:__browseClick__= dianyingxinxi.__browseClick__except:__browseClick__=Noneif __browseClick__  and  "clicknum"  in dianyingxinxi.__table__.columns:click_dict={"id":int(id_),"clicknum":str(int(data[0].get("clicknum") or 0)+1)}ret=dianyingxinxi.updatebyparams(dianyingxinxi,dianyingxinxi,click_dict)if ret!=None:msg['code'] = crud_error_codemsg['msg'] = retreturn jsonify(msg)# 获取详情信息(前端)
@main_bp.route("/python9532dr50/dianyingxinxi/detail/<id_>", methods=['GET'])
def python9532dr50_dianyingxinxi_detail(id_):''''''if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": {}}data = dianyingxinxi.getbyid(dianyingxinxi, dianyingxinxi, int(id_))if len(data)>0:msg['data']=data[0]#浏览点击次数try:__browseClick__= dianyingxinxi.__browseClick__except:__browseClick__=Noneif __browseClick__ and "clicknum" in dianyingxinxi.__table__.columns:click_dict={"id":int(id_),"clicknum":str(int(data[0].get("clicknum") or 0)+1)}ret=dianyingxinxi.updatebyparams(dianyingxinxi,dianyingxinxi,click_dict)if ret!=None:msg['code'] = crud_error_codemsg['msg'] = retreturn jsonify(msg)# 更新接口
@main_bp.route("/python9532dr50/dianyingxinxi/update", methods=['POST'])
def python9532dr50_dianyingxinxi_update():''''''if request.method == 'POST':msg = {"code": normal_code, "msg": "success", "data": {}}req_dict = session.get("req_dict")if req_dict.get("mima") and "mima" not in dianyingxinxi.__table__.columns :del req_dict["mima"]if req_dict.get("password") and "password" not in dianyingxinxi.__table__.columns :del req_dict["password"]try:del req_dict["clicknum"]except:passerror = dianyingxinxi.updatebyparams(dianyingxinxi, dianyingxinxi, req_dict)if error!=None:msg['code'] = crud_error_codemsg['msg'] = errorreturn jsonify(msg)# 删除接口
@main_bp.route("/python9532dr50/dianyingxinxi/delete", methods=['POST'])
def python9532dr50_dianyingxinxi_delete():''''''if request.method == 'POST':msg = {"code": normal_code, "msg": "success", "data": {}}req_dict = session.get("req_dict")error=dianyingxinxi.delete(dianyingxinxi,req_dict)if error!=None:msg['code'] = crud_error_codemsg['msg'] = errorreturn jsonify(msg)# 投票接口
@main_bp.route("/python9532dr50/dianyingxinxi/vote/<int:id_>", methods=['POST'])
def python9532dr50_dianyingxinxi_vote(id_):''''''if request.method == 'POST':msg = {"code": normal_code, "msg": "success"}data= dianyingxinxi.getbyid(dianyingxinxi, dianyingxinxi, int(id_))for i in data:votenum=i.get('votenum')if votenum!=None:params={"id":int(id_),"votenum":votenum+1}error=dianyingxinxi.updatebyparams(dianyingxinxi,dianyingxinxi,params)if error!=None:msg['code'] = crud_error_codemsg['msg'] = errorreturn jsonify(msg)@main_bp.route("/python9532dr50/dianyingxinxi/sectionStat/pingfen", methods=['GET'])
def python9532dr50_dianyingxinxi_sectionStat_pingfen():'''分段统计接口'''if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": []}where = " where 1=1"tablename=session.get("tablename")sql = """SELECT '四分以下' as pingfen,case when t.四分以下 is null then 0 else t.四分以下 end totalfrom (selectsum(case when pingfen >= 0 and pingfen <= 4 then 1 else 0 end) as 四分以下,            sum(case when pingfen >= 4.1 and pingfen <= 6 then 1 else 0 end) as 四分到六分,            sum(case when pingfen >= 6.1 and pingfen <= 8 then 1 else 0 end) as 六分到八分,            sum(case when pingfen >= 8.1 and pingfen <= 10 then 1 else 0 end) as 八到十分            from dianyingxinxi """ +where+""" ) t union all            SELECT '四分到六分' as pingfen,case when t.四分到六分 is null then 0 else t.四分到六分 end totalfrom (selectsum(case when pingfen >= 0 and pingfen <= 4 then 1 else 0 end) as 四分以下,            sum(case when pingfen >= 4.1 and pingfen <= 6 then 1 else 0 end) as 四分到六分,            sum(case when pingfen >= 6.1 and pingfen <= 8 then 1 else 0 end) as 六分到八分,            sum(case when pingfen >= 8.1 and pingfen <= 10 then 1 else 0 end) as 八到十分            from dianyingxinxi """ +where+""" ) t union all            SELECT '六分到八分' as pingfen,case when t.六分到八分 is null then 0 else t.六分到八分 end totalfrom (selectsum(case when pingfen >= 0 and pingfen <= 4 then 1 else 0 end) as 四分以下,            sum(case when pingfen >= 4.1 and pingfen <= 6 then 1 else 0 end) as 四分到六分,            sum(case when pingfen >= 6.1 and pingfen <= 8 then 1 else 0 end) as 六分到八分,            sum(case when pingfen >= 8.1 and pingfen <= 10 then 1 else 0 end) as 八到十分            from dianyingxinxi """ +where+""" ) t union all            SELECT '八到十分' as pingfen,case when t.八到十分 is null then 0 else t.八到十分 end totalfrom (selectsum(case when pingfen >= 0 and pingfen <= 4 then 1 else 0 end) as 四分以下,            sum(case when pingfen >= 4.1 and pingfen <= 6 then 1 else 0 end) as 四分到六分,            sum(case when pingfen >= 6.1 and pingfen <= 8 then 1 else 0 end) as 六分到八分,            sum(case when pingfen >= 8.1 and pingfen <= 10 then 1 else 0 end) as 八到十分            from dianyingxinxi """ +where+""" ) t  """data = db.session.execute(sql)data = data.fetchall()results = []for i in range(len(data)):result = {'pingfen': decimalEncoder(data[i][0]),'total': decimalEncoder(data[i][1])}results.append(result)msg['data'] = resultsreturn jsonify(msg)
@main_bp.route("/python9532dr50/dianyingxinxi/sectionStat/wxs", methods=['GET'])
def python9532dr50_dianyingxinxi_sectionStat_wxs():'''分段统计接口'''if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": []}where = " where 1=1"tablename=session.get("tablename")sql = """SELECT '二分以下' as wxs,case when t.二分以下 is null then 0 else t.二分以下 end totalfrom (selectsum(case when wxs >= 0 and wxs <= 2 then 1 else 0 end) as 二分以下,            sum(case when wxs >= 2.1 and wxs <= 4 then 1 else 0 end) as 二到4分,            sum(case when wxs >= 4.1 and wxs <= 5 then 1 else 0 end) as 四分到五分            from dianyingxinxi """ +where+""" ) t union all            SELECT '二到4分' as wxs,case when t.二到4分 is null then 0 else t.二到4分 end totalfrom (selectsum(case when wxs >= 0 and wxs <= 2 then 1 else 0 end) as 二分以下,            sum(case when wxs >= 2.1 and wxs <= 4 then 1 else 0 end) as 二到4分,            sum(case when wxs >= 4.1 and wxs <= 5 then 1 else 0 end) as 四分到五分            from dianyingxinxi """ +where+""" ) t union all            SELECT '四分到五分' as wxs,case when t.四分到五分 is null then 0 else t.四分到五分 end totalfrom (selectsum(case when wxs >= 0 and wxs <= 2 then 1 else 0 end) as 二分以下,            sum(case when wxs >= 2.1 and wxs <= 4 then 1 else 0 end) as 二到4分,            sum(case when wxs >= 4.1 and wxs <= 5 then 1 else 0 end) as 四分到五分            from dianyingxinxi """ +where+""" ) t  """data = db.session.execute(sql)data = data.fetchall()results = []for i in range(len(data)):result = {'wxs': decimalEncoder(data[i][0]),'total': decimalEncoder(data[i][1])}results.append(result)msg['data'] = resultsreturn jsonify(msg)# 分组统计接口
@main_bp.route("/python9532dr50/dianyingxinxi/group/<columnName>", methods=['GET'])
def python9532dr50_dianyingxinxi_group(columnName):'''分组统计接口'''if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": {}}req_dict = session.get("req_dict")userinfo = session.get("params")msg['data'] = dianyingxinxi.groupbycolumnname(dianyingxinxi,dianyingxinxi,columnName,req_dict)msg['data'] = msg['data'][:10]msg['data'] = [ {**i,columnName:str(i[columnName])} if columnName in i else i for i in msg['data']]json_filename='dianyingxinxi'+f'_group_{columnName}.json'where = ' where 1 = 1 'sql = "SELECT COUNT(*) AS total, " + columnName + " FROM dianyingxinxi " + where + " GROUP BY " + columnNamewith open(json_filename, 'w', encoding='utf-8') as f:f.write(json.dumps(msg['data'], indent=4, ensure_ascii=False))app.executor.submit(upload_to_hdfs, json_filename)app.executor.submit(MRMySQLAvg.run)return jsonify(msg)# 按值统计接口
@main_bp.route("/python9532dr50/dianyingxinxi/value/<xColumnName>/<yColumnName>", methods=['GET'])
def python9532dr50_dianyingxinxi_value(xColumnName, yColumnName):'''按值统计接口,{"code": 0,"data": [{"total": 10.0,"shangpinleibie": "aa"},{"total": 20.0,"shangpinleibie": "bb"},{"total": 15.0,"shangpinleibie": "cc"}]}'''if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": {}}req_dict = session.get("req_dict")userinfo = session.get("params")msg['data'] = dianyingxinxi.getvaluebyxycolumnname(dianyingxinxi,dianyingxinxi,xColumnName,yColumnName,req_dict)msg['data'] = msg['data'][:10]return jsonify(msg)# 按日期统计接口
@main_bp.route("/python9532dr50/dianyingxinxi/value/<xColumnName>/<yColumnName>/<timeStatType>", methods=['GET'])
def python9532dr50_dianyingxinxi_value_riqi(xColumnName, yColumnName, timeStatType):'''按日期统计接口'''if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": {}}userinfo = session.get("params")where = ' where 1 = 1 'sql = ''if timeStatType == '日':sql = "SELECT DATE_FORMAT({0}, '%Y-%m-%d') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y-%m-%d')".format(xColumnName, yColumnName, where, '%Y-%m-%d')if timeStatType == '月':sql = "SELECT DATE_FORMAT({0}, '%Y-%m') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y-%m')".format(xColumnName, yColumnName, where, '%Y-%m')if timeStatType == '年':sql = "SELECT DATE_FORMAT({0}, '%Y') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y')".format(xColumnName, yColumnName, where, '%Y')data = db.session.execute(sql)data = data.fetchall()results = []for i in range(len(data)):result = {xColumnName: decimalEncoder(data[i][0]),'total': decimalEncoder(data[i][1])}results.append(result)msg['data'] = resultsjson_filename='dianyingxinxi'+f'_value_{xColumnName}_{yColumnName}.json'with open(json_filename, 'w', encoding='utf-8') as f:f.write(json.dumps(results, indent=4, ensure_ascii=False))app.executor.submit(upload_to_hdfs, json_filename)app.executor.submit(MRMySQLAvg.run)return jsonify(msg)# 按值统计(多)
@main_bp.route("/python9532dr50/dianyingxinxi/valueMul/<xColumnName>", methods=['GET'])
def python9532dr50_dianyingxinxi_valueMul(xColumnName):if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": []}req_dict = session.get("req_dict")userinfo = session.get("params")where = ' where 1 = 1 'for item in req_dict['yColumnNameMul'].split(','):sql = "SELECT {0}, sum({1}) AS total FROM dianyingxinxi {2} GROUP BY {0} LIMIT 10".format(xColumnName, item, where)L = []data = db.session.execute(sql)data = data.fetchall() for i in range(len(data)):result = {xColumnName: decimalEncoder(data[i][0]),'total': decimalEncoder(data[i][1])}L.append(result)msg['data'].append(L)return jsonify(msg)# 按值统计(多)
@main_bp.route("/python9532dr50/dianyingxinxi/valueMul/<xColumnName>/<timeStatType>", methods=['GET'])
def python9532dr50_dianyingxinxi_valueMul_time(xColumnName):if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": []}req_dict = session.get("req_dict")userinfo = session.get("params")timeStatType = req_dict['timeStatType']where = ' where 1 = 1 'for item in req_dict['yColumnNameMul'].split(','):sql = ''if timeStatType == '日':sql = "SELECT DATE_FORMAT({0}, '%Y-%m-%d') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y-%m-%d') LIMIT 10".format(xColumnName, item, where, '%Y-%m-%d')if timeStatType == '月':sql = "SELECT DATE_FORMAT({0}, '%Y-%m') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y-%m') LIMIT 10".format(xColumnName, item, where, '%Y-%m')if timeStatType == '年':sql = "SELECT DATE_FORMAT({0}, '%Y') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y') LIMIT 10".format(xColumnName, item, where, '%Y')L = []data = db.session.execute(sql)data = data.fetchall() for i in range(len(data)):result = {xColumnName: decimalEncoder(data[i][0]),'total': decimalEncoder(data[i][1])}L.append(result)msg['data'].append(L)return jsonify(msg)import math
def cosine_similarity(a, b):numerator = sum([a[key] * b[key] for key in a if key in b])denominator = math.sqrt(sum([a[key]**2 for key in a])) * math.sqrt(sum([b[key]**2 for key in b]))return numerator / denominator#收藏协同算法
@main_bp.route("/python9532dr50/dianyingxinxi/autoSort2", methods=['GET'])
def python9532dr50_dianyingxinxi_autoSort2():if request.method == 'GET':user_ratings = {}req_dict = session.get("req_dict")userinfo = session.get("params")sql = "select * from storeup where type = 1 and tablename = 'dianyingxinxi' order by addtime desc"data = db.session.execute(sql)data_dict = [dict(zip(result.keys(), result)) for result in data.fetchall()]for item in data_dict:if user_ratings.__contains__(item["userid"]):ratings_dict = user_ratings[item["userid"]]if ratings_dict.__contains__(item["refid"]):ratings_dict[str(item["refid"])]+=1else:ratings_dict[str(item["refid"])] =1else:user_ratings[item["userid"]] = {str(item["refid"]):1}sorted_recommended_goods=[]try:# 计算目标用户与其他用户的相似度similarities = {other_user: cosine_similarity(user_ratings[userinfo.get("id")], user_ratings[other_user])for other_user in user_ratings if other_user != userinfo.get("id")}# 找到与目标用户最相似的用户most_similar_user = sorted(similarities, key=similarities.get, reverse=True)[0]# 找到最相似但目标用户未购买过的商品recommended_goods = {goods: rating for goods, rating in user_ratings[most_similar_user].items() ifgoods not in user_ratings[userinfo.get("id")]}# 按评分降序排列推荐sorted_recommended_goods = sorted(recommended_goods, key=recommended_goods.get, reverse=True)except:passL = []where = " AND ".join([f"{key} = '{value}'" for key, value in req_dict.items() if key!="page" and key!="limit" and key!="order"and key!="sort"])if where:sql = f'''SELECT * FROM (SELECT * FROM dianyingxinxi WHERE {where}) AS table1 WHERE id IN ('{"','".join(sorted_recommended_goods)}') union all SELECT * FROM (SELECT * FROM dianyingxinxi WHERE {where}) AS table1 WHERE id NOT IN ('{"','".join(sorted_recommended_goods)}')'''else:sql ="select * from dianyingxinxi where id in ('%s"%("','").join(sorted_recommended_goods)+"') union all select * from dianyingxinxi where id not in('%s"%("','").join(sorted_recommended_goods)+"')"data = db.session.execute(sql)data_dict = [dict(zip(result.keys(), result)) for result in data.fetchall()]for online_dict in data_dict:for key in online_dict:if 'datetime.datetime' in str(type(online_dict[key])):online_dict[key] = online_dict[key].strftime("%Y-%m-%d %H:%M:%S")elif 'datetime' in str(type(online_dict[key])):online_dict[key] = online_dict[key].strftime("%Y-%m-%d %H:%M:%S")else:passL.append(online_dict)return jsonify({"code": 0, "msg": '',  "data":{"currPage":1,"totalPage":1,"total":1,"pageSize":5,"list": L[0:int(req_dict['limit'])]}})# 总数量
@main_bp.route("/python9532dr50/dianyingxinxi/count", methods=['GET'])
def python9532dr50_dianyingxinxi_count():''''''if request.method == 'GET':msg = {"code": normal_code, "msg": "success",  "data": 0}req_dict = session.get("req_dict")userinfo = session.get("params")msg['data']  = dianyingxinxi.count(dianyingxinxi, dianyingxinxi, req_dict)return jsonify(msg)# 统计接口
@main_bp.route("/python9532dr50/dianyingxinxi/remind/<columnName>/<type>", methods=['GET'])  #
def python9532dr50_dianyingxinxi_remind(columnName,type):''''''if request.method == 'GET':msg = {"code": normal_code, 'count': 0}# 组合查询参数params = session.get("req_dict")remindstart = 0remindend =9999990if int(type)==1:#数字if params.get('remindstart') == None and params.get('remindend') != None:remindstart = 0remindend = int(params['remindend'])elif params.get('remindstart') != None and params.get('remindend') == None:remindstart = int(params['remindstart'])remindend = 999999elif params.get('remindstart') == None and params.get('remindend') == None:remindstart = 0remindend = 999999else:remindstart = params.get('remindstart')remindend =  params.get('remindend')elif int(type)==2:#日期current_time=int(time.time())if params.get('remindstart') == None and params.get('remindend') != None:starttime=current_time-60*60*24*365*2params['remindstart'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(starttime))endtime=current_time+60*60*24*params.get('remindend')params['remindend'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(endtime))elif params.get('remindstart') != None and params.get('remindend') == None:starttime= current_time - 60 * 60 * 24 * params.get('remindstart')params['remindstart']=time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(starttime))endtime=current_time+60*60*24*365*2params['remindend'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(endtime))elif params.get('remindstart') == None and params.get('remindend') == None:starttime = current_time - 60 * 60 * 24 * 365 * 2params['remindstart'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(starttime))endtime = current_time + 60 * 60 * 24 * 365 * 2params['remindend'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(endtime))data = dianyingxinxi.getbetweenparams(dianyingxinxi,dianyingxinxi,columnName,{"remindStart": remindstart,"remindEnd": remindend})msg['count'] = len(data)return jsonify(msg)#分类列表
@main_bp.route("/python9532dr50/dianyingxinxi/lists", methods=['GET'])
def python9532dr50_dianyingxinxi_lists():if request.method == 'GET':msg = {"code": normal_code, "msg": "success", "data": []}list,_,_,_,_ = dianyingxinxi.page(dianyingxinxi,dianyingxinxi,{})msg['data'] = listreturn jsonify(msg)

源码项目、定制开发、文档报告、PPT、代码答疑
希望和大家多多交流!!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/487217.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

生活大爆炸版石头剪刀布(洛谷P1328)

生活大爆炸版石头剪刀布(洛谷P1328) [NOIP2014 提高组] 前言&#xff1a; 由于洛谷发布题解有限制&#xff0c;所以在CSDN上发布洛谷题解。 所有题解均是Java语言, 但是思路是相同的 每篇都是刷题日常&#xff0c;尽量讲清楚算法逻辑。 希望有问题还请大佬们指导&#xff01; …

Linux安装Python2.7.5(centos自带同款)

卸载已安装的python,防止版本兼容问题 rpm -qa|grep python|xargs rpm -ev --allmatches --nodeps 删除残余文件 whereis python |xargs rm -frv 安装前提是已安装gcc和g gcc --version g --version 下载安装python2.7.5 https://www.python.org/downloads/release/pyt…

【CKA】Kubernetes(k8s)认证之CKA考题讲解

CKA考题讲解 0.考试101 0.1 kubectl命令⾃动补全 在 bash 中设置当前 shell 的⾃动补全&#xff0c;要先安装 bash-completion 包。 echo "source <(kubectl completion bash)" >> ~/.bashrc还可以在补全时为 kubectl 使⽤⼀个速记别名&#xff1a; al…

微信小程序uni-app+vue3实现局部上下拉刷新和scroll-view动态高度计算

微信小程序uni-appvue3实现局部上下拉刷新和scroll-view动态高度计算 前言 在uni-appvue3项目开发中,经常需要实现列表的局部上下拉刷新功能。由于网上相关教程较少且比较零散,本文将详细介绍如何使用scroll-view组件实现这一功能,包括动态高度计算、下拉刷新、上拉加载等完整…

PHP语法学习(第六天)-条件语句,关联数组

&#x1f4a1;依照惯例&#xff0c;回顾一下昨天讲的内容 PHP语法学习(第五天)主要讲了PHP中的常量和运算符的运用。 &#x1f525; 想要学习更多PHP语法相关内容点击“PHP专栏” 今天给大家讲课的角色是&#x1f34d;菠萝吹雪&#xff0c;“我菠萝吹雪吹的不是雪&#xff0c;而…

java中的数组(2)

大家好&#xff0c;我们今天继续来看java中数组这方面的知识点&#xff0c;那么话不多说&#xff0c;我们直接开始。 一.数组的使用 1.数组中元素访问 数组在内存中是一段连续的空间,空间的编号都是从0开始的,依次递增,数组可以通过下标访问其任意位置的元素. 也可以进行修改…

数据结构 (36)各种排序方法的综合比较

一、常见排序方法分类 插入排序类 直接插入排序&#xff1a;通过构建有序序列&#xff0c;对于未排序数据&#xff0c;在已排序序列中从后向前扫描&#xff0c;找到相应位置并插入。希尔排序&#xff1a;是插入排序的一种改进版本&#xff0c;先将整个待排序的记录序列分割成为…

计算机网络安全 —— 实体鉴别与生成大随机数

一、实体鉴别# ​ 实体鉴别&#xff08;经常简称为鉴别&#xff09;就是一方验证另一方身份的技术。一个实体可以是人、客户/服务器进程等。这里仅讨论如何鉴别通信对端 实体的身份&#xff0c;即验证正在通信的对方确实是所认为的通信实体&#xff0c;而不是其他的假冒者。进…

【SpringBug】lombok插件中@Data不能生成get和set方法

一&#xff1a;问题引入 可以看到我们在类UserInfo上写了Data注解&#xff0c;但是在测试文件中生成的反编译target文件Us二Info中没有get和set方法 二&#xff1a;解决方法 1&#xff1a;Spring升级问题&#xff08;解决了我的问题&#xff09; 原因是Spring官方进行了升级…

Unity 基于Collider 组件在3D 物体表面放置3D 物体

实现 从鼠标点击的屏幕位置发送射线&#xff0c;以射线监测点击到的物体&#xff0c;根据点击物体的法线向量调整放置物体的位置及朝向。 Ray ray Camera.main.ScreenPointToRay(Input.mousePosition); if (Physics.Raycast(ray, out RaycastHit hit, 100)) {obj.transform.…

宝塔内设置redis后,项目以及RedisDesktopManager客户端连接不上!

项目展现问题&#xff1a; Unable to connect to Redis; nested exception is io.lettuce.core.RedisConnectionException: Unable to connect to xxx.宝塔外链.ip.xxxx:6379 redis客户端连接失败&#xff1a; 1、宝塔中确认redis端口已放行 2、修改redis的配置 bind&#x…

2024 年 11 月区块链游戏研报:牛市加持下的 GameFi 破局之路

2024 年 11 月区块链游戏研报 作者&#xff1a;Stella L (stellafootprint.network) 数据来源&#xff1a;Footprint Analytics 区块链游戏 Research 页面 2024 年 11 月 Web3 游戏行业市场增长显著但大规模采用策略仍在演进。随着比特币创下历史新高并接近 10 万美元里程碑…

[软件工程]十.可靠性工程(reliable engineering)

1.什么是可靠性工程 我们希望软件在给定的时间内&#xff0c;运行的时候不会崩溃或者发生失效&#xff0c;同时能保护我们的数据和个人信息。我们要能够信任我们所使用的软件&#xff0c;这意味着软件必须是可靠的。可靠性&#xff08;reliability&#xff09;&#xff1a;系统…

学生信息管理系统(简化版)

前端部分&#xff08;vue2&#xff09; &#xff01;&#xff01;前端采用vue2框架&#xff0c;下面只写出必要的代码文件&#xff0c;想要使用需自行先创建vue项目 部分截图 下面是目录结构 下面是public文件夹里面的html文件 <!DOCTYPE html> <html lang"&q…

Facebook广告文案流量秘诀

Facebook 广告文案是制作有效 Facebook 广告的关键方面。它侧重于伴随广告视觉元素的文本内容。今天我们的博客将深入探讨成功的 Facebook 广告文案的秘密&#xff01; 一、广告文案怎么写&#xff1f; 正文&#xff1a;这是帖子的正文&#xff0c;出现在您姓名的正下方。它可…

VBA高级应用30例应用在Excel中的ListObject对象:向表中添加注释

《VBA高级应用30例》&#xff08;版权10178985&#xff09;&#xff0c;是我推出的第十套教程&#xff0c;教程是专门针对高级学员在学习VBA过程中提高路途上的案例展开&#xff0c;这套教程案例与理论结合&#xff0c;紧贴“实战”&#xff0c;并做“战术总结”&#xff0c;以…

create-react-app react19 搭建项目报错

报错截图 此时运行会报错&#xff1a; 解决方法&#xff1a; 1.根据提示安装依赖法 执行npm i web-vitals然后重新允许 2.删除文件法 在index.js中删除对报错文件的引入&#xff0c;删除报错文件

【MySQL 进阶之路】事务并发情况分析

MySQL事务并发控制分析笔记 在数据库系统中&#xff0c;事务并发控制至关重要&#xff0c;能够确保多个事务并发执行时的数据一致性、隔离性和正确性。MySQL通过不同的锁机制控制并发操作&#xff0c;以确保事务的隔离性。以下是对事务A和事务B并发行为的详细分析&#xff0c;…

NanoLog起步笔记-6-StaticLogInfo

nonolog起步笔记-6-StaticLogInfo StaticLogInfo文件名和行号文件名和行号的传入log参数 RuntimeLogger::registerInvocationSitelogid为什么只能被赋一次值 reserveAlloc加入消息头finishAlloc返回 StaticLogInfo 写C语言编译前端时&#xff0c;给我印象深刻的一部分是&#…

不是“我应该做什么”,而是“我想做什么”

1. 识别内心的渴望 首先&#xff0c;我们需要识别自己真正的愿望和激情所在。这可能需要一些时间和自我反思。问自己&#xff1a;在没有任何外界压力的情况下&#xff0c;我真正想做的是什么&#xff1f;是赚钱、生活、旅行、追星&#xff0c;还是其他什么&#xff1f;识别这些…