知行编程网  2022-04-19 02:00

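“When the epidemic is over and I'm back home, I'll hug my parents, take them for a walk by the river, and listen to them nag without ever talking back again. I love you both, and I hope you know it.”


“I'll go running in the park and shout at the top of my lungs. I've been cooped up so long I'm about to grow mold.”

“I'll go see my friend on the south side of town and talk over yesterday's failed confession of love.”

“Once I'm back in Hangzhou, I'm going to see Mr. Zhou.”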


On the first day after the epidemic recedes, what do you most want to do, and who is the one person you most want to see?


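All of the quotes above come from the trending Douban (“豆瓣”) topic #冠状疫情退去后的第一天你打算做什么# (“What do you plan to do on the first day after the coronavirus epidemic recedes?”).

This article scrapes the short posts under that topic, counts the most frequent words, and draws a word cloud to find out whom people miss most and what they most want to do once the epidemic is over.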



01.

Saving the short posts


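Using the browser's “Inspect” tool, we find the URL of the data API. As we keep scrolling down to load more posts, only the “start” parameter in the URL changes, taking the values 0, 20, 40, 60, 80, and so on (one page of 20 posts at a time).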


In addition, to get past Douban's anti-scraping check, every request must carry two request headers: “User-Agent” and “Referer”.
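Both observations (only `start` changes between pages; two headers are required) can be checked offline before anything is sent. The sketch below is illustrative only: it swaps in the standard library's `urllib` for `requests` so it runs without network access, the helper name `build_page_requests` is hypothetical, and the endpoint, query parameters, and header values are copied from the article's own scraping code. It only builds one `Request` per page; nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and headers as used in the article's scraping code
BASE_URL = 'https://m.douban.com/rexxar/api/v2/gallery/topic/125573/items'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
    'Referer': 'https://www.douban.com/gallery/topic/125573/?from=gallery_trend&sort=hot',
}


def build_page_requests(pages, page_size=20):
    """Hypothetical helper: build one Request per page; only 'start' varies."""
    requests_out = []
    for start in range(0, pages * page_size, page_size):  # 0, 20, 40, ...
        query = urlencode({
            'sort': 'new', 'start': start, 'count': page_size,
            'status_full_text': 1, 'guest_only': 0, 'ck': 'null',
        })
        # Nothing is sent here; the Request object just bundles URL and headers
        requests_out.append(Request(f'{BASE_URL}?{query}', headers=HEADERS))
    return requests_out


page_requests = build_page_requests(pages=3)
for req in page_requests:
    print(req.full_url)
```

Each `Request` could be passed to `urllib.request.urlopen`; the article does the equivalent with `requests.get(url, headers=headers)`. Note that `urllib` stores header names via `str.capitalize()`, so `User-Agent` is looked up as `User-agent`.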


<p style="line-height: 18px;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 16px;margin-right: 16px;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;display: -webkit-box !important;"><span style="letter-spacing: 0.5px;">import requests</span><br  /><br  />
<span style="letter-spacing: 0.5px;"># Data endpoint found with the browser's "Inspect" tool</span><br  />
<span style="letter-spacing: 0.5px;">url = 'https://m.douban.com/rexxar/api/v2/gallery/topic/125573/items'</span><br  /><br  />
<span style="letter-spacing: 0.5px;"># Both headers are required to get past Douban's anti-scraping check</span><br  />
<span style="letter-spacing: 0.5px;">headers = {</span><br  />
<span style="letter-spacing: 0.5px;">    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',</span><br  />
<span style="letter-spacing: 0.5px;">    'Referer': 'https://www.douban.com/gallery/topic/125573/?from=gallery_trend&sort=hot',</span><br  />
<span style="letter-spacing: 0.5px;">}</span><br  /><br  />
<span style="letter-spacing: 0.5px;">with open('want_after.txt', 'a', encoding='utf-8') as f:</span><br  />
<span style="letter-spacing: 0.5px;">    # Only "start" changes from page to page: 0, 20, 40, ...</span><br  />
<span style="letter-spacing: 0.5px;">    for start in range(0, 200, 20):</span><br  />
<span style="letter-spacing: 0.5px;">        params = {'sort': 'new', 'start': start, 'count': 20,</span><br  />
<span style="letter-spacing: 0.5px;">                  'status_full_text': 1, 'guest_only': 0, 'ck': 'null'}</span><br  /><br  />
<span style="letter-spacing: 0.5px;">        # Send the request and parse the JSON response</span><br  />
<span style="letter-spacing: 0.5px;">        response = requests.get(url, headers=headers, params=params)</span><br  />
<span style="letter-spacing: 0.5px;">        response.raise_for_status()</span><br  />
<span style="letter-spacing: 0.5px;">        data = response.json()</span><br  /><br  />
<span style="letter-spacing: 0.5px;">        # Save every short post the page contains, one per line</span><br  />
<span style="letter-spacing: 0.5px;">        for item in data['items']:</span><br  />
<span style="letter-spacing: 0.5px;">            abstract = item['abstract']</span><br  />
<span style="letter-spacing: 0.5px;">            f.write(abstract + '\n')</span><br  />
<span style="letter-spacing: 0.5px;">            print(abstract)</span><br  /></p>
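The parsing step relies on each JSON page having the shape `{'items': [{'abstract': ...}, ...]}`; that shape is inferred from the article's code, not from any published Douban documentation. As a hedged sketch, the hypothetical helper `extract_abstracts` below iterates over however many items a page actually contains and skips items without an `abstract`, so a short final page does not raise an `IndexError` or `KeyError`. It runs on a made-up sample page, not on real data.

```python
import json


def extract_abstracts(page):
    """Hypothetical helper: return the non-empty 'abstract' texts of one page."""
    abstracts = []
    # Iterate over the items actually present instead of a fixed range
    for item in page.get('items', []):
        text = item.get('abstract')
        if text:  # skip items without a usable abstract
            abstracts.append(text.strip())
    return abstracts


# Made-up sample shaped like a parsed response (not real Douban data)
sample_page = json.loads('{"items": [{"abstract": "Hug my parents "},'
                         ' {"abstract": "Hotpot with friends"}, {"id": 3}]}')
print(extract_abstracts(sample_page))
```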





02.

Word cloud visualization


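With the data saved, we first segment the text into words with “jieba” (written Chinese has no spaces between words), then feed the segmented text to “wordcloud” to draw a word cloud that visualizes the data.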
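Why segmentation matters: the word cloud step separates words on spaces and punctuation, so an unsegmented Chinese phrase would be counted as one single “word”. jieba's accurate mode (`cut_all=False`) builds a graph of candidate words from its prefix dictionary, picks the most probable split with dynamic programming, and uses an HMM for words missing from the dictionary. The sketch below swaps in a much simpler technique, forward maximum matching, purely to show the idea of turning a sentence into space-separated words; the function `fmm_segment` and the tiny vocabulary are hypothetical and are not how jieba works internally.

```python
def fmm_segment(text, vocabulary, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Tiny hypothetical dictionary, for illustration only
VOCAB = {'疫情', '过去', '回家', '吃', '火锅', '吃火锅', '朋友'}

tokens = fmm_segment('疫情过去回家吃火锅', VOCAB)
text = ' '.join(tokens)  # space-separated, the form the word cloud step consumes
print(text)
```

In the article's code this whole step is the single call `' '.join(jieba.cut(f.read(), cut_all=False))`.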


<p style="line-height: 18px;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 16px;margin-right: 16px;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;display: -webkit-box !important;"><span style="letter-spacing: 0.5px;">from wordcloud import WordCloud</span><br  />
<span style="letter-spacing: 0.5px;">import matplotlib.pyplot as plt</span><br  />
<span style="letter-spacing: 0.5px;">import pandas as pd  # used for the frequency table in section 03</span><br  />
<span style="letter-spacing: 0.5px;">import jieba</span><br  /><br  />
<span style="letter-spacing: 0.5px;"># Segment the text with jieba and join the words with spaces, the format WordCloud expects</span><br  />
<span style="letter-spacing: 0.5px;">with open('want_after.txt', 'r', encoding='utf-8') as f:</span><br  />
<span style="letter-spacing: 0.5px;">    text = ' '.join(jieba.cut(f.read(), cut_all=False))</span><br  /><br  />
<span style="letter-spacing: 0.5px;">background_image = plt.imread('豆瓣.jpg')  # mask image that gives the cloud its shape</span><br  /><br  />
<span style="letter-spacing: 0.5px;"># Word cloud settings; font_path must point to a font with Chinese glyphs</span><br  />
<span style="letter-spacing: 0.5px;">wc = WordCloud(</span><br  />
<span style="letter-spacing: 0.5px;">    background_color='white',</span><br  />
<span style="letter-spacing: 0.5px;">    mask=background_image,</span><br  />
<span style="letter-spacing: 0.5px;">    font_path='SourceHanSerifCN-Medium.otf',</span><br  />
<span style="letter-spacing: 0.5px;">    max_words=200,</span><br  />
<span style="letter-spacing: 0.5px;">    max_font_size=200,</span><br  />
<span style="letter-spacing: 0.5px;">    min_font_size=8,</span><br  />
<span style="letter-spacing: 0.5px;">    random_state=50,</span><br  />
<span style="letter-spacing: 0.5px;">)</span><br  /><br  />
<span style="letter-spacing: 0.5px;"># Generate the word cloud</span><br  />
<span style="letter-spacing: 0.5px;">word_cloud = wc.generate_from_text(text)</span><br  /><br  />
<span style="letter-spacing: 0.5px;">plt.imshow(word_cloud)</span><br  />
<span style="letter-spacing: 0.5px;">plt.axis('off')</span><br  />
<span style="letter-spacing: 0.5px;">plt.show()</span><br  /><br  />
<span style="letter-spacing: 0.5px;">wc.to_file('结果.jpg')</span></p>



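The word cloud makes the high-frequency keywords easy to spot, such as “吃火锅” (eating hotpot), “电影” (movies), “朋友” (friends), “奶茶” (milk tea), “拥抱” (hugs), and “疫情” (the epidemic).


Together they sum up what most of us are wishing for.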



03.

Counting high-frequency words


<p style="line-height: 18px;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 16px;margin-right: 16px;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;display: -webkit-box !important;"><span style="letter-spacing: 0.5px;"># List the most frequent words</span><br  />
<span style="letter-spacing: 0.5px;">process_word = wc.process_text(text)</span><br  />
<span style="letter-spacing: 0.5px;">sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)</span><br  />
<span style="letter-spacing: 0.5px;">sort_after = sort[:50]</span><br  />
<span style="letter-spacing: 0.5px;">print(sort_after)</span><br  /><br  />
<span style="letter-spacing: 0.5px;"># Save the top 50 words to a CSV file</span><br  />
<span style="letter-spacing: 0.5px;">df = pd.DataFrame(sort_after, columns=['word', 'count'])</span><br  />
<span style="letter-spacing: 0.5px;"># 'utf_8_sig' writes a BOM so Excel displays the Chinese text without garbling</span><br  />
<span style="letter-spacing: 0.5px;">df.to_csv('sort_after.csv', encoding='utf_8_sig')</span><br  /></p>
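`WordCloud.process_text` returns counts after applying WordCloud's own filtering (for example, its default stopword list, which targets English). If you only need a plain frequency table, a rough alternative is to count the space-separated tokens yourself. The sketch below swaps in `collections.Counter` and the standard `csv` module for `pandas`; the helper names `top_words` and `save_csv`, the rule of dropping single-character tokens, and the sample string are all assumptions for illustration.

```python
import csv
from collections import Counter


def top_words(segmented_text, n=50, min_len=2):
    """Count space-separated tokens, ignoring ones shorter than min_len."""
    # min_len=2 drops single characters such as particles (an assumption; tune as needed)
    tokens = [t for t in segmented_text.split() if len(t) >= min_len]
    return Counter(tokens).most_common(n)


def save_csv(rows, path):
    # 'utf-8-sig' prepends a BOM so Excel opens the Chinese text correctly
    with open(path, 'w', encoding='utf-8-sig', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['word', 'count'])
        writer.writerows(rows)


sample = '吃火锅 朋友 的 吃火锅 电影 朋友 吃火锅 拥抱'  # made-up segmented text
ranking = top_words(sample, n=3)
print(ranking)
save_csv(ranking, 'top_words.csv')
```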




Facing the sea, as spring turns warm and the flowers bloom.


-END-



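This article originally appeared on 菜鸟学Python.

This is an original article and its copyright is reserved. You are welcome to share it; please credit the source when reposting.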
