Ways to remove HTML tags in Python: 1. the "pattern.sub('', html)" method (regular expression); 2. the "BeautifulSoup(html, 'html.parser').get_text()" method; 3. the "response.xpath('string(.)')" method (lxml).

Environment used in this article: Windows 7, Python 3.6.4, DELL G3 computer.
Several ways to remove HTML tags in Python
import re
from bs4 import BeautifulSoup
from lxml import etree
html = '<p>你好</p><br/><font>哈哈</font><b>大家好</b>'
# Method 1: strip anything that looks like a tag with a regular expression
pattern = re.compile(r'<[^>]+>',re.S)
result = pattern.sub('', html)
print(result)
# Method 2: parse with BeautifulSoup and extract the text
soup = BeautifulSoup(html,'html.parser')
print(soup.get_text())
# Method 3: parse with lxml and take the XPath string value of the root
response = etree.HTML(text=html)
# print(dir(response))
print(response.xpath('string(.)'))
# Output of the three print calls:
# 你好哈哈大家好
# 你好哈哈大家好
# 你好哈哈大家好
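
One difference between the methods is worth noting: the regular expression in Method 1 only deletes tags, so character references such as &amp; stay in the result as-is, whereas the two parser-based methods decode them. The following is a minimal sketch, not part of the original article, of how Method 1 might be wrapped in a reusable function that also decodes character references with the standard library. The names strip_tags, TAG_PATTERN and sample are hypothetical, and the html module is imported under the alias html_lib only because the listing above already uses html as a variable name.

```python
import html as html_lib  # aliased: the listing above uses `html` as a variable name
import re

# Same pattern as Method 1: "<", one or more non-">" characters, then ">".
TAG_PATTERN = re.compile(r'<[^>]+>')


def strip_tags(markup):
    """Remove tag-like substrings, then decode character references.

    Hypothetical helper for illustration. Decoding happens after tag removal
    so that escaped text such as "&lt;b&gt;" is kept as literal text.
    """
    text = TAG_PATTERN.sub('', markup)
    return html_lib.unescape(text)


sample = '<p>你好</p><br/><font>Tom &amp; Jerry</font><b>大家好</b>'
print(strip_tags(sample))
# 你好Tom & Jerry大家好
```

Like Method 1 itself, this sketch does not understand HTML structure: the contents of script or style elements, and a ">" inside an attribute value, are not handled specially, so a real parser (Methods 2 and 3) is likely the safer choice for untrusted or complex pages.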