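News content (stored as HTML source) is read from site A's database and written into site B's news table. The formatting is inconsistent and full of redundant markup, much of it pasted in from Office, so the redundant HTML in site A's news content needs to be filtered out. The content is in the PHP variable $news, and I'd like to clean it up with regular expressions.
For example, this input: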
<font id=888 style="FONT-SIZE: 18px; FONT-FAMILY: FONT-SIZE: 18px"><P><STRONG><SPAN lang=EN-US style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"> 一级</SPAN><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">标题<SPAN lang=EN-US>粗体1</SPAN></SPAN></STRONG></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">新闻内容</SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG><IMG src="http://192.168.1.1/Webimage/2222.jpg" width=800></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG>一级标题粗体2</STRONG></SPAN></P> <BR><BR><P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"></SPAN> </P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; 
mso-bidi-language: AR-SA"><STRONG><IMG src="http://192.168.1.1/Webimage/2233.jpg" width=800></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"></SPAN></P> <P align=center><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字为居中</SPAN></P><BR><P align=right><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字为右对齐</SPAN></P><BR><P align=left><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字是斜体</SPAN></P><BR><P align=left><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><A title="" href="http://www.baidu.com/">加一个链接</A></SPAN></P></FONT>
should be turned into:
<p><strong>一级标题粗体</strong></p><p>新闻内容</p><p><img src="http://192.168.1.1/Webimage/2222.jpg" /></p><p>第二段新闻正文</p><p><strong>一级标题粗体2</strong></p><p><img src="http://192.168.1.1/Webimage/2233.jpg" /></p><p style="text-align: center;">这段文字为居中</p><p style="text-align: right;">这段文字为右对齐</p><p><em>这段文字为斜体</em></p><p><a href="http://www.baidu.com">加一个链接</a></p>
I put the detailed requirements on a web page so they are easier to read: http://www.sunmuu.com/help/editorHelp.html
Below is the PHP connection and query code, for easy testing. The database is MySQL, the table is editor, and it has two columns: ID (INT) and news (MEDIUMTEXT):
$mysql_db_hostname = "localhost";
$mysql_db_user     = "root";
$mysql_db_password = "root";
$mysql_db_database = "test";

$con = mysqli_connect($mysql_db_hostname, $mysql_db_user, $mysql_db_password, $mysql_db_database);
mysqli_query($con, "SET NAMES utf8");

$sql = "SELECT * FROM editor";
// mysqli_error() requires the connection handle as its argument
$re = mysqli_query($con, $sql) or die("Error reading data: " . mysqli_error($con));

// Print the raw news HTML of every row
while ($row = mysqli_fetch_array($re)) {
    $str = $row["news"];
    echo $str;
}
Replies (solutions)
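// Remove every tag that is not allowed
$s = strip_tags($s, '<p><a><img><strong><em>');
// Remove the attributes inside the remaining tags
$s = preg_replace('/<(p|strong|em)[^>]+?>/i', '<$1>', $s);
// img and a are handled separately: keep only src / href
$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i', '<$1 $2="$3">', $s);
echo $s;

Give this a try. Anything else that does not affect how the page displays you can adjust yourself.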
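To make the effect of the three steps above concrete, here is a small sketch that runs them on a shortened, made-up fragment modelled on the sample in the question (the fragment itself is an assumption, not data from the thread). It suggests two things worth knowing before relying on this version: the align attribute on the paragraph is dropped along with everything else, and tag names keep whatever case the source used.

```php
<?php
// Hypothetical test fragment, shortened from the style of the sample input.
$s = '<FONT id=888><P align=center><SPAN lang=EN-US style="FONT-SIZE: 16pt">Title</SPAN></P>'
   . '<P><IMG src="http://192.168.1.1/Webimage/2222.jpg" width=800></P></FONT>';

// Step 1: FONT and SPAN are not in the allowed list, so their tags are removed
// (the text between them is kept).
$s = strip_tags($s, '<p><a><img><strong><em>');

// Step 2: any opening p/strong/em tag that carries attributes is reduced to the bare tag.
$s = preg_replace('/<(p|strong|em)[^>]+?>/i', '<$1>', $s);

// Step 3: a/img keep only their href/src attribute.
$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i', '<$1 $2="$3">', $s);

echo $s, "\n";
// Prints: <P>Title</P><P><IMG src="http://192.168.1.1/Webimage/2222.jpg"></P>
```

Note that the centered paragraph comes out as a plain <P>, which is what the follow-up question in this thread is about.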
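Thank you so much, jam00!
One more thing: the alignment on p tags needs to be kept, as style="text-align: center;" or style="text-align: left;". In what order should the steps run to do that? Could you give me a hint?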
// Remove every tag that is not allowed
$s = strip_tags($s, '<p><a><img><strong><em>');
// Remove the attributes inside strong and em (p is no longer included here)
$s = preg_replace('/<(strong|em)[^>]+?>/i', '<$1>', $s);
// img and a are handled separately: keep only src / href
$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i', '<$1 $2="$3">', $s);
// p is handled on its own: convert align=... into a text-align style
$s = preg_replace('/<p\s*align="*?(right|left|center)"*?.*?>/i', '<p style="text-align: $1;">', $s);
echo $s;
That works great, thanks!
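For anyone landing on this thread later, the steps discussed above can be gathered into one function. The following is only a sketch under a few assumptions: the function name clean_news_html() is made up, it uses preg_replace_callback instead of the plain preg_replace calls from the replies so that tag names can be lower-cased at the same time, and it adds two steps the replies did not cover (removing empty paragraphs and the whitespace between paragraphs). It does not try to handle attributes that contain a ">" inside quotes, and it leaves an <a> tag without href untouched.

```php
<?php
// clean_news_html() is a hypothetical helper name, not part of PHP or any library.
function clean_news_html(string $html): string
{
    // 1. Drop every tag except the ones the target site accepts.
    $html = strip_tags($html, '<p><a><img><strong><em>');

    // 2. Opening p/strong/em: lower-case the name and drop all attributes,
    //    except that a legacy align=... on <p> becomes a text-align style.
    $html = preg_replace_callback(
        '/<(p|strong|em)\b([^>]*)>/i',
        function (array $m): string {
            $tag = strtolower($m[1]);
            if ($tag === 'p'
                && preg_match('/(?<![\w-])align\s*=\s*["\']?(left|right|center)\b/i', $m[2], $a)) {
                return '<p style="text-align: ' . strtolower($a[1]) . ';">';
            }
            return '<' . $tag . '>';
        },
        $html
    );

    // 3. Lower-case the closing tags as well.
    $html = preg_replace_callback(
        '/<\/(p|a|strong|em)\s*>/i',
        function (array $m): string {
            return '</' . strtolower($m[1]) . '>';
        },
        $html
    );

    // 4. a/img: keep only the double-quoted href/src attribute.
    $html = preg_replace_callback(
        '/<(a|img)\b[^>]*?\b(href|src)\s*=\s*"([^"]*)"[^>]*>/i',
        function (array $m): string {
            $tag = strtolower($m[1]);
            $end = ($tag === 'img') ? ' />' : '>';
            return '<' . $tag . ' ' . strtolower($m[2]) . '="' . $m[3] . '"' . $end;
        },
        $html
    );

    // 5. Remove elements that contain nothing but whitespace or &nbsp;.
    //    Repeat, because removing <strong></strong> can leave an empty <p></p>.
    do {
        $html = preg_replace(
            '/<(p|strong|em|a)(?:\s[^>]*)?>(?:\s|&nbsp;)*<\/\1>/i',
            '',
            $html,
            -1,
            $count
        );
    } while ($count > 0);

    // 6. Remove whitespace between consecutive paragraphs.
    $html = preg_replace('/(<\/p>)\s+(?=<p\b)/', '$1', $html);

    return trim($html);
}

// Example with a made-up fragment in the style of the question's sample.
$news = '<FONT id=888><P><STRONG><SPAN lang=EN-US style="FONT-SIZE: 16pt">Heading</SPAN></STRONG></P> '
      . '<P><SPAN style="FONT-SIZE: 16pt"><STRONG></STRONG></SPAN></P> '
      . '<P><STRONG><IMG src="http://192.168.1.1/Webimage/2222.jpg" width=800></STRONG></P><BR>'
      . '<P align=center><SPAN style="FONT-SIZE: 16pt">Centered</SPAN></P>'
      . '<P align=left><A title="" href="http://www.baidu.com/">A link</A></P></FONT>';

echo clean_news_html($news), "\n";
```

Handling the p tag inside the same callback as strong and em avoids the ordering question raised earlier: there is no later step that could strip the style attribute again. Regular expressions remain fragile on arbitrary HTML, so for input that is less uniform than this Office-style markup a real HTML parser may be the safer choice.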











