C# - Regular expression to remove link from image in HTML
What C# / regex syntax would remove the link from the first image in a body of text like:
text <a href="..." class="..."><img src="..." class="..." width="..." /></a> more text <a href="..." class="..."><img src="..." class="..." width="..." /></a> more text
so that the final result would be:
text <img src="..." class="..." width="..." /> more text <a href="..." class="..."><img src="..." class="..." width="..." /></a> more text
Any advice is appreciated! Thanks in advance.
Using Html Agility Pack (project page, NuGet), this does the trick:
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("text <a href=\"...\" class=\"...\"><img src=\"...\" class=\"...\" width=\"...\" /></a> more text"
    + " <a href=\"...\" class=\"...\"><img src=\"...\" class=\"...\" width=\"...\" /></a> more text");

// Find the first <img> whose direct parent is an <a> element.
var firstImage = doc.DocumentNode.Descendants("img")
    .Where(node => node.ParentNode.Name == "a")
    .FirstOrDefault();

if (firstImage != null)
{
    var aNode = firstImage.ParentNode;
    // Detach the image from the link, then put it where the link was.
    aNode.RemoveChild(firstImage);
    aNode.ParentNode.ReplaceChild(firstImage, aNode);
}

var fixedText = doc.DocumentNode.OuterHtml;
//doc.Save(/* stream */);
I find this a lot easier on the eyes, and it states what you are trying to accomplish:
- Find the first img inside an a tag.
- Store the img temporarily.
- Remove the img from the a tag, then swap the a tag for the img.
- Save the results.
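Since the question asked for regex syntax, here is a minimal sketch of a regex-only alternative for comparison. It is not part of the answer above: the class name ImageLinkStripper and the method UnwrapFirstImage are made up for illustration, and the pattern assumes the link contains nothing but a single img tag (plus optional whitespace) and that no attribute value contains a ">" character. It relies on the Regex.Replace overload that takes a maximum replacement count, so only the first match is touched.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ImageLinkStripper
{
    // Matches an <a ...> element whose only content is one <img ...> tag.
    // Group 1 captures the <img> tag itself.
    private static readonly Regex LinkedImage = new Regex(
        @"<a\b[^>]*>\s*(<img\b[^>]*>)\s*</a>",
        RegexOptions.IgnoreCase);

    // Replaces only the first linked image with the bare <img> tag;
    // later matches are left untouched because the count is 1.
    public static string UnwrapFirstImage(string html)
    {
        return LinkedImage.Replace(html, "$1", 1);
    }
}
```

Calling ImageLinkStripper.UnwrapFirstImage on the sample text should leave the second linked image as it is. For anything beyond markup this simple, the parser-based approach is likely the safer choice, since a regex cannot account for nesting or unusual attribute values.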