C# - HtmlAgilityPack Parsing
I'm trying to parse the following data from an HTML document using HtmlAgilityPack:

    <a href="http://abilene.craigslist.org/">abilene</a> <br>
    <a href="http://albany.craigslist.org/"><b>albany</b></a> <br>
    <a href="http://amarillo.craigslist.org/">amarillo</a> <br>
    ...

I want to parse out the URL and the name of each city into two separate files.
For example:
urls.txt
"http://abilene.craigslist.org/"
"http://albany.craigslist.org/"
"http://amarillo.craigslist.org/"
cities.txt
abilene
albany
amarillo
Here's what I have so far:

    public void ParseHtml()
    {
        // Clear the text box
        textBox1.Clear();

        // Managed wrapper around the HTML document object model (DOM)
        HtmlAgilityPack.HtmlDocument hDoc = new HtmlAgilityPack.HtmlDocument();

        // Load the file
        hDoc.Load(@"c:\allcities.html");

        try
        {
            // Execute the XPath query from the input text box
            foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes(xpathText.Text))
            {
                textBox1.Text += hNode.InnerHtml + "\r\n";
            }
        }
        catch (NullReferenceException nre)
        {
            textBox1.Text += "Can't process the XPath query, modify it and try again.";
        }
    }

Any help is appreciated! Thanks, guys!
Do you want to parse them directly from craigslist.org?
Here's how I'd do it:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using HtmlAgilityPack;

    List<string> links = new List<string>();
    List<string> names = new List<string>();
    HtmlDocument doc = new HtmlDocument();

    // Load the HTML
    using (var client = new WebClient())
    using (var stream = client.OpenRead("http://geo.craigslist.org/iso/us"))
    {
        doc.Load(stream);
    }

    // Get the links inside the div with id='list' that have an href attribute
    HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//div[@id='list']/a[@href]");
    // Or, if you have the links saved somewhere:
    // HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//a[@href]");

    // SelectNodes returns null (not an empty collection) when nothing matches
    if (linkNodes != null)
    {
        foreach (HtmlNode link in linkNodes)
        {
            links.Add(link.GetAttributeValue("href", ""));
            names.Add(link.InnerText); // InnerText, so we don't get the HTML tags
        }
    }

    // Write each list to its own file
    File.WriteAllText("urls.txt", string.Join(Environment.NewLine, links.ToArray()));
    File.WriteAllText("cities.txt", string.Join(Environment.NewLine, names.ToArray()));
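The key difference from the question's code is reading each link's InnerText rather than its InnerHtml, which matters for entries like `<b>albany</b>`. As a dependency-free illustration, here is a minimal sketch that swaps HtmlAgilityPack for the BCL's `System.Xml.XmlDocument`. That is a different parser: it only accepts well-formed markup, so the sketch assumes the fragment's `<br>` tags are written as `<br/>`. `CityLinkSketch` and `Extract` are hypothetical names, not part of either library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class CityLinkSketch
{
    // Hypothetical helper: pulls (href, text) pairs out of a well-formed XHTML fragment.
    public static (List<string> Urls, List<string> Names) Extract(string xhtml)
    {
        var doc = new XmlDocument();
        // Wrap the fragment in a single root element so it parses as an XML document.
        doc.LoadXml("<root>" + xhtml + "</root>");

        var urls = new List<string>();
        var names = new List<string>();

        // Same XPath as the answer's "saved links" variant.
        var anchors = doc.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return (urls, names);
        }

        foreach (XmlNode node in anchors)
        {
            var anchor = (XmlElement)node;
            urls.Add(anchor.GetAttribute("href"));
            // InnerText drops child markup, so <b>albany</b> becomes "albany".
            names.Add(anchor.InnerText);
        }
        return (urls, names);
    }

    public static void Main()
    {
        string fragment =
            "<a href=\"http://abilene.craigslist.org/\">abilene</a> <br/> " +
            "<a href=\"http://albany.craigslist.org/\"><b>albany</b></a> <br/> " +
            "<a href=\"http://amarillo.craigslist.org/\">amarillo</a> <br/>";

        var (urls, names) = Extract(fragment);
        File.WriteAllLines("urls.txt", urls);
        File.WriteAllLines("cities.txt", names);
        Console.WriteLine(string.Join(",", names)); // prints: abilene,albany,amarillo
    }
}
```

For the real craigslist page, HtmlAgilityPack is still the better fit because it tolerates unclosed tags like `<br>`; the XPath expression and the InnerText-versus-InnerHtml choice carry over unchanged.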