c# - HtmlAgilityPack Parsing

- May 15, 2015

I'm trying to parse the following data from an HTML document using HtmlAgilityPack:

<a href="http://abilene.craigslist.org/">abilene</a> <br>
<a href="http://albany.craigslist.org/"><b>albany</b></a> <br>
<a href="http://amarillo.craigslist.org/">amarillo</a> <br>
...

I'd like to parse out the URL and the name of the city into 2 separate files.

Example:

urls.txt
"http://abilene.craigslist.org/"
"http://albany.craigslist.org/"
"http://amarillo.craigslist.org/"

cities.txt
abilene
albany
amarillo
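The two example files are written in the same order, so line N of urls.txt belongs to line N of cities.txt. The sketch below writes files in that layout from (URL, name) pairs. It assumes the double quotes around each URL in urls.txt are meant to be part of the file. `CityFileWriter` and `Write` are hypothetical names, not part of HtmlAgilityPack. The sketch uses only the standard library.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not from the original post).
public static class CityFileWriter
{
    // Writes one quoted URL per line to urlsPath and the matching city name
    // per line to citiesPath, preserving the input order in both files.
    public static void Write(IReadOnlyList<(string Url, string Name)> cities,
                             string urlsPath, string citiesPath)
    {
        // Assumption: the quotes shown in the urls.txt example are wanted.
        File.WriteAllLines(urlsPath, cities.Select(c => "\"" + c.Url + "\""));
        File.WriteAllLines(citiesPath, cities.Select(c => c.Name));
    }
}
```

If the quotes are not wanted, change the first `Select` to `c => c.Url`.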

Here's what I have so far:

    public void ParseHtml()
    {
        // Clear the text box
        textBox1.Clear();

        // Managed wrapper around the HTML Document Object Model (DOM)
        HtmlAgilityPack.HtmlDocument hDoc = new HtmlAgilityPack.HtmlDocument();

        // Load the file
        hDoc.Load(@"c:\allcities.html");

        try
        {
            // Execute the XPath query entered in the text box
            foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes(xpathText.Text))
            {
                textBox1.Text += hNode.InnerHtml + "\r\n";
            }
        }
        catch (NullReferenceException)
        {
            // SelectNodes returned null because the query matched nothing
            textBox1.Text += "Can't process XPath query, modify it and try again.";
        }
    }
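The method above relies on catching `NullReferenceException`. HtmlAgilityPack's `SelectNodes` returns `null`, not an empty collection, when the XPath query matches nothing. It also appends `InnerHtml`, which for the albany link still contains the nested `<b>` tag. The sketch below checks for `null` explicitly and collects `InnerText` instead. `XPathHelper` and `QueryInnerText` are hypothetical names. A syntactically invalid XPath expression should still throw (an `XPathException`), so a query typed into a text box may still want a try/catch for that case.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper (not from the original post).
public static class XPathHelper
{
    // Runs an XPath query and returns the text of each matching node,
    // or an empty list when nothing matches.
    public static List<string> QueryInnerText(HtmlDocument doc, string xpath)
    {
        var results = new List<string>();

        // SelectNodes returns null (not an empty collection) on no match.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText drops nested tags such as <b>; InnerHtml would keep them.
            results.Add(node.InnerText);
        }
        return results;
    }
}
```

With the sample markup loaded, the query `//a[@href]` should yield `abilene`, `albany` and `amarillo`. The `InnerHtml` of the second link is `<b>albany</b>`.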

Any help is appreciated! Thanks, guys!

Do you want to parse them from craigslist.org?
Here's how I'd do it.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using HtmlAgilityPack;

    List<string> links = new List<string>();
    List<string> names = new List<string>();
    HtmlDocument doc = new HtmlDocument();

    // Load the HTML (and dispose the client and stream afterwards)
    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead("http://geo.craigslist.org/iso/us"))
    {
        doc.Load(stream);
    }

    // Get all links inside the div with id='list' that have an href attribute
    HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//div[@id='list']/a[@href]");
    // Or, if you have the links saved somewhere:
    // HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//a[@href]");

    if (linkNodes != null)
    {
        foreach (HtmlNode link in linkNodes)
        {
            links.Add(link.GetAttributeValue("href", ""));
            names.Add(link.InnerText); // InnerText, so you don't get the HTML tags
        }
    }

    // Write both lists to files
    File.WriteAllText("urls.txt", string.Join(Environment.NewLine, links.ToArray()));
    File.WriteAllText("cities.txt", string.Join(Environment.NewLine, names.ToArray()));
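The answer keeps URLs and names in two parallel lists. A variation is to return each link as a single (URL, name) pair from an HTML string, so both outputs always stay in step and the parsing can be tested without a network request. This is a minimal sketch of that idea. `CityLinkExtractor` and `Extract` are hypothetical names. Decoding entities with `HtmlEntity.DeEntitize` and trimming whitespace are additions of this sketch, not steps from the answer.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper (not from the original answer).
public static class CityLinkExtractor
{
    // Parses the HTML and returns one (Url, Name) pair per node matched by xpath.
    public static List<(string Url, string Name)> Extract(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var cities = new List<(string Url, string Name)>();
        HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes(xpath);
        if (linkNodes == null)
        {
            return cities; // no matches: SelectNodes returned null
        }

        foreach (HtmlNode link in linkNodes)
        {
            string url = link.GetAttributeValue("href", "");
            // Decode entities such as &amp; and trim surrounding whitespace.
            string name = HtmlEntity.DeEntitize(link.InnerText).Trim();
            cities.Add((url, name));
        }
        return cities;
    }
}
```

For a saved copy of the page, pass the file's contents (for example from `File.ReadAllText`) together with `//a[@href]`. For the live page, use the answer's `//div[@id='list']/a[@href]`, assuming the page still has that structure.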
