URL Regular Expression with Perl


I need to normalise URLs before storing them in a database, using Perl regular expressions.

Here are some example URLs:

However, whenever I try the code below, instead of removing the // after foo in foo//, it removes the double slash in http://. I need to keep the // in http://, but I don't need the // after foo. I also need to get rid of any /../ or /./ that can appear in a URL.

Basically, this:

"http://www.codeme.com:123/../foo//bar.html" 

should become this:

"http://www.codeme.com/foo/" 

I am new to Perl. I ignored it and thought I would never need it, but life has proven me wrong. I would therefore appreciate it if someone could point me in the right direction.

    sub main {
        my $line;
        open(FH, "test.txt");
        until (($line = <FH>) =~ /9/) {
            $line =~ tr/A-Z/a-z/;
            $line =~ s|//|/|;
            $line =~ s|\:\d\d\d||;
            $line =~ s|:80||;
            print $line;
        }
        close FH;
    }

Use the URI module. It will make your life better and is often already installed; if it is not, install it from CPAN.

http://metacpan.org/pod/URI

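    use strict;
    use warnings;
    use URI;

    open(my $fh, '<', 'test.txt') or die "Cannot open test.txt: $!";
    while (my $line = <$fh>) {
        last if $line =~ /9/;    # same stop condition as the loop in the question
        chomp($line);            # gets rid of the newline character
        my $url = URI->new($line);
        # path already begins with '/', so no extra separator is printed
        print $url->scheme, '://', $url->host, $url->path, "\n";
    }
    close($fh);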

It should split the URL into clean pieces for you.
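To make "pieces" concrete, here is a small sketch that splits the sample URL from the question with the module's accessor methods. It assumes the URI distribution is installed; the helper name url_pieces is made up for illustration and is not part of the module.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: return the parts of a URL as a hash reference.
sub url_pieces {
    my ($text) = @_;
    my $url = URI->new($text);
    return {
        scheme => $url->scheme,    # protocol part, e.g. "http"
        host   => $url->host,      # host name without the port
        port   => $url->port,      # explicit port, or the scheme's default
        path   => $url->path,      # path as written, leading "/" included
    };
}

my $pieces = url_pieces('http://www.codeme.com:123/../foo//bar.html');
print "$_ = $pieces->{$_}\n" for qw(scheme host port path);
```

The path comes back as /../foo//bar.html, so the module separates the parts but leaves the dot segments and the doubled slash for you to deal with.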

Also, you don't need sub main. In Perl it's implicit.

Edit: as @spyroboy pointed out, this does not normalize the URL for you. You still need to normalize the parts by whatever means you want, since it isn't clear what you mean by normalization.
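As one possible reading of "normalize", here is a rough sketch in plain Perl with no modules. It assumes normalization means: lowercase the scheme and host, drop any port, collapse repeated slashes in the path only (so the // after the scheme is never touched), and resolve /./ and /../ segments. The name normalize_url is hypothetical, and unlike the expected result in the question this version keeps the file name.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the assumptions listed above.
sub normalize_url {
    my ($text) = @_;

    # Capture scheme, host, path, and whatever follows (query, fragment).
    # The optional port is matched but not captured, so it is dropped.
    my ($scheme, $host, $path, $rest) =
        $text =~ m{^([a-z][a-z0-9+.-]*)://([^/:?#]+)(?::\d+)?([^?#]*)(.*)$}si
        or return;    # not a URL this sketch understands

    # Only the path is edited, so "http://" keeps its double slash.
    $path =~ s{/{2,}}{/}g;

    # Resolve "." and ".." segments; ".." at the root is simply ignored.
    my @out;
    for my $seg (split m{/}, $path) {
        next if $seg eq '' or $seg eq '.';
        if ($seg eq '..') { pop @out; next; }
        push @out, $seg;
    }
    my $clean = '/' . join('/', @out);

    # Keep a trailing slash if the original path named a directory.
    $clean .= '/' if @out and $path =~ m{/(?:\.\.?)?$};

    # Scheme and host are case-insensitive; the path is left as written.
    return lc($scheme) . '://' . lc($host) . $clean . $rest;
}

print normalize_url('http://www.codeme.com:123/../foo//bar.html'), "\n";
# prints http://www.codeme.com/foo/bar.html
```

Dropping a non-default port changes which server the URL points to, so keep that step only if it really is what you want to store.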

