URL Regular Expression with Perl
I need to normalise URLs before storing them in a database, using Perl regular expressions.
Here are example URLs:
However, whenever I try the code below, instead of removing the // after foo in foo//, it removes the double slash in http://. I need to keep the // in http://, but I don't need the extra // after foo//. I also need to get rid of any /../ or /./ that can appear in the URL.
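A minimal sketch of one way to leave the // in http:// alone: a negative lookbehind makes the substitution skip slashes that directly follow a colon. The helper name collapse_slashes is made up for illustration, and this sketch deliberately does not touch /../ or /./.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post).
sub collapse_slashes {
    my ($url) = @_;
    # Drop an explicit port right after the host, e.g. ":123" or ":80".
    $url =~ s{^(\w+://[^/:]+):\d+}{$1};
    # Collapse runs of two or more slashes, unless the run starts
    # directly after a colon (the "//" that follows the scheme).
    $url =~ s{(?<!:)/{2,}}{/}g;
    return $url;
}

print collapse_slashes("http://www.codeme.com:123/../foo//bar.html"), "\n";
# prints http://www.codeme.com/../foo/bar.html
```

The /g flag matters here: without it only the first match is replaced, which is why a plain s|//|/| hits the http:// pair and then stops.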
Basically, this:
"http://www.codeme.com:123/../foo//bar.html"
should become this:
"http://www.codeme.com/foo/"
I am new to Perl. I ignored it, thinking I would never need it, but life has proven me wrong. I would therefore appreciate it if anyone can lead me onto the right track.
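sub main {
    my $line;
    open(FH, "test.txt") or die "Cannot open test.txt: $!";
    while (defined($line = <FH>)) {
        $line =~ tr/A-Z/a-z/;     # lowercase the whole line
        $line =~ s|//|/|;         # problem line: replaces only the first //, the one in http://
        $line =~ s|\:\d\d\d||;    # strip a three-digit port such as :123
        $line =~ s|:80||;         # strip the default HTTP port
        print $line;
    }
    close FH;
}
main();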
Use the URI module. It will make your life better. It is not part of core Perl, but many installations already include it, and it is available from CPAN.
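use strict;
use warnings;
use URI;

my $line;
open(FH, "test.txt") or die "Cannot open test.txt: $!";
while (defined($line = <FH>)) {
    chomp($line);                  # gets rid of the newline character
    my $url = URI->new($line);
    # path already starts with '/', so no extra separator is printed;
    # host (rather than host_port) leaves the port out
    print $url->scheme, '://', $url->host, $url->path, "\n";
}
close FH;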
It should clean up the URL pieces for you.
Also, you don't need sub main. In Perl it's implicit.
Edit: @spyroboy pointed out that this does not normalize the URL for you. You will still need to normalize the parts through some means, though exactly what you want for normalization isn't clear.
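As a rough sketch of what normalizing the path part could look like, assuming the goal is to resolve /./ and /../ and drop empty segments left by //: split the path on slashes and rebuild it from a stack of segments. The name normalise_path is hypothetical, and the input is assumed to be a path string such as the one returned by $url->path.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the URI module).
sub normalise_path {
    my ($path) = @_;
    my @out;
    for my $seg (split m{/}, $path) {
        # Empty segments come from a leading slash or from "//".
        next if $seg eq '' or $seg eq '.';
        if ($seg eq '..') {
            pop @out;    # step up one level; a no-op when already at the root
            next;
        }
        push @out, $seg;
    }
    my $clean = '/' . join('/', @out);
    # Keep a trailing slash if the input had one and segments remain.
    $clean .= '/' if @out && $path =~ m{/$};
    return $clean;
}

print normalise_path("/../foo//bar.html"), "\n";   # prints /foo/bar.html
print normalise_path("/a/./b/../c/"), "\n";        # prints /a/c/
```

Working on segments rather than on the whole URL string means the host can never be swallowed by a /../ rule, which is easy to get wrong with a single regular expression.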