URL Regular Expression with Perl
I need to normalise URLs before storing them in a database, using Perl regular expressions.
Here are some example URLs:
However, whenever I try the code below, instead of removing the // after foo in foo//, it removes the double slash in http://. I need to keep the // in http://, but I don't need the // after foo. I also need to get rid of any /../ or /./ that can appear in the URL.
Basically, this:
"http://www.codeme.com:123/../foo//bar.html" should become this:
"http://www.codeme.com/foo/"
I am new to Perl. I ignored it and thought I would never need it, but life has proven me wrong. I would therefore appreciate it if someone could put me on the right track.
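    sub main {
        my $line;
        open(FH, "test.txt");
        until (($line = <FH>) =~ /9/) {
            $line =~ tr/A-Z/a-z/;    # lowercase the whole line
            $line =~ s|//|/|;        # meant for foo//, but hits http:// first
            $line =~ s|\:\d\d\d||;   # strip a three-digit port
            $line =~ s|:80||;        # strip the default HTTP port
            print $line;
        }
        close FH;
    }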
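The substitution s|//|/| in the code above has no /g flag, so it replaces only the leftmost //, and in an absolute URL that is the pair right after http:. One regex-only way around this is a negative lookbehind that skips slashes preceded by a colon. The following is a minimal sketch under that assumption; collapse_slashes is a hypothetical helper name, and it handles only the duplicate slashes, not the port or the /../ and /./ segments.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of slashes, except the pair after the scheme.
sub collapse_slashes {
    my ($url) = @_;
    # (?<!:) is a negative lookbehind: a run of slashes that directly
    # follows ':' (as in "http://") is left alone. /g handles every run.
    $url =~ s{(?<!:)//+}{/}g;
    return $url;
}

print collapse_slashes("http://www.codeme.com:123/../foo//bar.html"), "\n";
# prints http://www.codeme.com:123/../foo/bar.html
```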
Use the URI module. It makes life better, and it should be included with Perl by default.
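    use strict;
    use warnings;
    use URI;

    open(my $fh, '<', 'test.txt') or die "Cannot open test.txt: $!";
    while (my $line = <$fh>) {
        last if $line =~ /9/;    # same stop condition as the original loop
        chomp($line);            # gets rid of the newline character
        my $url = URI->new($line);
        # path already begins with '/', so no extra separator is needed
        print $url->scheme, '://', $url->host, $url->path, "\n";
    }
    close($fh);

It should split the URL into clean pieces for you.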
Also, you don't need sub main. In Perl it's implicit.
Edit: As @spyroboy pointed out, this will not normalize the URL for you. You still need to normalize the parts through your own means, since what you want from normalization isn't clear.
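As one possible reading of the normalization the question asks for (lowercase scheme and host, no port, no empty or dot segments in the path), here is a rough sketch built on the URI module. normalize_url is a hypothetical helper, not something URI provides. It rebuilds the URL from scheme, host and path only, so any query string or fragment is dropped, which is an assumption about what should be stored.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper, not part of the URI module.
sub normalize_url {
    my ($raw) = @_;
    # canonical() lowercases the scheme and host, among other things
    my $url = URI->new($raw)->canonical;

    my @kept;
    for my $segment (split m{/}, $url->path) {
        next if $segment eq '' or $segment eq '.';   # drops "//" and "/./"
        if ($segment eq '..') {                      # "/../" removes the previous segment
            pop @kept;
            next;
        }
        push @kept, $segment;
    }

    my $path = '/' . join('/', @kept);
    $path .= '/' if @kept && $url->path =~ m{/\z};   # keep a trailing slash

    # Rebuilding from scheme, host and path leaves the port out entirely.
    return $url->scheme . '://' . $url->host . $path;
}

print normalize_url("http://www.codeme.com:123/../foo//bar.html"), "\n";
```

This keeps the file name, so the example input comes out as http://www.codeme.com/foo/bar.html. Trimming the last segment would be an extra step if that is what is wanted.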