Perl - WWW::Mechanize ignores base href on gzipped content
As the title says, WWW::Mechanize does not recognize
<base href="" />
if the page content is gzipped. Here is an example:
use strict;
use warnings;
use WWW::Mechanize;

my $url  = 'http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html';
my $mech = WWW::Mechanize->new;

# Default request: the server answers with gzipped content
$mech->get($url);
print $mech->base() . "\n";

# Force plain text instead of gzipped content
$mech->get($url, 'Accept-Encoding' => 'identity');
print $mech->base() . "\n";
Output:
http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html
http://objectmix.com/ <--- correct!
Am I missing something here? Thanks.
Edit: I tested directly with LWP::UserAgent, and it works without problems:
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new();
my $res = $ua->get('http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html');
print $res->base() . "\n";
Output:
http://objectmix.com/
So this looks like a WWW::Mechanize bug?
Edit 2: It is an LWP or HTTP::Response bug, not a WWW::Mechanize one. LWP does not request gzip by default. If I set
$ua->default_header('Accept-Encoding' => 'gzip');
in the above example, it returns the wrong base.
Edit 3: The bug is in LWP/UserAgent.pm, in parse_head().
It calls HTML::HeadParser on the still-gzipped HTML, and HeadParser has no idea what to do with it. LWP should gunzip the content before calling the parsing subroutine.
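The effect can be reproduced offline with a minimal sketch. It assumes a made-up page and a hypothetical helper named base_from_head(); it only mimics what happens when the head parser is handed compressed bytes, and is not the actual code path inside LWP:

```perl
use strict;
use warnings;
use HTML::HeadParser;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical page standing in for the real response body.
my $html = '<html><head><base href="http://example.com/" />'
         . '<title>demo</title></head><body></body></html>';

# Compress it the way a server would for "Content-Encoding: gzip".
gzip(\$html => \my $gzipped) or die "gzip failed: $GzipError";

# Hypothetical helper: return the <base href> that HTML::HeadParser
# finds in $content, or undef if it finds none.
sub base_from_head {
    my ($content) = @_;
    my $parser = HTML::HeadParser->new;
    $parser->parse($content);
    return $parser->header('Content-Base');
}

# Parsing the compressed bytes: no tags are recognizable.
my $from_gzipped = base_from_head($gzipped);

# Gunzip first, then parse: the base href becomes visible.
gunzip(\$gzipped => \my $plain) or die "gunzip failed: $GunzipError";
my $from_plain = base_from_head($plain);

print 'base from gzipped bytes: ', (defined $from_gzipped ? $from_gzipped : '(none)'), "\n";
print 'base after gunzip:       ', (defined $from_plain   ? $from_plain   : '(none)'), "\n";
```

When no base is found in the head, HTTP::Response falls back to the request URL, which would explain the first line of output in the question.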
There is a bug report about this: https://rt.cpan.org/public/bug/display.html?id=54361
Conclusion: LWP is missing a "feature".
For WWW::Mechanize:
This can be solved by overloading _make_request() in WWW::Mechanize in your own package and re-setting the HTTP::Response with its decoded_content, or (dirtier) by simply overwriting $mech->{base} with the base parsed from the content.
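The core of such a workaround might look like the sketch below. The helper fix_base() is a hypothetical name, and the response is built by hand so the example runs without network access. It takes a slightly different route from the one described above: instead of replacing the response content, it re-parses the gunzipped body and stores the result in the Content-Base header, which HTTP::Response's base() consults:

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use HTML::HeadParser;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: re-derive <base href> from the gunzipped body
# and record it in the Content-Base header of the response.
sub fix_base {
    my ($res) = @_;

    # charset => 'none' undoes Content-Encoding only and returns bytes.
    my $html = $res->decoded_content(charset => 'none');
    return $res unless defined $html;

    my $parser = HTML::HeadParser->new;
    $parser->parse($html);

    my $base = $parser->header('Content-Base');
    $res->header('Content-Base' => $base) if defined $base;
    return $res;
}

# Hand-built gzipped response standing in for what the server sent.
my $page = '<html><head><base href="http://example.com/" />'
         . '</head><body></body></html>';
gzip(\$page => \my $gzipped) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $gzipped,
);
$res->request(HTTP::Request->new(GET => 'http://example.com/some/page.html'));

my $before = $res->base->as_string;   # no Content-Base header yet
fix_base($res);
my $after  = $res->base->as_string;   # taken from the repaired header

print "before: $before\n";
print "after:  $after\n";
```

In a WWW::Mechanize subclass, a helper like this could be applied to the response returned by the overridden _make_request() before Mechanize reads the base from it. That wiring is left out here and is untested.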