perl - WWW::Mechanize ignores base href on gzipped content -


As the title says, WWW::Mechanize does not recognize

<base href="" />

if the page content is gzipped. Here is an example:

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $url = 'http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html';

    my $mech = WWW::Mechanize->new;
    $mech->get($url);
    print $mech->base() . "\n";

    # force plain text instead of gzipped content
    $mech->get($url, 'Accept-Encoding' => 'identity');
    print $mech->base() . "\n";

Output:

    http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html
    http://objectmix.com/    <--- correct!

Am I missing something here? Thanks.

Edit: I tested this directly with LWP::UserAgent, and it works without problems:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new();
    my $res = $ua->get('http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html');
    print $res->base() . "\n";

Output:

    http://objectmix.com/

So this looks like a WWW::Mechanize bug?

Edit 2: It is an LWP or HTTP::Response bug, not a WWW::Mechanize one. LWP does not request gzip by default. If I set

    $ua->default_header('Accept-Encoding' => 'gzip');

in the LWP::UserAgent example above, it returns the wrong base as well.

Edit 3: The bug is in LWP/UserAgent.pm, in parse_head().

It calls HTML::HeadParser on the gzipped HTML, and HeadParser has no idea what to do with it. LWP should gunzip the content before calling the parsing subroutine.
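
The effect can be reproduced offline. The following is a minimal sketch, not the code LWP actually runs: the HTML page is made up for illustration, and base_from_head() is a hypothetical helper. It feeds the same page to HTML::HeadParser twice, once as gzip bytes and once after gunzipping, and prints what the parser reports as Content-Base in each case.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use HTML::HeadParser;

# Made-up page for illustration; only the <base> tag matters here.
my $html = '<html><head><base href="http://objectmix.com/" />'
         . '<title>demo</title></head><body>hello</body></html>';

# Compress the page the way a server answering "Accept-Encoding: gzip" would.
gzip(\$html => \my $gzipped) or die "gzip failed: $GzipError";

# Hypothetical helper: return the <base href> that HTML::HeadParser
# finds in $content, or undef if it finds none.
sub base_from_head {
    my ($content) = @_;
    my $parser = HTML::HeadParser->new;
    $parser->parse($content);
    $parser->eof;
    return $parser->header('Content-Base');
}

# First pass: parse the compressed bytes as if they were HTML.
my $from_gzipped = base_from_head($gzipped);

# Second pass: gunzip first, then parse.
gunzip(\$gzipped => \my $plain) or die "gunzip failed: $GunzipError";
my $from_plain = base_from_head($plain);

printf "gzipped body: %s\n",
    defined $from_gzipped ? $from_gzipped : '(no base found)';
printf "decoded body: %s\n",
    defined $from_plain ? $from_plain : '(no base found)';
```

HTTP::Response::base() consults the Content-Base header, so when the parser never sees the tag, the base falls back to the request URL.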

There is a bug report for this: https://rt.cpan.org/public/bug/display.html?id=54361

Conclusion: LWP is missing a "feature".

For WWW::Mechanize:

This can be solved by overriding _make_request() in a WWW::Mechanize subclass in your own package and re-setting the HTTP::Response content from decoded_content, or, dirtier, by overwriting $mech->{base} with a base parsed from the content.
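
A rough sketch of the subclass approach, under some assumptions: My::Mech and fix_base() are hypothetical names, and instead of replacing the response content it re-parses the decoded HTML and copies the result into the Content-Base header, which HTTP::Response::base() reads. It also relies on _make_request(), a private WWW::Mechanize method that could change between releases.

```perl
package My::Mech;    # hypothetical subclass name

use strict;
use warnings;
use parent 'WWW::Mechanize';
use HTML::HeadParser;

# Hypothetical helper, not part of WWW::Mechanize.
# If the body arrived with a Content-Encoding, re-parse the decoded HTML
# head and copy any <base href> into the Content-Base header.
sub fix_base {
    my ($response) = @_;
    return $response unless $response->header('Content-Encoding');
    return $response unless $response->content_is_html;

    # decoded_content undoes the Content-Encoding (for example gzip).
    my $html = $response->decoded_content;
    return $response unless defined $html;

    my $parser = HTML::HeadParser->new;
    $parser->parse($html);
    $parser->eof;

    my $base = $parser->header('Content-Base');
    $response->header('Content-Base' => $base) if defined $base;
    return $response;
}

# Patch each response before WWW::Mechanize records its base.
sub _make_request {
    my $self = shift;
    return fix_base($self->SUPER::_make_request(@_));
}

package main;
```

My::Mech->new can then stand in for WWW::Mechanize->new in the example from the question. Whether $mech->base() then reports the document's base href depends on WWW::Mechanize taking its base from the response returned by _make_request().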

