python - How do I extract images from html files in a directory?
This is a follow-up to this question: How do I parse every HTML file in a directory for images? Essentially, I have a directory of HTML files, each of which contains images that I want to save separately in the same directory.
After making the suggested changes to my program, I am still getting an error:
    image: theme/pfeil_grau.gif
    Traceback (most recent call last):
      File "c:\users\gokalraina\desktop\modfile.py", line 25, in <module>
        im = Image.open(image)
      File "c:\python27\lib\site-packages\pil\image.py", line 1956, in open
        prefix = fp.read(16)
    TypeError: 'NoneType' object is not callable
This is the revised code (thanks nightcracker) that I am using.
    import os, os.path
    import Image
    from BeautifulSoup import BeautifulSoup as bs

    path = 'c:\users\gokalraina\desktop\derm images'

    for root, dirs, files in os.walk(path):
        for f in files:
            soup = bs(open(os.path.join(root, f)).read())
            for image in soup.findAll("img"):
                print "image: %(src)s" % image
                im = Image.open(image)
                im.save(path+image["src"], "JPEG")
The code is passing a BeautifulSoup.Tag object to Image.open, but Image.open expects a path or a file object. You can get the relative path of the image with image["src"], so the code should be:

    im = Image.open(image["src"])
However, that path is the same path that is written in the HTML file, which is probably a relative path starting from the HTML file's directory. If so, joining root and image["src"] gives the path of each image:

    im = Image.open(os.path.join(root, image["src"]))
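The path resolution described above can be sketched on its own, without opening any image. This is only an illustration under stated assumptions: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, it targets Python 3 rather than the Python 2.7 setup in the question, and the names ImgSrcCollector, image_paths and walk_image_paths are hypothetical helpers, not part of any library.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; we want the src string,
        # not the tag itself.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_paths(html_text, html_dir):
    """Join each img src onto the directory that contains the HTML file."""
    collector = ImgSrcCollector()
    collector.feed(html_text)
    return [os.path.join(html_dir, src) for src in collector.sources]


def walk_image_paths(top):
    """Yield the resolved path of every <img> in every HTML file under top."""
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.lower().endswith((".html", ".htm")):
                full = os.path.join(root, name)
                # Assumption: files are readable as UTF-8; bad bytes are replaced.
                with open(full, encoding="utf-8", errors="replace") as fh:
                    for path in image_paths(fh.read(), root):
                        yield path
```

Each path yielded by walk_image_paths could then be handed to Image.open in place of the tag object. The sketch assumes every src is a relative file path; src values that are URLs or absolute paths would need separate handling.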