python - How do I extract images from HTML files in a directory?
This is a follow-up to this question: how do I parse every HTML file in a directory for images? Essentially, I have a directory of HTML files, each of which contains images that I want to save separately in the same directory.
After making the suggested changes to the program, I am still getting an error:
    image: theme/pfeil_grau.gif
    Traceback (most recent call last):
      File "c:\users\gokalraina\desktop\modfile.py", line 25, in <module>
        im = Image.open(image)
      File "c:\python27\lib\site-packages\pil\image.py", line 1956, in open
        prefix = fp.read(16)
    TypeError: 'NoneType' object is not callable

This is the revised code (thanks, nightcracker) that I am using:
    import os, os.path
    import Image
    from BeautifulSoup import BeautifulSoup as bs

    path = 'c:\users\gokalraina\desktop\derm images'

    for root, dirs, files in os.walk(path):
        for f in files:
            soup = bs(open(os.path.join(root, f)).read())
            for image in soup.findAll("img"):
                print "image: %(src)s" % image
                im = Image.open(image)
                im.save(path+image["src"], "JPEG")
The code is passing a BeautifulSoup.Tag object to Image.open, but Image.open expects a path or a file object. You can get the relative path of the image with image["src"], so the code would be:

    im = Image.open(image["src"])

However, this path is the same path that is written in the HTML file, so it may be a relative path starting from the HTML file's directory. If so, joining root with image["src"] should give the absolute path of each image:

    im = Image.open(os.path.join(root, image["src"]))
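The path resolution described above can be sketched roughly as follows. This is only an illustration under several assumptions: it targets Python 3, it swaps in the standard library's html.parser in place of BeautifulSoup so that it runs without third-party packages, and the names ImgSrcCollector and image_paths are hypothetical helpers that do not appear in the original code. It also assumes the HTML files end in .html or .htm, so that the walk does not try to parse the images themselves.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_paths(top):
    """Yield (html_file, image_path) for each <img> found under top."""
    for root, dirs, files in os.walk(top):
        for name in files:
            # Assumption: only files with these extensions are HTML pages.
            if not name.lower().endswith((".html", ".htm")):
                continue
            html_file = os.path.join(root, name)
            collector = ImgSrcCollector()
            with open(html_file, encoding="utf-8", errors="replace") as fh:
                collector.feed(fh.read())
            for src in collector.sources:
                # src is a string relative to the HTML file's directory (root),
                # so join it with root rather than passing the tag itself.
                yield html_file, os.path.normpath(os.path.join(root, src))
```

Each yielded image_path is a plain string, so it could be handed to Image.open in place of the tag object, provided the src values really are relative file paths and not URLs.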