Python - How do I extract images from HTML files in a directory?


This is a follow-up to this question: How do I parse every HTML file in a directory for images? Essentially, I have a directory of HTML files, each of which contains images that I want to save separately in the same directory.

After making the suggested changes to the program, I am still getting an error:

    image: theme/pfeil_grau.gif
    Traceback (most recent call last):
      File "c:\users\gokalraina\desktop\modfile.py", line 25, in <module>
        im = Image.open(image)
      File "c:\python27\lib\site-packages\pil\image.py", line 1956, in open
        prefix = fp.read(16)
    TypeError: 'NoneType' object is not callable

This is the revised code (thanks to nightcracker) that I am using.

    import os, os.path
    import Image
    from BeautifulSoup import BeautifulSoup as bs

    path = 'c:\users\gokalraina\desktop\derm images'

    for root, dirs, files in os.walk(path):
        for f in files:
            soup = bs(open(os.path.join(root, f)).read())
            for image in soup.findAll("img"):
                print "image: %(src)s" % image
                im = Image.open(image)
                im.save(path+image["src"], "JPEG")

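The code is passing a BeautifulSoup.Tag object to Image.open, but Image.open expects a path or a file object. That also explains the TypeError: looking up an attribute such as read on a Tag searches for a child tag with that name and returns None when there is none, so fp.read(16) ends up calling None. You can get the relative path of the image with image["src"], so the code should be: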

    im = Image.open(image["src"])

However, this path is the same path that was written in the HTML file, which is probably a relative path starting from the HTML file's directory. If so, joining root with image["src"] gives the path to each image on disk:

    im = Image.open(os.path.join(root, image["src"]))
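The path-joining step can be tried in isolation with a small sketch. To keep it runnable without extra packages, it uses Python 3 and the standard library's html.parser in place of BeautifulSoup, and it only computes the paths rather than opening them with PIL. The names ImgSrcCollector and image_paths, and the directory name "pages", are made up for illustration and do not come from the original code.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; skip <img> tags without src
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_paths(html_dir, html_text):
    """Join each <img> src onto the directory that holds the HTML file."""
    collector = ImgSrcCollector()
    collector.feed(html_text)
    return [
        os.path.normpath(os.path.join(html_dir, src))
        for src in collector.sources
    ]


# "pages" stands in for the root value that os.walk yields (an assumption)
html = '<p><img src="theme/pfeil_grau.gif"><img alt="no source"></p>'
paths = image_paths("pages", html)
print(paths)
```

Each resulting path is a plain string, which is what Image.open expects. The sketch assumes every src is a relative file path; a src that is an absolute path or a URL would need separate handling.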
