python - How do I extract images from HTML files in a directory?

- June 15, 2011

This is a follow-up to the question: How do I parse every HTML file in a directory for images? Essentially, I have a directory of HTML files, each of which contains images that I want to save separately in the same directory.

After making the suggested changes to the program, I am still getting an error:

    Image: theme/pfeil_grau.gif
    Traceback (most recent call last):
      File "C:\Users\gokalraina\Desktop\modfile.py", line 25, in <module>
        im = Image.open(image)
      File "C:\Python27\lib\site-packages\PIL\Image.py", line 1956, in open
        prefix = fp.read(16)
    TypeError: 'NoneType' object is not callable

This is the revised code (thanks to nightcracker) that I am using:

    import os, os.path
    import Image
    from BeautifulSoup import BeautifulSoup as bs

    path = 'C:\Users\gokalraina\Desktop\derm images'

    for root, dirs, files in os.walk(path):
        for f in files:
            soup = bs(open(os.path.join(root, f)).read())
            for image in soup.findAll("img"):
                print "Image: %(src)s" % image
                im = Image.open(image)
                im.save(path+image["src"], "JPEG")

The code is passing a BeautifulSoup.Tag object to Image.open, but Image.open expects a path or a file object. You can get the relative path of the image with image["src"], so the code could be:

    im = Image.open(image["src"])

However, that path is the same path written in the HTML file, which is probably a relative path starting from the HTML file's directory. If so, joining root and image["src"] gives the absolute path of each image:

    im = Image.open(os.path.join(root, image["src"]))
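
To make the path handling concrete, here is a minimal sketch of the same walk that only resolves each image path and does not open anything. It is not the code from the question. It assumes Python 3 and swaps in the standard library's html.parser for BeautifulSoup so that it runs without extra packages. The names ImgSrcCollector and find_image_paths are hypothetical, and the .html/.htm filter is an addition.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_paths(path):
    """Yield (html_file, image_path) for each <img> found under path."""
    for root, dirs, files in os.walk(path):
        for f in files:
            # Assumption: only .html/.htm files should be parsed.
            if not f.lower().endswith((".html", ".htm")):
                continue
            html_file = os.path.join(root, f)
            collector = ImgSrcCollector()
            with open(html_file, encoding="utf-8", errors="replace") as fh:
                collector.feed(fh.read())
            for src in collector.sources:
                # src is relative to the directory holding the HTML file,
                # which is root, not the top-level path.
                yield html_file, os.path.normpath(os.path.join(root, src))
```

Each yielded image path is a string, which is what Image.open expects in place of the Tag object. Note that os.path.join discards root when src is itself an absolute path, and it does not handle src values that are URLs, so those cases would need separate treatment.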
