Parsing a Gigantic Log File in Python


I'm trying to parse a gigantic log file (around 5 GB).

I only want to parse the first 500,000 lines, and I don't want to read the whole file into memory.

Basically, I want the code below to do what it does, but with a while loop instead of the for loop and if conditional. I want to be sure the entire file is not read into memory.

import re
import pickle  # imported but unused in this snippet
from collections import defaultdict

file = open('logs.txt', 'r')
count_words = defaultdict(int)

i = 0
for line in file.readlines():
    if i < 500000:
        m = re.search('key=([^&]*)', line)
        count_words[m.group(1)] += 1
    i += 1

csv = []
for k, v in count_words.iteritems():
    csv.append(k + "," + str(v))
print "\n".join(csv)

Calling readlines() reads the entire file into memory. Instead, you'll want to read line by line until you reach line 500,000 or hit EOF, whichever comes first. Here's what you should do instead:

i = 0
while i < 500000:
    line = file.readline()
    if line == "":  # readline() returns "" only at end of file
        break
    m = re.search('key=([^&]*)', line)
    if m:  # skip lines without a key= parameter instead of raising AttributeError
        count_words[m.group(1)] += 1
    i += 1
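For what it's worth, iterating over the file object directly is also lazy, so the manual counter can be replaced with itertools.islice from the standard library. Here's a minimal sketch in Python 3 syntax (items() and print() instead of iteritems() and the print statement), reusing the 'logs.txt' filename and key= regex from the question:

import re
from collections import defaultdict
from itertools import islice

count_words = defaultdict(int)

with open('logs.txt', 'r') as f:
    # islice stops after 500,000 lines; the file is never fully loaded
    for line in islice(f, 500000):
        m = re.search('key=([^&]*)', line)
        if m:  # skip lines without a key= parameter
            count_words[m.group(1)] += 1

for k, v in count_words.items():
    print("%s,%s" % (k, v))

The with statement also closes the file automatically once the loop finishes, which the original snippet never does explicitly.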
