Parsing a Gigantic Log File in Python
I'm trying to parse a gigantic log file (around 5 GB). I only want to parse the first 500,000 lines, and I don't want to read the whole file into memory.

Basically, I want to do what the code below is doing, but with a while loop instead of a for loop and an if conditional. I also want to be sure I'm not reading the entire file into memory.
import re
from collections import defaultdict

file = open('logs.txt', 'r')
count_words = defaultdict(int)
import pickle
i = 0
for line in file.readlines():
    if i < 500000:
        m = re.search('key=([^&]*)', line)
        count_words[m.group(1)] += 1
        i += 1

csv = []
for k, v in count_words.iteritems():
    csv.append(k + "," + str(v))
print "\n".join(csv)
Calling readlines() will pull the entire file into memory. You'll have to read the file line by line until you reach line 500,000 or hit EOF, whichever comes first. Here's what you should do instead:
i = 0
while i < 500000:
    line = file.readline()
    if line == "":  # cuts off if end of file reached
        break
    m = re.search('key=([^&]*)', line)
    count_words[m.group(1)] += 1
    i += 1
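One caveat with both snippets: re.search returns None when a line has no key= parameter, so m.group(1) would raise an AttributeError on such a line. As a minimal sketch of an alternative, itertools.islice caps the number of lines read without an explicit counter, still reading lazily rather than loading the file into memory. This assumes the same hypothetical logs.txt filename and the Python 2 syntax used in the question:

import re
from collections import defaultdict
from itertools import islice

count_words = defaultdict(int)
# 'logs.txt' is the hypothetical filename from the question
with open('logs.txt', 'r') as f:
    # islice yields at most 500,000 lines, one at a time;
    # the rest of the file is never read into memory
    for line in islice(f, 500000):
        m = re.search('key=([^&]*)', line)
        if m:  # skip lines that don't contain a key= parameter
            count_words[m.group(1)] += 1

print "\n".join(k + "," + str(v) for k, v in count_words.iteritems())

The with block also guarantees the file is closed even if an exception is raised partway through.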