Re: Load large amount of data fast
- Subject: Re: Load large amount of data fast
- From: Jerome Vuarand <jerome.vuarand@...>
- Date: 2010-10-17 03:41:37 +0200
2010/10/17 Alexander Gladysh <agladysh@gmail.com>:
> On Sun, Oct 17, 2010 at 05:06, Petite Abeille <petite.abeille@gmail.com> wrote:
>> On Oct 17, 2010, at 2:57 AM, Alexander Gladysh wrote:
>
>>> I take it that you suggest I write my own Lua parser (or use a
>>> custom one)?
>
>> Hmmm... I doubt that your problem is the parsing itself... more of an overall design issue perhaps... but then again, not enough information about what you are trying to do with that data to offer any concrete help :)
>
> I'm trying to load those 3M entries into a Lua table in memory faster
> than I do now.
>
> Other ways of solving my original task are out of the scope of this
> question. :-)
>
> Thanks,
> Alexander.
>
> P.S. I've accidentally killed my data crunching process and had to
> start over. :-(
>
> I've added some timing, here it is for the first 900K entries.
>
> at line 100000 : Sun Oct 17 04:47:10 2010
> at line 200000 : Sun Oct 17 04:48:29 2010
> at line 300000 : Sun Oct 17 04:50:18 2010
> at line 400000 : Sun Oct 17 04:53:02 2010
> at line 500000 : Sun Oct 17 04:55:52 2010
> at line 600000 : Sun Oct 17 04:58:55 2010
> at line 700000 : Sun Oct 17 05:01:26 2010
> at line 800000 : Sun Oct 17 05:07:00 2010
> at line 900000 : Sun Oct 17 05:10:18 2010
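Those timestamps are easier to read as intervals per 100K lines. A small sketch of the arithmetic (the stamps table is just the times quoted above, typed in by hand; to_seconds is a name made up for this example):

```lua
-- Times of day copied from the quoted log, one per 100K lines.
local stamps = {
	"04:47:10", "04:48:29", "04:50:18", "04:53:02", "04:55:52",
	"04:58:55", "05:01:26", "05:07:00", "05:10:18",
}

-- Convert "HH:MM:SS" to seconds since midnight.
local function to_seconds(s)
	local h, m, sec = s:match("^(%d+):(%d+):(%d+)$")
	return tonumber(h) * 3600 + tonumber(m) * 60 + tonumber(sec)
end

-- Seconds spent on each successive block of 100K lines.
local deltas, total = {}, 0
for i = 2, #stamps do
	local d = to_seconds(stamps[i]) - to_seconds(stamps[i - 1])
	deltas[#deltas + 1] = d
	total = total + d
end

print(table.concat(deltas, " "))
print("total seconds: " .. total)
```

The eight intervals come out between 79 and 334 seconds, 1388 seconds in total from line 100000 to line 900000.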
I generated a test file with the following code:
local file = assert(io.open("data", "wb"))
for i=1,3*1000*1000 do
	-- one table constructor per line, with keys and text unique to that line
	file:write("{ foo"..i.." = 1; bar"..i.." = 2; baz"..i.." = 'text"..i.."' };\n")
end
file:close()
I get 3 million entries, around 180MB.
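As a sanity check on that size: each line has 36 fixed bytes plus four copies of the decimal index, so the total can be computed without writing the file. A sketch (expected_size is a name made up for this example):

```lua
-- Expected size in bytes of a file with n generated lines.
-- Each line is 36 fixed bytes (including the newline) plus 4 copies of the
-- index, so all indices with the same number of digits contribute equally.
local function expected_size(n)
	local total, lo, digits = 0, 1, 1
	while lo <= n do
		local hi = math.min(n, lo * 10 - 1)	-- last index with this many digits
		total = total + (hi - lo + 1) * (36 + 4 * digits)
		lo, digits = lo * 10, digits + 1
	end
	return total
end

print(expected_size(3 * 1000 * 1000))
```

For 3 million entries this gives 187555584 bytes, which is consistent with the figure above.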
Then I parsed it with:
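local file = assert(io.open("data", "rb"))
local content = file:read("*a")	-- read the whole file into memory at once
file:close()
local t = {}
local i,j = 0,0	-- i: entries since the last progress update, j: total reported
for line in content:gmatch("[^\n]+") do
	-- each line is a table constructor; compile it as a chunk and run it
	t[#t+1] = assert(loadstring('return '..line))()
	i = i + 1
	if i >= 1000 then
		j = j + i
		i = 0
		io.write('\r'..j); io.flush()
	end
end
io.write('\n')
print("finished")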
Loading the raw data into memory (the period before the first progress
output) takes 10-15 seconds. Then the parsing goes quite fast, slowing
down at around 2.9M entries (probably due to some swap thrashing). It
prints "finished" within a couple of minutes, at which point the
program does not exit immediately; its memory usage continues to grow
for some time. Then memory usage decreases, and after a total of 282
seconds the program exits. In the end that's quite a reasonable time
compared to what you mentioned in your first email. Did you try such a
simple loading scheme in your application?
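
If you want to compare numbers on your side, here is a rough sketch of how the same scheme could be timed and its memory use estimated on a small sample. It is not what I ran above: load_entries is a name made up for this example, it reads line by line with io.lines instead of reading the whole file first, and the 10000-entry sample is arbitrary.

```lua
-- Sketch only: load_entries and the sample file are illustrative.
local loadstring = loadstring or load	-- loadstring in Lua 5.1, load in 5.2+

-- Load one table constructor per line from the file at `path`.
local function load_entries(path)
	local t, n = {}, 0
	for line in io.lines(path) do
		if line ~= "" then
			n = n + 1
			t[n] = assert(loadstring("return " .. line))()
		end
	end
	return t
end

-- Write a small sample in the same format as the generated test file.
local path = os.tmpname()
local out = assert(io.open(path, "wb"))
for i = 1, 10000 do
	out:write("{ foo"..i.." = 1; bar"..i.." = 2; baz"..i.." = 'text"..i.."' };\n")
end
out:close()

-- Measure CPU time and Lua heap growth around the load.
collectgarbage("collect")
local kb_before = collectgarbage("count")	-- Lua heap size in KB
local started = os.clock()
local entries = load_entries(path)
local elapsed = os.clock() - started
collectgarbage("collect")
local kb_after = collectgarbage("count")
os.remove(path)

print(#entries .. " entries in " .. elapsed .. " s (CPU time)")
print(string.format("about %.0f bytes per entry",
	(kb_after - kb_before) * 1024 / #entries))
```

Multiplying the per-entry figure by 3 million gives only a rough idea of the final footprint, and it says nothing about swap, but it should show whether the time per entry stays flat in your environment.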