I have a large object graph in .NET (F#, as it happens) that I need to persist to disk and then load again periodically for use in a calculation.
Deserialization performance matters more than serialization performance, should that have a bearing on the answer: deserializing will be performed many times, serializing only once.
Currently, I am using FsPickler with its binary format. This is very convenient and easy to use, but I am trying to get a handle on how much more performance I would get by customizing a serializer/deserializer...
One avenue I am considering is to persist and load from a small relational database (I have sqlite in mind). Should I expect this to be much faster?
As requested in the comments, here is a slightly simplified version of the object graph I am working with:
```fsharp
open System.Collections.Generic

type Value =
    | Float of float
    | String of string
    | Bool of bool

[<Struct>]
type Address (i:int, j:int, k:int) =
    member this.I = i
    member this.J = j
    member this.K = k

type Data =
    { Target : Address
      mutable SpecialIndex : int
      mutable Parameters1 : Value []
      mutable Parameters2 : Address []
      Check1 : bool
      Check2 : bool
      Parent : Address option }

type Persisted =
    { Inputs : Address []
      Outputs : Address []
      Aliases : Dictionary<string, Address>
      Mapping : Dictionary<string, int>
      Masters : Dictionary<Address, Value []>
      BigCollection : Data [] }
```
The object that is persisted is an instance of `Persisted`. The large size most likely comes from `Persisted.BigCollection`, which holds on the order of 10 million or more items.
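To give a sense of what "customizing a serializer" might involve, here is a minimal sketch covering only the `Address` arrays in this graph. It assumes a hypothetical fixed layout (an `Int32` element count followed by three `Int32` values per address, written with `BinaryWriter`, which always uses little-endian order); `writeAddresses` and `readAddresses` are illustrative names, not part of FsPickler or any other library.

```fsharp
open System.IO

// Same shape as the question's Address struct, redeclared so this sketch stands alone.
[<Struct>]
type Address(i: int, j: int, k: int) =
    member this.I = i
    member this.J = j
    member this.K = k

// Hypothetical layout: Int32 count, then I, J, K as Int32 for each address.
let writeAddresses (w: BinaryWriter) (xs: Address[]) =
    w.Write(xs.Length)
    for a in xs do
        w.Write(a.I)
        w.Write(a.J)
        w.Write(a.K)

// Read the same layout back, allocating the result array once up front.
let readAddresses (r: BinaryReader) : Address[] =
    let n = r.ReadInt32()
    let result = Array.zeroCreate<Address> n
    for idx in 0 .. n - 1 do
        let i = r.ReadInt32()
        let j = r.ReadInt32()
        let k = r.ReadInt32()
        result.[idx] <- Address(i, j, k)
    result

// Example: round-trip two addresses through an in-memory stream.
let sample = [| Address(1, 2, 3); Address(4, 5, 6) |]
let ms = new MemoryStream()
let writer = new BinaryWriter(ms)
writeAddresses writer sample
writer.Flush()
ms.Position <- 0L
let restored = readAddresses (new BinaryReader(ms))
printfn "%d addresses, last K = %d" restored.Length restored.[1].K  // prints "2 addresses, last K = 6"
```

A hand-written reader like this knows the exact layout in advance, so it can skip the general type handling a serializer such as FsPickler has to do; whether that translates into a material speed-up for this particular graph would still have to be measured.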
- Have you measured the time it takes and determined that there is a material performance problem? – Robert Harvey, Nov 4, 2015 at 15:07
- @RobertHarvey Yes, I have profiled, and the deserialization time dominates the time taken to perform a calculation. My question above is an attempt to get guidance on where I should look should I choose to optimize, and whether further material performance improvements should be expected. – Sam, Nov 4, 2015 at 15:11
- Can you trade off space for speed? Show us what some of the object graph code looks like. – Robert Harvey, Nov 4, 2015 at 15:14
- @RobertHarvey Yes, I would be prepared to. – Sam, Nov 4, 2015 at 15:16
- Depending on how much performance improvement you need, you might try a faster serializer like Protocol Buffers. – Robert Harvey, Nov 4, 2015 at 15:32
Answer
One avenue I am considering is to persist and load from a small relational database (I have sqlite in mind). Should I expect this to be much faster?
No, you should not expect this. Though it is not completely impossible, in my experience using a relational database to deserialize an object graph is seldom quicker than deserializing from a file. Relational databases tend to improve performance only when you can play to their strengths, such as indexing, or managing external data that is too big to be loaded into memory at once.
I am trying to get a handle on how much more performance I would get by customizing a serializer/deserializer.
Whatever serializer/deserializer you use, the upper limit (and often the bottleneck) for performance is the I/O speed of your disk in bytes per second. So take the expected size in bytes of your serialized graph and divide it by that speed: the result is a lower limit for the deserialization time. When your deserializer's time is close to that limit, the only reasonable way to increase performance is to use a faster disk (such as a modern SSD).
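One way to see how close a given deserializer is to that limit is to time the raw file read separately from decoding the bytes already in memory. This is a minimal sketch; `deserialize` is a placeholder for whatever decoder is being tested (for example, a wrapper around the FsPickler call currently in use), and `timeIt` and `profileLoad` are hypothetical helper names.

```fsharp
open System.Diagnostics
open System.IO

// Run f once and return its result together with the elapsed wall-clock seconds.
let timeIt (f: unit -> 'a) : 'a * float =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    result, sw.Elapsed.TotalSeconds

// Split load time into "read bytes from disk" and "decode bytes in memory".
// `deserialize` is a placeholder for the decoder under test (an assumption).
let profileLoad (path: string) (deserialize: byte[] -> 'T) : 'T * float * float =
    let bytes, readSecs = timeIt (fun () -> File.ReadAllBytes path)
    let value, decodeSecs = timeIt (fun () -> deserialize bytes)
    // MB here means 10^6 bytes; for a tiny file readSecs may be ~0, printing "Infinity".
    let mbPerSec = float bytes.Length / 1e6 / readSecs
    printfn "read: %.3f s (%.0f MB/s), decode: %.3f s" readSecs mbPerSec decodeSecs
    value, readSecs, decodeSecs
```

A call would look like `let graph, readSecs, decodeSecs = profileLoad "data.bin" decodeGraph`, where both the file name and `decodeGraph` are hypothetical. If the read time is close to the size/speed bound and the decode time dominates, the deserializer rather than the disk is the place to optimize. Repeated runs may be served from the operating system's file cache, which would overstate the disk speed.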
- Thanks. Your second point makes a lot of sense! I have an SSD with an "alleged" read speed of 728 MB/s. The example file I am testing on is 175 MB in size and takes 6.5 s to deserialize... Sounds like there is room for improvement. – Sam, Nov 4, 2015 at 21:52
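Plugging the figures from this comment into the answer's estimate shows the size of the gap. This treats the quoted 728 MB/s as the sustained sequential read speed, which is an assumption; real throughput is often lower than the advertised figure.

```fsharp
// Lower bound on load time if the file must be read once from disk
// at the given sequential throughput (nothing already cached in memory).
let minReadSeconds (fileSizeMB: float) (diskMBPerSecond: float) =
    fileSizeMB / diskMBPerSecond

let bound = minReadSeconds 175.0 728.0   // 175 MB file, 728 MB/s claimed read speed
let observed = 6.5                       // seconds reported for deserialization
printfn "Lower bound: %.2f s, observed is about %.0fx that" bound (observed / bound)
// prints "Lower bound: 0.24 s, observed is about 27x that"
```

Under that assumption, disk throughput would account for only a small fraction of the 6.5 s, which suggests most of the time is spent decoding rather than reading.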
- +1 for convincing me this question is answerable if interpreted the right way. I'd suggest editing the off-topic-looking title to match this answer, since this clearly helped the OP, it's about to be closed (as the title, by itself, is obviously not a good question), and I don't feel I understand the question well enough to edit the title myself. – Ixrec, Nov 7, 2015 at 13:07