I am using MongoDB, and the names of my fields are strings of 10-20 characters. A typical document consists of 30,000 fields, filled mostly with floats like 1.2, 10.5, 2.55. Its size is 1MB.
Do the long string field names affect the size of the MongoDB database?
2 Answers
This is covered in the developer FAQ, some relevant excerpts:
MongoDB stores all field names in every document. For most documents, this represents a small fraction of the space used by a document; however, for small documents the field names may represent a proportionally large amount of space
And, just a note on indexes:
Shorter field names do not reduce the size of indexes, because indexes have a predefined structure
So, yes, reducing your field name size will make storage more efficient, though it will have no impact on index sizes. Whether the saving is worth it (versus the loss of descriptiveness) is up to you. As an approximation, dropping from 20-character to 2-character field names saves 18 bytes per field, because the field name is stored as a null-terminated string in every document. With 30,000 fields that is roughly 540KB, so the document size should be reduced by more than half.
Here's an easy way to estimate this using the MongoDB shell:
$ ./mongo --nodb
MongoDB shell version: 3.0.2
> testObject = {"12345678901234567890" : 1.23324}
{ "12345678901234567890" : 1.23324 }
> Object.bsonsize(testObject)
35
> testObject = {"1" : 1.23324}
{ "1" : 1.23324 }
> Object.bsonsize(testObject)
16
> testObject = {"12" : 1.23324}
{ "12" : 1.23324 }
> Object.bsonsize(testObject)
17
The Object.bsonsize method will give you the approximate size of any document in bytes, but it does not include padding, indexes, etc. that would actually be used when storing a document in the database. Hence these are all very approximate numbers - I would recommend testing with actual data to get a more definitive answer.
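For readers without a mongo shell handy, the same numbers can be reproduced with a small pure-Python estimator based on the BSON document layout. This is a simplified sketch that only handles flat documents whose values are all doubles; real BSON has many more element types:

```python
def bson_size(doc):
    """Estimate the BSON size of a flat document of float fields.

    BSON layout: 4-byte length prefix + elements + 1 trailing null byte.
    Each double element: 1 type byte + field name (null-terminated) + 8 value bytes.
    """
    size = 4 + 1  # document length prefix + trailing null
    for name, value in doc.items():
        assert isinstance(value, float)  # sketch handles doubles only
        size += 1 + len(name.encode()) + 1 + 8
    return size

print(bson_size({"12345678901234567890": 1.23324}))  # 35, matching Object.bsonsize
print(bson_size({"1": 1.23324}))                     # 16
print(bson_size({"12": 1.23324}))                    # 17
```

The three results match the shell session above byte for byte, which confirms that the only thing varying between them is the field-name length.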
- Comment (hiroshi, Mar 22, 2019): I found the topic has been removed since docs.mongodb.com/v3.2/faq, but it is covered in docs.mongodb.com/manual/core/data-model-operations/….
- Comment (NealWalters, Jan 19, 2021): It seems the internals of MongoDB are not published. So MongoDB doesn't abstract the field names out into some structure and avoid re-storing the same field names over and over in each document (imagine if you had 100 million documents)? Or does it literally store the JSON exactly as you pass it?
- Comment (NealWalters, Jan 19, 2021): Is there a command or Compass/Atlas screen that shows the size of an entire collection? I would like to load up two collections, one with short and one with long names, to see the difference. @AdamC Just found it: docs.mongodb.com/manual/reference/method/…
I performed a little benchmark: I uploaded 252 rows of data from an Excel file into two collections, testShortNames and testLongNames, as follows:
Long Names:
{
  "_id": ObjectId("6007a81ea42c4818e5408e9c"),
  "countryNameMaster": "Andorra",
  "countryCapitalNameMaster": "Andorra la Vella",
  "areaInSquareKilometers": 468,
  "countryPopulationNumber": NumberInt("77006"),
  "continentAbbreviationCode": "EU",
  "currencyNameMaster": "Euro"
}
Short Names:
{
  "_id": ObjectId("6007a81fa42c4818e5408e9d"),
  "name": "Andorra",
  "capital": "Andorra la Vella",
  "area": 468,
  "pop": NumberInt("77006"),
  "continent": "EU",
  "currency": "Euro"
}
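Since the values in the two sample documents are identical, the uncompressed BSON size difference per document is simply the total difference in field-name lengths (each field name is stored as a null-terminated string, one byte per ASCII character). A quick back-of-the-envelope check:

```python
# Field names from the two sample documents above (_id is the same in both).
long_names = ["countryNameMaster", "countryCapitalNameMaster",
              "areaInSquareKilometers", "countryPopulationNumber",
              "continentAbbreviationCode", "currencyNameMaster"]
short_names = ["name", "capital", "area", "pop", "continent", "currency"]

per_doc = sum(len(l) - len(s) for l, s in zip(long_names, short_names))
total = per_doc * 252  # 252 documents in the benchmark

print(per_doc)  # 94 bytes saved per document
print(total)    # 23688 bytes (~23 KB) across the whole collection
```

So for this small collection the uncompressed saving is modest, which is consistent with the collstats comparison below.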
I then got the stats for each, saved them to disk files, and did a "diff" on the two files:
pprint.pprint(db.command("collstats", dbCollectionNameLongNames))
The image below shows the two variables of interest: size and storageSize. My reading showed that storageSize is the amount of disk space used after compression, while size is essentially the uncompressed size. So we see the storageSize is identical - apparently the WiredTiger engine compresses field names quite well. [screenshot of the collstats diff]
The question only asked about disk space. However, I then ran a program to retrieve all data from each collection and checked the response time.
Even though it was a sub-second query, the long names consistently took about 7 times longer. It will, of course, take longer to send the longer names from the database server to the client program.
-------LongNames-------
Server Start DateTime=2021年01月20日 08:44:38
Server End DateTime=2021年01月20日 08:44:39
StartTimeMs= 606964546 EndTimeM= 606965328
ElapsedTime MilliSeconds= 782
-------ShortNames-------
Server Start DateTime=2021年01月20日 08:44:39
Server End DateTime=2021年01月20日 08:44:39
StartTimeMs= 606965328 EndTimeM= 606965421
ElapsedTime MilliSeconds= 93
In Python, I just did the following (I had to actually loop through the items to force the reads; otherwise the query returns only a cursor):
results = dbCollectionLongNames.find(query)
for result in results:
    pass
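A reusable version of that read loop might look like the sketch below; the collection and query names in the commented line are the ones from the example above, and the final line uses a plain iterable as a stand-in for a cursor so the function can be demonstrated without a database:

```python
import time

def time_full_read(cursor):
    """Exhaust the cursor (forcing all documents to be fetched)
    and return (document_count, elapsed_milliseconds)."""
    start = time.monotonic()
    count = sum(1 for _ in cursor)
    elapsed_ms = (time.monotonic() - start) * 1000
    return count, elapsed_ms

# With a real pymongo collection:
#   count, ms = time_full_read(dbCollectionLongNames.find(query))

# Demonstration with a plain iterable standing in for a cursor:
count, ms = time_full_read(iter(range(252)))
print(count)  # 252
```

Exhausting the cursor inside the timed region is what makes the measurement meaningful: find() itself returns almost instantly, and the actual network transfer happens during iteration.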
- Comment: If a field is named FloatOfQuiteALargeListOfStuff but the datatype is float, will it use the space of the column name or the datatype for storing the data? Answer: The datatype defines the storage space for the value, but the field name is stored alongside it in every document.
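To make that concrete: in BSON, a double value always occupies 8 bytes regardless of the field name, while the element's total footprint also includes a type byte and the null-terminated field name. Using the hypothetical name from the comment:

```python
name = "FloatOfQuiteALargeListOfStuff"

value_bytes = 8                      # a BSON double is always 8 bytes
name_bytes = len(name.encode()) + 1  # field name is stored null-terminated
element_bytes = 1 + name_bytes + value_bytes  # plus 1 type byte

print(name_bytes, element_bytes)  # 30 bytes for the name vs 8 for the value
```

In this case the field name costs almost four times as much space as the value it labels, which is exactly the effect the accepted answer describes.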