9

I am using MongoDB, and the names of my fields are strings of 10-20 characters. A typical document consists of 30,000 fields filled mostly with floats, like 1.2, 10.5, 2.55. Its size is 1 MB.

Do the long string field names affect the size of the MongoDB database?

asked Oct 11, 2015 at 18:24
6
  • I am afraid that your question is unclear to me, but I think you are asking (as an example) for something like: If my column name is FloatOfQuiteALargeListOfStuff but the datatype is float will it use the space of the column name or the datatype for storing data. Answer: The datatype defines the storage space for the data. Commented Oct 11, 2015 at 21:11
  • I have read in an answer on Stack Overflow that each document stores the field names too. So I thought that using shorter strings as field names would decrease the size of the document. Is that correct? Commented Oct 11, 2015 at 21:20
  • No. The column name is metadata. It appears once in a table definition and not at all in the actual data. So one column name of 30 characters would be almost invisible when you have 10,000 rows for that value that are 8 bytes long. Commented Oct 11, 2015 at 22:02
  • OK, I understand. Is 1 MB for 30,000 fields with floats a reasonable document size? Commented Oct 11, 2015 at 23:39
  • 4
    @RLF - that may be true for relational databases, but there is no table definition in MongoDB; the field names are stored in every document. Commented Oct 13, 2015 at 14:13

2 Answers

9

This is covered in the developer FAQ. Some relevant excerpts:

MongoDB stores all field names in every document. For most documents, this represents a small fraction of the space used by a document; however, for small documents the field names may represent a proportionally large amount of space

And, just a note on indexes:

Shorter field names do not reduce the size of indexes, because indexes have a predefined structure

So, yes, reducing your field name size will make storage more efficient, though it will have no impact on index sizes. Whether the saving is worth the loss of descriptiveness is up to you. As an approximation, dropping from 20-character to 2-character field names (for example) saves 18 bytes per field, one byte per character removed. Your names are 10-20 characters long, so across 30,000 fields the saving would be somewhere between roughly 240 KB and 540 KB, which should reduce the size of a 1 MB document by something like 40% (~400k).

Here's an easy way to estimate this using the MongoDB shell:

$ ./mongo --nodb
MongoDB shell version: 3.0.2
> testObject = {"12345678901234567890" : 1.23324}
{ "12345678901234567890" : 1.23324 }
> Object.bsonsize(testObject)
35
> testObject = {"1" : 1.23324}
{ "1" : 1.23324 }
> Object.bsonsize(testObject)
16
> testObject = {"12" : 1.23324}
{ "12" : 1.23324 }
> Object.bsonsize(testObject)
17

The Object.bsonsize method gives you the BSON size of any document in bytes, but it does not include padding, indexes, etc. that would actually be used when storing a document in the database. Hence, these are all very approximate numbers for storage purposes - I would recommend testing with actual data to get a more definitive answer.
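The same numbers can be reproduced outside the shell. The following is a minimal Python sketch, not MongoDB's own code: it hand-encodes a flat document of doubles following the BSON layout (a 4-byte length, then for each field a type byte, the NUL-terminated field name and an 8-byte double, then a trailing NUL). The helper name `bson_size_of_doubles` and the generated field names are made up for illustration, and only double values are handled.

```python
import struct


def bson_size_of_doubles(doc):
    """Encode a flat {str: float} mapping as BSON and return its size in bytes.

    Illustrative helper (hypothetical name); it only supports double values.
    """
    body = bytearray()
    for name, value in doc.items():
        body += b"\x01"                          # element type 0x01: 64-bit double
        body += name.encode("utf-8") + b"\x00"   # field name, NUL-terminated
        body += struct.pack("<d", value)         # 8-byte little-endian double
    total = 4 + len(body) + 1                    # int32 length + elements + trailing NUL
    encoded = struct.pack("<i", total) + bytes(body) + b"\x00"
    return len(encoded)


# The three documents from the shell session above.
size_20 = bson_size_of_doubles({"12345678901234567890": 1.23324})
size_1 = bson_size_of_doubles({"1": 1.23324})
size_2 = bson_size_of_doubles({"12": 1.23324})
print(size_20, size_1, size_2)  # 35 16 17

# Hypothetical 30,000-field documents: 20-character names vs 4-character names.
long_doc = {f"sensor_reading_{i:05d}": 1.5 for i in range(30000)}
short_doc = {f"{i:04x}": 1.5 for i in range(30000)}
long_size = bson_size_of_doubles(long_doc)
short_size = bson_size_of_doubles(short_doc)
print(long_size, short_size, long_size - short_size)
```

Four-character names are used for the short variant because 30,000 distinct names do not fit in two hexadecimal characters. Like Object.bsonsize, this measures the uncompressed document only, not what ends up on disk.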

answered Oct 13, 2015 at 14:20
3
  • 1
    I found that the topic has been removed from the FAQ since docs.mongodb.com/v3.2/faq, but it is now covered in docs.mongodb.com/manual/core/data-model-operations/…. Commented Mar 22, 2019 at 6:06
  • Seems like the internals of MongoDB are not published. So MongoDB doesn't abstract the field names out into some structure to avoid re-storing the same field names over and over in each document (imagine if you had 100 million documents)? Or does it literally store the JSON exactly as you pass it? Commented Jan 19, 2021 at 23:16
  • Is there a Command or Compass/Atlas screen that shows the size of an entire collection? I would like to load up two different collections, one with short and one with long names to see the difference. @AdamC Just found it docs.mongodb.com/manual/reference/method/… Commented Jan 19, 2021 at 23:19
6

I performed a little benchmark: I uploaded 252 rows of data from an Excel file into two collections, testShortNames and testLongNames, as follows:

Long Names:

{
 "_id": ObjectId("6007a81ea42c4818e5408e9c"),
 "countryNameMaster": "Andorra",
 "countryCapitalNameMaster": "Andorra la Vella",
 "areaInSquareKilometers": 468,
 "countryPopulationNumber": NumberInt("77006"),
 "continentAbbreviationCode": "EU",
 "currencyNameMaster": "Euro"
}

Short Names:

{
 "_id": ObjectId("6007a81fa42c4818e5408e9d"),
 "name": "Andorra",
 "capital": "Andorra la Vella",
 "area": 468,
 "pop": NumberInt("77006"),
 "continent": "EU",
 "currency": "Euro"
}

I then got the stats for each collection, saved them to disk files, and ran a "diff" on the two files:

pprint.pprint(db.command("collstats", dbCollectionNameLongNames))

The diff showed two variables of interest: size and storageSize. From my reading, storageSize is the amount of disk space used after compression, and size is basically the uncompressed size. The storageSize was identical for the two collections, so apparently the WiredTiger engine compresses field names quite well.

[Image: diff of the collstats output for the two collections]
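To get a feel for why repeated field names cost little once compressed, here is a rough sketch using only the Python standard library. It is not a measurement of MongoDB: JSON stands in for BSON, zlib stands in for WiredTiger's block compression (snappy by default), and the 252 rows are randomly generated placeholders rather than the real country data.

```python
import json
import random
import zlib

LONG_NAMES = ["countryNameMaster", "countryCapitalNameMaster",
              "areaInSquareKilometers", "countryPopulationNumber",
              "continentAbbreviationCode", "currencyNameMaster"]
SHORT_NAMES = ["name", "capital", "area", "pop", "continent", "currency"]


def make_rows(count, seed=0):
    """Generate placeholder rows (made-up values, not the original data)."""
    rng = random.Random(seed)
    return [
        [f"Country{i}", f"Capital{i}",
         rng.randint(1, 10_000_000), rng.randint(1_000, 1_000_000_000),
         rng.choice(["EU", "AS", "AF", "NA", "SA", "OC"]),
         rng.choice(["Euro", "Dollar", "Pound", "Yen"])]
        for i in range(count)
    ]


def serialized(rows, names):
    """Serialize every row as one JSON document per line (stand-in for BSON)."""
    docs = [dict(zip(names, row)) for row in rows]
    return "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")


rows = make_rows(252)
long_raw = serialized(rows, LONG_NAMES)
short_raw = serialized(rows, SHORT_NAMES)
long_packed = zlib.compress(long_raw)    # zlib as a stand-in compressor
short_packed = zlib.compress(short_raw)

print("uncompressed:", len(long_raw), "vs", len(short_raw))
print("compressed:  ", len(long_packed), "vs", len(short_packed))
```

The gap between the two compressed sizes comes out far smaller than the gap between the uncompressed sizes, because each repeated name shrinks to a short back-reference. The exact figures depend on the compressor and the data, so treat them as indicative only.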

The question only asked about disk space. However, I then ran a program to retrieve all data from each collection, and checked the response time.

Even though it was a sub-second query, the long names consistently took about 7 times longer. Of course, it takes longer to send the longer names across from the database server to the client program.

-------LongNames-------
Server Start DateTime=2021-01-20 08:44:38
Server End DateTime=2021-01-20 08:44:39
StartTimeMs= 606964546 EndTimeM= 606965328
ElapsedTime MilliSeconds= 782
-------ShortNames-------
Server Start DateTime=2021-01-20 08:44:39
Server End DateTime=2021-01-20 08:44:39
StartTimeMs= 606965328 EndTimeM= 606965421
ElapsedTime MilliSeconds= 93

In Python, I just did the following (I had to actually loop through the items to force the reads, otherwise the query returns only the cursor):

results = dbCollectionLongNames.find(query)  # returns a lazy cursor, no documents yet
for result in results:  # iterating is what actually fetches the documents
    pass
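A single timed run can also include one-off costs (for example, opening the connection on the first query), so repeating the measurement gives a fairer comparison. The sketch below is one way to do that; the helper `time_full_iteration` and the `fake_cursor` generator are hypothetical names for illustration, with the generator standing in for a real query so that the example runs without a database.

```python
import time
from collections import deque


def time_full_iteration(make_iterable, repeats=5):
    """Time `repeats` full passes over a fresh iterable; return milliseconds per pass."""
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        deque(make_iterable(), maxlen=0)   # consume every item, keep none
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def fake_cursor():
    """Stand-in for a lazy database cursor (assumption, not real MongoDB data)."""
    for i in range(1000):
        yield {"name": f"Country{i}"}


timings = time_full_iteration(fake_cursor)
print("fastest pass: %.3f ms, slowest pass: %.3f ms" % (min(timings), max(timings)))
```

With the collections above, the call would look something like time_full_iteration(lambda: dbCollectionLongNames.find(query)), so that each pass issues a fresh query and drains its cursor.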
answered Jan 20, 2021 at 15:00
