I’m writing an app that will query by customer ID, pulling one customer’s data at a time. The search criterion is the customer ID, which is also the primary key, so each lookup returns a single row. I plan on having just one database with one table containing text data only. My intent is to have 100,000 rows and 20 columns, with each field holding at most 50 characters.
I’m hoping someone can point me in the right direction regarding sizing for this. I know it won’t be exact, but I’m looking for a rough estimate and guidance on how to go about sizing the database. Any guides or reference sites would be much appreciated.
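For reference, a minimal sketch of the schema and lookup described above; the table and column names are hypothetical, and varchar(50) simply stands in for "text of at most 50 characters":

CREATE TABLE customer (
    customer_id varchar(50) PRIMARY KEY,
    first_name  varchar(50),
    last_name   varchar(50)
    -- ...plus the remaining text columns, each holding at most 50 characters
);

-- each lookup hits the primary key and returns at most one row
SELECT * FROM customer WHERE customer_id = 'CUST-000123';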
- Who cares? What's the smallest hard drive you can buy retail these days, 1 TB? – Neil McGuigan, Oct 10, 2017 at 18:34
- That is a small database for Postgres and today's hardware. Is there some reason you are concerned about size? – Basil Bourque, Oct 11, 2017 at 6:04
1 Answer
Don't worry about it.
- 100,000 rows is nothing.
- 20 columns is nothing.
You're at a totally reasonable workload and don't need to concern yourself with this; spend your time on something more productive. A precise figure would require knowing
- all of the columns,
- the types of all the columns,
- the actual inputs in the columns, and whether those specific columns are TOASTable.
Just work on features, come back when something is too slow or doesn't work.
If you want a rough estimate (napkin math), figure for a row of m columns:
- 1*m bytes of varlena/text overhead (a 1-byte header per short text value)
- 50*m bytes of contents (assuming ASCII for the contents)
With each row, add:
- 23 bytes for the row overhead
- 8 bytes for padding (worst case)
And for the database:
- everything is rounded up into 8 kB disk pages.
So something like
SELECT (( (50+1)*20 + 23 + 8) * 1e5) / 1e6; -- 1e5 rows, 1e6 bytes in a MB
So that's 105.1 MB.
In the grand scheme of things, that rounds to 0.
Verification
CREATE TABLE foo (
c10 text DEFAULT repeat('x', 50),
c11 text DEFAULT repeat('x', 50),
c12 text DEFAULT repeat('x', 50),
c13 text DEFAULT repeat('x', 50),
c14 text DEFAULT repeat('x', 50),
c15 text DEFAULT repeat('x', 50),
c16 text DEFAULT repeat('x', 50),
c17 text DEFAULT repeat('x', 50),
c18 text DEFAULT repeat('x', 50),
c19 text DEFAULT repeat('x', 50),
c20 text DEFAULT repeat('x', 50),
c21 text DEFAULT repeat('x', 50),
c22 text DEFAULT repeat('x', 50),
c23 text DEFAULT repeat('x', 50),
c24 text DEFAULT repeat('x', 50),
c25 text DEFAULT repeat('x', 50),
c26 text DEFAULT repeat('x', 50),
c27 text DEFAULT repeat('x', 50),
c28 text DEFAULT repeat('x', 50),
c29 text DEFAULT repeat('x', 50)
);
INSERT INTO foo(c10)
SELECT repeat('x',50)
FROM generate_series(1,1e5);
test=# \dt+ foo;
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+--------+-------------
public | foo | table | ecarroll | 112 MB |
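The measured 112 MB comes in a bit above the 105 MB napkin figure because whole rows are packed into fixed 8 kB pages. As a rough sketch of that refinement (the 24-byte page header, the 4-byte line pointer per row, and the 8-byte alignment used below are assumed Postgres on-disk details, not part of the napkin math above):

-- per-tuple footprint: 23-byte header padded to 24, plus 20 * (1 + 50) bytes of
-- text data rounded up to a multiple of 8, plus a 4-byte line pointer
SELECT ceil(1e5 / floor((8192 - 24) / (24 + ceil(20 * (1 + 50) / 8.0) * 8 + 4)))
       * 8192 / 1024^2 AS size_mib;
-- ≈ 112 MiB, which lines up with the \dt+ output above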
- I am glad my napkin math was close to yours. – Anthony Genovese, Oct 9, 2017 at 23:16