I’ve been working with moving large databases in PostgreSQL and have found that PostgreSQL is a true memory hog when inserting millions of rows in one statement. If the target table has a number of foreign key references PostgreSQL seems to choke horribly and use insane amounts of memory, 1GB per million rows in my original case. We have a rather simple table that is composed of six integer columns, two timestamps, and one boolean. I first ran into this issue when inserting with a select statement that had a simple join and where clause. Initially I thought the select statement was the one causing the out of memory issues. But I found running the select alone worked and changing the statement to a select into piped the data into a new table without any issues.
I created a simple test case to reproduce this with the following schema,
CREATE TABLE table1 ( table1_id integer NOT NULL, name character varying NOT NULL, PRIMARY KEY(table1_id) ); CREATE TABLE table2 ( table2_id integer NOT NULL, name character varying NOT NULL, PRIMARY KEY(table2_id) ); CREATE TABLE table3 ( table3_id integer NOT NULL, name character varying NOT NULL, PRIMARY KEY(table3_id) ); CREATE TABLE table4 ( table4_id integer NOT NULL, name character varying NOT NULL, PRIMARY KEY(table4_id) ); CREATE TABLE test_target ( table1_id integer NOT NULL, table2_id integer NOT NULL, visible boolean NOT NULL, date_added timestamp with time zone NOT NULL, date_updated timestamp with time zone NOT NULL, table3_id1 integer NOT NULL, table3_id2 integer NOT NULL, table4_id1 integer NOT NULL, table4_id2 integer NOT NULL, FOREIGN KEY (table1_id) REFERENCES table1(table1_id), FOREIGN KEY (table2_id) REFERENCES table2(table2_id), FOREIGN KEY (table3_id1) REFERENCES table3(table3_id), FOREIGN KEY (table3_id2) REFERENCES table3(table3_id), FOREIGN KEY (table4_id1) REFERENCES table4(table4_id), FOREIGN KEY (table4_id2) REFERENCES table4(table4_id) );
Running this INSERT / SELECT below can cause the PostgreSQL to keep expanding in memory usage. On a 32-bit machine it aborts due to an out of memory error around 2GB, but on a 64-bit Linux machine it keeps using memory until all of the main memory and swap is full and then the oom-killer process is spawned and starts killing processes.
INSERT INTO test_target ( table1_id, table2_id, visible, date_added, date_updated, table3_id1, table3_id2, table4_id1, table4_id2 ) SELECT 0 as table1_id, 0 as table2_id, TRUE as visible, now() as date_added, now() as date_updated, 0 as table3_id1, 0 as table3_id2, 0 as table4_id1, 0 as table4_id2 FROM generate_series(1, 13000000);
I've tried this on PostgreSQL 8.3.7 under Linux and I found the same behavior in PostgreSQL 8.4 Beta for Windows.
A simple workaround I found is to simply drop the foreign keys before the insert, and re-add the foreign keys after the data is loaded into the table.