Tutorial: Inserting Multiple Rows in PostgreSQL

By squashlabs, Last Updated: July 21, 2023

Use Cases of Inserting Multiple Rows in PostgreSQL

Use Case 1: Bulk Data Import

When dealing with large datasets, it is often more efficient to insert multiple rows at once rather than executing an individual insert statement for each row. PostgreSQL provides several methods to insert multiple rows efficiently. One common use case is bulk data import, where you need to insert a large number of rows into a table.

One way to achieve this is by using the INSERT INTO ... SELECT statement. This statement allows you to select data from one or more tables or even subqueries and insert it into another table. Here’s an example:

INSERT INTO employees (id, name, age)
SELECT id, name, age
FROM temporary_employees;

In this example, we are inserting data from the “temporary_employees” table into the “employees” table. The target column list must match the selected columns in number, order, and data type.

Another method for bulk data import is by using the COPY command. This command allows you to copy data from a file directly into a table. You can create a CSV file with the data and use the COPY command to load it into the table. Here’s an example:

COPY employees (id, name, age)
FROM '/path/to/employees.csv'
DELIMITER ',' CSV HEADER;

In this example, we are copying data from the CSV file “employees.csv” into the “employees” table. The columns in the table must match the columns in the CSV file.
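
From application code, the same CSV load can be driven through psycopg2’s copy_expert() method, which streams a file to a COPY ... FROM STDIN statement. A minimal sketch of composing that statement (the table and column names are simply taken from the example above):

```python
def copy_csv_sql(table, columns):
    """Compose a COPY ... FROM STDIN statement for use with cursor.copy_expert()."""
    col_list = ", ".join(columns)
    return f"COPY {table} ({col_list}) FROM STDIN WITH (FORMAT csv, HEADER true)"

# Usage with a live psycopg2 connection (not runnable without a database):
# with open('/path/to/employees.csv') as f:
#     cur.copy_expert(copy_csv_sql('employees', ['id', 'name', 'age']), f)
```

Because the file is streamed from the client, this works even when the CSV is not accessible from the database server, which plain COPY ... FROM '/path' requires.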

Use Case 2: Data Migration

Data migration is another common use case where you need to move data from one table or database to another. Inserting multiple rows can help streamline this process and make it more efficient.

One approach is to use the INSERT INTO ... SELECT statement, similar to the bulk data import use case. You can select the data from the source table and insert it into the destination table. Here’s an example:

INSERT INTO destination_table (id, name, age)
SELECT id, name, age
FROM source_table;

In this example, we are selecting data from the “source_table” and inserting it into the “destination_table”. Again, the columns in the destination table must match the columns in the select statement.

Another approach for data migration is the COPY command, similar to the bulk data import use case. You first export the data from the source table into a CSV file and then load that file into the destination table. Here’s an example:

COPY source_table (id, name, age)
TO '/path/to/migration.csv'
DELIMITER ',' CSV HEADER;

COPY destination_table (id, name, age)
FROM '/path/to/migration.csv'
DELIMITER ',' CSV HEADER;

In this example, COPY ... TO exports data from “source_table” to the CSV file “migration.csv”, and COPY ... FROM loads that file into “destination_table”. The column lists must match the columns in the CSV file on both sides.

Best Practices for Inserting Multiple Rows in PostgreSQL

Using Prepared Statements

When inserting multiple rows, it is important to consider the performance implications. One best practice is to use prepared statements. Prepared statements allow you to prepare an SQL statement once and execute it multiple times with different parameter values. This can significantly improve performance, especially when inserting a large number of rows.

Here’s an example of using prepared statements to insert multiple rows:

import psycopg2

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Define the parameterized SQL statement
stmt = "INSERT INTO employees (id, name, age) VALUES (%s, %s, %s)"

# Define the data to be inserted
data = [
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40)
]

# Execute the prepared statement for each row of data
cur.executemany(stmt, data)

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

In this example, we connect to the PostgreSQL database using the psycopg2 library in Python. We define a parameterized SQL statement and the data to be inserted as a list of tuples. Finally, we execute the statement for every row of data with the executemany() method. Note that psycopg2 does not expose a client-side prepare() method; executemany() simply reuses the same parameterized statement, which still avoids rebuilding the SQL text for each row.

Using prepared statements can help improve performance by reducing the overhead of parsing and planning the SQL statement for each execution.
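
Server-side prepared statements can also be issued as plain SQL with PREPARE and EXECUTE. A hedged sketch of composing those statements for the employees example (the statement name bulk_insert is arbitrary):

```python
def prepared_insert_sql(table, columns, name="bulk_insert"):
    """Build SQL for a server-side prepared insert: a PREPARE statement
    with $1..$n parameters, and an EXECUTE template with %s placeholders."""
    dollar_params = ", ".join(f"${i + 1}" for i in range(len(columns)))
    pyformat = ", ".join(["%s"] * len(columns))
    prepare = (f"PREPARE {name} AS INSERT INTO {table} "
               f"({', '.join(columns)}) VALUES ({dollar_params})")
    execute = f"EXECUTE {name} ({pyformat})"
    return prepare, execute

# Usage with a cursor (not runnable without a database):
# prepare, execute = prepared_insert_sql('employees', ['id', 'name', 'age'])
# cur.execute(prepare)
# cur.executemany(execute, data)
```

This way the server parses and plans the INSERT once, and each EXECUTE only binds new parameter values. The prepared statement lives for the duration of the session.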

Batching Inserts

Another best practice for inserting multiple rows is to batch the inserts. Instead of executing individual insert statements for each row, you can group multiple rows into a single insert statement. This can reduce the number of round trips to the database and improve performance.

Here’s an example of batching inserts using the INSERT INTO ... VALUES statement:

import psycopg2

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Define the data to be inserted
data = [
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40)
]

# Generate the VALUES part of the insert statement
values = ','.join(cur.mogrify("(%s,%s,%s)", row).decode('utf-8') for row in data)

# Execute the insert statement
cur.execute(f"INSERT INTO employees (id, name, age) VALUES {values}")

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

In this example, we define the data to be inserted as a list of tuples. We then generate the VALUES part of the insert statement using the mogrify() method of the cursor object. The mogrify() method returns a byte string, so we decode it to UTF-8. Finally, we execute the insert statement using the execute() method.

Batching inserts can help improve performance by reducing the overhead of executing multiple insert statements.
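
When the dataset is too large for one statement, the rows can be split into fixed-size chunks and each chunk sent as one batched insert. A small, generic chunking helper (pure Python, independent of psycopg2):

```python
from itertools import islice

def batched(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

# Each chunk could then be passed to the mogrify-based insert shown above:
# for chunk in batched(data, 1000):
#     insert_chunk(cur, chunk)   # insert_chunk is a hypothetical wrapper
```

This keeps memory usage bounded and works with generators as well as lists, so the full dataset never has to be materialized at once.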

Real World Examples of Inserting Multiple Rows in PostgreSQL

Example 1: Inserting Multiple Rows from a CSV File

One common real-world example is inserting multiple rows from a CSV file. Let’s say you have a CSV file containing employee data, and you want to insert this data into a table in PostgreSQL.

Here’s an example of how you can achieve this using the psycopg2 library in Python:

import psycopg2
import csv

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Open the CSV file
with open('/path/to/employees.csv', 'r') as file:
    # Create a CSV reader
    reader = csv.reader(file)
    
    # Skip the header row
    next(reader)
    
    # Iterate over the rows in the CSV file
    for row in reader:
        # Insert each row into the table
        cur.execute("INSERT INTO employees (id, name, age) VALUES (%s, %s, %s)", row)

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

In this example, we open the CSV file with Python’s built-in open() function and create a reader from the csv module. We skip the header row using the next() function, then iterate over the remaining rows and insert each one into the table using the execute() method.
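
One caveat with csv.reader: every field arrives as a string. PostgreSQL will usually coerce '30' into an integer column, but converting explicitly in Python keeps type errors close to their source. A sketch, assuming the (id, name, age) column order from the example:

```python
def coerce_employee_row(row):
    """Convert the string fields from csv.reader into typed (id, name, age) values."""
    emp_id, name, age = row
    return int(emp_id), name, int(age)

# In the loop above, execute() would then receive typed values:
# cur.execute("INSERT INTO employees (id, name, age) VALUES (%s, %s, %s)",
#             coerce_employee_row(row))
```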

Example 2: Inserting Multiple Rows from an API Response

Another real-world example is inserting multiple rows from an API response. Let’s say you have an API that returns a JSON response containing employee data, and you want to insert this data into a table in PostgreSQL.

Here’s an example of how you can achieve this using the requests and psycopg2 libraries in Python:

import requests
import psycopg2

# Make a GET request to the API
response = requests.get('https://api.example.com/employees')

# Parse the JSON response
data = response.json()

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Iterate over the employees in the JSON response
for employee in data['employees']:
    # Insert each employee into the table
    cur.execute("INSERT INTO employees (id, name, age) VALUES (%s, %s, %s)",
                (employee['id'], employee['name'], employee['age']))

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

In this example, we make a GET request to the API using the requests library. We parse the JSON response using the json() method. Then, we connect to the PostgreSQL database using the psycopg2 library. We iterate over the employees in the JSON response and insert each employee into the table using the execute() method.

Performance Considerations for Inserting Multiple Rows in PostgreSQL

Inserting with Transactions

When inserting multiple rows, performance can be improved by using transactions. A transaction is a way to group multiple database operations into a single unit of work. By wrapping the insert statements in a transaction, you can reduce the overhead of committing each individual row.

Here’s an example of inserting multiple rows within a transaction:

import psycopg2

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Ensure autocommit is off so all inserts run in one transaction
conn.autocommit = False

# Insert multiple rows within the transaction
try:
    cur.execute("INSERT INTO employees (id, name, age) VALUES (1, 'John Doe', 30)")
    cur.execute("INSERT INTO employees (id, name, age) VALUES (2, 'Jane Smith', 35)")
    cur.execute("INSERT INTO employees (id, name, age) VALUES (3, 'Bob Johnson', 40)")
    
    # Commit the transaction
    conn.commit()
    
except Exception as e:
    # Rollback the transaction if an error occurs
    conn.rollback()
    print(f"Error: {str(e)}")

finally:
    # Reset autocommit to True and close the connection
    conn.autocommit = True
    cur.close()
    conn.close()

In this example, we make sure autocommit is disabled on the connection (this is also psycopg2’s default), so a transaction is opened implicitly at the first statement. We then execute the insert statements within that transaction. If an error occurs, we roll back the transaction using the rollback() method. Finally, we reset the autocommit property to True and close the connection.

Using transactions can improve performance by reducing the number of commits and reducing the overhead of committing each individual row.
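
The commit-on-success, rollback-on-error pattern can also be packaged as a small context manager, mirroring the behavior psycopg2 connections themselves provide when used in a with block. A generic sketch (not tied to any particular driver API beyond commit() and rollback()):

```python
from contextlib import contextmanager

@contextmanager
def transaction(conn):
    """Commit on success, roll back and re-raise on any exception."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise

# Usage (requires a live connection):
# with transaction(conn):
#     cur.executemany("INSERT INTO employees (id, name, age) VALUES (%s, %s, %s)", data)
```

With psycopg2 specifically, `with conn:` already gives equivalent semantics; the helper just makes the pattern explicit and reusable for other drivers.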

Optimizing Batch Size

Another performance consideration when inserting multiple rows is the batch size. The batch size refers to the number of rows inserted in a single insert statement or transaction. Optimizing the batch size can have a significant impact on performance.

If the batch size is too small, the overhead of executing multiple statements or transactions can outweigh the performance benefits of bulk inserts. On the other hand, if the batch size is too large, it can lead to increased memory usage and longer transaction times.

Finding the optimal batch size depends on several factors such as the size of the data, the available system resources, and the database configuration. It is recommended to experiment with different batch sizes to find the optimal value for your specific use case.
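
One way to experiment is to time the same load at several batch sizes. The harness below is a generic sketch: insert_batch stands in for whatever batched insert you use (executemany, a mogrify-built VALUES statement, COPY) and is an assumption, not a psycopg2 API:

```python
import time

def time_batch_size(rows, batch_size, insert_batch):
    """Insert `rows` in chunks of `batch_size` via insert_batch(chunk); return elapsed seconds."""
    start = time.perf_counter()
    for i in range(0, len(rows), batch_size):
        insert_batch(rows[i:i + batch_size])
    return time.perf_counter() - start

# for size in (100, 1000, 10000):
#     print(size, time_batch_size(data, size, insert_batch))
```

Run each candidate size against a realistic copy of the data and table (indexes and triggers included), since those dominate real insert cost.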

Advanced Techniques for Inserting Multiple Rows in PostgreSQL

Using the INSERT INTO … VALUES Statement with Multiple Rows

In addition to the methods mentioned earlier, PostgreSQL provides a concise way to insert multiple rows: a single INSERT INTO ... VALUES statement with several parenthesized row tuples.

Here’s an example:

INSERT INTO employees (id, name, age)
VALUES
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40);

In this example, we are inserting three rows into the “employees” table. Each row is specified as a comma-separated list of values enclosed in parentheses.

Using this technique can simplify the insertion of multiple rows and make the SQL statement more readable.
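
The same multi-row VALUES form can be generated from Python with placeholders, so a single execute() call both batches the rows and keeps proper parameter binding (no manual escaping). A sketch; the table and column names are taken from the example above:

```python
def multirow_insert(table, columns, rows):
    """Build one multi-row INSERT with %s placeholders plus the flattened parameter list."""
    row_tpl = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
           + ", ".join([row_tpl] * len(rows)))
    params = [value for row in rows for value in row]
    return sql, params

# Usage (requires a live connection):
# sql, params = multirow_insert('employees', ['id', 'name', 'age'], data)
# cur.execute(sql, params)
```

Only the placeholder skeleton is built with string formatting; the actual values travel as bound parameters, so this stays safe against SQL injection as long as table and column names are trusted.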

Using the INSERT INTO … ON CONFLICT DO NOTHING Statement

Another advanced technique is to use the INSERT INTO ... ON CONFLICT DO NOTHING statement to handle conflicts when inserting multiple rows. This statement allows you to specify what action to take when a conflict occurs, such as a duplicate key violation.

Here’s an example:

INSERT INTO employees (id, name, age)
VALUES
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40)
ON CONFLICT (id) DO NOTHING;

In this example, we are inserting three rows into the “employees” table. The ON CONFLICT (id) DO NOTHING clause tells PostgreSQL to do nothing when a conflict occurs on the “id” column, effectively skipping the conflicting rows.

This technique can be useful when inserting data from external sources where conflicts are expected, such as data from a CSV file or an API response.
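
ON CONFLICT DO NOTHING resolves duplicates on the server, but when the incoming batch itself contains repeats it can also help to deduplicate client-side first, so the statement stays small. A minimal sketch keyed on the first column (the position of id in each row is an assumption):

```python
def dedupe_rows(rows, key_index=0):
    """Drop rows whose key column repeats an earlier row, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```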

Code Snippet Ideas – Inserting Multiple Rows in PostgreSQL

Snippet 1: Inserting Multiple Rows using psycopg2’s executemany()

import psycopg2

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Define the data to be inserted
data = [
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40)
]

# Prepare the SQL statement
stmt = "INSERT INTO employees (id, name, age) VALUES (%s, %s, %s)"

# Execute the prepared statement for each row of data
cur.executemany(stmt, data)

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

This code snippet demonstrates how to use psycopg2’s executemany() method to insert multiple rows into a PostgreSQL table. The data list contains the rows to be inserted, and the stmt variable holds the SQL statement with placeholders for the values.

Snippet 2: Inserting Multiple Rows from a Pandas DataFrame

import pandas as pd
from sqlalchemy import create_engine

# Create a SQLAlchemy engine; pandas' to_sql() requires one
# (a raw DBAPI connection is only supported for SQLite)
engine = create_engine('postgresql+psycopg2://myuser:mypassword@localhost:5432/mydb')

# Load data from a CSV file into a Pandas DataFrame
df = pd.read_csv('/path/to/employees.csv')

# Insert the DataFrame into the table
df.to_sql('employees', engine, if_exists='append', index=False)

This code snippet demonstrates how to insert multiple rows from a Pandas DataFrame into a PostgreSQL table. The read_csv() function reads the data from a CSV file into a DataFrame. The to_sql() method inserts the DataFrame into the specified table through a SQLAlchemy engine; passing a raw psycopg2 connection is not supported for PostgreSQL. The if_exists='append' parameter appends the data to the existing table, and to_sql() manages its own transaction, so no explicit commit is needed.

Code Snippet Ideas – Advanced Multiple Rows Insertion with PostgreSQL

Snippet 1: Inserting Multiple Rows with Returning Clause

import psycopg2
from psycopg2.extras import execute_values

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Define the data to be inserted
data = [
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40)
]

# Prepare the SQL statement with the RETURNING clause;
# execute_values() expands the single %s into all row tuples
stmt = "INSERT INTO employees (id, name, age) VALUES %s RETURNING id"

# Insert all rows in one statement and fetch the RETURNING results
inserted_ids = execute_values(cur, stmt, data, fetch=True)
print("Inserted IDs:", inserted_ids)

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

This code snippet demonstrates how to use the RETURNING clause in an insert statement to fetch the inserted IDs. Note that cursor.executemany() discards result rows, so RETURNING values cannot be collected that way; the execute_values() helper with fetch=True inserts all rows in a single statement and returns the RETURNING rows, which can be useful for further processing or verification.

Snippet 2: Inserting Multiple Rows with a Default Value

import psycopg2

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Define the data to be inserted
data = [
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40)
]

# Prepare the SQL statement with a default value
stmt = "INSERT INTO employees (id, name, age, created_at) VALUES (%s, %s, %s, DEFAULT)"

# Execute the prepared statement for each row of data
cur.executemany(stmt, data)

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

This code snippet demonstrates how to insert multiple rows into a PostgreSQL table with a default value for a column. The stmt SQL statement includes the DEFAULT keyword for the created_at column, which automatically assigns the default value defined in the table schema. This can be useful for columns like timestamps or auto-generated values.

Code Snippet Ideas – PostgreSQL Features to Insert Multiple Rows

Snippet 1: Upsert (INSERT … ON CONFLICT UPDATE)

import psycopg2

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Define the data to be inserted or updated
data = [
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40)
]

# Prepare the SQL statement with the ON CONFLICT DO UPDATE clause
stmt = "INSERT INTO employees (id, name, age) VALUES (%s, %s, %s) ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, age = EXCLUDED.age"

# Execute the prepared statement for each row of data
cur.executemany(stmt, data)

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

This code snippet demonstrates how to use the INSERT ... ON CONFLICT UPDATE statement, also known as an upsert, to insert or update multiple rows in a PostgreSQL table. The ON CONFLICT clause specifies the conflict resolution strategy. In this example, it updates the “name” and “age” columns of the existing rows with the values from the new rows.

Snippet 2: Inserting with a Subquery

import psycopg2

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Prepare the SQL statement with a subquery
stmt = "INSERT INTO employees (id, name, age) SELECT id, name, age FROM temporary_employees"

# Execute the prepared statement
cur.execute(stmt)

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

This code snippet demonstrates how to insert multiple rows into a PostgreSQL table using a subquery. The SELECT statement inside the INSERT INTO statement selects the data from the “temporary_employees” table and inserts it into the “employees” table. This can be useful when you want to insert rows based on certain conditions or from another table.

Code Snippet Ideas – Different Ways to Insert Multiple Rows with PostgreSQL

Snippet 1: Inserting Rows with SERIAL Primary Key

-- Create a table with a SERIAL primary key
CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    age INTEGER
);

-- Insert rows without specifying the primary key
INSERT INTO employees (name, age) VALUES ('John Doe', 30), ('Jane Smith', 35), ('Bob Johnson', 40);

-- Query the table
SELECT * FROM employees;

This code snippet demonstrates how to insert multiple rows into a table with a SERIAL primary key in PostgreSQL. The SERIAL column automatically generates a unique integer value for each inserted row. When inserting rows, you don’t need to specify a value for the SERIAL column; it will be automatically assigned.

Snippet 2: Inserting Rows with a Subquery

-- Create a temporary table
CREATE TEMPORARY TABLE temporary_employees (id INT, name VARCHAR(100), age INTEGER);

-- Insert rows into the temporary table
INSERT INTO temporary_employees (id, name, age) VALUES (1, 'John Doe', 30), (2, 'Jane Smith', 35), (3, 'Bob Johnson', 40);

-- Insert rows into the main table using a subquery
INSERT INTO employees (id, name, age)
SELECT id, name, age
FROM temporary_employees;

-- Query the main table
SELECT * FROM employees;

This code snippet demonstrates how to insert multiple rows into a PostgreSQL table using a subquery. First, a temporary table “temporary_employees” is created, and rows are inserted into it. Then, the rows from the temporary table are inserted into the main table “employees” using a subquery. This can be useful when you want to insert rows based on certain conditions or from another table.

Code Snippet Ideas – Error Handling in PostgreSQL

Snippet 1: Handling Errors with Try-Except

import psycopg2

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Define the data to be inserted
data = [
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40)
]

# Prepare the SQL statement
stmt = "INSERT INTO employees (id, name, age) VALUES (%s, %s, %s)"

# Execute the prepared statement for each row of data
try:
    cur.executemany(stmt, data)
    conn.commit()
    print("Data inserted successfully!")
except psycopg2.Error as e:
    conn.rollback()
    print(f"Error: {str(e)}")

# Close the connection
cur.close()
conn.close()

This code snippet demonstrates how to handle errors when inserting multiple rows into a PostgreSQL table using the try-except block in Python. The executemany() method is executed within the try block, and if an error occurs, the changes are rolled back using the rollback() method. The error message is then printed. If no error occurs, the changes are committed, and a success message is printed.

Snippet 2: Raising Custom Exceptions

import psycopg2

class CustomInsertError(Exception):
    pass

# Connect to the PostgreSQL database
conn = psycopg2.connect(database="mydb", user="myuser", password="mypassword", host="localhost", port="5432")
cur = conn.cursor()

# Define the data to be inserted
data = [
    (1, 'John Doe', 30),
    (2, 'Jane Smith', 35),
    (3, 'Bob Johnson', 40),
    (4, 'Alice Brown', 25)  # Causes a duplicate key violation if id 4 already exists in the table
]

# Prepare the SQL statement
stmt = "INSERT INTO employees (id, name, age) VALUES (%s, %s, %s)"

# Execute the prepared statement for each row of data
try:
    cur.executemany(stmt, data)
    conn.commit()
    print("Data inserted successfully!")
except psycopg2.IntegrityError as e:
    conn.rollback()
    raise CustomInsertError("Error: Duplicate key violation") from e
finally:
    # Close the cursor and connection even when the custom exception propagates
    cur.close()
    conn.close()

This code snippet demonstrates how to raise custom exceptions when handling errors during multiple row inserts in PostgreSQL. A custom exception class called CustomInsertError is defined; if a duplicate key violation occurs, psycopg2 raises an IntegrityError, the transaction is rolled back, and the custom exception is raised with the original exception as its cause, preserving the details of the error. The cleanup runs in a finally block, so the cursor and connection are closed even when the exception propagates.
