PostgreSQL
Resources
- Useful PostgreSQL Queries and Commands
- Disk Usage
- Lock Monitoring
- Things I Wished More Developers Knew About Databases
- pgbadger - Analyze and generate reports from logs.
- WAL write amplification from UUID indexes
- Lesser Known PostgreSQL Features
- ⭐ The Internals of PostgreSQL
- SQL tips and tricks
Queries
Show running queries (pre 9.2):
SELECT procpid, age(clock_timestamp(), query_start), usename, client_addr, current_query
FROM pg_stat_activity
WHERE current_query != '<IDLE>' AND current_query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start ASC;
Show running queries (9.2)
SELECT pid, age(clock_timestamp(), query_start), usename, client_addr, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start ASC;
Kill running query
SELECT pg_cancel_backend(procpid);
Show currently-held locks
SELECT l.relation::regclass, l.mode, l.locktype, l.pid, age(clock_timestamp(), a.query_start) as age
FROM pg_locks l, pg_stat_activity a
WHERE l.GRANTED and l.pid = a.pid;
Show when autovacuum last ran
SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables ORDER BY last_autovacuum;
Get an estimated count of rows in a table
https://wiki.postgresql.org/wiki/Count_estimate
SELECT reltuples AS estimate FROM pg_class WHERE relname = 'table_name';
(Disk usage) Show size of relations
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
(Disk usage) Show size of tables
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
(Disk usage) Show size of column on disk
Warning: expensive query on large tables.
select
pg_size_pretty(sum(pg_column_size(column_name))) as total_size,
pg_size_pretty(avg(pg_column_size(column_name))) as average_size,
sum(pg_column_size(column_name)) * 100.0 / pg_total_relation_size('table_name') as percentage
from table_name;
(Indexes) Show index usage
Source: https://www.cybertec-postgresql.com/en/get-rid-of-your-unused-indexes/
SELECT s.schemaname,
s.relname AS tablename,
s.indexrelname AS indexname,
pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_catalog.pg_stat_user_indexes s
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0 -- has never been scanned
AND 0 <>ALL (i.indkey) -- no index column is an expression
AND NOT i.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint c
WHERE c.conindid = s.indexrelid)
ORDER BY pg_relation_size(s.indexrelid) DESC;
(pg_stat_statements) Get query statistics
SELECT
calls, total_time, rows,
100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent,
round((100 * total_time / sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu,
query
FROM pg_stat_statements ORDER BY total_time DESC LIMIT 100;
Guides
Select latest row in each group
If we have a table containing multiple groups, and we want to get the latest row in each group, we can use the DISTINCT ON
clause. In other databases, this would be done with a window function or joins instead.
As an example, assume we have the following table:
id | key | timestamp
-- | --- | ----------
1 | a | 1000
2 | b | 1002
3 | c | 1003
4 | a | 1004
5 | c | 1005
We want to retrieve the row with the latest timestamp grouped by key. We can do:
SELECT DISTINCT ON (key) id, key, timestamp FROM tbl ORDER BY key, timestamp DESC;
Which will produce the following result:
id | key | timestamp
-- | --- | ----------
2 | b | 1002
4 | a | 1004
5 | c | 1005
Identity columns
In PostgreSQL 10, autoincrementing columns should be defined as identity columns instead of using the SERIAL
data type.
CREATE TABLE test_old (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
payload text
);
Also consider using GENERATED ALWAYS
instead of GENERATED BY DEFAULT
to prevent
the value of an autogenerated key from being manually set.
FILTER
clause
An aggregate function can be extended with a FILTER (WHERE filter_clause)
expression. If FILTER
is specified, then only the input rows for which the
filter clause evaluates to true are fed to the aggregate function; other rows
are discarded.
For example:
SELECT
count(*) as total_jobs,
count(*) filter (where state=active) as active_jobs,
count(*) filter (where state=inactive) as inactive_jobs
FROM jobs;
Concepts
Buffers (shared/local blocks)
https://www.postgresql.org/docs/10/sql-explain.html
Shared blocks contain data from regular tables and indexes; local blocks contain data from temporary tables and indexes; while temp blocks contain short-term working data used in sorts, hashes, Materialize plan nodes, and similar cases. The number of blocks dirtied indicates the number of previously unmodified blocks that were changed by this query; while the number of blocks written indicates the number of previously-dirtied blocks evicted from cache by this backend during query processing.
psql customizations
Separate histories for different remote hosts:
\set HISTFILE ~/.config/psql/history/.psql_history_ :HOST _ :DBNAME
More information in the psql prompt:
\set PROMPT1 '%[%033[37m%][%M] %n@%/%R%#%x %[%033[0m%]%'
Autovacuuum
Regular autovacuum processes will be automatically interrupted when the process detects
that it is blocking another operation. For instance when executing DDLs like
ALTER TABLE
that would acquire a SHARE UPDATE EXCLUSIVE
lock.
The exception is if the autovacuum process is being triggered by wraparound
protection. In this mode, the autovauum query will show (to prevent wraparound)
. This process will not be automatically interrupted, and will
block any DDL statements requiring a lock.
Performance analysis
I’ve had good results with pg_stat_statements in the past. It does require restarting the database however. It can be enabled gracefully on a slave by taking it out of rotation, applying the config change, then putting it back into rotation. I did not observe any performance impact to having this setting enabled.
pgBadger is another option. It does require updating log-related configuration. Typically I have logs for slow queries enabled, and pgBadger can be used to aggregate and analyze those.