原文链接:https://www.depesz.com/2020/01/29/which-tables-should-be-auto-vacuumed-or-auto-analyzed/,译者(guobo507)水平有限,不足之处,请将就看吧,哈哈。文中所有命令及输出,全部照搬原文,等有时间了我再自己验证的,各位看官请轻喷。

让我们开始吧!

最近,我正在烦恼自动回收无法跟上数据库表变化的情况。为了解决该问题,我最终决定手动对所有表进行 vacuum 和 analyze(手动 vacuum/analyze 比通过 autovacuum 守护程序自动运行更快)。

但是令我感到恼火的是,我对如何检查哪些表正在等待自动清理工作没有好的方法。

所以,我写了这篇文章。

首先,让我们从最基础的地方开始。在确定是否应该进行 VACUUM 和/或 ANALYZE 时,自动清理(Autovacuum)会查看以下参数配置:

在我的测试案例中,配置参数的值为:

=$ SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum_analyze_scale_factor', 'autovacuum_analyze_threshold', 'autovacuum_vacuum_scale_factor', 'autovacuum_vacuum_threshold')

ORDER BY name;
            name               | setting
---------------------------------+---------
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold    | 50
autovacuum_vacuum_scale_factor  | 0.2
autovacuum_vacuum_threshold     | 50
(4 ROWS)

规则很简单

  • 如果n_dead_tup大于reltuples * autovacuum_vacuum_scale_factor + autovacuum_vacuum_threshold,则运行 vacuum 操作。
  • 如果n_mod_since_analyze大于reltuples * autovacuum_analyze_scale_factor + autovacuum_analyze_threshold,则运行 analyze 操作。

让我们做一些数学运算。假设我的表有 10000 行,如果我删除(或更新)10000 x 0.2 + 50 = 2050 行,它将被执行 vacuum 操作。如果我更新(或插入)10000 x 0.1 + 50 = 1050 行,它将被执行 analyze 操作。非常简单,是吧!

有了数学运算,让我们看看这些值实际上是如何变化的。首先我们创建一些测试表:

=$ CREATE TABLE t1 ( id INT4 generated always AS IDENTITY, payload TEXT, PRIMARY KEY (id));
=$ CREATE TABLE t2 ( id INT4 generated always AS IDENTITY, payload TEXT, PRIMARY KEY (id));
=$ CREATE TABLE t3 ( id INT4 generated always AS IDENTITY, payload TEXT, PRIMARY KEY (id));

创建后立即就有明显的统计数据产生:

=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;

relname | reltuples | n_dead_tup | n_mod_since_analyze
---------+-----------+------------+---------------------
t1      |         0 |          0 |                   0
t2      |         0 |          0 |                   0
t3      |         0 |          0 |                   0
(3 ROWS)

OK。让我们加载一些数据到测试表中:

=$ copy t1 (payload) FROM '/usr/share/dict/words';
COPY 102401

=$ copy t2 (payload) FROM '/usr/share/dict/words';
COPY 102401

=$ copy t3 (payload) FROM '/usr/share/dict/words';
COPY 102401

紧接着,在禁用自动真空的情况下,你会发现统计信息会更加有趣:

=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;

relname | reltuples | n_dead_tup | n_mod_since_analyze
---------+-----------+------------+---------------------
t1      |         0 |          0 |              102401
t2      |         0 |          0 |                   0
t3      |         0 |          0 |                   0
(3 ROWS)

请注意,reltuples仍显示为 0,因为这个值是通过 vacuum 操作来更新的(当前 autovacuum 被禁用了)。

让我们更新一些行:

=$ UPDATE t1 SET payload = 'new ' || payload WHERE random() < 0.01;
UPDATE 1068

=$ UPDATE t2 SET payload = 'new ' || payload WHERE random() < 0.1;
UPDATE 10135

=$ UPDATE t3 SET payload = 'new ' || payload WHERE random() < 0.5;
UPDATE 51028

=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;

relname | reltuples | n_dead_tup | n_mod_since_analyze
---------+-----------+------------+---------------------
t1      |         0 |       1068 |              103469
t2      |         0 |      10135 |               10135
t3      |         0 |      51028 |               51028
(3 ROWS)

你会发现,n_dead_tup变为了已更新的行数,并且n_mod_since_analyze包含了插入的行和更新的行数之和。

最后,我们来测试 DELETE 操作。但是,为了能够清楚地看到它的变化,首先我们对表进行 vacuum/analyze,以便将计数器清零:

=$ vacuum analyze t1, t2, t3;
VACUUM

=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;

relname | reltuples | n_dead_tup | n_mod_since_analyze
---------+-----------+------------+---------------------
t1      |    102401 |          0 |                   0
t2      |    102401 |          0 |                   0
t3      |    102401 |          0 |                   0
(3 ROWS)

=$ DELETE FROM t1 WHERE random() < 0.01;
DELETE 999

=$ DELETE FROM t2 WHERE random() < 0.1;
DELETE 10307

=$ DELETE FROM t3 WHERE random() < 0.5;
DELETE 51087

=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;

relname | reltuples | n_dead_tup | n_mod_since_analyze
---------+-----------+------------+---------------------
t1      |    102401 |        999 |                 999
t2      |    102401 |      10307 |               10307
t3      |    102401 |      51087 |               51087
(3 ROWS)

很好,我们发现n_dead_tupn_mod_since_analyze都增加了。

接下来,让我们写一个查询,它将我们显示哪些表正在等待自动 vacuum 操作。

首先,为了将所有参数放在同一个位置,我将编写如下的 CTE:

WITH s AS (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  AS  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     AS  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   AS  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      AS  vacuum_threshold
)

然后,基于这个 CTE 查询,我可以获得所有表的列表,以及有关是否应该清理或分析的具体信息:

=$ WITH s AS (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  AS  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     AS  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   AS  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      AS  vacuum_threshold
)
SELECT
    n.nspname,
    c.relname,
    c.oid AS relid,
    t.n_dead_tup,
    t.n_mod_since_analyze,
    c.reltuples * s.vacuum_factor + s.vacuum_threshold AS v_threshold,
    c.reltuples * s.analyze_factor + s.analyze_threshold AS a_threshold
FROM
    s,
    pg_class c
    JOIN pg_namespace n ON c.relnamespace = n.oid
    JOIN pg_stat_all_tables t ON c.oid = t.relid
WHERE
    c.relkind = 'r'
    AND c.relname ~ '^t[123]$'
ORDER BY c.relname;

nspname | relname | relid | n_dead_tup | n_mod_since_analyze | v_threshold | a_threshold
---------+---------+-------+------------+---------------------+-------------+-------------
public  | t1      | 17626 |        999 |                 999 |     20530.2 |     10290.1
public  | t2      | 17636 |      10307 |              112708 |     20530.2 |     10290.1
public  | t3      | 17646 |      51087 |              153488 |     20530.2 |     10290.1
(3 ROWS)

我在调试整个查询时包括了relname的过滤条件(第 23 行),以避免在调试查询时显示此数据库中的所有表。当调试完成后,条件c.relname〜'^ t [123] $'将被删除。

所以,我们有表名,以及它们的n_dead_tupn_mod_since_analyze以及 vacuum 和 analyze 的阈值。因此,让我们以更好的方式展示这个查询/结果:

=$ WITH s AS (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  AS  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     AS  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   AS  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      AS  vacuum_threshold
), tt AS (
    SELECT
        n.nspname,
        c.relname,
        c.oid AS relid,
        t.n_dead_tup,
        t.n_mod_since_analyze,
        c.reltuples * s.vacuum_factor + s.vacuum_threshold AS v_threshold,
        c.reltuples * s.analyze_factor + s.analyze_threshold AS a_threshold
    FROM
        s,
        pg_class c
        JOIN pg_namespace n ON c.relnamespace = n.oid
        JOIN pg_stat_all_tables t ON c.oid = t.relid
    WHERE
        c.relkind = 'r'
)
SELECT
    nspname,
    relname,
    relid,
    CASE WHEN n_dead_tup > v_threshold THEN 'VACUUM' ELSE '' END AS do_vacuum,
    CASE WHEN n_mod_since_analyze > a_threshold THEN 'ANALYZE' ELSE '' END AS do_analyze
FROM
    tt
WHERE
    n_dead_tup > v_threshold OR
    n_mod_since_analyze > a_threshold
ORDER BY nspname, relname;

nspname | relname | relid | do_vacuum | do_analyze
---------+---------+-------+-----------+------------
public  | t2      | 17636 |           | ANALYZE
public  | t3      | 17646 | VACUUM    | ANALYZE
(2 ROWS)

该查询可以存储为视图:

=$ CREATE VIEW autovacuum_queue AS
WITH s AS (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  AS  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     AS  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   AS  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      AS  vacuum_threshold
), tt AS (
    SELECT
        n.nspname,
        c.relname,
        c.oid AS relid,
        t.n_dead_tup,
        t.n_mod_since_analyze,
        c.reltuples * s.vacuum_factor + s.vacuum_threshold AS v_threshold,
        c.reltuples * s.analyze_factor + s.analyze_threshold AS a_threshold
    FROM
        s,
        pg_class c
        JOIN pg_namespace n ON c.relnamespace = n.oid
        JOIN pg_stat_all_tables t ON c.oid = t.relid
    WHERE
        c.relkind = 'r'
)
SELECT
    nspname,
    relname,
    relid,
    CASE WHEN n_dead_tup > v_threshold THEN 'VACUUM' ELSE '' END AS do_vacuum,
    CASE WHEN n_mod_since_analyze > a_threshold THEN 'ANALYZE' ELSE '' END AS do_analyze
FROM
    tt
WHERE
    n_dead_tup > v_threshold OR
    n_mod_since_analyze > a_threshold;
CREATE VIEW

现在,我可以通过此视图,并附加一些其他条件/信息,来展示我希望看到的内容:

=$ SELECT *,
    pg_relation_size(relid),
    pg_total_relation_size(relid)
FROM autovacuum_queue
ORDER BY pg_total_relation_size(relid) DESC;

nspname | relname | relid | do_vacuum | do_analyze | pg_relation_size | pg_total_relation_size
---------+---------+-------+-----------+------------+------------------+------------------------
public | t3 | 17646 | VACUUM | ANALYZE | 7143424 | 11804672
public | t2 | 17636 | | ANALYZE | 5128192 | 8011776
(2 ROWS)

好了,希望您会发现它真的非常有用。