类型名称	说明	存储需求	取值范围
SMALLINT	小范围的整数	2 个字节	(-2^15, 2^15-1)	-32768 ~ 32767
INT (INTEGER)	普通大小的整数	4 个字节	(-2^31, 2^31-1)	-2147483648 ~ 2147483647
BIGINT	大整数	8 个字节	(-2^63, 2^63-1)	-9223372036854775808 ~ 9223372036854775807

类型名称	说明	存储需求
REAL	6位十进制数字精度	4个字节
DOUBLE PRECISION	15位十进制数字精度	8 个字节

日期/时间类型

PostgreSQL 中有多种表示日期的数据类型, 主要有: TIME, DATE, TIMESTAMP.

PostgreSQL 还支持时区, 并能够按照不同时区对夏令时进行自动转换. 此外, PostgreSQL 还支持一些特殊类型的数据类型, 如 INTERVAL. 该类型可以用于对日期时间进行数学运算.

DATE 类型只用于日期, 即仅存储年、月、日, 没有时区、小时、分和秒的信息. 格式可以设置为: YYYY-MM-DD, 存储需4个字节.
TIME 类型只用于一日内的时间, 格式为: HH:MM:SS, 存储需8个字节. 取值范围为 00:00:00 至 24:00:00. time 又称 time without time zone
TIMESTAMP 类型用于记录日期和时间, 存储需8个字节.
timestamptz (又称 timestamp with time zone), 该类型同时存储了日期、时间以及时区信息. 在系统内部, 该类型的字段值是以 UTC 世界标准时间格式存储的, 但当查询显示时, 会按照服务器的时区设置进行换算后再显示(时区也可以在库级/用户级/会话级分别进行设置). 如果你输入的时间戳不带时区数据, 那么存入 timestamptz 类型字段中时, PostgreSQL 会自动使用当前数据库服务器的时区信息来补充. 如果修改了数据库服务器的时区设置, 你可以看到查询出来的时间数据发生了变化.
timetz （又称 time with time zone）. 与 timestamptz 类型类似, 但该类型使用频率较低, 因为它虽然携带了时区信息但却没有日期信息. 该类型永远假设当前时间是夏令时. 有的编程语言不支持这种仅有时间而无日期的数据类型, 因此可能会将其自动转换为带时区的时间戳类型, 转换时日期就取计算机系统时间的初始值(例如, Unix 时间纪元起始于 1970年, 因此转换后的日期就是 1970年1月1日, 时区和时间不变, 夏令时).
interval. 该类型描述了一个时间段的长度, 单位可以是小时、天、月、分钟或者其他粒度. 该类型适用于对日期和时间进行数学运算的场景.
tsrange. 该类型是 9.2 版引入的, 可用于定义 timestamp with no timezone 的开区间和闭区间. 该类型包含两个时间戳以及开区间和闭区间限定符. 例如, '[2012-01-01 14:00, 2012-01-01 15:00)'::tsrange 定义了从 2012年1月1日14:00开始到那天15:00之前结束的一个时间段.
tstzrange. 该类型也是 9.2 版引入的, 可用于定义 timestamp with timezone 的开区间和闭区间.
daterange. 该类型也是 9.2 版引入的, 可用于定义日期的开区间和闭区间.

对于 TIME 和 TIMESTAMP 类型, 默认不带时区(即 without time zone), 如果需要, 可以加上 with time zone.

在插入日期数据时, 可以使用形如 'YYYY-MM-DD'或者'YYYYMMDD' 的字符串.
以 'YY-MM-DD' 或者 'YYMMDD' 表示的日期, PostgreSQL 使用以下规则解释其所在年份. '00~69' 范围的年份值转换为 '2000~2069'; '70~99'范围的年份值被转换为 '1979~1999'.
使用 CURRENT_DATE 或者 NOW() 插入当前系统的日期.

TIMESTAMP 类型与 MySQL 中的 DATETIME 类似. 其日期格式为 'YYYY-MM-DD HH:MM:SS'. 在插入数据时, 要保证在合法的取值范围内.

注意, PostgreSQL 中没有 DATETIME 类型.

时区详解

时区

全球划分为24个时区

时区（Time Zone）, 是指地球上的各个区域使用同一个时间定义。

地球是自西向东自转, 东边比西边先看到太阳, 东边的时间也就比西边的早.

为了克服时间上的混乱, 1884年在华盛顿召开的一次国际经度会议（又称国际子午线会议）上, 规定将全球划分为24个时区：

以英国格林尼治天文台旧址为中时区（零时区）, 东边划分1-12区, 西边划分1-12区. 每个时区横跨经度15度, 时间正好是1小时. 最后的东、西第12区各跨经度7.5度, 以东、西经180度为界.

世界统一时间

世界标准时间（Universal Time Coordinated），又称世界统一时间、国际时间、全球时钟、协调世界时，简称UTC，由原子钟提供，以国际原子时的秒长为基准计量，非常精确。

UTC 指 Universal Time Coordinated 或 Universal Coordinated Time. 译为协调世界时, 又称世界统一时间、世界标准时间、国际协调时间.

由于英文(Coodinated Universal Time, 简写为CUT)和法文(Temps universel coordonné, 简写为TUC)的缩写不同, 作为妥协, 简称UTC. UTC 是现在全球通用的时间标准, 全球各地都同意将各自的时间进行同步协调. UTC 时间是经过平均太阳时（以格林威治时间GMT为准）、地轴运动修正后的新时标以及以秒为单位的国际原子时所综合精算而成.

时差

世界标准时间与北京时间的时差是8小时. 以现在北京时间减去8小时, 即为世界标准时间.

即, 世界标准时间=北京时间 - 8小时

PostgreSQL 对时区的处理

PostgreSQL 对时区的处理并不是简单地在日期和时间类型的基础上额外增加一个标记.

比如对于带时区的信息 2012-2-14 18:08:00-8, 这里 -8 代表比 UTC 时间迟 8 小时的时区. PostgreSQL 内部其实是这么工作的.

通过计算得到 2012-02-14 18:08:00-8 代表的 UTC 标准时间, 就是 2012-2-14 18:08:00 加上 8 小时, 得到 2012-02-15 04:08:00-0
把上述计算得到的 UTC 标准时间存储下来.

当你回调该数据以用于显示时, PostgreSQL 内部是这样运作的:

找到服务器所观察到的时区或者请求的时区 (例如 America/New_York).
计算该时区相对 UTC 标准时间的时差 (对于 America/New_York 时区来说, 与 UTC 的时差是 -5 小时.)
根据 UTC 标准时间和时区的时差计算出当地时间 (如果 UTC 标准时间为 2012-2-15 16:08:00, 则加上时差 -5 小时后得到 2012-2-15 21:08:00).
显示计算结果 (2012-2-15 21:08:00).

可以看到, PostgreSQL 并没有存储时区信息而仅是使用时区信息来把日期和时间转换为 UTC 标准时间再存储下来.

此后, 时区信息就丢失了. 当 PostgreSQL 需要显示该日期时间信息时, 它会按顺序查找当前会话级、用户级、数据库级、服务器级的时区设置, 然后使用找到的第一个时区来将 UTC 标准时间转换为对应时区的时间值并显示之.

夏令时

世界各国对于夏令时的规定五花八门, 如果某个数据库可能会被全球各地的应用访问, 那么就需要及时按照最新的全球夏令时规定来更新库中的时间信息. 手工跟踪全球夏令时的变化是一件无比繁琐的工作, 这需要一个全职的程序员专门来收集各国的夏令时安排, 并在前述数据库中刷新这些国家(包括海外飞地)的相关时间数据.

一个有趣的例子

一位出差中的销售员需要坐飞机回家, 起点是旧金山, 终点是奥克兰附近. 当她登上飞机时, 当地时钟显示的时间为 2012年3月11日凌晨1点50分. 当她降落时, 当地时钟显示时间为2012年3月11日凌晨3点10分. 那么请问这段旅程共花了多长时间?

下面使用带时区信息的时间戳, 计算出来的时间仅为 20 分钟.

postgres@localhost:5436 tutorial# SELECT '2012-03-11 3:10 AM America/Los_Angeles'::timestamptz
tutorial-# -'2012-03-11 1:50 AM America/Los_Angeles'::timestamptz;
 ?column?
----------
 00:20:00
(1 行记录)

主要是这段飞行过程中发生了夏令时的转换, 也就是时间向前跃迁了. 比如 1时50分变为 2时50分.

下面的查询返回结果是 1小时20分钟.

postgres@localhost:5436 tutorial# SELECT '2012-03-11 3:10 AM'::timestamp-'2012-03-11 1:50 AM'::timestamp;
 ?column?
----------
 01:20:00
(1 行记录)

例. 输入时使用的是一个时区的本地时间, 输出却是另一个时区的本地时间.

postgres@localhost:5436 tutorial# SELECT '2012-02-28 10:00 PM America/Los_Angeles'::timestamptz;
      timestamptz
------------------------
 2012-02-29 14:00:00+08
(1 行记录)

postgres@localhost:5436 tutorial# SELECT '2025-11-05 10:00 PM Asia/Shanghai'::timestamptz;
      timestamptz
------------------------
 2025-11-05 22:00:00+08
(1 行记录)

假设当前时间是 2025年11月6日 15:40, 那么输入下面的会发生错误.

postgres@localhost:5436 tutorial# SELECT '2025-11-06 15:40 PM Asia/Shanghai'::timestamptz;
ERROR:  date/time field value out of range: "2025-11-06 15:40 PM Asia/Shanghai"
第1行SELECT '2025-11-06 15:40 PM Asia/Shanghai'::timestamptz;
            ^
时间：11.347 ms

原因是2025年11月6日 15:40是我们这边的本地时间.

例. 将带时区信息的时间戳数据转换为不带时区的时间戳数据. 也就是将 timestamptz(timestamp with time zone)类型的数据转换为 timestamp (timestamp without time zone)数据.

假设洛杉矶本地时间是 2012年2月28日晚上10:00, 问此时巴黎当地时间是几号几点?

postgres@localhost:5436 tutorial# SELECT '2012-02-28 10:00 PM America/Los_Angeles'::timestamptz AT TIME ZONE 'Europe/Paris';
      timezone
---------------------
 2012-02-29 07:00:00
(1 行记录)

巴黎比洛杉矶早9个小时.

假设现在扬州本地时间是 2025年11月06日下午3点50分, 请问此时巴黎当地时间是几号几点, 此时洛杉矶当地时间是多少?

postgres@localhost:5436 tutorial# SELECT '2025-11-06 3:50 PM Asia/Shanghai'::timestamptz AT TIME ZONE 'Europe/Paris';
      timezone
---------------------
 2025-11-06 08:50:00
(1 行记录)

巴黎比我们晚7个小时.

postgres@localhost:5436 tutorial# SELECT '2025-11-06 3:50 PM Asia/Shanghai'::timestamptz AT TIME ZONE 'America/Los_Angeles';
      timezone
---------------------
 2025-11-05 23:50:00
(1 行记录)

洛杉矶比我们晚16个小时.

日期时间类型的运算符和函数

时间间隔类型(interval)的引入极大简化了 PostgreSQL 中日期和时间类型的数学运算过程.

如果没有 interval 类型, 我们就得创建一堆专门的函数来实现这些运算功能. 通过 interval 类型, 我们可以使用熟悉的加减运算符对日期和时间进行相加或相减的操作.

下面的例子展示了可用于日期和时间类型的运算符和函数.

运算符可以在一个时间类型值上加上一段时间间隔.

将时间值 2025年11月06日下午3点50分加上一个小时.

postgres@localhost:5436 tutorial# SELECT '2025-11-06 3:50 PM'::timestamp + interval '1 hour';
      ?column?
---------------------
 2025-11-06 16:50:00
(1 行记录)

时间间隔可以是 1.5 个小时这种值.

postgres@localhost:5436 tutorial# SELECT '2025-11-06 3:50 PM'::timestamp + interval '1.5 hour';
      ?column?
---------------------
 2025-11-06 17:20:00
(1 行记录)

可以将两个时间间隔类型的值相加. 表示时间间隔时, 可以在字符串前加上 interval, 或者字符串后面跟::interval.

postgres@localhost:5436 tutorial# SELECT '23 hours 20 minutes'::interval + interval '1.5 hour';
 ?column?
----------
 24:50:00
(1 行记录)

postgres@localhost:5436 tutorial# SELECT '23 hours 20 minutes'::interval + '1.5 hour'::interval;
 ?column?
----------
 24:50:00
(1 行记录)

可以将时间类型值减去一段时间间隔. 例, 将时间值 2025年11月06日下午3点50分减去两个小时.

postgres@localhost:5436 tutorial# SELECT '2025-11-06 3:50 PM'::timestamp - '2 hours'::interval;
      ?column?
---------------------
 2025-11-06 13:50:00
(1 行记录)

区间重叠运算符OVERLAPS

区间重叠运算符 OVERLAPS 用于判断两个时间区间是否相重叠, 用集合的语言即两个时间区间是否相交非空. 如果是则返回 true; 否则返回 false.

postgres@localhost:5436 tutorial# SELECT ('2025-11-06 10:00 AM'::timestamp, '2025-11-06 2:00 PM'::timestamp)
tutorial-# OVERLAPS ('2025-11-06 11:00 AM'::timestamp, '2025-11-06 3:00 PM'::timestamp);
 overlaps
----------
 t
(1 行记录)

postgres@localhost:5436 tutorial# SELECT ('2025-10-06'::date, '2025-11-06'::date) OVERLAPS ('2025-11-12'::date, '2025-11-21'::date);
 overlaps
----------
 f
(1 行记录)

一些时间类型的函数

除了运算符以外, PostgreSQL 还提供了一些时间类型的函数. 参见 https://www.postgresql.org/docs/current/functions-datetime.html

generate_series() 函数也可以用来生成时间序列组成的数组.

postgres@localhost:5436 tutorial# SELECT generate_series('2025/01/02', '2025/11/02', interval '1 month') As dt;
           dt
------------------------
 2025-01-02 00:00:00+08
 2025-02-02 00:00:00+08
 2025-03-02 00:00:00+08
 2025-04-02 00:00:00+08
 2025-05-02 00:00:00+08
 2025-06-02 00:00:00+08
 2025-07-02 00:00:00+08
 2025-08-02 00:00:00+08
 2025-09-02 00:00:00+08
 2025-10-02 00:00:00+08
 2025-11-02 00:00:00+08
(11 行记录)

将上面结果集中的每个时间值减去一天.

postgres@localhost:5436 tutorial# SELECT (dt - interval '1 day')::date As eom
tutorial-# FROM generate_series('2025/01/02', '2025/11/02', interval '1 month') As dt;
    eom
------------
 2025-01-01
 2025-02-01
 2025-03-01
 2025-04-01
 2025-05-01
 2025-06-01
 2025-07-01
 2025-08-01
 2025-09-01
 2025-10-01
 2025-11-01
(11 行记录)

从日期时间类型中提取部分元素

另一种经常使用的操作是从日期和时间类型的数值中抽取某一部分. 在 PostgreSQL 中, 联用 date_part 和 to_char 函数可以实现此目标.

SELECT dt, date_part('hour', dt) As hour, to_char(dt, 'HH12:MI AM') As tm
FROM
generate_series('2025-11-05 12:30 AM', '2025-11-06 3:00 AM', interval '15 minutes') As dt;

postgres@localhost:5436 tutorial# SELECT dt, date_part('hour', dt) As hour, to_char(dt, 'HH12:MI AM') As tm
tutorial-# FROM
tutorial-# generate_series('2025-11-05 12:30 AM', '2025-11-06 3:00 AM', interval '15 minutes') As dt;
           dt           | hour |    tm
------------------------+------+----------
 2025-11-05 00:30:00+08 |    0 | 12:30 AM
 2025-11-05 00:45:00+08 |    0 | 12:45 AM
 2025-11-05 01:00:00+08 |    1 | 01:00 AM
 2025-11-05 01:15:00+08 |    1 | 01:15 AM
 2025-11-05 01:30:00+08 |    1 | 01:30 AM
 2025-11-05 01:45:00+08 |    1 | 01:45 AM
 2025-11-05 02:00:00+08 |    2 | 02:00 AM
 2025-11-05 02:15:00+08 |    2 | 02:15 AM
 2025-11-05 02:30:00+08 |    2 | 02:30 AM
 2025-11-05 02:45:00+08 |    2 | 02:45 AM
 2025-11-05 03:00:00+08 |    3 | 03:00 AM
 2025-11-05 03:15:00+08 |    3 | 03:15 AM
 2025-11-05 03:30:00+08 |    3 | 03:30 AM
 2025-11-05 03:45:00+08 |    3 | 03:45 AM
 2025-11-05 04:00:00+08 |    4 | 04:00 AM
 2025-11-05 04:15:00+08 |    4 | 04:15 AM
 2025-11-05 04:30:00+08 |    4 | 04:30 AM
 2025-11-05 04:45:00+08 |    4 | 04:45 AM
 2025-11-05 05:00:00+08 |    5 | 05:00 AM
 2025-11-05 05:15:00+08 |    5 | 05:15 AM
 2025-11-05 05:30:00+08 |    5 | 05:30 AM
 2025-11-05 05:45:00+08 |    5 | 05:45 AM
 2025-11-05 06:00:00+08 |    6 | 06:00 AM
 2025-11-05 06:15:00+08 |    6 | 06:15 AM
 2025-11-05 06:30:00+08 |    6 | 06:30 AM
 2025-11-05 06:45:00+08 |    6 | 06:45 AM
 2025-11-05 07:00:00+08 |    7 | 07:00 AM
 2025-11-05 07:15:00+08 |    7 | 07:15 AM
 2025-11-05 07:30:00+08 |    7 | 07:30 AM
 2025-11-05 07:45:00+08 |    7 | 07:45 AM
 2025-11-05 08:00:00+08 |    8 | 08:00 AM
 2025-11-05 08:15:00+08 |    8 | 08:15 AM
 2025-11-05 08:30:00+08 |    8 | 08:30 AM
 2025-11-05 08:45:00+08 |    8 | 08:45 AM
 2025-11-05 09:00:00+08 |    9 | 09:00 AM
 2025-11-05 09:15:00+08 |    9 | 09:15 AM
 2025-11-05 09:30:00+08 |    9 | 09:30 AM
 2025-11-05 09:45:00+08 |    9 | 09:45 AM
 2025-11-05 10:00:00+08 |   10 | 10:00 AM
 2025-11-05 10:15:00+08 |   10 | 10:15 AM
 2025-11-05 10:30:00+08 |   10 | 10:30 AM
 2025-11-05 10:45:00+08 |   10 | 10:45 AM
 2025-11-05 11:00:00+08 |   11 | 11:00 AM
 2025-11-05 11:15:00+08 |   11 | 11:15 AM
 2025-11-05 11:30:00+08 |   11 | 11:30 AM
 2025-11-05 11:45:00+08 |   11 | 11:45 AM
 2025-11-05 12:00:00+08 |   12 | 12:00 PM
 2025-11-05 12:15:00+08 |   12 | 12:15 PM
 2025-11-05 12:30:00+08 |   12 | 12:30 PM
 2025-11-05 12:45:00+08 |   12 | 12:45 PM
 2025-11-05 13:00:00+08 |   13 | 01:00 PM
 2025-11-05 13:15:00+08 |   13 | 01:15 PM
 2025-11-05 13:30:00+08 |   13 | 01:30 PM
 2025-11-05 13:45:00+08 |   13 | 01:45 PM
 2025-11-05 14:00:00+08 |   14 | 02:00 PM
 2025-11-05 14:15:00+08 |   14 | 02:15 PM
 2025-11-05 14:30:00+08 |   14 | 02:30 PM
 2025-11-05 14:45:00+08 |   14 | 02:45 PM
 2025-11-05 15:00:00+08 |   15 | 03:00 PM
 2025-11-05 15:15:00+08 |   15 | 03:15 PM
 2025-11-05 15:30:00+08 |   15 | 03:30 PM
 2025-11-05 15:45:00+08 |   15 | 03:45 PM
 2025-11-05 16:00:00+08 |   16 | 04:00 PM
 2025-11-05 16:15:00+08 |   16 | 04:15 PM
 2025-11-05 16:30:00+08 |   16 | 04:30 PM
 2025-11-05 16:45:00+08 |   16 | 04:45 PM
 2025-11-05 17:00:00+08 |   17 | 05:00 PM
 2025-11-05 17:15:00+08 |   17 | 05:15 PM
 2025-11-05 17:30:00+08 |   17 | 05:30 PM
 2025-11-05 17:45:00+08 |   17 | 05:45 PM
 2025-11-05 18:00:00+08 |   18 | 06:00 PM
 2025-11-05 18:15:00+08 |   18 | 06:15 PM
 2025-11-05 18:30:00+08 |   18 | 06:30 PM
 2025-11-05 18:45:00+08 |   18 | 06:45 PM
 2025-11-05 19:00:00+08 |   19 | 07:00 PM
 2025-11-05 19:15:00+08 |   19 | 07:15 PM
 2025-11-05 19:30:00+08 |   19 | 07:30 PM
 2025-11-05 19:45:00+08 |   19 | 07:45 PM
 2025-11-05 20:00:00+08 |   20 | 08:00 PM
 2025-11-05 20:15:00+08 |   20 | 08:15 PM
 2025-11-05 20:30:00+08 |   20 | 08:30 PM
 2025-11-05 20:45:00+08 |   20 | 08:45 PM
 2025-11-05 21:00:00+08 |   21 | 09:00 PM
 2025-11-05 21:15:00+08 |   21 | 09:15 PM
 2025-11-05 21:30:00+08 |   21 | 09:30 PM
 2025-11-05 21:45:00+08 |   21 | 09:45 PM
 2025-11-05 22:00:00+08 |   22 | 10:00 PM
 2025-11-05 22:15:00+08 |   22 | 10:15 PM
 2025-11-05 22:30:00+08 |   22 | 10:30 PM
 2025-11-05 22:45:00+08 |   22 | 10:45 PM
 2025-11-05 23:00:00+08 |   23 | 11:00 PM
 2025-11-05 23:15:00+08 |   23 | 11:15 PM
 2025-11-05 23:30:00+08 |   23 | 11:30 PM
 2025-11-05 23:45:00+08 |   23 | 11:45 PM
 2025-11-06 00:00:00+08 |    0 | 12:00 AM
 2025-11-06 00:15:00+08 |    0 | 12:15 AM
 2025-11-06 00:30:00+08 |    0 | 12:30 AM
 2025-11-06 00:45:00+08 |    0 | 12:45 AM
 2025-11-06 01:00:00+08 |    1 | 01:00 AM
 2025-11-06 01:15:00+08 |    1 | 01:15 AM
 2025-11-06 01:30:00+08 |    1 | 01:30 AM
 2025-11-06 01:45:00+08 |    1 | 01:45 AM
 2025-11-06 02:00:00+08 |    2 | 02:00 AM
 2025-11-06 02:15:00+08 |    2 | 02:15 AM
 2025-11-06 02:30:00+08 |    2 | 02:30 AM
 2025-11-06 02:45:00+08 |    2 | 02:45 AM
 2025-11-06 03:00:00+08 |    3 | 03:00 AM
(107 行记录)

字符串类型

PostgreSQL 中的字符串类型有:

CHAR(n) 或 CHARACTER(n);
VARCHAR(n) 或 CHARACTER VARYING(n);
TEXT.

这里 n 是正整数.

CHAR(n)

CHAR(n) 或 CHARACTER(n) 是定长字符串类型. 声明为此类型的字段, 如果插入长度超过 n 的字符串, 则该字符串会被截断. (相当于使用了 C++ 中的 substr(0,n) 函数.)

VARCHAR(n)

声明为 VARCHAR(n) 或 CHARACTER VARYING(n) 的字段可以最多存储 n 个字符.

TEXT

TEXT 类型可以存储任何长度的字符串.

各种类型间的差异

varchar 和 text 适用于存储长度可变化的文本, 这两种类型都是按需分配空间存储, 它们的存储方式是完全一致的, 性能表现也没有差别.

char 类型占用的存储空间是固定的, 适用于如邮政编码、电话号码以及社会保险号等定长字符串的存储.

如果存储的字符串长度达不到 char(n) 中定义的长度 n, 则会在后面用空格填充, 不管存储时还是显示时都是这样. 这种模式对于存储空间有所浪费, 但这也是 ANSI SQL 标准中规定的做法. 除此以外, 在 PostgreSQL 中, char 和 varchar 没有别的性能差别.

没有大小修饰的 varchar 与 text 之间几乎没有什么差别. 对于 text 列来说, 不管它所包含的字符有多少, 你都可以对其进行排序. 有些数据库驱动程序（比如 ODBC）可能会对二者的处理略有差别.

varchar 与 text 的存储空间上限均为约 1 GB, 但事实上系统在后台会把超过一个物理存储页大小的内容用 TOAST 机制处理.

PostgreSQL 的 TOAST（The Oversized-Attribute Storage Technique, 超大属性存储技术）是针对大尺寸数据（如长文本、二进制数据等）的存储优化机制, 当字段数据超过阈值时, 会自动将其压缩或拆分后存储到独立的 TOAST 表中, 主表仅保留引用指针, 既解决了单条记录存储容量受限问题, 又通过透明操作、多种存储策略和独立表设计, 平衡了存储效率与访问性能, 对 TEXT、BYTEA、JSONB 等可能存储大数据的类型尤为有效.

有时候, 为了保持跨平台应用的兼容性, 你需要使字符串类型的操作变得不区分大小写. 要实现此目标, 你需要重写那些区分大小写的比较运算符. 相比 text, 对varchar进行运算符重写要容易一些.

字符串函数

常见的字符串操作包括:

填充(lpad、rpad);
修整空白(rtrim、ltrim、trim、btrim);
提取子字符串(substring);
连接(||)

使用 lpad 和 rpad 进行填充操作.

postgres@localhost:5436 tutorial# SELECT lpad('ab',4,'0') As ab_lpad,
tutorial-# rpad('ab',4,'0') As ab_rpad,
tutorial-# lpad('abcde',4,'0') As ab_lpad_trunc;
 ab_lpad | ab_rpad | ab_lpad_trunc
---------+---------+---------------
 00ab    | ab00    | abcd
(1 行记录)

从上面看到, 如果字符串超过指定长度, lpad 不但不会填充, 反而会对其进行截断.

默认情况下, trim 函数用于移除空格, 但你也可以传入一个可选实参, 指示要剪裁的其他字符.

下面的代码, 先使用 generate_series(0,200,50) 生成一个序列, 记作 i. 这个序列为 (0,50,100,150,200)^T.

SELECT
a As a_before, trim(a) As a_trim, rtrim(a) As a_rtrim,
i As i_before, ltrim(i,'0') As i_ltrim_0, rtrim(i,'0') As i_rtrim_0, trim(i,'0') As i_trim_0
FROM (
       SELECT repeat(' ',4) || i || repeat(' ',4) As a,
              '0' || i As i
       FROM generate_series(0,200,50) As i
      ) As x;

postgres@localhost:5436 tutorial# SELECT
tutorial-# a As a_before, trim(a) As a_trim, rtrim(a) As a_rtrim,
tutorial-# i As i_before, ltrim(i,'0') As i_ltrim_0, rtrim(i,'0') As i_rtrim_0, trim(i,'0') As i_trim_0
tutorial-# FROM (
tutorial(#       SELECT repeat(' ',4) || i || repeat(' ',4) As a,
tutorial(#              '0' || i As i
tutorial(#       FROM generate_series(0,200,50) As i
tutorial(#      ) As x;
  a_before   | a_trim | a_rtrim | i_before | i_ltrim_0 | i_rtrim_0 | i_trim_0
-------------+--------+---------+----------+-----------+-----------+----------
     0       | 0      |     0   | 00       |           |           |
     50      | 50     |     50  | 050      | 50        | 05        | 5
     100     | 100    |     100 | 0100     | 100       | 01        | 1
     150     | 150    |     150 | 0150     | 150       | 015       | 15
     200     | 200    |     200 | 0200     | 200       | 02        | 2
(5 行记录)

postgres@localhost:5436 tutorial# SELECT repeat(' ',4) || i || repeat(' ',4) As a,
tutorial-#               '0' || i As i
tutorial-#        FROM generate_series(0,200,50) As i
tutorial-# ;
      a      |  i
-------------+------
     0       | 00
     50      | 050
     100     | 0100
     150     | 0150
     200     | 0200
(5 行记录)

如果将其中的空格换成其他符号, 比如 x 和 *. 则变为

postgres@localhost:5436 tutorial# SELECT repeat('x',4) || i || repeat('*',4) As a,
tutorial-#               '0' || i As i
tutorial-#        FROM generate_series(0,200,50) As i;
      a      |  i
-------------+------
 xxxx0****   | 00
 xxxx50****  | 050
 xxxx100**** | 0100
 xxxx150**** | 0150
 xxxx200**** | 0200
(5 行记录)

实验

执行下面的查询.

SELECT
a As a_before, trim(a,'x') As a_trim, rtrim(a,'x') As a_rtrim,
i As i_before, ltrim(i,'0') As i_ltrim_0, rtrim(i,'0') As i_rtrim_0, trim(i,'0') As i_trim_0
FROM (
       SELECT repeat('x',4) || i || repeat('x',4) As a,
              '0' || i As i
       FROM generate_series(0,200,50) As i
      ) As x;

postgres@localhost:5436 tutorial# SELECT
tutorial-# a As a_before, trim(a,'x') As a_trim, rtrim(a,'x') As a_rtrim,
tutorial-# i As i_before, ltrim(i,'0') As i_ltrim_0, rtrim(i,'0') As i_rtrim_0, trim(i,'0') As i_trim_0
tutorial-# FROM (
tutorial(#        SELECT repeat('x',4) || i || repeat('x',4) As a,
tutorial(#               '0' || i As i
tutorial(#        FROM generate_series(0,200,50) As i
tutorial(#       ) As x;
  a_before   | a_trim | a_rtrim | i_before | i_ltrim_0 | i_rtrim_0 | i_trim_0
-------------+--------+---------+----------+-----------+-----------+----------
 xxxx0xxxx   | 0      | xxxx0   | 00       |           |           |
 xxxx50xxxx  | 50     | xxxx50  | 050      | 50        | 05        | 5
 xxxx100xxxx | 100    | xxxx100 | 0100     | 100       | 01        | 1
 xxxx150xxxx | 150    | xxxx150 | 0150     | 150       | 015       | 15
 xxxx200xxxx | 200    | xxxx200 | 0200     | 200       | 02        | 2
(5 行记录)

将字符串拆分为数组、表或者子字符串

PostgreSQL 中有一些函数可以对字符串进行拆分操作, 比如 split_part()、string_to_array() 等.

split_part

split_part()

postgres@localhost:5436 tutorial# SELECT split_part('www.atzjg.net','.',2) As x;
   x
-------
 atzjg
(1 行记录)

分隔符也可以是一个字符串.

postgres@localhost:5436 tutorial# SELECT split_part('www.atzjg.net','zjg',2) As y;
  y
------
 .net
(1 行记录)

string_to_array

string_to_array() 函数可以将基于固定分隔符的字符串拆分为一个数组.

unnest() 函数用于将数组展开为一些行向量. 当你需要在查询中处理数组的单个元素时, 这个函数尤其有用.

这里我们将 string_to_array() 函数和 unnest() 函数结合使用, 将一个字符串在分隔符下展开为若干记录行.

postgres@localhost:5436 tutorial# SELECT unnest(string_to_array('www.atzjg.net','.')) As z;
   z
-------
 www
 atzjg
 net
(3 行记录)

正则表达式

正则表达式和模式匹配

PostgreSQL 中的 regexp_replace() 函数是一个强大的工具, 用于根据正则表达式模式替换字符串中的子字符串. 它支持高级文本操作, 非常适合用于数据清理和转换任务, 是用于动态文本转换的重要工具.

其语法为

regexp_replace(string, pattern, replacement[,flags])

或者

regexp_replace(string, pattern, replacement, start[, N[, flags]])

source: 源字符串.
pattern: 待匹配的正则表达式, 或称模式字符串或模式.
replacement: 替换匹配模式字符串的字符串.
flag: 这是可选参数, 控制替换时的行为, 例如 'g' 代表整体替换, 'i' 代表大小写不敏感.

如果在源字符串 string 中未找到与 pattern 匹配的子字符串, 则返回源字符串.
如果存在匹配子字符串, 则源字符串在被 replacement 字符串替换后被返回.

postgres@localhost:5436 tutorial# SELECT REGEXP_REPLACE('apple pie', 'apple', 'banana');
 regexp_replace
----------------
 banana pie
(1 行记录)

postgres@localhost:5436 tutorial# SELECT REGEXP_REPLACE('apple pie', 'Apple', 'banana');
 regexp_replace
----------------
 apple pie
(1 行记录)

postgres@localhost:5436 tutorial# SELECT REGEXP_REPLACE('apple pie', 'Apple', 'banana', 'i');
 regexp_replace
----------------
 banana pie
(1 行记录)

使用回溯引用(Using Backreferences)

postgres@localhost:5436 tutorial# SELECT REGEXP_REPLACE('123-456-7890', '(\d{3})-(\d{3})-(\d{4})', E'(\\1) \\2-\\3');
 regexp_replace
----------------
 (123) 456-7890
(1 行记录)

圆括号(小括号) () 用于分组.
\是转义符, 由于圆括号()已经用于分组, 如果要显示圆括号, 则需要转义, 即使用 $ 和 $.
\d 表示数字, \d{3} 表示匹配三个数字.
E'' 是 PostgreSQL 的构造符语法, 将后续所跟字符串视作一个表达式, 根据需要进行扩展.
\\1 表示引用第一个匹配到的字符串.

这里回溯引用如果不加 E, 则会得到如下结果.

postgres@localhost:5436 tutorial# SELECT REGEXP_REPLACE('123-456-7890', '(\d{3})-(\d{3})-(\d{4})', '(\\1) \\2-\\3');
 regexp_replace
----------------
 (\1) \2-\3
(1 行记录)

可以看到, 这里的 \\\\1 被解释为 \1, 且没有被扩展.

如果 \\\\1 改成 \1 等, 则会得到如下结果.

postgres@localhost:5436 tutorial# SELECT REGEXP_REPLACE('123-456-7890', '(\d{3})-(\d{3})-(\d{4})', E'(\1) \2-\3');
  regexp_replace
------------------
 (\x01) \x02-\x03
(1 行记录)

这里将 \1 扩展为 \x01.

注意事项

正则表达式在大数据集上的计算开销较大, 请谨慎使用.
在 SQL 字符串中对特殊字符使用双反斜杠进行转移, 例如这里的回溯引用 \1 在 SQL 中要写为 \\1.

例. 下面的SQL语句返回的结果是什么?


SELECT regexp_replace('6197306254','([0-9]{3})([0-9]{3})([0-9]{4})',E'\(\\1\) \\2-\\3') As x;

postgres@localhost:5436 tutorial# SELECT regexp_replace(
tutorial(#   '6197306254',
tutorial(#   '([0-9]{3})([0-9]{3})([0-9]{4})',
tutorial(#   E'\(\\1\) \\2-\\3'
tutorial(# ) As x;
       x
----------------
 (619) 730-6254
(1 行记录)

regexp_matches() 函数会返回根据一个正则表达式筛选匹配得到的字符串数组. 如果不传入 g 形参, 则仅返回第一个命中的字符串. g 表示 global, 即需要进行完整搜索并返回所有匹配上的字符串, 每个字符串作为数组中的一个元素.

unnest() 函数将一个数组分解为一个行集(即列向量).

SELECT unnest(regexp_matches( 'Cell (619)852-5083. Casa 619-730-6254. Bésame mucho.',
                              E'[(]{0,1}[0-9]{3}[)-.]{0,1}[0-9]{3}[-.]{0,1}[0-9]{4}',
							  'g')
	   ) As x;

[]: 匹配方括号中所列的单个字符, 后面如果有花括号, 则表示可以出现的次数.

[(]{0,1} 表示匹配单个字符(, 其出现的次数是 0 次或 1 次.
[0-9]{3} 表示接着跟三个数字, 范围是0到9之间.
[)-.]{0,1} 表示紧接着是 ) 或 - 或 ., 它们出现的次数是 0 次或 1 次.
[0-9]{3} 表示再跟三个0到9之间的数字.
[-.]{0,1} 表示后跟- 或 ., 它们出现的次数是 0 次或 1 次.
[0-9]{4} 表示最后跟四个0到9之间的数字.

注意事项

同一个正则表达式可以有多种写法. 比如 \d 代表 [0-9]. 建议采用更容易理解的写法.

可以将正则表达式与相似运算符 ~ 一起使用.

以下查询可以查出所有内嵌了电话号码的字符串.

SELECT description
FROM mytable
WHERE description ~ E'[(]{0,1}[0-9]{3}[)-.]{0,1}[0-9]{3}[-.]{0,1}[0-9]{4}';

二进制类型

PostgreSQL 支持两类字符型数据: 文本字符串和二进制字符串.

对于二进制字符串的存储, PostgreSQL 提供了 BYTEA 类型. BYTEA 类型存储空间为4字节加上实际的二进制字符串.

postgres@localhost:5436 tutorial# CREATE TABLE t1(b BYTEA);
CREATE TABLE
时间：16.390 ms

postgres@localhost:5436 tutorial# INSERT INTO t1 VALUES (E'\000');
ERROR:  invalid byte sequence for encoding "UTF8": 0x00
时间：11.965 ms
postgres@localhost:5436 tutorial# INSERT INTO t1 VALUES (E'\\000');
INSERT 0 1
时间：10.865 ms

postgres@localhost:5436 tutorial# SELECT * FROM t1;
  b
------
 \x00
(1 行记录)

布尔类型

BOOLEAN 类型是 PostgreSQL 所提供的布尔类型. 其存储只需一个字节, 即存储 TRUE 和 FALSE 两个值.

用户也可以使用文本值替代 TRUE 和 FALSE.

替代 TRUE 的文本值可以是: 't', 'true', 'y', 'yes', '1';
替代 FALSE 的文本值可以是: 'f', 'false', 'n', 'no', '0'.

实验

创建表 tmp11(b BOOLEAN). 插入一些布尔值.

数组类型

数组在 PostgreSQL 中扮演着重要的角色. 它在构造聚合函数、形成 IN 和 ANY 子句、承载数据类型转换过程中生成的中间值等领域发挥着重要作用.

在 PostgreSQL 中, 每种数据类型都有相应的以该类型为基础的数组类型. 如果你自定义了一个数据类型, 那么 PostgreSQL 会在后台自动为此类型创建一个数组类型. 例如,

integer 整数类型有一个相应的整数数组类型 integer[];
字符类型character也有相应的字符数组类型 character[];

PostgreSQL 支持多维数组. 例如,

primes INT[];
A INT[3][3];
remark TEXT[][];

第一行定义了一个名为 primes 的整数数组;
第二行定义了一个名为 A 的二维数组, 并且给出了每一维中元素的个数.
第三行定义了一个名为 remark 的二维数组, 并未声明每一维的长度.

目前 PostgreSQL 并不强制数组的长度.

如果是一维数组, 也可以使用 SQL 标准声明, SQL 语句如下:

PAY_BY_QUARTER INT ARRAY[5];

此种声明方式仍然不强制数组的长度.

插入数组数值

插入数组元素时, 用大括号把数组元素括起来并且用逗号将它们分开.

postgres@localhost:5436 tutorial# CREATE TABLE tmp12(bt int[]);
CREATE TABLE
时间：101.550 ms
postgres@localhost:5436 tutorial# INSERT INTO tmp12 VALUES(2,3,5,7,11);
ERROR:  INSERT has more expressions than target columns
第1行INSERT INTO tmp12 VALUES(2,3,5,7,11);
                                ^
时间：72.312 ms
postgres@localhost:5436 tutorial# SELECT * FROM tmp12;
 bt
----
(0 行记录)

postgres@localhost:5436 tutorial# INSERT INTO tmp12 VALUES({2,3,5,7,11});
ERROR:  syntax error at or near "{"
第1行INSERT INTO tmp12 VALUES({2,3,5,7,11});
                              ^
时间：19.539 ms

注意使用正确的语法.

postgres@localhost:5436 tutorial# INSERT INTO tmp12 VALUES('{2,3,5,7,11}');
INSERT 0 1
时间：40.035 ms

postgres@localhost:5436 tutorial# INSERT INTO tmp12 VALUES('{1,2,3},{3,5,7},{11,23,0}');
ERROR:  malformed array literal: "{1,2,3},{3,5,7},{11,23,0}"
第1行INSERT INTO tmp12 VALUES('{1,2,3},{3,5,7},{11,23,0}');
                              ^
描述:  Junk after closing right brace.
时间：25.851 ms

postgres@localhost:5436 tutorial# INSERT INTO tmp12 VALUES('{{1,2,3},{3,5,7},{11,23,0}}');
INSERT 0 1
时间：38.187 ms
postgres@localhost:5436 tutorial# SELECT * FROM tmp12;
             bt
-----------------------------
 {2,3,5,7,11}
 {{1,2,3},{3,5,7},{11,23,0}}
(2 行记录)


postgres@localhost:5436 tutorial# SELECT bt[2] FROM tmp12;
  bt
------
    3
 NULL
(2 行记录)

postgres@localhost:5436 tutorial# SELECT bt[2][3] FROM tmp12;
  bt
------
 NULL
    7
(2 行记录)

bt[2] 表示bt这一列作为数组, 取其第2个元素. 因此第一行 {2,3,5,7,11} 作为一维数组, bt[2] 的值是 3. 注意 PostgreSQL 对于数组标号是从1开始的.

但第二行 {{1,2,3},{3,5,7},{11,23,0}}是数组的数组, 可以认为是二维数组, 因此 bt[2] 返回 NULL. 要使用 bt[][] 的形式才能返回相应位置的元素值.

如果要返回第二行中的元素 {3,5,7}, 则可以使用下面的语句:

select bt[2:2] from tmp12;

或者

select bt[2:2][1:3] from tmp12;

也就是, 不同维数的数组类型应使用不同的数组索引形式去获取其元素或子集.

数组构造函数

最基本的构造数组的方法就是一个个元素手动录入, 语法如下:

postgres@localhost:5436 tutorial# SELECT ARRAY[2001,2002,2003] As yrs;
       yrs
------------------
 {2001,2002,2003}
(1 行记录)

你可以把一个直接以字符串书写的数组转换为一个真正的数组, 语法如下:

postgres@localhost:5436 tutorial# SELECT '{Alex,Sonia}'::text[] As name, '{43,40}'::smallint[] As age;
     name     |   age
--------------+---------
 {Alex,Sonia} | {43,40}
(1 行记录)

可以用 string_to_array() 函数将一个用固定分隔符分隔的字符串转换为数组.

postgres@localhost:5436 tutorial# SELECT string_to_array('数学分析,高等代数,常微分方程,实变函数',',') As 课程;
                  课程
-----------------------------------------
 {数学分析,高等代数,常微分方程,实变函数}
(1 行记录)

array_agg() 是一种变型聚合函数, 它可以采用一组任何类型的数据并将其转换为数组.

SELECT array_agg(log_ts ORDER BY log_ts) As x
FROM logs
WHERE log_ts BETWEEN '2025-11-01'::timestamptz AND '2025-11-06'::timestamptz;

数组中元素的引用

在 PostgreSQL 中, 可以用数组下标来引用数组元素, 与 C/C++ 不一样的是, PostgreSQL 的数组下标是从 1 开始. 并且, 如果越界访问一个数组并不会返回错误, 而是会得到一个空值(NULL).

postgres@localhost:5436 postgresql_book# SELECT fact_subcats[1] As primero,
postgresql_book-# fact_subcats[array_upper(fact_subcats, 1)] As segundo
postgresql_book-# FROM census.lu_fact_types;
 primero |                     segundo
---------+--------------------------------------------------
 S01     | Number
 S02     | Number
 S03     | Number
 S04     | Number
 S05     | Number
 S06     | Number
 S07     | Number
 S08     | Number
 S09     | Number
 S10     | Number
 S11     | Number
 S12     | Number
 S13     | Number
 S14     | Number
 S15     | Number
 S16     | Number
 S17     | Number
 S18     | Number
 S19     | Number
 S20     | Number
 S21     | Number
 S22     | Number
 S23     | Number
 S24     | Number
 S25     | Number
 S26     | Number
 S27     | Number
 S28     | Number
 S29     | Number
 S30     | Number
 S31     | Number
 S32     | Number
 S33     | Number
 S34     | Number
 S35     | Number
 S36     | Number
 S37     | Number
 S38     | Number
 S39     | Number
 S40     | Number
 S41     | Number
 S42     | Number
 S43     | Number
 S44     | Number
 S45     | Number
 S46     | Number
 S47     | Number
 S48     | Number
 S49     | Number
 S50     | Number
 S51     | Number
 D001    | Total:
 D002    | Not Hispanic or Latino:
 D003    | White alone
 D004    | Black or African American alone
 D005    | American Indian and Alaska Native alone
 D006    | Asian alone
 D007    | Native Hawaiian and Other Pacific Islander alone
 D008    | Some Other Race alone
 D009    | Two or More Races
 D010    | Hispanic or Latino:
 D011    | White alone
 D012    | Black or African American alone
 D013    | American Indian and Alaska Native alone
 D014    | Asian alone
 D015    | Native Hawaiian and Other Pacific Islander alone
 D016    | Some Other Race alone
 D017    | Two or More Races
(68 行记录)

这里 array_upper() 函数用来获取数组元素的个数. 该函数的第二个必需的形参代表数组的维度. 这里 array_upper(fact_subcats, 1) 表示获取一维数组 fact_subcats 的元素个数.

postgres@localhost:5436 postgresql_book# SELECT array_upper(fact_subcats, 1) FROM census.lu_fact_types;
 array_upper
-------------
           5
           6
           6
           5
           6
           7
           7
           6
           5
           6
           6
           6
           6
           6
           6
           6
           5
           6
           7
           8
           8
           8
           8
           8
           8
           8
           7
           8
           8
           8
           8
           8
           8
           8
           6
           7
           8
           8
           8
           8
           8
           8
           8
           7
           8
           8
           8
           8
           8
           8
           8
           2
           3
           4
           4
           4
           4
           4
           4
           4
           3
           4
           4
           4
           4
           4
           4
           4
(68 行记录)

数组的拆分与连接

PostgreSQL 支持使用 start:end 语法对数组进行拆分. 操作结果是原数组的一个子数组.

下面列出表 census.lu_fact_types 中 fact_subcats 属性中所存数组的第2至第4个元素. 由于总共有 68 行, 故为了篇幅, 这里只列出前10行记录.

postgres@localhost:5436 postgresql_book# SELECT fact_subcats[2:4] FROM census.lu_fact_types limit 10;
                            fact_subcats
---------------------------------------------------------------------
 {"OCCUPANCY STATUS","Total housing units",HD01}
 {"OCCUPANCY STATUS","Total housing units","Occupied housing units"}
 {"OCCUPANCY STATUS","Total housing units","Vacant housing units"}
 {TENURE,"Occupied housing units",HD01}
 {TENURE,"Occupied housing units","Owner occupied"}
 {TENURE,"Occupied housing units","Owner occupied"}
 {TENURE,"Occupied housing units","Owner occupied"}
 {TENURE,"Occupied housing units","Renter occupied"}
 {"VACANCY STATUS","Vacant housing units",HD01}
 {"VACANCY STATUS","Vacant housing units","For rent"}
(10 行记录)

注意 limit 后面只能跟非负整数.

如果要将两个数组连接到一起, 则可以使用连接运算符 ||.

postgres@localhost:5436 postgresql_book# SELECT fact_subcats[1:2] || fact_subcats[3:4] FROM census.lu_fact_types limit 10;
                                ?column?
-------------------------------------------------------------------------
 {S01,"OCCUPANCY STATUS","Total housing units",HD01}
 {S02,"OCCUPANCY STATUS","Total housing units","Occupied housing units"}
 {S03,"OCCUPANCY STATUS","Total housing units","Vacant housing units"}
 {S04,TENURE,"Occupied housing units",HD01}
 {S05,TENURE,"Occupied housing units","Owner occupied"}
 {S06,TENURE,"Occupied housing units","Owner occupied"}
 {S07,TENURE,"Occupied housing units","Owner occupied"}
 {S08,TENURE,"Occupied housing units","Renter occupied"}
 {S09,"VACANCY STATUS","Vacant housing units",HD01}
 {S10,"VACANCY STATUS","Vacant housing units","For rent"}
(10 行记录)

postgres@localhost:5436 postgresql_book# SELECT fact_subcats[3:4] || fact_subcats[1:2] FROM census.lu_fact_types limit 10;
                                ?column?
-------------------------------------------------------------------------
 {"Total housing units",HD01,S01,"OCCUPANCY STATUS"}
 {"Total housing units","Occupied housing units",S02,"OCCUPANCY STATUS"}
 {"Total housing units","Vacant housing units",S03,"OCCUPANCY STATUS"}
 {"Occupied housing units",HD01,S04,TENURE}
 {"Occupied housing units","Owner occupied",S05,TENURE}
 {"Occupied housing units","Owner occupied",S06,TENURE}
 {"Occupied housing units","Owner occupied",S07,TENURE}
 {"Occupied housing units","Renter occupied",S08,TENURE}
 {"Vacant housing units",HD01,S09,"VACANCY STATUS"}
 {"Vacant housing units","For rent",S10,"VACANCY STATUS"}
(10 行记录)

将数组元素展开为记录行

unnest() 函数前面已经讲过. 通过它可以将数组元素纵向展开成一个包含若干条记录的结果值. (也即将数组变为列向量.)

postgres@localhost:5436 postgresql_book# SELECT unnest('{ABC,ACB,BAC,BCA,CAB,CBA}'::char(3)[]) As ABC;
 abc
-----
 ABC
 ACB
 BAC
 BCA
 CAB
 CBA
(6 行记录)

你可以在一个 SELECT 语句中使用多个 unnest() 函数, 但如果每个 unnest() 函数展开后的记录数不一致, 即得到列向量的维数不一致, 则最终结果是这些结果集之间的笛卡尔积.

postgres@localhost:5436 postgresql_book# SELECT unnest('{南京,苏州,扬州}'::text[]) As city,
postgresql_book-# unnest('{025, 0512, 0514}'::smallint[]) As 区号;
 city | 区号
------+------
 南京 |   25
 苏州 |  512
 扬州 |  514
(3 行记录)

当然这里区号应该使用字符串类型比较好. 这里两个向量是对齐的, 从而组成一个二维矩阵.

尝试将其中一个数组中的元素删除.

postgres@localhost:5436 postgresql_book# SELECT unnest('{南京,苏州,扬州}'::text[]) As city,
postgresql_book-# unnest('{025, 0512}'::varchar[]) As 区号;
 city | 区号
------+------
 南京 | 025
 苏州 | 0512
 扬州 | 025
 南京 | 0512
 苏州 | 025
 扬州 | 0512
(6 行记录)

PostgreSQL 9.4 版引入了一个多实参 unnest 函数. 该函数会在数组不平衡的位置置入空占位符(NULL).

postgres@localhost:5436 postgresql_book# SELECT * FROM unnest('{南京,苏州,扬州}'::text[], '{025,0512}'::varchar[]) As f(city, 区号);
 city | 区号
------+------
 南京 | 025
 苏州 | 0512
 扬州 | NULL
(3 行记录)

serial 类型

serial 类型和它的兄弟类型 bigserial 是两种可以自动生成递增整数值的数据类型, 一般如果表本身的字段不适合作为主键字段时, 会增加一个专门的字段并指定为 serial 类型以作为主键.

建表时如果指定了一个字段类型为 serial, 那么 PostgreSQL 会首先将其作为整型处理, 同时自动在该表所在 schema 中创建一个名为 table_name_column_name_seq 的序列. 然后设定该序列为该整型字段的取值来源. 如果修改了表定义并删除此 serial 字段, 那么系统同时也会自动删除掉附属的序列.

在 PostgreSQL 中, 序列自身是一种数据库对象. 可以通过 pgAdmin 图形界面或者 ALTER SEQUENCE 语句来管理该对象.

生成数组序列的函数

PostgreSQL 有一个名为 generate_series 数组生成函数. 它可以有效模仿 SQL 中的 for 循环.

postgres@localhost:5436 tutorial# SELECT x FROM generate_series(1,51,13) As x;
 x
----
  1
 14
 27
 40
(4 行记录)

这里第三个参数是步长, 可以不指定, 在不指定的情况下默认是1. 另外, 生成的序列不会超出指定的区间.

有点类似于 Sowya 中的 printSeries()函数.

>> printSeries(13*n+1,n,0,5)
1,14,27,40,53,66,

------------------------

区间类型

PostgreSQL 9.2 引入了区间数据类型(range types). 该数据类型可以定义一个值区间. 由于该类型的出现, 原本需要两个字段才能定义的区间现在仅使用一个字段即可.

PostgreSQL 为区间类型提供了很多配套的运算符和函数. 例如判定区间是否重叠, 判定某个值是否落在区间内, 以及将相邻的若干区间合并为一个完整的区间等.

离散区间和连续区间

PostgreSQL 对离散区间和连续区间是区别对待的. 整数类型或者日期类型的区间是离散区间, 因为区间内每一个值都是可以被枚举出来的.

对于离散区间, 很多写法认为是一样的. 比如 [-2,2) 等同于 [-2,1] 或 (-3,2), (-3,1]. 在这四种写法中, PostgreSQL 规定 [-2,2) 为规范写法. 采用这种写法仅是为了运算的统一, 即规定离散区间总是左闭右开的, 简化编程的实现. PostgreSQL 会自动对所有离散区间进行规范化, 不管是存储还是显示时都会这么做.

原生支持的区间类型

PostgreSQL 原生支持六种区间类型, 都是关于数字和日期时间型.

int4range, int8range: 这是整数型离散区间, 其定义符合左闭右开的规范化要求.
numrange: 这是连续区间, 可以用于描述小数、浮点数、或者双精度数字的区间.
daterange: 这是不带时区信息的日期离散区间.
tsrange, tstzrange: 这是时间戳(timestamp)(日期加时间)类型的连续区间, 秒值部分支持小数. 其中 tsrange 不带时区信息, 而 tstzrange 带时区信息.

对于数字类型的区间来说, 如果区间的起点值(左端点值)或终点值(右端点值)未指定, 那么 PostgreSQL 会自动填入 null 值.

理论上讲, 你可以将该 null 理解为代表左侧的 -infinity(负无穷)或右侧的 infinity(正无穷). 但实际上会受限于特定数据类型的最小值和最大值. 比如对于 int4range 数据类型来说, 区间 [,) 实际上代表的是 [-2147483648,-2147483647).

定义区间的方法

使用类型转换的方法来定义区间.

postgres@localhost:5436 tutorial# SELECT '[2025/11/07, 2025/11/08]'::daterange;
        daterange
-------------------------
 [2025-11-07,2025-11-09)
(1 行记录)

postgres@localhost:5436 tutorial# SELECT '[2025-11-07, 2025-11-09]'::daterange;
        daterange
-------------------------
 [2025-11-07,2025-11-10)
(1 行记录)

postgres@localhost:5436 tutorial# SELECT '[2025#11#07, 2025#11#09]'::daterange;
        daterange
-------------------------
 [2025-11-07,2025-11-10)
(1 行记录)

postgres@localhost:5436 tutorial# SELECT '(2025#11@07, 2025$11*09]'::daterange;
        daterange
-------------------------
 [2025-11-08,2025-11-10)
(1 行记录)

可以测试一下年月日间可以使用哪些分隔符.

postgres@localhost:5436 tutorial# SELECT '(,)'::int8range;
 int8range
-----------
 (,)
(1 行记录)

postgres@localhost:5436 tutorial# SELECT '[0,)'::int8range;
 int8range
-----------
 [0,)
(1 行记录)

查询当前的时间戳.

postgres@localhost:5436 tutorial# SELECT CURRENT_TIMESTAMP;
            now
----------------------------
 2025-11-07 10:43:39.794+08
(1 行记录)

postgres@localhost:5436 tutorial# SELECT '(2025-11-07 10:43:39, 2025-12-09 12:43:59]'::tsrange;
                    tsrange
-----------------------------------------------
 ("2025-11-07 10:43:39","2025-12-09 12:43:59"]
(1 行记录)

这里定义了一个从 2025-11-07 10:43:39 到 2025-12-09 12:43:59 的左开右闭的连续区间.

区间也可以使用范围的构造函数(constructor of range)来定义. 该函数的名称与区间类型的名称一致. 可以输入两个或三个实参. 示例如下:

postgres@localhost:5436 tutorial# SELECT daterange('2025-11-07','2025-12-19','[]');
        daterange
-------------------------
 [2025-11-07,2025-12-20)
(1 行记录)

第三个参数默认是 '[)'. 为清晰起见, 建议总是显示指定该实参.

定义含区间类型字段的表

时间类型区间是很常用的. 假如你有一张雇佣表 employment, 表中存储了公司聘请雇员的历史记录. 其中有一个字段用来存储员工在公司服务的年限, 此时只需将该字段设定为 daterange 类型. 这避免了用两个字段来表示服务年限的数据.

employment 表的模式为 employment(id, employee, period). 其中 id 指定为 serial 类型并设为主键(PRIMARY KEY), employee 指定为varchar(20) 类型, period 用于存储服务年限, 采用 daterange 类型.

postgres@localhost:5436 tutorial# CREATE TABLE employment(
tutorial(# id serial PRIMARY KEY,
tutorial(# employee VARCHAR(20),
tutorial(# period daterange);
CREATE TABLE
时间：87.593 ms

postgres@localhost:5436 tutorial# \d employment
                                   数据表 "public.employment"
   栏位   |         类型          | 校对规则 |  可空的  |                  预设
----------+-----------------------+----------+----------+----------------------------------------
 id       | integer               |          | not null | nextval('employment_id_seq'::regclass)
 employee | character varying(20) |          |          |
 period   | daterange             |          |          |
索引：
    "employment_pkey" PRIMARY KEY, btree (id)

在 period 列添加一个索引以便加速对区间列period的查询.

postgres@localhost:5436 tutorial# CREATE INDEX idx_employment_period
tutorial-# ON employment USING gist(period);
CREATE INDEX
时间：16.857 ms

其中 idx_employment_period 是索引的名称.
ON 后面跟表的名称, 表示在哪个表上建索引.
USING gist(period) 表示在 period 列上建立一个 GiST 索引.
GiST 是Generalized Search Tree(通用搜索树) 的缩写.

GiST（通用搜索树）索引是一种灵活的索引机制, 专为几何数据、文本搜索和范围查询等复杂数据类型设计. 它们通过支持自定义操作符和数据类型, 实现高效查询.

查看一下表 employment 的结构是否有变化.

postgres@localhost:5436 tutorial# \d employment
                                   数据表 "public.employment"
   栏位   |         类型          | 校对规则 |  可空的  |                  预设
----------+-----------------------+----------+----------+----------------------------------------
 id       | integer               |          | not null | nextval('employment_id_seq'::regclass)
 employee | character varying(20) |          |          |
 period   | daterange             |          |          |
索引：
    "employment_pkey" PRIMARY KEY, btree (id)
    "idx_employment_period" gist (period)

对 employment 表插入一些值.

postgres@localhost:5436 tutorial# INSERT INTO employment (employee, period)
tutorial-# VALUES ('Alex', '[2012-04-24, infinity)'::daterange),
tutorial-#        ('Sonia', '[2011-04-24, 2012-06-01)'::daterange),
tutorial-#        ('Leo', '[2012-06-20, 2013-04-20)'::daterange),
tutorial-#        ('Regina', '[2012-06-20, 2013-04-20)'::daterange);
INSERT 0 4
时间：42.800 ms

postgres@localhost:5436 tutorial# SELECT * FROM employment;
 id | employee |         period
----+----------+-------------------------
  1 | Alex     | [2012-04-24,infinity)
  2 | Sonia    | [2011-04-24,2012-06-01)
  3 | Leo      | [2012-06-20,2013-04-20)
  4 | Regina   | [2012-06-20,2013-04-20)
(4 行记录)

适用于区间类型的运算符

区间类型上用得最多的两个运算符是重叠运算符(&&)和包含运算符(@>).

重叠运算符

顾名思义, 重叠运算符 && 用于判定两个区间是否重叠. 有重叠部分则返回 true; 否则返回 false.

查询谁与谁曾经同在公司工作过.

SELECT e1.employee, string_agg(DISTINCT e2.employee, ',' ORDER BY e2.employee) As colleagues
FROM employment As e1 INNER JOIN employment As e2
ON e1.period && e2.period
WHERE e1.employee <> e2.employee
GROUP BY e1.employee;

string_agg 函数是一个聚合函数, 它将字符串列表连接起来, 并在它们之间放置分隔符.

postgres@localhost:5436 tutorial# SELECT e1.employee, string_agg(DISTINCT e2.employee, ',' ORDER BY e2.employee) As colleagues
tutorial-# FROM employment As e1 INNER JOIN employment As e2
tutorial-# ON e1.period && e2.period
tutorial-# WHERE e1.employee <> e2.employee
tutorial-# GROUP BY e1.employee;
 employee |    colleagues
----------+------------------
 Alex     | Leo,Regina,Sonia
 Leo      | Alex,Regina
 Regina   | Alex,Leo
 Sonia    | Alex
(4 行记录)

包含与被包含关系运算符

对于包含关系运算符 @> 来说, 第一个实参是区间, 第二个实参是待判定的值. 如果第二个实参的值落在第一个实参的区间内的话, 运算返回 true, 否则返回 false.

查询当前还在公司工作的雇员名单.

postgres@localhost:5436 tutorial# SELECT employee FROM employment
tutorial-# WHERE period @> CURRENT_DATE
tutorial-# GROUP BY employee;
 employee
----------
 Alex
(1 行记录)

@> 即数学上的 $\ni$; <@ 即数学上的 $\in$.

JSON 数据类型

JSON 数据类型及其相关操作函数是从 9.2 版开始支持的. JSON 是 Web 开发领域非常流行的一种数据类型, 它是 JavaScript 语言中的通用数据交换格式.

9.3 版针对 JSON 类型新增了一些功能函数, 用以执行读取、编辑以及转换为其他数据类型的操作.
9.4 版引入了 jsonb 数据类型, 该类型是 JSON 类型的二进制版.

jsonb 与 json 的最主要差别是 JSONB 可以支持索引而 JSON 不能.

插入 JSON 数据

首先在 tutorial 数据库中新建一张表 families_j, 其模式为 families_j(id, profile). 其中 id 为 serial 类型且设为主键, profile 设定为 json 类型. (profile 是个人资料的意思.)

postgres@localhost:5436 tutorial# CREATE TABLE families_j(
tutorial(# id serial PRIMARY KEY,
tutorial(# profile json);
CREATE TABLE
时间：148.359 ms

postgres@localhost:5436 tutorial# \d families_j
                            数据表 "public.families_j"
  栏位   |  类型   | 校对规则 |  可空的  |                  预设
---------+---------+----------+----------+----------------------------------------
 id      | integer |          | not null | nextval('families_j_id_seq'::regclass)
 profile | json    |          |          |
索引：
    "families_j_pkey" PRIMARY KEY, btree (id)

插入一条 JSON 数据记录.

INSERT INTO families_j (profile) VALUES(
'{"name":"Gomez", "members":[
{"member":{"relation":"padre", "name":"Alex"}},
{"member":{"relation":"madre", "name":"Sonia"}},
{"member":{"relation":"hijo", "name":"Brandon"}},
{"member":{"relation":"hija", "name":"Azaleah"}}
]}');

注: 这里 padre、madre、hijo、hija 是西班牙语单词, 分别是 “父亲”、“母亲”、“儿子”、“女儿”的意思.

postgres@localhost:5436 tutorial# INSERT INTO families_j (profile) VALUES(
tutorial(# '{"name":"Gomez", "members":[
tutorial'# {"member":{"relation":"padre", "name":"Alex"}},
tutorial'# {"member":{"relation":"madre", "name":"Sonia"}},
tutorial'# {"member":{"relation":"hijo", "name":"Brandon"}},
tutorial'# {"member":{"relation":"hija", "name":"Azaleah"}}
tutorial'# ]}');
INSERT 0 1
时间：46.298 ms

postgres@localhost:5436 tutorial# SELECT * FROM families_j;
 id |                      profile
----+---------------------------------------------------
  1 | {"name":"Gomez", "members":[                     +
    | {"member":{"relation":"padre", "name":"Alex"}},  +
    | {"member":{"relation":"madre", "name":"Sonia"}}, +
    | {"member":{"relation":"hijo", "name":"Brandon"}},+
    | {"member":{"relation":"hija", "name":"Azaleah"}} +
    | ]}
(1 行记录)

查询 JSON 数据

使用 json_extrac_path()、json_array_elements() 以及 json_extrac_path_text() 这三个函数来读取表中所有家庭成员的信息.

例

从 families_j 表中提取 profile 列中关于属性 name 的值, 以文本格式输出.

postgres@localhost:5436 tutorial# SELECT json_extract_path_text(profile, 'name') As family FROM families_j;
 family
--------
 Gomez
(1 行记录)

json_extract_path_text(字段名, 属性名) 函数也可以用运算符代替, 即字段名->>属性名.

postgres@localhost:5436 tutorial# SELECT profile->>'name' As family FROM families_j;
 family
--------
 Gomez
(1 行记录)

例

使用 SELECT json_extract_path(profile, 'members') FROM families_j; 提取家庭成员的信息.

postgres@localhost:5436 tutorial# SELECT json_extract_path(profile, 'members') FROM families_j;
                 json_extract_path
---------------------------------------------------
 [                                                +
 {"member":{"relation":"padre", "name":"Alex"}},  +
 {"member":{"relation":"madre", "name":"Sonia"}}, +
 {"member":{"relation":"hijo", "name":"Brandon"}},+
 {"member":{"relation":"hija", "name":"Azaleah"}} +
 ]
(1 行记录)

组织成更好的格式输出

SELECT json_extract_path_text(profile, 'name') As family, -- 提取出家庭的名称, 以文本格式输出
       json_extract_path_text(                            -- 提取出家庭成员的名称, 以文本格式输出
	       json_array_elements(                           -- 将家庭成员信息数组中的每个元素展开为独立的 JSON 对象.
		       json_extract_path(profile, 'members')      -- 获取所有家庭成员的信息列表, 作为一个独立的 JSON 对象输出
		   ), 'member', 'name'
	   ) As member
FROM families_j;

postgres@localhost:5436 tutorial# SELECT json_extract_path_text(profile, 'name') As family, -- 提取出家庭的名称, 以文本格式输出
tutorial-#        json_extract_path_text(                            -- 提取出家庭成员的名称, 以文本格式输出
tutorial(#             json_array_elements(                           -- 将家庭成员信息数组中的每个元素展开为独立的 JSON 对象.
tutorial(#                     json_extract_path(profile, 'members')      -- 获取所有家庭成员的信息列表, 作为一个独立的 JSON 对象输出
tutorial(#                 ), 'member', 'name'
tutorial(#         ) As member
tutorial-# FROM families_j;
 family | member
--------+---------
 Gomez  | Alex
 Gomez  | Sonia
 Gomez  | Brandon
 Gomez  | Azaleah
(4 行记录)

运算符 ->> 和 #>> 是 json_extract_path_text 的简写. #>> 取用某个路径数组.

SELECT profile->>'name' As family, json_array_elements((profile->'members'))
#>> '{member,name}'::text[] As member
FROM families_j;

这里 json_array_elements((profile->'members')) #>> '{member,name}'::text[] 就等价于 json_extract_path_text(json_array_elements((profile->'members')), 'member','name')

SELECT profile->>'name' As family, json_extract_path_text(json_array_elements((profile->'members')), 'member','name') As member
FROM families_j;

postgres@localhost:5436 tutorial# SELECT profile->>'name' As family, json_array_elements((profile->'members'))
tutorial-# #>> '{member,name}'::text[] As member
tutorial-# FROM families_j;
 family | member
--------+---------
 Gomez  | Alex
 Gomez  | Sonia
 Gomez  | Brandon
 Gomez  | Azaleah
(4 行记录)

postgres@localhost:5436 tutorial# SELECT profile->>'name' As family, json_extract_path_text(json_array_elements((profile->'members')), 'member','name') As member
tutorial-# FROM families_j;
 family | member
--------+---------
 Gomez  | Alex
 Gomez  | Sonia
 Gomez  | Brandon
 Gomez  | Azaleah
(4 行记录)

json_extract_path

json_extract_path 是 json_extract_path_text 的兄弟函数, 它对应的运算符是 -> 和 #>. 该函数输出的结果是当前 JSON 对象的子对象.

如果要把一个复合 JSON 对象剥离出来并传递给别的函数作进一步处理, 就需要使用 json_extract_path 函数.

例

SELECT id, json_array_length(profile->'members') As numero, profile->'members'->0#>>'{member,name}'::text[] As primero
FROM families_j;

postgres@localhost:5436 tutorial# SELECT id, json_array_length(profile->'members') As numero, profile->'members'->0#>>'{member,name}'::text[] As primero
tutorial-# FROM families_j;
 id | numero | primero
----+--------+---------
  1 |      4 | Alex
(1 行记录)

-> 运算符的返回结果是一个 json或 jsonb 对象.
-> 运算符的第二个参数(即右边的参数)要么是一个text字段(json_object_field的简写), 要么是一个整数 (json_array_element 的简写).
profile->'members' 返回 json 对象的 members 字段, 该字段本身是一个包含多条记录的 JSON 数组.
->0 即 json_array_element(0), 该操作提取了上面 JSON 对象数组的首个元素.
#>>'{member,name}'::text[] 即 json_extract_path_text(,'member','name'), 得到的结果是首个家庭成员JSON对象中按照"member/name" 路径寻址到的节点的文本格式的值.

通过这个例子可以看出,

->、#>、->>、#>> 这些运算符可以级联使用.
jsonb 类型也有相同的运算符.
jsonb 类型的对应函数分别是:
1. jsonb_object_field()
2. jsonb_array_element()

PostgreSQL 中更多关于 json/jsonb 类型的函数和运算符可参考 https://www.postgresql.org/docs/current/functions-json.html

输出 JSON 数据

PostgreSQL 除了可以查询库中已有的 JSON 数据外, 还支持将别的数据类型转换为 JSON 类型.

系统内置的 JSON 转换函数

row_to_json()

例.

将多条记录转换为单个 JSON 对象. (注意版本要求 >=9.3)

postgres@localhost:5436 tutorial# SELECT  row_to_json(f) As x
tutorial-# FROM (SELECT id, profile->>'name' As name FROM families_j) As f;
            x
-------------------------
 {"id":1,"name":"Gomez"}
(1 行记录)

回顾一下 families_j 中的内容

postgres@localhost:5436 tutorial# select * from families_j;
 id |                      profile
----+---------------------------------------------------
  1 | {"name":"Gomez", "members":[                     +
    | {"member":{"relation":"padre", "name":"Alex"}},  +
    | {"member":{"relation":"madre", "name":"Sonia"}}, +
    | {"member":{"relation":"hijo", "name":"Brandon"}},+
    | {"member":{"relation":"hija", "name":"Azaleah"}} +
    | ]}
(1 行记录)

如果要将 families_j 表中的所有记录行整体打包转换为一个 JSON 对象, 可以使用以下语法(版本要求>=9.2):

SELECT row_to_json(f) FROM families_j As f;

postgres@localhost:5436 tutorial# SELECT row_to_json(f) FROM families_j As f;
                    row_to_json
---------------------------------------------------
 {"id":1,"profile":{"name":"Gomez", "members":[   +
 {"member":{"relation":"padre", "name":"Alex"}},  +
 {"member":{"relation":"madre", "name":"Sonia"}}, +
 {"member":{"relation":"hijo", "name":"Brandon"}},+
 {"member":{"relation":"hija", "name":"Azaleah"}} +
 ]}}
(1 行记录)

“查询时将一行记录作为单个字段输出”这种功能只有 PostgreSQL 才支持. 该功能对于创建复合 JSON 对象特别有用.

JSON 类型的二进制版本: jsonb

PostgreSQL 9.4 版本中引入了新的 jsonb 数据类型.

从运算符的角度看, jsonb 有若干 json 类型不支持的运算符, 此外二者的运算符完全相同.
从处理函数的角度看, 二者适用的处理函数一一对应, 仅在命名上略有差别, 一个以“json”开头, 一个以“jsonb”开头.

jsonb 数据类型与 json 数据类型的关键区别是

json 是以原始文本格式存储, 而 jsonb 存储的是原始文本解析以后生成的二进制数据结构, 该二进制结构中不再保存原始文本中的空格, 存储下来的数字的形式也发生一定的变化, 并且对其内部记录属性值进行了排序.
jsonb 不允许其内部记录的键值重复. 如果出现重复则会从中自动选择一条, 其余的重复记录会被丢弃. 但 json 类型中的键值重复是允许的.
jsonb 的性能要远好于 json. 因为 jsonb 类型在处理过程中不需要再进行文本解析.
jsonb 类型由于是解析过的二进制结构, 因此 jsonb 类型的字段上可以直接建立 GIN 索引, 但 json 类型字段上却只能建立函数索引, 因为只有通过函数才能从 JSON 的字符串中提取出具体字段值.

为此, 我们新建一张名为 families_b 的表, 其结构与前面的 families_j 表类似.

CREATE TABLE families_b(id serial PRIMARY KEY, profile jsonb);

postgres@localhost:5436 tutorial# CREATE TABLE families_b(id serial PRIMARY KEY, profile jsonb);
CREATE TABLE
时间：103.746 ms

重复前面的过程, 插入一条 JSON 数据记录.

INSERT INTO families_b (profile) VALUES(
'{"name":"Gomez", "members":[
{"member":{"relation":"padre", "name":"Alex"}},
{"member":{"relation":"madre", "name":"Sonia"}},
{"member":{"relation":"hijo", "name":"Brandon"}},
{"member":{"relation":"hija", "name":"Azaleah"}}
]}');

postgres@localhost:5436 tutorial# INSERT INTO families_b (profile) VALUES(
tutorial(# '{"name":"Gomez", "members":[
tutorial'# {"member":{"relation":"padre", "name":"Alex"}},
tutorial'# {"member":{"relation":"madre", "name":"Sonia"}},
tutorial'# {"member":{"relation":"hijo", "name":"Brandon"}},
tutorial'# {"member":{"relation":"hija", "name":"Azaleah"}}
tutorial'# ]}');
INSERT 0 1
时间：16.052 ms

postgres@localhost:5436 tutorial# select * from families_b;
 id |                                                                                                                     profile                                                                                                               
----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1 | {"name": "Gomez", "members": [{"member": {"name": "Alex", "relation": "padre"}}, {"member": {"name": "Sonia", "relation": "madre"}}, {"member": {"name": "Brandon", "relation": "hijo"}}, {"member": {"name": "Azaleah", "relation": "hija"}}]}
(1 行记录)

可以看出, jsonb 类型的输出是对输入的内容进行了重新格式化并删掉了输入时文本中的空格, 此外, relation 和 name 两个字段也进行了排序.

而 json 类型的输出保持了输入时的原样, 包括原文本中的空格以及属性字段的顺序.

postgres@localhost:5436 tutorial# select * from families_j;
 id |                      profile
----+---------------------------------------------------
  1 | {"name":"Gomez", "members":[                     +
    | {"member":{"relation":"padre", "name":"Alex"}},  +
    | {"member":{"relation":"madre", "name":"Sonia"}}, +
    | {"member":{"relation":"hijo", "name":"Brandon"}},+
    | {"member":{"relation":"hija", "name":"Azaleah"}} +
    | ]}
(1 行记录)

jsonb 和 json 运算符的区别

jsonb 支持的运算符集合是 json 支持的运算符集合的超集.

jsonb 比 json 多支持的运算符有以下几个:

等值运算符: =
包含关系运算符: @> (相当于数学中的 $\ni$)
被包含关系运算符: <@ (相当于数学中的 $\in$)
键值已存在运算符: ?
一组键值中是否有任意一个已存在运算符: ?| (即只要有一个存在即返回true, 故使用或运算符|.)
一组键值中的每一个是否均已存在运算符: ?& (即每一个都存在才返回true, 故使用与运算符&.)

例

列出所有包含姓名为 "Alex" 的家庭成员的家庭.

SELECT profile->>'name' As family
FROM families_b
WHERE profile @> '{"members":[{"member":{"name":"Alex"} }]}';

postgres@localhost:5436 tutorial# SELECT profile->>'name' As family
tutorial-# FROM families_b
tutorial-# WHERE profile @> '{"members":[{"member":{"name":"Alex"} }]}';
 family
--------
 Gomez
(1 行记录)

在 jsonb 列上创建索引

如果再 jsonb 列上建了 GIN 索引, 那么前述这几个运算符的操作速度是极快的.

CREATE INDEX idx_familes_jb_profile_gin ON families_b USING gin (profile);

未创建索引前, families_b 表的结构:

postgres@localhost:5436 tutorial# \d families_b
                            数据表 "public.families_b"
  栏位   |  类型   | 校对规则 |  可空的  |                  预设
---------+---------+----------+----------+----------------------------------------
 id      | integer |          | not null | nextval('families_b_id_seq'::regclass)
 profile | jsonb   |          |          |
索引：
    "families_b_pkey" PRIMARY KEY, btree (id)

创建索引后, families_b 表的结构:

postgres@localhost:5436 tutorial# CREATE INDEX idx_familes_jb_profile_gin ON families_b USING gin (profile);
CREATE INDEX
时间：39.720 ms
postgres@localhost:5436 tutorial# \d families_b
                            数据表 "public.families_b"
  栏位   |  类型   | 校对规则 |  可空的  |                  预设
---------+---------+----------+----------+----------------------------------------
 id      | integer |          | not null | nextval('families_b_id_seq'::regclass)
 profile | jsonb   |          |          |
索引：
    "families_b_pkey" PRIMARY KEY, btree (id)
    "idx_familes_jb_profile_gin" gin (profile)

XML 数据类型

XML 是 eXtensible Markup Language 的缩写, 即可扩展标记语言的意思. 如果对 XML 不太熟悉, 可以先查看 http://atzjg.net/content/slide/database/lecture_9.html.

XML 和 JSON 这两种数据类型都属于非规范化数据.

所有的高级关系型数据库(比如 IBM DB2、Oracle、SQL Server)都支持 XML 数据类型.

插入 XML 数据

XML 数据可以存入到类型为 xml 或 text 的列中. 下面是这两种数据类型的区别.

在往一个 xml 数据类型的列中插入数据时, PostgreSQL 会自动判定并确保只有格式合法的 XML 文本才会创建成果.
text 类型字段中也可以存入一段 XML 文本, 但存入时不会进行格式合法性判断.

注意, 即使 XML 文本中附带了 DTD(Document Type Definition) 或者 XSD(XML Schema Definition) 的格式描述, PostgreSQL 也不会按照这些格式要求对 XML 文本进行验证.

例

首先创建表 families, 其模式为 families(id, profile), 其中 id 采用 serial 类型并设定为主键, profile 采用 xml 类型.

CREATE TABLE families(id serial PRIMARY KEY, profile xml);

postgres@localhost:5436 tutorial# CREATE TABLE families(id serial PRIMARY KEY, profile xml);
CREATE TABLE
时间：41.888 ms

插入数据.

INSERT INTO families(profile)
VALUES(
    '<family name="Gomez">
	   <member><relation>padre</relation><name>Alex</name></member>
	   <member><relation>madre</relation><name>Sonia</name></member>
	   <member><relation>hijo</relation><name>Brandon</name></member>
	   <member><relation>hija</relation><name>Azaleah</name></member>
	 </family>
	'
);

postgres@localhost:5436 tutorial# CREATE TABLE families(id serial PRIMARY KEY, profile xml);
CREATE TABLE
时间：41.888 ms
postgres@localhost:5436 tutorial# INSERT INTO families(profile)
tutorial-# VALUES(
tutorial(#     '<family name="Gomez">
tutorial'#         <member><relation>padre</relation><name>Alex</name></member>
tutorial'#         <member><relation>madre</relation><name>Sonia</name></member>
tutorial'#         <member><relation>hijo</relation><name>Brandon</name></member>
tutorial'#         <member><relation>hija</relation><name>Azaleah</name></member>
tutorial'#       </family>
tutorial'#      '
tutorial(# );
INSERT 0 1
时间：35.583 ms

postgres@localhost:5436 tutorial# select * from families;
 id |                                  profile
----+---------------------------------------------------------------------------
  1 | <family name="Gomez">                                                    +
    |            <member><relation>padre</relation><name>Alex</name></member>  +
    |            <member><relation>madre</relation><name>Sonia</name></member> +
    |            <member><relation>hijo</relation><name>Brandon</name></member>+
    |            <member><relation>hija</relation><name>Azaleah</name></member>+
    |          </family>                                                       +
    |
(1 行记录)

;

对 XML 字段设置 check 约束

可以对 XML 字段设置一个 check 约束以确保输入的 XML 数据都符合某种格式.

例

确保所有 XML 字段记录中都有至少一个 member 节点和一个 relation 节点. (有时也将节点写为结点.)

ALTER TABLE families ADD CONSTRAINT chk_has_relation
CHECK (xpath_exists('/family/member/relation', profile));

未添加约束前的 families 表的结构.

postgres@localhost:5436 tutorial# \d families
                            数据表 "public.families"
  栏位   |  类型   | 校对规则 |  可空的  |                 预设
---------+---------+----------+----------+--------------------------------------
 id      | integer |          | not null | nextval('families_id_seq'::regclass)
 profile | xml     |          |          |
索引：
    "families_pkey" PRIMARY KEY, btree (id)

执行上面的添加约束(ADD CONSTRAINT)语句.

postgres@localhost:5436 tutorial# ALTER TABLE families ADD CONSTRAINT chk_has_relation
tutorial-# CHECK (xpath_exists('/family/member/relation', profile));
ALTER TABLE
时间：10.940 ms
postgres@localhost:5436 tutorial# \d families
                            数据表 "public.families"
  栏位   |  类型   | 校对规则 |  可空的  |                 预设
---------+---------+----------+----------+--------------------------------------
 id      | integer |          | not null | nextval('families_id_seq'::regclass)
 profile | xml     |          |          |
索引：
    "families_pkey" PRIMARY KEY, btree (id)
检查约束限制
    "chk_has_relation" CHECK (xpath_exists('/family/member/relation'::text, profile))

此时下面的插入语句是否违反约束.

INSERT INTO families (profile) VALUES('<family name="HsuObe"></family>');

postgres@localhost:5436 tutorial# INSERT INTO families (profile) VALUES('<family name="HsuObe"></family>');
ERROR:  new row for relation "families" violates check constraint "chk_has_relation"
描述:  Failing row contains (2, <family name="HsuObe"></family>).
时间：22.147 ms

如果需要基于 DTD 或者 XSD 对 XML 数据进行格式检查, 你需要自行编写格式检查函数, 然后将此函数放到 check 约束中调用.

查询 XML 数据

xpath() 函数在查询 XML 数据时发挥着重要作用. 其第一个参数是一个 XPath 查询表达式, 第二个实参是一个 xml 对象. 查询结果是 XPath 查询语句所要查找的 XML 元素的列表.

例

SELECT family,
    (xpath('/member/relation/text()',f))[1]::text As relation,
	(xpath('/member/name/text()',f))[1]::text As mem_name
FROM
    (SELECT (xpath('/family/@name', profile))[1]::text As family,
	    unnest(xpath('/family/member', profile)) As f 
	FROM families) x;

postgres@localhost:5436 tutorial# SELECT family,
tutorial-#     (xpath('/member/relation/text()',f))[1]::text As relation,
tutorial-#      (xpath('/member/name/text()',f))[1]::text As mem_name
tutorial-# FROM
tutorial-#     (SELECT (xpath('/family/@name', profile))[1]::text As family,
tutorial(#          unnest(xpath('/family/member', profile)) As f
tutorial(#      FROM families) x;
 family | relation | mem_name
--------+----------+----------
 Gomez  | padre    | Alex
 Gomez  | madre    | Sonia
 Gomez  | hijo     | Brandon
 Gomez  | hija     | Azaleah
(4 行记录)

自定义数据类型和复合数据类型

PostgreSQL 中所有表都有一个对应的自定义数据类型. 在建表时, PostgreSQL 会自动创建一个与表结构完全相同的自定义数据类型, 称为表类型. 而且这种类型与其他的数据类型在使用上毫无区别.

表的嵌套

可以在建表时指定某字段为表类型或着表数组类型. 也就是说, 可以把一张表的字段定义为另一张表. 表与表是可以嵌套的.

例

建立关于矩阵的嵌套表. 每个 $n$ 阶方阵都有 $n$ 个特征值.

CREATE TABLE eigenvalues(id integer PRIMARY KEY);

构建自定义数据类型

尽管通过建表就可以创建复合数据类型, 但有时我们需要从头开始构建自己的数据类型.

例

构建复数类型.

CREATE TYPE complex_number AS (
  r double precision,
  i double precision
);

postgres@localhost:5436 tutorial# CREATE TYPE complex_number AS (
tutorial(#   r double precision,
tutorial(#   i double precision
tutorial(# );
CREATE TYPE
时间：29.416 ms

使用 \dT 命令(display Types)列出数据类型列表.

postgres@localhost:5436 tutorial# \dT
           数据类型列表
 架构模式 |      名称      | 描述
----------+----------------+------
 public   | complex_number | NULL
(1 行记录)

将此类型作为字段类型定义使用. 例如下面建立一张电路表(circuits). 字段 ac_volt 表示相位角为 $\phi$ 的交流电压, 可以表示为 $V=V_m\cdot(\cos\phi+j\sin\phi)$, 其中 $V_m$ 是电压的最大值. 因此 ac_volt 使用复数类型.

CREATE TABLE circuits(
  circuit_id serial PRIMARY KEY,
  ac_volt complex_number
);

postgres@localhost:5436 tutorial# CREATE TABLE circuits(
tutorial(#   circuit_id serial PRIMARY KEY,
tutorial(#   ac_volt complex_number
tutorial(# );
CREATE TABLE
时间：19.125 ms

插入数据.

INSERT INTO circuits(ac_volt) VALUES((1,2)::complex_number);

postgres@localhost:5436 tutorial# INSERT INTO circuits(ac_volt) VALUES((1,2)::complex_number);
INSERT 0 1
时间：16.660 ms

对这个表的查询可以使用下面的语句:

SELECT circuit_id, (ac_volt).* FROM circuits;

postgres@localhost:5436 tutorial# SELECT circuit_id, (ac_volt).* FROM circuits;
 circuit_id | r | i
------------+---+---
          1 | 1 | 2
(1 行记录)

或者

SELECT circuit_id, (ac_volt).r, (ac_volt).i FROM circuits;

postgres@localhost:5436 tutorial# SELECT circuit_id, (ac_volt).r, (ac_volt).i FROM circuits;
 circuit_id | r | i
------------+---+---
          1 | 1 | 2
(1 行记录)

上面 ac_volt 必须要加圆括号, 否则 PostgreSQL 会认为 ac_volt.r 表示表 ac_volt 中的 r 字段. 加圆括号就是让 PostgreSQL 不要理解为表的名称.

为自定义数据类型构建运算符和函数

对于构建的自定义数据类型, 可以为其创建相应的函数和运算符.

每个运算符都有一个底层实现函数, 该函数需要一个或两个实参, 运算符就是这个函数的符号化别名.

运算符不仅仅是其底层实现函数的别名, 它还可以提供一些可以帮助规划器更好工作的优化信息. 规划器借助这些信息可以判定如何使用索引, 如何以最低的成本访问数据, 以及哪些运算符表达式是等价的.

例

创建运算符的第一步是创建其底层实现函数. 我们先使用 \df (display functions)列出函数列表.

postgres@localhost:5436 tutorial# \df
                       函数列表
 架构模式 | 名称 | 结果数据类型 | 参数数据类型 | 类型
----------+------+--------------+--------------+------
(0 行记录)

下面为 complex_number 创建底层实现函数.

CREATE OR REPLACE FUNCTION add(complex_number, complex_number)
RETURNS complex_number AS
$$
  SELECT ( 
      (COALESCE(($1).r,0)+COALESCE(($2).r,0)),
      (COALESCE(($1).i,0)+COALESCE(($2).i,0)) 
	  )::complex_number;
$$
language sql;

postgres@localhost:5436 tutorial# CREATE OR REPLACE FUNCTION add(complex_number, complex_number)
tutorial-# RETURNS complex_number AS
tutorial-# $$
tutorial$#   SELECT (
tutorial$#       (COALESCE(($1).r,0)+COALESCE(($2).r,0)),
tutorial$#       (COALESCE(($1).i,0)+COALESCE(($2).i,0))
tutorial$#        )::complex_number;
tutorial$# $$
tutorial-# language sql;
CREATE FUNCTION
时间：27.808 ms

使用 \df 列出函数列表.

postgres@localhost:5436 tutorial# \df
                                 函数列表
 架构模式 | 名称 |  结果数据类型  |          参数数据类型          | 类型
----------+------+----------------+--------------------------------+------
 public   | add  | complex_number | complex_number, complex_number | 函数
(1 行记录)

例

为 complex_number 类型定义加号(+)运算符. 在此之前, 先在 tutorial 数据库中查询有没有已经定义的其他运算符.

postgres@localhost:5436 tutorial# \do
                         运算子列表
 架构模式 | 名称 | 左参数类型 | 右参数类型 | 结果类型 | 描述
----------+------+------------+------------+----------+------
(0 行记录)

输入下面的 CREATE OPERATOR 语句创建 + 运算符.

CREATE OPERATOR +(
  PROCEDURE = add,
  LEFTARG = complex_number,
  RIGHTARG = complex_number,
  COMMUTATOR = +
);

postgres@localhost:5436 tutorial# CREATE OPERATOR +(
tutorial(#   PROCEDURE = add,
tutorial(#   LEFTARG = complex_number,
tutorial(#   RIGHTARG = complex_number,
tutorial(#   COMMUTATOR = +
tutorial(# );
CREATE OPERATOR
时间：36.946 ms

然后再使用 \do 命令查看一下.

postgres@localhost:5436 tutorial# \do
                                运算子列表
 架构模式 | 名称 |   左参数类型   |   右参数类型   |    结果类型    | 描述
----------+------+----------------+----------------+----------------+------
 public   | +    | complex_number | complex_number | complex_number | NULL
(1 行记录)

使用 \do+ 会列出更多信息.

postgres@localhost:5436 tutorial# \do+
                                    运算子列表
 架构模式 | 名称 |   左参数类型   |   右参数类型   |    结果类型    | 函数 | 描述
----------+------+----------------+----------------+----------------+------+------
 public   | +    | complex_number | complex_number | complex_number | add  | NULL
(1 行记录)

测试一下 + 运算符. 求复数 $1+2i$ 与 $3-10i$ 的和.

SELECT (1,2)::complex_number + (3,-10)::complex_number;

postgres@localhost:5436 tutorial# SELECT (1,2)::complex_number + (3,-10)::complex_number As result;
 result
--------
 (4,-8)
(1 行记录)

运算符的重载

上面的 + 运算符只支持两个 complex_number 类型的数相加, 如果要求 $1+2i$ 与 $3$ 的和, 则可以使用运算符的重载, 支持complex_number 类型和整数类型(或 double 类型)的数相加.

首先我们要重载 add 函数, 即增加下面的定义.

CREATE OR REPLACE FUNCTION add(complex_number, integer)
RETURNS complex_number AS
$$
  SELECT ( 
      (COALESCE(($1).r,0)+COALESCE(($2),0)),
       COALESCE(($1).i,0) 
	  )::complex_number;
$$
language sql;

postgres@localhost:5436 tutorial# CREATE OR REPLACE FUNCTION add(complex_number, integer)
tutorial-# RETURNS complex_number AS
tutorial-# $$
tutorial$#   SELECT (
tutorial$#       (COALESCE(($1).r,0)+COALESCE(($2),0)),
tutorial$#        COALESCE(($1).i,0)
tutorial$#        )::complex_number;
tutorial$# $$
tutorial-# language sql;
CREATE FUNCTION
时间：23.598 ms

postgres@localhost:5436 tutorial# \df
                                 函数列表
 架构模式 | 名称 |  结果数据类型  |          参数数据类型          | 类型
----------+------+----------------+--------------------------------+------
 public   | add  | complex_number | complex_number, complex_number | 函数
 public   | add  | complex_number | complex_number, integer        | 函数
(2 行记录)

然后重载 + 号运算符. 如果未重载, 则会提示错误, 如下.

postgres@localhost:5436 tutorial# SELECT (1,2)::complex_number +3;
ERROR:  operator does not exist: complex_number + integer
第1行SELECT (1,2)::complex_number +3;
                                  ^
提示:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
时间：26.860 ms
postgres@localhost:5436 tutorial# SELECT (1,2)::complex_number +3::integer;
ERROR:  operator does not exist: complex_number + integer
第1行SELECT (1,2)::complex_number +3::integer;
                                  ^
提示:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
时间：7.600 ms

输入下面的 CREATE OPERATOR 语句, 注意右参数设置为 integer 类型.

CREATE OPERATOR +(
  PROCEDURE = add,
  LEFTARG = complex_number,
  RIGHTARG = integer,
  COMMUTATOR = +
);

postgres@localhost:5436 tutorial# CREATE OPERATOR +(
tutorial(#   PROCEDURE = add,
tutorial(#   LEFTARG = complex_number,
tutorial(#   RIGHTARG = integer,
tutorial(#   COMMUTATOR = +
tutorial(# );
CREATE OPERATOR
时间：39.932 ms
postgres@localhost:5436 tutorial# \do
                                运算子列表
 架构模式 | 名称 |   左参数类型   |   右参数类型   |    结果类型    | 描述
----------+------+----------------+----------------+----------------+------
 public   | +    | complex_number | complex_number | complex_number | NULL
 public   | +    | complex_number | integer        | complex_number | NULL
 public   | +    | integer        | complex_number | -              | NULL
(3 行记录)

可见 COMMUTATOR 意味着内部定义了两个运算符.

重新执行 $1+2i$ 加上 $3$ 的运算.

postgres@localhost:5436 tutorial# SELECT (1,2)::complex_number +3;
 ?column?
----------
 (4,2)
(1 行记录)

首页

Haifeng Xu

(hfxu@yzu.edu.cn)

目录

数值类型

数值类型

整数类型

浮点数类型

任意精度类型

日期/时间类型

日期/时间类型

时区详解

时区

世界统一时间

时差

PostgreSQL 对时区的处理

夏令时

一个有趣的例子

日期时间类型的运算符和函数

区间重叠运算符OVERLAPS

一些时间类型的函数

从日期时间类型中提取部分元素

字符串类型

字符串类型

CHAR(n)

VARCHAR(n)

TEXT

各种类型间的差异

字符串函数

实验

将字符串拆分为数组、表或者子字符串

split_part

string_to_array

正则表达式

正则表达式

正则表达式和模式匹配

使用回溯引用(Using Backreferences)

注意事项

注意事项

二进制类型

二进制类型

布尔类型

布尔类型

实验

数组类型

数组类型

插入数组数值

数组构造函数

数组中元素的引用

数组的拆分与连接

将数组元素展开为记录行

serial 类型

serial 类型

生成数组序列的函数

生成数组序列的函数

区间类型

区间类型

离散区间和连续区间

原生支持的区间类型

定义区间的方法

定义含区间类型字段的表

适用于区间类型的运算符

重叠运算符

包含与被包含关系运算符

JSON 数据类型

JSON 数据类型

插入 JSON 数据

查询 JSON 数据

例

例

组织成更好的格式输出

json_extract_path

例

输出 JSON 数据

系统内置的 JSON 转换函数

row_to_json()

例.

JSON 类型的二进制版本: jsonb

jsonb 和 json 运算符的区别

例