
Cherry-pick gpcontrib/diskquota and other recent main changes. #1616

Merged
reshke merged 346 commits into apache:REL_2_STABLE from reshke:cp_2_main_2
Mar 13, 2026

Conversation

@reshke
Contributor

@reshke reshke commented Mar 12, 2026

cherry-pick from ffe370b to 41c1750

higuoxing and others added 30 commits March 12, 2026 18:03
This PR fixes a CI build failure by ignoring the distribution
notice in CTAS statements.

Co-authored-by: Hao Zhang <hzhang2@vmware.com>
…full (apache#112)

* Fix bug
The relation_cache_entry of a temporary table created during VACUUM FULL is not removed after VACUUM FULL. The table is then treated as an uncommitted table even though it has been dropped, and its size still remains in diskquota.table_size, which makes the recorded quota usage larger than the actual usage.
Use RelidByRelfilenode() to check whether the table is committed, and remove its relation_cache_entry if it is not.

Co-authored-by: hzhang2 <hzhang2@vmware.com>
Co-authored-by: Xing Guo <higuoxing+github@gmail.com>
Co-authored-by: Xuebin Su (苏学斌) <sxuebin@vmware.com>
The SQL statement does not match the expected output in test_vacuum.sql.

Co-authored-by: hzhang2 <hzhang2@vmware.com>
…e index on ao_table` is committed (apache#113)

We cannot calculate the size of pg_aoblkdir_xxxx before `create index on ao_table` is committed:
1. We lack the ability to parse the name of pg_aoblkdir_xxxx.
2. pg_aoblkdir_xxxx is created by `create index on ao_table` and cannot be found by diskquota_get_appendonly_aux_oid_list() before the index's creation is committed.
Solution:
1. Parse names beginning with `pg_aoblkdir`.
2. When blkdirrelid is missing, try to fetch it by traversing relation_cache.

Co-authored-by: hzhang2 <hzhang2@vmware.com>
Co-authored-by: Xing Guo <higuoxing+github@gmail.com>
Co-authored-by: Xuebin Su (苏学斌) <sxuebin@vmware.com>
Co-authored-by: Xuebin Su (苏学斌) <12034000+xuebinsu@users.noreply.github.com>
Co-authored-by: Xing Guo <higuoxing@gmail.com>
Co-authored-by: Hao Zhang <hzhang2@vmware.com>
Consider a user session that does a DELETE followed by a VACUUM FULL to
reclaim the disk space. If, at the same time, the bgworker loads config
by doing a SELECT, and the SELECT begins before the DELETE ends and ends
after the VACUUM FULL begins:

bgw: ---------[ SELECT ]----------->
usr: ---[ DELETE ]-[ VACUUM FULL ]-->

then the tuples deleted will be marked as RECENTLY_DEAD instead of DEAD.
As a result, the deleted tuples cannot be removed by VACUUM FULL.

The fix lets the user session wait for the bgworker to finish the current
SELECT before starting VACUUM FULL.
…ache#116)

When doing VACUUM FULL, the table size may not be updated if
the table's oid is pulled before its relfilenode is swapped.

This fix keeps the table's oid in the shared memory if the table
is being altered, i.e., is locked in ACCESS EXCLUSIVE mode.

Co-authored-by: Xuebin Su <sxuebin@vmware.com>
Currently, diskquota.pause() only takes effect on quota checking.
Bgworkers still run the loop to refresh quota even if diskquota
is paused. This wastes computation resources and can cause flaky issues.

This fix makes bgworkers skip refreshing quota when the user pauses
diskquota entirely to avoid those issues. Table sizes can be updated
correctly after resume.
ci: create rhel8 release build.

Signed-off-by: Sasasu <i@sasa.su>
Co-authored-by: Xuebin Su <sxuebin@vmware.com>
Currently, deadlock can occur when

1. A user session is doing DROP EXTENSION, and
2. A bgworker is loading quota configs using SPI.

This patch fixes the issue by pausing diskquota before DROP
EXTENSION so that the bgworker will not load config anymore.

Note that this cannot be done using object_access_hook() because
the extension object is dropped AFTER dropping all tables that belong
to the extension.
Test case test_primary_failure stops/starts a segment to produce a
mirror switch. But the segment start could fail while replaying xlog.
The failure was caused by the deleted tablespace directories in previous
test cases.

This commit removes the "rm" statement in those tablespace test cases and
adds "-p" to the "mkdir" command line. The corresponding sub-directories
will be deleted by "DROP TABLESPACE" if the case passes.

Relevant logs:
2022-02-08 10:09:30.458183 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"LOG","00000","entering standby mode",,,,,,,0,,"xlog.c",6537,
2022-02-08 10:09:30.458670 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"LOG","00000","redo starts at E/24638A28",,,,,,,0,,"xlog.c",7153,
2022-02-08 10:09:30.468323 CST,"cc","postgres",p1182588,th1235613568,"[local]",,2022-02-08 10:09:30 CST,0,,,seg1,,,,,"FATAL","57P03","the database system is starting up"
,"last replayed record at E/2481EA70",,,,,,0,,"postmaster.c",2552,
2022-02-08 10:09:30.484792 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"FATAL","58P01","directory ""/tmp/test_spc"" does not exist",,"Create this directory for the table
space before restarting the server.",,,"xlog redo create tablespace: 2590660 ""/tmp/test_spc""",,0,,"tablespace.c",749,
Otherwise, the compiler reports a warning:
"comparison of constant ‘20’ with boolean expression is always false"
Each time the state of Diskquota is changed, we need to wait for the
change to take effect using diskquota.wait_for_worker_new_epoch().
However, when the bgworker is not alive, such wait can last forever.

This patch fixes the issue by adding a timeout GUC so that wait() will
throw a NOTICE if it times out, making it more user-friendly.

To fix a race condition when CREATE EXTENSION, the user needs to
SELECT wait_for_worker_new_epoch() manually before writing data.
This is to wait until the current database is added to the monitored
db cache so that active tables in the current database can be
recorded.

This patch also fixes the test script for activating standby and renames
some of the cases to make them clearer.
- Change to use a GUC to set hardlimit instead of a UDF, since the
  hardlimit setting needs to persist across postmaster restarts.
- Fix relevant test cases.
Currently, the Diskquota launcher first starts a worker, then creates
the worker entry. However, after the worker starts, it cannot find the
entry when trying to check the is_paused status. Also, after a GPDB
restart, when the QD checks whether the worker is running by
checking the epoch, it might also fail to find the entry.

This patch fixes the issue by first creating the worker entry and then
starting the bgworker process.
The db cache stores which databases enable diskquota. Active tables
will be recorded only if they are in those databases. Previously,
we created a new UDF update_diskquota_db_list() to add the current db
to the cache. However, the UDF was installed in the wrong database. As a
result, after the user upgrades from a previous version to 1.0.3, the
bgworker cannot find the UDF and can do nothing.

This patch fixes the issue by removing update_diskquota_db_list() and
using fetch_table_stat() to update the db cache. fetch_table_stat() has
existed since version 1.0.0, so no new UDF is needed.

This PR replaces PR apache#99 and depends on PR apache#130 to fix a race
condition that occurs after CREATE EXTENSION.
fairyfar and others added 19 commits March 12, 2026 18:03
This is a code defect in original GPDB when resource groups are enabled.
There is a bug in calculating the length of pg_wchar in the `gpvars_check_gp_resource_group_cgroup_parent` function.
For example, the value "greenplum database" should have been judged an illegal name and rejected with the error:
"gp_resource_group_cgroup_parent can only contains alphabet, number and non-leading . _ -".
But it was wrongly judged as legal.
Use absolute artifact paths in the GPG verification step of
devops/release/cloudberry-release.sh.

Previously, the script verified SHA-512 using an absolute path but
called `gpg --verify` with relative file names. When running with
`--repo` from a different working directory, this could fail with
"No such file or directory" even though the `.asc` file existed in
the artifacts directory.

This change aligns the GPG verify command with the SHA-512 check by
verifying:
  $ARTIFACTS_DIR/${TAR_NAME}.asc
against:
  $ARTIFACTS_DIR/$TAR_NAME

No behavior change for successful local runs besides making path
resolution robust.
Add GUC_GPDB_NEED_SYNC flag to pax.enable_sparse_filter and
pax.enable_row_filter so their values are dispatched from QD
to QE segments. Without this flag, SET on the coordinator has
no effect because scans run on QE segments.
…scoding

COPY FROM with SEGMENT REJECT LIMIT had two bugs when encountering
invalid multi-byte encoding sequences:

1. Encoding errors were double-counted: HandleCopyError() incremented
   rejectcount, then RemoveInvalidDataInBuf() incremented it again for
   the same error. This caused the reject limit to be reached twice as
   fast as expected.

2. SREH (Single Row Error Handling) was completely disabled when
   transcoding was required (file encoding != database encoding). Any
   encoding error during transcoding would raise an ERROR instead of
   skipping the bad row.

Fix by removing the duplicate rejectcount++ from RemoveInvalidDataInBuf(),
removing the !need_transcoding guard that blocked SREH for transcoding,
and adding proper buffer cleanup for the transcoding case (advance
raw_buf past the bad line using FindEolInUnverifyRawBuf).

Add regression tests covering both non-transcoding (invalid UTF-8) and
transcoding (invalid EUC_CN to UTF-8) cases with various reject limits.

Fixes apache#1425
src/test/regress/sql/misc.sql is generated from
src/test/regress/input/misc.source; it should not be added to the sql directory.
macOS BSD sed requires an explicit empty string argument after
-i (sed -i '' 'script' file), unlike GNU sed which takes -i
without a suffix argument. Without this fix, BSD sed misinterprets
the sed script as a backup suffix and treats the filename as the
script, causing "unterminated substitute pattern" error.
Previously, GetViewBaseRelids() rejected any query with more than one
base table, so materialized views defined with JOINs were never
registered in gp_matview_aux/gp_matview_tables. This meant no status
tracking and no staleness propagation for join matviews.

Add a recursive helper extract_base_relids_from_jointree() that walks
RangeTblRef, JoinExpr, and FromExpr nodes to collect all base relation
OIDs. This is the only C function changed -- the existing downstream
infrastructure (InsertMatviewTablesEntries, SetRelativeMatviewAuxStatus,
MaintainMaterializedViewStatus, reference counting) already supports
N base tables per matview.

This is a first step toward AQUMV support for join queries. Users can
also inspect a join matview's freshness status manually via
gp_matview_aux.

Key behaviors:
- Self-joins (t1 JOIN t1) are deduplicated to one catalog entry
- All join types supported: INNER, LEFT, RIGHT, FULL, implicit cross
- Subquery/function RTEs in FROM are still rejected
- Partitioned tables in joins propagate DML status correctly
- Status escalation across multiple base tables works (i→e on delete)
- Transaction rollback correctly reverts status changes

Includes regression tests for: two/three-table joins, implicit joins,
self-joins, all outer join types, mixed join types, join with GROUP BY,
shared base tables across multiple MVs, multi-DML transactions,
transaction rollback, cross joins, partitioned tables in joins,
VACUUM FULL, TRUNCATE, WITH NO DATA, and DROP CASCADE.
ADD_DEFINITIONS(-DRUN_GTEST) and ADD_DEFINITIONS(-DRUN_GBENCH)
are directory-scoped CMake commands that apply to ALL targets,
including the production pax shared library. This caused test-
only macros to be defined in production builds.

In pax_porc_adpater.cc, the leaked RUN_GTEST activates:

    expect_hdr = rel_tuple_desc_->attrs[index].attlen == -1 &&
                 rel_tuple_desc_->attrs[index].attbyval == false;

    #ifdef RUN_GTEST
    expect_hdr = false;
    #endif

This forces expect_hdr to false in production, skipping the
stripping of PostgreSQL varlena headers from dictionary
entries. As a result, dictionary-encoded string columns
return garbled data (varlena header bytes are included as
part of the string content).

Replace ADD_DEFINITIONS with target_compile_definitions
scoped to test_main and bench_main targets only, so
RUN_GTEST and RUN_GBENCH are no longer defined when
building pax.so.
Oid is an unsigned int. Therefore, when an Oid reaches 2^31, printing it with %d displays a negative value.
This is a defect in the original GPDB. GPDB has fixed similar defects in commit 7279a1e ('Fix getResUsage integer overflow'), but there are still omissions.
…gn partitions (apache#1524)

The storage type detection logic failed to properly identify mixed storage when
foreign and non-foreign partitions coexisted, leading to incorrect metadata that
could cause issues with scan type selection and query planning.
When ALTER TABLE ... SET WITH (reorganize=true) runs concurrently with
COPY TO, COPY may return 0 rows instead of all rows.  The root cause is
a snapshot/lock ordering problem: PortalRunUtility() pushes the active
snapshot before calling DoCopy(), so the snapshot predates any
concurrent reorganize that had not yet committed.  After COPY TO blocks
on AccessExclusiveLock and the reorganize commits, the stale snapshot
cannot see the new physical files (xmin = reorganize_xid is invisible)
while the old physical files have already been removed, yielding 0 rows.

Three code paths are fixed:

1. Relation-based COPY TO (copy.c, DoCopy):
   After table_openrv() acquires AccessShareLock — which blocks until
   any concurrent reorganize commits — pop and re-push the active
   snapshot so it reflects all committed data at lock-grant time.

2. Query-based COPY TO, RLS COPY TO, and CTAS (copyto.c, BeginCopy):
   After pg_analyze_and_rewrite() -> AcquireRewriteLocks() acquires
   all direct relation locks, refresh the snapshot.  This covers
   COPY (SELECT ...) TO, COPY on RLS-protected tables (internally
   rewritten to a query), and CREATE TABLE AS SELECT.

3. Partitioned table COPY TO (copy.c, DoCopy):
   Before entering BeginCopy, call find_all_inheritors() to eagerly
   acquire AccessShareLock on all child partitions.  Child partition
   locks are normally acquired later in ExecutorStart -> ExecInitAppend,
   after PushCopiedSnapshot has already embedded a stale snapshot.
   Locking all children upfront ensures the snapshot refresh in fixes
   1 and 2 covers all concurrent child-partition reorganize commits.

In REPEATABLE READ or SERIALIZABLE isolation, GetTransactionSnapshot()
returns the same transaction-level snapshot, so the Pop/Push is a
harmless no-op.

Tests added:
- src/test/isolation2/sql/copy_to_concurrent_reorganize.sql
  Tests 2.1-2.5 for relation-based, query-based, partitioned, RLS,
  and CTAS paths across heap, AO row, and AO column storage.
- contrib/pax_storage/src/test/isolation2/sql/pax/
  copy_to_concurrent_reorganize.sql
  Same coverage for PAX columnar storage.

See: Issue#1545 <apache#1545>
For RC tags like X.Y.Z-incubating-rcN, generate the source tarball
filename and top-level directory using BASE_VERSION (without -rcN).

This keeps the voted bits ready for promotion without rebuilding and
avoids -rcN showing up in the extracted source directory.
This commit introduces comprehensive support for Ubuntu 24.04 (Noble
Numbat) across build environments and packaging metadata.

Key changes and package updates for Ubuntu 24.04:

- Compiler Upgrade: Migrated from GCC/G++ 11 to GCC/G++ 13 to align
  with Noble's default toolchain.
- Python 3.12 Migration: Updated system Python to 3.12. Removed
  python3-distutils as it has been deprecated and removed from
  Ubuntu 24.04 repositories (PEP 632).
- t64 Transition: Updated DEB runtime dependencies to include the
  't64' suffix (e.g., libssl3t64, libapr1t64, libcurl4t64) to
  comply with Noble's mandatory 64-bit time_t ABI transition.
- libcgroup Update: Switched from libcgroup1 to libcgroup2 to
  match the updated library names in Ubuntu 24.04.
- PIP Compliance: Added --break-system-packages flag for PIP
  installations within the Dockerfile to satisfy PEP 668 requirements.
…t but we need them when re-redoing some tablespace related xlogs (e.g. database create with a tablespace) on mirror."

This reverts commit 7a09e80.
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records.  Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.

A fix for this problem was already attempted in 49d9cfc, but it
was reverted because of design issues.  This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency.  Tablespaces
are created as real directories, and should be deleted
by later replay.  CheckRecoveryConsistency ensures
they have disappeared.

The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING.  Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
On FreeBSD, the new test fails due to a WAL file being removed before
the standby has had the chance to copy it.  Fix by adding a replication
slot to prevent the removal until after the standby has connected.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2Wj5nau_qpjbwihvmXLfkAWOZ5TKdbnqOc6nKSiRJEoPyQ@mail.gmail.com
(cherry picked from commit 5794f50416361e8528ae5dfea269eee50261a741)

Co-authored-by: Leonid <63977577+leborchuk@users.noreply.github.com>
@tuhaihe
Member

tuhaihe commented Mar 13, 2026

Very Cool! Thanks!

Due to many commits in this PR, we need to merge this PR via the CLI. I can help with this.

(Steps need to be added here.)

@tuhaihe tuhaihe self-requested a review March 13, 2026 04:11
@reshke
Contributor Author

reshke commented Mar 13, 2026

Very Cool! Thanks!

Due to many commits in this PR, we need to merge this PR via the CLI. I can help with this.

(Steps need to be added here.)

Hi!
Thank you, your help will be much appreciated! Merging via CLI looks dangerous... It requires a git push into the branch at some point, doesn't it? Rebase & merge in the UI looks much safer :)

@tuhaihe
Member

tuhaihe commented Mar 13, 2026

Very Cool! Thanks!
Due to many commits in this PR, we need to merge this PR via the CLI. I can help with this.
(Steps need to be added here.)

Hi! Thank you, your help will be much appreciated! Merging via CLI looks dangerous... It requires a git push into the branch at some point, doesn't it? Rebase & merge in the UI looks much safer :)

Hi @reshke,

Thanks for raising this. The main reason I suggested merging via CLI is that this PR contains many commits, and GitHub has limitations when handling large PRs (100+ commits) via the UI.

For example, in another PR I recently helped with — #1547 — we also had to merge via CLI for the same reason.

That said, once another committer approves this PR, we can try the UI merge option first. If GitHub allows it, we can certainly proceed that way. Otherwise, we can fall back to merging via CLI.

@tuhaihe
Member

tuhaihe commented Mar 13, 2026

Just for reference: I’ve organized the CLI steps in this wiki page, which we can follow if we need to merge via CLI:

https://github.com/apache/cloudberry/wiki/Rebase-and-merge

@reshke
Contributor Author

reshke commented Mar 13, 2026

[Screenshot 2026-03-13 at 14 28 37] :) Will try https://github.com/apache/cloudberry/wiki/Rebase-and-merge

@tuhaihe
Member

tuhaihe commented Mar 13, 2026

[Screenshot 2026-03-13 at 14 28 37] :) Will try https://github.com/apache/cloudberry/wiki/Rebase-and-merge

Take it easy, it's expected. Feel free to try the CLI method yourself; I'm happy to help. By the way, I did a dry run on my local machine, and it works. If you have any questions, please let me know.

@reshke reshke merged commit e42fcdd into apache:REL_2_STABLE Mar 13, 2026
54 checks passed
@reshke
Contributor Author

reshke commented Mar 13, 2026

cp_rel_2_stable_from_main.txt
Did everything as described in the wiki; here is the log.
